Considering the fact that all of Atlas is completely down for more than 24 hrs., I find the silence a bit deafening. No status updates, nothing. Weird. Not up to RIPE standards. Regards, Ernst J. Oud
Considering the fact that all of Atlas is completely down for more than 24 hrs., I find the silence a bit deafening. No status updates, nothing. Weird.
been down so long it looks like up to me [0]
as you are probably too young for that reference, how about it looks pretty up from here. perhaps a more specific symptom might help with diagnosis.
randy
[0] https://en.wikipedia.org/wiki/Been_Down_So_Long_It_Looks_Like_Up_to_Me
Consumption delay according to the main page is up to 16+ hours so something is indeed very wrong. Regards, Peter Potvin | Executive Director, Accuris Technologies Ltd.
Consumption delay according to the main page is up to 16+ hours so something is indeed very wrong.
aha! a symptom. thanks. indeed, an issue
randy
I don’t think I fully understand what you are saying. Do you imply that Atlas works for you? I doubt it since all of it is down, see the status page at ripe.net, I guess worldwide. No results are processed, no tests are running, even Magellan is down. Or were you joking? My Dutch sense of humor might be different :-) Ernst
On 19 Sep 2023, at 17:04, Ernst J. Oud <ernstoud@gmail.com> wrote:
Considering the fact that all of Atlas is completely down for more than 24 hrs., I find the silence a bit deafening. No status updates, nothing. Weird.
The status update is Degraded Performance, for a non-critical service. https://atlas.ripe.net/ acknowledges there is a consumption delay, hardly silence.
f
Yes, the status page shows a bit of info. But “degraded performance” does not cover the real situation since all of Atlas was or still is down. Measurements don’t report data, all built-ins don’t show data, probe tags are lost, Magellan - used for streaming - does not give any data etc. Yes, it is a non-critical service, but people like me do rely on data from my probes to monitor my network. It is a bit of give and take but it appears I need another service for this… Regards, Ernst J. Oud
On 2023-09-19 23:04, Ernst J. Oud wrote:
Considering the fact that all of Atlas is completely down for more than 24 hrs., I find the silence a bit deafening. No status updates, nothing. Weird.
Not up to RIPE standards.
Regards,
Ernst J. Oud
Good morning, I'm sad to report that indeed there's still an issue with result processing - which is still reflected on the status page. Specifically, the HBase backend that is responsible for storing and retrieving the new (and historic) results is struggling to store the data from the last ~24 hours. The teams have been working on solving this basically 24/7 since the issue occurred but haven't been successful yet. All else (continuing to run existing measurements, creating new ones, real-time streaming of the results, APIs, UI, ...) are running undisturbed. I hope this helps in understanding the extent of the problem, and we'll of course let you know when there's progress. Regards, Robert
Robert et al,
In contrast to your statement below, streaming of results from new or existing measurements using Magellan currently does *not* work… no results are obtained, see below.
---
Looking good! Measurement 60182213 was created and details about it can be found here: https://atlas.ripe.net/measurements/60182213/
Connecting to stream...
Disconnected from stream
---
Regards,
Ernst J. Oud
Hi, On 2023-09-20 11:47, Ernst J. Oud wrote:
Robert et al,
In contrast to your statement below, streaming of results from new or existing measurements using Magellan currently does *not* work… no results are obtained, see below.
---
Looking good! Measurement 60182213 was created and details about it can be found here:
https://atlas.ripe.net/measurements/60182213/
Connecting to stream...
Disconnected from stream
The answer is the same here as for Stephane - no probes were selected for the measurement, hence no answers are coming back on the streaming interface either. Regards, Robert
Robert, I checked my code that calls Magellan. It does not add the “system-ipv4-works” tag to the request for probes in measurements. Does Magellan add that tag automatically? Regards, Ernst J. Oud
On 20/09/2023 12:57, Ernst J. Oud wrote:
I checked my code that calls Magellan. It does not add the “system-ipv4-works” tag to the request for probes in measurements.
Does Magellan add that tag automatically?
Hi Ernst,
The system-ipv4-works (and system-ipv6-works) tags are specified in the configuration in the ripe-atlas-tools settings file. If you run:
    ripe-atlas configure --editor
you can find a tags block with various default tag sets to use for different measurement types.
*However*, as a workaround we have re-populated the system-ipv4-works and system-ipv6-works tags, so that the default behaviour of Magellan should be sane once again. This should allow scheduling of new measurements and streaming the results.
Note: The probe sets for these tags are currently just copied from the system-ipv4-stable-1d and system-ipv6-stable-1d tags, respectively, which were not affected by this issue because they use a different data backend. The actual probe sets are therefore not exactly the same, but they are functionally very similar if your goal is "restrict the participating probes to ones that are likely to work".
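For readers who build measurement requests in their own code rather than through the Magellan defaults, the tag handling described above can be sketched as follows. This is only a minimal illustration: the tag names come from this thread, but the payload layout (the "probes", "tags", "include" and "is_oneoff" fields) is an assumption modelled on the RIPE Atlas v2 measurement request format, the helper names are hypothetical, and "example.com" is a placeholder target. Nothing is sent over the network.

```python
import json

# Tag names are taken from the thread; the payload layout below is an
# assumption modelled on the RIPE Atlas v2 measurement request format.
WORKS_TAG = {4: "system-ipv4-works", 6: "system-ipv6-works"}
STABLE_TAG = {4: "system-ipv4-stable-1d", 6: "system-ipv6-stable-1d"}


def probe_request(country, requested, af=4, use_stable_fallback=False):
    """Build one probe-selection entry (hypothetical helper)."""
    # Pick the "stable-1d" tag when the "works" tag set cannot be relied on.
    tag = STABLE_TAG[af] if use_stable_fallback else WORKS_TAG[af]
    return {
        "type": "country",
        "value": country,
        "requested": requested,
        "tags": {"include": [tag], "exclude": []},
    }


def measurement_request(target, probes, af=4):
    """Wrap probe entries in a one-off ping request body (hypothetical helper)."""
    return {
        "definitions": [
            {
                "type": "ping",
                "af": af,
                "target": target,
                "description": f"ping {target}",
            }
        ],
        "probes": probes,
        "is_oneoff": True,
    }


if __name__ == "__main__":
    body = measurement_request(
        "example.com",
        [probe_request("FR", 100, use_stable_fallback=True)],
    )
    print(json.dumps(body, indent=2))
```

Setting use_stable_fallback=True requests probes by the stable-1d tag instead of the works tag, which mirrors the substitution described in the note above only in spirit; the two probe sets are similar but not identical.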
Chris, Thanks for your input. That clarifies a lot! Regards, Ernst J. Oud
On Wed, Sep 20, 2023 at 07:43:20AM +0200, Robert Kisteleki <robert@ripe.net> wrote a message of 42 lines which said:
All else (continuing to run existing measurements, creating new ones, real-time streaming of the results, APIs, UI, ...) are running undisturbed.
This is not what I observe. For instance, asking for 100 probes in France yields a "No suitable probes" (see measurement #60181887). So, "completely down" seems a fair summary to me.
Stephane, Yes, I also don’t understand Robert’s remark. All user-defined measurements, new or existing, do not give any results. If Robert’s remark is based on some Atlas performance monitoring function then that function should/could be improved since it does not reflect reality. Regards, Ernst J. Oud
On Wed, Sep 20, 2023 at 12:02:19PM +0200, Ernst J. Oud <ernstoud@gmail.com> wrote a message of 34 lines which said:
Yes, I also don’t understand Robert’s remark. All user-defined measurements, new or existing, do not give any results.
Maybe the situation is complicated and not fully understood yet. Good luck to the technical team, anyway.
Hi, On 2023-09-20 11:56, Stephane Bortzmeyer wrote:
On Wed, Sep 20, 2023 at 07:43:20AM +0200, Robert Kisteleki <robert@ripe.net> wrote a message of 42 lines which said:
All else (continuing to run existing measurements, creating new ones, real-time streaming of the results, APIs, UI, ...) are running undisturbed.
This is not what I observe. For instance, asking for 100 probes in France yields a "No suitable probes" (see measurement #60181887). So, "completely down" seems a fair summary to me.
In that measurement you're asking specifically for probes tagged with "system-ipv4-works" - which, as you can read in the related thread this morning, is a casualty of the current issue we're facing. We have a fix for this particular slice of the problem that we can apply once the system has recovered. Regards, Robert
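Why an emptied tag leads to "No suitable probes" can be shown with a toy model of tag-based selection. This is not the actual Atlas probe-selection logic, only a sketch under the assumption that a probe qualifies when it carries every requested tag; the probe records and the select_probes helper are hypothetical.

```python
def select_probes(probes, include_tags, requested):
    """Return up to `requested` probe IDs that carry every tag in include_tags."""
    wanted = set(include_tags)
    matching = [p["id"] for p in probes if wanted <= set(p["tags"])]
    return matching[:requested]


# Hypothetical probe records: no probe carries the "works" tag,
# while the "stable-1d" tags are still populated.
PROBES = [
    {"id": 1, "tags": ["system-ipv4-stable-1d"]},
    {"id": 2, "tags": ["system-ipv4-stable-1d", "system-ipv6-stable-1d"]},
    {"id": 3, "tags": []},
]

if __name__ == "__main__":
    # No probe has the tag, so nothing can be selected.
    print(select_probes(PROBES, ["system-ipv4-works"], 100))
    # The stable-1d tag still matches some probes.
    print(select_probes(PROBES, ["system-ipv4-stable-1d"], 100))
```

In this model a request that includes a tag with no members always returns an empty selection, however many probes are otherwise connected, so no results can arrive on the stream either.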
participants (7)
- Chris Amin
- Ernst J. Oud
- Fearghas Mckay
- Peter Potvin
- Randy Bush
- Robert Kisteleki
- Stephane Bortzmeyer