Thanks Chris! Atlas now seems to be behaving the way it did before December 24, stopping one-off measurements about five to ten minutes after they start, which suits my purposes nicely.
As far as manual ‘deletes’ go, my attempts to ‘delete’ measurements as soon as I’ve picked up their results don’t appear to be working. There’s no need to fix this on my account: now that measurements are stopped automatically again, I’ll probably remove the attempted workaround from my code. But here are the details in case they’re otherwise useful.
The process www.globaltraceroute.com follows is this:
- Create a measurement and get a measurement ID.
- Immediately begin asking for a result every five seconds until it gets one.
- Display the result to the user.
- New as of yesterday: send a ‘delete’ in an attempt to stop the measurement.
- Exit.
Yesterday, the measurements my code attempted to stop this way kept running indefinitely, just as if it hadn’t sent a ‘delete’ request. However, if I waited five or ten minutes and then ran the function that sends the delete, the measurement would stop. It was a little hard to tell that this was working, because measurements took several minutes to show up as stopped, but when they did, the end timestamp matched the time I ran the ‘delete’ function.
Today, it’s again a little hard to tell what’s doing what. The measurements all eventually show as stopped. But if they had been stopped by the delete my script sent, then based on what I saw yesterday I’d expect the stop timestamp to be within a minute or two of the start timestamp. Instead, the stop timestamp is five to ten minutes after the start timestamp, suggesting that the measurements keep running until Atlas, with your latest fix, decides they’re finished.
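For reference, the five-step process above could be sketched in Python roughly as follows. This is a minimal illustration, not globaltraceroute.com’s actual code: the endpoint paths, the `key` query parameter, the shape of the creation response, and the helper names (`atlas_request`, `atlas_callbacks`, `run_one_off`) are all assumptions, so check them against the RIPE Atlas API documentation before relying on them.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional, Tuple

# Assumed RIPE Atlas REST API v2 base URL (hypothetical wiring; verify against the docs).
ATLAS_API = "https://atlas.ripe.net/api/v2/measurements/"


def atlas_request(method: str, url: str, key: str, body: Optional[dict] = None):
    """Send one JSON request; the API key is passed as an assumed `key` query parameter."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"{url}?key={key}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def atlas_callbacks(key: str, definition: dict) -> Tuple[Callable, Callable, Callable]:
    """Build hypothetical create/fetch/stop callbacks that talk to the Atlas API."""

    def create() -> int:
        # The creation response is assumed to look like {"measurements": [<id>]}.
        return atlas_request("POST", ATLAS_API, key, definition)["measurements"][0]

    def fetch_results(msm_id: int) -> List[dict]:
        # An empty list (or empty body) means no result has arrived yet.
        return atlas_request("GET", f"{ATLAS_API}{msm_id}/results/", key) or []

    def stop(msm_id: int) -> None:
        # A DELETE on the measurement asks Atlas to stop it.
        atlas_request("DELETE", f"{ATLAS_API}{msm_id}/", key)

    return create, fetch_results, stop


def run_one_off(
    create: Callable[[], int],
    fetch_results: Callable[[int], List[dict]],
    stop: Callable[[int], None],
    display: Callable[[List[dict]], None],
    poll_interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Create a measurement, poll for a result, display it, then send a stop."""
    msm_id = create()                      # 1. create, get a measurement ID
    results: List[dict] = []
    for _ in range(max_polls):             # 2. ask every poll_interval seconds
        results = fetch_results(msm_id)
        if results:
            break
        sleep(poll_interval)
    display(results)                       # 3. show whatever came back
    stop(msm_id)                           # 4. send the DELETE
    return results                         # 5. exit
```

A real run might look like `run_one_off(*atlas_callbacks(API_KEY, definition), display=print)`, where `API_KEY` and `definition` (the one-off traceroute spec) are placeholders. Passing the HTTP calls in as callbacks keeps the polling and ordering logic separate from the network, so the sequence can be checked with fakes.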
Measurement 23732704 is a random example of a measurement that was sent a ‘delete’ 50 seconds after creation (Dec 30 21:25:14 UTC) but didn’t stop until roughly five minutes later, at 2019-12-30 21:30 per https://atlas.ripe.net/measurements/. Alternatively, if you have a real-time view into the Atlas API, you could go to www.globaltraceroute.com and create a measurement. It should show up immediately in Atlas under the scg@gibbard.org username and go through the process outlined above.
Thanks, Steve
On Dec 30, 2019, at 2:24 AM, Chris Amin <camin@ripe.net> wrote:
Hi Steve,
There was indeed a problem where measurements were not being automatically updated with a "stopped" status. This should now be fixed, but please let me know if you notice any lingering issues.
Can you confirm that the issue with manually DELETEing not having an effect still persists? If so, can you give me an example measurement ID?
Thanks, Chris Amin RIPE NCC
On 30/12/2019 06:53, Steve Gibbard wrote:
An update: I was able to ‘delete’ my stuck measurements via the API, so they’re stopped now and I’m back up and running for the moment.
I also added an API command to my code to ‘delete’ measurements as soon as the results have been picked up, which I hoped would make this fix sustainable, but so far that doesn’t seem to be doing anything. Perhaps a longer delay is required between creating the measurement and sending the ‘delete’ command?
Thanks, Steve
On Dec 28, 2019, at 3:20 PM, Steve Gibbard <scg@gibbard.org> wrote:
Hi Atlas folks,
I hope you’re having a good holiday season. Sorry to interrupt it by complaining about issues.
On Christmas Eve my time (early Christmas morning your time), there was an Atlas issue where every attempt to read measurements failed with an HTTP 500 error. That appears to have been fixed on Christmas (a really big thank you to whoever worked on that). Since then, however, while most of the one-off measurements we’ve created have delivered results very quickly, none of the measurements created since 17:00 UTC on 2019-12-25 have stopped running, as shown in the Atlas portal:
23722197 Traceroute www.globaltraceroute.com (AS13335) Test Traceroute 1 one-off 2019-12-25 22:24 Never
23722089 Traceroute archive.ubuntu.com (AS41231) Test Traceroute 1 one-off 2019-12-25 19:16 Never
23722088 Traceroute sps.prima.com.ar (AS10318) Test Traceroute 1 one-off 2019-12-25 19:14 Never
23721915 Traceroute www.globaltraceroute.com (AS13335) Test Traceroute 1 one-off 2019-12-25 17:00 Never
And so on for every measurement between then and now.
Previously, the typical one-off measurement was listed with start and stop times less than 10 minutes apart.
When a user has 100 measurements running concurrently, creation of new measurements fails, which is happening for me now.
If somebody could take a look at this, I’d really appreciate it.
Thanks, Steve
-- Steve Gibbard scg@gibbard.org +1 415 717-7842 (cell)