An update: I was able to ‘delete’ my stuck measurements via the API, so they’re stopped now and I’m back up and running for the moment. I also added an API command to my code to ‘delete’ each measurement as soon as its results have been picked up, which I hoped would make this fix sustainable, but so far that doesn’t seem to be doing anything. Perhaps a longer delay is required between creating the measurement and sending the ‘delete’ command?

Thanks,
Steve
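For anyone wanting to try the same workaround, here is a minimal sketch of what a ‘delete’ call with a retry delay might look like. It is not the code referred to above. It assumes the v2 API stops a measurement on an HTTP DELETE to the measurement URL, with the API key passed as a `key` query parameter. The function names, the retry count, and the 30-second delay are hypothetical and chosen only for illustration.

```python
import time
import urllib.error
import urllib.request

# Assumption: base URL of the RIPE Atlas v2 measurements endpoint.
ATLAS_API = "https://atlas.ripe.net/api/v2/measurements"


def build_stop_request(msm_id, api_key):
    """Build the DELETE request that asks Atlas to stop one measurement."""
    url = f"{ATLAS_API}/{msm_id}/?key={api_key}"
    return urllib.request.Request(url, method="DELETE")


def stop_measurement(msm_id, api_key, send=urllib.request.urlopen,
                     attempts=3, delay=30.0, sleep=time.sleep):
    """Try to stop a measurement, waiting `delay` seconds between attempts.

    `send` and `sleep` are injectable so the retry logic can be exercised
    without touching the network. Returns True once a 2xx response is seen,
    False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            with send(build_stop_request(msm_id, api_key)) as resp:
                if 200 <= resp.status < 300:
                    return True
        except urllib.error.HTTPError:
            # The measurement may not be in a stoppable state yet; retry.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `stop_measurement(23722197, "YOUR-API-KEY")` would then make up to three attempts, 30 seconds apart. Whether a longer delay actually helps is exactly the open question above, so the values here are a starting point for experimenting rather than known-good settings.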
On Dec 28, 2019, at 3:20 PM, Steve Gibbard <scg@gibbard.org> wrote:
Hi Atlas folks,
I hope you’re having a good holiday season. Sorry to interrupt it by complaining about issues.
On Christmas Eve my time (early Christmas morning your time) there was an Atlas issue where any attempt at reading measurements failed with an HTTP 500 status error. That appears to have gotten fixed on Christmas (a really big thank you to whoever worked on that). Since then, most of the one-off measurements we’ve created have delivered results very quickly, but none of the measurements created since 17:00 UTC on 2019-12-25 have stopped running. As shown in the Atlas portal:
23722197  Traceroute  www.globaltraceroute.com (AS13335)  Test Traceroute  1  one-off  2019-12-25 22:24  Never
23722089  Traceroute  archive.ubuntu.com (AS41231)        Test Traceroute  1  one-off  2019-12-25 19:16  Never
23722088  Traceroute  sps.prima.com.ar (AS10318)          Test Traceroute  1  one-off  2019-12-25 19:14  Never
23721915  Traceroute  www.globaltraceroute.com (AS13335)  Test Traceroute  1  one-off  2019-12-25 17:00  Never
And on for every measurement between then and now.
Previously, the typical one-off measurement was listed with start and stop times less than 10 minutes apart.
When a user has 100 measurements running concurrently, creation of new measurements fails, which is happening for me now.
If somebody could take a look at this, I’d really appreciate it.
Thanks,
Steve