An article/FAQ entry about probe troubleshooting
Hi all,

it seems that failing probes are becoming more and more of an issue, possibly due to USB flash drive failure. I think it would be nice if there were an article / FAQ entry with some official troubleshooting procedures, so we could just send a link to anybody complaining about a non-functional probe.

Is there something like this on the roadmap? Perhaps we as the community could make something up:

- how to check whether the port the probe is connected to has proper IP and DNS parameters and is allowed to establish an SSH session to TCP port 443 on the Internet (beware of DPI firewalls that allow HTTPS but not SSH over 443)
- how to check the LED status and the SOS list on the probe page (the current LED explanation in the FAQ is too techy and even power users like me still don't understand all the presented states :) )
- how to diagnose USB flash drive failure, how to force reformatting of the flash drive and how to replace it with a new one

What do you think?

Cheers,
Ondřej Caletka
CESNET
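The first item in the list above (is the probe's port allowed to open an SSH session on TCP 443?) could be checked from a laptop plugged into the same port. The following is only a minimal sketch under stated assumptions: an SSH server sends its identification string ("SSH-...") right after the TCP handshake, while a TLS server stays silent until the client speaks. The function names are made up for illustration, and the host to test against is deliberately left as a parameter, since the thread does not name the controller hosts.

```python
import socket


def classify_banner(data: bytes) -> str:
    """Classify the first bytes a server sends after the TCP handshake."""
    if data.startswith(b"SSH-"):
        return "ssh"      # SSH identification string was received
    if not data:
        return "silent"   # nothing arrived: TLS server, or banner filtered
    return "other"


def check_ssh_over_443(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and report what kind of banner arrives.

    A DNS failure, refused connection or reset is reported as
    'connection error: ...'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            try:
                data = sock.recv(64)
            except socket.timeout:
                data = b""  # server (or a firewall in between) sent nothing
    except OSError as exc:
        return f"connection error: {exc}"
    return classify_banner(data)


if __name__ == "__main__":
    # "ssh-on-443.example.net" is a placeholder, not a real Atlas host.
    print(check_ssh_over_443("ssh-on-443.example.net"))
```

A result of "ssh" suggests the path is open; "silent" or a connection error against a host known to run SSH on 443 would be a hint (not proof) that a firewall is interfering.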
+1!

Sarah
An issue with having a FAQ for the failing USBs is that people will perform the procedure when it is not necessary. I think though that there should still be more debugging info for connections.
Hi,

first, yes, a FAQ article would be nice. The USB stick of my probe broke in November and I wrote to atlas@ripe.net because I was not able to find anything about this problem. After some back-and-forth by e-mail and one USB stick that turned out to be too small, I was finally able to get the probe back online. [1] helped me find out that the USB stick needs a capacity of at least about 3 GB.

But I also suggest that the error tag should be more specific. In my case it was simply "Hardware problem supspected" (yes, with the typo; I think this has been corrected now). If the system is able to detect that the USB stick is read-only, it can surely report this. And then it would be only a small step to automatically send a mail to the owner. That would be great.

Greetings,
Christian Estelmann

[1] https://www.mdsec.co.uk/2015/09/an-introduction-to-hardware-hacking-the-ripe...
Hi, On Mon, Jan 25, 2016 at 04:34:57PM +0100, Estelmann, Christian wrote:
If the system is able to detect that the USB stick is read only it surely can report this.
+1 - well, it actually *does* report it as part of the SOS DNS queries, but the whole process "there is a SOS that looks very typical for FAQ item 37 -> send note to user pointing to that FAQ item" is, uh, somewhat underdeveloped right now.

(Two of my probes have been through the USB dance, and it took something like two weeks of e-mailing back and forth to resolve the issue, which is a tad long for a FAQ issue...)

gert
--
have you enabled IPv6 on something today...?
SpaceNet AG, Joseph-Dollinger-Bogen 14, D-80807 Muenchen
Vorstand: Sebastian v. Bomhard
Aufsichtsratsvors.: A. Grundner-Culemann
HRB: 136055 (AG Muenchen)
Tel: +49 (0)89/32356-444
USt-IdNr.: DE813185279
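The missing "SOS looks typical for a FAQ item -> point the user to it" step could, in principle, be a small rule table. The sketch below is purely illustrative and is not how RIPE Atlas works internally: the `ProbeView` fields, the `usb-read-only` flag name and the hint texts are all invented for this example. The only idea taken from the thread is that SOS messages arrive as DNS queries, so seeing one suggests the probe has power, a link and working DNS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProbeView:
    """Hypothetical summary of what the infrastructure knows about a probe."""
    connected: bool
    sos_count_since_disconnect: int = 0
    flags: List[str] = field(default_factory=list)  # invented flag names


def suggest_next_step(view: ProbeView) -> Optional[str]:
    """Map a probe summary to a hint for the host, or None if all is well."""
    if view.connected:
        return None
    if "usb-read-only" in view.flags:
        return "USB stick appears to be read-only: see the USB replacement notes."
    if view.sos_count_since_disconnect > 0:
        # SOS messages are DNS queries, so DNS and basic connectivity work.
        return "Probe has DNS but is not connected: check SSH over TCP 443."
    return "No sign of life: check power, cabling, IP and DNS on the probe's port."


if __name__ == "__main__":
    print(suggest_next_step(ProbeView(connected=False, sos_count_since_disconnect=3)))
```

The order of the rules matters: the most specific symptom is tested first, and the generic "no sign of life" hint is the fallback.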
Hi,

I do not have the technical equipment at home to sniff the traffic*, so I am not able to see which DNS queries are performed. But simply tagging the probe with something like "USB stick read-only" would surely help to find the problem. The current "something is wrong" tag is not very helpful (imho).

*OK, I could boot Knoppix on my laptop (probe connected to the Ethernet port, traffic forwarded via WLAN to the Internet) to sniff the traffic. Or something like this...

Greetings,
Christian Estelmann
Hi, On Mon, Jan 25, 2016 at 04:49:54PM +0100, Estelmann, Christian wrote:
I have not the technical equipment at home to sniff the traffic*. So I am not able to see which DNS queries are performed. But simple tagging the probe with something like "USB stick read-only" will surely help to find the problem. The current "something is wrong" tag is not very helpful (imho).
This is what I'm saying. The probe is clearly communicating the problem back - these DNS requests hit the RIPE DNS resolvers, and they know what is wrong. The SOS messages are listed on the atlas web site, just the "draw automated conclusion and notify user what to do next" bit is missing.
*OK, I could boot Knoppix on my laptop (probe connected to Ethernet port, traffic forwarded via WLAN to the internet) to sniff the traffic. Or something like this...
"Not being able to sniff the traffic" is a somewhat lame excuse, but missing the point :-)

Gert Doering -- NetMaster
On 25.1.2016 at 16:28, Tanner Ryan wrote:
An issue with having a FAQ for the failing USBs is that people will perform the procedure when it is not necessary.
The question is how much such a procedure harms the probe. AFAIK it should not do any harm to the Atlas system, as the measurement results are streamed to the Atlas C&C servers in near real time. Plus, when the probe is offline due to a broken USB flash drive, it brings no value to the Atlas system at all.

--
Ondřej
Hi,
I think though that there should still be more debugging info for connections.
A very similar discussion happened almost exactly two weeks ago on this list, where I responded:
However, we're working on a feature to give probe hosts more guidance about what's going on (and especially what's going wrong) with their probe (*), and here we will make it clear if the USB replacement is in order.
Besides this, I'm sure we'll happily endorse any kind of community-written troubleshooting guides :)

Regards,
Robert
Hi,
I think though that there should still be more debugging info for connections.
However, we're working on a feature to give probe hosts more guidance about what's going on (and especially what's going wrong) with their probe (*), and here we will make it clear if the USB replacement is in order.
In the interim, the status feature has been launched, but it doesn't help when the probe does not connect to the atlas server.
Besides this, I'm sure we'll happily endorse any kind of community-written troubleshooting guides :)
For want of a better place I've started a troubleshooting guide on my Wikipedia sandbox and would invite the community to help bring it to fruition. Your contributions are appreciated!

https://en.wikipedia.org/wiki/User:Mhi/sandbox/Troubleshooting_RIPE-Atlas_Pr...

Regards,
Michael

Sent via RIPE Forum -- https://www.ripe.net/participate/mail/forum
Very nice!

Thanks,
Hank
Useful idea (and I made a small edit). Although some troubleshooting can be carried out via the probe's web datapage, the only other option still appears to be the physical 'remove card, reformat, replace' one.

I've noted in the past that the probe can show itself as 'down' when the link is demonstrably up (because I'm using it!), which puzzles me. It again raises the question of whether the probe can be 'nudged' or rebooted via a local IP call of some sort (cf. a wakeup packet) if it appears to have a problem but is remote from its owner.
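For the "shown as down while the link is up" case, it can help to compare what the Atlas side reports with what is seen locally. A minimal sketch follows, assuming the public REST API exposes a probe at `https://atlas.ripe.net/api/v2/probes/<id>/` with a `status` object containing `name` and `since`. That URL and those field names are assumptions here and should be verified against the current API documentation; the helper names are invented for illustration.

```python
import json
from urllib.request import urlopen

# Assumed endpoint layout; check the official API documentation before relying on it.
API_URL = "https://atlas.ripe.net/api/v2/probes/{probe_id}/"


def summarize_status(probe: dict) -> str:
    """Turn a probe JSON document into a one-line status summary."""
    status = probe.get("status") or {}
    name = status.get("name", "unknown")
    since = status.get("since", "unknown time")
    return f"probe {probe.get('id', '?')}: {name} since {since}"


def fetch_probe(probe_id: int, timeout: float = 10.0) -> dict:
    """Fetch the probe document from the (assumed) public API endpoint."""
    with urlopen(API_URL.format(probe_id=probe_id), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Replace 1 with the ID of the probe being investigated.
    print(summarize_status(fetch_probe(1)))
```

If the summary says the probe is disconnected while the local link is clearly working, the connectivity items from the start of the thread (DNS, SSH over TCP 443) are the next things to look at.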
participants (9)
- Alison Wheeler
- Estelmann, Christian
- Gert Doering
- Hank Nussbacher
- Michael Ionescu
- Ondřej Caletka
- Robert Kisteleki
- sarah.wassermann@student.ulg.ac.be
- Tanner Ryan