Problem with UDM
Hi,
I have just activated my first sw probe, on a dedicated CentOS7 VM, id 1000408.
Out of habit, I did configure CentOS the way I usually configure my servers, notably by running OpenSSH on a non-default port and by activating only the ed25519 host key in sshd_config.
I noticed 3 issues after having let the atlas-probe service run for a while, which included a couple of service restarts:
1. The SOS History table for the probe on the probe web page was not being populated with any data after the restarts.
2. The ./status/ssh_err.txt file showed the following message: « RSA host key for IP address '178.63.8.31' not in list of known hosts. »
3. I tried to set up a test one-off measurement using my sw probe by manually selecting its ID on the measurement setup screen, but it would not register the probe. I also noted after 12+ hours that no UDM was active for the probe from other users (from my experience with a hardware probe I have been operating for a while, it should be put to use by the network pretty quickly, unless I am too impatient here ;-) ).
My first action was to revert OpenSSH to default settings (use port 22, activate the RSA host key). This apparently contributed to solving problem 1, as the SOS History data started to appear on subsequent service restarts.
Concerning the SSH error logged for the probe, the RSA key for 'ctr-fsn01.atlas.ripe.net' was present in /var/atlas-probe/.ssh/known_hosts, but there was no entry in the file for a key for the corresponding IP address 178.63.8.31. I tried to add it to the probe known_hosts file, but it disappeared after restarting the service. I guess the file is overwritten on service restart. I then added an entry for the RSA key for the IP address in the global /etc/ssh/ssh_known_hosts. The SSH error disappeared from the probe log file on the following service restart.
So problem 2 is solved for me, but that raises the question of whether a key for that IP address is missing from the probe known_hosts file?
Now I am left with problem 3, and I have no clue as to why I can’t create a UDM which uses my sw probe ID. The built-in measurements seem to work fine, if I read the probe web page info correctly. Any help welcome.
Thierry
Hi,
On 2020-04-16 17:59, Thierry Montigneaux wrote:
> Hi,
> I have just activated my first sw probe, on a dedicated CentOS7 VM, id 1000408.
> Out of habit, I did configure CentOS the way I usually configure my servers, notably by running OpenSSH on a non-default port and by activating only the ed25519 host key in sshd_config.
> I noticed 3 issues after having let the atlas-probe service run for a while, which included a couple of service restarts:
> 1. The SOS History table for the probe on the probe web page was not being populated with any data after the restarts.
I believe firmware 5020, which is being rolled out now, adds this to software probes. In your case the SOS messages were just delivered a bit later.
> 2. The ./status/ssh_err.txt file showed the following message: « RSA host key for IP address '178.63.8.31' not in list of known hosts. »
This is safe to ignore. No host keys need to be added manually.
> 3. I tried to set up a test one-off measurement using my sw probe by manually selecting its ID on the measurement setup screen, but it would not register the probe. I also noted after 12+ hours that no UDM was active for the probe from other users (from my experience with a hardware probe I have been operating for a while, it should be put to use by the network pretty quickly, unless I am too impatient here ;-) ).
This one was interesting, so we tracked it down. It turns out that the probes which have bandwidth limits set but don't report their bandwidth usage were not scheduled to do UDMs. This only affects SW probes where bandwidth reporting is opt-in. We are addressing this with improved documentation on bandwidth limits and changing the logic of how this is applied to software probes.
> My first action was to revert OpenSSH to default settings (use port 22, activate the RSA host key). This apparently contributed to solving problem 1, as the SOS History data started to appear on subsequent service restarts.
The firmware only uses SSH for outgoing connections, so while these changes may have seemed to help, they probably didn't :-)
Cheers,
Robert
> I believe firmware 5020, which is being rolled out now, adds this to software probes. In your case the SOS messages were just delivered a bit later.
ok, false positive then ;-)
>> 2. The ./status/ssh_err.txt file showed the following message: « RSA host key for IP address '178.63.8.31' not in list of known hosts. »
> This is safe to ignore. No host keys need to be added manually.
Noted.
>> 3. I tried to set up a test one-off measurement using my sw probe by manually selecting its ID on the measurement setup screen, but it would not register the probe. I also noted after 12+ hours that no UDM was active for the probe from other users (from my experience with a hardware probe I have been operating for a while, it should be put to use by the network pretty quickly, unless I am too impatient here ;-) ).
> This one was interesting, so we tracked it down. It turns out that the probes which have bandwidth limits set but don't report their bandwidth usage were not scheduled to do UDMs. This only affects SW probes where bandwidth reporting is opt-in.
> We are addressing this with improved documentation on bandwidth limits and changing the logic of how this is applied to software probes.
Thanks for the feedback. I have now removed the bandwidth limit I had initially set, created a new measurement, and can confirm that it includes the sw probe.
One last question though: you mention bandwidth reporting being "opt-in" for sw probes, but I can't seem to find where the setting can be adjusted on the probe web page. Is it user-modifiable somewhere?
> The firmware only uses SSH for outgoing connections, so while these changes may have seemed to help, they probably didn't :-)
Thanks for the confirmation. It was also my understanding that the probe does not accept incoming connections (which would not get through my firewall as it is currently configured anyway), but given the SSH error in the log, I reverted the whole OpenSSH config back to defaults "just in case" ;-)
Regards,
Thierry
Hi,
>> We are addressing this with improved documentation on bandwidth limits and changing the logic of how this is applied to software probes.
> Thanks for the feedback. I have now removed the bandwidth limit I had initially set, created a new measurement, and can confirm that it includes the sw probe.
> One last question though: you mention bandwidth reporting being "opt-in" for sw probes, but I can't seem to find where the setting can be adjusted on the probe web page. Is it user-modifiable somewhere?
It's documented here: https://github.com/RIPE-NCC/ripe-atlas-software-probe#runtime-configuration-...
Granted, this is only visible if you're looking at the repo, which is not something everyone does. I'll improve our installation instructions to include this.
Regards,
Robert
participants (3)
- Robert Kisteleki
- Thierry M.
- Thierry Montigneaux