Software probe keeps going to disconnected
Hello all,

I have 2 software probes. One is on a datacenter VPS and the other is at home (it runs on Proxmox).

The one at home keeps going to disconnected. The probe is Debian 12. Probe version is 5080-1.

This machine has v6 as well, so I was thinking maybe something with temp/privacy addresses getting reaped, but no, I only have one good old EUI-64 address on there.

(Knock on wood) no other VMs on my Proxmox machine have any issues like this.

The only thing that looks remotely related (and this happened a bit before Nagios sent out an alert) is this:

Jun 17 12:34:41 atlas-probe ATLAS[123749]: /usr/local/atlas/bin/ATLAS: 99: kill: No such process
Jun 17 12:34:41 atlas-probe ATLAS[123749]: no ssh client matching . cleanup state files. for next restart
Jun 17 12:34:41 atlas-probe ATLAS[123749]: RESULT 9006 done 1718642081 bc2411ca94c6 no reginit.vol start registration
Jun 17 12:34:41 atlas-probe ATLAS[123749]: /var/atlas-probe/status/reginit.vol does not exist try new reg
Jun 17 12:34:45 atlas-probe ATLAS[123749]: Ping works
Jun 17 12:34:45 atlas-probe ATLAS[123749]: /usr/local/atlas/bin/ATLAS: 92: kill: No such process
Jun 17 12:34:45 atlas-probe ATLAS[123749]: start reg
Jun 17 12:34:45 atlas-probe ATLAS[144349]: ATLAS registration starting
Jun 17 12:34:45 atlas-probe ATLAS[144349]: REREG_TIMER 1718416507 expired now is 1718642085
Jun 17 12:34:45 atlas-probe ATLAS[144349]: REREG_TIMER_EXPIRED go re register REREG_TIMER 1 , now is 1718642085
Jun 17 12:34:45 atlas-probe ATLAS[144349]: REGHOSTS reg03.atlas.ripe.net 193.0.19.246 2001:67c:2e8:11::c100:13f6 reg04.atlas.ripe.net 193.0.19.247 2001:67c:2e8:11::c100:13f7
Jun 17 12:34:45 atlas-probe ATLAS[144349]: ssh -p 443 atlas@reg03.atlas.ripe.net INIT
Jun 17 12:34:48 atlas-probe ATLAS[144349]: Got good controller info
Jun 17 12:34:48 atlas-probe ATLAS[144349]: check cached controller info from previous registration
Jun 17 12:34:48 atlas-probe ATLAS[144349]: NO cached controller info. NO REMOTE port info
Jun 17 12:34:48 atlas-probe ATLAS[144349]: Do a controller INIT
Jun 17 12:34:48 atlas-probe ATLAS[144349]: Controller init -p 443 atlas@ctr01.atlas-prod.aws.ripe.net INIT

I believe I should see a persistent outbound ssh connection? The only ESTAB entries are me ssh’ing in, and a 443 to RIPE:

root@atlas-probe:/home/vom# ss -planto | grep ESTAB
ESTAB 0 0 192.168.69.111:22 192.168.64.21:65409 users:(("sshd",pid=145083,fd=4),("sshd",pid=145059,fd=4)) timer:(keepalive,115min,0)
ESTAB 0 0 [2603:6011:6300:1f02:be24:11ff:feca:94c6]:37176 [2a13:27c0:1c:d01:cc61:f3a3:920d:cd88]:443 users:(("ssh",pid=144380,fd=3)) timer:(keepalive,103min,0)

Let me know what else I can look at. I assume there is some debug I can crank up?

Just going to reboot this for now to get it happy and get Nagios to shut up.

Thanks.
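For anyone reading the REREG_TIMER lines in the log, here is a minimal sketch that turns the two numbers into dates and shows the gap between them. It assumes both values are Unix epoch seconds, which the thread does not state outright (the second value does line up with the syslog timestamp if the host is at UTC-4). The helper names to_utc and overdue are made up for this example and are not part of the probe software.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the REREG_TIMER log line.
# Assumption: both are Unix epoch seconds.
REREG_TIMER = 1718416507
NOW = 1718642085


def to_utc(epoch: int) -> datetime:
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def overdue(timer: int, now: int) -> timedelta:
    """Return how far 'now' is past the timer value."""
    return timedelta(seconds=now - timer)


if __name__ == "__main__":
    print("timer value :", to_utc(REREG_TIMER).isoformat())
    print("log 'now'   :", to_utc(NOW).isoformat())
    print("difference  :", overdue(REREG_TIMER, NOW))
```

Under that assumption the timer value is 2 days, 14:39:38 older than the "now" value. What the probe software does with that timer is not explained in this thread, so this only helps with reading the numbers.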
Hi,

Sorry you’re having issues.

Can you please share the probe ID with me? Off the top of my head, it should be in /var/atlas-probe/status, in reg_init_reply.txt.

Is this an RPM install or did you compile this yourself?

Regards,
Michel
On 17 Jun 2024, at 20:56, vom513 <vom513@gmail.com> wrote:
On Jun 19, 2024, at 6:09 AM, Michel Stam <mstam@ripe.net> wrote:
> Hi,
>
> Sorry you’re having issues.
>
> Can you please share the probe id with me? It should be in /var/atlas-probe/status (top of my head). reg_init_reply.txt

1008117

> Is this an RPM install or did you compile this by yourself?

Cloned from: https://github.com/RIPE-NCC/ripe-atlas-software-probe
...and followed instructions to build a .deb.

Thanks.
participants (2)
- Michel Stam
- vom513