Re: [atlas] My experiences with a SW probe…
Hey Ernst,
Thanks for your email.
Regarding the busybox implementation, there’s history here. As I understand it, the first probes were implemented on a system without a memory management unit. As a result, starting and restarting processes caused memory fragmentation, which led to the system running out of memory. That is why a multi-call binary was chosen: only one program is loaded into memory, and busybox was already implemented and available on the system. Of course, since the version 3 probes (the TP-Link) there is no need for this anymore, because a memory management unit takes care of it. However, the code was retained because it was already implemented.
Later on, the decision was made to offer an installable software probe package for regular Linux systems, which is in essence the same code that runs on the hardware probes. That brings us to today. You can find busybox even on regular Linux systems nowadays, but then typically in the initrd environment, before the system boots from the root filesystem.
We’re currently refactoring parts of the probes to make things easier to maintain. Without guaranteeing anything, this may mean that the busybox approach changes in the future.
Why ATLAS starts properly using /etc/init.d and not on boot, I cannot say. It can be as simple as the networking not being available yet, or a race condition. I would suggest touching base with the author of this code and asking.
The debate whether a Windows version would be beneficial is one that we should have with a wider audience, and it should cover topics such as 24/7 availability, usability and convenience for the user. I merely want to point out that the engineering effort may be significant. If other people would like to weigh in on this matter, that would be welcome.
Will keep you posted.
Regards,
Michel
On 6 Jul 2022, at 16:18, Ernst J. Oud <ernstoud@gmail.com> wrote:
Michel,
Thanks for the concise answer.
Good to know that my problems with the CentOS installation can be reproduced. Turns out I am not an idiot after all :-)
In a Linux container in Docker Desktop for Windows even the standard Linux traceroute command of Ubuntu or CentOS does not work. Ping does, but traceroute does not. As per my previous email, it seems that this is a vpnkit bug in Docker. There is not a lot that can be done. I asked on GitHub for progress on this issue, but there has been no response yet.
On my Windows 10 PC I measure a pretty constant 940/940 speed using Ookla SpeedTest. Only when VMware was also installed on that PC in bridge mode did those measurements go haywire. Of course, a bridge for networking between the host and a virtual machine requires a network driver in VMware doing some trickery. That might explain why. Of course, this has nothing to do with the SW probe itself.
Thanks for the input on telnetd and busybox. When I had a look at the SW probe after installation I indeed wondered what the heck was going on with a busybox binary, which in embedded Linux environments and in small Linux distributions such as Alpine, has its place. Did not understand why RIPE created a multi-call binary for the probe. And calling it busybox is even more confusing…
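For readers unfamiliar with the term: a multi-call binary is a single executable that decides what to do based on the name it was invoked under, usually via symlinks that all point at the same file. The sketch below only illustrates that dispatch idea in Python. The applet names and the `dispatch` function are placeholders chosen for this example and are not taken from the Atlas or busybox source.

```python
import os
import sys


# Placeholder applets: each one just reports its name and arguments.
def applet_ping(args):
    return ("ping", args)


def applet_traceroute(args):
    return ("traceroute", args)


def applet_telnetd(args):
    return ("telnetd", args)


# One table maps the invoked name to the code that should run.
APPLETS = {
    "ping": applet_ping,
    "traceroute": applet_traceroute,
    "telnetd": applet_telnetd,
}


def dispatch(argv):
    """Pick an applet from the basename of argv[0] and run it."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        raise ValueError(f"unknown applet: {name}")
    return applet(argv[1:])


if __name__ == "__main__":
    try:
        print(dispatch(sys.argv))
    except ValueError as err:
        print(err)
```

With symlinks named `ping`, `traceroute` and `telnetd` all pointing at this one file, each name would run a different applet while only one program exists on disk.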
Still doesn’t explain why stopping the ATLAS service and restarting it does correctly load the RIPE telnetd daemon but starting the service at boot does not.
For me a Windows version would only mean saving the disk space for VMware and a virtual machine on that mini PC. I also noticed that the traceroute results of the SW probe are around 1 ms higher in the first hop, due to the probe running in a virtual machine. Neither issue is a big deal. What I meant more is that the number of people willing to install the probe would potentially be much, much larger if a Windows service were available. The Windows client community is so much larger than the Linux client community…
Take care,
Kind regards,
Ernst J. Oud
On 6 Jul 2022, at 14:07, Michel Stam <mstam@ripe.net> wrote:
Hi Ernst,
First of all, thanks for the logs.
I will group all the individual subtopics of this email :) I hope I caught all of them. If not, you know where to find me.
CentOS VM install:
===============
I just did a clean install on a VM here, and it seems there’s indeed something wrong with the repository on our end. YUM does not seem to find the atlasswprobe package. I’ll have a look at why this is and get back to you.
Traceroute in Docker for Windows:
===========================
I have not seen this combination before. I do know that we’re doing very low-level network communication to get measurements such as traceroute and ping to work. We could try and see if this works, but there is a chance that this is hitting a limitation of Linux containers under Windows. Are you aware of any limitations that may affect this (raw IP sockets)?
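A quick way to check the raw-socket part of that question inside a container is to try opening one. This is a minimal sketch, assuming Python is available in the container; the function name is made up for this example. It only tests whether the socket can be created (which normally needs root or CAP_NET_RAW on Linux). It does not show whether ICMP time-exceeded replies make it back through the container's network layer, which is what traceroute also depends on.

```python
import socket


def can_open_raw_icmp() -> bool:
    """Return True if this process is allowed to create a raw ICMP socket."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    except OSError:
        # Covers PermissionError (missing privileges) and other failures.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    print("raw ICMP socket available:", can_open_raw_icmp())
```

If this prints False, the measurements cannot work regardless of the network path. If it prints True and traceroute still fails, the limitation is more likely in the layer between the container and the host.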
Speed on VMs:
=============
I’m not sure it’s related, but in a previous life I did quite a few tests on the LAN with iPerf. I noticed that the reported performance with Windows hosts was consistently lower than with Linux hosts. No VMs were involved; the tests ran on the host itself (flat point-to-point, without firewalls etc.). The maximum speed I’ve ever observed with 2 Linux laptops and iPerf over a 1G switch is around 900-930 Mbps. If Windows laptops were involved (same hardware), it was typically 700-800 Mbps.
I’ll add that these were my observations at the time, and mileage may vary on other hardware/OS combinations. I used a Dell Latitude and a Dell Precision laptop. I never investigated further than this, because my goal was to test the infrastructure, not the endpoints. It may be relevant here, though.
OpenWRT:
=========
There’s a bit of history here. The measurement code of Atlas was derived from a Busybox release at some point in the distant past. The telnetd from Busybox was repurposed to host a process for the backend to communicate with the probe. So while it’s called telnetd, it’s not the telnetd that you’d expect in a regular Linux system; it works quite differently from the original purpose of telnetd (a shell). I believe software probes typically install it in /usr/local/atlas/bb-13.3/bin.
It seems the community probe installs files in /usr/libexec/atlas-probe, so there’s the first difference. The startup scripts also do things quite a bit differently than the RIPE NCC version does, which may be a second indication. There’s also an rpcd process added that works differently.
We do have a cleanup of the way the software probe works on our roadmap, and it will be good to touch base with the author of the community version to align. As yet, it seems the community version and the RIPE NCC version have diverged somewhat. For now, it is probably better to contact the author of the community version about this particular issue.
Windows:
========
A Windows port would mean looking at what is feasible on the Windows platform, which may involve significant recoding, if not rewriting. Even libraries such as MINGW32 might not fully resolve this. We would appreciate it if someone were willing to put time into such an effort; otherwise I don’t think we will be able to create such a version in the near future.
I am curious though, how would a Windows version of the probe software work better for you than the existing Linux version?
Will keep you posted on the CentOS bug.
Regards,
Michel
On 5 Jul 2022, at 18:50, Ernst J. Oud <ernstoud@gmail.com> wrote:
Michel,
Twice (once in Docker Desktop for Windows and once in VMware Player 16) I followed the instructions at https://github.com/RIPE-NCC/ripe-atlas-probe-doc/blob/master/manuals/CentOS-7-binary.en.md exactly, and twice the last step, the actual installation of the binary package (yum install atlasswprobe), failed: yum complained that there was no such package. I then downloaded it directly from https://ftp.ripe.net/ripe/atlas/software-probe/centos7/noarch/ripe-atlas-repo-1-3.el7.noarch.rpm and did a yum install of that package, which worked fine.
Traceroute in Linux containers on Docker Desktop for Windows is known not to work (see: https://github.com/moby/vpnkit/issues/194), and in VMware it only works in bridge mode.
However, on the Windows 10 PC on which I run hourly SpeedTests and ping tests, those tests were impacted after installation of VMware, CentOS and the probe on that PC. I have gigabit fiber, and instead of 940/940 Mbps the speeds fluctuated between 900/900 and 900/800. Perhaps this is an effect of VMware bridge mode on a physical Ethernet port shared with the Windows host.
When using NAT mode in VMware on that PC, traceroute again did not work: only the first hop was shown, and subsequent hops showed * * *.
Not to worry, I installed the probe on CentOS on VMware on another dedicated mini Windows 10 PC using bridged network mode and all is well. (Another PC in my home network…).
Earlier, on OpenWRT 21.01, I installed the probe from the LuCI GUI, i.e. https://openwrt.org/packages/pkgdata/atlas-probe. This is the community version. It ran fine, but I didn’t want to sacrifice a router just for this purpose.
Why the ATLAS script fails to load telnetd in that OpenWRT environment is beyond me. Manual installs of telnetd worked fine so the executable is found. No errors in dmesg or anywhere else. Logging in via SSH and doing an “/etc/init.d/atlas stop” and then an “/etc/init.d/atlas start” worked fine. Perhaps telnetd is started from init.d with priority S30 too early in the boot process?
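One way to test the "started too early" idea would be to record, at the moment the init script runs, whether the system already has a default route. The sketch below parses the text of Linux's /proc/net/route for that purpose. It is only an illustration: the function name is made up, it is not part of the Atlas or OpenWRT scripts, and a stock OpenWRT image may not have Python installed, in which case the same check would need to be done in shell.

```python
def has_default_route(route_table: str) -> bool:
    """Return True if /proc/net/route text contains an active default route.

    Each data row holds tab-separated fields; Destination and Mask are
    hexadecimal, and bit 0x1 of Flags (RTF_UP) marks a usable route.
    """
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        destination, flags, mask = fields[1], int(fields[3], 16), fields[7]
        if destination == "00000000" and mask == "00000000" and flags & 0x1:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/net/route") as handle:
            print("default route present:", has_default_route(handle.read()))
    except OSError:
        print("/proc/net/route is not readable on this system")
```

If the check reports no default route at boot but one is present after a manual stop and start, that would point towards a start-order problem rather than a missing binary.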
As stated above, installing the SW probe on Windows requires a hypervisor such as VMware (perhaps VirtualBox will also work), which means that Windows users must know how all of that works and must have basic Linux knowledge. Perhaps a Windows version of the probe would mean a lot more probes in future? It is hard to estimate whether it is worth the development effort for RIPE. Ping, traceroute and scheduling work fine in Windows, so a Windows app is technically possible, but perhaps a lot of work.
Let me know if the issue of step 4 of the install under CentOS cannot be reproduced, then I will do it again with a log of all commands issued, but I cut/pasted the commands from the installation instructions mentioned above. So I am pretty sure there is a small error on the RIPE end of things. I see there are 3 versions of the binary on the RIPE FTP site mentioned above, perhaps yum as instructed in the instructions loads the wrong version?
Thanks for your time.
Kind regards,
Ernst J. Oud
On 5 Jul 2022, at 13:26, Michel Stam <mstam@ripe.net> wrote:
Hi Ernst,
Sorry to hear you are having issues with installing the probe software.
Can you maybe provide a step by step instruction how you got to the yum atlasswprobe error? I would like to check that everything is ok on our end.
Secondly, regarding the bandwidth problems, can you explain what kind of issues you experience in non-bridged modes? What have you tried, and what was the result? Traceroute is known to work through NAT connections, which is typically what hypervisors use to share the connection (if bridging is not used).
On your OpenWRT probe installation attempt, it seems there are 2 versions in the field:
- A version maintained by RIPE NCC, used solely for the hardware probes.
- A community package written by the OpenWRT community, which uses the software probe implementation.
You mentioned having startup problems with the telnet daemon on the OpenWRT community package. I cannot say why this is, but the most likely culprit is the software not being able to find the location of the binary. Do you get any errors during startup?
On a Windows implementation, there has not been a decision to do so as yet. I’m not sure whether the raw packets that are sent by the various measurements would work with Windows, but to be honest, this has not been investigated. Can you explain your use case for a Windows-based implementation?
Regards,
Michel
On 2 Jul 2022, at 19:23, Ernst J. Oud <ernstoud@gmail.com> wrote:
Don’t know whether this is the place to share experiences but …
I installed a SW probe in Docker for Windows (the Alpine version). Turns out traceroute doesn’t work in that setup. So I installed VMware Player on that PC (the latest version can coexist with Hyper-V, which Docker requires), along with CentOS and the binary of the probe. The instructions for this on the RIPE Atlas site don’t work in the last step (yum complains that no package atlasswprobe is found in the repo). Manual download and install did work, however. Hurrah.
Turns out traceroute in that setup only works with VMware in bridged network mode. Fine. But… all my bandwidth tests on that same box suddenly showed strange impacts… so either a working probe and accept incorrect bandwidth tests (using Ookla SpeedTest) or find another solution. Apparently the bridge in VMware and Windows on the same physical Ethernet port influence each other.
Tried the probe on OpenWRT on a small router. Works fine but I don’t want to dedicate this router to just probe work. Also the ATLAS service from init.d refuses to run the telnetd daemon at startup. Weird.
So installed VMware on a separate Windows 10 headless box (Minix Z83) that I had spare with CentOS and the binary. Runs fine. Finally.
Will kill the other two probes once I am convinced it now finally works.
Thanks Ripe for this excellent tool! A Windows version would help however :-)
Regards,
Ernst
--
ripe-atlas mailing list
ripe-atlas@ripe.net
https://lists.ripe.net/mailman/listinfo/ripe-atlas