Hi Wilfried,

some more details.

Each probe has two tasks: measuring and keeping a connection to the controllers. These tasks are somewhat independent, and the idea is to make the probe more autonomous.

The connection to the controller is used to control the probe and to exchange results. It is a TCP session. Experience shows that a session stays up from several minutes to about a month. I think the longest we have seen so far is 5 weeks. My guess is that on average it is several hours to days.

Even when the session drops, the measurements continue. The results will be overwritten after a while. However, when a connection is established again the probe will send the old results.

After a connection reset the probe will try to reconnect. From experience, typical reconnect times are between 1 and 10 minutes. So, in short, disconnects are not bad if the probe reconnects immediately. For example, there are some probes in DSL networks in Germany that reconnect daily, when the modem gets a different IP address. These reconnects run like clockwork.

Also, I hope that with experience we can tune the TCP keep-alive and timeout parameters to make the sessions more resilient to intermittent changes.

One more detail: if the probe runs out of resources, such as contiguous RAM (memory fragmentation), it will reboot itself. A reboot is quick; it takes a few seconds.

regards,
-antony

On Fri, Mar 04, 2011 at 03:29:22PM +0100, Robert Kisteleki wrote:
Hi,
On 2011.03.04. 14:26, Wilfried Woeber, UniVie/ACOnet wrote:
[my apologies to the Team for the duplicate, I 1st picked the wrong address]
Q to the Team and to the Probe Hosts:
My feeling is that the probes do register too many "disconnects". I just had a look at the two under my control and the numbers of transitions are -
466: 25 events between 2010-12-21 07:14:30 UTC and 2011-03-01 10:24:11 UTC
414: 25 events between 2010-11-22 12:31:02 UTC and 2011-02-20 20:18:24 UTC
While there may be a local reason for some of the down-periods, I doubt that I had so many outages on my ends. Is anyone else seeing a similar pattern?
Disconnects and reconnects are expected; the probes are somewhat sensitive in this regard. Most of these downtimes should be less than 5, at most 10 minutes. This is normal, don't worry about them.
Regarding presentation, again, could we please have an additional item of "Total Downtime" (next to the "Total Uptime"), and the duration of the downs in the 25 lines with the time stamps? This would make it much more straightforward to spot patterns....
Yes, this is in fact already working in our test environment! ETA for public rollout is next week.
Minor cosmetics, relevant when reporting stuff: could the probe's number be displayed somewhere in the Probe Conf panel (e.g. in the line with the Firmware Version)?
Sure.
Cheers, Robert
Thanks for your consideration, have a nice weekend Wilfried
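
A note on the TCP keep-alive tuning mentioned in the reply at the top of this thread. The sketch below is only an illustration in Python, not the probe's actual code. The function name enable_keepalive and the timer values (60 s idle, 10 s between probes, 5 probes) are assumptions picked for the example. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT option names are the Linux ones, and the sketch skips any that the platform does not provide.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keep-alive on sock and tune its timers where possible.

    All default values are illustrative assumptions, not the probe's settings.
    Returns the approximate time in seconds before a silent peer is declared dead.
    """
    # Turn keep-alive on; this option is available on all common platforms.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timers use Linux-style option names; skip any that are missing.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first keep-alive probe
        ("TCP_KEEPINTVL", interval_s),  # pause between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before the session is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    # Rough detection time: the idle period plus all probe intervals.
    return idle_s + interval_s * probes
```

With the example values a dead peer would be noticed after roughly 60 + 10 * 5 = 110 seconds. Shorter timers detect a broken session sooner, but they also drop sessions during brief outages that a longer setting would survive, which is the trade-off behind tuning these parameters.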