Hi all,
I noticed that one of my DNS UDMs gave back a negative RTT:
{"af":4,"dst_addr":"199.19.57.1","from":"171.98.64.220","fw":4680,"group_id":1957244,"lts":39041,"msm_id":1957244,"msm_name":"Tdig","prb_id":22726,"proto":"UDP","result":{"ANCOUNT":1,"ARCOUNT":1,"ID":8401,"NSCOUNT":0,"QDCOUNT":1,"abuf":"INGEAAABAAEAAAABA29yZwAABgABwAwABgABAAADhAAzAmEwA29yZwthZmlsaWFzLW5zdARpbmZvAANub2PAKHfkW2sAAAcIAAADhAAJOoAAAVGAAAApEAAAAAAAACYAAwAibnMwMDBiLmFwcDI3LmlhZDEuYWZpbGlhcy1uc3QuaW5mbw==","answers":[{"MNAME":"a0.org.afilias-nst.info.","NAME":"org.","RNAME":"noc.afilias-nst.info.","SERIAL":2011454315,"TTL":900,"TYPE":"SOA"}],"rt":-14281.576,"size":133},"src_addr":"192.168.1.51","timestamp":1428948041,"type":"dns"}
-14281.576ms was standing out like a sore thumb on the graph that I create out of the results from these UDMs.
This is quite interesting, maybe messages from the future? :)
Has anyone else run into one of these?
~paul
Hi Paul,
On 2015/04/13 23:58, Paul Vlaar wrote:
I noticed that one of my DNS UDMs gave back a negative RTT:
{"af":4,"dst_addr":"199.19.57.1","from":"171.98.64.220","fw":4680,"group_id":1957244,"lts":39041,"msm_id":1957244,"msm_name":"Tdig","prb_id":22726,"proto":"UDP","result":{"ANCOUNT":1,"ARCOUNT":1,"ID":8401,"NSCOUNT":0,"QDCOUNT":1,"abuf":"INGEAAABAAEAAAABA29yZwAABgABwAwABgABAAADhAAzAmEwA29yZwthZmlsaWFzLW5zdARpbmZvAANub2PAKHfkW2sAAAcIAAADhAAJOoAAAVGAAAApEAAAAAAAACYAAwAibnMwMDBiLmFwcDI3LmlhZDEuYWZpbGlhcy1uc3QuaW5mbw==","answers":[{"MNAME":"a0.org.afilias-nst.info.","NAME":"org.","RNAME":"noc.afilias-nst.info.","SERIAL":2011454315,"TTL":900,"TYPE":"SOA"}],"rt":-14281.576,"size":133},"src_addr":"192.168.1.51","timestamp":1428948041,"type":"dns"}
-14281.576ms was standing out like a sore thumb on the graph that I create out of the results from these UDMs.
This can be fixed by switching the measurement code to one of the alternative time sources in the Linux kernel that is not subject to these jumps. However, that requires touching all time-related code in the measurement code.
Philip
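[Editorial illustration, not part of Philip's message.] The idea of a time source that is not subject to jumps can be sketched in Python as below. This is not the probe's measurement code: `elapsed_ms`, `SteppedWallClock` and `fake_query` are hypothetical names, and the 14.3 s backward step is simulated only to mimic the size of the reported value. The one real API relied on is `time.monotonic()`, which does not go backwards when the system clock is adjusted.

```python
import time


def elapsed_ms(operation, clock=time.monotonic):
    """Run operation() and return its duration in milliseconds.

    `clock` is any zero-argument callable returning seconds; the default
    is the monotonic clock, which is unaffected by wall-clock steps.
    """
    start = clock()
    operation()
    return (clock() - start) * 1000.0


class SteppedWallClock:
    """Hypothetical wall clock that is set back by `step_s` seconds
    after its first reading, simulating a clock adjustment mid-query."""

    def __init__(self, step_s):
        self._step_s = step_s
        self._calls = 0

    def __call__(self):
        self._calls += 1
        now = time.time()
        return now if self._calls == 1 else now - self._step_s


def fake_query():
    """Stand-in for sending a DNS query and waiting for the answer."""
    time.sleep(0.01)


# Timing against a wall clock that is stepped back yields a negative RTT.
wall_rtt = elapsed_ms(fake_query, clock=SteppedWallClock(step_s=14.3))

# Timing against the monotonic clock cannot go negative.
mono_rtt = elapsed_ms(fake_query)

print(f"wall clock: {wall_rtt:.3f} ms, monotonic: {mono_rtt:.3f} ms")
```

Passing the clock in as a parameter is only a convenience for the simulation; the point is that every start/stop pair has to read the same monotonic source.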
On 14/4/15 10:58 AM, Philip Homburg wrote:
This can be fixed by switching the measurement code to one of the alternative time sources in the Linux kernel that is not subject to these jumps. However that requires touching all time related code in the measurement code.
Is there an established way to work around this on the receiving end?
~paul
On 2015-04-14 11:05, Paul Vlaar wrote:
On 14/4/15 10:58 AM, Philip Homburg wrote:
This can be fixed by switching the measurement code to one of the alternative time sources in the Linux kernel that is not subject to these jumps. However that requires touching all time related code in the measurement code.
Is there an established way to work around this on the receiving end?
~paul
RIPE Atlas is way beyond the point where you can trust *all* results blindly. Even if the measurement code and timekeeping are 100% correct, there are network glitches, funny middleboxes, power brownouts, solar flares, bit flips, packet eating midgets, etc.
When interpreting results, we strongly encourage users to filter out outliers (e.g. top/bottom X percentile).
Regards,
Robert
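[Editorial illustration, not part of Robert's message.] One possible way to apply this on the receiving end is sketched below in Python. It assumes each result is a dict shaped like the sample in this thread, with the RTT under `result.rt`. The helper name `filter_rtts` and the 5% default are arbitrary choices, not an Atlas recommendation. Physically impossible negative values are dropped first, then the top and bottom percentiles are trimmed.

```python
def filter_rtts(results, trim_pct=5.0):
    """Return sorted RTTs with negatives dropped and both tails trimmed.

    `results` is an iterable of dicts that carry the RTT in
    result["result"]["rt"], as in the sample result in this thread.
    `trim_pct` is the percentage cut from each end of the sorted list.
    """
    rtts = sorted(
        r["result"]["rt"]
        for r in results
        if "rt" in r.get("result", {}) and r["result"]["rt"] >= 0
    )
    cut = int(len(rtts) * trim_pct / 100.0)
    return rtts[cut:len(rtts) - cut]


# Twenty plausible RTTs (10..29 ms) plus the negative value from the thread.
samples = [{"result": {"rt": float(v)}} for v in range(10, 30)]
samples.append({"result": {"rt": -14281.576}})

cleaned = filter_rtts(samples)
print(len(cleaned), min(cleaned), max(cleaned))
```

Dropping negatives before trimming keeps a single bogus value from using up the bottom percentile's budget.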
participants (3)
- Paul Vlaar
- Philip Homburg
- Robert Kisteleki