Introducing: the new result parsing library (Sagan)
Hi everyone,

In advance of the RIPE Meeting, I thought I'd let you all know about the new parsing library we've developed for use against measurement results. It was mentioned in a recent RIPE Labs article[1], but I thought it worth detailing here for those of you who aren't avid Labs followers.

The idea is that you run every result you get through this handy Python library, and out come programmer-friendly Python objects. It supports the various changes in result format over time, and even does some of the heavy lifting for you, like calculating ping median rtt, parsing the DNS abuf, or counting the traceroute hops.

The project code is published on GitHub[2], and pull requests are welcome. Installation can be done via PyPI[3], and documentation is included in the module and available online[4] as well.

If you have any questions, feel free to post to the list and one of us will see what we can do for you.

[1] RIPE Labs article: https://labs.ripe.net/Members/suzanne_taylor_muzzin/ripe-atlas-latest-result...
[2] Project: https://github.com/RIPE-NCC/ripe.atlas.sagan
[3] PyPI: https://pypi.python.org/pypi/ripe.atlas.sagan/
[4] Documentation: https://atlas.ripe.net/docs/sagan/
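For a sense of what the "ping median rtt" calculation involves, here is a minimal stdlib-only sketch (Python 3). It is not Sagan's implementation: `median_rtt` is a hypothetical helper, and it assumes the ping result layout visible in the measurement #1666654 output quoted later in this thread, where each reply is an object like {"rtt": 56.194}. Replies without an "rtt" key are assumed to be lost packets and are skipped.

    import json
    import statistics


    def median_rtt(result):
        """Return the median RTT (ms) of ONE ping result dict, or None.

        Hypothetical helper, not Sagan's code. Assumes result["result"] is a
        list of reply objects such as {"rtt": 56.194}; replies without an
        "rtt" key (assumed to be lost packets) are ignored.
        """
        rtts = [reply["rtt"] for reply in result.get("result", []) if "rtt" in reply]
        if not rtts:
            return None  # no replies at all, so no median
        return statistics.median(rtts)


    # One ping result, trimmed from the measurement #1666654 output in this thread.
    sample = json.loads(
        '{"type": "ping", "msm_id": 1666654, "prb_id": 10031,'
        ' "result": [{"rtt": 56.194}, {"rtt": 54.074}, {"rtt": 54.47}]}'
    )
    print(median_rtt(sample))

Note that the helper takes a single result, not a list of them. Sagan exposes the equivalent value as `rtt_median` on a parsed ping result; its rounding and edge-case handling may differ from this sketch.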
On Mon, May 12, 2014 at 12:10:01PM +0200, dquinn <dquinn@ripe.net> wrote a message of 28 lines which said:
The idea is that you run every result you get through this handy Python library, and out come programmer-friendly Python objects. It supports the various changes in result format over time, and even does some of the heavy lifting for you, like calculating ping median rtt, parsing the DNS abuf, or counting the traceroute hops.
Unfortunately, it seems it has been made for brains brighter than mine. I tried the following script:

    import sys
    from ripe.atlas.sagan import PingResult

    if len(sys.argv) <= 1:
        raise Exception("Usage: %s filename ..." % sys.argv[0])

    for filename in sys.argv[1:]:
        results = open(filename).read()
        result = PingResult(results)
        print filename
        print result.rtt_median
        print ""

And I gave it the results of a measurement (#1666654, if you want to check):

    Traceback (most recent call last):
      File "sagan-ping.py", line 19, in <module>
        result = PingResult(results)
      File "/usr/local/lib/python2.7/dist-packages/ripe.atlas.sagan-0.1.14-py2.7.egg/ripe/atlas/sagan/ping.py", line 54, in __init__
        Result.__init__(self, data, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/ripe.atlas.sagan-0.1.14-py2.7.egg/ripe/atlas/sagan/base.py", line 118, in __init__
        "measurement: {raw_data}".format(raw_data=self.raw_data))
    ripe.atlas.sagan.base.ResultParseError: This does not look like a RIPE Atlas measurement: [{u'af': 6, u'prb_id': 10031, u'result': [{u'rtt': 56.194}, {u'rtt': 54.074}, {u'rtt': 54.47}], u'ttl': 54, u'avg': 54.9126666667, u'size': 48, u'from': u'2a01:1e8:e13f:0:a2f3:c1ff:fec4:5fab', u'proto': u'ICMP', u'timestamp': 1401114806, u'dup': 0, u'type': u'ping', u'sent': 3, u'msm_id': 1666654, u'fw': 4610, u'max': 56.194, u'step': None, u'src_addr': u'2a01:1e8:e13f:0:a2f3:c1ff:fec4:5fab', u'rcvd': 3, u'msm_name': u'Ping', u'lts': 16, u'dst_name': u'2001:4b98:dc0:41:216:3eff:fece:1902', u'min': 54.074, u'group_id': 1666654, u'dst_addr': u'2001:4b98:dc0 [...]

The JSON file (attached) does seem correct. I also tried pre-parsing it:

    for filename in sys.argv[1:]:
        results = json.loads(open(filename).read())
        result = PingResult(results)
        print filename

and got the same result.
On 26/05/14 16:50, Stephane Bortzmeyer wrote:

    import sys
    from ripe.atlas.sagan import PingResult

    if len(sys.argv) <= 1:
        raise Exception("Usage: %s filename ..." % sys.argv[0])

    for filename in sys.argv[1:]:
        results = open(filename).read()
        result = PingResult(results)
        print filename
        print result.rtt_median
        print ""

The thing to remember is that this is a /result/ parser, not a /result*s*/ parser. In other words, you have to pass each result into Sagan individually, and it will in turn return a Python object for that single result. Your code here takes /the entire file/, multiple results in a JSON list, and dumps it into Sagan, which explodes because you're passing multiple results.

Reading the whole file in one go is generally bad practice anyway, since your one file could easily be bigger than a few GB, and loading all of it into memory could exhaust your system's RAM. Instead, I suggest the following:

    import json
    import sys

    from ripe.atlas.sagan import Result

    for filename in sys.argv[1:]:
        with open(filename) as my_results:
            # json.load() (not json.loads()) reads from a file object
            for result in json.load(my_results):
                result = Result.get(result)
                print(result.rtt_median)

or skip the manual JSON parsing by using the fragmented file format (?format=txt), where each line is one result:

    for filename in sys.argv[1:]:
        with open(filename) as my_results:
            # Iterate over the file lazily, one result (line) at a time
            for result in my_results:
                result = Result.get(result)
                print(result.rtt_median)

The key is to remember that you need to break up that Great Big File into separate result blobs, either by looping over a fragmented file's lines, or by parsing the whole JSON file into memory first and parsing out each result. That step is up to you. Sagan only takes over once you've got a single result to work with.
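The splitting step that is "up to you" can be sketched with the standard library alone. `iter_results` below is a hypothetical helper, not part of Sagan. It assumes a downloaded file is either a single JSON list of results or the fragmented ?format=txt layout with one JSON result per line, and it tells the two apart by the first non-blank character of the file.

    import json


    def iter_results(path):
        """Yield one result dict at a time from a results file.

        Hypothetical helper (not part of Sagan). Assumed layouts:
          * a JSON list of results: decoded in one go, so the whole
            file is held in memory;
          * the fragmented "?format=txt" layout, one JSON object per line:
            read lazily, one line at a time.
        """
        with open(path) as handle:
            # Peek at the first non-whitespace character to pick the layout.
            first = handle.read(1)
            while first and first.isspace():
                first = handle.read(1)
            handle.seek(0)
            if first == "[":
                # Whole-file JSON list: json.load() reads from the file object.
                for result in json.load(handle):
                    yield result
            else:
                # Fragmented format: each non-blank line is one complete result.
                for line in handle:
                    if line.strip():
                        yield json.loads(line)

Each dict it yields could then be handed to `Result.get()` just as in the snippets above. Only the fragmented branch keeps memory use bounded; the JSON-list branch still decodes the whole file, which is the trade-off described above.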
On Mon, May 26, 2014 at 05:12:59PM +0200, Daniel Quinn <daniel.quinn@ripe.net> wrote a message of 141 lines which said:
The key is to remember that you need to break up that Great Big File into separate result blobs, either by looping over a fragmented file's lines, or by parsing the whole JSON file into memory first and parsing out each result.
OK, thanks, it works now.
participants (3)
- Daniel Quinn
- dquinn
- Stephane Bortzmeyer