A colleague asks me if it is possible to have bulk access to public measurement data. (Otherwise, he could do a for loop over all measurement IDs, but I suspect that is not nice.) I find nothing in the Atlas documentation. Is it because I do not search hard enough or because this possibility does not exist? #BigData
On 2013/11/13 10:04 , Stephane Bortzmeyer wrote:
A colleague asks me if it is possible to have bulk access to public measurement data. (Otherwise, he could do a for loop over all measurement IDs, but I suspect that is not nice.)
I find nothing in the Atlas documentation. Is it because I do not search hard enough or because this possibility does not exist?
No, there is no API for that. I wonder what it should look like.

Assuming that one bulk data download gets 100 Mbps (just a guess, you don't want bulk data downloads to slow down the rest of the system), downloading 1 Tbyte of data takes about a day. So for the entire Atlas data set that may easily be a week or more. Connections that last for more than a week are not nice, so I guess the client should maintain a cursor and issue multiple requests. (That is very close to just iterating over all measurements.)

One other issue is whether the data gets stored or not. If you are going to store the data, then an API that can be used to update the stored data set makes most sense. If you are not going to store the data, then running map/reduce on the server side makes most sense, but I'm not sure whether proper sandbox environments exist that would allow that to be done on the Hadoop cluster directly.

Philip
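The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, 1 Tbyte is taken as 10^12 bytes, and the 100 Mbps figure is the guess from the message, not a measured rate.

```python
SECONDS_PER_DAY = 86_400


def transfer_time_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move size_bytes over a link of rate_bits_per_s."""
    seconds = size_bytes * 8 / rate_bits_per_s
    return seconds / SECONDS_PER_DAY


# Assumptions: 1 Tbyte = 1e12 bytes, sustained 100 Mbps = 100e6 bits/s.
one_tbyte_days = transfer_time_days(1e12, 100e6)
print(f"1 Tbyte at 100 Mbps: {one_tbyte_days:.2f} days")
```

At that rate a single Tbyte takes a little under one day, so a data set of several Tbytes would indeed keep one connection busy for about a week.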
On 13.11.2013, at 11:16 , Philip Homburg <philip.homburg@ripe.net> wrote:
... I wonder what it should look like. ...
I would be happy to get all measurements one 24-hour period at a time, for the most recently finished UTC day or earlier. This might be a good way to start and it would allow people to test their methods. We could add the most recent measurements later. In order to discourage frivolous downloads we could attach a price to it in credits.

Daniel
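A client following this per-day scheme would first need the boundaries of the most recently finished UTC day. A minimal sketch using only the standard library is below; the function name is hypothetical, and how the window would be passed to a bulk API is left open, since no such API exists according to this thread.

```python
from datetime import datetime, timedelta, timezone


def last_finished_utc_day(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, stop) of the last complete UTC day before `now`.

    The interval is half-open: start is included, stop is excluded.
    `now` must be timezone-aware.
    """
    today_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today_start - timedelta(days=1), today_start


start, stop = last_finished_utc_day(datetime.now(timezone.utc))
print(start.isoformat(), stop.isoformat())
```

Earlier days follow by subtracting further multiples of `timedelta(days=1)` from both ends.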
On Wed 13 Nov 2013 10:04:56 CET, Stephane Bortzmeyer wrote:
A colleague asks me if it is possible to have bulk access to public measurement data. (Otherwise, he could do a for loop over all measurement IDs, but I suspect that is not nice.)
I think we'd need more information regarding what is meant by "bulk access". At present, all measurement *result* data is only available by way of explicit measurement ID, but the *metadata* is available as a list.

The results have to be fetched one measurement at a time:

/api/v1/measurement/MSM_ID/result/

But you can get basic information about every measurement too:

/api/v1/measurement/?is_public=1&limit=100

That call will give you a list of the first 100 of all public measurements and their metadata. You can tweak the limit (be kind) and/or loop over some limit/offset values to get them all. This won't give you all of the *result* data, but it may be what you're looking for.
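The limit/offset loop described above might look like the following sketch. Only the path and the `is_public`, `limit` and `offset` parameters come from the message. The host name, the `offset` spelling as a query parameter, the JSON response with an `objects` list, and the one-second pause are assumptions made for illustration.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://atlas.ripe.net"  # assumption: host is not given in the thread
LIST_PATH = "/api/v1/measurement/"   # path as quoted in the message


def fetch_page(limit: int, offset: int) -> dict:
    """Fetch one page of public measurement metadata as a dict."""
    query = urlencode({"is_public": 1, "limit": limit, "offset": offset})
    with urlopen(f"{BASE_URL}{LIST_PATH}?{query}") as resp:
        return json.load(resp)


def iter_public_measurements(fetch=fetch_page, limit=100, pause_s=1.0):
    """Yield metadata entries page by page until a short or empty page."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        objects = page.get("objects", [])  # assumption: entries live under "objects"
        yield from objects
        if len(objects) < limit:
            return  # last page reached
        offset += limit
        time.sleep(pause_s)  # "be kind": space out the requests
```

Iterating with `for msm in iter_public_measurements(): ...` would then walk the metadata list; fetching results would still need one request per measurement ID. The `fetch` argument is there so the loop can be exercised against a stub instead of the live service.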
participants (4)
- Daniel Karrenberg
- Daniel Quinn
- Philip Homburg
- Stephane Bortzmeyer