I was one of the people who urged caution with regard to HTTP probes. There's some ambiguity that needs to be resolved first, and then there are some security risks to consider.
The basic question is: what is being measured? There are several possible answers:
1. Transport-layer connectivity
2. HTTP header information
3. HTTP body information
Each of these has risks that would need to be controlled.
(1) is obviously the safest; that's part of what the SSL cert measurement already does. But (1) is not really an HTTP measurement; it's a TCP measurement, and it would be better to cast it that way.
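To make the "cast it as TCP" point concrete, here is a minimal sketch of what a transport-layer-only check could look like, assuming the probe simply times a TCP handshake and sends no application data at all. The function name `tcp_connect_time` is hypothetical and not part of any Atlas API.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the time (seconds) to complete a TCP connection, or None on failure.

    No bytes are written to the socket, so no HTTP request is made and no
    application content is ever received. Note: if `host` is a name rather
    than an address, the measured time also includes DNS resolution.
    """
    start = time.monotonic()
    try:
        # create_connection performs the handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Refused, unreachable, or timed out (socket timeouts are OSError subclasses).
        return None
```

The result is just "did the handshake succeed, and how long did it take", which is exactly the transport-layer question and nothing more.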
(2) could probably be implemented pretty safely by sending a HEAD request. However, there's still a risk that private user information would leak in the responses to such requests. For example, if a web site is doing IP-address-based access control, and the probe is behind the same NAT as a user's laptop, then the site will treat the probe as that user, and even a HEAD request might return user information (e.g., session cookies).
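A minimal sketch of a header-only probe, assuming Python's standard `http.client` and a hypothetical deny-list `SENSITIVE_HEADERS` of cookie headers that are dropped before anything is reported. `head_probe` is an illustrative name only.

```python
import http.client
from typing import Dict, Tuple

# Hypothetical deny-list: response headers likely to carry per-user state.
# This is illustrative, not exhaustive.
SENSITIVE_HEADERS = {"set-cookie", "set-cookie2"}


def head_probe(host: str, port: int = 80, path: str = "/",
               timeout: float = 5.0) -> Tuple[int, Dict[str, str]]:
    """Send a HEAD request and return (status, headers minus the deny-list)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()  # HEAD responses carry no body
        headers = {
            name.lower(): value
            for name, value in resp.getheaders()
            if name.lower() not in SENSITIVE_HEADERS
        }
        return resp.status, headers
    finally:
        conn.close()
```

A filter like this only removes the obvious cases. It does not address the underlying problem: if the server decides the probe *is* the user, any header (or a custom one the filter doesn't know about) may carry that user's information.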
(3) is a huge security risk, because of the wide variety of things that are done with HTTP requests. For simplicity, let's assume the probe would send a GET request, and not anything more sophisticated (POST, PUT, DELETE, etc.). You could use a GET request to download a file, but you can also use a GET request to submit responses to HTML forms. Want to make sure your favorite band wins the Eurovision Song Contest? Just task the Atlas network to have 1000 probes vote for them every 5 minutes. There's also the question of what you do with the downloaded content. Returning it to the measurement owner would raise serious security issues, not to mention bandwidth issues. But if you don't return it, then the system will need to constrain the questions the experimenter can ask, e.g., "How many bytes were received?" or "What was the SHA-1 digest of the file?"
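If the body is fetched but not returned, the probe could reduce it to a fixed set of summary answers. A minimal sketch, assuming the only permitted questions are the two above (byte count and SHA-1 digest), plus a hypothetical `max_bytes` cap to bound bandwidth; `summarize_body` is an illustrative name.

```python
import hashlib
from typing import BinaryIO, Dict, Optional, Union


def summarize_body(stream: BinaryIO, chunk_size: int = 64 * 1024,
                   max_bytes: Optional[int] = None) -> Dict[str, Union[int, str]]:
    """Consume a response body and report only its length and SHA-1 digest.

    The content itself is hashed and discarded chunk by chunk; nothing
    else about it leaves the probe. If max_bytes is set, reading stops there
    and only the first max_bytes bytes are counted and hashed.
    """
    digest = hashlib.sha1()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if max_bytes is not None and total + len(chunk) > max_bytes:
            chunk = chunk[: max_bytes - total]  # trim to the cap
        digest.update(chunk)
        total += len(chunk)
        if max_bytes is not None and total >= max_bytes:
            break
    return {"bytes_received": total, "sha1": digest.hexdigest()}
```

Any object with a `read()` method works, including the `http.client.HTTPResponse` returned for a GET. Note that this constrains what is *reported*, not what the GET *does*: it offers no protection against the voting-style abuse described above.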
--Richard