Hi Stephane,

On 2015/05/15 6:50, Stephane Bortzmeyer wrote:
The DNS, as we know, is a jungle. People put anything in their TXT records, even control characters.
Atlas faithfully returns this content in the JSON files (see measurement #2004707). Then they are no longer legal JSON (RFC 7159, section 7) and crash the JSON libraries.
Who should fix it? My first guess is that Atlas should not produce illegal JSON files and therefore should escape such texts.
We already noticed various issues in this area. What goes wrong is the following.

The probes avoid sending binary data by converting characters outside the printable ASCII range to an escape sequence using the '\u' notation. So, for example, for a control-Z the probe sends '\u001A'.

The problem with this approach is that this escape sequence is actually defined by the JSON spec. So some of our backend code recognizes the escape sequence and converts it back to the control character. After that, there is a bug somewhere, and escaped characters sometimes end up in result downloads.

After internal discussions, the best way forward seems to be to change the escape sequence. This way we don't have to rely on JSON parsers to leave the escape sequence unchanged. One option is to double the backslash and have probes send '\\u001A'. But maybe somebody has a better idea.

Philip
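
To illustrate the round trip described above, here is a minimal sketch in Python using only the standard json module. It is not the Atlas backend code; the payloads, the "answer" key and the helper is_valid_json are made up for the example. It shows that a standard parser turns '\u001A' back into the raw control character, that writing that character out without re-escaping gives invalid JSON, and that a doubled backslash survives parsing as literal text.

```python
import json

# Hypothetical probe payloads (not real Atlas results).
# On the wire, `single` contains  abc\u001Adef  and `doubled` contains  abc\\u001Adef
single = '{"answer": "abc\\u001Adef"}'
doubled = '{"answer": "abc\\\\u001Adef"}'

# A JSON parser decodes \u001A into the control character itself ...
decoded_single = json.loads(single)["answer"]
# ... but a doubled backslash decodes to a literal backslash followed by "u001A".
decoded_doubled = json.loads(doubled)["answer"]


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise (illustrative helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Writing the decoded string back by plain concatenation leaves a raw
# control character inside the string, which RFC 7159 section 7 forbids.
naive_output = '{"answer": "' + decoded_single + '"}'
naive_is_valid = is_valid_json(naive_output)

# Serializing with json.dumps escapes the control character again.
safe_output = json.dumps({"answer": decoded_single})
safe_is_valid = is_valid_json(safe_output)

print(repr(decoded_single))
print(repr(decoded_doubled))
print(naive_is_valid, safe_is_valid)
```

With the doubled backslash, the consumer of the result sees the six characters \u001A as plain text and has to interpret them itself, which is the price of not depending on what intermediate JSON parsers do.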