Re: [atlas] RIPE Atlas SMTP Measurement

11 Nov 2022

Hi Simon,

This thread has gone a bit quiet since RIPE 85. Let me give a short update:

- Adding TLSv1.3 is going to be tricky, since the SSL measurement implements only the first stages of the TLS handshake. This means adding that complexity to the code, which, as Niall commented to me, is not trivial. Note that the SSL measurement currently does not use the OpenSSL library.
- Other than retrieving the certificate from the peer, no validation happens in the current SSL measurement. In particular, the certificate chain is not validated, so a self-signed certificate is not flagged.
- Adding STARTTLS the way OpenSSL has done it involves issuing the appropriate command after receiving certain output from the remote end, then starting the TLS handshake. This should be doable.
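
To make the STARTTLS point concrete, here is a minimal Python sketch of the plaintext part of that sequence. It is not the Atlas implementation: the function names, the stage labels and the EHLO name `probe.example` are assumptions made for illustration. It reads the 220 greeting, sends EHLO, checks that STARTTLS is offered, sends STARTTLS and waits for the second 220.

```python
import socket


def read_reply(fp):
    """Read one (possibly multi-line) SMTP reply; return (code, lines)."""
    lines = []
    while True:
        raw = fp.readline()
        if not raw:
            raise ConnectionError("connection closed while reading a reply")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        # "250-..." announces a continuation line, "250 ..." ends the reply.
        if len(line) < 4 or line[3] != "-":
            return int(line[:3]), lines


def starttls_preamble(sock: socket.socket, helo_name: str = "probe.example") -> str:
    """Run the plaintext steps before TLS and return the stage reached."""
    fp = sock.makefile("rb")
    code, _ = read_reply(fp)
    if code != 220:
        return "no_220"
    sock.sendall(f"EHLO {helo_name}\r\n".encode("ascii"))
    code, lines = read_reply(fp)
    if code != 250:
        return "no_esmtp"
    # The first line is the greeting; the following lines carry the keywords.
    offered = set()
    for line in lines[1:]:
        words = line[4:].split()
        if words:
            offered.add(words[0].upper())
    if "STARTTLS" not in offered:
        return "no_starttls"
    sock.sendall(b"STARTTLS\r\n")
    code, _ = read_reply(fp)
    return "ready_for_tls" if code == 220 else "starttls_refused"
```

Once the function returns "ready_for_tls", the caller would start the TLS handshake on the same socket, for example with `ssl.create_default_context().wrap_socket(sock, server_hostname=host)`.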

Hope this helps, have a nice weekend!

Regards,

Michel
...
On 20 Oct 2022, at 17:30, Michel Stam <mstam@ripe.net> wrote:
Hello Simon,
I’ll first have a look at OpenSSL to gauge the amount of effort required. I’ll also look at the existing SSL measurement to see what that offers. That will likely provide the best path forward. Lastly, I’ll have an internal discussion on measuring SMTP/STARTTLS/etc.
RIPE 85 is next week and I’ll be attending, so my response may be delayed somewhat.
Please bear with me.
Regards,
Michel
...
On 16 Oct 2022, at 03:37, ripe.net@toppas.net wrote:
Hi Michel,
...
> Both netcat and openssl wait for the 220 before continuing. If a timeout occurs during the STARTTLS phase rather than before the 220, would the conclusion differ? In other words, is it necessary to test once for the 220 to appear (or a timeout), then another time to see if the STARTTLS completes?
If you get a timeout while waiting for the initial 220 response (service ready), the service is not available. Maybe the SMTP daemon is not running or not answering for some reason, or there's a network issue. If a timeout occurs later, during the STARTTLS phase, the server is available and the underlying network connection also seems to be fine. So yes, the conclusion would be different. But we still don't have to run two separate tests, I think.
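
The distinction above can be kept within one run by recording the stage at which a timeout happens. A small Python sketch, in which the stage names, the wording of the conclusions and the `steps` structure are assumptions for illustration:

```python
from enum import Enum


class Stage(Enum):
    TCP_CONNECT = 1
    BANNER = 2      # waiting for the initial 220
    EHLO = 3
    STARTTLS = 4
    TLS_HANDSHAKE = 5


def run_single_test(steps):
    """Run (stage, action) pairs in order; return the stage that timed out, or None."""
    for stage, action in steps:
        try:
            action()
        except TimeoutError:  # socket.timeout is an alias of this since Python 3.10
            return stage
    return None


def conclusion(timed_out_at):
    """Map the stage of the timeout to the conclusion drawn from it."""
    if timed_out_at is None:
        return "ok"
    if timed_out_at is Stage.TCP_CONNECT:
        return "host or network unreachable"
    if timed_out_at is Stage.BANNER:
        return "service not available"
    # Anything later means the 220 was seen, so the service answered.
    return "service available, securing the session failed"
```

A caller would pass one callable per step, each performing the real network operation with its own timeout.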
...
> Could this be folded into a single test that waits for the 220, then does the STARTTLS, and reports an error when a step fails?
Yes. Since we do not really want to send an e-mail, we can probably use OpenSSL for everything in a single run. If you use the -debug parameter, you'll get *very* detailed output which contains all the information we want (except for response times, I think). There might be a more elegant way to get better-looking output from OpenSSL, but I don't know how, to be honest. I haven't read the whole man page :)
...
$ openssl s_client -starttls smtp -connect mahimahi.ripe.net:25 -debug
Most of the work is probably studying the OpenSSL documentation to find out which error messages to expect, depending on the problems we might face:
- TCP handshake not successful
- Server does not reply with 220 (timeout)
- Server does not reply with 220 (server replies with a 4xx or 5xx code instead)
- Server is not ESMTP capable**
- Connection successful, but the server does not offer 250-STARTTLS (not supported, or suppressed by a MITM attack)
- 250-STARTTLS was offered, but establishing encryption was still not successful for some reason
and maybe other typical certificate problems like:
- certificate invalid (self-signed)
- certificate invalid (expired)
- certificate invalid (broken chain)
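
For the three certificate cases, a Python client can separate them by the OpenSSL verify code carried on `ssl.SSLCertVerificationError`. This is a sketch only: the grouping of verify codes into the three categories above is my own assumption, and the function names are hypothetical.

```python
import ssl

# OpenSSL X509 verify codes grouped into the categories from the list above.
CERT_PROBLEMS = {
    10: "certificate invalid (expired)",       # certificate has expired
    18: "certificate invalid (self-signed)",   # self-signed leaf certificate
    19: "certificate invalid (self-signed)",   # self-signed certificate in chain
    20: "certificate invalid (broken chain)",  # unable to get local issuer certificate
    21: "certificate invalid (broken chain)",  # unable to verify the first certificate
}


def classify_verify_code(code: int) -> str:
    """Translate an OpenSSL verify code into a report category."""
    return CERT_PROBLEMS.get(code, "certificate invalid (other)")


def classify_tls_error(exc: Exception) -> str:
    """Classify an exception raised while wrapping the socket in TLS."""
    # Check the subclass first: SSLCertVerificationError derives from SSLError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return classify_verify_code(getattr(exc, "verify_code", -1))
    if isinstance(exc, ssl.SSLError):
        return "TLS handshake failed"
    return "unclassified error"
```

This only works when the client validates the chain, e.g. with a context from `ssl.create_default_context()`.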
**
SMTP encryption is optional, and it is not supported by the original SMTP protocol (RFC 821). To use STARTTLS, the mail server has to support SMTP "Service Extensions", also known as ESMTP, which were standardised 13 years later in RFC 1869. From a communication perspective, the main difference is that the initiator of the SMTP connection (the client) uses EHLO instead of HELO (EHLO = Extended HELLO). If the server supports ESMTP, it will tell the client its features. If the server does not support ESMTP, it will reply with an error. I don't know what the OpenSSL output looks like if you try to connect to a server which does not support ESMTP. It will probably print some error message. This error should be evaluated by the Atlas SMTP measurement too. 99.9% of all mail servers nowadays should support ESMTP, but there might be some use cases... "special application blabla". Someone could start an Atlas SMTP measurement against a target that is not ESMTP compliant. That's why I mention it.
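
The EHLO/HELO distinction could be reported as follows. This is a hedged Python sketch: `send` is a hypothetical callable that sends one command and returns the reply as `(code, lines)`, and the three result labels are my own.

```python
from typing import Callable, List, Tuple

Reply = Tuple[int, List[str]]


def greet(send: Callable[[str], Reply], name: str = "probe.example") -> str:
    """Try EHLO first and fall back to HELO if the server rejects it."""
    code, _ = send(f"EHLO {name}")
    if code == 250:
        return "esmtp"
    # Any non-250 answer to EHLO is treated as "no ESMTP"; retry the old way.
    code, _ = send(f"HELO {name}")
    if code == 250:
        return "smtp-only"
    return "greeting-rejected"
```

A result of "smtp-only" would correspond to the "Server is not ESMTP capable" case in the list above.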
BR,
Simon
On 05.10.22 17:55, Michel Stam wrote:
...
Hi Simon,
Thanks for the rundown, that helps.
The Atlas measurement code uses something other than nc, but that isn’t really relevant; the process is roughly the same.
I have a question, though. Both netcat and openssl wait for the 220 before continuing. If a timeout occurs during the STARTTLS phase rather than before the 220, would the conclusion differ? In other words, is it necessary to test once for the 220 to appear (or a timeout), then another time to see if the STARTTLS completes? Could this be folded into a single test that waits for the 220, then does the STARTTLS, and reports an error when a step fails?
As to the TCP traceroute, this is something people use to measure service availability with Atlas. It isn’t something I came up with per se, but yes, it’s to measure response time as well as availability of the service at the TCP level.
With regard to additional checks such as DANE, these lie somewhere between DNSSEC and TLS measurement. I’ll make a note of this, thanks for the suggestion.
As to measurements in general, all currently support IPv6 to my knowledge. I agree that new ones should support this too.
Regards,
Michel
...
On 1 Oct 2022, at 17:11, ripe.net@toppas.net wrote:
Hi Michel,
...
> That would indeed mean a combination of TCP and SSL measurement to achieve all 3 required functions. Is it problematic if the result comes from multiple steps? If so, can you explain how?
> The intent of the measurement would be to validate whether an SMTP server is:
> reachable
> responsive
> capable of secured transmissions
First, let's define the test method. In my opinion:
- reachable
  Is the 3-way TCP handshake with the target on tcp/25 successful?
- responsive
  When establishing an SMTP connection, does the SMTP server signal that the service is ready (SMTP 220)?
- capable of secured transmissions
  When we send an EHLO, the server tells us its features. 250-STARTTLS should be among them.
For all three checks, it is easiest to use netcat.
Reachability:
...
$ nc -vz mahimahi.ripe.net 25
mahimahi.ripe.net [193.0.19.114] 25 (smtp) open
Note that we have not measured the response time here. That's why you wanted to use TCP Traceroute, right? We can also go with TCP Traceroute here.
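
If only the response time is missing, timing the TCP connect is one option. A minimal Python sketch, with a hypothetical function name; note that when a hostname is passed, the measured time includes name resolution.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int = 25, timeout: float = 5.0) -> Optional[float]:
    """Return the time in milliseconds to complete a TCP connect, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection tries every address returned for the host (IPv4 and IPv6).
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, unreachable networks, timeouts and DNS errors.
        return None
```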
Responsiveness (wait for 220):
...
$ nc -C mahimahi.ripe.net 25
220 mahimahi.ripe.net ESMTP Sat, 01 Oct 2022 15:25:22 +0200
quit
221 mahimahi.ripe.net closing connection
You might want to use the -w option here, to specify a timeout.
Capability of secured transmissions (send EHLO and check the response):
...
$ nc -C mahimahi.ripe.net 25
220 mahimahi.ripe.net ESMTP Sat, 01 Oct 2022 15:54:04 +0200
EHLO p123456.probes.atlas.ripe.net
250-mahimahi.ripe.net Hello p123456.probes.atlas.ripe.net [123.123.123.123]
250-SIZE 52428800
250-8BITMIME
250-ETRN
250-PIPELINING
250-PIPE_CONNECT
250-STARTTLS
250 HELP
To check certificate validity and whether encryption actually succeeds, we can use OpenSSL:
...
$ openssl s_client -starttls smtp -connect mahimahi.ripe.net:25
(output too long, I stripped it)
...
> You’re correct, the current SSL measurement does not support any form of STARTTLS; this is something that would have to be considered for implementation. I assume, much like with SMTP, similar cases could be made for IMAP4/POP3 or XMPP.
Yeah, as far as I know, STARTTLS is also used for IMAP, POP3, XMPP and FTP (FTPS, not SFTP).
As I said before, there are additional e-mail security features that we could check. There's MTA-STS, where we would have to perform a combination of HTTP and SSL checks. Also, there is DANE, where we would perform a combination of DNS and SSL checks (including DNSSEC). But DANE can be used for other protocols as well, not only SMTP. DNSSEC/DANE are perhaps worth a separate check type.
Last but not least, we should check for forward-confirmed reverse DNS and a matching SMTP banner, which is a combination of a DNS check and a netcat check. This would be a reasonable part of every SMTP measurement.
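
A sketch of what the forward-confirmed reverse DNS and banner comparison could look like in Python. The two lookup callables are hypothetical stand-ins for real PTR and A/AAAA queries, and both function names are my own.

```python
import ipaddress


def fcrdns_name(ip, reverse_lookup, forward_lookup):
    """Return the first PTR name whose forward lookup contains ip, else None."""
    wanted = ipaddress.ip_address(ip)
    for name in reverse_lookup(ip):
        # Compare parsed addresses so different IPv6 spellings still match.
        addresses = {ipaddress.ip_address(a) for a in forward_lookup(name)}
        if wanted in addresses:
            return name
    return None


def banner_matches(banner_line: str, hostname: str) -> bool:
    """Check whether the name announced in the 220 banner equals hostname."""
    parts = banner_line.split()
    if len(parts) < 2:
        return False
    return parts[1].rstrip(".").lower() == hostname.rstrip(".").lower()
```

With the example host from above, the PTR lookup for 193.0.19.114 would have to return a name that resolves back to 193.0.19.114, and the 220 banner would have to announce that same name.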
Please note that the creator of the measurement should specify either the exact mail server FQDN or the target domain. In the latter case, an MX record lookup has to be performed before the measurement starts, not while the measurement is running. Otherwise it could cause credit consumption trouble if multiple MX records are suddenly added to the domain while the measurement is running.
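
The idea of resolving once and pinning the result could be sketched like this in Python. `resolve_mx` is a hypothetical callable returning `(preference, exchange)` pairs, since the standard library has no MX resolver.

```python
from typing import Callable, Iterable, Tuple


def freeze_targets(domain: str,
                   resolve_mx: Callable[[str], Iterable[Tuple[int, str]]]) -> Tuple[str, ...]:
    """Resolve the MX records once and return an immutable, ordered target list."""
    records = sorted(resolve_mx(domain))  # lowest preference value first
    return tuple(host.rstrip(".").lower() for _, host in records)
```

The measurement would then run against the returned tuple only, so MX records added later do not change the number of targets or the credits consumed.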
Oh, and please make the SMTP measurement IPv6 capable :)
BR,
Simon
On 29.09.22 11:50, Michel Stam wrote:
...
Hi Simon,
...
>Can we achieve the first 2 items of this measurement by doing a TCP traceroute on port 25?
> I would say no. Using TCP Traceroute, you can check the reachability/responsiveness of the host, but not of the actual service (SMTP).
That would indeed mean a combination of TCP and SSL measurement to achieve all 3 required functions. Is it problematic if the result comes from multiple steps? If so, can you explain how?
I just noticed that the SSL measurement offers a time to connect, a response time, certificates and SSL alerts, all of which may be leveraged; see https://atlas.ripe.net/docs/apis/result-format/#version-4610 under "Version 4610 TLS (SSL) GET Cert". TCP traceroute may not be necessary in this case, although I understand it is typically used to determine service availability.
...
>Does the SSL measurement cover the intended use cases?
> I would say no. Correct me if I am wrong. Usually (for example with HTTPS or LDAPS) the SSL/TLS encryption starts right after the TCP 3-way handshake has succeeded. For SMTP, that doesn't work, because regular SMTP communication starts first so that both sides can negotiate whether SSL/TLS encryption is possible (via SMTP Service Extensions, announced in the EHLO response). However, as far as I know, OpenSSL does support SMTP and STARTTLS. So you could probably modify the existing SSL measurement.
> Keep in mind that there's also MTA-STS and DANE, which really enhance SMTP's security. A dedicated SMTP measurement would be a good thing.
You’re correct, the current SSL measurement does not support any form of STARTTLS; this is something that would have to be considered for implementation. I assume, much like with SMTP, similar cases could be made for IMAP4/POP3 or XMPP.
I would like to understand whether there are particulars you are looking for that need to be considered beyond STARTTLS support.
Regards,
Michel
...
On 23 Sep 2022, at 17:08, ripe.net@toppas.net wrote:
Hi Michel,
>Are we monitoring the Internet or monitoring a service using the proposed SMTP measurement?
First of all, we are monitoring the service of a specific target, the same as with HTTP or NTP measurements, just with another protocol. But we also monitor the Internet. Using an SMTP measurement, we could identify censorship or discover man-in-the-middle attacks (a downgrade attack that suppresses the STARTTLS command).
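
One way such a downgrade could show up in results is as a minority of probes that do not see STARTTLS while most others do. A Python sketch under that assumption; the input format (probe id mapped to "STARTTLS offered") and the simple majority rule are my own choices, not an existing Atlas feature.

```python
from typing import Dict, List


def starttls_outliers(offered_by_probe: Dict[int, bool]) -> List[int]:
    """Return the probes that saw no STARTTLS although a majority of probes did."""
    total = len(offered_by_probe)
    offered = sum(1 for seen in offered_by_probe.values() if seen)
    if offered * 2 <= total:
        # No clear majority sees STARTTLS: the server itself probably lacks it.
        return []
    return sorted(probe for probe, seen in offered_by_probe.items() if not seen)
```

An outlier is only a hint that something on the path strips the extension; it is not proof of an attack.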
>Can we achieve the first 2 items of this measurement by doing a TCP traceroute on port 25?
I would say no. Using TCP Traceroute, you can check the reachability/responsiveness of the host, but not of the actual service (SMTP).
>Does the SSL measurement cover the intended use cases?
I would say no. Correct me if I am wrong. Usually (for example with HTTPS or LDAPS) the SSL/TLS encryption starts right after the TCP 3-way handshake has succeeded. For SMTP, that doesn't work, because regular SMTP communication starts first so that both sides can negotiate whether SSL/TLS encryption is possible (via SMTP Service Extensions, announced in the EHLO response). However, as far as I know, OpenSSL does support SMTP and STARTTLS. So you could probably modify the existing SSL measurement.
Keep in mind that there's also MTA-STS and DANE, which really enhance SMTP's security. A dedicated SMTP measurement would be a good thing.
BR,
Simon
On 23.09.22 16:04, Michel Stam wrote:
> Hi everyone,
> 
> Great that this request sparked a good discussion on the merits of a measurement, as well as its potential impact on servers around the world. Good to see this!
> 
> So I’m going to do a quick recap here, hoping that I capture the intent and the concerns correctly. Please correct me if I err.
> 
> The intent of the measurement would be to validate whether an SMTP server is:
> reachable
> responsive
> capable of secured transmissions
> 
> The concern is that such a check would trigger one of the many anti-spam measures in place around the world, and/or cause undue traffic for SMTP server operators.
> 
> With this in mind, I am wondering: 
> Are we monitoring the Internet or monitoring a service using the proposed SMTP measurement? 
> Can we achieve the first 2 items of this measurement by doing a TCP traceroute on port 25?
> Does the SSL measurement cover the intended use cases?
> Is it worth exploring STARTTLS support as an extension, and what would the implications be?
> 
> Have a good weekend!
> 
> Best regards,
> 
> Michel
> 
>> On 21 Sep 2022, at 00:11, Avamander <avamander@gmail.com> wrote:
>> 
>> > Making arguments based upon extreme cases, assumptions, or potential-for-collateral-damage is not scientific. "I know one that even [...]" Anecdotal  evidence isn't scientific.
>> 
>> From the perspective of your previous sentences that's kinda humorous. "To avoid unnecessary costs incurred from disruption of service, excessive traffic, annoyances using up *my* time, and countless other reasonable rationale from *my* point of view." Because sure, a few (hypothetical RIPE probe) connections are exactly that, zero exaggeration, right?
>> 
>> In the end such fail2ban-fueled (or similar) behaviour I initially addressed, is exactly a non-scientific extreme-case assumption-based approach. There are better and even more standard ways. 
>> 
>> Crutch solutions out in the wild shouldn't be a showstopper for measuring the ecosystem. (That is already quite neglected)
>> 
>> > What _objective_ risk/benefit analysis are you basing your opinions upon?
>> 
>> And you? What's the implication here about systems being as trigger-happy as previously described?
>> 
>> Because sure, at some point rate limits make total sense, but certainly not at the point where it would ban any potential RIPE probes.
>> 
>> >  Are you a systems administrator?
>> 
>> Let's not get into such measuring contests, even if it is the RIPE Atlas mailing list.
>> 
>> On Tue, Sep 20, 2022 at 11:42 PM Paul Theodoropoulos via ripe-atlas <ripe-atlas@ripe.net> wrote:
>> On 9/20/2022 10:45 AM, Avamander wrote:
>>> Great to hear it works for you, but the potential unfortunate collateral from such a blanket action is not really RIPE Atlas' problem. There are more fine-grained methods against bruteforce attempts and open relay probes, than triggering on a few connections.
>> What _objective_ risk/benefit analysis are you basing your opinions upon? Are you a systems administrator? My responsibility is to avoid unnecessary costs incurred from disruption of service, excessive traffic, annoyances using up *my* time, and countless other reasonable rationale from *my* point of view.  
>> 
>> You suggest that it is "not really RIPE Atlas' problem". That's very true. And it is not really my problem if I bounce yoinky, pointless probes of my server, and ruthlessly block them from contacting my server ever again. My server, my choice, my wallet, nobody's business but my own.
>>> Some webmasters ban IP's for simply visiting a domain, I know one that even dispatches an email to your ISP's abuse@ address upon visit. Should RIPE Atlas probes then not probe any HTTP servers? The answer is obviously no, they shouldn't care.
>> Making arguments based upon extreme cases, assumptions, or potential-for-collateral-damage is not scientific. "I know one that even [...]" Anecdotal  evidence isn't scientific.
>> 
>> Note, I run a probe myself. I don't block any RIPE Atlas traffic on my separate servers hosted on AWS, Oracle, and GCE. 
>> 
>> -- 
>> Paul Theodoropoulos
>> anastrophe.com
>> -- 
>> ripe-atlas mailing list
>> ripe-atlas@ripe.net
>> https://lists.ripe.net/mailman/listinfo/ripe-atlas
> 
>