Email authentication: SPF, DKIM and DMARC out in the wild

Email authentication has had a turbulent history - SMTP did not have a native form of authentication when it was designed, and all modern authentication methods are built on top of that system. This was not a problem in the 1980s because there were simply too few people emailing - the only ones using it were universities and corporations actively involved in building the internet. Since then we've got a variety of tools to attempt to verify emails, including SPF, DKIM, and DMARC, and I wanted to explore the actual usage of these authentication methods by the most popular sites and companies in the world - specifically, the top 100 domains retrieved from Alexa and the Fortune 500 companies.

Authentication methods

There are three generally accepted email authentication methods - SPF, DKIM, and DMARC. They are all set through DNS, and hinge on proof of domain ownership. If ownership of a domain is lost, the majority of email security is lost as well.

SPF

SPF, or Sender Policy Framework, is a protocol that allows the owner of a domain to specify which mail servers they'll be sending mail from. When another email provider receives an email, they perform an SPF check, which entails verifying that the sender is an allowed sender based on the rule sets of the sending domain.

The receiving server compares the IP address of the sending MTA (Message Transfer Agent) with the allowed IPs from the SPF record. There are a few results that can occur, but the most common are pass (accept), fail (reject), and soft fail (accept but mark).

SPF in action (credit to https://postmarkapp.com/guides/spf)

Note that an SPF failure doesn't mean that the message is immediately discarded - most of the time an SPF failure is a good signal that the mail is bad, but it's not a guarantee. Companies will rotate IPs or domains often enough that it can't be an end all be all security check.

How it works

SPF for a domain is set up through a TXT DNS record. All SPF records start with v=spf1, followed by a list of allowed IPs (or domains). These IPs are identified using various mechanisms, outlined below.

Mechanism	How it works

all	All IPs fall under this qualification (should always be last in the record)
ip4:ip	An IPv4 address, or subnet
ip6:ip	An IPv6 address, or subnet
a:domain	All the A records for the domain are tested. If the domain is not specified, the current domain is used.
mx:domain	All the A records for all the MX records for the domain are tested. If the domain is not specified, the current domain is used.
ptr:domain	The hostnames for the client IP are looked up using a PTR query. At least one of the A records for a PTR hostname must match the original client IP.
exists:domain	If the domain resolves to any address, this results in a match. If it does not resolve, the mechanism does not match and evaluation moves on to the next one.
include:domain	The SPF record of the specified domain is evaluated, and a pass there counts as a match

Each of these mechanisms is prefixed by a qualifier. If there is no qualifier specified, + is the default (pass).

Qualifier	Result

+	Pass (default)
-	Fail
~	SoftFail
?	Neutral
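
To make the qualifier rule concrete, here is a minimal sketch in Python that splits a single SPF term into its result, mechanism, and argument. The helper name parse_term is my own and not part of any library, and the sketch ignores modifiers such as redirect= and the CIDR shorthand on a and mx.

```python
# Qualifier characters and the results they map to, as in the table above.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_term(term):
    """Split one SPF term into (result, mechanism, value).

    Hypothetical helper for illustration only.
    """
    qualifier = "+"  # the default when no qualifier is written
    if term[0] in QUALIFIERS:
        qualifier, term = term[0], term[1:]
    # The mechanism and its argument are separated by the first colon,
    # so IPv6 addresses (which contain more colons) stay intact.
    mechanism, _, value = term.partition(":")
    return QUALIFIERS[qualifier], mechanism.lower(), value or None


print(parse_term("ip4:35.190.247.0/24"))  # ('pass', 'ip4', '35.190.247.0/24')
print(parse_term("~all"))                 # ('softfail', 'all', None)
```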

So for instance, a sample SPF record might look like:

v=spf1 ip4:35.190.247.0/24 ip4:216.239.32.0/19 ~all

This says that it should pass any IP in either the 35.190.247.0/24 or 216.239.32.0/19 range, and soft fail all others.
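
As a rough illustration of that check, the sketch below evaluates only the ip4, ip6, and all terms of a record against a client IP, left to right, with the first match deciding the result. The function name check_ip and the test addresses are assumptions of mine. A real validator also has to resolve a, mx, include, and the other mechanisms through DNS, which this sketch simply skips.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_ip(record, client_ip):
    """Evaluate the ip4, ip6 and all terms of an SPF record, left to right."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism == "all":
            return RESULTS[qualifier]
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return RESULTS[qualifier]
        # Other mechanisms (a, mx, include, ...) need DNS and are skipped here.
    return "neutral"  # nothing matched and the record has no all term


record = "v=spf1 ip4:35.190.247.0/24 ip4:216.239.32.0/19 ~all"
print(check_ip(record, "35.190.247.10"))  # pass
print(check_ip(record, "203.0.113.5"))    # softfail
```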

v=spf1 ip4:66.3.159.0/24 include:mail.zendesk.com include:_spf.google.com -all

This says that it should pass any IP in the 66.3.159.0/24 range, pass any IP allowed by the SPF records of mail.zendesk.com and _spf.google.com, and fail all others.

I won't delve too deep into the SPF specification - if you'd like to know more you can read the RFC.

SPF in the wild

I wanted to see how many domains set up SPF and rely on it, and specifically how many of them have too permissive rule sets (either a +all or a ~all).

I used the checkdmarc python package to query the Top 100 domains and the domains of the Fortune 500 companies.
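
The counts below came from checkdmarc. The sketch here does not use that package. It only illustrates, under my own assumptions, how a record that has already been fetched could be bucketed by the qualifier on its top-level all term. The function names all_policy and is_permissive are hypothetical.

```python
QUALIFIER_NAMES = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def all_policy(record):
    """Return the result of the record's top-level all term, or None if absent."""
    for term in record.lower().split()[1:]:
        if term == "all":
            return "pass"  # a bare all carries the default + qualifier
        if len(term) == 4 and term.endswith("all") and term[0] in QUALIFIER_NAMES:
            return QUALIFIER_NAMES[term[0]]
    return None


def is_permissive(record):
    """True when the record ends up passing or soft-failing every other sender."""
    return all_policy(record) in ("pass", "softfail")


print(all_policy("v=spf1 ip4:35.190.247.0/24 ip4:216.239.32.0/19 ~all"))  # softfail
print(is_permissive("v=spf1 ip4:66.3.159.0/24 include:mail.zendesk.com include:_spf.google.com -all"))  # False
```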

Top 100 Domains

For the top 100 domains, 50 of them have all set to softfail, whose intended action is accept but mark - this effectively says that any email received from a non-designated sender should be accepted but "marked" (which has no specific action associated - some providers drop the email, others send to junk, and others send to inbox).

The "(INCLUDE)" above means that the SPF record also has an include:domain in it, which might pull in an explicit, more restrictive all directive. However, due to the way that SPF works, mechanisms are checked in order and the first one that matches decides the result, and an include only matches when the included record returns a pass - so a ~all in the top-level record will still be counted as a softfail, even if an included record ends in -all.

Additionally, 4 out of the 100 actually had invalid or no SPF records.

Domain	Failure

businessinsider.com	`v=spf1 include:_spf.googlemail.com include:mail.zendesk.com include:aspmx.pardot.com -all` The domain _spf.googlemail.com does not exist
go.com	go.com has no SPF record
office.com	office.com has no SPF record
apartments.com	`v=spf1 ip4:66.231.92.15 ip4:50.28.24.60 ip4:65.222.181.0/24 ip4:204.253.48.0/24 ip4:65.210.23.128/25 ip4:65.222.180.0/24 include:salesforce.com include:s._spf.pardot.com ~all` The domain s._spf.pardot.com does not exist

Specifically, go.com and office.com have no SPF records, meaning that any email originating from an @go.com or @office.com can't be verified.

Fortune 500 Domains

For the domains of the Fortune 500 companies, 250 of them have all set to softfail.

Only 419 have valid SPF tags, although the majority of invalid tags are due to non-resolvable includes and missing SPF records.

50 of the Fortune 500 companies have no SPF record at all set for the main domain. This is slightly misleading, as some of their main domains are pure redirects (for instance, Hilton's main domain is hiltonworldwide.com, which actually redirects to hilton.com). However, SPF does not follow web redirects, and as such a receiver has no SPF record against which to fail a forged email from @hiltonworldwide.com.

The only one of note to have an actually malformed SPF tag is huntingtoningalls.com, a military shipbuilding company, which has an infinite include loop (the SPF entry for huntingtoningalls.com has include:huntingtoningalls.com in it).
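
A loop like this is easy to spot if you follow the includes yourself. The sketch below is one way to do it, assuming the records have already been fetched into a dictionary that stands in for DNS. The function name and the example.com records are made up for illustration and are not the real records of any company mentioned here.

```python
def find_include_loop(domain, records, seen=()):
    """Return the chain of domains that forms an include loop, or None.

    `records` maps a domain name to its SPF record text (a stand-in for DNS).
    """
    if domain in seen:
        # We came back to a domain already on the current path: that is a loop.
        return list(seen[seen.index(domain):]) + [domain]
    for term in records.get(domain, "").split():
        term = term.lstrip("+-~?")  # drop an optional qualifier
        if term.lower().startswith("include:"):
            target = term[len("include:"):]
            loop = find_include_loop(target, records, seen + (domain,))
            if loop:
                return loop
    return None


records = {
    "example.com": "v=spf1 include:example.com -all",    # includes itself
    "example.org": "v=spf1 include:_spf.example.org -all",
    "_spf.example.org": "v=spf1 ip4:192.0.2.0/24 -all",
}
print(find_include_loop("example.com", records))  # ['example.com', 'example.com']
print(find_include_loop("example.org", records))  # None
```

Real validators also cap the number of DNS-querying mechanisms per check at ten, so a loop ends in an error rather than running forever.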

DKIM

Where SPF is concerned only with the sender of the message, DKIM, or DomainKeys Identified Mail, also cares about certain parts of the actual contents of the message. The following description is courtesy of Wikipedia.

A DKIM-compliant domain administrator generates one or more pairs of asymmetric keys, then hands private keys to the signing MTA, and publishes public keys on the DNS. The DNS labels are structured as selector._domainkey.example.com, where selector identifies the key pair, and _domainkey is a fixed keyword, followed by the signing domain's name so that publication occurs under the authority of that domain's ADMD. Just before injecting a message into the SMTP transport system, the signing MTA creates a digital signature that covers selected fields of the header and the body (or just its beginning). The signature should cover substantive header fields such as From:, To:, Date:, and Subject:, and then is added to the message header itself, as a trace field. Any number of relays can receive and forward the message and at every hop, the signature can be verified by retrieving the public key from the DNS.

Unfortunately, due to the way that DKIM is implemented, there is no easy way to detect if a domain is using DKIM from DNS alone. You have to know the selector a specific domain uses, and that exploration is outside the scope of this blog post.
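
If you do have a signed message in hand, the selector is not a secret: the DKIM-Signature header carries it in the s= tag and the signing domain in the d= tag. The sketch below shows how those two tags combine into the DNS name described above. The function name, the selector1 value, and the header contents are placeholders I made up.

```python
def dkim_key_name(signature_header):
    """Build the DNS name of the public key from a DKIM-Signature header value."""
    tags = {}
    for part in signature_header.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            # Header values may be folded across lines, so drop all whitespace.
            tags[name.strip()] = "".join(value.split())
    # s= is the selector and d= is the signing domain.
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical header value; the bh= and b= contents are placeholders.
header = (
    "v=1; a=rsa-sha256; d=example.com; s=selector1; "
    "h=from:to:subject:date; bh=PLACEHOLDER; b=PLACEHOLDER"
)
print(dkim_key_name(header))  # selector1._domainkey.example.com
```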

DMARC

DMARC, or Domain-based Message Authentication, Reporting and Conformance, actually extends SPF and DKIM. It allows the owner of a domain to publish a policy in their DNS records to specify which mechanism (DKIM, SPF or both) is employed when sending email from that domain; how to check the From: field presented to end users; how the receiver should deal with failures - and a reporting mechanism for actions performed under those policies.

DMARC introduces a concept called alignment, which has to do with whether the domain in the From: field matches the domain authenticated by SPF or DKIM, either exactly or at the level of the "Organizational Domain". Under DMARC a message can pass SPF or DKIM and still fail if it fails alignment. The following description and example are courtesy of Wikipedia.

Alignment may be specified as strict or relaxed. For strict alignment, the domain names must be identical. For relaxed alignment, the top-level "Organizational Domain" must match. The Organizational Domain is found by checking a list of public DNS suffixes, and adding the next DNS label. So, for example, "a.b.c.d.example.com.au" and "example.com.au" have the same Organizational Domain, because there is a registrar that offers names in ".com.au" to customers. Albeit at the time of DMARC spec there was an IETF working group on domain boundaries, nowadays the organizational domain can only be derived from the Public Suffix List.
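
Here is a small sketch of that lookup and of the strict versus relaxed comparison. The PUBLIC_SUFFIXES set is a tiny stand-in of my own for the real Public Suffix List, which is far larger and also has wildcard and exception rules that this sketch ignores. The function names are hypothetical.

```python
# A tiny stand-in for the Public Suffix List, for illustration only.
PUBLIC_SUFFIXES = {"com", "org", "au", "com.au"}


def organizational_domain(domain):
    """Longest matching public suffix plus the next label to its left."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        # The first hit while scanning from the left is the longest suffix.
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels)  # no known suffix: leave the name unchanged


def is_aligned(from_domain, authenticated_domain, mode="r"):
    """mode 's' is strict (identical names), 'r' is relaxed (same org domain)."""
    if mode == "s":
        return from_domain.lower() == authenticated_domain.lower()
    return organizational_domain(from_domain) == organizational_domain(authenticated_domain)


print(organizational_domain("a.b.c.d.example.com.au"))       # example.com.au
print(is_aligned("mail.example.com", "example.com"))          # True
print(is_aligned("mail.example.com", "example.com", "s"))     # False
```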

DMARC records are published in DNS with a subdomain label _dmarc, for example _dmarc.example.com. Compare this to SPF at example.com, and DKIM at selector._domainkey.example.com.

The content of the TXT resource record consists of name=value tags, separated by semicolons, similar to SPF and DKIM. For example:

"v=DMARC1;p=none;sp=quarantine;pct=100;rua=mailto:[email protected];"

Here, v is the version, p is the policy, sp the subdomain policy, pct is the percent of emails that the DMARC policy should be applied to, and rua is the URI to send aggregate reports to. In this example, the entity controlling the example.com DNS domain intends to monitor SPF and/or DKIM failure rates and doesn't expect emails to be sent from subdomains of example.com. Note that a subdomain can publish its own DMARC record; receivers must check it out before falling back to the organizational domain record.
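
Because the record is just semicolon-separated name=value pairs, pulling the tags out takes only a few lines. This is a minimal sketch, not a full validator: parse_dmarc is a name I chose, and the reporting address in the sample record is a made-up placeholder.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag names to values."""
    tags = {}
    for part in record.strip().strip('"').split(";"):
        name, sep, value = part.partition("=")
        if sep:  # skips the empty piece after a trailing semicolon
            tags[name.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


# Hypothetical record; the rua address is a placeholder.
sample = '"v=DMARC1;p=none;sp=quarantine;pct=100;rua=mailto:reports@example.com;"'
tags = parse_dmarc(sample)
print(tags["p"], tags["sp"], tags["pct"])  # none quarantine 100
```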

A note on the pct tag - the RFC defines it as:

Percentage of messages from the Domain Owner's mail stream to which the DMARC policy is to be applied. However, this MUST NOT be applied to the DMARC-generated reports, all of which must be sent and received unhindered.

The goal of pct is to allow full visibility in the reports as to what would have been done if pct=100.¹

DMARC in action (credit to https://campus.barracuda.com/product/sentinel/doc/78157593/configuring-spf-dkim-and-dmarc/)

Name types

The DMARC specification provides three options for domain owners to say how they want failed emails to be treated. These p= policies are:

Value	Result

none	Treat the mail the same as it would be without any DMARC validation
quarantine	Accept but do not send to inbox
reject	Do not accept the message at all

The only required tags are v and p. There are quite a few other optional tags, but the only ones we'll be concerned with are sp, adkim, and aspf.

The sp= specifies the intended behavior around authentication failures for subdomains - for instance, mail.example.com can fail and the DMARC entry for example.com will specify the intended behavior for that subdomain. Note that subdomains can also have their own DMARC policies (in the example above, at _dmarc.mail.example.com). The policy options are the same as the "p" tag listed above.

adkim indicates strict or relaxed DKIM identifier alignment. The default is relaxed, which means that only the Organizational Domain needs to match. aspf indicates strict or relaxed SPF identifier alignment. The default is also relaxed.
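
Putting those defaults together, the sketch below works out which settings would apply to a message, given the tags from an organizational domain's record. The function name and the input dictionary are my own. It assumes the subdomain has no DMARC record of its own, in which case sp applies and falls back to p when sp is absent.

```python
def effective_policy(tags, from_domain, record_domain):
    """Return the policy and alignment modes that apply to from_domain.

    `tags` is a dict of DMARC tags published at _dmarc.<record_domain>.
    """
    is_subdomain = from_domain.lower() != record_domain.lower()
    # Subdomains use sp when present, otherwise they inherit p.
    policy = tags.get("sp", tags["p"]) if is_subdomain else tags["p"]
    return {
        "policy": policy,
        "adkim": tags.get("adkim", "r"),     # relaxed unless stated
        "aspf": tags.get("aspf", "r"),       # relaxed unless stated
        "pct": int(tags.get("pct", "100")),  # applies to all mail by default
    }


tags = {"v": "DMARC1", "p": "none", "sp": "quarantine"}
print(effective_policy(tags, "example.com", "example.com")["policy"])       # none
print(effective_policy(tags, "mail.example.com", "example.com")["policy"])  # quarantine
```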

Top 100 Domains

Of the top 100, only 74 of them have DMARC policies set up.

27 of them have their policy set to none - this doesn't necessarily mean to accept the email, as it can still fail the SPF check, but it does indicate that, even on a DMARC failure, the receiver should do nothing.

Only 2 of the top 100 domains have strict adkim or aspf policies - mayoclinic.org and retailmenot.com.

Fortune 500 Domains

Of the Fortune 500, only 329 of them have DMARC policies set up, a 66% implementation rate.

74 of them have their policy set to none.

Even more amazingly, only one domain has their adkim or aspf policy set to strict! That domain is pmi.com, or Philip Morris International.

DNSSEC in the wild

Part of the data that was returned in these queries was also whether the domain was using DNSSEC. While not strictly related to email authentication, it does provide a stronger level of trust for the results of a DNS query, which is what all the email authentication above is based on.

For the Top 100 sites, only 6 of them are using DNSSEC (a 6% implementation rate). For the Fortune 500, it's even worse, at 14, or a 2.8% implementation rate.

What does this mean?

I was fairly surprised at the low adoption rates for the various security features above. If I had to guess, it's because of the haphazard nature of email systems' growth. When there isn't one standardized security methodology from the beginning, each system slowly builds out its own, becoming a Frankenstein's monster of proprietary software and incorrectly implemented standards.

It doesn't mean that the domains above that aren't implementing SPF or DKIM are vulnerable to attack - there are still other ways to prove authentication, including more sophisticated algorithms used by the big players (looking at you Gmail and Outlook). Enterprises often have their own rulesets inside their network which will prevent phishing emails from reaching their targets based on the originating MTA regardless of any SPF failures. It does mean, however, that everyone has to "roll their own" layer of email authentication security, whether that means relying on a third party like Google or purchasing other outside software.

Email spam and phishing are still a problem due to there not being a perfect solution, and it looks like it will remain that way for quite a while.

Conclusion

The ramifications of not building authentication and security into your protocol when it's designed can be far reaching. This isn't to blame the creators of SMTP or email - no one knew how big it would get when it was originally designed. However, it's not hard to imagine a world with significantly less spam and email attacks, where the attack surface is much smaller due to a well designed, cryptographically secure authentication system. Even today, the most popular sites visited by billions of people a day are using patches and half baked solutions, and fall prey to these attacks on a daily basis. Even the official solutions listed above have pretty bad adoption rates among some of the largest sites and companies in the world.

The code and all the data can be found on GitHub in the SPF-Research repo.

Footnotes

Thanks to Kurt A. for the correction and background.