Ten Days of Email Hell
Mik talks about DKIM, SPF, DMARC and other email geek stuff
This is a very long article, meant mostly to be a note to other email admins out there, but also to you, my email customers, so you can know what has been happening, and that I take this issue very seriously -- like a heart attack, serious. We have spent ten days working almost exclusively on this issue.
Please take the time to read this article so you can understand how complicated and difficult it is to serve email these days. You should know that some of Montague WebWorks' competitors in Franklin County, and many global web hosting companies like Wix, Weebly and Squarespace, do not offer email hosting. They just send all their customers to Gmail. And that's really too bad. MWW will never do that. We've spent too much time, and way too much money, to give up like that.
How It Began
Three weeks ago on a Wednesday, a couple of my clients contacted me to say that their email wasn't working very well. Some of their customers were not receiving important (or any) emails. So I started looking into it.
Setting up and maintaining a basic email server is easy enough, just as easy as installing and using any piece of software. But email is not a "basic" service. You're not just running something locally; you're running software that interacts directly with thousands of other similar servers out on the Internet, all of which now deeply distrust your server because spammers and their software keep evolving.
It would be easy enough if we only had to worry about direct spamming, where all the emails came from one or two email servers. All we'd have to do to block those servers would be to refuse to receive any emails from them, and voila, no more spam.
But no. There are now "spam bots", which are small, hidden, rogue installations on unsuspecting people's computers that mine the address books they find there and send emails "from" the host computer to every address in the address book.
There are MILLIONS of these infected machines, all sending variations on a particular spam message, which makes it harder to track as the text isn't always the same from message to message.
So it's no wonder email server admins (like me) don't trust any email servers out there. In recent years we've ramped up our vigilance and put up simple walls by subscribing to RBLs, or Realtime Blackhole Lists, which keep away most spam. We also check other things, like whether the IP address of the sending server has an rDNS (reverse DNS) entry that maps the address back to a host name. If the IP address has a proper rDNS entry, it's less likely to belong to a spammer.
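For the curious, here is roughly how an RBL lookup works: the receiving server reverses the octets of the sender's IP address, tacks on the blocklist's zone name, and does an ordinary DNS lookup; getting an answer back means "listed". This is a minimal Python sketch, not any particular mail server's code, and `dnsbl.example.net` is a placeholder zone, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a blocklist about an IPv4 address.

    The octets are reversed and the list's zone appended,
    e.g. 192.0.2.1 -> 1.2.0.192.<zone>.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist answers the query with any address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any other resolution failure) is treated as "not listed".
        return False
```

Treating every lookup failure as "not listed" is a simplification; a careful implementation would tell a genuine "no such name" apart from a timeout.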
A New World Order -- An Arms Race
Yeah, but that just doesn't cut it anymore. Spam bots now know not to deluge a server with millions of emails per hour... they trickle them out at a couple hundred per hour, from millions of different machines that start and stop at different times, all trying to disguise themselves as legitimate email senders.
Case in point: our own server was almost compromised by a user who had some bad software on their machine and was sending (or trying to send) 250 spam emails PER MINUTE. Those attempts were blocked pretty quickly, and the offending spam emails were trapped and stored in a holding folder. Of course, nobody noticed that folder until the "lost week", when we found and counted them: 950,000 emails. Yes, nearly a million spam emails blocked from sending through our servers since January 17th of this year. Ugh!
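A per-sender rate limit is one way a server can catch a machine like that. Below is a minimal sliding-window sketch in Python; the class name and the idea of holding (rather than sending) over-limit mail are illustrative assumptions, not how SmarterMail does it internally.

```python
from collections import deque


class SendRateLimiter:
    """Allow each sender at most `limit` messages in any `window`-second span."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._sent = {}  # sender -> deque of send timestamps (seconds)

    def allow(self, sender: str, now: float) -> bool:
        """Record and allow a send, or return False if the sender is over the limit."""
        times = self._sent.setdefault(sender, deque())
        # Forget sends that have slid out of the current window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.limit:
            return False  # caller should hold this message instead of sending it
        times.append(now)
        return True
```

With `limit=250` and `window=60`, the 251st message a single sender tries within one minute would be refused and could be parked for review.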
So, it's an arms race. Spammers come up with a new trick, email server admins figure out how to deal with it and add extra security features to preemptively block the next wave. Then it's the spammers' turn.
The Lost Week
Having been a server admin for over a decade now, I had a basic understanding of what needed to be done, and then I simply let the mail server do its thing while I focused on my website business, which is what I'm in the game for.
Well, the game changed a couple months ago when apparently a new wave of spam swept across the Internet. Gmail, Comcast, Yahoo, AOL, Hotmail, all of them, suddenly ramped up their security requirements for email, and those admins who hadn't upgraded their bona fides in a while took it on the chin.
But, of course, there's no manual for all the things you need to do, things like DKIM, SPF, DMARC, etc. It takes patience to learn about some of these things, learn how to implement them, test them to see that they're working, and adjust them or work around them when there's a bug or undocumented "feature". And even testing is difficult and time-consuming, because some changes take up to an hour to propagate across the Internet, and email, as you know, isn't instantaneous. Sure, it's quick, but it's not instant. So you make a change, sit back and wait for up to ten minutes, then test the change. If it's not right, do it again.
Thus, after beginning work on Montague WebWorks' mail server issue back around Wednesday, February 26th, I dropped all other business (I stopped developing websites and features in the RocketFusion code) and devoted the entire following two weeks, all 40 hours of each, to figuring out why email was delivered to some addresses but not others, and, when email did work from a customer's email program, why it wouldn't work from that customer's Contact Us web form. It was maddening.
The Technical Stuff -- Stay With Me...
Montague WebWorks serves email for 95 customers, which means that any change we decide to make has to be made 95 times. Some changes are easy, can be batched or scripted, and have an almost immediate, positive effect. Some take longer, and others longer still.
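Scripting is what makes 95 domains bearable. As a hypothetical example, a few lines of Python can print the same SPF and DMARC records for every domain so they can be pasted or imported into each zone file. The IP addresses (from the 192.0.2.0/24 documentation range), the domain names and the report address are all placeholders.

```python
def email_records(ip4_addresses: list, report_address: str) -> list:
    """Build SPF and DMARC TXT lines, in zone-file style, for one domain."""
    spf = "v=spf1 " + " ".join(f"ip4:{ip}" for ip in ip4_addresses) + " -all"
    return [
        f'@ TXT ( "{spf}" )',
        f'_dmarc IN TXT ( "v=DMARC1; p=none; rua=mailto:{report_address}" )',
    ]


# Placeholder values for illustration only.
SERVERS = ["192.0.2.10", "192.0.2.20"]
DOMAINS = ["example.com", "example.org"]

if __name__ == "__main__":
    for domain in DOMAINS:
        print(f"; {domain}")
        for line in email_records(SERVERS, f"dmarc-reports@{domain}"):
            print(line)
```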
Here's the short list of things we did to the email server, and to our website code that sends email when someone uses the Contact Us form or when you do a bulk mailing through the Send Email page.
Stay with me, it's a little technical, but I'll go easy on you. It's good to know how this stuff works.
Host Headers, and some simple stuff
The first thing I noticed when I started my initial testing was that the "Host Header" my mail server presents when sending out an email (the host name it announces in its SMTP HELO/EHLO greeting) did not match the name other mail servers got back when they did an rDNS check on my IP address. This was simple to fix, and had an immediate effect for some email service receivers.
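Here is a sketch of the check a receiving server can make: look up the PTR (reverse DNS) name for the connecting IP, compare it with the HELO name, then confirm that the PTR name resolves back to the same IP. The lookups are passed in as functions (an assumption made so the logic can be tried without a network), and the function name is illustrative.

```python
from typing import Callable, List


def helo_matches_rdns(ip: str, helo_name: str,
                      reverse_lookup: Callable[[str], str],
                      forward_lookup: Callable[[str], List[str]]) -> bool:
    """True if ip's PTR name equals the HELO name and resolves back to ip."""
    ptr_name = reverse_lookup(ip).rstrip(".").lower()
    if ptr_name != helo_name.rstrip(".").lower():
        return False  # names differ: some receivers will distrust this sender
    # Forward-confirm: the PTR name must resolve back to the same address.
    return ip in forward_lookup(ptr_name)
```

With real DNS you could pass `lambda ip: socket.gethostbyaddr(ip)[0]` and `lambda name: socket.gethostbyname_ex(name)[2]` from Python's `socket` module.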
Another obvious check was to make sure I was not an "open relay", meaning a server that lets anyone out there use it to send email. Montague WebWorks has never been an open relay, so this was an easy pass.
SPF, or Sender Policy Framework
This is a line of code (a "record") that I put into your domain name's "zone file" on my DNS (Domain Name System) server to help other mail servers out there figure out whether the email they are receiving from me actually came from my email server. For example, did the email you are reading now actually come from my email server, or did someone else send it, pretending it came from my mail server?
An SPF record looks like this:
@ TXT ( "v=spf1 ip4:22.214.171.124 ip4:126.96.36.199 include:comcast.net -all" )
The "ip4:" parts mean email from your domain can only originate at the two listed IP addresses: the first is the personal email server (for Outlook, Eudora, Thunderbird, etc.) and the second is the bulk server (for all emails that originate from any website Montague WebWorks hosts).
The "include:" part means that some emails may originate from comcast.net's server, which would include any customer who is required to use Comcast's email server for sending.
The "-all" at the end means that if an email claims to come from this domain, yet didn't originate from either of those IP addresses or from Comcast's mail servers, the receiving server should treat it as a hard fail: consider it spam and bounce it. Do not deliver. Other options are "~all", a soft fail, which means it might be spam, so look closer, or "+all", which means accept the email anyway.
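To make the qualifiers concrete, here is a deliberately simplified SPF evaluator in Python. It only understands `ip4:` mechanisms and `all`; it skips `include:` (which would need a DNS lookup of comcast.net's own SPF record) and every other mechanism, so it is a teaching sketch, not a real SPF checker. The addresses are placeholders from the documentation ranges.

```python
import ipaddress

# SPF qualifier prefix -> result when the mechanism matches.
QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, sender_ip: str) -> str:
    """Evaluate only the ip4: and all mechanisms of an SPF record."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in QUALIFIER_RESULTS:
            qualifier, term = term[0], term[1:]
        if term == "all":
            return QUALIFIER_RESULTS[qualifier]
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return QUALIFIER_RESULTS[qualifier]
        # include:, a, mx and the rest are ignored in this sketch.
    return "neutral"  # nothing matched
```

Mechanisms are checked left to right and the first match wins, which is why `-all` belongs at the end of the record.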
For years we have had SPF records in the zone file of every domain we host email for, but apparently we had a typo in the second IP address. So any mail server out there that used SPF records to help decide whether email was spam saw errors in our records, which actually hurt our reputation.
The problem is there aren't many tools out there these days to test whether SPF records are properly formatted. I didn't find one until last week, and wow, did it have some great information.
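A typo like ours is the kind of thing even a small script can catch. This sketch only checks that each `ip4:`/`ip6:` term holds a valid address or network of the right family; it is not a full SPF syntax validator, and the function name is illustrative.

```python
import ipaddress

FAMILIES = {"ip4:": 4, "ip6:": 6}  # mechanism prefix -> expected IP version


def spf_address_errors(record: str) -> list:
    """List problems with the ip4:/ip6: terms of an SPF record."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return ["record must start with v=spf1"]
    errors = []
    for term in terms[1:]:
        body = term.lstrip("+-~?")  # drop an optional qualifier
        for prefix, version in FAMILIES.items():
            if not body.startswith(prefix):
                continue
            try:
                network = ipaddress.ip_network(body[len(prefix):], strict=False)
            except ValueError:
                errors.append(f"invalid address in {term!r}")
                continue
            if network.version != version:
                errors.append(f"wrong address family in {term!r}")
    return errors
```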
Just a quick shout-out to the website BrandonChecketts.com, which has a fantastic tool called the DomainKeys, DKIM, SPF, SpamAssassin Email Validator. This tool is a must-have for every email server administrator out there.
Anyway, this website's email validator not only alerted me to the fact that my SPF records were all wrong, it also reminded me that I still needed to add DKIM records to all the domain names. For years I'd had DomainKeys installed on most of the more important email domains I host, but not DKIM, which is more stringent and harder to spoof.
DKIM is King
DKIM, DomainKeys Identified Mail, lets an organization take responsibility for a message by signing it. When our server sends an email, it uses a "private key", which never leaves our server, to add a digital signature to the message headers. The receiving server looks up the matching "public key" in the sending domain's DNS records (an ordinary DNS lookup, not an rDNS one) and uses it to check the signature. If the check passes, the receiver knows the message really came from a server holding our private key and wasn't altered along the way.
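One piece of DKIM signing that is easy to show without a crypto library is the body hash. Before signing, the sender "canonicalizes" the message body so harmless whitespace changes in transit don't break the signature, hashes it with SHA-256, and puts the Base64 result in the signature's `bh=` tag. The sketch below follows the "relaxed" body rules from RFC 6376 as I understand them; the RSA signing step itself is left out, and accepting bare line feeds as line endings is a convenience assumption.

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Canonicalize a message body with DKIM's "relaxed" body algorithm."""
    # Assumption: accept bare LF as well as CRLF line endings.
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    # Collapse runs of spaces/tabs to one space and drop trailing whitespace.
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    if not lines:
        return b""  # an empty body canonicalizes to nothing
    return b"\r\n".join(lines) + b"\r\n"


def body_hash(body: bytes) -> str:
    """SHA-256 of the canonical body, Base64-encoded as in the bh= tag."""
    digest = hashlib.sha256(relaxed_body(body)).digest()
    return base64.b64encode(digest).decode("ascii")
```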
Forget about almost everything else in a domain's DNS file. DKIM is king. It is the most important part of an email domain's bona fides. SPF is important, too, but if you don't have a DKIM record, some of the larger email hosting services out there (Gmail, AOL, etc.) simply will not deliver your email.
So I began adding DKIM to the first set of very important domains, including WebWorksServer.com, which is the "From:" email domain for all emails that come from the Contact Us form on everyone's website.
This is the main reason why emails that originate from the Contact Us form were not getting to Hotmail, Gmail, etc. It was maddening to try to figure out why some people were getting emails, and others were not.
So, back to me adding DKIM to the domains. I was adding them, and in the proper format, yet the emails were still not getting through. I'd run checks internally in the mail server's interface (we use SmarterMail, and it has DKIM management built right in) and it kept telling me the DKIM certificates were passing. But the emails weren't getting through.
I went to SmarterMail's support forums and found several threads about this very same issue. Some people suggested going to the website www.Port25.com (named after port 25, the standard SMTP port; because ISPs like Comcast and Verizon block outgoing port 25 on customer connections, email programs now send mail to their server on port 587 instead) and creating a new DKIM key there. You then manually add it to SmarterMail's domainConfig.xml file and reboot SmarterMail, THEN also add it to the DKIM record in the domain's DNS zone file, save, and reboot the DNS server.
dkim._domainkey TXT ( "p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQChdXkdUxbXMzsb4S7enLBL4sZv6YdJPpV2hYNX0b7WKAONcSvUcASFp0kcz1XckUdrWkdRCpxW76B4j6FSD/9Xd7fJtyHXS1zk07q/fu8ImsAy3bLZG8mjDJ/ZcCuSvPoQC1Aem3rkdybKQml8UGGuWYuM9XvyB9kFzfXODU6kfQIDAQAB" )
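For reference, here is a sketch of what a receiving server does with a record like that: it builds the name `<selector>._domainkey.<domain>` (the selector here is `dkim`), fetches the TXT record, and pulls the public key out of the `p=` tag. The function names are illustrative, and the DNS fetch itself is left out.

```python
import base64


def dkim_dns_name(selector: str, domain: str) -> str:
    """The TXT record a verifier fetches: <selector>._domainkey.<domain>."""
    return f"{selector}._domainkey.{domain}"


def parse_tag_list(txt: str) -> dict:
    """Parse a DKIM-style 'tag=value; tag=value' list."""
    tags = {}
    for part in txt.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        name, _, value = part.partition("=")  # split on the first '=' only
        tags[name.strip()] = value.strip()
    return tags


def public_key_der(txt: str) -> bytes:
    """Decode the p= tag; whitespace inside the Base64 is ignored."""
    p = parse_tag_list(txt)["p"]
    return base64.b64decode("".join(p.split()))
```

Run against the record above, the key decodes to 162 bytes, the usual DER size of a 1024-bit RSA public key.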
May I remind you that Montague WebWorks hosts 95 email domains. This is onerous work, and it wasn't guaranteed to work.
So, Why Was I Getting Errors for a Couple of Days...?
Well, apparently signing every email that goes out from a list on the website takes a bit of time. If your list has 1000 names on it, it takes a very long time to sign them all in one shot. And so the server was "timing out", meaning, the process was taking too long and so it just gave up, thinking something had locked up.
As of this past weekend (March 15, 2014) we implemented a better spooling system so you can initiate an email blast and get on your way without having to wait five minutes for the page to run, or get an error. In fact you can now schedule the email to go out any time you like.
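The idea behind the new spooling is simple: the web page just drops the job into a queue and returns, and a background worker sends it later, a batch at a time. This is a hypothetical Python sketch of that idea; the class names, the batch size and the in-memory heap are my illustration, not the actual RocketFusion code.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class MailJob:
    send_at: float  # Unix time the blast should start; the only sort key
    subject: str = field(compare=False)
    recipients: list = field(compare=False)


class MailSpool:
    """In-memory spool: pages enqueue jobs and return; a worker drains due ones."""

    def __init__(self, batch_size: int = 50):
        self.batch_size = batch_size
        self._heap = []  # MailJob objects, earliest send_at first

    def schedule(self, job: MailJob) -> None:
        heapq.heappush(self._heap, job)

    def due_batches(self, now: float):
        """Yield (subject, recipients) batches for every job whose time has come."""
        while self._heap and self._heap[0].send_at <= now:
            job = heapq.heappop(self._heap)
            for start in range(0, len(job.recipients), self.batch_size):
                yield job.subject, job.recipients[start:start + self.batch_size]
```

Small batches keep any single step short, which is what avoids the page time-outs described above.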
In the end, DKIM research, discovery, creation and testing, code modifications and testing took up three days of my week alone. Keep in mind that any change to a domain's DNS zone file takes a minimum of five minutes to propagate out to the Internet, and sometimes it takes days. This is very slow going work, but very important. As of this writing I am still deploying DKIM to every domain. It will likely take until the end of the month to complete.
And then, there's DMARC. Oh yes...
Even after you do all this work, once you send an email out, you can never know what's going to happen to it. Will it ever reach the receiving mail server? What will the server do with the email? How does it regard the reputation of the sending server? It's impossible to know, because what they do with your email is an internal policy. The only thing a mail server admin can ever know, by reviewing the mail server logs, is whether the sent email arrived at the door of the receiving server. Once it enters the receiving mail server's "door", so to speak, and enters the "house", who knows? Maybe it was delivered. Maybe it went to the Junk or Spam folder. Or maybe it was just deleted. Sometimes they bounce the email, but most large providers simply delete the email if they think it's spam.
Only a couple of years ago, the major Internet companies proposed a standard that lets a domain owner publish a policy telling receiving servers what to do with email that fails DKIM and SPF checks, and ask those servers to send back reports on what they saw. It's called DMARC, for Domain-based Message Authentication, Reporting & Conformance, and HOLY COW, is this the Holy Grail of information about what happens to our email out there, and how other mail servers regard my mail server, based on DKIM and SPF. This is a very hard, rigorous test of a mail server's bona fides, and failure is easy. Once a domain's policy asks receivers to reject failing mail, failure means no email is delivered at most major email services, such as Gmail.
When an email is sent, our mail server now adds a DKIM signature to the headers of the email. The receiving server checks that signature against the public key published in our DNS, and checks the IP address of the sending server against the SPF record to determine whether that server is allowed to send email for the domain. DMARC adds one more requirement: the domain that passed DKIM or SPF has to match (or "align" with) the domain in the "From:" header. If the email passes, it goes on. If it fails, the receiver applies the policy in our DMARC record ("none", "quarantine" or "reject"), and either way it stores the result in a database to send out as a report to the address listed in the record.
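The pass/fail logic can be sketched in a few lines of Python. Here `spf_domain` stands for the envelope (bounce) domain that SPF checked and `dkim_domain` for the `d=` domain in the DKIM signature. The "organizational domain" is approximated as the last two labels, a loudly labeled shortcut: real DMARC uses the Public Suffix List. Function names are illustrative.

```python
def org_domain(domain: str) -> str:
    """Crude organizational domain: the last two labels of the name.

    Simplifying assumption: real DMARC uses the Public Suffix List, which
    handles names like example.co.uk correctly; this does not.
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(from_domain: str, auth_domain: str, strict: bool = False) -> bool:
    """Strict alignment needs an exact match; relaxed only the same org domain."""
    if strict:
        return from_domain.lower().rstrip(".") == auth_domain.lower().rstrip(".")
    return org_domain(from_domain) == org_domain(auth_domain)


def dmarc_result(from_domain: str, spf_pass: bool, spf_domain: str,
                 dkim_pass: bool, dkim_domain: str) -> str:
    """DMARC passes if SPF or DKIM passes AND is aligned with the From: domain."""
    if spf_pass and aligned(from_domain, spf_domain):
        return "pass"
    if dkim_pass and aligned(from_domain, dkim_domain):
        return "pass"
    return "fail"


def requested_action(result: str, policy: str) -> str:
    """What the domain's p= policy asks for; receivers make the final call."""
    if result == "pass" or policy == "none":
        return "deliver"
    return policy  # "quarantine" or "reject"
```

This is why DKIM matters so much for mail relayed through other servers: SPF can pass for the wrong domain, but an aligned DKIM signature still carries the message.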
BTW, the DMARC record looks like this:
_dmarc IN TXT ( "v=DMARC1; p=none; rua=mailto:firstname.lastname@example.org" )
So now my top ten sending domains have DKIM, SPF and DMARC records, and I get tons of email reports back as cryptic, nearly unreadable XML files. I really had no idea how to read the reports and figure out what receivers thought of my server. (BTW, I'm hiding my email address above to avoid getting spam there.)
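Under the hood, those XML reports are not that mysterious. Each `<record>` holds a `<row>` with the sending IP, a message count, and the DKIM and SPF results the receiver evaluated. Here is a minimal sketch that tallies them with Python's standard library; it assumes an un-namespaced report and handles gzip but not .zip attachments (Python's `zipfile` module would cover those).

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_report(data: bytes) -> Counter:
    """Count messages per (source_ip, dkim, spf) in a DMARC aggregate report."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number: decompress first
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    totals = Counter()
    for row in root.iter("row"):
        ip = row.findtext("source_ip")
        count = int(row.findtext("count", default="0"))
        evaluated = row.find("policy_evaluated")
        dkim = evaluated.findtext("dkim") if evaluated is not None else None
        spf = evaluated.findtext("spf") if evaluated is not None else None
        totals[(ip, dkim, spf)] += count
    return totals
```

An IP you don't recognize with failing DKIM and SPF is usually someone pretending to be you.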
DMARCian to the Rescue
This company, DMARCian.com, offers a service whereby they receive the zipped-up XML reports sent from every company that supports DMARC reporting, ingest them, store them in a database, and provide an incredible interface that pulls the data apart so you can drill down to see who is accepting your emails or not, whether they think you're a spammer, and whether anyone is sending spam pretending to be you.
You can also get a historical view of your email domains to see how much email is being sent out and how it all fared. It's almost overwhelming, and it is the absolute best (and only) way to get a global view of your email server's reputation.
But wait, we're not done....
Stepping away from DNS-based email forensics, there are other things to consider when sending email, such as what happens when a recipient at AOL or Yahoo decides your email is spam.
Even though people know they are signing up to get emails when they fill out your Contact Us form and check the "send me emails" checkboxes, they still tend to forget they did that, even the next day, and inexplicably click the giant red "Spam" button when they receive your email. Especially the users at AOL and Yahoo. They're terrible, really.
This can wreak havoc on a mail server's reputation, and AOL, Yahoo, etc. will block all email from that mail server for hours, perhaps even days, if enough people click the spam button, or if you continue to send email to someone who has already clicked the spam button on one of your previous emails.
So how does a valid, legal, compliant mail server admin stay on top of this kind of behavior? We sign up for something called a Feedback Loop, or Complaint Loop, where we enter a bunch of information about our servers and pass several checks to make sure we are who we say we are and that we do in fact control the email server we say we do.
Then we start getting email reports (yes, more reports) telling us when someone complains about an email they received that you or I sent. At this point I am required to stop what I am doing and take care of it, or, so they swear, they will stop accepting emails from my server.
Some of you, if you do a lot of email marketing from your websites through my mail server, may sometimes get an email from me indicating someone has complained about an email they received from you (through me). You should stop what you're doing and remove them from your email list(s). Right then and there so you don't forget. If the user continues to get emails from you after complaining about it, AOL or Yahoo or whoever they use will block my email server for days, even weeks, until the emails stop.
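Removing complainers is the one step you control, and it is easy to get slightly wrong (for example, "Ann@Example.com" and "ann@example.com" are the same person). Here is a tiny hypothetical sketch of that step, assuming you already have the complaining addresses in hand; some feedback loops redact them from the report, in which case you'd have to match on other details.

```python
def apply_complaints(mailing_list: list, complaint_addresses: list) -> list:
    """Return the mailing list minus anyone who filed a spam complaint.

    Addresses are compared case-insensitively after trimming whitespace,
    and the original order of the list is kept.
    """
    suppressed = {addr.strip().lower() for addr in complaint_addresses}
    return [addr for addr in mailing_list if addr.strip().lower() not in suppressed]
```

Keeping the `suppressed` set around permanently, and checking it whenever a list is re-imported, stops a complainer from sneaking back onto the list later.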
This is very serious business, and this is the one thing that you can do to help make sure every Montague WebWorks client has fully functional email when sending to their (your) customers. We're all in this together.
The Importance of Email Headers for Bulk Mail
In addition to staying in the loop about the emails you are sending when someone clicks on the Spam button, there are yet more things to consider about the emails, namely the actual headers attached to the email itself.
For instance, there are the basic From:, To:, Subject:, and Date: headers. But there are other important ones (including the DKIM-Signature header) such as Problems-to:, Return-Path: and, if you are sending bulk mail, Precedence: and List-Unsubscribe:, a standard that more and more large email hosting companies now support.
This is not something you can do. It's on the server end, specifically in the website code that generates the outgoing bulk emails. We have had many of the special headers in all outgoing emails for years, but standards are always changing, so we've reviewed our practices, removed some outdated headers, and added the new standards and policies. This should improve our standing over time, and also help recipients understand that they can simply unsubscribe from your email lists instead of calling it spam. Very helpful, indeed.
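For the technically curious, here is a sketch of building a bulk message with those headers using Python's standard `email` package. It is not our actual website code; the function name, addresses and URLs are placeholders. Return-Path is deliberately not set: the receiving server adds it at delivery time from the envelope sender.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_bulk_message(sender: str, recipient: str, subject: str, text: str,
                       unsubscribe_uris: list, msgid_domain: str) -> EmailMessage:
    """Build a plain-text bulk message with the usual list-mail headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid(domain=msgid_domain)
    # Marks this as bulk/list mail rather than personal mail.
    msg["Precedence"] = "bulk"
    # RFC 2369: one or more URIs, each in angle brackets, separated by commas.
    msg["List-Unsubscribe"] = ", ".join(f"<{uri}>" for uri in unsubscribe_uris)
    msg.set_content(text)
    return msg
```

The DKIM-Signature header is added afterwards by the mail server when it signs the outgoing message.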
And, Is There an HTML Tag in that HTML Email?
Believe it or not, this was a silly oversight. I'd been sending out emails in HTML format (meaning there are colors and images and other website-like features in the emails), but none of them had ever been wrapped in the <html> tag. It simply never occurred to me that it was required, because every email receiving system I had tested this early feature on rendered the emails perfectly.
Unbeknownst to me, some email receivers consider HTML emails that are not wrapped in that tag to be spam. Who knew? So, now that's fixed, too.
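The fix is the kind of thing a few lines of code can enforce on every outgoing message. Here is a hypothetical helper in Python that wraps an HTML fragment in a complete document unless it already has an `<html>` tag; the function name and the exact wrapper markup are my own illustration.

```python
import html
import re


def ensure_html_document(fragment: str, title: str = "") -> str:
    """Wrap an HTML fragment in a full <html> document unless it already has one."""
    if re.search(r"<html[\s>]", fragment, re.IGNORECASE):
        return fragment  # already a complete document; leave it alone
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{fragment}\n</body></html>\n"
    )
```

With Python's `email` package, the result would then be attached to a message via `msg.add_alternative(document, subtype="html")`.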
And In The End...
Is all this work worth it? Hell yes. We deeply believe that every business deserves a vanity email address, instead of being forced to use gmail or Hotmail, or whatever. It looks better on your business card, and it makes us proud to provide this service. We know it's very difficult, but we are willing to tackle the challenges. Otherwise we all end up with @gmail.com addresses, or paying them an extra $5/month for our vanity email address.
Thanks for taking the time to read this. Let me know what you think.
Posted: to Mik's Blog, sort of... on Sun, Mar 9, 2014
Updated: Sun, Mar 9, 2014