I had a client with an interesting problem this week. Their T1 connection had gone down for a few hours, and suddenly their emails were either bouncing or being delivered to their webmail instead of passing through their onsite router, and life was just generally difficult for them. They had also stopped receiving email from their web-based forms.
It took a bit of research (and calling in favors from some of my most talented fellow techies) to resolve the issue. These things are never as simple as they should be, are they?
In case it’s of any assistance to anyone else, I wanted to deconstruct the problem here… and share the solution.
First, the web form issue was the result of a recent “cleaning out” of stale email addresses. In the process of clearing out the “bogus” addresses, firstname.lastname@example.org had been deleted. Re-adding this address to the router configuration solved the “no mail from our website” issue.
The mail sitting in webmail on the server and the bounce messages were a bit hairier…
When MX records are set up for a domain, there is often a primary and a secondary (much like primary and secondary DNS servers), so there is a “backup” should the primary go down. Each MX record carries a preference number, and sending servers try the lowest number first. In this particular situation, the primary going down is exactly what had happened. When the T1 connection went down, the MX record pointing at their onsite router no longer worked, so the secondary MX record was used. The secondary MX record routed email to the server’s webmail for safekeeping.
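To make the fallback concrete, here is a minimal Python sketch of how a sending server might choose between a primary and a secondary MX record. The hostnames, the preference values, and the is_reachable check are all made up for illustration. Real mail servers do more than this, such as shuffling records that share the same preference.

```python
from typing import Callable, Iterable, Optional, Tuple

# An MX record reduced to (preference, hostname); lower preference is tried first.
MxRecord = Tuple[int, str]


def pick_mail_host(
    records: Iterable[MxRecord],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the reachable host with the lowest preference, or None if all are down."""
    for _preference, host in sorted(records):
        if is_reachable(host):
            return host
    return None


# Hypothetical records: the onsite router is primary, webmail is the backup.
RECORDS = [(20, "webmail.example.org"), (10, "router.example.org")]
```

With the T1 up, pick_mail_host(RECORDS, lambda host: True) picks the router. With the router unreachable, the same call falls through to the webmail host.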
All that is well and fine…
But once the T1 was back up and working, some emails (but not all emails) were still going to the webmail bin. There seemed to be no rhyme or reason as to why these particular emails landed in webmail. But a techie friend of mine suggested that the senders’ own email programs or mail servers might not be refreshing the MX information in a timely fashion.
That means that if I sent an email during the T1 downtime, my email was rerouted automatically to the webmail bin… and it would continue to go there until my own email program refreshed the MX record by refetching that information.
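Here is a toy Python model of that theory. It is not how any particular mail server works. It only shows that a sender which remembers a routing decision for a fixed time keeps using it until the cached entry expires. The CachedRoute class, the one-hour lifetime, and the hostnames are my own assumptions.

```python
from typing import Callable, Dict, Tuple


class CachedRoute:
    """Remember where mail for a domain was last routed, for ttl_seconds."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float],
    ) -> None:
        self._lookup = lookup      # hypothetical function: domain -> mail host
        self._ttl = ttl_seconds    # how long a cached answer is trusted
        self._clock = clock        # injected so the example needs no real waiting
        self._cache: Dict[str, Tuple[str, float]] = {}

    def route(self, domain: str) -> str:
        now = self._clock()
        cached = self._cache.get(domain)
        if cached is not None and now < cached[1]:
            return cached[0]       # still fresh: reuse the old answer
        host = self._lookup(domain)
        self._cache[domain] = (host, now + self._ttl)
        return host
```

If the lookup said “webmail” during the outage, this sender keeps delivering to webmail until the hour is up, even though the router is reachable again.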
The solution was easy: we removed the secondary MX record to force delivery through the router only. This will work well as long as the connection doesn’t go down for more than a few days. Sending servers will keep retrying for several days before giving up with a permanent failure (and even then the sender will get an error message back and will KNOW that the email was never received).
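As a rough illustration of that retry window, the Python sketch below walks through a sender’s retries until either the connection is back or the sender gives up. The 30-minute retry interval and the 5-day give-up time are assumptions picked to match commonly recommended SMTP defaults. Actual values vary by mail server, and many servers space their retries out over time.

```python
from datetime import timedelta
from typing import Optional


def first_successful_retry(
    outage: timedelta,
    retry_every: timedelta = timedelta(minutes=30),  # assumed retry interval
    give_up_after: timedelta = timedelta(days=5),    # assumed give-up time
) -> Optional[timedelta]:
    """Return how long after the first attempt a retry gets through.

    Returns None if the sender gives up (and bounces the message) first.
    """
    elapsed = timedelta(0)
    while elapsed <= give_up_after:
        if elapsed >= outage:      # connection is back by this attempt
            return elapsed
        elapsed += retry_every
    return None
```

With these assumed settings, mail sent at the start of a three-hour outage arrives late but arrives, while an outage of six days ends in a bounce back to the sender.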
Finding the problem was, as is often the case in the tech world, the difficult part.
If you want to learn more about how MX records work to solve your own email delivery nightmares (or those of your clients), visit Wikipedia’s article on MX records for a great overview and links to in-depth articles on the topic.