A Distributed Architecture Proposal for Dispatching and Tracking Emails using Message Queues

Brian Di Croce
6 min read · Jan 10, 2022

The Idea

You have an application that needs not only to dispatch emails to users, but also to keep track of any activity status pertaining to the dispatched emails.

If this idea tickles your Peter Tingle, then you might be interested in reading the following lines.

The Context

While designing Cloudgenda, a digital platform for organizations to easily manage their daily business operations, a functional need emerged after talking with customers: sending emails to users from within the application.

But that wasn’t the whole story. The application should also persist email activity statuses from whichever third-party email provider (e.g., SendGrid, Postmark, Amazon Simple Email Service) supports tracking activity statuses.

An activity status pertaining to an email helps us to answer the following questions:

  • Was the email dispatched?
  • Was the email opened? If so, when?
  • Were any links inside the email clicked?
  • Was the email bounced?

For example, an administrator who logs in to their Cloudgenda account can easily view the activities pertaining to an email sent to a user or a group of users.

When you think about it, sending an email is pretty much like posting a letter to the mailbox. It’s a fire-and-forget, asynchronous process. You send it and assume that all will go well after that point. But what if we could receive some kind of response as to where the email is in its digital adventure? That would help us to answer quite a few important questions: Did my user receive the email? How did the user interact with my email?

The Proposed Solution

Solving this problem isn’t too complex. You mix a few message queues with a spoonful of background services and a pinch of an API (more specifically a webhook), and you’re pretty much done.

The proposed solution is shown in the following diagram:

Architecture diagram to dispatch and track emails in a distributed way.

The object in green is the database, and it can be any kind of persistence technology (a relational database such as PostgreSQL or a document-oriented database like MongoDB would be ideal for this scenario).

The object in purple represents a third-party email service provider. One of the requirements for the chosen provider is that it should support tracking mail activities/statuses by calling a webhook URL that we provide. This is pretty much the Hollywood principle in action: Don’t call us, we’ll call you.

The objects in orange represent message queues. In my case, I’m using RabbitMQ both locally for development purposes and in the cloud for the production environment. Ideally, each queue delivers messages in FIFO order. You can also choose another technology for these: Azure Queue Storage, AWS SQS, or even Kafka if you have the time and a need for it. Regardless, there are three queues in action here:

  1. A Mailbox queue which acts as a drop box for mail messages. Think of it as a real mailbox on your street. It’s pretty much the same concept. You drop a mail message on it, and you continue living your beautiful life.
  2. A Dispatching queue which is used to store mail messages that will eventually be dispatched by another component (an Email Dispatcher).
  3. A Tracking queue which is used to store messages related to mail activities. A mail activity is mostly a status change associated with a mail message that is brought to our attention by the third-party email provider.
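
To make the two kinds of payloads a bit more tangible, here is a minimal sketch of what could travel through these queues. The class names, field names, and the choice of JSON are my own assumptions for illustration; none of them come from Cloudgenda or from any particular broker.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class MailMessage:
    """Hypothetical payload dropped on the Mailbox and Dispatching queues."""
    to: str
    subject: str
    body: str
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class MailActivity:
    """Hypothetical payload placed on the Tracking queue."""
    message_id: str
    status: str  # e.g. "Dispatched", "Opened", "Clicked", "Bounced"
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_wire(payload) -> bytes:
    """Serialize a payload to JSON bytes so that any broker can carry it."""
    return json.dumps(asdict(payload)).encode("utf-8")


def from_wire(cls, raw: bytes):
    """Rebuild a payload of the given class from JSON bytes."""
    return cls(**json.loads(raw.decode("utf-8")))


message = MailMessage(to="member@example.com", subject="Welcome", body="Hello!")
activity = MailActivity(message_id=message.message_id, status="Dispatched")
print(to_wire(activity).decode("utf-8"))
```

Carrying the same message_id on both payloads is what would let a tracking notification be matched back to the email it belongs to.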

The objects in blue are yours to manage, and I’ll explain each one starting from the top:

  • You have your web application or API that needs to display or return a history of activities pertaining to a user’s email. For example, Cloudgenda has a dashboard component that lists an organization’s members. When an administrator clicks on a member to view their detailed profile, they have access to all emails sent to that member, as well as a history of activities/statuses pertaining to each email.
  • The next component is the Email Operator, which is basically a service worker (or a background service) responsible for: 1) picking up mail messages from the Mailbox queue, 2) recording an entry in the database with a first activity whose status is set to Accepted, 3) sending the mail message to the Dispatching queue, 4) recording an entry in the database with a status set to Queued, and 5) recording an entry in the database for every tracking notification received in the Tracking queue. Of all the components in the architecture (besides the custom app or API), the Email Operator is the only one with access to the database. In the diagram, I show that component overlapping with others to indicate that the Email Operator can scale horizontally when the mailbox fills faster than usual and/or when the volume of tracking activities increases unexpectedly. If you use a relational database for the data store, wrap the first two writes in a single ACID transaction: one entry in the MailMessageHistory table and another in the MailMessageActivities table with the record’s status set to Accepted.
  • The following component is the Email Dispatcher, which is responsible for actually dispatching the email using a third-party email service provider. A default service provider can be dynamically configured for this service. Once an email is dispatched, the Email Dispatcher sends a message to the Tracking queue with an activity status set to Dispatched. And just like the Email Operator, this component can also scale horizontally when needed.
  • Our last component acts as a webhook sink that a third-party email provider calls whenever an activity occurs with an email. In Cloudgenda’s case, the webhook is provided as a façade from within our API (e.g., https://api.cloudgenda.com/webhooks/mail-tracking). After being poked by the email service provider, it sends a mail activity status message to the Tracking queue. As an example, this page from SendGrid lists all the supported activity statuses pertaining to a mail message. Also, keep in mind that webhooks must be secured because they’re publicly accessible. Your third-party email provider’s documentation will recommend a few strategies for doing this well. If you prefer a cloud-based alternative, a service like Amazon EventBridge or Azure Event Grid could substitute for a custom webhook implementation.
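
The components above can be sketched end to end in a few lines. This is only an in-memory walk-through under my own assumptions: plain deques stand in for the three queues, SQLite stands in for the database, and every function name is hypothetical. The two table names are the ones mentioned above, but their columns are invented for the example.

```python
import sqlite3
from collections import deque

# In-memory stand-ins for the three queues (a real setup would use a broker).
mailbox, dispatching, tracking = deque(), deque(), deque()

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE MailMessageHistory (id TEXT PRIMARY KEY, recipient TEXT, subject TEXT);
    CREATE TABLE MailMessageActivities (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        message_id TEXT REFERENCES MailMessageHistory(id),
        status TEXT
    );
""")


def record_activity(message_id, status):
    db.execute(
        "INSERT INTO MailMessageActivities (message_id, status) VALUES (?, ?)",
        (message_id, status),
    )


def operator_handle_mailbox():
    """Email Operator, steps 1 to 4: accept the message, then queue it for dispatch."""
    msg = mailbox.popleft()
    with db:  # one transaction: history row plus the first 'Accepted' activity
        db.execute(
            "INSERT INTO MailMessageHistory VALUES (?, ?, ?)",
            (msg["id"], msg["to"], msg["subject"]),
        )
        record_activity(msg["id"], "Accepted")
    dispatching.append(msg)
    with db:
        record_activity(msg["id"], "Queued")


def dispatcher_handle(send):
    """Email Dispatcher: hand the message to the provider, then report 'Dispatched'."""
    msg = dispatching.popleft()
    send(msg)  # 'send' stands in for the third-party provider's client
    tracking.append({"id": msg["id"], "status": "Dispatched"})


def webhook_sink(event):
    """Webhook: forward a provider event to the Tracking queue, never to the database."""
    tracking.append({"id": event["id"], "status": event["status"]})


def operator_handle_tracking():
    """Email Operator, step 5: persist every tracking notification."""
    while tracking:
        note = tracking.popleft()
        with db:
            record_activity(note["id"], note["status"])


def history(message_id):
    rows = db.execute(
        "SELECT status FROM MailMessageActivities WHERE message_id = ? ORDER BY seq",
        (message_id,),
    )
    return [status for (status,) in rows]


sent = []
mailbox.append({"id": "m-1", "to": "member@example.com", "subject": "Welcome"})
operator_handle_mailbox()
dispatcher_handle(sent.append)
webhook_sink({"id": "m-1", "status": "Opened"})
operator_handle_tracking()
print(history("m-1"))  # ['Accepted', 'Queued', 'Dispatched', 'Opened']
```

Notice that only the operator functions touch the database, while the dispatcher and the webhook talk to queues alone. That separation is what keeps those two components free of any database dependency.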

Sealing the Envelope

That’s pretty much it.

The distributed aspect of the solution makes the email workflow highly available and scalable thanks to its stateless nature. You can opt to provision the Email Operator and Email Dispatcher components in the same virtual network/VPC as your message queues, so as to keep both costs and latency at a minimum.

I highly recommend that you also put in place some kind of telemetry (e.g., tracing, logging, and/or monitoring) in all this so that you can track how many emails were dispatched successfully, how many were not, etc. You can also implement retries in case one of the components isn’t responding for whatever reason. In my case, I’m using the excellent EasyNetQ library which provides lots of goodness out of the box for interacting with RabbitMQ. I tried using MassTransit, but found it a bit too complex to set up. Being a one-man development team for the time being, I need to work with a tool that helps me to be fast and simplify things as much as possible. EasyNetQ does this, and does it well.
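
To illustrate the retry and counting ideas without tying them to any library, here is a small sketch using exponential backoff and in-process counters. The function name, the counter names, and the default delays are assumptions of mine; this is not how EasyNetQ or MassTransit handle retries.

```python
import time
from collections import Counter

metrics = Counter()  # hypothetical in-process counters; a real setup would export these


def dispatch_with_retries(send, message, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(message), retrying with exponential backoff on failure.

    Returns True if the message was dispatched, False once attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(message)
        except Exception:
            metrics["dispatch_failures"] += 1
            if attempt == attempts:
                metrics["dispatch_given_up"] += 1
                return False
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, then 1s, then 2s, ...
        else:
            metrics["dispatched"] += 1
            return True
    return False


# Usage: a provider stub that fails twice before succeeding.
calls = {"n": 0}


def flaky_send(message):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("provider unavailable")


ok = dispatch_with_retries(flaky_send, {"id": "m-1"}, sleep=lambda seconds: None)
print(ok, dict(metrics))
```

Passing sleep in as a parameter is just a convenience so the backoff can be tested without actually waiting.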

During development, you might want to test out your webhook(s). For this, you can use Ngrok, but lately I found an excellent alternative in Hookdeck. The latter provides a way to keep a static endpoint to test our webhook. I love that a lot. Plus, the people behind Hookdeck work in the same beautiful city that I live in, so that’s pretty cool.
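
While you’re testing the webhook, it’s worth exercising the check that rejects unsigned or tampered requests too. The sketch below assumes a generic shared-secret HMAC-SHA256 scheme, which is my own assumption for illustration. Each provider defines its own verification method, so follow its documentation rather than this example.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Return True only if the received signature matches the body.

    compare_digest runs in constant time, which avoids leaking how many
    leading characters of a forged signature happened to be correct.
    """
    expected = sign(secret, body)
    return hmac.compare_digest(
        expected.encode("utf-8"), received_signature.encode("utf-8")
    )


# Usage with a made-up secret and event body.
secret = b"not-a-real-secret"
body = b'{"id": "m-1", "status": "Opened"}'
good_signature = sign(secret, body)

print(is_authentic(secret, body, good_signature))         # True
print(is_authentic(secret, body + b" ", good_signature))  # False
```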

You can also implement some kind of scheduler on either the Email Operator or Email Dispatcher component if you have a need to schedule emails à la Gmail. EasyNetQ and MassTransit provide this service out of the box too by making use of delayed messages.
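
If you’re curious about what a delayed message boils down to, here is a toy in-memory version built on a heap. It only illustrates the idea; the class and method names are hypothetical, and messaging libraries typically delegate this work to the broker rather than holding messages in process memory.

```python
import heapq
import itertools


class DelayedQueue:
    """Holds messages until their delivery time has passed."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order for equal times

    def schedule(self, deliver_at: float, message) -> None:
        heapq.heappush(self._heap, (deliver_at, next(self._counter), message))

    def pop_due(self, now: float) -> list:
        """Remove and return every message whose delivery time is <= now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


# Usage with plain numbers as timestamps to keep the example deterministic.
queue = DelayedQueue()
queue.schedule(300, {"id": "later"})
queue.schedule(100, {"id": "sooner"})

print(queue.pop_due(now=150))  # [{'id': 'sooner'}]
print(queue.pop_due(now=400))  # [{'id': 'later'}]
```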

Furthermore, you can implement some kind of logic to make sure that the number of emails sent by a customer in a given period doesn’t go over what’s permitted for its account level. Personally, I would do that at the custom app or API level, that way we keep the email workflow simple: whatever goes to the mailbox gets dispatched eventually. No questions asked. But as every architecture is opinionated, there are obviously no right or wrong answers…just tradeoffs.
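
As a rough idea of what such a check could look like at the app or API level, here is a fixed-window counter. The account levels, their limits, and the class name are all made up for the example, and a real implementation would keep the counts in a shared store rather than in process memory.

```python
from collections import defaultdict

# Hypothetical account levels and their per-window email limits.
LIMITS = {"free": 2, "pro": 5}


class QuotaGuard:
    """Fixed-window counter: allows up to LIMITS[level] emails per window."""

    def __init__(self, window_seconds: int = 86400):
        self.window_seconds = window_seconds
        self._counts = defaultdict(int)  # (customer_id, window index) -> emails sent

    def try_consume(self, customer_id: str, level: str, now: float) -> bool:
        """Count one email against the quota, or return False if it is used up."""
        window = int(now // self.window_seconds)
        key = (customer_id, window)
        if self._counts[key] >= LIMITS[level]:
            return False
        self._counts[key] += 1
        return True


# Usage: the third email in the same day is refused, the next day starts fresh.
guard = QuotaGuard()
print([guard.try_consume("acme", "free", now=1000) for _ in range(3)])
print(guard.try_consume("acme", "free", now=1000 + 86400))
```

Only the emails that pass this check would be dropped in the mailbox, so the rest of the workflow stays unaware of quotas.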

Lastly, if you’re going full speed ahead on the cloud, you can replace the Email Operator and Email Dispatcher with Lambda functions too. Sure, this might bind you to a specific cloud provider, but the tradeoff is that you’ll have less code to maintain and might save some costs too.
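
For instance, an Email Dispatcher written as a Lambda function fed by an SQS queue could look roughly like the sketch below. The send_via_provider helper is a hypothetical stand-in for the provider’s client, and the returned batchItemFailures list only has an effect if partial batch responses are enabled on the event source mapping.

```python
import json


def send_via_provider(message: dict) -> None:
    """Hypothetical stand-in for the third-party provider's client call."""
    if "to" not in message:
        raise ValueError("missing recipient")


def handler(event, context):
    """Dispatch each queued mail message and report the ones that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            send_via_provider(json.loads(record["body"]))
        except Exception:
            # Reported records are retried; the others leave the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


# Usage with a hand-built event shaped like an SQS batch.
sample_event = {
    "Records": [
        {"messageId": "1", "body": json.dumps({"to": "member@example.com"})},
        {"messageId": "2", "body": json.dumps({"subject": "no recipient"})},
    ]
}
print(handler(sample_event, None))
```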

Honestly, there are so many ways to spin the solution around for your present and future needs once it’s built on a sound architecture.

Just like a house.

With a mailbox on the other side of the street.

Brian Di Croce

I’m a software engineer based in Montreal, Canada, and the founder of Cloudgenda. I tweet at https://twitter.com/bdicroce. 🍁