Amazon Simple Queue Service (SQS) – 15 Years and Still Queueing! • Lucian Systems

Time sure does fly! I wrote about the production launch of Amazon Simple Queue Service (SQS) back in 2006, with no inkling that I would still be blogging fifteen years later, or that this service would still be growing rapidly while remaining fundamental to the architecture of so many different types of web-scale applications.

The first beta of SQS was announced with little fanfare in late 2004. Even though we have added many features since that beta, the original description (“a reliable, highly scalable hosted queue for buffering messages between distributed application components”) still applies. Our customers think of SQS as an infinite buffer in the cloud and use SQS queues to implement the connections between the functional parts of their application architecture.

Over the years, we have worked hard to keep the SQS interface simple and easy to use, even though there’s a lot of complexity inside. In order to make SQS scalable, durable, and reliable, messages are stored in a fleet that consists of thousands of servers in each AWS Region. Within a region, we save three copies of each message, taking care to distribute the messages across storage nodes and Availability Zones. In addition to this built-in redundant storage, SQS is self-healing and resilient to host failures & network interruptions. Even though SQS is now 15 years old, we continue to improve our scaling and load management applications so that you can always count on SQS to handle your mission-critical workloads.

SQS runs at an incredible scale. For example:

Amazon Product Fulfillment – During Prime Day 2021, traffic to SQS set a new record, peaking at 47.7 million messages per second.

Rapid7 – Amazon customer Rapid7 makes extensive use of SQS. According to Jeff Myers (Platform Software Architect):

Amazon SQS provides us with a simple to use, highly performant, and highly scalable messaging service without any management headaches. It is a crucial component of our architecture that has allowed us to scale to handle 10s of billions of messages per day.

You can visit the Amazon SQS home page to read about other high-scale use cases from NASA, BMW Group, Capital One, and other AWS customers.

Serverless Office Hours
Be sure to join us today (July 13th) for Serverless Office Hours at Noon PT on Twitch.tv. Rumor has it that there will be cake!

Back in Time
Here’s a quick recap of some of the major SQS milestones:

2006 – Production launch. An unlimited number of queues per account and items per queue, with each item up to 8 KB in length. Pay as you go pricing based on messages sent and data transferred.

2007 – Additional functions including control of the visibility timeout and access to the approximate number of queued messages.

2009 – SQS in Europe, and additional control over queue access via AddPermission and RemovePermission. This launch also marked the debut of what we then called the Access Policy Language, which has evolved into today’s IAM Policies.

2010 – A new free tier (100K requests/month then, since expanded to 1 million requests/month), configurable maximum message length (up to 64 KB), and configurable message retention time.

2011 – Additional CloudWatch Metrics for each SQS queue. Later that year we added batch operations (SendMessageBatch and DeleteMessageBatch), delay queues, and message timers.

2012 – Support for long polling, along with SDK support for request batching and client-side buffering.

2013 – Support for even larger payloads (256 KB) and a 50% price reduction.

2014 – Support for dead letter queues, to accept messages that have become stuck due to a timeout or to a processing error within your code.

2015 – Support for extended payloads (up to 2 GB) using the Extended Client Library.

2016 – Support for FIFO queues with exactly-once processing and deduplication and another price reduction.

2017 – Support for server-side encryption of messages and cost allocation tags.

2018 – Support for Amazon VPC Endpoints using AWS PrivateLink and the ability to invoke Lambda functions.

2019 – Support for Tag-on-Create and X-Ray tracing.

2020 – Support for 1-minute metrics for more fine-grained queue monitoring, a new console experience, and result pagination for the ListQueues and ListDeadLetterSourceQueues functions.

2021 – Tiered pricing so that you save money as your usage grows, and a High Throughput mode for FIFO queues.

Today, SQS remains simple, scalable, cost-effective, and highly reliable.

From the AWS Heroes
We asked some of the AWS Heroes to reflect on the success of SQS and to share some of their success stories. Here’s what we learned:

Eric Hammond (Serverless Hero) uses AWS Lambda Dead Letter Queues. They put alarms on the queues and send internal emails as alerts when problems need to be investigated.

Tom McLaughlin (Serverless Hero) has been using SQS since 2015. He said “My favorite use case is anytime someone wants a queue and I don’t want to manage a queuing platform. Which is always.”

Ken Collins (Serverless Hero) was not entirely sure how long he had been using SQS, and offered a range of 2 to 8 years! He uses it to power the Lambdakiq gem, which implements ActiveJob on SQS & Lambda.

Kyuhyun Byun (Serverless Hero) has been using SQS to power a push system that must sustain massive amounts of traffic, and tells us “Thanks to SQS, I don’t consider building a queuing system anymore.”

Prashanth HN (Serverless Hero) has been using SQS since 2017, and considers himself “late to the party.” He used SQS as part of his first serverless application, and finds that it is ideal for connecting services with differing throughput.

Ben Kehoe (Serverless Hero) told us that “I first saw the power of SQS when a colleague showed me how to retain state in a fleet of EC2 Spot instances by storing the state in SQS when an instance was getting shut down, and having new instances check the queue for state on startup.”

Jeremy Daly (Serverless Hero) started using SQS in 2010 as a lightweight queue that fed a facial recognition process on user-supplied images. Today, he often uses it as a buffer to throttle requests to downstream services that are not yet serverless.

Casey Lee (Container Hero) also started to use SQS in 2010 as a replacement for ActiveMQ between loosely coupled Java services. Casey implements auto scaling based on queue depth, and has found it to be an accurate way to handle the load.

Vlad Ionecsu (Container Hero) began his AWS journey with SQS back in 2014. Vlad found that the API was very easy to understand, and used SQS to power his thesis project.

Sheen Brisals (Serverless Hero) started to use SQS in 2018 while building a proof-of-concept that also used Lambda and S3. Sheen appreciates the ability to adjust the characteristics of each queue to create a good match for the message processing functions, and also makes use of both high and low priority queues.

Gojko Adzic (Serverless Hero) began to use SQS in 2013 as a task dispatch for exporters in MindMup. This online mind-mapping application allows large groups of users to collaborate, and requires strict ordering of updates to each document. Gojko used FIFO queues to allow them to process messages for different documents in parallel, with sequential order within each document.

Sebastian Müller (Serverless Hero) started to use SQS in 2016 by building a notification center for website-builder jimdo.com . The center ensures that customers are kept aware of events (orders, support messages, and contact requests) on a timely basis.

Luca Bianchi (Serverless Hero) first used SQS in 2012. He decoupled a pair of microservices running on AWS Elastic Beanstalk, and also created a fan-out processing system for a gamification platform. Today, his favorite SQS use case stores inference jobs and makes them available to a worker process running on Amazon SageMaker.

Peter Hanssens (Serverless Hero) uses SQS to offload tasks that do not need to be processed immediately. Several years ago, while assisting some data scientists, he created an event-driven batch-processing system that used a Lambda function to check a queue every 5 minutes and fire up EC2 instances to build models, while keeping strict control over the maximum number of running instances.

Serkan Ozal (Serverless Hero) has been using SQS since 2013 or so. He focuses on asynchronous message processing and counts on the ability of SQS to handle a huge number of events. He also makes uses of the message visibility feature so that he can re-process messages if necessary.

Matthieu Napoli (Serverless Hero) has been using SQS for about five years, starting out with EC2, and as a replacement for other queues. As he says, “Paired with Lambda, it gives massive parallelism out of the box without having to think too much about it. Plus the built-in failure handling makes it very robust.”

As you can see, there are a multitude of use cases for SQS.

SQS Resources
If you are not already using SQS, then it is time to get in the queue and get started. Here are some resources to help you find your way:

Happy queueing!

— Jeff;