QuickFire DynamoDB

Simon Harrison
7 min read · Jan 30, 2021


Until recently, I would have struggled to name any database technology I was genuinely excited about. Yeah, those who know me well would guess Neo4j, but there was always an element of trauma throughout those years. Maybe it’s because I now have a flood of data that I’m genuinely serious about, but today I can answer that question: DynamoDB.

It was only when I set about writing this blog that I questioned the name. Dynamo, in fact, is the name Amazon gave to the set of principles and techniques (written up in their 2007 Dynamo paper) that underpin their incredible reliability at such massive scale. Put a database on top of those principles and you get DynamoDB.

Don’t panic, this is not going to be a gushing fanfare about yet another fancy NoSQL datastore. Neither am I going to teach you how to suck eggs. I’m just going to share why I find it easy, brilliant and fantastic. I’ve thrown in some fun facts and top tips, and then maybe, just maybe, one day you will find it to be fantastic too?

I do assume some prior AWS and DynamoDB knowledge with some of my flippant comments, but I hope the content here is enough to tease the uninitiated too.

I’m not a DevOps Engineer

This is largely true, although after a decade or so of doing something you can easily overlook just how good you’ve become. However, I’m done with building, managing, maintaining and scaling servers. DynamoDB is serverless.

Just stop and think for a minute about what that could mean for you as a database administrator. Yes, the clue is in the word serverless, but just let it sink in that you never have to worry about the size of your database tables again (within reason), nor the infrastructure they sit on.

I’ve run AWS RDS instances for years. Maybe it’s because I’m no expert and a bit lazy, but I have to look after these and worry over them. One thing you get for free when you create a DynamoDB Table is CloudWatch Alarms to indicate if you provisioned the Table with reads and writes too frugally and you’re operating on burst capacity. Large Tables (>10GB) are partitioned automagically without any downtime using your partition key and looked up through an AWS secret-sauce consistent hash ring algorithm; sleep well, and without fear of PagerDuty waking you up.
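
To make that concrete, here’s a minimal sketch of creating a provisioned Table with boto3. The Table name, Attribute names and capacity figures are my own placeholders, not anything prescribed by AWS:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical IoT readings Table: "device_id" is the partition key that
# DynamoDB hashes to spread Items across Partitions; "ts" is the sort key.
dynamodb.create_table(
    TableName="SensorReadings",
    AttributeDefinitions=[
        {"AttributeName": "device_id", "AttributeType": "S"},
        {"AttributeName": "ts", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "device_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "ts", "KeyType": "RANGE"},        # sort key
    ],
    # Start frugal; the CloudWatch metrics will tell you if you are
    # leaning on burst capacity and need more.
    ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
)
```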

I’m not a Mathematician

Yet even I can calculate how to provision my Tables. DynamoDB charges on a number of criteria but by far the most important one is the reads and writes per second — your Capacity Units. Your job is to tell Dynamo how many read and write requests you expect on each Table per second to ensure that you don’t get any nasty surprises come billing day.

One Read Capacity Unit (RCU) covers a read of up to 4KB per second, and one Write Capacity Unit (WCU) a write of up to 1KB per second. If you are writing 1KB Items once a second then you need a WCU of… 1, and so on. The default is 5, which costs around $3.50 a month. When you are just messing around, or in Dev, drop it down to 1: the worst that can happen is you get throttled, and the best is that you waste a few Units. Look over the Metrics tab in the console to see which way you are leaning. It’s a learning experience. And it’s fun.
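
The arithmetic really is that simple. Here’s a back-of-envelope helper of my own devising, assuming strongly consistent reads and the 4KB/1KB unit sizes above:

```python
import math

def capacity_units(item_size_kb: float, requests_per_second: int) -> tuple[int, int]:
    """Rough (RCU, WCU) estimate for a uniform workload.

    One RCU reads up to 4KB per second (strongly consistent);
    one WCU writes up to 1KB per second.
    """
    rcu = math.ceil(item_size_kb / 4) * requests_per_second
    wcu = math.ceil(item_size_kb / 1) * requests_per_second
    return rcu, wcu

# 2KB Items at 10 reads and 10 writes a second:
print(capacity_units(2, 10))  # -> (10, 20)
```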

Fun Facts

Let’s have some quick-fire fun-facts for those that are not familiar with the service.

  • DynamoDB is a Key-Value Store and a Document Database. It is serverless, Cloud, NoSQL, Fast, Flexible, Highly Scalable, Fault-Tolerant, Secure, etc. etc. la la la
  • DynamoDB is priced on the capacity you configure it with, the number of Tables you use and storage — configure capacity correctly and you can avoid the horrors that you may have read about previously.
  • DynamoDB is fully managed and scales on-demand to serve almost unlimited concurrent operations in single-digit milliseconds. Latency can be brought down further to microseconds with caching (DAX) — this is blazingly fast and mind-blowing considering the effort it takes to make Postgres behave like this at scale.
  • There are always multiple copies of your Tables spread across the Availability Zones in your Region, so don’t worry about read replicas anymore.
  • Tables are the highest level DB concept you have to grasp — i.e. forget about the underlying database, or whatever AWS has under the hood. Rows are called Items and Columns/Fields are called Attributes.
  • DynamoDB offers Strongly and Eventually Consistent reads, the latter of which cost half the RCUs (consistency is a read concern, so WCUs are unaffected). There’s a quick sketch of both after this list.
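
A minimal sketch of flipping between the two read models with boto3, reusing my made-up SensorReadings Table from earlier:

```python
import boto3

table = boto3.resource("dynamodb").Table("SensorReadings")

# Eventually consistent read (the default): half the RCU cost, but a
# write from less than a second ago may not be visible yet.
cheap = table.get_item(Key={"device_id": "sensor-42", "ts": 1612000000})

# Strongly consistent read: always reflects acknowledged writes,
# at the full RCU price.
fresh = table.get_item(
    Key={"device_id": "sensor-42", "ts": 1612000000},
    ConsistentRead=True,
)
```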

I’m not a tango dancer

But I guarantee that if you’ve got some years under your belt messing around with other database technologies, then you’ll be strutting around the Dynamo dancefloor within a couple of hours of starting with the AWS DynamoDB console. You can also, obviously, use the CLI, any of the SDKs and CloudFormation. Keep reading, and as long as you remember the 2 key takeaways at the end of this story, then you are basically production-ready.

Top Tips

In no particular order.

  • Your Table’s partition key is probably the most important decision you need to make. Then the (optional) sort key.
  • Beware of time series data and popular datasets when designing your Tables — both can easily lead to hotkeys and hot Partitions!
  • Beware of temporarily scaling up because Partitions are not scaled down afterwards
  • The max Item size is 400KB, which may come as a shock. I use Dynamo for IoT data, which is typically very small, but you may be challenged by this limit. Keep your Attribute names short and snappy (but readable, of course) because even the Attribute names themselves count towards the 400KB max size of an Item. Attribute values can be compressed with gzip, or consider splitting them across Items (and using the BatchGetItem API to retrieve them), or even linking to them on S3.
  • Avoid Table Scans: they do not use an Index and they read every Partition — very expensive! There’s a Query-versus-Scan sketch after this list.
  • Filters are not applied until after the query, so they are no good at solving query optimisation problems: don’t be distracted by them.
  • Use eventual consistency unless reading data that is less than a second old would create a race condition for you (plus your network time, of course), because replication across Availability Zones takes under a second.
  • Local Secondary Indexes share the Partition’s WCU and RCU capacity
  • Reduce provisioned throughput to reduce bills! Designing for uniform workloads, using compression, using fewer Attributes, using eventual consistency, archiving old data and using DAX will all help too. Also notice that DynamoDB pricing varies from Region to Region…
  • View the metrics on the console and see if you are using your RCU or WCU capacity, or maybe wasting any? You get 9 auto-scales of Capacity Units a day for free, so set this up!
  • If you really need sub-millisecond responses, use DAX.
  • There is no datetime type so use Unix timestamps as Numbers
  • To replicate Tables around the globe… simply use the Global Tables option from the Console and choose the Regions where your global users are — it really is that simple.
  • Remember, Dynamo is schemaless. Along with eventual consistency, this is one of the reasons why it’s so damn fast. But because it’s schemaless, there are no Foreign Keys, 1–1s, 1-manys etc. etc., although these can actually be modelled with tactical Secondary Index techniques.
  • Another shock you may have is discovering that you cannot truncate Tables. To empty one, you have to scan all Partitions (expensive) and delete each Item one DeleteItem request at a time. Bear in mind that you pay for each delete, and that you’ll have to write a script to do it (the Dynamo console does not provide this feature). Oh, and it gets worse… this will eat up your write capacity and you’ll probably be throttled before you finish! It’s often best to just drop the Table and recreate it!
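
Here’s the promised Query-versus-Scan sketch, again using my hypothetical SensorReadings Table (the "temp" Attribute is invented too). A Query goes straight to one Partition via the partition key; a Scan reads every Item in every Partition, and a FilterExpression only discards Items after you’ve already paid to read them:

```python
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("SensorReadings")

# Good: a Query uses the partition key (and optionally the sort key),
# so DynamoDB jumps straight to a single Partition.
recent = table.query(
    KeyConditionExpression=Key("device_id").eq("sensor-42") & Key("ts").gt(1612000000)
)

# Bad: a Scan reads every Item in every Partition, and the filter is
# applied AFTER the read, so you are billed for the whole lot.
hot = table.scan(FilterExpression=Attr("temp").gt(30))
```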

My Favourite Bits of DynamoDB

  • Serverless
  • DynamoDB Streams and Lambda Triggers. AWS even provide you with Lambda Blueprints that demonstrate handling Table Item changes. DynamoDB Streams might just be my favourite feature and I rely on them heavily (there’s a tiny handler sketch after this list).
  • Assigning Items a TTL and using Triggers to initiate side effects, such as archiving the data.
  • No need to build a RESTful API on top… it comes with a JSON API. Add a serverless AWS API Gateway on top and your mobile and web apps are good to go.
  • Point In Time Recovery — restore a Table to any second in the last 35 days… insane.
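
A minimal sketch of the Streams-plus-Lambda pattern, with names I’ve invented; the event shape is the standard DynamoDB Streams record format that the Blueprints also handle:

```python
# Hypothetical Lambda handler wired to a DynamoDB Stream.
def handler(event, context):
    for record in event["Records"]:
        # eventName is INSERT, MODIFY or REMOVE. A TTL expiry arrives as a
        # REMOVE, which is how you trigger side effects such as archiving.
        if record["eventName"] == "REMOVE":
            # OldImage is only present if the Stream is configured to include it.
            old_item = record["dynamodb"].get("OldImage", {})
            archive(old_item)

def archive(item):
    """My imaginary side effect: push the expired Item somewhere cheap."""
    print(f"archiving {item}")
```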

Key Takeaways

  • The most important concept to understand before you go to production is the partition behaviour of Tables, because you want uniform data access across your Partitions. Data modelling is as important as ever, and your choice of partition key is what drives that partitioning behaviour.
  • If you can then estimate your WCU and RCU and activate auto-scaling (sketched below), you have pretty much won the battle.
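
Auto-scaling is configured through the Application Auto Scaling service rather than on the Table itself; here’s a minimal sketch for read capacity, with my usual placeholder Table name and limits:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Allow Application Auto Scaling to move this Table's RCUs between 1 and 50.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/SensorReadings",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=1,
    MaxCapacity=50,
)

# Track roughly 70% utilisation of provisioned reads; repeat both calls
# with WriteCapacityUnits for the WCU side.
autoscaling.put_scaling_policy(
    PolicyName="reads-target-tracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/SensorReadings",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```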

I’m not a DynamoDB expert…

Or maybe I am?!

Either way, I hope this has either helped or entertained you. Enjoy your AWS journeys.
