A complete guide to backing up and restoring big DynamoDB tables

To be brief: at the company I work for, we depend on real-time data with no tolerance for high latencies. The lower the latency, the better the results we can provide to our customers/clients. After Amazon launched the Frankfurt region (eu-central-1), we decided to move from Ireland (eu-west-1) to Frankfurt; when you ping both regions from Istanbul, Frankfurt's latency is much lower.

I was looking for a practical and easy solution to our problem, but after trying many options (like Data Pipeline) and getting tired of their complexity, I decided to lean on an open-source project to do the job.

After a bit of research, I found this awesome little batch tool built on boto:

I gave it a shot and it worked nearly perfectly. You can run this tool from your local machine or from an EC2 instance, depending on your connectivity and data size, of course. I had issues with tables containing a large number of items, but there is a workaround for that: you need to fine-tune the batch script to fit your dataset.

The default configuration is as follows:

AWS_SLEEP_INTERVAL = 10 # seconds
MAX_BATCH_WRITE = 25 # DynamoDB limit
SCHEMA_FILE = "schema.json"
DATA_DIR = "data"
LOCAL_REGION = "local"
DATA_DUMP = "dump"
THREAD_START_DELAY = 1 # seconds

The values I needed to change were these:


The reason for changing these is that the script starts throwing errors when the provisioned DynamoDB write capacity (units per second) is lower than the rate at which the script writes items.
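The throttling logic the settings above control can be sketched in a few lines: DynamoDB's BatchWriteItem call accepts at most 25 items, so a restore loop has to split the dump into batches and pause between them to stay under the provisioned write capacity. This is a minimal illustration of that pattern, not the tool's actual code; `batch_write` here is a hypothetical stand-in for the real BatchWriteItem request.

```python
import time

MAX_BATCH_WRITE = 25      # DynamoDB's hard limit per BatchWriteItem call
AWS_SLEEP_INTERVAL = 10   # seconds to pause between batches

def chunk(items, size=MAX_BATCH_WRITE):
    """Split the item list into batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def restore(items, batch_write, sleep=time.sleep):
    """Send items in 25-item batches, pausing between batches so the
    write rate stays under the table's provisioned capacity.
    `batch_write` is a hypothetical callable standing in for a real
    BatchWriteItem request."""
    for batch in chunk(items):
        batch_write(batch)
        sleep(AWS_SLEEP_INTERVAL)
```

Lowering AWS_SLEEP_INTERVAL speeds the restore up, but only as far as the table's write capacity allows.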

There are two workarounds here. The first is the fine-tuning above; with this configuration, inserting 14,000 items takes around 30 seconds. But if you have more than 1,000,000 items and want the restore operation to complete in under 5 minutes, you need to temporarily scale up the DynamoDB write capacity units for each table. After the restore succeeds, you can scale the write capacity back down to normal.
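A quick back-of-the-envelope calculation helps pick the temporary capacity. Assuming items of 1 KB or less (one write capacity unit per item, per DynamoDB's pricing model), the capacity you need is roughly the item count divided by the target duration in seconds:

```python
import math

def required_wcu(item_count, target_seconds, item_size_kb=1):
    """Rough estimate of the write capacity units needed to insert
    `item_count` items within `target_seconds`. Each write consumes
    ceil(item_size_kb) WCU for items up to that size."""
    units_per_item = math.ceil(item_size_kb)
    return math.ceil(item_count * units_per_item / target_seconds)

# Restoring 1,000,000 one-kilobyte items in under 5 minutes:
required_wcu(1_000_000, 300)  # -> 3334
```

So a million small items in five minutes needs the table provisioned at roughly 3,400 WCU for the duration of the restore.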

Back it up!

Easy. First, you need to create a user (in IAM) with an AWS access key and full access to DynamoDB. There is an AWS-managed policy called AmazonDynamoDBFullAccess that you can use for the entire operation.

To back up every table in the Ireland region:

python -m backup -r eu-west-1 -s "*" --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY

To back up a single table in the Frankfurt region (replace TableName with your table name):

python -m backup -r eu-central-1 -s TableName --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY

Restore by any means necessary.

One thing you should be careful with is to create the table schema (an empty table with its structure) before you start inserting data; otherwise the script might fail during the sequential operations.

To restore every table you have to the London (eu-west-2) region:

python -m restore -r eu-west-2 -s "*" --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY --schemaOnly
python -m restore -r eu-west-2 -s "*" --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY --dataOnly

Or if you’d like to restore a single table (replace TableName with your table name):

python -m restore -r eu-west-2 -s TableName --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY --schemaOnly
python -m restore -r eu-west-2 -s TableName --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY --dataOnly

Data Location

The data you backed up is stored in a folder called dump, next to where the script is located. Table schema information goes into schema.json, and the data for each table is split into partitions (0001.json, 0002.json, etc.) if the table has a large number of items.
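Before restoring, it can be reassuring to sanity-check the dump. The exact file layout belongs to the tool, but assuming each partition file holds a JSON object with an "Items" list (the shape of a DynamoDB scan page), a quick per-table item count might look like this:

```python
import json
from pathlib import Path

def count_dumped_items(dump_dir="dump"):
    """Count backed-up items per table by reading the numbered
    partition files (0001.json, 0002.json, ...) in each table
    folder. Assumes each partition is a JSON object with an
    "Items" list, as DynamoDB scan pages are shaped; this layout
    is an assumption, so adjust to match your tool's output."""
    counts = {}
    for table_dir in Path(dump_dir).iterdir():
        if not table_dir.is_dir():
            continue
        total = 0
        for part in sorted(table_dir.glob("[0-9]*.json")):
            with open(part) as f:
                total += len(json.load(f).get("Items", []))
        counts[table_dir.name] = total
    return counts
```

Comparing these counts against the source tables' item counts catches truncated backups before you commit to a migration.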

The operation takes anywhere from 30 seconds to 30 minutes depending on the amount of data in the tables. If you are only taking backups, you can do it on the fly, but if you are planning to migrate to another region like we did, you should consider scheduling downtime during off-hours. To avoid downtime entirely you could use DynamoDB Streams, but that requires a lot of research time. There are several GitHub projects based on that approach as well.

That basically sums everything up.

Be well.

Umur Coşkuncan

A CTO gone rogue.
