Benchmarking AWS Databases

This was a quick app I created to run some basic tests against database services in AWS. It does basic reads and writes using ADO.NET-style frameworks. The dataset is a 10-column varchar object called Item, found in the mocks folder, and the program can be run with different dataset sizes. Everything is configured through the appsettings.json file. A template version of that file can be found on the main project site; it looks something like this. Note that DynamoDB requires some additional configuration.

{
  "Global": {
    "NumberOfRecordsToWrite": "50000",
    "NumberOfRecordsToRead": "10000"
  },
  "Databases": {
    "DB1": {
      "Type": "DynamoDb",
      "TableName": "Benchmark",
      "AccessId": "abcdefg",
      "SecretKey": "1234567890",
      "Capacity": {
        "Size": "1",
        "Write": "5",
        "Read": "3"
      }
    },
    "DB2": {
      "Type": "Sql",
      "TableName": "Benchmark",
      "ConnectionString": "xxxxx"
    },
    "DB3": {
      "Type": "Postgre",
      "TableName": "Benchmark",
      "ConnectionString": "xxxxx"
    },
    "DB4": {
      "Type": "Fake",
      "TableName": "Benchmark",
      "ConnectionString": "xxxxx"
    }
  }
}

The program currently supports DynamoDB, SQL Server and PostgreSQL. Multiple databases can be tested simply by adding entries to the config file above. The databases can be hosted on Amazon RDS, Aurora or self-managed on EC2.
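
As an aside, here is a minimal sketch of how a "Databases" section like the one above can be enumerated with Microsoft.Extensions.Configuration. The DbSettings class below is illustrative, not necessarily the type the repository actually uses.

using System;
using Microsoft.Extensions.Configuration;

// Illustrative settings type; the repository's actual classes may differ.
public class DbSettings
{
    public string Type { get; set; }
    public string TableName { get; set; }
    public string ConnectionString { get; set; }
}

public static class ConfigExample
{
    public static void Main()
    {
        // Requires the Microsoft.Extensions.Configuration.Json
        // and .Binder packages.
        var config = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json")
            .Build();

        // Each child of "Databases" (DB1, DB2, ...) is one benchmark target.
        foreach (var section in config.GetSection("Databases").GetChildren())
        {
            var db = section.Get<DbSettings>();
            Console.WriteLine($"{section.Key}: {db.Type} -> {db.TableName}");
        }
    }
}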


Exponential Backoff 

Since DynamoDB throttles connections, I have some very basic calculations in place to determine how quickly to send requests to that database. This is calculated from the Capacity configuration for DynamoDB shown above. The reads and writes are done in bulk using the BatchWrite and BatchGet APIs. More information about this can be found in my other DynamoDB posts referenced below.

http://solidfish.com/overview-of-aws-dynamodb/

http://solidfish.com/aws-dynamodb-sdk-api-dot-net/
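
As a rough illustration of the batch approach, here is a sketch using the SDK's object persistence model. The Item stub stands in for the class in the mocks folder; this is not the repository's exact code.

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

// Stand-in for the repo's Item class (10 varchar columns, mocks folder).
[DynamoDBTable("Benchmark")]
public class Item
{
    [DynamoDBHashKey]
    public string Id { get; set; }
    public string Col1 { get; set; }
    // ...remaining columns omitted for brevity
}

public static class BatchExample
{
    // The underlying BatchWriteItem API accepts up to 25 items per request;
    // the SDK's CreateBatchWrite helper splits larger sets internally.
    public static async Task WriteItemsAsync(IAmazonDynamoDB client, List<Item> items)
    {
        var context = new DynamoDBContext(client);
        var batch = context.CreateBatchWrite<Item>();
        batch.AddPutItems(items);
        await batch.ExecuteAsync();
    }
}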

When the capacity is exceeded on DynamoDB, AWS will start throwing exceptions (ProvisionedThroughputExceededException) indicating that the provisioned throughput has been exceeded. There are different ways to handle this exception, one of which is to implement an exponential backoff algorithm.

Exponential backoff (or exponential retry) is a technique in which the rate of an operation is gradually decreased by waiting progressively longer between attempts. This is important when dealing with limited capacity, such as in networking, where it keeps bandwidth from being congested, and when working with endpoints where traffic is throttled. In our case, AWS DynamoDB throttles the connection rate, so this concept is important when implementing connections to DynamoDB. This is also the approach recommended by AWS; see the references below for their articles on the topic.

In this application we simply double the wait time whenever we hit this exception. View the GitHub site for the full source and details.
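
A minimal sketch of that doubling logic is below. It is simplified; the initial wait and retry cap are illustrative values, not the ones from the repository.

using System;
using System.Threading.Tasks;
using Amazon.DynamoDBv2.Model;

public static class BackoffExample
{
    // Retries the given operation, doubling the wait time each time
    // DynamoDB reports that provisioned throughput was exceeded.
    public static async Task ExecuteWithBackoffAsync(Func<Task> operation)
    {
        var delay = TimeSpan.FromMilliseconds(100); // illustrative initial wait
        const int maxRetries = 8;                   // illustrative cap

        for (var attempt = 0; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (ProvisionedThroughputExceededException)
            {
                if (attempt >= maxRetries) throw;
                await Task.Delay(delay);
                delay += delay; // double the wait time
            }
        }
    }
}

A batch write would then be wrapped as ExecuteWithBackoffAsync(() => batch.ExecuteAsync()), so that each throttled attempt waits twice as long as the one before it.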


Results

For my project I ran the benchmark against DynamoDB and 3 different RDS instances, each of a different size. I ran a dataset of 10,000 records, each approximately 0.6 KB, on a quad-core i7 machine.

DynamoDB @ 5 writes/sec/KB: 2780 seconds
DynamoDB @ 10 writes/sec/KB: 402 seconds
DynamoDB @ 50 writes/sec/KB: 24 seconds
RDS XL (4 cores / 16 GB RAM): 36 seconds
RDS M (2 cores / 8 GB RAM): 232 seconds
RDS S (1 core / 2 GB RAM): 270 seconds
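
Note that the slowest DynamoDB run is dominated by capacity waits: each 0.6 KB record costs one write capacity unit, so at 5 writes/sec, 10,000 records need at least 10,000 / 5 = 2,000 seconds of throughput alone, with the backoff delays accounting for the rest.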


GitHub

https://github.com/johnlee/AwsDbBenchmark


References

Error Retries and Exponential Backoff in AWS
https://docs.aws.amazon.com/general/latest/gr/api-retries.html

Exponential Backoff and Jitter
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/

Throughput Capacity
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html

DynamoDB BatchWrite
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html