Selecting a NAT Instance Size on EC2

We’ve been using the Amazon Web Services (AWS) Virtual Private Cloud (VPC) functionality to create an isolated and secure hosting environment for our SaaS product, HunchLab. When EC2 servers in a VPC have only private IP addresses and need access to S3 (or to the Internet), their network traffic must be routed through a NAT instance. This architecture increases security by reducing the external surface area of the application.

There are many resources about setting up a NAT instance in AWS. Many examples set up NAT instances using the m1.small or t2.micro instance sizes. Both are low-cost and so a natural starting point for experimentation.

The m1.small is a previous-generation EC2 instance type, and Amazon recommends the m3 instance family as the upgrade path. The m3 family does not, however, offer a small instance for cases where only a limited amount of memory is required. The t2 instances seem like a natural fit from a cost perspective, but Amazon lists their network performance as ‘low to moderate’, which wasn’t very reassuring given that the primary purpose of a NAT instance is to provide network connectivity to the rest of the servers within the application.

Given that EC2 does not offer a network-optimized instance family as it does for compute, memory, and storage, my question was:

Which NAT instance size should we use in production?

I decided to answer this question by benchmarking several instance sizes. I tested the m1.small instance size and its closest replacement, the m3.medium. I also tested all three t2 instances (t2.micro, t2.small, t2.medium) because they are low-cost and belong to a new instance family that likely benefits from the latest back-end EC2 architecture improvements.

AWS rates the network performance of each instance type as low, moderate, high, or 10 Gigabit. To cover instances with “enhanced networking”, I also tested the c3.large and c3.2xlarge instance sizes. Enhanced networking is designed to improve packets per second and reduce latency through better virtualization. The c3.2xlarge is also rated as high network performance. For all instance types I used the latest stock NAT AMI provided by AWS.

One component of our application generates large files that we store within S3. To benchmark the throughput of the different NAT instances, I stored the Ubuntu 14.04 Server ISO file in an S3 bucket in the same region as our servers. For each instance size, I downloaded the ISO file 10 times using wget from a server behind the NAT instance and recorded the throughput in MBps for each sample. I then calculated the median bandwidth and the TP80 metric (the bandwidth that 80% of the samples met or exceeded).
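The two summary statistics can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the benchmark: the function names are hypothetical, the sample values are invented, 1 MB is assumed to mean 1,000,000 bytes, and TP80 is read as the largest throughput that at least 80% of the samples met or exceeded.

```python
import statistics


def throughput_mbps(num_bytes, seconds):
    """Convert one timed download into MBps (assuming 1 MB = 1,000,000 bytes)."""
    return num_bytes / seconds / 1_000_000


def tp80(samples):
    """Largest sample value that at least 80% of the samples meet or exceed."""
    ordered = sorted(samples, reverse=True)
    # Ceiling of 0.8 * n, computed with integers to avoid float rounding.
    needed = (len(ordered) * 80 + 99) // 100
    return ordered[needed - 1]


def summarize(samples):
    """Return (median, TP80) for a list of throughput samples in MBps."""
    return statistics.median(samples), tp80(samples)


# Hypothetical samples for illustration only; these are not the measured values.
example = [10.0, 12.0, 8.0, 14.0, 9.0, 11.0, 13.0, 7.0, 15.0, 6.0]
median, floor80 = summarize(example)
print(f"median={median} MBps, TP80={floor80} MBps")
```

With only 10 samples per instance size, TP80 under this reading is simply the eighth-best run, so a single slow download can move it noticeably.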

I also recorded the price per hour to run each instance type in our region using reservation pricing for instances that are part of current generations. Finally, I calculated the bandwidth per unit of cost to determine the sweet spot along the performance-cost curve. Here are the results.

Results

NAT Instance	Median Bandwidth	TP80 Bandwidth	Cents / Hour	Median Bandwidth / Cost	TP80 Bandwidth / Cost
m1.small	8.3 MBps	3.5 MBps	4.40 cents	1.88 MBps / cent	0.80 MBps / cent
t2.micro	2.7 MBps	1.7 MBps	0.86 cents	3.14 MBps / cent	1.98 MBps / cent
t2.small	13.9 MBps	10.2 MBps	1.72 cents	8.08 MBps / cent	5.92 MBps / cent
t2.medium	20.7 MBps	19.14 MBps	3.45 cents	6.00 MBps / cent	5.55 MBps / cent
m3.medium	20.4 MBps	16.6 MBps	4.25 cents	4.79 MBps / cent	3.91 MBps / cent
c3.large	43.2 MBps	32.76 MBps	6.19 cents	6.98 MBps / cent	5.29 MBps / cent
c3.2xlarge	43.3 MBps	39.02 MBps	24.77 cents	1.75 MBps / cent	1.58 MBps / cent
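The bandwidth-per-cost columns are just each bandwidth figure divided by the hourly price. The sketch below recomputes them from the table and ranks the instance sizes; the function names are my own, and because it starts from the rounded values shown in the table, a ratio can differ from the published column in the last digit.

```python
# (median MBps, TP80 MBps, cents per hour), copied from the results table.
RESULTS = {
    "m1.small": (8.3, 3.5, 4.40),
    "t2.micro": (2.7, 1.7, 0.86),
    "t2.small": (13.9, 10.2, 1.72),
    "t2.medium": (20.7, 19.14, 3.45),
    "m3.medium": (20.4, 16.6, 4.25),
    "c3.large": (43.2, 32.76, 6.19),
    "c3.2xlarge": (43.3, 39.02, 24.77),
}


def bandwidth_per_cent(results):
    """Map each instance to (median MBps per cent, TP80 MBps per cent)."""
    return {
        name: (median / cents, tp80 / cents)
        for name, (median, tp80, cents) in results.items()
    }


def rank_by(ratios, index):
    """Instance names sorted best-first by ratio 0 (median) or 1 (TP80)."""
    return sorted(ratios, key=lambda name: ratios[name][index], reverse=True)


if __name__ == "__main__":
    ratios = bandwidth_per_cent(RESULTS)
    for name in rank_by(ratios, 0):
        median_ratio, tp80_ratio = ratios[name]
        print(f"{name:<11} {median_ratio:5.2f} {tp80_ratio:5.2f} MBps / cent")
```

Ranking by either ratio puts the t2.small first, while the t2.medium and c3.large trade second and third place depending on whether the median or TP80 is used.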

The m1.small instance, which most examples use, offers quite limited bandwidth and is not a good choice for a production environment. The t2.micro instance is even worse. The t2.small and t2.medium instances seem like good fits for production environments where cost is a concern. The c3 instances with enhanced networking clearly deliver a performance boost over the other instances, but at a higher cost. With only one transfer from S3 running at a time, the c3.2xlarge instance shows little improvement over the c3.large, but I imagine that more concurrent transfers would achieve a higher overall throughput.
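I did not run a concurrent test, but one way to check that guess would be to start several downloads at once and measure the combined throughput. The following is only a sketch under stated assumptions: the function names and the URL in the usage note are hypothetical, it uses Python's standard library instead of wget, and it again treats 1 MB as 1,000,000 bytes.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_size(url):
    """Download url, discard the body, and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(1 << 20)  # read in 1 MiB chunks
            if not chunk:
                break
            total += len(chunk)
    return total


def aggregate_throughput_mbps(url, workers, fetch=fetch_size):
    """Run `workers` simultaneous downloads and return the combined MBps."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = list(pool.map(fetch, [url] * workers))
    elapsed = time.perf_counter() - start
    return sum(sizes) / elapsed / 1_000_000
```

Calling something like aggregate_throughput_mbps("https://example-bucket.s3.amazonaws.com/ubuntu.iso", 4) from a server behind each NAT instance, with a placeholder URL swapped for a real object, would show whether the larger c3 instance pulls ahead as the number of workers grows.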

This benchmark is of course subject to the particular hosts that I landed on during my testing. If I repeated the test, I would expect variability in the benchmarks for the t2 family due to their burstable design. For our use case, the t2.medium seems like a good choice.