Internet over AWS Direct Connect

This document describes how to organize and implement Internet connectivity from multiple datacenters over AWS Direct Connect.

Sergey Nikoghosyan
3 min read · Feb 12, 2021

Why?!

I guess a lot of people follow this link just to understand one thing:
why would anybody need something like this?
Let's imagine this case:
A company decided to migrate its classic on-premises infrastructure (two independent, redundant datacenters) to the cloud (AWS). As a first step, Direct Connect to each of the datacenters was implemented. All Internet-facing applications were migrated to AWS, but servers in the datacenters still need Internet connectivity (repositories, updates, downloads, etc.).

About Direct Connect and Transit Gateway

We have a highly resilient AWS Direct Connect (DX) setup for DC1 and DC2. DC1 and DC2 are also interconnected. The Direct Connect service is connected to a Transit Gateway (TGW) through a Direct Connect Gateway (DXGW), see Picture 1. To exchange routes between the elements of this DC1-AWS-DC2 triangle, the BGP protocol is used.
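For the snippets in this article I'll use Python with boto3. Here is a minimal sketch of how the TGW-to-DXGW association could look; the IDs are hypothetical placeholders, and the allowed-prefix list is what the DXGW announces toward DC1/DC2 over BGP (I'm assuming a default route is acceptable there, see below):

    import boto3

    dx = boto3.client("directconnect", region_name="eu-west-1")

    # Associate the Transit Gateway with the Direct Connect Gateway.
    # The allowed prefixes are what the DXGW will announce to DC1/DC2;
    # assumption: a default route (0.0.0.0/0) is used here.
    dx.create_direct_connect_gateway_association(
        directConnectGatewayId="11aa22bb-3344-5566-7788-99aabbccddee",  # placeholder
        gatewayId="tgw-0123456789abcdef0",                              # placeholder TGW ID
        addAllowedPrefixesToDirectConnectGateway=[{"cidr": "0.0.0.0/0"}],
    )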

Picture 1

Routers DC1 (AS65101) and DC2 (AS65100) each establish a BGP session with the DXGW (AS65000) to exchange routes. DC1 and DC2 also establish a BGP session with each other to provide redundancy for the DX connections. To route Internet-bound (any) traffic from the servers behind the DC1 and DC2 routers to the DXGW, 0.0.0.0/0 (the default route) should be announced from the DXGW to routers DC1 and DC2 over BGP. After reaching the DXGW, all traffic is routed to the TGW, where in the corresponding route table we manually add a static route: destination 0.0.0.0/0 (any) via the TGW attachment associated with the "NAT VPC". This explanation covers how a packet flows from the servers in DC1 or DC2 to the "NAT VPC".
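That static route could be expressed like this, a minimal boto3 sketch (the route table and attachment IDs are hypothetical placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # In the TGW route table, send 0.0.0.0/0 (any) to the attachment
    # of the "NAT VPC".
    ec2.create_transit_gateway_route(
        TransitGatewayRouteTableId="tgw-rtb-0123456789abcdef0",     # placeholder
        DestinationCidrBlock="0.0.0.0/0",
        TransitGatewayAttachmentId="tgw-attach-0123456789abcdef0",  # placeholder NAT VPC attachment
    )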

Inside VPC

What happens after the packet reaches the VPC was (and still is) the more interesting thing for me to understand. As we are using routing all the way (meaning the private source IP addresses of the servers are unchanged), it is obvious that to reach the Internet we have to involve source-based network address translation, and for that I tried to use the AWS NAT Gateway (NATGW) service. I was not sure that packets originating from servers in DC1 and DC2 would be processed by the NATGW, because their source IP addresses fall outside the CIDR of the VPC where the NATGW lives. But it worked: NAT is happening. We can conclude that the NATGW will translate packets with any (RFC 1918) source address routed to it (in other words, it behaves as if source/destination checking were disabled).
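If you want to check the translation yourself, a quick hypothetical test from a server in DC1 or DC2 (plain Python, no extra libraries) is to ask AWS which public IP it sees; it should print one of the NATGW Elastic IPs:

    import urllib.request

    # Prints the public source IP of this host as seen from the Internet.
    with urllib.request.urlopen("https://checkip.amazonaws.com") as resp:
        print(resp.read().decode().strip())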

Picture 2

Let's discuss the packet flow step by step, please see Picture 2.
The packet arrives at the TGW, where we have a route to the NAT VPC attachment, which is associated with the private subnet. The packet is then routed according to the private subnet route table, where we have a default route to the NATGW in the public subnet (a classic setup).
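Here is a minimal boto3 sketch of the route table entries involved. The IDs and the on-prem range are hypothetical placeholders, and the route back to the TGW from the public subnet is my assumption about how the un-NATed return traffic finds its way back to the servers in DC1/DC2:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # Private subnet route table (where the TGW attachment lands):
    # default route to the NATGW in the public subnet.
    ec2.create_route(
        RouteTableId="rtb-0aaaaaaaaaaaaaaaa",      # placeholder, private subnet
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId="nat-0123456789abcdef0",      # placeholder
    )

    # Public subnet route table: default route to the Internet gateway,
    # plus (assumption) the on-prem range routed back via the TGW so that
    # un-NATed return traffic can reach DC1/DC2.
    ec2.create_route(
        RouteTableId="rtb-0bbbbbbbbbbbbbbbb",      # placeholder, public subnet
        DestinationCidrBlock="0.0.0.0/0",
        GatewayId="igw-0123456789abcdef0",         # placeholder
    )
    ec2.create_route(
        RouteTableId="rtb-0bbbbbbbbbbbbbbbb",
        DestinationCidrBlock="10.0.0.0/8",         # example on-prem range
        TransitGatewayId="tgw-0123456789abcdef0",  # placeholder
    )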

Redundancy

It's time to think about redundancy. In the case of NATGWs, each NATGW is redundant only within a single Availability Zone (AZ). Therefore we need at least two NATGWs, each in a unique AZ, to provide minimal redundancy. We used three NATGWs, and as a result the TGW is attached to three subnets, see Picture 3.
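A minimal boto3 sketch of that layout, assuming one public subnet per AZ for the NATGWs and one private subnet per AZ for the TGW attachment (all IDs are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-west-1")

    # One NATGW per AZ, each in its own public subnet (placeholder IDs).
    public_subnets = [
        "subnet-0aaaaaaaaaaaaaaaa",  # AZ a
        "subnet-0bbbbbbbbbbbbbbbb",  # AZ b
        "subnet-0cccccccccccccccc",  # AZ c
    ]
    for subnet_id in public_subnets:
        eip = ec2.allocate_address(Domain="vpc")
        ec2.create_nat_gateway(SubnetId=subnet_id, AllocationId=eip["AllocationId"])

    # Attach the TGW to one private subnet in each of the same three AZs.
    ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId="tgw-0123456789abcdef0",  # placeholder
        VpcId="vpc-0123456789abcdef0",             # placeholder
        SubnetIds=[
            "subnet-0ddddddddddddddd1",  # private, AZ a
            "subnet-0eeeeeeeeeeeeeee2",  # private, AZ b
            "subnet-0fffffffffffffff3",  # private, AZ c
        ],
    )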

Picture 3

So we have redundant NATGWs in different AZs, but we don't have any mechanism to control routing on the TGW more granularly than the attachment (we can't choose the exact subnet or NATGW to route to). And this fact leads to the question: how is redundancy provided?
I was surprised, but in this case redundancy is built in; we don't need to manage or configure anything. After running this scheme and doing some tests, I realized that packets routed from the TGW to the VPC attachment are served round-robin between the subnets and, as a result, between the NATGWs.
So all requests will be distributed between the three NATGWs without any additional configuration; the redundancy and load-distribution problems are solved.
