In this blog post, we review the key differences and use cases of VPC Peering Connections and Transit Gateway at a high level.
You may have encountered the terms "VPC Peering Connection" and "Transit Gateway" while working on AWS infrastructure as an engineer, but what do they mean, and where do they differ? Hopefully, after this post, you'll be able to tell the two apart immediately and know when to use each.
What is a VPC Peering Connection?
At its core, a VPC Peering Connection (abbreviated as PCX) is a way to connect two VPCs so they can communicate and route traffic between them using private IPv4 and IPv6 addresses. With a peering connection between two VPCs, resources inside them can communicate at nearly "unlimited" speed (within the same region) but are ultimately limited by the network speed of the instances/resources inside those VPCs.
If we had 100 instances in VPC 1 and another 100 in VPC 2, they could connect and transfer data to each other at near line rate. The most significant limitation you'll hit is the 5 Gbps cap per traffic flow. To get around this, you can utilise multiple flows/multipath TCP. Remember this the next time you provision two EC2 instances with 10 Gbps interfaces and wonder why speeds are limited. This per-flow cap only applies outside of EC2 cluster placement groups.
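The effect of the per-flow cap is easy to model. A minimal sketch, assuming the 5 Gbps per-flow figure above (the function name and numbers are illustrative, not an AWS API):

```python
# Sketch of the per-flow bandwidth maths discussed above.
# Assumes the 5 Gbps per-flow limit that applies outside cluster
# placement groups; achievable throughput is the lesser of the NIC
# speed and the number of flows times the per-flow cap.

def achievable_gbps(nic_gbps: float, flows: int, per_flow_cap: float = 5.0) -> float:
    """Aggregate throughput, capped by the NIC and by flows * per-flow limit."""
    return min(nic_gbps, flows * per_flow_cap)

# Two 10 Gbps instances, one TCP flow: stuck at 5 Gbps.
print(achievable_gbps(10, 1))  # 5.0
# Split the transfer across two flows (e.g. multipath TCP) to saturate the NIC.
print(achievable_gbps(10, 2))  # 10.0
```

This is why a single `iperf` stream between two 10 Gbps instances tops out at half the interface speed, while a multi-stream test can fill it.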
When you enable a peering connection to a VPC in another region, all traffic between them is encrypted by default, too. A VPC can have up to 125 active peering connections, including to VPCs in different AWS accounts, and all traffic stays on the private AWS network. With that said, this type of connection does not support transitive routing. What does transitive routing mean?
In the diagram above, four VPCs are set up with two different traffic flows: one originating from VPC 1, destined for VPC 3, and another from VPC 1, destined for the internet.
Transitive routing in the cloud means having the capability to route traffic to its destination through intermediaries, such as another VPC. Unfortunately, VPC Peering Connections do not support Transitive Routing. It's a point-to-point network.
In Flow A, resources in VPC 1 can communicate freely with resources in VPC 2 (with the necessary security group and network firewall settings in place), but if that same resource in VPC 1 wants to communicate with anything in VPC 3, it cannot. This is one of the limitations of VPC Peering Connections. To get around it, we must create peering connections between every pair of VPCs, along with the route tables needed to point traffic at the correct VPC. This quickly becomes complex and creates significant administrative overhead. Consider the scenario where we need to add another 5 VPCs: we then have to set up a peering connection between each existing VPC and each new one.
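The admin overhead grows quadratically, which a quick calculation makes concrete (the function here is just illustrative arithmetic):

```python
# Full-mesh peering grows quadratically: every pair of VPCs needs its own
# peering connection, plus route-table entries on both sides.

def mesh_peering_connections(n_vpcs: int) -> int:
    """Number of peering connections needed for full connectivity between n VPCs."""
    return n_vpcs * (n_vpcs - 1) // 2

print(mesh_peering_connections(4))   # 6   (the four VPCs in the diagram)
print(mesh_peering_connections(9))   # 36  (after adding another 5 VPCs)
print(mesh_peering_connections(50))  # 1225
```

Going from 4 to 9 VPCs jumps from 6 peering connections to 36, which is why a full mesh stops being practical very quickly.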
The same goes for Flow B. VPC 1 can freely communicate with anything in VPC 4 but cannot reach the internet through it, due to the lack of transitive routing. One way to resolve this is to run an instance in VPC 4 that proxies traffic out to the internet, taking the traffic off the routing plane in AWS.
There are other solutions, too. For example, we could place VGWs (Virtual Private Gateways) on each of the spoke VPCs and create VPN tunnels from those VGWs to a VPN instance (third party, such as Palo Alto, Check Point, etc.) inside a Transit or Egress VPC. The VPN instance could then have its own route table and route traffic to any of the spoke VPCs and out to the internet. Again, though, this adds unnecessary complexity in the majority of cases.
What is a Transit Gateway?
Transit Gateways (TGWs) are another method of connecting VPCs, with the added benefit of connecting other resources centrally. We can create multiple types of connectivity attachments with a TGW, such as:
- SD-WAN/third-party network appliance.
- AWS Direct Connect gateway.
- Peering connection with another transit gateway.
- VPN connection to a transit gateway.
Transit Gateways act as a central hub/router, allowing connectivity to and from multiple VPCs and on-premises networks. Routing with a TGW works on a next-hop attachment basis: route tables (one or many, default or custom) tell the Transit Gateway which attachment to send incoming traffic to. Having these rules/route tables located centrally is a big win compared to Peering Connections.
Below is an example of a default route table for a Transit Gateway. In this situation, each VPC's CIDR is propagated into the route table on attachment. When a route is propagated, the route table becomes aware of the entire CIDR range inside that VPC. Alternatively, static routes can target specific ranges or IP addresses.
If VPC 1 wanted to communicate with VPC 5, it would look something like this:
The VPC router in VPC 1 looks through its route table to find where to send the traffic; in this case, to the TGW attachment in VPC 1.
The TGW attachment sends this traffic to the TGW.
The TGW looks through its route table for a match against the destination IP. If multiple matching routes exist, the TGW evaluates where best to send the traffic. The priority order is as follows:
- The most specific route for the destination address.
- For routes with the same destination but different targets, the following order applies:
- Static routes
- Prefix list referenced routes
- VPC propagated routes
- Direct Connect gateway propagated routes
- Transit Gateway Connect propagated routes
- Site-to-Site VPN propagated routes
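The evaluation order above can be sketched as a toy route-selection function. This is a simplified model, not an AWS API: the route-type names, priority ranks, and attachment names are all illustrative.

```python
import ipaddress

# Toy model of the TGW route-evaluation order described above: the most
# specific prefix wins first; among equal prefixes, static routes beat
# propagated ones, in the order given in the bullet list.

TYPE_PRIORITY = {"static": 0, "prefix-list": 1, "vpc-propagated": 2,
                 "dxgw-propagated": 3, "connect-propagated": 4, "vpn-propagated": 5}

def select_route(routes, dest_ip):
    """routes: list of (cidr, route_type, attachment). Returns the best attachment."""
    ip = ipaddress.ip_address(dest_ip)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r[0])]
    if not candidates:
        return None  # no matching route: traffic is dropped
    # Longest prefix first, then route-type priority as the tie-breaker.
    best = min(candidates, key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen,
                                          TYPE_PRIORITY[r[1]]))
    return best[2]

routes = [
    ("10.0.0.0/8",  "vpn-propagated", "vpn-attach"),
    ("10.5.0.0/16", "vpc-propagated", "vpc5-attach"),
    ("10.5.0.0/16", "static",         "inspection-attach"),
]
print(select_route(routes, "10.5.1.10"))  # inspection-attach (static wins the tie)
print(select_route(routes, "10.9.9.9"))   # vpn-attach (only the /8 matches)
```

Note how the static route to the hypothetical inspection attachment overrides the propagated VPC route for the same /16, which is exactly the mechanism used to force traffic through a central firewall.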
The TGW then forwards the traffic to the attachment in VPC 5, where it is handled by the VPC router in VPC 5.
Each Transit Gateway can host up to 5,000 attachments, 20 route tables, and up to 10,000 routes per route table. These quotas can be increased via a support ticket to AWS, but adjusting your configuration and using route summarisation where applicable is recommended.
Each VPC attachment can support up to 50 Gbps of traffic, and you can have more than one VPC attachment. A TGW VPC attachment places an interface in each Availability Zone it is enabled in, so each AZ of the VPC gets its own path to the Transit Gateway, and you can further steer traffic to each AZ via the TGW route tables.
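Following the figures quoted above, the aggregate ceiling for a VPC therefore scales with the AZs you enable. A rough sketch (the function and the per-AZ figure simply restate the post's numbers; check current AWS quotas before relying on them):

```python
# Back-of-the-envelope ceiling for a TGW VPC attachment, based on the
# 50 Gbps-per-AZ figure discussed above. Purely illustrative arithmetic.

def tgw_vpc_ceiling_gbps(enabled_azs: int, per_az_gbps: float = 50.0) -> float:
    """Aggregate bandwidth ceiling for a VPC attached across several AZs."""
    return enabled_azs * per_az_gbps

print(tgw_vpc_ceiling_gbps(3))  # 150.0 for a VPC attached in three AZs
```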
Like VPC Peering Connections, a TGW can be shared across accounts via AWS RAM (Resource Access Manager), allowing connectivity that way.
With all of the above in mind, we gain a massive amount of flexibility in how we design and route our traffic, allowing the network to grow without the comparable management overhead.
Let's look at an example diagram of a typical Transit Gateway setup.
With the above diagram, you can see how much more straightforward the network is, logically. The TGW acts as the central router/hub, hosting the route table(s) that define what traffic goes where (in this example, we'll say all VPCs can access everything), with a VPC attachment to each VPC, including an egress VPC hosting a NAT Gateway and an Internet Gateway. All VPC resources can, in theory, communicate with resources in other VPCs and with the internet. It's much simpler to look at and much easier to implement, especially if you know your infrastructure will expand in the coming months or years. Traffic flows can be amended quickly and centrally, and you should see lower technical debt in the future, especially in larger networks.
It's a bit out of scope for this blog post, but we'll cover the basics. An Internet Gateway (IGW) essentially provides a one-to-one NAT for private addresses in its VPC, which in turn allows connectivity to the public internet. The IGW knows nothing about the private addresses in the other VPCs; the NAT Gateway in this scenario manages that for us, keeping track of traffic in and out and forwarding it to the IGW as required. This setup lets us manage internet connectivity for the entire network centrally, and it can be modified to allow central inspection of traffic entering or leaving the network with AWS Network Firewall, third-party appliances, etc.
While VPC Peering Connections are the simpler of the two, they get complex quickly as you add more VPC intercommunication. If you know your requirements and they won't change throughout the lifetime of the service, a Peering Connection is most suitable.
Peering Connections may be the better option if another AWS account or third party requires connectivity to your VPC, provided the only requirement is access to that VPC alone. Another option is AWS PrivateLink, but that's for another post.
With VPC Peering Connections, however, complexity increases massively as your network grows. So if you plan to add more services to your portfolio, you'd be better off starting with Transit Gateway and building out from there, to reduce complexity and centrally manage all your routing. For example, imagine allowing VPN connectivity between your DC and your VPCs. Remember, VPC PCXs don't allow transitive routing, meaning you'd have to create gateways in each VPC and tunnels on your DC routers to each of them, adding further complexity and management overhead.
At the time of writing (using London, eu-west-2, as the example):
Transit Gateway
- $0.06 per AWS Transit Gateway attachment per hour.
- $0.02 per GB of data processed by TGW.
VPC Peering Connection
- Data transfer is free when traffic stays within the same Availability Zone between peered VPCs. Otherwise, it's charged at the standard in-region transfer rates, currently around $0.01 per GB.
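To see how these rates compare in practice, here's a back-of-the-envelope monthly estimate using the example figures above. The traffic volumes and attachment counts are made up for illustration, and the rates change, so check the AWS pricing pages before relying on this:

```python
# Rough monthly cost comparison using the example London rates quoted above.

HOURS_PER_MONTH = 730

def tgw_monthly_cost(attachments: int, gb_processed: float,
                     attach_rate: float = 0.06, data_rate: float = 0.02) -> float:
    """Attachment-hours plus per-GB data processing."""
    return attachments * attach_rate * HOURS_PER_MONTH + gb_processed * data_rate

def peering_monthly_cost(cross_az_gb: float, transfer_rate: float = 0.01) -> float:
    """Peering itself is free; you only pay for cross-AZ/in-region transfer."""
    return cross_az_gb * transfer_rate

# Hypothetical: 5 VPC attachments pushing 10 TB through the TGW per month.
print(round(tgw_monthly_cost(5, 10_000), 2))   # 419.0
# The same 10 TB over peering connections, all of it crossing AZs.
print(round(peering_monthly_cost(10_000), 2))  # 100.0
```

Peering is clearly cheaper per GB; what you're paying the TGW premium for is the centralised routing and reduced management overhead.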
Suppose you want to extend your cloud footprint and add additional VPCs, connectivity from your on-premises DCs, and so on. In that case, you would do well to look into and implement Transit Gateway sooner rather than later. Remember, there's a 125 Peering Connection limit per VPC and a 5,000 attachment limit per TGW.
The difference in management overhead between Peering Connections and Transit Gateway is night and day. The complexity of Peering Connections between multiple VPCs becomes apparent as soon as you look to the future. TGWs allow various types of connections, such as VPN, Direct Connect, and VPC attachments, all fully scalable while being centrally managed.
Transit Gateways
- Allow connectivity between all VPCs and on-premises networks.
- Allow data transfer of up to 50 Gbps between VPCs.
- Centrally managed routing tables and connectivity options.
- Allow data transfer of up to 1.25 Gbps per VPN tunnel, which can be increased beyond this by utilising ECMP.
- Allow adjustment of the MTU size.
- Ability to create and utilise a central inspection-based architecture in your cloud infrastructure.
- Less complex with large cloud networks compared to VPC PCXs, and can quickly adopt other AWS accounts into your traffic flow (if your company acquired another company, for example).
- Allow the use of multicast.
- Can create blackhole routes to drop traffic where required.
- Access to "Appliance Mode", which allows stateful network flows to security/proxy devices.
- Future-proof.
VPC Peering Connections
- Allow a one-to-one connection between VPCs.
- Nearly unlimited speed, limited only by the instances' network speeds.
- Simple to set up for small networks, but cumbersome beyond that.
- Can allow connectivity to VPCs in different AWS accounts.