A Virtual Private Cloud (VPC) is a private network for cloud resources that resembles a traditional network you would operate in your own data center.
Every new AWS account includes a default VPC that can get you started quickly (to launch an EC2 instance, host a small website, etc.), but it is not suitable for long-term or enterprise-level setups: all of its subnets are public, and its address layout was not designed around your workloads.
Changing a VPC subnet layout is very hard once you start deploying resources and using it heavily, so you should ideally get it right (or as close as possible) from the start.
Often, the main challenge with VPC design and subnetting is that we do not know what future workloads will look like; in those scenarios, starting from a baseline of good practices is always a good idea.
In this post, we deploy a VPC based on the good practices originally proposed by AWS in its Solutions Library, with a couple of tweaks of our own, and we deploy it via Terraform instead.
The Target Solution
The Good Practices Breakdown (The Whys)
First, note that each group (type) of subnets has four subnets, one in each of four different Availability Zones; this enables multi-AZ deployments for enhanced availability and reliability.
Then, for each group in particular:
Private Subnets A
- This group offers 8187 usable private IP addresses per AZ, which should be enough for most workloads.
- This group is bigger because most resources you run do not need a public IP or to serve traffic directly from the Internet. Following the closed-open principle (keep everything closed by default), you should deploy them to private subnets.
- This is where you deploy the worker nodes of your clusters, the EC2 instances in an Auto Scaling group that serve traffic behind a load balancer, and similar resources.
Public Subnets
- This group offers 4091 usable private IP addresses per AZ. That is about half the size of the Private Subnets A group, because we expect fewer public-facing resources.
- Public-facing EC2 instances and AWS services (such as internet-facing load balancers) are deployed here. In addition to their private IP, they can serve traffic from the Internet directly.
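The following sketch shows how the public and private groups work together: an internet-facing Application Load Balancer placed in the public subnets. It assumes the module "vpc" block shown in the Terraform section of this post. The resource names, the port, and the security group rules are hypothetical. Backend instances, such as an Auto Scaling group in Private Subnets A, would be registered through a target group, which is omitted here for brevity.

```hcl
# Hypothetical dedicated security group for the load balancer. The VPC's default
# security group is not used, because this post's configuration leaves it with no rules.
resource "aws_security_group" "alb" {
  name_prefix = "web-alb-"
  description = "Public HTTP entry point"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTP from the Internet"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "HTTP to targets inside the VPC only"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }
}

# Hypothetical internet-facing ALB spread across the four public subnets (one per AZ).
resource "aws_lb" "web" {
  name               = "web-alb"
  internal           = false
  load_balancer_type = "application"
  subnets            = module.vpc.public_subnets
  security_groups    = [aws_security_group.alb.id]
}
```

Only the load balancer needs a public subnet. The instances it forwards to can stay in private subnets with no public IPs.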
Private Subnets B
- This group offers 2043 usable private IP addresses per AZ.
- These subnets are meant for specific types of resources that require additional or different network security settings.
- A common use for this group of subnets is deploying databases.
- These subnets get a separate Network Access Control List (NACL), so more restrictive inbound/outbound rules can be defined specifically for the resources deployed here.
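The sketch below defines stricter rules for that dedicated NACL, using the VPC module's database_inbound_acl_rules and database_outbound_acl_rules inputs. It assumes a PostgreSQL database on port 5432 that is reached only from inside the VPC (var.vpc_cidr, the variable used in the Terraform section of this post). The port and rule numbers are hypothetical. NACLs are stateless, so the outbound rule must also allow the ephemeral ports that response traffic uses.

```hcl
# Hypothetical NACL rules for Private Subnets B (database subnets), assuming
# PostgreSQL on port 5432 and clients only inside the VPC.
locals {
  db_inbound_acl_rules = [
    {
      rule_number = 100
      rule_action = "allow"
      from_port   = 5432
      to_port     = 5432
      protocol    = "tcp"
      cidr_block  = var.vpc_cidr
    },
  ]

  # NACLs are stateless: replies to clients leave on ephemeral ports.
  db_outbound_acl_rules = [
    {
      rule_number = 100
      rule_action = "allow"
      from_port   = 1024
      to_port     = 65535
      protocol    = "tcp"
      cidr_block  = var.vpc_cidr
    },
  ]
}

# Then, inside the module "vpc" block:
#   database_inbound_acl_rules  = local.db_inbound_acl_rules
#   database_outbound_acl_rules = local.db_outbound_acl_rules
# Anything not matched by a rule hits the NACL's implicit final deny rule.
```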
Spare Capacity
Every good design should account for future growth. This design therefore keeps a free IP range in reserve, large enough to define one more group of subnets down the road with 2043 usable private IP addresses per AZ.
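The sketch below makes the sizes above concrete. It reproduces them with the same cidrsubnet calls used in the module configuration, assuming a hypothetical 10.0.0.0/16 VPC CIDR (any /16 gives the same sizes). AWS reserves five addresses in every subnet (the first four and the last one). That is why a /19, which holds 8192 addresses, yields 8187 usable ones.

```hcl
locals {
  # Hypothetical example CIDR; the post's sizes imply a /16.
  example_vpc_cidr = "10.0.0.0/16"

  # AWS reserves 5 addresses in every subnet.
  aws_reserved_ips = 5

  # First subnet of each group, using the same newbits/netnum as the module config.
  example_layout = {
    private_a = cidrsubnet(local.example_vpc_cidr, 3, 0)  # 10.0.0.0/19
    public    = cidrsubnet(local.example_vpc_cidr, 4, 8)  # 10.0.128.0/20
    private_b = cidrsubnet(local.example_vpc_cidr, 5, 24) # 10.0.192.0/21
    spare     = cidrsubnet(local.example_vpc_cidr, 5, 28) # 10.0.224.0/21 (netnum 28-31 unused)
  }

  # Usable IPs = 2^(32 - prefix length) - reserved addresses.
  usable_ips_per_subnet = {
    for group, cidr in local.example_layout :
    group => pow(2, 32 - tonumber(split("/", cidr)[1])) - local.aws_reserved_ips
  }
}

output "example_usable_ips_per_subnet" {
  value = local.usable_ips_per_subnet
}
```

Evaluating local.usable_ips_per_subnet in terraform console should show private_a = 8187, public = 4091, private_b = 2043 and spare = 2043, which matches the breakdown above.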
Terraform It!
Whenever possible, it is good to leverage well-built modules and work done by other people. In this case, we use a great VPC module from the Terraform Registry and configure it as shown below to create our desired VPC:
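module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.19.0"

  name = "main-vpc"
  cidr = var.vpc_cidr
  azs  = var.azs

  # Subnet sizes in the comments below assume a /16 VPC CIDR.
  # Private Subnets A: four /19 blocks (newbits = 3, netnum 0-3), one per AZ
  private_subnets = [
    cidrsubnet(var.vpc_cidr, 3, 0),
    cidrsubnet(var.vpc_cidr, 3, 1),
    cidrsubnet(var.vpc_cidr, 3, 2),
    cidrsubnet(var.vpc_cidr, 3, 3)
  ]

  # Public Subnets: four /20 blocks (newbits = 4, netnum 8-11), right after Private Subnets A
  public_subnets = [
    cidrsubnet(var.vpc_cidr, 4, 8),
    cidrsubnet(var.vpc_cidr, 4, 9),
    cidrsubnet(var.vpc_cidr, 4, 10),
    cidrsubnet(var.vpc_cidr, 4, 11)
  ]

  # Private Subnets B (databases) with their own NACL: four /21 blocks (newbits = 5, netnum 24-27).
  # The /21 blocks at netnum 28-31 stay free as spare capacity.
  database_dedicated_network_acl = true
  database_subnets = [
    cidrsubnet(var.vpc_cidr, 5, 24),
    cidrsubnet(var.vpc_cidr, 5, 25),
    cidrsubnet(var.vpc_cidr, 5, 26),
    cidrsubnet(var.vpc_cidr, 5, 27)
  ]

  # A NAT Gateway allows resources in private subnets to reach the Internet (e.g., to download packages),
  # but it has an associated cost, so only enable it if you need it.
  # single_nat_gateway = true shares one NAT Gateway across all AZs: cheaper, but if that AZ
  # goes down, private subnets in the other AZs lose outbound Internet access.
  enable_nat_gateway = true
  single_nat_gateway = true

  enable_dns_hostnames = true
  enable_dns_support   = true

  # Instances launched in the public subnets do not get a public IP automatically
  map_public_ip_on_launch = false

  # VPC Flow Logs to a CloudWatch log group created by the module,
  # aggregated in 600-second windows and retained for 7 days
  enable_flow_log                                 = true
  create_flow_log_cloudwatch_log_group            = true
  create_flow_log_cloudwatch_iam_role             = true
  flow_log_max_aggregation_interval               = 600
  flow_log_cloudwatch_log_group_retention_in_days = 7

  # Default Security Group: restrict all ingress and egress access by default
  manage_default_security_group  = true
  default_security_group_egress  = []
  default_security_group_ingress = []
}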
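The module call references var.vpc_cidr and var.azs, but this post does not show their declarations. Below is a minimal sketch of them. The default values are hypothetical examples. The validation rules encode two assumptions of this design: a /16 CIDR, so that the subnet sizes described earlier hold, and exactly four Availability Zones, one per subnet in each group.

```hcl
variable "vpc_cidr" {
  description = "VPC CIDR block; the subnet layout assumes a /16"
  type        = string
  default     = "10.0.0.0/16" # hypothetical example value

  validation {
    # can()/try() keep malformed input from raising an evaluation error.
    condition     = can(cidrnetmask(var.vpc_cidr)) && try(split("/", var.vpc_cidr)[1], "") == "16"
    error_message = "The vpc_cidr value must be a valid IPv4 CIDR block with a /16 prefix."
  }
}

variable "azs" {
  description = "Exactly four Availability Zones, one per subnet in each group"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"] # hypothetical region

  validation {
    condition     = length(var.azs) == 4
    error_message = "The azs value must contain exactly four Availability Zones."
  }
}
```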
Additionally, the VPC module configuration above makes sure the “default” Security Group in the VPC does not allow any ingress or egress traffic. This is a security good practice: it ensures we do not accidentally open access to new resources that end up with the “default” security group when they are provisioned.
For new resources, a good practice is to create dedicated security groups and pair them with the resources they protect, so that rules are always targeted and specific. This is again the closed-open principle: close all access by default, and open only what the workload or resource needs to perform its job.
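Below is a minimal sketch of that practice. It assumes an application tier that talks to a PostgreSQL database on port 5432; both tiers, their names, and the port are hypothetical. Each tier gets its own security group, and the only rules are the ones that let the application reach the database. When Terraform creates an aws_security_group, it removes the allow-all egress rule that AWS adds by default, so egress must also be opened explicitly. The rules are separate aws_security_group_rule resources because inline rules referencing each other would create a dependency cycle between the two groups.

```hcl
# Hypothetical application tier security group (attach it to the app instances).
resource "aws_security_group" "app" {
  name_prefix = "app-"
  description = "Application tier"
  vpc_id      = module.vpc.vpc_id
}

# Hypothetical database security group (attach it to the database in Private Subnets B).
resource "aws_security_group" "db" {
  name_prefix = "db-"
  description = "Database tier"
  vpc_id      = module.vpc.vpc_id
}

# The app tier may open connections to the database port, and nowhere else.
resource "aws_security_group_rule" "app_egress_to_db" {
  type                     = "egress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.app.id
  source_security_group_id = aws_security_group.db.id
}

# The database only accepts connections coming from the app tier.
# Security groups are stateful, so the replies are allowed automatically.
resource "aws_security_group_rule" "db_ingress_from_app" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}
```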
Bonus: Expose VPC Related IDs as CloudFormation Outputs
We regularly work with Serverless applications that need the VPC and subnet IDs. It is therefore useful to export the generated VPC-related IDs as CloudFormation outputs, where CloudFormation-based stacks and frameworks can consume them more easily.
To achieve this, we can pair the Terraform code above with another Terraform module that creates a CloudFormation stack exposing the values. We use a GitHub fork of a registry module, explained after the code:
module "vpc_stack_outputs" {
  source = "github.com/labinhood/terraform-aws-stack-outputs"
  name   = "TerraformOutputs-MainVPC"

  outputs = {
    VpcId              = module.vpc.vpc_id
    VpcName            = module.vpc.name
    VpcArn             = module.vpc.vpc_arn
    VpcCidr            = module.vpc.vpc_cidr_block
    AvailabilityZone1A = module.vpc.azs[0]
    AvailabilityZone1B = module.vpc.azs[1]
    AvailabilityZone1C = module.vpc.azs[2]
    AvailabilityZone1D = module.vpc.azs[3]
    PrivateSubnet1A    = module.vpc.private_subnets[0]
    PrivateSubnet1B    = module.vpc.private_subnets[1]
    PrivateSubnet1C    = module.vpc.private_subnets[2]
    PrivateSubnet1D    = module.vpc.private_subnets[3]
    PublicSubnet1A     = module.vpc.public_subnets[0]
    PublicSubnet1B     = module.vpc.public_subnets[1]
    PublicSubnet1C     = module.vpc.public_subnets[2]
    PublicSubnet1D     = module.vpc.public_subnets[3]
    DbSubnet1A         = module.vpc.database_subnets[0]
    DbSubnet1B         = module.vpc.database_subnets[1]
    DbSubnet1C         = module.vpc.database_subnets[2]
    DbSubnet1D         = module.vpc.database_subnets[3]
  }
}
The original Stack Outputs module on the Terraform Registry has a version constraint that prevents it from being used with Terraform versions newer than 0.12. We recently found that the original code works well with Terraform v1.3.7, so we created a public fork on GitHub that removes that constraint. The forked module is available at:
https://github.com/labinhood/terraform-aws-stack-outputs
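Other Terraform configurations can also read these values back from the stack. Below is a minimal sketch. It assumes the stack is named TerraformOutputs-MainVPC as above and lives in the same account and region. It also assumes the module turns each key of its outputs map into a CloudFormation output with the same name. The local names are hypothetical.

```hcl
# Look up the CloudFormation stack created by the vpc_stack_outputs module.
data "aws_cloudformation_stack" "main_vpc" {
  name = "TerraformOutputs-MainVPC"
}

locals {
  # Hypothetical local names; the keys match the outputs map defined above.
  main_vpc_id = data.aws_cloudformation_stack.main_vpc.outputs["VpcId"]

  main_vpc_private_subnet_ids = [
    for key in ["PrivateSubnet1A", "PrivateSubnet1B", "PrivateSubnet1C", "PrivateSubnet1D"] :
    data.aws_cloudformation_stack.main_vpc.outputs[key]
  ]
}
```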
Conclusion
A VPC prepared for the future, with a design that enables secure, highly available (available most of the time) and resilient (able to recover from disruptions) solutions, can be an essential component of any well-rounded architecture.
In this post, we created a VPC based on good practices originally proposed by AWS in its Solutions Library, incorporated a couple of tweaks, and deployed it via Terraform.
References
https://aws.amazon.com/solutions/implementations/vpc/
https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
https://developer.hashicorp.com/terraform/language/functions/cidrsubnet