Game Servers on Demand with Terraform, ECS, and a DNS Trick
Summer break was coming up and my kid wanted a Minecraft server so friends could join. I figured I'd spin one up on AWS with Terraform since I already had the infrastructure chops. The problem though? I didn't want to pay for an EC2 instance running 24/7 when a bunch of kids only play for a few hours a day.
So I built something that boots the server on demand when someone tries to connect, shuts it down when nobody's playing, and costs almost nothing when idle. It worked so well that I ended up adding Factorio and Satisfactory servers using the same pattern.
The DNS Auto-Start Trick
This is the centerpiece of the whole setup and I'm pretty proud of it.
When nobody is playing, the game server's DNS record in Route53 just points at 1.1.1.1. The server isn't running. There's no ECS task, no EC2 instance, nothing. The only cost is the Route53 hosted zone.
Here's what happens when someone tries to connect:
- The Minecraft client does a DNS lookup for `minecraft.example.com`
- Route53 query logging captures that lookup and ships it to CloudWatch Logs
- A CloudWatch subscription filter matches the query and triggers a Lambda function
- The Lambda sets the ECS service `desired_count` from 0 to 1
- ECS Fargate spins up the container, and the watchdog container updates the Route53 record to point at the actual task IP
- SNS fires a notification so I know the server just started
- The player retries the connection a minute later and they're in
The beauty is that the kids don't need to ask me to start the server. They just open Minecraft, try to connect, and it boots up. They get a connection timeout the first try, wait a bit, and then they're playing.
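For reference, the "trick" on the DNS side is nothing more than a Route53 UPSERT. Here's a minimal boto3 sketch of the record flip the watchdog performs; the zone ID and record name are placeholders:

```python
import boto3

route53 = boto3.client("route53")

def point_record(zone_id: str, record: str, ip: str) -> None:
    """UPSERT the A record: the task IP on startup, 1.1.1.1 again on shutdown."""
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record,
                    "Type": "A",
                    "TTL": 60,  # short TTL so clients re-resolve quickly
                    "ResourceRecords": [{"Value": ip}],
                },
            }]
        },
    )

# e.g. point_record("Z123EXAMPLE", "minecraft.example.com", "1.1.1.1")
```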
The Terraform for the Lambda trigger looks roughly like this:
resource "aws_cloudwatch_log_subscription_filter" "minecraft_dns" {
name = "minecraft-dns-trigger"
log_group_name = aws_cloudwatch_log_group.route53_query_log.name
filter_pattern = "minecraft.example.com"
destination_arn = aws_lambda_function.start_server.arn
}
resource "aws_lambda_function" "start_server" {
function_name = "minecraft-start"
runtime = "python3.11"
handler = "index.handler"
filename = data.archive_file.start_server.output_path
environment {
variables = {
ECS_CLUSTER = aws_ecs_cluster.games.name
ECS_SERVICE = aws_ecs_service.minecraft.name
}
}
}
And the Lambda itself is dead simple:
```python
import boto3
import os

def handler(event, context):
    ecs = boto3.client("ecs")
    service = ecs.describe_services(
        cluster=os.environ["ECS_CLUSTER"],
        services=[os.environ["ECS_SERVICE"]],
    )["services"][0]

    # Only scale up if the server is actually stopped
    if service["desiredCount"] == 0:
        ecs.update_service(
            cluster=os.environ["ECS_CLUSTER"],
            service=os.environ["ECS_SERVICE"],
            desiredCount=1,
        )
```
It checks if the server is already running first so it doesn't thrash on repeated DNS queries.
Minecraft Server Setup
The Minecraft server runs on ECS Fargate using Spot capacity (the FARGATE_SPOT capacity provider). I'm using the itzg/minecraft-server Docker image, which is fantastic: it handles all the server configuration through environment variables.
The task definition runs two containers:
The Minecraft server container:
- 5GB memory, 2 vCPU
- Forge server on 1.20.1 with CurseForge mod integration
- EFS volume for persistent world storage

The watchdog container (sketched just below):
- Monitors the player count every few minutes
- After 20 minutes of zero players, sets desired_count back to 0
- Updates the Route53 record back to 1.1.1.1 on shutdown
- Fires an SNS notification so I know it stopped
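Here's a minimal sketch of that watchdog loop. get_player_count() is a hypothetical helper (in practice you'd do a status ping or an RCON list, which is game-specific), point_record() is the Route53 helper from the earlier sketch, and ECS_CLUSTER/ECS_SERVICE are assumed environment variables on top of the ones shown in the task definition:

```python
import os
import time
import boto3

IDLE_TIMEOUT = int(os.environ.get("IDLE_TIMEOUT_MIN", "20")) * 60
POLL_INTERVAL = 120  # check the player count every couple of minutes

def get_player_count() -> int:
    """Hypothetical helper: ask the game server for the current player
    count (status ping, RCON 'list', etc.); the real one is game-specific."""
    raise NotImplementedError

def main():
    ecs = boto3.client("ecs")
    sns = boto3.client("sns")
    idle_since = None

    while True:
        if get_player_count() > 0:
            idle_since = None  # someone is on, reset the idle timer
        elif idle_since is None:
            idle_since = time.time()  # just went idle, start the clock
        elif time.time() - idle_since >= IDLE_TIMEOUT:
            # Idle too long: park the DNS record, scale to zero, notify.
            # point_record() is from the earlier sketch; ECS_CLUSTER and
            # ECS_SERVICE are assumed env vars not shown in the task definition.
            point_record(os.environ["ROUTE53_ZONE_ID"],
                         os.environ["ROUTE53_RECORD"], "1.1.1.1")
            ecs.update_service(cluster=os.environ["ECS_CLUSTER"],
                               service=os.environ["ECS_SERVICE"],
                               desiredCount=0)
            sns.publish(TopicArn=os.environ["SNS_TOPIC_ARN"],
                        Message=f"{os.environ['GAME']} server stopped after idle timeout")
            break
        time.sleep(POLL_INTERVAL)
```

The Terraform task definition wires the two containers together: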
resource "aws_ecs_task_definition" "minecraft" {
family = "minecraft"
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
cpu = 2048
memory = 5120
volume {
name = "minecraft-data"
efs_volume_configuration {
file_system_id = aws_efs_file_system.minecraft.id
root_directory = "/"
}
}
container_definitions = jsonencode([
{
name = "minecraft"
image = "itzg/minecraft-server:latest"
essential = true
portMappings = [
{ containerPort = 25565, protocol = "tcp" }
]
environment = [
{ name = "EULA", value = "TRUE" },
{ name = "TYPE", value = "FORGE" },
{ name = "VERSION", value = "1.20.1" },
{ name = "CF_API_KEY", value = var.curseforge_api_key },
{ name = "MEMORY", value = "4G" },
]
mountPoints = [
{
sourceVolume = "minecraft-data"
containerPath = "/data"
}
]
},
{
name = "watchdog"
image = var.watchdog_image
essential = false
environment = [
{ name = "GAME", value = "minecraft" },
{ name = "IDLE_TIMEOUT_MIN", value = "20" },
{ name = "ROUTE53_ZONE_ID", value = var.zone_id },
{ name = "ROUTE53_RECORD", value = "minecraft.example.com" },
{ name = "SNS_TOPIC_ARN", value = aws_sns_topic.game_alerts.arn },
]
}
])
}
resource "aws_ecs_service" "minecraft" {
name = "minecraft"
cluster = aws_ecs_cluster.games.id
task_definition = aws_ecs_task_definition.minecraft.arn
desired_count = 0
launch_type = "FARGATE"
capacity_provider_strategy {
capacity_provider = "FARGATE_SPOT"
weight = 1
}
network_configuration {
subnets = var.public_subnets
security_groups = [aws_security_group.minecraft.id]
assign_public_ip = true
}
}
Notice desired_count = 0. That's the resting state. Nobody's playing, nothing's running.
EFS handles world persistence so the world data survives container restarts and Spot interruptions. I set up automated backups on a schedule: hourly, daily, weekly, and monthly. Losing a kid's Minecraft world would be a disaster. Trust me on that one.
resource "aws_backup_plan" "game_data" {
name = "game-server-backups"
rule {
rule_name = "hourly"
target_vault_name = aws_backup_vault.games.name
schedule = "cron(0 * * * ? *)"
lifecycle {
delete_after = 1
}
}
rule {
rule_name = "daily"
target_vault_name = aws_backup_vault.games.name
schedule = "cron(0 0 * * ? *)"
lifecycle {
delete_after = 7
}
}
rule {
rule_name = "weekly"
target_vault_name = aws_backup_vault.games.name
schedule = "cron(0 0 ? * SUN *)"
lifecycle {
delete_after = 30
}
}
rule {
rule_name = "monthly"
target_vault_name = aws_backup_vault.games.name
schedule = "cron(0 0 1 * ? *)"
lifecycle {
delete_after = 365
}
}
}
Adding Factorio
Since the Minecraft setup worked so well, I couldn't resist adding a Factorio server. Same DNS trick, same auto-start pattern.
Factorio needed more resources, though: 8GB RAM and 4 vCPU. I'm using the factoriotools/factorio:stable image. The main differences from Minecraft:
- Factorio uses UDP instead of TCP, so the security group and health checks had to change
- I wrote a custom UDP probe for the health check, since Fargate's built-in health checks don't support UDP (sketch after this list)
- RCON is enabled for remote console access, which is handy for server management without SSH
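The probe is just a tiny script that ECS runs as the container health check command: fire a datagram at the game port and exit 0 if anything comes back before the timeout. This is a sketch with a placeholder payload; the real probe sends whatever bytes actually provoke a response from Factorio:

```python
import socket
import sys

HOST, PORT, TIMEOUT = "127.0.0.1", 34197, 3.0

def probe() -> bool:
    """Send a datagram to the server and wait for any reply.
    The payload here is a placeholder; a real probe would speak
    enough of the game's protocol to get an answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(TIMEOUT)
        try:
            sock.sendto(b"\x00", (HOST, PORT))
            sock.recvfrom(1024)  # any response means the port is alive
            return True
        except socket.timeout:
            return False

if __name__ == "__main__":
    sys.exit(0 if probe() else 1)  # the exit code is what the health check consumes
```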
resource "aws_security_group" "factorio" {
name = "factorio-server"
vpc_id = var.vpc_id
ingress {
from_port = 34197
to_port = 34197
protocol = "udp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 27015
to_port = 27015
protocol = "tcp"
cidr_blocks = [var.admin_cidr]
description = "RCON"
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
Factorio runs on Fargate too, with Spot as an option I can toggle. The watchdog container uses the same logic: poll player count, shut down after idle timeout, flip the DNS record back to 1.1.1.1.
Adding Satisfactory
Satisfactory was a different beast. Fargate just couldn't cut it.
The problem though? Satisfactory needs sustained high single-thread CPU performance. Fargate's shared vCPUs don't give you that. The game would stutter and lag even with plenty of memory allocated. So I had to go with EC2.
I'm running it on an i7i.xlarge instance. Spot is explicitly disabled for this one. Satisfactory saves are enormous, the game doesn't handle interruptions gracefully, and nobody wants to lose 4 hours of factory building because AWS reclaimed a Spot instance. The cost premium is worth it.
The DNS auto-start trick still works here, but instead of setting an ECS desired_count, the Lambda starts/stops the EC2 instance directly:
```python
import boto3
import os

def handler(event, context):
    ec2 = boto3.client("ec2")
    instance_id = os.environ["INSTANCE_ID"]

    status = ec2.describe_instances(
        InstanceIds=[instance_id]
    )["Reservations"][0]["Instances"][0]["State"]["Name"]

    # Only start from a clean stop; ignore pending/stopping/running states
    if status == "stopped":
        ec2.start_instances(InstanceIds=[instance_id])
```
The watchdog runs as a systemd service on the EC2 instance itself instead of a sidecar container. Same idea, different execution.
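A sketch of that systemd-managed watchdog, under the same assumptions as the ECS one (get_player_count() and point_record() are hypothetical helpers, ZONE_ID and RECORD are placeholders). The one new piece is that it discovers its own instance ID through IMDSv2 and stops itself:

```python
import time
import urllib.request
import boto3

def instance_id() -> str:
    """Fetch our own instance ID from IMDSv2."""
    req = urllib.request.Request(
        "http://169.254.169.254/latest/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"})
    token = urllib.request.urlopen(req).read().decode()
    req = urllib.request.Request(
        "http://169.254.169.254/latest/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token})
    return urllib.request.urlopen(req).read().decode()

def main():
    ec2 = boto3.client("ec2")
    idle_since = None
    while True:
        # get_player_count() and point_record() are the same hypothetical
        # helpers as in the ECS watchdog sketch; ZONE_ID/RECORD are placeholders
        if get_player_count() > 0:
            idle_since = None
        elif idle_since is None:
            idle_since = time.time()
        elif time.time() - idle_since >= 20 * 60:
            point_record(ZONE_ID, RECORD, "1.1.1.1")  # park the DNS record
            ec2.stop_instances(InstanceIds=[instance_id()])  # stop ourselves
            break
        time.sleep(120)
```

Because it stops rather than terminates, the EBS volume and the saves survive, and the DNS trick starts the instance back up the next time someone connects.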
Control Panels
Basically I got tired of SSH-ing into things or opening the AWS console every time I wanted to check on a server. So each game server got its own control panel.
They're Lambda-backed web UIs with a dark theme. Password-protected using HMAC-SHA256 tokens so only people who know the password can hit the start/stop buttons. Each panel shows real-time server status with polling, so you can see if the server is starting up, running, or stopped.
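The token scheme is roughly this shape, as a minimal sketch: it assumes a shared secret derived from the panel password and an expiry timestamp baked into the signed message.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # placeholder: derived from the panel password

def make_token(ttl_seconds: int = 3600) -> str:
    """Sign an expiry timestamp; the token is 'expiry.signature'."""
    expiry = str(int(time.time()) + ttl_seconds)
    sig = hmac.new(SECRET, expiry.encode(), hashlib.sha256).hexdigest()
    return f"{expiry}.{sig}"

def verify_token(token: str) -> bool:
    try:
        expiry, sig = token.split(".")
    except ValueError:
        return False
    expected = hmac.new(SECRET, expiry.encode(), hashlib.sha256).hexdigest()
    # constant-time comparison, and reject expired tokens
    return hmac.compare_digest(sig, expected) and time.time() < int(expiry)
```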
Nothing fancy, just functional. HTML, a little JavaScript, and a Lambda function URL. Keeps the kids from bugging me to check if the server is up.
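The status polling is basically the start Lambda's read-only twin. A hedged sketch of a function URL handler, assuming the same ECS_CLUSTER/ECS_SERVICE environment variables:

```python
import json
import os
import boto3

ecs = boto3.client("ecs")

def handler(event, context):
    """Report whether the service is stopped, starting, or running,
    for the panel's polling loop to render."""
    svc = ecs.describe_services(
        cluster=os.environ["ECS_CLUSTER"],
        services=[os.environ["ECS_SERVICE"]],
    )["services"][0]

    if svc["desiredCount"] == 0:
        state = "stopped"
    elif svc["runningCount"] < svc["desiredCount"]:
        state = "starting"
    else:
        state = "running"

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"state": state}),
    }
```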
What It Actually Costs
This is the whole point of the on-demand setup. When nobody is playing, there's no compute running. The only baseline cost is EFS storage and Route53, which runs me about $18/month across all three servers. That's it. Three game servers, persistent world data, automated backups, sitting there waiting for someone to connect.
When a server is actually running, the compute costs kick in:
| Server | Hourly | If you played every day for ~3hrs |
|---|---|---|
| Minecraft (Fargate Spot) | ~$0.05-0.10/hr | ~$5-10/mo |
| Factorio (Fargate) | ~$0.10-0.20/hr | ~$10-20/mo |
| Satisfactory (EC2 i7i.xlarge) | ~$0.18/hr | ~$15-20/mo |
In practice, I'm not playing all three games every month, and I'm definitely not running them all at once. I can't play Factorio and Satisfactory at the same time. That's kind of the whole point of the auto-shutdown. The server boots when you connect, shuts down when you leave, and you only pay for the hours you actually played.
A typical month where I'm actively playing one game might add $10-20 in compute on top of the $18 baseline. Some months I don't play any of them and it's just the storage cost.
What started as "let me spin up a Minecraft server for the kid's summer break" turned into a whole game server platform. Thankfully, the DNS auto-start pattern made it easy to replicate. Each new game server is mostly the same Terraform module with different container configs and resource allocations. The kids are happy, their friends can join whenever they want, and I got to build something fun. Sweet.