Game Servers on Demand with Terraform, ECS, and a DNS Trick

Summer break was coming up and my kid wanted a Minecraft server so friends could join. I figured I'd spin one up on AWS with Terraform, since I already had the infrastructure chops. The problem? I didn't want to pay for an EC2 instance running 24/7 when a bunch of kids only play a few hours a day.

So I built something that boots the server on demand when someone tries to connect, shuts it down when nobody's playing, and costs almost nothing when idle. It worked so well that I ended up adding Factorio and Satisfactory servers using the same pattern.

The DNS Auto-Start Trick

This is the centerpiece of the whole setup and I'm pretty proud of it.

When nobody is playing, the game server's DNS record in Route53 just points at 1.1.1.1. The server isn't running. There's no ECS task, no EC2 instance, nothing. The only cost is the Route53 hosted zone.

Here's what happens when someone tries to connect:

  1. The Minecraft client does a DNS lookup for minecraft.example.com
  2. Route53 query logging captures that lookup and ships it to CloudWatch Logs
  3. A CloudWatch subscription filter matches the query and triggers a Lambda function
  4. The Lambda sets the ECS service desired_count from 0 to 1
  5. ECS Fargate spins up the container, and the watchdog container updates the Route53 record to point at the actual task IP
  6. SNS fires a notification so I know the server just started
  7. The player retries the connection a minute later and they're in

The beauty is that the kids don't need to ask me to start the server. They just open Minecraft, try to connect, and it boots up. They get a connection timeout the first try, wait a bit, and then they're playing.
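
One gotcha worth calling out before the Terraform: Route53 public query logging can only write to CloudWatch Logs in us-east-1, and the log group needs a resource policy that lets Route53 write to it. In boto3 terms the wiring looks roughly like this; the names, account ID, and zone ID are placeholders:

import json
import boto3

logs = boto3.client("logs", region_name="us-east-1")  # query logs must live in us-east-1
route53 = boto3.client("route53")

LOG_GROUP = "/aws/route53/example.com"  # hypothetical name
logs.create_log_group(logGroupName=LOG_GROUP)

# Route53 needs explicit permission to write into the group
logs.put_resource_policy(
    policyName="route53-query-logging",
    policyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "route53.amazonaws.com"},
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/route53/*",
        }],
    }),
)

# turn on query logging for the zone
route53.create_query_logging_config(
    HostedZoneId="Z0123456789EXAMPLE",  # hypothetical zone ID
    CloudWatchLogsLogGroupArn=f"arn:aws:logs:us-east-1:123456789012:log-group:{LOG_GROUP}",
)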

The Terraform for the Lambda trigger looks roughly like this:

resource "aws_cloudwatch_log_subscription_filter" "minecraft_dns" {
  name            = "minecraft-dns-trigger"
  log_group_name  = aws_cloudwatch_log_group.route53_query_log.name
  # terms with non-alphanumeric characters (like the dots) have to be quoted inside the pattern
  filter_pattern  = "\"minecraft.example.com\""
  destination_arn = aws_lambda_function.start_server.arn
}
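
# CloudWatch Logs also needs permission to invoke the function,
# or the subscription filter can't deliver events
resource "aws_lambda_permission" "allow_cloudwatch" {
  statement_id  = "AllowCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.start_server.function_name
  principal     = "logs.amazonaws.com"
  source_arn    = "${aws_cloudwatch_log_group.route53_query_log.arn}:*"
}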

resource "aws_lambda_function" "start_server" {
  function_name = "minecraft-start"
  runtime       = "python3.11"
  handler       = "index.handler"
  filename      = data.archive_file.start_server.output_path

  environment {
    variables = {
      ECS_CLUSTER = aws_ecs_cluster.games.name
      ECS_SERVICE = aws_ecs_service.minecraft.name
    }
  }
}

And the Lambda itself is dead simple:

import boto3
import os

def handler(event, context):
    # The matched log events arrive gzipped and base64-encoded in the payload,
    # but there's no need to parse them: any hit on the filter means someone
    # is trying to connect.
    ecs = boto3.client("ecs")
    service = ecs.describe_services(
        cluster=os.environ["ECS_CLUSTER"],
        services=[os.environ["ECS_SERVICE"]],
    )["services"][0]

    if service["desiredCount"] == 0:
        ecs.update_service(
            cluster=os.environ["ECS_CLUSTER"],
            service=os.environ["ECS_SERVICE"],
            desiredCount=1,
        )

It checks if the server is already running first so it doesn't thrash on repeated DNS queries.

Minecraft Server Setup

The Minecraft server runs on ECS Fargate with Fargate Spot capacity. I'm using the itzg/minecraft-server Docker image, which is fantastic. It handles all the server configuration through environment variables.

The task definition runs two containers:

The Minecraft server container:
  • 5GB memory, 2 vCPU
  • Forge server on 1.20.1 with CurseForge mod integration
  • EFS volume for persistent world storage

The watchdog container:
  • Monitors the player count every few minutes
  • After 20 minutes of zero players, sets desired_count back to 0
  • Updates the Route53 record back to 1.1.1.1 on shutdown
  • Fires an SNS notification so I know it stopped
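
The watchdog image is its own small thing, but the core loop is easy to sketch. This is a minimal version of the idea, assuming mcstatus for the player count, and assuming the container also gets ECS_CLUSTER and ECS_SERVICE on top of the variables shown in the task definition below:

import os
import time

import boto3
from mcstatus import JavaServer  # assumption: player count via a status ping

IDLE_LIMIT = int(os.environ["IDLE_TIMEOUT_MIN"])
PARKING_IP = "1.1.1.1"

def point_record_at(ip):
    # UPSERT the game's A record: the task's public IP on boot, 1.1.1.1 on shutdown
    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=os.environ["ROUTE53_ZONE_ID"],
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": os.environ["ROUTE53_RECORD"],
                "Type": "A",
                "TTL": 60,
                "ResourceRecords": [{"Value": ip}],
            },
        }]},
    )

# on boot the real watchdog looks up the task's public IP and points the record at it
idle = 0
while True:
    time.sleep(60)
    # both containers share the task's network namespace, so localhost works
    players = JavaServer.lookup("127.0.0.1:25565").status().players.online
    idle = 0 if players > 0 else idle + 1
    if idle >= IDLE_LIMIT:
        point_record_at(PARKING_IP)
        boto3.client("sns").publish(
            TopicArn=os.environ["SNS_TOPIC_ARN"],
            Message=f"minecraft: idle for {IDLE_LIMIT} min, shutting down",
        )
        boto3.client("ecs").update_service(
            cluster=os.environ["ECS_CLUSTER"],
            service=os.environ["ECS_SERVICE"],
            desiredCount=0,
        )
        break

The task definition that wires both containers together: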

resource "aws_ecs_task_definition" "minecraft" {
  family                   = "minecraft"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 2048
  memory                   = 5120

  volume {
    name = "minecraft-data"
    efs_volume_configuration {
      file_system_id = aws_efs_file_system.minecraft.id
      root_directory = "/"
    }
  }

  container_definitions = jsonencode([
    {
      name      = "minecraft"
      image     = "itzg/minecraft-server:latest"
      essential = true
      portMappings = [
        { containerPort = 25565, protocol = "tcp" }
      ]
      environment = [
        { name = "EULA",         value = "TRUE" },
        { name = "TYPE",         value = "FORGE" },
        { name = "VERSION",      value = "1.20.1" },
        { name = "CF_API_KEY",   value = var.curseforge_api_key },
        { name = "MEMORY",       value = "4G" },
      ]
      mountPoints = [
        {
          sourceVolume  = "minecraft-data"
          containerPath = "/data"
        }
      ]
    },
    {
      name      = "watchdog"
      image     = var.watchdog_image
      essential = false
      environment = [
        { name = "GAME",             value = "minecraft" },
        { name = "IDLE_TIMEOUT_MIN", value = "20" },
        { name = "ROUTE53_ZONE_ID",  value = var.zone_id },
        { name = "ROUTE53_RECORD",   value = "minecraft.example.com" },
        { name = "SNS_TOPIC_ARN",    value = aws_sns_topic.game_alerts.arn },
      ]
    }
  ])
}

resource "aws_ecs_service" "minecraft" {
  name            = "minecraft"
  cluster         = aws_ecs_cluster.games.id
  task_definition = aws_ecs_task_definition.minecraft.arn
  desired_count   = 0
  launch_type     = "FARGATE"

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 1
  }

  network_configuration {
    subnets          = var.public_subnets
    security_groups  = [aws_security_group.minecraft.id]
    assign_public_ip = true
  }
}

Notice desired_count = 0. That's the resting state. Nobody's playing, nothing's running.

EFS handles world persistence so the world data survives container restarts and Spot interruptions. I set up automated backups on a schedule: hourly, daily, weekly, and monthly. Losing a kid's Minecraft world would be a disaster. Trust me on that one.

resource "aws_backup_plan" "game_data" {
  name = "game-server-backups"

  rule {
    rule_name         = "hourly"
    target_vault_name = aws_backup_vault.games.name
    schedule          = "cron(0 * * * ? *)"
    lifecycle {
      delete_after = 1 # retention in days
    }
  }

  rule {
    rule_name         = "daily"
    target_vault_name = aws_backup_vault.games.name
    schedule          = "cron(0 0 * * ? *)"
    lifecycle {
      delete_after = 7
    }
  }

  rule {
    rule_name         = "weekly"
    target_vault_name = aws_backup_vault.games.name
    schedule          = "cron(0 0 ? * SUN *)"
    lifecycle {
      delete_after = 30
    }
  }

  rule {
    rule_name         = "monthly"
    target_vault_name = aws_backup_vault.games.name
    schedule          = "cron(0 0 1 * ? *)"
    lifecycle {
      delete_after = 365
    }
  }
}

Adding Factorio

Since the Minecraft setup worked so well, I couldn't resist adding a Factorio server. Same DNS trick, same auto-start pattern.

Factorio needed more resources though. 8GB RAM, 4 vCPU. I'm using the factoriotools/factorio:stable image. The main differences from Minecraft:

  • Factorio uses UDP instead of TCP, so the security group and health checks had to change
  • I wrote a custom UDP probe for the health check, since Fargate's built-in health checks don't support UDP (sketched below)
  • RCON is enabled for remote console access, which is handy for server management without SSH
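
That probe is worth a sketch. This isn't the exact code, but the idea is to fire a datagram at the port and treat an ICMP port-unreachable reply, which Linux surfaces as a connection-refused error on a connected UDP socket, as "down":

import socket
import sys

def udp_alive(host, port, timeout=3.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))  # "connect" just pins the peer address
        sock.send(b"\x00")          # the payload doesn't matter for this check
        try:
            sock.recv(1)            # any reply means something is listening
        except socket.timeout:
            pass                    # silence is normal for UDP; still call it up
        return True
    except ConnectionRefusedError:
        return False                # the kernel saw ICMP port unreachable
    finally:
        sock.close()

if __name__ == "__main__":
    sys.exit(0 if udp_alive("127.0.0.1", 34197) else 1)

The security group opens the game port to the world and locks RCON down to my own IP:
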
resource "aws_security_group" "factorio" {
  name   = "factorio-server"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 34197
    to_port     = 34197
    protocol    = "udp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 27015
    to_port     = 27015
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
    description = "RCON"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Factorio runs on Fargate too, with Spot as an option I can toggle. The watchdog container uses the same logic: poll player count, shut down after idle timeout, flip the DNS record back to 1.1.1.1.

Adding Satisfactory

Satisfactory was a different beast. Fargate just couldn't cut it.

The problem: Satisfactory needs sustained, high single-thread CPU performance, and Fargate's shared vCPUs don't deliver it. The game would stutter and lag even with plenty of memory allocated. So I had to go with EC2.

I'm running it on an i7i.xlarge instance. Spot is explicitly disabled for this one. Satisfactory saves are enormous, the game doesn't handle interruptions gracefully, and nobody wants to lose 4 hours of factory building because AWS reclaimed a Spot instance. The cost premium is worth it.

The DNS auto-start trick still works here, but instead of setting an ECS desired_count, the Lambda starts/stops the EC2 instance directly:

import boto3
import os

def handler(event, context):
    ec2 = boto3.client("ec2")
    instance_id = os.environ["INSTANCE_ID"]

    # state is one of: pending, running, stopping, stopped, ...
    status = ec2.describe_instances(
        InstanceIds=[instance_id]
    )["Reservations"][0]["Instances"][0]["State"]["Name"]

    # only start from a clean stop; queries that arrive while
    # it's already booting are ignored
    if status == "stopped":
        ec2.start_instances(InstanceIds=[instance_id])

The watchdog runs as a systemd service on the EC2 instance itself instead of a sidecar container. Same idea, different execution.
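
The stop side is the same shape as the ECS watchdog loop; only the shutdown call changes. A rough sketch, with the player-count check stubbed out since it depends on how you query Satisfactory:

import time
import urllib.request

import boto3

def my_instance_id():
    # ask IMDSv2 who we are instead of hardcoding the instance ID
    req = urllib.request.Request(
        "http://169.254.169.254/latest/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"})
    token = urllib.request.urlopen(req).read().decode()
    req = urllib.request.Request(
        "http://169.254.169.254/latest/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token})
    return urllib.request.urlopen(req).read().decode()

def player_count():
    return 0  # placeholder: query the game server however works for you

idle = 0
while True:
    time.sleep(60)
    idle = 0 if player_count() > 0 else idle + 1
    if idle >= 20:
        # flip the DNS record back to 1.1.1.1 and notify, same as the
        # ECS version, then the instance stops itself
        boto3.client("ec2").stop_instances(InstanceIds=[my_instance_id()])
        break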

Control Panels

Basically I got tired of SSH-ing into things or opening the AWS console every time I wanted to check on a server. So each game server got its own control panel.

They're Lambda-backed web UIs with a dark theme. Password-protected using HMAC-SHA256 tokens so only people who know the password can hit the start/stop buttons. Each panel shows real-time server status with polling, so you can see if the server is starting up, running, or stopped.
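
The token scheme is simple enough to sketch. This isn't the panel's actual code, just the shape of it, with the signing secret derived from the panel password:

import hashlib
import hmac
import time

SECRET = hashlib.sha256(b"the-panel-password").digest()  # assumption: secret from the password

def make_token(ttl_seconds=3600):
    # token = expiry timestamp plus an HMAC-SHA256 signature over it
    exp = str(int(time.time()) + ttl_seconds)
    sig = hmac.new(SECRET, exp.encode(), hashlib.sha256).hexdigest()
    return exp + "." + sig

def check_token(token):
    try:
        exp, sig = token.split(".")
    except ValueError:
        return False
    expected = hmac.new(SECRET, exp.encode(), hashlib.sha256).hexdigest()
    # constant-time comparison, and the token has to still be fresh
    return hmac.compare_digest(sig, expected) and int(exp) > time.time()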

Nothing fancy, just functional. HTML, a little JavaScript, and a Lambda function URL. Keeps the kids from bugging me to check if the server is up.

What It Actually Costs

This is the whole point of the on-demand setup. When nobody is playing, there's no compute running. The only baseline cost is EFS storage and Route53, which runs me about $18/month across all three servers. That's it. Three game servers, persistent world data, automated backups, sitting there waiting for someone to connect.

When a server is actually running, the compute costs kick in:

Server                           Hourly             ~3 hrs/day, for a month
Minecraft (Fargate Spot)         ~$0.05-0.10/hr     ~$5-10/mo
Factorio (Fargate)               ~$0.10-0.20/hr     ~$10-20/mo
Satisfactory (EC2 i7i.xlarge)    ~$0.18/hr          ~$15-20/mo
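
(The math for that last column is just hours times rate: ~3 hr/day × 30 days × $0.10/hr ≈ $9 for a month of Minecraft, on top of the baseline.)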

In practice, I'm not playing all three games every month, and I'm definitely not running them all at once. I can't play Factorio and Satisfactory at the same time. That's kind of the whole point of the auto-shutdown. The server boots when you connect, shuts down when you leave, and you only pay for the hours you actually played.

A typical month where I'm actively playing one game might add $10-20 in compute on top of the $18 baseline. Some months I don't play any of them and it's just the storage cost.

What started as "let me spin up a Minecraft server for the kid's summer break" turned into a whole game server platform. Thankfully, the DNS auto-start pattern made it easy to replicate. Each new game server is mostly the same Terraform module with different container configs and resource allocations. The kids are happy, their friends can join whenever they want, and I got to build something fun. Sweet.