Kubernetes Deployment (EKS)
Deploy Nemesis to Amazon EKS (Elastic Kubernetes Service) for production cloud deployments, with the same Dapr operator-managed sidecars and KEDA autoscaling as k3d/k3s.
Note
For local development, use k3d or k3s instead. EKS is intended for persistent cloud deployments where you need AWS-managed infrastructure, auto-scaling node groups, and internet-accessible endpoints.
Warning
EKS incurs AWS charges (~$244/month minimum). See Cost Management and always tear down when done.
Why EKS?
| Aspect | k3d | k3s | EKS |
|---|---|---|---|
| Runtime | k3s inside Docker | Native on host | AWS-managed Kubernetes |
| Infrastructure | Local | VM / bare-metal | AWS cloud |
| Cost | Free | Free | ~$300+/month |
| Storage | Local disk | Local disk | EBS (gp3) + EFS (elastic) |
| Load balancer | Docker port mapping | ServiceLB (Klipper) | AWS NLB |
| Node scaling | Manual | Manual | Managed node groups (auto) |
| Best for | Local dev, CI | VMs, production-like | Cloud production, team use |
System requirements are the same as k3d/k3s (4 cores, 12+ GB RAM per node, 100 GB disk).
Prerequisites
Tools
| Tool | Install |
|---|---|
| AWS CLI v2 | See AWS docs |
| eksctl | See Eksctl docs |
| kubectl | See K8s docs |
| Helm | curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \| bash |
Dapr and KEDA are installed automatically via Helm by the setup script.
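Before running the setup script, it can help to confirm that the four CLI tools from the table above are on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name and is not part of the Nemesis scripts.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the Nemesis repo.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage (tools listed under Prerequisites):
#   missing="$(missing_tools aws eksctl kubectl helm)"
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

An empty result means every listed tool was found; anything printed is a tool to install first.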
AWS Account Setup
You need an AWS account with an IAM user that has programmatic access.
1. Create an IAM user (or use an existing one)
In the IAM Console, create a user with programmatic access. Attach the policy below.
2. Required IAM permissions
eksctl needs permissions to create CloudFormation stacks, EC2 instances, EKS clusters, IAM roles, and more. Attach this policy to your IAM user:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"eks:*",
"ec2:*",
"elasticfilesystem:*",
"iam:CreateRole",
"iam:DeleteRole",
"iam:AttachRolePolicy",
"iam:DetachRolePolicy",
"iam:PutRolePolicy",
"iam:DeleteRolePolicy",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:PassRole",
"iam:ListRolePolicies",
"iam:ListAttachedRolePolicies",
"iam:CreateInstanceProfile",
"iam:DeleteInstanceProfile",
"iam:AddRoleToInstanceProfile",
"iam:RemoveRoleFromInstanceProfile",
"iam:GetInstanceProfile",
"iam:ListInstanceProfilesForRole",
"iam:CreateOpenIDConnectProvider",
"iam:DeleteOpenIDConnectProvider",
"iam:GetOpenIDConnectProvider",
"iam:ListOpenIDConnectProviders",
"iam:TagOpenIDConnectProvider",
"iam:CreatePolicy",
"iam:DeletePolicy",
"iam:GetPolicy",
"iam:ListEntitiesForPolicy",
"iam:CreateServiceLinkedRole",
"iam:TagRole",
"cloudformation:*",
"autoscaling:*",
"elasticloadbalancing:*",
"ssm:GetParameter",
"logs:*",
"sts:GetCallerIdentity",
"sts:DecodeAuthorizationMessage",
"kms:CreateKey",
"kms:CreateAlias",
"kms:DescribeKey",
"kms:ListAliases"
],
"Resource": "*"
}
]
}
Tip
For a quick start, you can use the AWS-managed AdministratorAccess policy instead. The policy above is a more restricted alternative for production use.
3. Create an Access Key
On the user's details page, click "Create access key" and select "Command Line Interface (CLI)". Check the confirmation box, add a description, and then save the Access Key ID and Secret Access Key for the next step.
4. Configure the AWS CLI
aws configure
# Enter your Access Key ID, Secret Access Key, and default region (e.g., us-east-1)
5. Verify your credentials
aws sts get-caller-identity
You should see your account ID, user ARN, and user ID.
Quick Start
# 1. Create EKS cluster with EBS CSI, EFS CSI, Traefik, Dapr, KEDA
./k8s/scripts/setup-cluster-eks.sh
# 2. If you chose not to regenerate values-eks.yaml during setup, update it with the
# NLB hostname printed by the setup script (edit nemesis.url to match your NLB hostname)
# 3. Deploy Nemesis
./k8s/scripts/deploy.sh install --values k8s/helm/nemesis/values-eks.yaml
# 4. Verify everything is running
./k8s/scripts/verify.sh
Access Nemesis at the NLB hostname printed during setup. The setup script generates random credentials and displays them at the end; save them.
Tip
Unlike k3d/k3s (which use the default n/n credentials), the EKS setup script automatically generates a random password with the username nemesis to avoid exposing an internet-facing deployment with guessable credentials. The generated htpasswd entry is written to values-eks.yaml.
Get the NLB hostname at any time:
kubectl get svc traefik -n kube-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Cluster Configuration
Instance Types
| Instance | vCPUs | RAM | Monthly Cost | Notes |
|---|---|---|---|---|
| m7i.large | 2 | 8 GB | ~$74/node | Minimum viable, tight on resources |
| m7i.xlarge | 4 | 16 GB | ~$147/node | Recommended (default) |
| m7i.2xlarge | 8 | 32 GB | ~$294/node | Production with LLM/monitoring stacks |
Costs are approximate on-demand prices for us-east-1. The m7i family (Intel Sapphire Rapids, 4th Gen) offers better price-performance and 10x network throughput vs older m5 instances at a similar price point.
Node Count
The default is 1 node. The Cluster Autoscaler (installed by the setup script) automatically adds nodes when KEDA scales pods beyond what the current nodes can handle.
- 1 node (m7i.xlarge): Default; autoscales up as needed
- 2 nodes (m7i.xlarge): Avoids cold-start delay when scaling
- 3-4 nodes (m7i.xlarge): Production with headroom for burst traffic
Environment Variable Overrides
All setup script parameters are configurable via environment variables:
# Example: 3-node cluster with m7i.2xlarge in us-west-2
CLUSTER_NAME=nemesis-prod \
AWS_REGION=us-west-2 \
NODE_TYPE=m7i.2xlarge \
NODE_COUNT=3 \
NODE_MIN=3 \
NODE_MAX=6 \
EKS_VERSION=1.31 \
./k8s/scripts/setup-cluster-eks.sh
| Variable | Default | Description |
|---|---|---|
| CLUSTER_NAME | nemesis | EKS cluster name |
| AWS_REGION | us-east-1 | AWS region |
| NODE_TYPE | m7i.xlarge | EC2 instance type for nodes |
| NODE_COUNT | 1 | Initial node count |
| NODE_MIN | 1 | Minimum nodes (auto-scaling) |
| NODE_MAX | 4 | Maximum nodes (auto-scaling) |
| EKS_VERSION | 1.35 | Kubernetes version |
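To preview which settings a given set of overrides would produce, the documented defaults can be mirrored with the shell's `${VAR:-default}` expansion. This is only an illustration of the pattern; `effective_config` is a hypothetical function, and the real setup-cluster-eks.sh may resolve or validate its parameters differently. The range check reflects the usual expectation that the initial node count lies between the autoscaling minimum and maximum.

```shell
# Illustrative only: resolves the documented defaults with ${VAR:-default}.
# effective_config is hypothetical; the real setup script may behave differently.
effective_config() {
  local name="${CLUSTER_NAME:-nemesis}"
  local region="${AWS_REGION:-us-east-1}"
  local type="${NODE_TYPE:-m7i.xlarge}"
  local count="${NODE_COUNT:-1}"
  local min="${NODE_MIN:-1}"
  local max="${NODE_MAX:-4}"
  local version="${EKS_VERSION:-1.35}"

  # The initial count should sit inside the autoscaling range.
  if [ "$min" -gt "$count" ] || [ "$count" -gt "$max" ]; then
    echo "error: need NODE_MIN <= NODE_COUNT <= NODE_MAX (got $min, $count, $max)" >&2
    return 1
  fi

  printf 'CLUSTER_NAME=%s\nAWS_REGION=%s\nNODE_TYPE=%s\n' "$name" "$region" "$type"
  printf 'NODE_COUNT=%s\nNODE_MIN=%s\nNODE_MAX=%s\nEKS_VERSION=%s\n' \
    "$count" "$min" "$max" "$version"
}

# Usage:
#   NODE_TYPE=m7i.2xlarge NODE_COUNT=3 NODE_MIN=3 NODE_MAX=6 effective_config
```

Running it with no overrides prints the defaults from the table, which makes it easy to spot a mistyped variable name before a 15-20 minute cluster build.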
Networking and Access
How It Works
The setup script installs Traefik as a LoadBalancer service with AWS NLB annotations. AWS automatically provisions a Network Load Balancer and assigns it a public hostname.
All Nemesis services are exposed through Traefik IngressRoute CRDs, just like k3d/k3s. The only difference is the entry point: NLB on port 443 instead of localhost:7443.
Getting the NLB Hostname
# Get the NLB hostname
kubectl get svc traefik -n kube-system -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Note
The NLB may take 2-5 minutes to provision after the setup script completes. DNS propagation may add another 1-2 minutes.
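Because the hostname field stays empty until AWS finishes provisioning, a small polling loop avoids re-running the command by hand. The sketch below is one way to do it; `retry_until_output` is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```shell
# Re-run a command until it prints something, or give up after N attempts.
# retry_until_output is a hypothetical helper, not part of the Nemesis scripts.
retry_until_output() {
  local attempts="$1" delay="$2" out i
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Ignore command failures; only non-empty output counts as success.
    out="$("$@" 2>/dev/null)" || true
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: poll every 15 s for up to 5 minutes (20 attempts).
#   retry_until_output 20 15 kubectl get svc traefik -n kube-system \
#     -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```

A non-zero exit status after the last attempt suggests the service is stuck rather than merely slow.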
Restricting Access
By default, the NLB is internet-facing. To restrict access to specific IP ranges:
- Find the NLB in the EC2 Console > Load Balancers
- Go to the NLB's Security tab
- Edit inbound rules to allow only your IP ranges on port 443
Alternatively, set the NLB to internal-only:
# Before running setup, or upgrade the existing Traefik release with
# (--reuse-values keeps the values already set on the release):
helm upgrade traefik traefik/traefik -n kube-system --reuse-values \
  --set "service.annotations.service\\.beta\\.kubernetes\\.io/aws-load-balancer-scheme=internal"
Self-Signed TLS (Default)
The setup script generates a self-signed TLS certificate, just like k3d/k3s. Your browser will show a certificate warning, which is expected.
Optional: ACM + Route53 for Proper TLS
To use a valid TLS certificate with a custom domain:
1. Set up a Route53 hosted zone for your domain in the Route53 Console.
2. Request an ACM certificate in the ACM Console for your domain (e.g., nemesis.example.com). Use DNS validation and add the CNAME record to Route53.
3. Create a CNAME record in Route53 pointing your domain to the NLB hostname:
nemesis.example.com → abc123.elb.us-east-1.amazonaws.com
4. Use NLB TLS termination by adding the ACM certificate ARN to the Traefik service:
# --reuse-values keeps the values already set on the release
helm upgrade traefik traefik/traefik -n kube-system --reuse-values \
  --set "service.annotations.service\\.beta\\.kubernetes\\.io/aws-load-balancer-ssl-cert=arn:aws:acm:us-east-1:123456789012:certificate/abc-123" \
  --set "service.annotations.service\\.beta\\.kubernetes\\.io/aws-load-balancer-ssl-ports=443"
5. Update values-eks.yaml with your domain:
nemesis:
  url: "https://nemesis.example.com/"
  port: 443
Then redeploy: ./k8s/scripts/deploy.sh install --values k8s/helm/nemesis/values-eks.yaml
Optional: AWS Managed Services
By default, all infrastructure (PostgreSQL, RabbitMQ, SeaweedFS) runs in-cluster as pods. For production workloads, you can swap these for AWS managed services.
PostgreSQL to Amazon RDS
Create an RDS PostgreSQL instance and update your Helm values to point at it. You'll need to disable the in-cluster PostgreSQL and update connection settings:
# In a custom values file (e.g., values-eks-rds.yaml)
postgres:
  # Disable the in-cluster PostgreSQL deployment
  # (requires adding an `enabled` toggle to the Helm template; not yet supported)
  external:
    host: "your-rds-instance.abc123.us-east-1.rds.amazonaws.com"
    port: "5432"
    database: "enrichment"
Warning
The Helm chart does not currently support external database configuration out of the box. You would need to modify the _helpers.tpl connection string template and disable the in-cluster PostgreSQL StatefulSet. This is an advanced modification.
RabbitMQ to Amazon MQ
Amazon MQ supports RabbitMQ as a broker engine. Create an Amazon MQ broker and update the Dapr pub/sub component connection strings in the Helm chart's dapr/ templates.
Warning
Like RDS, this requires modifying the Helm chart templates to support external connection strings for the Dapr pub/sub components. This is an advanced modification.
SeaweedFS to Amazon S3
Create an S3 bucket and update the S3 credentials and endpoint in your values file:
# In a custom values file
credentials:
  s3:
    accessKey: "YOUR_AWS_ACCESS_KEY"
    secretKey: "YOUR_AWS_SECRET_KEY"
You would also need to modify the S3 endpoint environment variables in the Helm templates to point to s3.amazonaws.com instead of the in-cluster SeaweedFS service, and disable the in-cluster SeaweedFS deployment.
Note
All three managed service integrations require Helm chart modifications that are not yet abstracted into simple values.yaml toggles. They are documented here for advanced users who want to customize their deployment.
Storage
EBS CSI Driver
The setup script automatically installs the Amazon EBS CSI driver and creates a gp3 StorageClass as the default.
The EKS values overlay (values-eks.yaml) uses 10x the base defaults for cloud use:
| Component | Base Default | EKS Default | EBS Cost/month |
|---|---|---|---|
| SeaweedFS | 50 Gi | 500 Gi | ~$40 |
| PostgreSQL | 20 Gi | 200 Gi | ~$16 |
| RabbitMQ | 10 Gi | 100 Gi | ~$8 |
| Prometheus | 10 Gi | 100 Gi | ~$8 |
| Loki | 10 Gi | 100 Gi | ~$8 |
| Jaeger | 10 Gi | 100 Gi | ~$8 |
| Grafana | 5 Gi | 50 Gi | ~$4 |
| Phoenix | 5 Gi | 50 Gi | ~$4 |
Adjust sizes in values-eks.yaml before deploying if you want smaller (cheaper) or larger volumes.
gp3 vs gp2
The setup script uses gp3 volumes, which are newer and cheaper than gp2:
- gp3: $0.08/GB/month, 3,000 baseline IOPS included
- gp2: $0.10/GB/month, IOPS scales with volume size
The script removes the default annotation from gp2 (if present) so that gp3 is used for all PVCs.
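The per-volume figures in the storage table follow directly from the per-GB rates listed above. As a rough sketch, the hypothetical `ebs_monthly_cost` helper below multiplies a volume size by the gp3 or gp2 rate; it ignores any extra provisioned IOPS or throughput charges, and the rates are the approximate values quoted in this section.

```shell
# Approximate monthly EBS cost for a volume, using the rates quoted above.
# ebs_monthly_cost is a hypothetical helper; extra IOPS/throughput are ignored.
ebs_monthly_cost() {
  local size_gib="$1" type="${2:-gp3}" rate
  case "$type" in
    gp3) rate=0.08 ;;
    gp2) rate=0.10 ;;
    *) echo "unknown volume type: $type" >&2; return 1 ;;
  esac
  awk -v s="$size_gib" -v r="$rate" 'BEGIN { printf "%.2f\n", s * r }'
}

# Usage:
#   ebs_monthly_cost 500 gp3   # SeaweedFS EKS default
#   ebs_monthly_cost 500 gp2   # same volume on gp2
```

For the 500 Gi SeaweedFS default this gives 40.00 on gp3 versus 50.00 on gp2, which matches the ~$40 entry in the table.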
EFS CSI Driver (Mounted Containers)
The setup script automatically installs the Amazon EFS CSI driver and creates an encrypted EFS filesystem for large file processing (disk images, ZIPs, etc.).
What it does: Nemesis's container_monitor (in the web-api service) watches a /mounted-containers directory for large files and processes them. On Docker Compose this is a simple bind mount. On EKS, the setup script provisions an AWS EFS filesystem that:
- Supports ReadWriteMany so multiple pods can access it simultaneously
- Can be mounted from outside the cluster (operators copying large files via NFS or a bastion host)
- Is elastic (no pre-provisioned size needed, you only pay for what you store)
- Is encrypted at rest
This is automatically enabled when you run setup-cluster-eks.sh. The script creates the EFS filesystem, mount targets in each subnet, a security group allowing NFS access from cluster nodes, and writes the configuration to values-eks.yaml.
Note
k3d and k3s deployments are unaffected. The mountedContainers feature is disabled by default in values.yaml and only enabled in the generated values-eks.yaml.
How to Use
After deploying to EKS, the /mounted-containers directory is mounted inside the web-api pod. To copy files for processing:
Option 1: kubectl cp (simplest)
# Copy a file into the mounted-containers volume
kubectl cp /path/to/disk-image.vmdk web-api-<pod-id>:/mounted-containers/ -n nemesis
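The `<pod-id>` placeholder has to be replaced with the actual pod name. One way to look it up, sketched below, is to filter the output of `kubectl get pods -o name`; `first_pod_with_prefix` is a hypothetical helper, and it assumes the web-api pod names start with `web-api`, as the deployment name used elsewhere in this guide suggests.

```shell
# Pick the first pod whose name starts with the given prefix.
# Reads `kubectl get pods -o name` output (lines like "pod/web-api-abc") on stdin.
# first_pod_with_prefix is a hypothetical helper, not part of the Nemesis scripts.
first_pod_with_prefix() {
  sed -n "s|^pod/\($1[^/]*\)\$|\1|p" | head -n 1
}

# Usage:
#   pod="$(kubectl get pods -n nemesis -o name | first_pod_with_prefix web-api)"
#   kubectl cp /path/to/disk-image.vmdk "$pod":/mounted-containers/ -n nemesis
```

Note that `kubectl cp` relies on `tar` being available inside the target container; if it is not, the bastion-host approach is the safer route.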
Option 2: Mount EFS on a bastion host
Mount the EFS filesystem on an EC2 instance in the same VPC using the NFS protocol:
# Get the EFS filesystem ID from values-eks.yaml
grep efsFileSystemId k8s/helm/nemesis/values-eks.yaml
# On the bastion host (install amazon-efs-utils first):
sudo mount -t efs <fs-id>:/ /mnt/efs
# Copy files
cp /path/to/disk-image.vmdk /mnt/efs/
The container_monitor will automatically detect new files and begin processing them.
Cleanup Behavior
By default, source files are deleted from EFS after successful processing (cleanupAfterProcessing: true). This prevents large files from accumulating on the elastic filesystem and driving up EFS costs.
To keep source files for forensic reference or re-processing, set cleanupAfterProcessing: false in your values file:
mountedContainers:
  enabled: true
  cleanupAfterProcessing: false # move to completed/ instead of deleting
When disabled, processed files are moved to a completed/ subdirectory within the mounted volume instead of being deleted. You are responsible for manually cleaning up completed/ to manage EFS storage costs.
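Manual cleanup of completed/ can be scripted with `find`. The sketch below assumes the EFS filesystem is mounted on a host (for example at /mnt/efs on a bastion) and that completed/ sits at the root of the mounted volume; `prune_completed` is a hypothetical helper, and the 30-day threshold in the usage lines is an arbitrary example.

```shell
# List files under <mount>/completed last modified more than <days> days ago.
# Pass "delete" as the third argument to remove them instead of listing them.
# prune_completed is a hypothetical helper, not part of the Nemesis scripts.
prune_completed() {
  local mount="$1" days="$2" mode="${3:-list}"
  local dir="$mount/completed"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  if [ "$mode" = "delete" ]; then
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
  else
    find "$dir" -type f -mtime +"$days" -print
  fi
}

# Usage on a host with EFS mounted at /mnt/efs:
#   prune_completed /mnt/efs 30          # dry run: list candidates
#   prune_completed /mnt/efs 30 delete   # remove them
```

Listing is the default so that a mistyped path or threshold cannot delete evidence by accident.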
Verifying EFS is Working
# Check the PVC is bound
kubectl get pvc mounted-containers -n nemesis
# Check the volume is mounted in web-api
kubectl exec deployment/web-api -n nemesis -- df -h /mounted-containers
# Check container_monitor logs
kubectl logs deployment/web-api -n nemesis | grep -i "container_monitor\|mounted"
Disabling Mounted Containers
If you don't need large file processing, you can disable it by editing values-eks.yaml before deploying:
mountedContainers:
  enabled: false
Or remove the mountedContainers section entirely. When disabled, no PVC is created, no volumeMount is added to web-api, and the container_monitor gracefully skips startup.
EFS Costs
EFS pricing is pay-per-use (~$0.30/GB/month for Standard storage class in us-east-1). There is no minimum or pre-provisioned size. An empty filesystem costs nothing. See EFS Pricing for details.
Cost Management
Estimated Monthly Costs
| Resource | Cost |
|---|---|
| EKS control plane | ~$73 |
| 1x m7i.xlarge node (default) | ~$147 |
| EBS storage (80 Gi gp3) | ~$6 |
| EFS (mounted containers) | ~$0 (pay per GB stored) |
| Network Load Balancer | ~$18 base + data |
| Total (1 node) | ~$244/month |
Additional nodes added by the Cluster Autoscaler cost ~$147/month each (m7i.xlarge on-demand).
Costs vary by region. Use the AWS Pricing Calculator for exact estimates.
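The total in the table is a simple sum, so it is easy to re-derive for other cluster shapes. The sketch below uses the approximate us-east-1 figures from this page (control plane ~$73, NLB base ~$18, gp3 at $0.08/GB/month); `estimate_monthly_cost` is a hypothetical helper, and it leaves out EFS usage and NLB data charges.

```shell
# Rough monthly estimate from the figures on this page (us-east-1, on-demand).
# Arguments: node count, per-node monthly cost, total EBS GiB (gp3).
# estimate_monthly_cost is a hypothetical helper; EFS and NLB data are excluded.
estimate_monthly_cost() {
  local nodes="${1:-1}" node_cost="${2:-147}" ebs_gib="${3:-80}"
  awk -v n="$nodes" -v c="$node_cost" -v g="$ebs_gib" \
    'BEGIN { printf "%.0f\n", 73 + n * c + g * 0.08 + 18 }'
}

# Usage:
#   estimate_monthly_cost              # 1 node, 80 Gi
#   estimate_monthly_cost 3 147 80     # 3 nodes
#   estimate_monthly_cost 1 147 1200   # 1 node, all EKS overlay volumes at default size
```

With the defaults this prints 244, matching the table. If every volume in the EKS overlay were provisioned at its default size (1,200 Gi in total), the same single-node cluster would come to about 334.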
Cost Reduction Tips
- Spot instances: Add --spot to the eksctl node group for up to 70% savings (but nodes can be reclaimed)
- Smaller instances: Use m7i.large for testing (reduces node cost by ~50%)
- Single node: Set NODE_COUNT=1 NODE_MIN=1 for minimal testing
- Scheduled scaling: Scale node group to 0 outside business hours via AWS Console or CLI
- Reserved instances: Commit to 1-year or 3-year terms for 30-60% savings on node costs
Teardown
Danger
Always tear down your EKS cluster when you're done to avoid ongoing AWS charges. A forgotten cluster costs ~$244+/month (more if autoscaled).
# Full teardown: remove Helm releases, IAM resources, and EKS cluster
./k8s/scripts/teardown-cluster-eks.sh
# Remove only Helm releases, keep the EKS cluster running
./k8s/scripts/teardown-cluster-eks.sh --keep-cluster
# Skip confirmation prompt
./k8s/scripts/teardown-cluster-eks.sh --yes
Verify No Lingering Resources
After teardown, verify no AWS resources remain:
# Check for orphaned EBS volumes
aws ec2 describe-volumes --region us-east-1 \
--filters Name=tag-key,Values=kubernetes.io/cluster/nemesis \
--query 'Volumes[].{ID:VolumeId,State:State,Size:Size}' --output table
# Check for orphaned EFS filesystems
aws efs describe-file-systems --region us-east-1 \
--query "FileSystems[?Tags[?Key=='kubernetes.io/cluster/nemesis']].{ID:FileSystemId,State:LifeCycleState,Name:Name}" --output table
# Check for orphaned load balancers
aws elbv2 describe-load-balancers --region us-east-1 \
--query 'LoadBalancers[?contains(LoadBalancerName, `nemesis`)].{Name:LoadBalancerName,DNS:DNSName}' --output table
# Check CloudFormation stacks
aws cloudformation list-stacks --region us-east-1 \
--query 'StackSummaries[?contains(StackName, `nemesis`) && StackStatus!=`DELETE_COMPLETE`].{Name:StackName,Status:StackStatus}' --output table
If you find orphaned EBS volumes, delete them manually:
aws ec2 delete-volume --volume-id vol-0123456789abcdef0 --region us-east-1
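When several volumes are left over, generating the delete commands for review is less error-prone than typing each ID. The sketch below turns a list of volume IDs into `aws ec2 delete-volume` commands without running them; `print_delete_commands` is a hypothetical helper, and the extra `Name=status,Values=available` filter in the usage lines is an added assumption to skip volumes that are still attached.

```shell
# Turn whitespace-separated volume IDs on stdin into delete commands (dry run).
# Nothing is deleted; the commands are only printed for review.
# print_delete_commands is a hypothetical helper, not part of the Nemesis scripts.
print_delete_commands() {
  local region="$1" id
  tr -s ' \t' '\n' | while read -r id; do
    if [ -n "$id" ]; then
      printf 'aws ec2 delete-volume --volume-id %s --region %s\n' "$id" "$region"
    fi
  done
}

# Usage:
#   aws ec2 describe-volumes --region us-east-1 \
#     --filters Name=tag-key,Values=kubernetes.io/cluster/nemesis Name=status,Values=available \
#     --query 'Volumes[].VolumeId' --output text | print_delete_commands us-east-1
```

After checking the printed list, the commands can be run individually or piped to a shell.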
Operations
Operations are the same as k3d/k3s. See the Kubernetes Deployment guide for details.
Check Status
kubectl get pods -n nemesis
kubectl get svc -n nemesis
./k8s/scripts/deploy.sh status
View Logs
kubectl logs -f deployment/web-api -n nemesis
kubectl logs -f deployment/file-enrichment -n nemesis
kubectl logs -f deployment/file-enrichment -c daprd -n nemesis # Dapr sidecar
Run Helm Tests
helm test nemesis -n nemesis
Troubleshooting
Nodes not joining / NotReady
Check the node group status in the AWS Console (EKS > Clusters > nemesis > Compute). Common causes:
- IAM permissions: The node IAM role may be missing permissions. eksctl normally handles this, but verify the role exists.
- Subnet capacity: The VPC subnets may not have enough IP addresses. eksctl creates a new VPC by default.
kubectl get nodes
kubectl describe node <node-name>
Mounted-containers PVC stuck in Pending
The mounted-containers PVC uses the EFS CSI driver (not EBS). If it's stuck in Pending:
# Check if the EFS CSI driver pods are running
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-efs-csi-driver
# Check the EFS CSI driver addon status
aws eks describe-addon --cluster-name nemesis --addon-name aws-efs-csi-driver --region us-east-1
# Check the efs-sc StorageClass exists and has the correct fileSystemId
kubectl get storageclass efs-sc -o yaml
# Verify EFS mount targets are available
aws efs describe-mount-targets --file-system-id <fs-id> --region us-east-1
Common causes:
- EFS CSI driver not installed: Re-run setup-cluster-eks.sh (it's idempotent)
- Security group misconfigured: The efs-<cluster-name> security group must allow TCP 2049 from the cluster security group
- Mount targets not ready: Mount targets take 1-2 minutes to become available after creation
EBS PVCs stuck in Pending
Almost always caused by a missing or broken EBS CSI driver:
# Check if the EBS CSI driver pods are running
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver
# Check the EBS CSI driver addon status
aws eks describe-addon --cluster-name nemesis --addon-name aws-ebs-csi-driver --region us-east-1
# Check the gp3 StorageClass exists
kubectl get storageclass
If the driver is missing, re-run the setup script (it's idempotent):
./k8s/scripts/setup-cluster-eks.sh
Image pull errors from ghcr.io
EKS nodes pull images from the internet by default. If pulls fail:
- Verify the nodes have internet access (NAT Gateway in the VPC)
- Check if ghcr.io is accessible: kubectl run test --image=ghcr.io/specterops/nemesis/web-api:latest --restart=Never
- For private registries, create an image pull secret in the namespace and reference it through imagePullSecrets
LoadBalancer stuck in Pending
The Traefik LoadBalancer service creates an AWS NLB. If it stays in Pending:
kubectl describe svc traefik -n kube-system
Common causes:
- Subnet tags missing: eksctl normally tags subnets, but verify the public subnets have kubernetes.io/role/elb=1
- IAM permissions: The cluster needs permissions to create load balancers
- Quota limits: Check your NLB quota in the AWS Console (Service Quotas)
eksctl timeouts (CloudFormation)
Cluster creation takes 15-20 minutes. If it times out:
# Check CloudFormation stack status
aws cloudformation describe-stacks --region us-east-1 \
--query 'Stacks[?contains(StackName, `nemesis`)].{Name:StackName,Status:StackStatus}'
# View stack events for error details
aws cloudformation describe-stack-events --region us-east-1 \
--stack-name eksctl-nemesis-cluster \
--query 'StackEvents[?ResourceStatus==`CREATE_FAILED`].{Resource:LogicalResourceId,Reason:ResourceStatusReason}'
Common causes:
- Region capacity: Try a different region or instance type
- Service quotas: Check VPC, EIP, and EC2 instance limits in AWS Service Quotas
- IAM permissions: Verify your user has the permissions listed in Prerequisites