Kubernetes Guide Part 5: Production EKS & AWS Integration

AWS EKS Deep Dive: From Manual Setup to Production-Ready Kubernetes Clusters

When I first looked at AWS’s container orchestration options, I felt like I was staring at a menu in a foreign language. ECS? EKS? Fargate? EC2? What’s the difference, and which one should I actually use?

After weeks of hands-on experimentation—creating clusters manually through the console, wrestling with IAM roles, debugging the infamous cluster autoscaler, and eventually discovering the magic of eksctl—I finally have clarity. Today, I’m sharing everything I learned about deploying production-ready Kubernetes on AWS, complete with the mistakes I made so you don’t have to.

Understanding AWS’s Container Landscape: A High-Level Overview

Before diving into EKS, let’s map out the entire AWS container ecosystem. Think of this as choosing your adventure based on your needs:

The Four Main Paths

1. ECS + EC2: The AWS-Native Approach

Amazon Elastic Container Service (ECS) with EC2 instances.

What it is: AWS’s proprietary container orchestration service running on EC2 instances you manage.

When to use:

Trade-offs:

2. ECS + Fargate: The Serverless Container Dream

ECS with Fargate compute engine.

What it is: Run containers without managing any servers. AWS handles all infrastructure.

When to use:

Trade-offs:

3. EKS (Elastic Kubernetes Service): The Kubernetes Standard

Managed Kubernetes control plane with self-managed or managed worker nodes.

What it is: Fully managed Kubernetes control plane + your choice of worker node management.

When to use:

Management levels:

Trade-offs:

4. ECR (Elastic Container Registry): Your Private Docker Hub

AWS’s Docker image registry.

What it is: Secure, scalable, and reliable registry to store and manage your container images.

Why you need it: Whether you use ECS or EKS, you’ll push your Docker images to ECR and pull them during deployment. Think of it as your team’s private Docker Hub.
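
To make that concrete, here’s a minimal sketch of the push workflow using the AWS CLI and Docker. The repository name my-app and the <account-id> placeholder are illustrative; substitute your own values and region.

# Create a repository and authenticate Docker against your registry
aws ecr create-repository --repository-name my-app --region us-east-1
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com

# Tag a locally built image and push it
docker tag my-app:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/my-app:latest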

My Recommendation

For learning and production Kubernetes deployments, EKS with Managed Node Groups is the sweet spot:

Now let’s build one.

Creating an EKS Cluster Manually (The Console Way)

Before we use automation tools, it’s crucial to understand what’s happening under the hood. Let’s create an EKS cluster through the AWS Management Console.

Step 1: Create an IAM Role for the EKS Cluster

The EKS control plane needs permissions to manage AWS resources on your behalf (creating load balancers, managing ENIs, etc.).

In the AWS Console:

  1. Navigate to IAM → Roles → Create role
  2. Select AWS service → EKS → EKS - Cluster
  3. AWS automatically attaches the required policy: AmazonEKSClusterPolicy
  4. Name it: EKS-Cluster-Role
  5. Create the role

What this role does: Allows EKS to make API calls to AWS services like EC2, Elastic Load Balancing, and CloudWatch on your behalf.
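
If you prefer the CLI to console clicks, the same role can be created with two commands. This is a sketch assuming the role name EKS-Cluster-Role used above:

# Trust policy that lets the EKS service assume the role
cat > eks-cluster-trust.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "eks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role \
  --role-name EKS-Cluster-Role \
  --assume-role-policy-document file://eks-cluster-trust.json

aws iam attach-role-policy \
  --role-name EKS-Cluster-Role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy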

Step 2: Create a VPC for Your Cluster

EKS requires a VPC with specific networking configurations (public and private subnets, route tables, NAT gateways). Rather than creating this manually, AWS provides a CloudFormation template.

Why CloudFormation? Creating a production-ready VPC manually involves:

  • Subnets (public and private) across multiple Availability Zones
  • An internet gateway plus NAT gateways for the private subnets
  • Route tables and their subnet associations
  • Security groups for the control plane and worker nodes
  • The Kubernetes-specific subnet tags that EKS expects

That’s 30+ resources. CloudFormation does this in 5 minutes.

Steps:

  1. Go to CloudFormation → Create stack → With new resources

  2. For the template URL, use the official AWS EKS VPC template:

    https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml

    Or find the latest at: AWS EKS VPC CloudFormation Templates

  3. Stack name: eks-vpc-stack

  4. Create the stack

Check the Outputs tab after creation—you’ll need the VPC ID and subnet IDs for cluster creation.
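
If you’d rather script this step, the same stack can be created and inspected from the CLI. A sketch using the template URL above:

aws cloudformation create-stack \
  --stack-name eks-vpc-stack \
  --template-url https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/amazon-eks-vpc-private-subnets.yaml

# Wait for completion, then read the VPC and subnet IDs from the outputs
aws cloudformation wait stack-create-complete --stack-name eks-vpc-stack
aws cloudformation describe-stacks --stack-name eks-vpc-stack --query "Stacks[0].Outputs"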

Public vs. Private Subnets:

This is a best practice architecture—worker nodes in private subnets access the internet via NAT Gateway for pulling images, while LoadBalancers in public subnets serve traffic.

Step 3: Create the EKS Cluster

  1. Navigate to EKS → Clusters → Create cluster

  2. Configuration:

    • Cluster name: my-eks-cluster
    • Kubernetes version: 1.33 (use the latest stable version)
    • Cluster service role: Select the EKS-Cluster-Role you created
  3. Networking:

    • VPC: Select the VPC from your CloudFormation stack
    • Subnets: Select all subnets (both public and private)
    • Security groups: Use the default security group created
    • Cluster endpoint access:
      • Public and Private (recommended): Control plane accessible from both internet (for your laptop) and within VPC (for nodes)
      • Public only: Less secure
      • Private only: Very secure but requires VPN/bastion host to manage cluster
  4. Add-ons (optional but recommended):

    • CoreDNS: DNS service for Kubernetes
    • kube-proxy: Network proxy running on each node
    • VPC CNI: Networking plugin for pod IP assignment

    These are essential components. Install them unless you have specific reasons not to.

  5. Create the cluster (takes 10-15 minutes)
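
For reference, the console steps above map to a single aws eks create-cluster call. This is a sketch; the subnet and security group IDs are placeholders for the values from your CloudFormation outputs:

aws eks create-cluster \
  --name my-eks-cluster \
  --kubernetes-version 1.33 \
  --role-arn arn:aws:iam::<account-id>:role/EKS-Cluster-Role \
  --resources-vpc-config subnetIds=subnet-aaa,subnet-bbb,subnet-ccc,subnet-ddd,securityGroupIds=sg-xxx

# Poll until the control plane is ACTIVE (takes 10-15 minutes)
aws eks wait cluster-active --name my-eks-cluster
aws eks describe-cluster --name my-eks-cluster --query "cluster.status"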

Step 4: Connect to Your Cluster

Once the cluster is active, configure kubectl to connect:

# Verify AWS CLI is configured
aws configure list

# Update kubeconfig to include your new cluster
aws eks update-kubeconfig --name my-eks-cluster --region us-east-1

# Verify connection
kubectl cluster-info
kubectl get svc

You should see the Kubernetes API server endpoint. At this point, your control plane is ready, but you have zero worker nodes—no place to run workloads yet.

Creating Worker Nodes (Managed Node Groups)

The control plane is the brain; worker nodes are the muscles. Let’s add compute capacity.

Step 1: Create an IAM Role for Worker Nodes

Worker nodes need permissions to:

  • Connect to the EKS control plane and register themselves as nodes
  • Pull container images from ECR
  • Assign VPC IP addresses to pods via the VPC CNI plugin

In IAM Console:

  1. Create role → AWS service → EC2
  2. Attach these three policies:
    • AmazonEKSWorkerNodePolicy (core EKS permissions)
    • AmazonEC2ContainerRegistryReadOnly (pull images from ECR)
    • AmazonEKS_CNI_Policy (networking for pods)
  3. Name: EKS-Worker-Node-Role
  4. Create role

Step 2: Create a Managed Node Group

In your EKS cluster:

  1. Compute tab → Add node group
  2. Configuration:
    • Node group name: eks-nodegroup-1
    • Node IAM role: EKS-Worker-Node-Role
  3. Compute configuration:
    • AMI type: Amazon Linux 2 (optimized for EKS)
    • Instance type: t3.medium (2 vCPU, 4GB RAM - good starting point)
    • Disk size: 20 GB
  4. Scaling configuration:
    • Desired size: 2 nodes
    • Minimum size: 1 node
    • Maximum size: 4 nodes
  5. Remote access (optional but recommended for troubleshooting):
    • Enable SSH access
    • Select your EC2 key pair
    • Specify allowed SSH source (your IP)
  6. Create node group

Wait 5-10 minutes. Verify with:

kubectl get nodes

You should see 2 nodes in Ready state. Congratulations! You now have a functional EKS cluster.
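
If you’d rather script this step as well, a roughly equivalent CLI call looks like the following (the subnet IDs are placeholders for your private subnets):

aws eks create-nodegroup \
  --cluster-name my-eks-cluster \
  --nodegroup-name eks-nodegroup-1 \
  --node-role arn:aws:iam::<account-id>:role/EKS-Worker-Node-Role \
  --subnets subnet-aaa subnet-bbb \
  --instance-types t3.medium \
  --disk-size 20 \
  --ami-type AL2_x86_64 \
  --scaling-config minSize=1,maxSize=4,desiredSize=2

# Wait until the nodes register with the cluster
aws eks wait nodegroup-active --cluster-name my-eks-cluster --nodegroup-name eks-nodegroup-1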

Auto Scaling: Teaching Your Cluster to Grow and Shrink

Static node counts are fine for learning, but production workloads need dynamic scaling. When traffic spikes, you need more nodes. When traffic drops, you want to save money by scaling down.

Enter the Kubernetes Cluster Autoscaler.

How It Works

  1. You deploy a pod that requires more resources than available
  2. Pod remains in Pending state
  3. Cluster Autoscaler detects this
  4. Autoscaler calls AWS Auto Scaling Group API to increase desired capacity
  5. New EC2 instance joins cluster
  6. Pod gets scheduled

The reverse happens when nodes are underutilized for 10+ minutes.

The Architecture: IRSA (IAM Roles for Service Accounts)

Here’s where it gets sophisticated. The Cluster Autoscaler pod needs AWS permissions to modify Auto Scaling Groups. Instead of giving every pod on the node these permissions (overly broad), we use IRSA:

  1. The cluster’s OIDC issuer is registered with IAM as an identity provider
  2. An IAM role is created whose trust policy only allows a specific Kubernetes service account (cluster-autoscaler in kube-system) to assume it
  3. That service account is annotated with the role’s ARN
  4. Pods running under the service account exchange their projected token for temporary AWS credentials via sts:AssumeRoleWithWebIdentity

This is secure, granular, and follows the principle of least privilege.

Setup Guide (Step-by-Step)

I’m going to share the complete setup that finally worked after hours of debugging, based on the troubleshooting guide I pieced together along the way.

1. Get Cluster Information

# Set variables (replace with your values)
export CLUSTER_NAME="my-eks-cluster"
export AWS_REGION="us-east-1"

# Get AWS account ID
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
echo "AWS Account ID: $AWS_ACCOUNT_ID"

# Get OIDC provider URL
export OIDC_URL=$(aws eks describe-cluster --name $CLUSTER_NAME --region $AWS_REGION --query "cluster.identity.oidc.issuer" --output text)
echo "OIDC URL: $OIDC_URL"

# Extract OIDC ID
export OIDC_ID=$(echo $OIDC_URL | cut -d '/' -f 5)
echo "OIDC ID: $OIDC_ID"

2. Create OIDC Provider (If Not Exists)

# Check if it exists
aws iam list-open-id-connect-providers | grep $OIDC_ID

# If not, create it
aws iam create-open-id-connect-provider \
  --url $OIDC_URL \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 9e99a48a9960b14926bb7f3b02e22da2b0ab7280
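
Alternatively, if you already have eksctl installed (covered later in this post), it can associate the OIDC provider for you in one command:

eksctl utils associate-iam-oidc-provider \
  --cluster $CLUSTER_NAME \
  --region $AWS_REGION \
  --approve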

3. Create IAM Policy for Autoscaler

cat > cluster-autoscaler-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeTags",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": ["*"]
    }
  ]
}
EOF

# Create the policy
aws iam create-policy \
  --policy-name AmazonEKSClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json

4. Create IAM Role with Trust Policy

cat > trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:kube-system:cluster-autoscaler",
          "oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF

# Create the role
aws iam create-role \
  --role-name EKSClusterAutoscalerRole \
  --assume-role-policy-document file://trust-policy.json

# Attach the policy
aws iam attach-role-policy \
  --role-name EKSClusterAutoscalerRole \
  --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/AmazonEKSClusterAutoscalerPolicy

5. Download and Customize the Manifest

# Download official manifest
wget https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

Edit the file and make these critical changes:

A. Add IAM role annotation to ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::YOUR_ACCOUNT_ID:role/EKSClusterAutoscalerRole

B. Add pod annotation to prevent eviction:

spec:
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

C. Add AWS region environment variable:

containers:
- image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.33.0
  name: cluster-autoscaler
  env:
  - name: AWS_REGION
    value: "us-east-1"

D. Update command arguments:

command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks-cluster
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false

E. Use the correct image version:

Match your Kubernetes version:

Kubernetes version    Autoscaler image
1.33                  v1.33.0
1.32                  v1.32.0
1.31                  v1.31.0

image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.33.0
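
To confirm which row applies to you, check the control plane version before editing the manifest:

# Control plane version as reported by EKS
aws eks describe-cluster --name $CLUSTER_NAME --region $AWS_REGION \
  --query "cluster.version" --output text

# Or compare client and server versions via kubectl
kubectl version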

6. Tag Your Auto Scaling Groups

# Find your ASG name
ASG_NAME=$(aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[?contains(Tags[?Key=='eks:cluster-name'].Value, '$CLUSTER_NAME')].AutoScalingGroupName" \
  --output text)

# Add required tags
aws autoscaling create-or-update-tags \
  --tags \
  ResourceId=$ASG_NAME,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false \
  ResourceId=$ASG_NAME,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/$CLUSTER_NAME,Value=owned,PropagateAtLaunch=false

These tags are how the autoscaler discovers which Auto Scaling Groups it can modify.
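
You can confirm the tags landed with a quick describe call:

aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=$ASG_NAME" \
  --query "Tags[?contains(Key, 'cluster-autoscaler')]"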

7. Deploy the Autoscaler

kubectl apply -f cluster-autoscaler-autodiscover.yaml

# Verify it's running
kubectl get pods -n kube-system -l app=cluster-autoscaler

# Check logs for success messages
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=50

Success indicators in the logs: the autoscaler reports that it has discovered and registered your Auto Scaling Group, and there are no credential or permission errors.

Testing the Autoscaler

Deploy a workload that needs more resources than available:

# test-autoscaling.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx

# Deploy
kubectl apply -f test-autoscaling.yaml

# Get LoadBalancer URL
kubectl get svc nginx
# Wait for EXTERNAL-IP, then visit it in browser

# Scale up to trigger autoscaling
kubectl scale deployment nginx --replicas=10

# Watch the autoscaler in action
kubectl logs -n kube-system -l app=cluster-autoscaler -f

# Watch nodes being added
kubectl get nodes -w

You’ll see new nodes join the cluster within 2-3 minutes!
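
When you’re done testing, scale the workload back down (or delete it) and watch the autoscaler remove the extra nodes once they’ve been underutilized for its scale-down delay (about 10 minutes with default settings):

# Scale the test deployment back down
kubectl scale deployment nginx --replicas=1

# Or remove the test workload entirely
kubectl delete -f test-autoscaling.yaml

# Watch the extra nodes drain and disappear
kubectl get nodes -w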

Common Errors I Encountered (And How to Fix Them)

Error 1: NoCredentialProviders

Symptom: CrashLoopBackOff, logs show “NoCredentialProviders: no valid providers”

Cause: OIDC provider not created or service account annotation missing

Fix: Verify OIDC provider exists and service account has the IAM role annotation

Error 2: ImagePullBackOff

Symptom: Pod won’t start, “image pull error”

Cause: Wrong autoscaler version for your Kubernetes version

Fix: Check Kubernetes version with kubectl version and use matching autoscaler image

Error 3: No Auto Scaling Groups Found

Symptom: Logs show “0 ASG found”

Cause: Missing ASG tags or wrong cluster name in command arguments

Fix: Ensure ASG has both required tags and cluster name matches exactly
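
A quick diagnostic pass that covers all three failure modes, reusing the variables from the setup steps above:

# 1. Is the OIDC provider registered with IAM?
aws iam list-open-id-connect-providers | grep $OIDC_ID

# 2. Does the service account carry the IAM role annotation?
kubectl describe sa cluster-autoscaler -n kube-system | grep role-arn

# 3. Do the ASG tags and the cluster name in the command arguments line up?
aws autoscaling describe-tags --filters "Name=auto-scaling-group,Values=$ASG_NAME"
kubectl -n kube-system get deployment cluster-autoscaler -o yaml | grep node-group-auto-discovery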

EKS with Fargate: The Serverless Kubernetes Experience

Want to run Kubernetes pods without managing any EC2 instances? That’s Fargate.

How Fargate Works with EKS

Instead of scheduling pods onto EC2 worker nodes that you manage, EKS matches pods against Fargate profiles (namespace and label selectors). Each matching pod gets its own isolated, right-sized compute environment that AWS provisions on demand, and it appears in the cluster as its own virtual node. There is nothing for you to patch, scale, or pay for while it sits idle.

When to Use Fargate

Good fit:

Not ideal:

Setting Up EKS with Fargate

1. Create IAM Role for Fargate

In IAM Console:

  1. Create role → AWS service → EKS → EKS - Fargate Pod
  2. AWS attaches the policy: AmazonEKSFargatePodExecutionRolePolicy
  3. Name: EKS-Fargate-Pod-Role

2. Create a Fargate Profile

In your EKS cluster:

  1. Compute tab → Fargate profiles → Create profile
  2. Configuration:
    • Name: fargate-profile-1
    • Pod execution role: EKS-Fargate-Pod-Role
    • Subnets: Select private subnets only (Fargate requires private subnets)
  3. Pod selectors:
    • Namespace: fargate
    • Labels (optional): Can match specific label selectors

This means: “Any pod deployed to the fargate namespace will run on Fargate.”
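
The console steps map to a single CLI call. A sketch with placeholder private subnet IDs:

aws eks create-fargate-profile \
  --cluster-name my-eks-cluster \
  --fargate-profile-name fargate-profile-1 \
  --pod-execution-role-arn arn:aws:iam::<account-id>:role/EKS-Fargate-Pod-Role \
  --subnets subnet-private-aaa subnet-private-bbb \
  --selectors namespace=fargate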

3. Deploy a Workload to Fargate

# Create the namespace
kubectl create namespace fargate

# nginx-fargate.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-fg
  namespace: fargate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-fg
  template:
    metadata:
      labels:
        app: nginx-fg
    spec:
      containers:
      - name: nginx-fg
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-fg
  namespace: fargate
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx-fg

kubectl apply -f nginx-fargate.yaml

# Watch the pod start
kubectl get pods -n fargate -w

# Get LoadBalancer URL
kubectl get svc -n fargate nginx-fg

Notice the pod startup is slightly slower (30-60 seconds) because AWS is provisioning the Fargate task.

Verify it’s on Fargate:

kubectl get pod -n fargate <pod-name> -o wide

You won’t see a traditional node name—it’ll show a Fargate node identifier.
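
You can also list the nodes themselves; each Fargate pod is backed by its own virtual node whose name starts with fargate-:

kubectl get nodes -o wide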

The Fast Track: Creating EKS Clusters with eksctl

After manually creating clusters twice, I discovered eksctl—a CLI tool that does in one command what took me 45 minutes through the console.

What is eksctl?

An official CLI tool for EKS created by Weaveworks and AWS. It’s like kubectl for cluster creation—declarative, simple, and powerful.

Installing eksctl

macOS:

brew tap weaveworks/tap
brew install weaveworks/tap/eksctl

Linux:

curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

Windows (Chocolatey):

choco install eksctl

Verify:

eksctl version

Creating a Cluster with eksctl

eksctl create cluster \
  --name eksctl-demo-k8s \
  --version 1.33 \
  --region us-east-1 \
  --nodegroup-name eksctl-demo-ngr \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4

That’s it. This single command:

  • Provisions a dedicated VPC with public and private subnets (via CloudFormation)
  • Creates the EKS control plane and the IAM roles it needs
  • Creates a managed node group of two t3.medium nodes that can scale between 1 and 4
  • Updates your local kubeconfig so kubectl works immediately

Time: 15-20 minutes. Manual clicks: Zero.
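
The same cluster can also be described declaratively in a config file, which is easier to review and version-control. A sketch mirroring the flags above (the eksctl ClusterConfig schema supports many more options):

cat > cluster.yaml << 'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eksctl-demo-k8s
  region: us-east-1
  version: "1.33"
managedNodeGroups:
  - name: eksctl-demo-ngr
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 1
    maxSize: 4
EOF

eksctl create cluster -f cluster.yaml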

Verify Everything

# Cluster info
kubectl cluster-info

# Nodes
kubectl get nodes

# Test deployment
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=LoadBalancer
kubectl get svc

Deleting the Cluster

eksctl delete cluster \
  --name eksctl-demo-k8s \
  --region us-east-1 \
  --wait

This deletes everything: cluster, node group, VPC, IAM roles, CloudFormation stacks. Clean and thorough.

Cleaning Up Resources (Important!)

EKS clusters cost $0.10/hour for the control plane plus EC2/Fargate costs. Don’t forget to delete when done learning.

Manual Cleanup Order

  1. Delete Fargate profiles (if any)
  2. Delete Node groups
  3. Delete Cluster
  4. Delete CloudFormation stack (VPC)
  5. Delete IAM roles (if you don’t need them)

eksctl Cleanup

eksctl delete cluster --name <cluster-name> --region <region> --wait

Done. One command.

Key Takeaways & Production Checklist

After this deep dive, here’s what clicked for me:

  • EKS abstracts the control plane complexity but you still need to understand IAM, VPC, and networking
  • Managed Node Groups are the sweet spot for most use cases—balance of control and convenience
  • Cluster Autoscaler requires IRSA setup—take time to understand OIDC providers and IAM trust policies
  • Fargate is magical for variable workloads but not always cost-effective at scale
  • eksctl is the fastest way to learn—start here, then dive into manual setup to understand internals

Production Readiness Checklist

Before going live, ensure:

What’s Next?

Now that you have a production-grade EKS cluster, here’s what to explore:

  1. Set up CI/CD: Integrate with GitHub Actions or GitLab CI to auto-deploy to EKS
  2. Implement monitoring: Deploy Prometheus and Grafana for observability
  3. Add ingress controller: Use AWS Load Balancer Controller or NGINX Ingress
  4. Explore service mesh: Try AWS App Mesh or Istio for advanced traffic management
  5. Experiment with EKS Add-ons: AWS released new capabilities in 2025 including built-in Argo CD and Kube Resource Orchestrator (KRO)

Resources


Have you struggled with EKS setup? What tripped you up—IAM roles, networking, or the autoscaler? Drop a comment, and let’s troubleshoot together!

And if this guide saved you hours of debugging (like it would have saved me), bookmark it for your team. Future engineers will thank you. 🚀

Happy Kubernetes-ing on AWS!

