Running Containers on AWS as per Business Requirements and Capabilities

We can run containers on the AWS Cloud with EKS, ECS, Fargate, Lambda, App Runner, Lightsail, Elastic Beanstalk, Red Hat OpenShift, or just on EC2 instances. In this post I will discuss how to choose an AWS service based on our organization's requirements and capabilities.

In-Short

CaveatWisdom

Caveat: Meeting the business objectives and goals can become difficult if we don’t choose the right service based on our requirements and capabilities.

Wisdom:

  1. Understand the complexity of your application based on how many microservices it has and how they interact with each other.
  2. Estimate how your application scales with the business.
  3. Analyse the skill set and capabilities of your team and how much time you can spend on administration and learning.
  4. Understand the policies and priorities of your organization in the long term.

In-Detail

You may wonder why we have so many services for running containers on AWS. One size does not fit all. We need to understand our business goals and requirements and our team's capabilities before choosing a service.

Let us understand each service one by one.

All the services discussed below require knowledge of building container images with Docker and running them.

Running Containers on Amazon EC2 Manually

You can deploy and run containers on EC2 instances manually if you have just one to four applications, like a website or a processing application, without any scaling requirements.

Organization Objectives:

  1. Run just 1 to 4 applications on the cloud with high availability.
  2. Have full control at the OS level.
  3. Have standard workload all the time without any scaling requirements.

Capabilities Required:

  1. Team should have a full understanding of AWS networking at the VPC level, including load balancers.
  2. Ability to configure and run a container runtime such as the Docker daemon.
  3. Ability to deploy application containers manually on the EC2 instances over SSH.
  4. Knowledge of maintaining the OS on EC2 instances.

The cost is predictable if there is no scaling requirement.
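As a rough sketch, assuming an Amazon Linux 2 instance reachable over SSH and an image already pushed to a registry (the image name and ports are placeholders), a manual deployment could look like this:

# on the EC2 instance, over SSH (Amazon Linux 2)
sudo yum install -y docker
sudo systemctl enable --now docker
# run the container and restart it automatically if it stops
sudo docker run -d --restart unless-stopped -p 80:8080 my-registry/my-website:latest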

The disadvantages of this option are:

  1. We need to keep the OS and Docker updated manually.
  2. We need to constantly monitor the health of running containers manually.

What if you don’t want the headache of managing EC2 instances and monitoring the health of your containers? – Enter Amazon Lightsail.

Running Containers with Amazon Lightsail

The easiest way to run containers on AWS is Amazon Lightsail. To run containers on Lightsail, we just need to define the power of the node (the underlying compute capacity) and the scale (how many nodes). If the number of container instances is more than one, Lightsail copies the container across the nodes you specify. Lightsail uses ECS under the hood and manages the networking for us.
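As a minimal sketch with the AWS CLI (the service name, image and port below are placeholders), creating a Lightsail container service and deploying to it could look like this:

# create a container service with two small nodes
aws lightsail create-container-service --service-name my-web-app --power small --scale 2

# deploy a container image to the service and expose it publicly
aws lightsail create-container-service-deployment --service-name my-web-app \
  --containers '{"web":{"image":"nginx:latest","ports":{"80":"HTTP"}}}' \
  --public-endpoint '{"containerName":"web","containerPort":80}'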

Organization Objectives:

  1. Run multiple applications on the cloud with high availability.
  2. Infrastructure should be fully managed by AWS with no maintenance.
  3. Have standard workload and scale dynamically when there is need.
  4. Minimal and predictable cost with bundled services including load balancer and CDN.

Capabilities Required:

  1. Team just needs knowledge of running containers.

Lightsail can scale dynamically, but scaling has to be managed manually; we cannot implement auto scaling based on triggers like an increase in traffic.

What if you need more features, like building a CI/CD pipeline or integrating with a Web Application Firewall (WAF) at the edge locations? – Enter AWS App Runner.

 

Running Containers with AWS App Runner

AWS App Runner is another easy service for running containers. We can implement auto scaling and secure the traffic with AWS WAF and other services like private endpoints in a VPC. App Runner connects directly to the image repository and deploys the containers. We can also integrate with other AWS services like CloudWatch, CloudTrail and X-Ray for advanced monitoring capabilities.
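A minimal sketch with the AWS CLI, assuming an image already pushed to a private Amazon ECR repository (the service name, image URI and access-role ARN are placeholders):

# create an App Runner service from an image in Amazon ECR
aws apprunner create-service --service-name my-api \
  --source-configuration '{
    "ImageRepository": {
      "ImageIdentifier": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-api:latest",
      "ImageRepositoryType": "ECR",
      "ImageConfiguration": { "Port": "8080" }
    },
    "AuthenticationConfiguration": { "AccessRoleArn": "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole" }
  }'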

Organization Objectives:

  1. Run multiple applications on the cloud with high availability.
  2. Infrastructure should be fully managed by AWS with no maintenance.
  3. Auto Scale as per the varying workloads.
  4. Implement high security features like traffic filtering and isolating workloads in a private secured environment.

Capabilities Required:

  1. Team just needs knowledge of running containers.
  2. Knowledge of AWS services like WAF, VPC and CloudWatch is required to handle the advanced requirements.

App Runner supports full-stack web applications, including front-end and back-end services. At present, App Runner supports only stateless applications; stateful applications are not supported.

What if you need to run the containers in a serverless fashion, i.e., an event-driven architecture in which you run the container only when needed (invoked by an event) and pay only for the time the process runs to service the request? – Enter AWS Lambda.

Running Containers with AWS Lambda

With Lambda, you pay only for the time your container function runs (in milliseconds) and for the amount of memory you allocate to the function; if your function runs for 300 milliseconds to process a request, then you pay only for that time. You need to build your container image with a base image provided by AWS. The base images are open source, maintained by AWS, and preloaded with a language runtime and the other components required to run a container image on Lambda. If we choose our own base image, then we need to add the appropriate runtime interface client for our function so that it can receive invocation events and respond accordingly.
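As a sketch, once such an image has been pushed to Amazon ECR, the function can be created with the AWS CLI (the function name, image URI and execution-role ARN are placeholders):

# create a Lambda function from a container image stored in Amazon ECR
aws lambda create-function --function-name my-container-fn \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-west-2.amazonaws.com/my-fn:latest \
  --role arn:aws:iam::123456789012:role/my-lambda-execution-role \
  --memory-size 512 --timeout 30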

Organization Objectives:

  1. Run multiple applications on the cloud with high availability.
  2. Infrastructure should be fully managed by AWS with no maintenance.
  3. Auto Scale as per the varying workloads.
  4. Implement high security features like traffic filtering and isolating workloads in a private secured environment.
  5. Implement event-based architecture.
  6. Pay only for the requests processed, with no idle-time cost for apps.
  7. Seamlessly integrate with other services like API Gateway where throttling is needed.

Capabilities Required:

  1. Team just needs knowledge of running containers.
  2. Team should have a deep understanding of AWS Lambda, event-based architectures on AWS, and other AWS services.
  3. Existing applications may need to be modified to handle event notifications and to integrate with the runtime interface clients provided by the Lambda base images.

We need to be aware of the limitations of Lambda: it is stateless, the maximum time a Lambda function can run is 15 minutes, and it provides only temporary storage for buffer operations.

What if you need more transparency, i.e., access to the underlying infrastructure, while the infrastructure is still managed by AWS? – Enter AWS Elastic Beanstalk.

Running Containers with AWS Elastic Beanstalk

We can run any containerized application on AWS Elastic Beanstalk, which deploys and manages the infrastructure on our behalf. We can create and manage separate environments for development, testing and production, and deploy any version of our application to any environment. We can do rolling deployments or blue/green deployments. Elastic Beanstalk provisions the infrastructure, i.e., the VPC, EC2 instances and load balancers, with CloudFormation templates developed with best practices.

For running containers, Elastic Beanstalk uses ECS under the hood: ECS provides the cluster running the Docker containers, and Elastic Beanstalk manages the tasks running on the cluster.
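As a sketch with the Elastic Beanstalk CLI, assuming a Dockerfile in the project directory (the application and environment names are placeholders):

# initialize an Elastic Beanstalk application on the Docker platform
eb init my-app --platform docker --region us-west-2
# create an environment and deploy the application to it
eb create my-app-env
# deploy subsequent versions of the application
eb deploy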

Organization Objectives:

  1. Run multiple applications on the cloud with high availability.
  2. Infrastructure should be fully managed by AWS with no maintenance.
  3. Auto Scale as per the varying workloads.
  4. Implement high security features like traffic filtering and isolating workloads in a private secured environment.
  5. Implement multiple environments for development, staging and production.
  6. Deploy with strategies like Blue / Green and Rolling updates.
  7. Access to the underlying instances.

Capabilities Required:

  1. Team just needs knowledge of running containers.
  2. Foundational knowledge of AWS and Elastic Beanstalk is enough.

What if you need to implement a more complex microservices architecture with advanced functionality like service mesh and orchestration? – Enter Amazon Elastic Container Service directly.

Running Containers with Amazon Elastic Container Service (Amazon ECS)

When we want to implement a complex microservices architecture with container orchestration, ECS is the right choice. Amazon ECS is a fully managed service with built-in best practices for operations and configuration. It removes the complexity of managing the control plane and gives us the option to run our workloads anywhere, in the cloud or on premises.

ECS gives us two launch types to run tasks: Fargate and EC2. Fargate is a low-overhead serverless option with which we can run containers without managing infrastructure. EC2 is suitable for large workloads which require consistently high CPU and memory.

A task in ECS is an instance of a task definition, the blueprint of our microservice; it can run one or more containers. We can run tasks manually for applications like batch jobs, or with the service scheduler, which applies a scheduling strategy for long-running stateless microservices. The service scheduler spreads containers across multiple Availability Zones by default, using task placement strategies and constraints.
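A minimal sketch of running such a service on the Fargate launch type with the AWS CLI (the task definition file, names, subnet and security group IDs below are placeholders; the task definition itself would need awsvpc network mode and Fargate compatibility):

# register a task definition described in a local JSON file
aws ecs register-task-definition --cli-input-json file://sample-taskdef.json

# create a cluster and run the task as a long-running service on Fargate
aws ecs create-cluster --cluster-name sample-cluster
aws ecs create-service --cluster sample-cluster --service-name sample-service \
  --task-definition sample-task --desired-count 2 --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=DISABLED}"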

Organization Objectives:

  1. Run complex microservices architecture with high availability and scalability.
  2. Orchestrate the containers as per complex business requirements.
  3. Integrate with AWS services seamlessly.
  4. Low learning curve for the team which can take advantage of cloud.
  5. Infrastructure should be fully managed by AWS with no maintenance.
  6. Auto Scale as per the varying workloads.
  7. Implement high security features like traffic filtering and isolating workloads in a private secured environment.
  8. Implement complex DevOps strategies with managed services for CI/CD pipelines.
  9. Access to the underlying instances for some applications and at the same time have a serverless option for some other workloads.
  10. Implement service mesh for microservices with a managed service like App Mesh.

Capabilities Required:

  1. Team should have knowledge of running containers.
  2. Intermediate level of understanding of AWS services is required.
  3. Good knowledge of ECS orchestration and scheduling configuration will add much value.
  4. Optionally, developers should have knowledge of service mesh implementation with App Mesh if it is required.

What if you need to migrate existing on-premises container workloads running on Kubernetes to the cloud, or what if organization policy mandates adopting open-source technologies? – Enter Amazon Elastic Kubernetes Service.

 

Running Containers with Amazon Elastic Kubernetes Service (Amazon EKS)

Amazon EKS is a fully managed service for the Kubernetes control plane, and it gives us the option to run workloads on self-managed EC2 instances, managed node groups of EC2 instances, or the fully managed serverless Fargate service. It removes the headache of managing and configuring the Kubernetes control plane, with built-in high availability and scalability. EKS runs the upstream, CNCF-released Kubernetes versions, so all the workloads presently running on an on-premises K8s cluster will work on EKS. It also gives us the option to extend the same EKS experience to on premises with EKS Anywhere.
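As a sketch with eksctl (cluster names, node group names and sizes are placeholders), a cluster backed by a managed node group, or alternatively by Fargate, can be created like this; the sample repo later in this post uses a config-file-driven variant of the same command:

# create an EKS cluster with a managed node group of EC2 worker nodes
eksctl create cluster --name dev-cluster --region us-west-2 \
  --nodegroup-name ng-1 --node-type t3.medium --nodes 2 --managed

# or run the workloads on Fargate instead of EC2 worker nodes
eksctl create cluster --name dev-fargate --region us-west-2 --fargate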

Organization Objectives:

  1. Adopt open-source technologies as a policy.
  2. Migrate existing workloads on Kubernetes.
  3. Run complex microservices architecture with high availability and scalability.
  4. Orchestrate the containers as per complex business requirements.
  5. Integrate with AWS services seamlessly.
  6. Infrastructure should be fully managed by AWS with no maintenance.
  7. Auto Scale as per the varying workloads.
  8. Implement high security features like traffic filtering and isolating workloads in a private secured environment.
  9. Implement complex DevOps strategies with managed services for CI/CD pipelines.
  10. Access to the underlying instances for some applications and at the same time have a serverless option for some other workloads.
  11. Implement service mesh for microservices with a managed service like App Mesh.

Capabilities Required:

  1. Team should have knowledge of running containers.
  2. An intermediate level of understanding of AWS services is required, and a deep understanding of networking on AWS for Kubernetes will help a lot; you can read my previous blog here.
  3. The learning curve for Kubernetes is steep, and the team should spend sufficient time learning it.
  4. Good knowledge of EKS orchestration and scheduling configuration.
  5. Optionally, developers should have knowledge of service mesh implementation with App Mesh if it is required.
  6. Team should have knowledge of handling Kubernetes updates; you can refer to my vlog here.

 

Running Containers with Red Hat OpenShift Service on AWS (ROSA)

If the organization manages its existing workloads on Red Hat OpenShift and wants to take advantage of the AWS Cloud, we can migrate easily to Red Hat OpenShift Service on AWS (ROSA), which is a managed service. We can use ROSA to create Kubernetes clusters using the Red Hat OpenShift APIs and tools and have access to the full breadth and depth of AWS services. We can also access Red Hat OpenShift licensing, billing, and support, all directly through AWS.
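As a rough sketch with the ROSA CLI (the cluster name is a placeholder; the token comes from your Red Hat account):

# log in with a Red Hat token, create the required account roles, then create a cluster
rosa login --token="<your-red-hat-offline-token>"
rosa create account-roles --mode auto
rosa create cluster --cluster-name my-rosa-cluster --sts --mode auto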

 

I have seen many organizations adopt multiple services to run their container workloads on AWS. It is not necessary to stick to one kind of service; in a complex enterprise architecture, it is recommended to keep all options open and adapt as the business needs change.

Building a Docker Container for a Java App and Deploying It on Amazon EKS

GitHub link: https://github.com/getramki/Deploy-JavaApp-On-EKS.git

This repo contains a sample Spring Boot Java app with a Dockerfile which uses Amazon Corretto 17 as the base image, along with manifests for creating an Amazon EKS cluster, deploying the sample app to the cluster as a container, and exposing it with a service and a Classic Load Balancer.

Prerequisites

Docker, an AWS account, an IAM user with the necessary permissions for creating an EKS cluster (configured with programmatic access), the AWS CLI, the eksctl CLI, and kubectl. Please install and configure the above before going further.

  • You can incur charges in your AWS account by following the steps below.
  • The code deploys to the us-west-2 region; change it wherever necessary if deploying to another region.

After downloading the repo, cd into the repo directory in a terminal and follow the steps for:

  1. Building a Docker Image for a Java App and Pushing it to Amazon ECR.
  2. Creating an Amazon EKS cluster with eksctl
  3. Deploying the sample app to the EKS cluster.

Steps for Building a Docker Image and Pushing it to Amazon ECR

  • Change directory to sample
cd sample
  • Run docker daemon
sudo dockerd 
  • Build an image
docker build --tag sample . 
  • View local images
docker images
  • Build the build stage of the multi-stage Dockerfile
docker build -t sample-build --target build .
  • Build the production stage
docker build -t sample-production --target production .
  • Get ECR Login and pass it to docker
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin Replace-With-AWS-Account-ID.dkr.ecr.us-west-2.amazonaws.com
  • Create ECR repo
aws ecr create-repository --repository-name sample-repo --image-scanning-configuration scanOnPush=true --region us-west-2
  • Tag the image
docker tag sample-production:latest Replace-With-AWS-Account-ID.dkr.ecr.us-west-2.amazonaws.com/sample-repo
  • Push the Image to ECR Repo
docker push Replace-With-AWS-Account-ID.dkr.ecr.us-west-2.amazonaws.com/sample-repo

Create EKS Cluster

Create an Amazon EKS cluster in the us-west-2 region with 2 t3.micro instances. Creation of the EKS cluster can take up to 20 minutes.

eksctl create cluster -f devcluster-addons-us-west-2.yaml

Deploy Image to EKS Cluster

Update the image URL in the deployment.yaml file by replacing Replace-With-AWS-Account-ID with your AWS account ID.

  • Deploy the Java sample app
kubectl apply -f deployment.yaml
  • Deploy the Java sample app service
kubectl apply -f service.yaml
  • Deploy the ingress (optional)
kubectl apply -f ingress.yaml
  • Get the deployments, service, and pods
kubectl get deployment sample-app
kubectl get deployments
kubectl get service sample-app -o wide
kubectl get pods -n default

Delete Resources

  • Delete Deployments
kubectl delete deployment sample-app
  • Delete services
kubectl delete service sample-app
  • Delete ingress if you have created it
kubectl delete ingress sample-app
  • Delete Amazon EKS Cluster
eksctl delete cluster -f devcluster-addons-us-west-2.yaml

Planning and Managing Amazon VPC IP Space in an Amazon EKS Cluster

For the sake of simplicity, I will discuss only IPv4 addressing in this post; I will cover IPv6 addressing in another blog post.

In-Short

CaveatWisdom

Caveat: Planning the Amazon VPC IP space and choosing the right EC2 instance type is important for an Amazon EKS cluster; otherwise, Kubernetes can stop creating or scaling pods for want of IP addresses in the cluster, and our applications can stop scaling.

Wisdom:

  1. Create a larger VPC with a CIDR range like 10.0.0.0/16 and, if needed, add additional CIDR ranges to the VPC with custom CNI networking
  2. Create subnets with sufficient IPs and, if needed, use a different subnet for secondary ENIs (network interfaces)
  3. Choose the right type of instance, which can support an appropriate number of IPs
  4. Manage the IP allocation to Pods and the creation of ENIs

In-Detail

Some Basics

Amazon VPC is a virtual private network which you create to logically isolate EC2 instances and assign private IP addresses for your EC2 instances within the VPC IP address range you defined. A public subnet is a subnet with a route table that includes a route to an internet gateway, whereas a private subnet is a subnet with a route table that doesn’t include a route to an internet gateway.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. It creates Pods; each Pod can contain one container or a group of logically related containers. A Pod is like an ephemeral, lightweight virtual machine which has its own private IP address; this IP address is assigned through the cluster's Container Network Interface (CNI) plugin rather than by Kubernetes itself.

Amazon EKS is a managed Kubernetes service to run Kubernetes in the AWS cloud, it automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks.

Kubernetes supports Container Network Interface (CNI) plugins for cluster networking; a suitable CNI plugin is required to implement the Kubernetes networking model. In an Amazon EKS cluster, the Amazon VPC CNI plugin is used, which is an open-source implementation. This VPC CNI plugin allocates IP addresses to pods on each node from the VPC CIDR range.
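A quick way to see this plugin in a running cluster (a sketch, assuming kubectl is configured against the cluster):

# the VPC CNI plugin runs as the aws-node DaemonSet in the kube-system namespace
kubectl get daemonset aws-node -n kube-system
# check which plugin image/version the DaemonSet is running
kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni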

In the sample Amazon EKS cluster architecture above, the Kubernetes control plane is managed by Amazon EKS in its own VPC, and we will not have any access to it. Only the worker nodes (the data plane of K8s) are in a VPC which is managed by us.

The VPC above has a CIDR range of 192.168.0.0/16 with two public subnets and two private subnets (one of each per Availability Zone), an internet gateway attached to the VPC, an Application Load Balancer which can distribute traffic to the worker nodes, and VPC endpoints which can route traffic to other AWS services.

The NAT gateways are in the public subnets and are used by the worker nodes in the private subnets to communicate with AWS services and the internet.

The Amazon EKS cluster is created with the VPC CNI add-on, which runs on each worker node, and each worker node can have multiple Elastic Network Interfaces (ENIs), one primary and the others secondary.

The Story

Let’s say a global multinational company which has thousands of applications running on an on-premises Kubernetes installation has decided to move to AWS to take advantage of scalability and high availability, and also to save cost.

For large migrations like this, we need to meticulously plan and migrate the workloads to AWS. There are many AWS migration tools which will make our job easy, which I will discuss in another blog; for this post, let us concentrate on planning our Amazon EKS cluster from a networking perspective.

Let us discuss four main things from a networking perspective:

  1. Planning VPC CIDR Range
  2. Understanding Subnets and ENIs
  3. Choosing the right EC2 Instance types for worker nodes
  4. Managing the ENIs and IP allocation to Pods

When we add secondary VPC CIDR blocks, we need to configure the VPC CNI plugin in the Amazon EKS cluster so that they can be used for scheduling Pods. Amazon VPC supports both RFC 1918 and non-RFC 1918 CIDR blocks, so EKS clusters can run in Amazon VPCs addressed with CIDR blocks in the 100.64.0.0/10 and 198.19.0.0/16 ranges. This helps with hybrid networking models, i.e., if we would like to extend our EKS cluster workloads to on-premises data centres, we can use those CIDR blocks along with CNI custom networking. We can conserve the RFC 1918 IP space in a hybrid network by leveraging RFC 6598 addresses.

1. Planning VPC CIDR Range

We should plan a larger CIDR block for the VPC, with a netmask of /16, which provides up to 65,536 IPs; this allows us to add new workloads for ever-expanding business requirements. If we run out of IP space in this primary range, we can add secondary CIDR blocks to the VPC. We can add up to 4 additional CIDR blocks by default and request a quota increase up to 50, including the primary CIDR block. Even though we can have 50 CIDR blocks, we should be aware that there is a Network Address Usage (NAU) limit, which is a metric applied to the resources in a VPC; by default the NAU units per VPC are 64,000, and the quota can be increased up to 256,000. So practically we can have only 4 CIDR blocks with a netmask of /16.
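A sketch of adding a secondary CIDR block and a subnet within it using the AWS CLI (the VPC ID, ranges and Availability Zone are placeholders):

# associate an additional CIDR block with an existing VPC
aws ec2 associate-vpc-cidr-block --vpc-id vpc-0123456789abcdef0 --cidr-block 100.64.0.0/16
# create a subnet in the new range, for example for pod networking
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 100.64.0.0/19 \
  --availability-zone us-west-2a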

2. Understanding Subnets and ENIs

The CIDR range of public subnets can be smaller, with netmasks like /27 or /24. We can plan the public subnets to be smaller in size since they host only NAT gateways and bastion hosts; at the same time, we need to have at least 6 free IPs which can be consumed by Amazon EKS for its internal use, such as creating network interfaces in those subnets.

We don’t want to expose our application workloads to the public internet, so mostly we will be hosting our workloads in private subnets, which should be larger in size. In the example architecture above we have used a netmask of /20 for the private subnets, which supports 4,096 IPs each (a few of them are reserved for internal use).

On a worker node, the VPC CNI plugin automatically allocates secondary ENIs (Elastic Network Interfaces) when the IP addresses from the primary network interface get exhausted. Secondary network interfaces created by the VPC CNI plugin will be in the same subnet as the primary network interface by default. Sometimes, if there are not enough IP addresses in the primary interface's subnet, we may have to use a different subnet for the secondary ENIs; this can be done with custom networking in the VPC CNI plugin.

However, we need to be aware that when we enable custom networking, IP addresses from the primary ENI will not be assigned to Pods, so multiple secondary ENIs in a different subnet are helpful, but we will be wasting one ENI, the primary one.
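A sketch of enabling custom networking on the VPC CNI plugin; the subnet and security group IDs below are placeholders, and each ENIConfig resource is named after the Availability Zone it applies to:

# turn on custom networking and tell the plugin to pick the ENIConfig by zone label
kubectl set env daemonset aws-node -n kube-system AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true
kubectl set env daemonset aws-node -n kube-system ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# define which subnet and security groups the secondary ENIs in us-west-2a should use
cat <<EOF | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2a
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
    - sg-0123456789abcdef0
EOF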

3. Choosing the right EC2 Instance types for worker nodes

The number of secondary ENIs that can be attached to the EC2 instances (worker nodes) depends on the type of the EC2 instance. Each EC2 instance type has a maximum number of network interfaces that it can support, and there is also a maximum number of private IPv4 addresses which a network interface can handle, again based on the EC2 instance type.
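We can look these limits up with the AWS CLI; for example, for the m5.4xlarge instance type used in the example later in this post:

# show the maximum number of ENIs and IPv4 addresses per ENI for an instance type
aws ec2 describe-instance-types --instance-types m5.4xlarge \
  --query "InstanceTypes[].NetworkInfo.[MaximumNetworkInterfaces,Ipv4AddressesPerInterface]" \
  --output text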

We need to analyse the CPU and memory requirements of the Pods hosting our application containers and the number of Pods which will be running during minimum and maximum traffic periods. Then we need to analyse how many EC2 instances will be needed at those scales.

Based on these factors, including the ENIs and IPs which the instance can handle, we need to choose the right type of instance.

Pod scaling and instance scaling in an EKS cluster is a big topic which I will be discussing in a separate blog; one thing we need to keep in mind is that the EC2 instance type also affects the IP address allocation to Pods.

4. Managing the ENIs and IP allocation to Pods

We might go for smaller instance types and small clusters to save cost, or for various other business reasons. It becomes very important to manage the IPs in small clusters, and for this we need to understand how the VPC CNI works.

The VPC CNI has two components, aws-cni and ipamd (the IP address management daemon). aws-cni is responsible for setting up the Pod-to-Pod communication network, and ipamd is a daemon running on each worker node which is responsible for managing ENIs and a warm pool of IPs.

To scale out Pods quickly, it is necessary to maintain a warm pool of IPs, because provisioning a new ENI and attaching it to an EC2 instance can take some time; so ipamd attaches an ENI in advance and maintains a warm pool of IPs.

Let us understand how ipamd allocates ENIs and IPs with an example.

Let’s say we have planned for m5.4xlarge instances in a subnet which has a 256-IP range. Each m5.4xlarge instance can support up to 8 ENIs, and each ENI can support up to 30 IPs. Out of those 30 IPs, one IP is reserved for internal use.

When the instance is added to the EKS cluster, it starts with one ENI, the primary. While the number of Pods running on the instance is between 0 and 29, ipamd requests the EC2 service to allocate one more (secondary) ENI in order to keep a warm pool of IPs, so the total number of ENIs will be 2 and the warm pool of IPs available at the start will be 58 for the instance. When the number of running Pods becomes 30 (>29), ipamd requests one more ENI, which increases the total ENIs to 3 and the IPs for the instance to 87. This continues based on the number of running Pods, up to the maximum number of ENIs the EC2 instance supports.

So, the formula for the number of IPs allocated to an instance is:

number of ENIs attached to the instance × (number of IPs per ENI − 1)

In this case, even though the number of Pods running on the node is only 30, the number of IPs allocated to that node is 87. The number of free IPs is 87 − 30 = 57, which might never be used on that instance; this also leads to a situation where one instance in the subnet gets more IPs and other instances in the same subnet are starved for IPs.

On the instance which got more IPs, other resources like vCPU and memory may run short because they are already utilized by running Pods, while other instances which have free vCPU and memory may not have sufficient IPs to run Pods. This can create an imbalance in the distribution of Pods between instances and a wastage of resources, escalating the cost at the same time.

To manage such situations, we can define CNI configuration variables WARM_ENI_TARGET, WARM_IP_TARGET and MINIMUM_IP_TARGET.

WARM_ENI_TARGET is 1 by default, so we need not touch it.

With WARM_IP_TARGET we can define the number of free IP addresses ipamd should maintain.

With MINIMUM_IP_TARGET we can define the total number of IP addresses ipamd should allocate on the node for Pod assignment.

It is always recommended to use MINIMUM_IP_TARGET and WARM_IP_TARGET together, which reduces the number of API calls ipamd makes to EC2; too many of those calls can slow down the system.

In the above example we can define MINIMUM_IP_TARGET = 30 and WARM_IP_TARGET = 2; this makes ipamd maintain a warm pool of 2 IPs once 30 Pods are running. This leaves room for other instances in the subnet to consume the available IPs.
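A sketch of applying these settings to the VPC CNI plugin (the values match the example above):

# set the IP warm-pool behaviour on the VPC CNI plugin
kubectl set env daemonset aws-node -n kube-system MINIMUM_IP_TARGET=30 WARM_IP_TARGET=2
# verify the environment variables on the DaemonSet
kubectl describe daemonset aws-node -n kube-system | grep -E "MINIMUM_IP_TARGET|WARM_IP_TARGET"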

Increasing the Pod Density per Node

To increase Pod density per node, we can enable prefix mode, with which we can increase the number of IPs allocated to a node while keeping the maximum number of ENIs per instance the same. This can reduce cost by bin-packing the Pods running per instance and using a smaller number of instances. However, we need to be aware that this can affect the high availability of our applications, which may need to run in multiple Availability Zones.
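A sketch of enabling prefix mode on the VPC CNI plugin (assuming Nitro-based instance types, which prefix assignment requires):

# enable prefix delegation so each ENI slot hands out a /28 prefix instead of a single IP
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
# optionally keep one spare /28 prefix warm for fast Pod scale-out
kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1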