Auto scaling Java REST APIs on AWS

Paul Minasian, Software Architect

Horizontally scaling Java REST APIs on AWS using the EC2 Auto Scaling service is easy and fun. You’ll find here information about how to set up an EC2 Auto Scaling sandbox environment in AWS to be able to automatically scale a provided sample Java REST API using the following AWS services:

  • Amazon Virtual Private Cloud (VPC)
  • Elastic Compute Cloud (EC2)
  • Elastic Load Balancing (ELB)
  • EC2 Auto Scaling
  • CloudWatch

I’ll provide a brief introduction to the EC2 Auto Scaling service. Afterwards, I’ll describe how to set up and run the EC2 Auto Scaling sandbox environment in AWS so that you can play and experiment with the features of the service. This will help you better understand the EC2 Auto Scaling concepts explained here.

Please note that it is also possible to use AWS services such as Elastic Beanstalk, Elastic Container Service, or Elastic Kubernetes Service to horizontally scale your web applications and services.

EC2 Auto Scaling Service

This service makes it possible to scale EC2 instances in and out, so that decreased and increased workloads are handled as effectively and cost-efficiently as possible.

Scale out means that new EC2 instances will be launched and added to the Auto Scaling group. In the diagram below, the application was running on one EC2 instance (highlighted in green), and once a scale out action was triggered, two new EC2 instances (highlighted in blue) were launched. You can specify a warm-up period for EC2 instances; after the warm-up period of the last instance launched by the scale out action had elapsed, the newly launched instances became part of the Auto Scaling group.

[Figure 1]

Conversely, scale in means that some of the existing EC2 instances will be terminated and removed from the Auto Scaling group. In the diagram below, the application was running on three EC2 instances, and once a scale in action was triggered, two of the EC2 instances (highlighted in red) were terminated. After termination, the instances were removed from the Auto Scaling group.

[Figure 2]

For the created Auto Scaling group, you will need to specify the minimum size, desired capacity and the maximum size.

[Figure 3]

When you first create an Auto Scaling group, the group will contain the number of EC2 instances as indicated by the desired capacity. When the group starts to scale out or in because of a scaling policy, the desired capacity will be changed automatically. It cannot be less than the minimum size and cannot be greater than the maximum size.
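If you prefer the command line over the console, a group with these three sizes could be created along the following lines. This is a hedged sketch: the group name, launch template name, and subnet IDs are placeholders, and the command assumes the AWS CLI is installed and configured with valid credentials.

```shell
# Create an Auto Scaling group with a minimum size of 1, a desired
# capacity of 2, and a maximum size of 10. All names and IDs below
# are hypothetical examples.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-template LaunchTemplateName=my-launch-template,Version='$Latest' \
  --min-size 1 \
  --desired-capacity 2 \
  --max-size 10 \
  --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222,subnet-cccc3333"
```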

An EC2 Auto Scaling group can launch instances in different availability zones (AZs) for high availability of your application or platform. When creating an Auto Scaling group, you will need to specify the subnets in which the EC2 instances are launched. In the diagram below, you can see that three public subnets have been specified for Auto Scaling Group A, and three private subnets for Auto Scaling Group B. Now, when Auto Scaling Group A or B is about to launch an EC2 instance, it will make sure that the instance is launched in one of the healthy AZs and that the EC2 instances are balanced across the AZs.

[Figure 4]

When an AZ becomes unhealthy, rendering the EC2 instances of an Auto Scaling group running in that AZ unhealthy as well, the Auto Scaling group will launch new EC2 instances in the remaining healthy AZs to meet the desired capacity of the group.

For an EC2 Auto Scaling group to be able to launch an EC2 instance, it needs to know what kind of instance to launch. For that, you will need to create either a launch template or a launch configuration. Both provide an instance configuration template which an Auto Scaling group uses to launch EC2 instances.

They provide similar functionality; however, a launch template allows you to have multiple versions of a template. In the diagram below, you can see two different launch templates (A and B) and two versions of launch template A. Launch template A (version 1) could function as a reusable template which defines common configuration parameters, such as tags or network configuration, that other executable templates can reuse. Launch template A (version 1) would then never be used directly by an Auto Scaling group to launch EC2 instances. The other configuration parameters can be specified as part of another version of the same template (version 2 in the diagram below).

[Figure 5]

The EC2 Auto Scaling documentation recommends that you use launch templates instead of launch configurations to ensure that you can use the latest features of Amazon EC2 service.
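As a sketch of the versioning idea above, assuming the AWS CLI and hypothetical names and IDs, a base launch template and a second version on top of it could be created like this:

```shell
# Version 1: a reusable base holding only common parameters (tags here).
aws ec2 create-launch-template \
  --launch-template-name template-a \
  --launch-template-data '{"TagSpecifications":[{"ResourceType":"instance","Tags":[{"Key":"Name","Value":"Apollo Missions API"}]}]}'

# Version 2: built from version 1, adding the AMI ID and instance type.
# The AMI ID below is a placeholder.
aws ec2 create-launch-template-version \
  --launch-template-name template-a \
  --source-version 1 \
  --launch-template-data '{"ImageId":"ami-0123456789abcdef0","InstanceType":"t3.small"}'
```

An Auto Scaling group would then reference template-a together with a specific version number (or `$Latest`).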

For EC2 Auto Scaling service, two types of automatic scaling are possible:

  • Dynamic Scaling
  • Scheduled Scaling

Dynamic Scaling allows you to specify target tracking scaling policies, which will then enable an Auto Scaling group to automatically respond to changes in resource utilization with either scale in or out actions. This allows the group to scale your EC2 capacity automatically to handle changes in the amount of traffic that your application receives. For instance, dynamic scaling makes it possible for an Auto Scaling group to scale out based on an event. An event could be something like an aggregated average CPU utilization metric of all EC2 instances within an Auto Scaling group being above a specified threshold (e.g., 50%) for some time.
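A target tracking policy of this kind boils down to a small JSON configuration. Below is a hedged sketch: the configuration keeps the group’s average CPU utilization around 50 percent, and the group name in the commented-out CLI call is a hypothetical example.

```shell
# Write a target tracking configuration that keeps the average CPU
# utilization of the Auto Scaling group at around 50 percent.
cat > target-tracking-config.json <<'EOF'
{
  "TargetValue": 50.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  }
}
EOF

# With AWS credentials configured, the policy could then be attached to
# a (hypothetical) Auto Scaling group like this:
#   aws autoscaling put-scaling-policy \
#     --auto-scaling-group-name my-asg \
#     --policy-name cpu50 \
#     --policy-type TargetTrackingScaling \
#     --target-tracking-configuration file://target-tracking-config.json
```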

Scheduled Scaling allows you to set your own schedule; you are now in charge of predicting the traffic patterns of your web application. Scaling actions are performed automatically as a function of time and date. For this, you create a scheduled action, which tells EC2 Auto Scaling at what time to perform a scaling action. For each scheduled action, you specify the start time at which the scaling action should take effect, together with the new minimum, desired, and maximum sizes.
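As a hedged sketch (the group name and sizes are hypothetical, and the command assumes a configured AWS CLI), a recurring scheduled action could look like this:

```shell
# Hypothetical scheduled action: every weekday at 08:00 UTC, raise the
# group to 4 instances ahead of the expected morning traffic.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name scale-out-weekday-mornings \
  --recurrence "0 8 * * MON-FRI" \
  --min-size 2 \
  --desired-capacity 4 \
  --max-size 10
```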

When you use the generic AWS Auto Scaling service, an additional type of automatic scaling is available, namely Predictive Scaling. Predictive Scaling predicts when to scale using machine learning algorithms and historical data. It predicts future traffic, including regularly occurring spikes, and provisions the right number of EC2 instances in advance of predicted changes. Predictive Scaling’s machine learning algorithms detect changes in daily and weekly patterns, automatically adjusting their forecasts. The model needs at least one day’s worth of historical data to start making predictions; it is re-evaluated every 24 hours to create a forecast for the next 48 hours.

When an Elastic Load Balancing (ELB) load balancer is used as shown in the diagram below, the Auto Scaling group will automatically register newly launched EC2 instances with it, and deregister them when needed. A load balancer acts as a single point of contact for all incoming web traffic to your Auto Scaling group.

[Figure 6]

When creating an EC2 Auto Scaling group, you can specify the ELB load balancer when you intend to use a Classic Load Balancer, or a target group when you intend to use either an Application Load Balancer or a Network Load Balancer. When creating an Application Load Balancer or Network Load Balancer, you will need to assign a target group as part of routing configuration. Each target group that you create can be associated with only one load balancer. A load balancer routes requests to the targets (EC2 instance, IP, Lambda function) in a target group using the target group settings that you specify, and performs health checks on the targets using the health check settings that you specify.

Classic Load Balancer routes and load balances either at the transport layer (TCP/SSL), or at the application layer (HTTP/HTTPS). A Classic Load Balancer supports either EC2-Classic or a VPC.

Application Load Balancer routes and load balances at the application layer (HTTP/HTTPS), and supports path-based routing. An Application Load Balancer can route requests to ports on one or more registered targets, such as EC2 instances, in your virtual private cloud (VPC).

Network Load Balancer routes and load balances at the transport layer (TCP/UDP Layer-4), based on address information extracted from the TCP packet header, not from packet content. Network Load Balancers can handle traffic bursts, retain the source IP of the client, and use a fixed IP for the life of the load balancer.

In addition to the health monitoring of EC2 instances by an Auto Scaling group, you can also configure Elastic Load Balancing health checks to monitor the health of registered instances, so that the load balancer or target group only routes traffic to the healthy instances.

Finally, specific CloudWatch metrics are tracked by EC2 Auto Scaling, as instructed by a scaling policy that you create for your Auto Scaling group. The scaling policy also defines what action to take when the associated CloudWatch alarm goes off (enters the ALARM state). The metrics coming from all of the instances in the Auto Scaling group are aggregated and used to trigger an alarm. If, for instance, you have three EC2 instances running in the group, and the CPU utilization of the first EC2 instance is at 40 percent, the second at 60 percent, and the third at 80 percent, then the average CPU utilization of all three instances is 60 percent. When the policy is in effect, EC2 Auto Scaling adjusts the group's desired capacity up or down when the alarm is triggered.
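The aggregation in the example above is just an arithmetic mean, which you can verify on the command line:

```shell
# Average CPU utilization across the three instances from the example:
# (40 + 60 + 80) / 3 = 60 percent, which is above a 50 percent target
# and would therefore keep the alarm in the ALARM state.
avg=$(( (40 + 60 + 80) / 3 ))
echo "average CPU utilization: ${avg}%"
```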

EC2 Auto Scaling sandbox environment

Setting up the sandbox environment

Components and Services

The sandbox environment consists of the following components and services:

Apollo Missions API

Apollo Missions API is a simple REST API written in Java using the Quarkus framework. This Java application runs on each EC2 instance launched by the Auto Scaling group. Later on, I will describe how to create a custom Amazon Machine Image (AMI) which contains a Linux service to run the Java application at EC2 boot time. The application consists of the following endpoints:

  • /missions/manned
  • /missions/manned/{missionId}
  • /longComputation
  • /health

The first two endpoints provide some basic data regarding the manned Apollo missions. The /longComputation endpoint is used by Apache JMeter for load testing purposes: creating load on the EC2 instances will trigger an alarm in CloudWatch and cause a scaling policy to take a scale out action. The /health endpoint is used by the ELB load balancer to monitor the health of the Java application (whether or not it is running, accepting, and successfully processing HTTP requests).

Amazon Virtual Private Cloud

The default VPC within the Europe (London) eu-west-2 region is used. An additional security group is created in this VPC which permits HTTP traffic to flow to the EC2 instances. It also allows SSH traffic in case you would like to SSH into the EC2 instances. The default public subnets will be used when launching EC2 instances. Each public subnet is associated with a different AZ.

Elastic Compute Cloud (EC2)

A Key Pair is created for the EC2 instances. The Key Pair is specified within the Launch Template. The public key is automatically deployed within a launched EC2 instance. When you need to SSH into an EC2 instance, use the private key associated with the public key.

Furthermore, a custom AMI is created which contains a Linux service to run the Java application at EC2 boot time. Finally, a Launch Template is created which bases the launched EC2 instances on this AMI.

Elastic Load Balancing (ELB)

A Target Group is created with health checks. The /health endpoint of the Java application is used by the health checks. The Target Group allows the created application load balancer to route the HTTP traffic to the EC2 instances launched by the Auto Scaling group.
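As a command-line sketch of the same setup (the VPC ID is a placeholder, and a configured AWS CLI is assumed), the target group with its /health health check could be created like this:

```shell
# Create the target group with its health check pointing at the
# application's /health endpoint. The VPC ID below is a placeholder
# for your default VPC's ID.
aws elbv2 create-target-group \
  --name Apollo-Missions-API-TG \
  --protocol HTTP \
  --port 8080 \
  --target-type instance \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --vpc-id vpc-0123456789abcdef0
```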

EC2 Auto Scaling

An Auto Scaling group is created with a dynamic scaling policy using CPU utilization metric for the specified target tracking value. It uses the created Launch Template to launch new EC2 instances when a scale out action takes place.

CloudWatch

You can use this service to see which alarms went off, and to view EC2 and ELB statistics such as the CPU utilization and the Request Count Sum, respectively.

Setup

The values mentioned here regarding AZs are for the Europe (London) eu-west-2 region. You can of course use a different region but then you will need to provide the values for the default public subnets and AZs of your chosen region.

Custom AMI

Within the Instances section of the EC2 Console, launch an instance with the values below (only values that must be specified or that differ from the defaults are listed).

  • AMI: Ubuntu Server 18.04 LTS (HVM), SSD Volume Type, or a higher Ubuntu Server version
  • Instance Type: t3.small
  • Tags: Key: Name, Value: Apollo Missions API
  • Security Group: create a new one with Name and Description set to ‘Apollo Missions API SG’ and the following inbound rules:
      • HTTP: Protocol TCP, Port Range 80, Source 0.0.0.0/0
      • Custom TCP Rule: Protocol TCP, Port Range 8080, Source 0.0.0.0/0
      • SSH: Protocol TCP, Port Range 22, Source 0.0.0.0/0
  • Key Pair: select the Key Pair that you created earlier, or create a new Key Pair

Once the new EC2 instance is up and running, SSH into the instance. The username is ubuntu. Use the private key associated with the public key as part of the Key Pair that you’ve specified for the EC2 instance.
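The connection could look like this; the key file name is a hypothetical example, and the IP address is a documentation placeholder for the public IP shown in the EC2 Console:

```shell
# Connect as the ubuntu user using the private key of the Key Pair.
# Replace the key file name and IP address with your own values.
EC2_PUBLIC_IP=203.0.113.10
ssh -i apollo-missions-api.pem ubuntu@"$EC2_PUBLIC_IP"
```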

Create a dev folder within the home directory of the EC2 instance. On your development machine, pull the source code of the Apollo Missions API and run the mvnw package command to build the application.

Once the application is built, SFTP the lib folder and the apollo-missions-api-1.0.0-runner.jar file found in the target folder to the dev folder on the EC2 instance. Also SFTP the apollo-missions-api.service and apollo-missions-api.sh files found in the resources/linux folder to the dev folder on the EC2 instance.

Make sure the file apollo-missions-api.sh is executable:

chmod +x apollo-missions-api.sh

Now install OpenJDK 11:

sudo apt-get update
sudo apt-get install openjdk-11-jdk-headless

Let’s create a Linux service which will allow us to start and stop the Java application. This service will also be enabled so that our application runs at EC2 boot time.
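The apollo-missions-api.service file ships with the Apollo Missions API source code; as a rough idea of what such a unit contains, a minimal sketch could look like the following (the paths and user are assumptions based on the dev folder used above, not the actual shipped file):

```shell
# Sketch of a minimal systemd unit for the application; the real file
# is provided in the resources/linux folder of the source code.
cat > apollo-missions-api.service <<'EOF'
[Unit]
Description=Apollo Missions API
After=network.target

[Service]
Type=simple
User=ubuntu
ExecStart=/home/ubuntu/dev/apollo-missions-api.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```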

Move the apollo-missions-api.service to /etc/systemd/system/ folder:

sudo mv apollo-missions-api.service /etc/systemd/system/apollo-missions-api.service

Now execute the below commands to create and enable the Linux service for the Java application and to start the application:

sudo systemctl daemon-reload
sudo systemctl enable apollo-missions-api
sudo systemctl start apollo-missions-api

Check the status of the service with the below command:

sudo systemctl status apollo-missions-api

Now make a request to the API in your browser and verify that you are getting a valid JSON response. URL: http://<EC2 public IP address>:8080/missions/manned

Select the launched EC2 instance in the EC2 Console and select Actions -> Create Image.

Specify the below values for the AMI.

  • Image name: Apollo Missions API

Find the AMI ID in the AMIs section of the EC2 Console. You will need this ID when you create a Launch Template.

Launch Template

Once the AMI has been created, terminate the running EC2 instance and proceed to the creation of a Launch Template. Within the EC2 Console, choose Launch Templates and create a Launch Template with the values below (only values that must be specified or that differ from the defaults are listed).

  • Launch template name: Apollo-Missions-API-TMPL
  • Auto Scaling guidance: tick the box ‘Provide guidance to help me set up a template that I can use with EC2 Auto Scaling’
  • AMI: type in the AMI ID of your created AMI and select that AMI
  • Instance type: t3.small
  • Key pair name: the key pair that you created / used earlier for the launched EC2 instance
  • Networking platform: Virtual Private Cloud (VPC)
  • Security Group: Apollo Missions API SG
  • Instance tags: Key: Name, Value: Apollo Missions API; check the boxes for ‘Tag instances’ and ‘Tag volumes’
  • Advanced details: set Detailed CloudWatch monitoring to Enabled

ELB - Target Group

After creating the Launch Template, you will need to create the Target Group and the ELB application load balancer. Within the EC2 Console, select Target Groups and create one with the values below (only values that must be specified or that differ from the defaults are listed).

  • Name: Apollo-Missions-API-TG
  • Target type: Instance
  • Protocol: HTTP
  • Port: 8080
  • Health check settings: Protocol: HTTP, Path: /health

ELB - Application Load Balancer

Now create an ELB application load balancer with the values below (only values that must be specified or that differ from the defaults are listed).

  • Name: Apollo-Missions-API-ALB
  • Availability Zones: select all available AZs (eu-west-2a, eu-west-2b, eu-west-2c)
  • Configure Security Groups: Apollo Missions API SG
  • Configure Routing: select the existing Target Group Apollo-Missions-API-TG

Auto Scaling Group

Finally, it’s time to create the Auto Scaling group. Within the EC2 Console, select Auto Scaling Groups and create one using a Launch Template, with the values below (only values that must be specified or that differ from the defaults are listed).

  • Launch Template: Apollo-Missions-API-TMPL
  • Group Name: Apollo-Missions-API-ASG
  • Group Size: 1
  • Subnet: select all the public subnets of the default VPC, which are associated with the AZs eu-west-2a, eu-west-2b, and eu-west-2c
  • Load Balancing: tick the box ‘Receive traffic from one or more load balancers’
  • Target Groups: Apollo-Missions-API-TG
  • Health Check Type: ELB
  • Monitoring: tick the box ‘Enable CloudWatch detailed monitoring’
  • Configure scaling policies: tick the box ‘Use scaling policies to adjust the capacity of this group’
      • Minimum size: 1
      • Maximum size: 10
      • Scale Group Size: Name: CPU Utilization, Metric type: Average CPU Utilization, Target value: 50, Instances need: 180 (seconds to warm up after launch)

Running the sandbox environment

If you’ve successfully set up the sandbox environment, you should be able to see the first EC2 instance launched by the Auto Scaling group to meet the desired capacity of 1.

You can access the REST API on that EC2 instance directly or via the ELB load balancer.

  • EC2: http://<EC2 public IP address>:8080/missions/manned
  • ELB: http://<ELB load balancer’s DNS name>/missions/manned

Upon launching the EC2 instance, the Auto Scaling group registered the instance with the Target Group so that the ELB load balancer can route traffic to it.

[Figure 7]

The Monitoring tab of the Target Group shows among other metrics that one healthy host is running.

[Figure 8]

Select the Auto Scaling group and click on the Activity Tab. In the Activity history section you will see one log entry. See below for an example:

  • Status: Successful
  • Description: Launching a new EC2 instance: i-0958be7cdd91d0405
  • Cause: At 2020-04-24T10:40:25Z a user request created an AutoScalingGroup changing the desired capacity from 0 to 1. At 2020-04-24T10:40:27Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.
  • Start time: 2020 April 24, 02:40:28 PM +02:00
  • End time: 2020 April 24, 02:41:00 PM +02:00

Select the Instance management tab and find the single EC2 instance that is running.

[Figure 9]

To trigger a scale out action by the Auto Scaling group, you will need to put some extra load on the EC2 instances. You can use the JMeter project provided as part of the Apollo Missions API source code. The project file is located in the resources/jmeter folder. This project has been created with Apache JMeter version 5.2.1.

Once the project is opened in JMeter, provide the ELB load balancer’s DNS name for the field Server Name or IP of the /longComputation endpoint.

[Figure 10]

Hit the play button to send HTTP requests to the ELB load balancer. Ten concurrent users will continuously send HTTP requests for a period of 20 minutes.
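If you’d rather generate the load without JMeter, a rough shell equivalent of the same scenario (ten background loops hitting the endpoint for 20 minutes; the DNS name is a placeholder you fill in) could look like this:

```shell
# Rough JMeter substitute: 10 concurrent clients requesting the
# /longComputation endpoint for 20 minutes. ELB_DNS is a placeholder
# for the load balancer's DNS name.
ELB_DNS="<ELB load balancer's DNS name>"
END=$(( $(date +%s) + 20 * 60 ))
for i in $(seq 1 10); do
  (
    while [ "$(date +%s)" -lt "$END" ]; do
      curl -s -o /dev/null "http://${ELB_DNS}/longComputation"
    done
  ) &
done
wait
```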

Click on the Summary Report and watch the statistics.

[Figure 11]

Upon starting the JMeter tests, only one EC2 instance was running. You can SSH into this EC2 instance to see the CPU utilization using, for instance, the htop utility.

From the screenshot below, we can see that the CPU utilization is above 50 percent. It will remain this way until the alarm in CloudWatch goes off and multiple scale out actions take place to increase the number of EC2 instances to handle the new workload.

[Figure 12]

In the Activity history section of the Auto Scaling group you will see more log entries. See below for an example of an additional scale out entry:

  • Status: Successful
  • Description: Launching a new EC2 instance: i-0fb2c14f85caf48f5
  • Cause: At 2020-04-24T10:44:25Z a monitor alarm TargetTracking-Apollo-Missions-API-ASG-AlarmHigh-c4e129cd-59b9-4092-821c-df45ce16fe2e in state ALARM triggered policy CPU Utilization changing the desired capacity from 1 to 2. At 2020-04-24T10:44:29Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 2.
  • Start time: 2020 April 24, 02:45:01 PM +02:00
  • End time: 2020 April 24, 02:48:33 PM +02:00

Average response time decreases when additional EC2 instances are launched to process the workload.

[Figure 13]

After the desired capacity had reached the maximum of 10, I increased the maximum size to 20 and ran the JMeter tests for a second round of 20 minutes.

[Figure 14]

[Figure 15]

The number of EC2 instances increases gradually over time to meet the desired capacity, which is adjusted dynamically by the Auto Scaling group. Once we stop the JMeter tests, the CPU utilization drops drastically, and multiple scale in actions take place to decrease the number of running EC2 instances until only one instance, the minimum size, is left running, and the desired capacity is one.

The below screenshot of the Targets tab within the Target Group shows that EC2 instance deregistration is in progress. Once the deregistration is finished, the EC2 instances will be terminated.

[Figure 16]