AWS: Timeout issues

Hi,

In this post I will talk about the code updates triggered from Team City that were failing in AWS. This caused the updates in AWS to be rolled back.

The errors I got were:

Updating Auto Scaling group named: MyScalingGroup failed. Reason: Failed to receive 1 resource signal(s) within the specified duration

and

Instance id(s) ‘my-instance-id’ did not pass health check after command execution. Aborting the operation.

Initially I believed that these have happened because the instances failed to finish their health checks. I tried a few things to solve this issue:

1) Made a Service Role for the environment.

2) Set the Health Target to TCP:80.

3) Increased the timeout value for health-checks.

Unfortunately none of these seemed to work.

After many hours spent on investigating this issue I decided to also look in the Load Balancer settings and in the Auto Scaling Group settings and I noticed a very interesting behavior.

The Load Balancer was using 2 Availability Zones (1b and 1c) while the Auto Scaling Group was using 3 Availability Zones (1a, 1b,1c). That meant that every new instance that was spun off by the Auto Scaling Group was in the AZ that was not associated to the Load Balancer (1a). This has led to the issue that AWS could not perform the heath checks because it couldn’t reach that instance at all.

 

 

One Response to “AWS: Timeout issues”

  1. Phil Says:

    Thanks for the troubleshooting tips! My issue was also with the Load Balancer, turns out modifying from HTTP to TCP fixed the issue. And also adding an HTTPS listener.

Leave a comment