Amazon CloudWatch monitors your cloud resources and applications, including Amazon Elastic Compute Cloud (EC2) instances. You can track cloud, system, and application metrics, see them in graphical form, and arrange to be notified (via a CloudWatch alarm) if they cross a threshold value that you specify. You can also stop, terminate, or recover an EC2 instance when an alarm is triggered (see my blog post, Amazon CloudWatch – Alarm Actions for more information on alarm actions).
New Action – Reboot Instance
Today we are giving you a fourth action. You can now arrange to reboot an EC2 instance when a CloudWatch alarm is triggered. Because you can track and alarm on cloud, system, and application metrics, this new action gives you a lot of flexibility.
You could reboot an instance if an instance status check fails repeatedly. Perhaps the instance has run out of memory due to a runaway application or service that is leaking memory. Rebooting the instance is a quick and easy way to remedy this situation; you can easily set this up using the new alarm action. In contrast to the existing recovery action which is specific to a handful of EBS-backed instance types and is applicable only when the instance state is considered impaired, this action is available on all instance types and is effective regardless of the instance state.
If you are using the CloudWatch API or the AWS Command Line Interface (CLI) to track application metrics, you can reboot an instance if the application repeatedly fails to respond as expected. Perhaps a process has gotten stuck or an application server has lost its way. In many cases, hitting the (virtual) reset switch is a clean and simple way to get things back on track.
Creating an Alarm
Let’s walk through the process of creating an alarm that will reboot one of my instances if the CPU Utilization remains above 90% for an extended period of time. I simply locate the instance in the AWS Management Console, focus my attention on the Alarm Status column, and click on the icon:
Then I click on Take the action, choose Reboot this instance, and set the parameters (90% or more CPU Utilization for 15 minutes in this example):
If necessary, the console will ask me to confirm the creation of an IAM role as part of this step (this is a new feature):
The role will have permission to call the “Describe” functions in the CloudWatch and EC2 APIs. It also has permission to reboot, stop, and terminate instances
I click on Create Alarm and I am all set!
This feature is available now and you can start using it today in all public AWS regions.