AWS gives you the components that you need to build systems that are highly reliable: multiple Regions (each with multiple Availability Zones), Amazon CloudWatch (metrics, monitoring, and alarms), Auto Scaling, Load Balancing, several forms of cross-region replication, and lots more. When you put them together in line with the guidance provided in the Well-Architected Framework, your systems should be able to keep going even if individual components fail.
However, you won’t know that this is indeed the case until you perform the right kinds of tests. The relatively new field of Chaos Engineering (based on pioneering work done by “Master of Disaster” Jesse Robbins in the early days of Amazon.com, and then taken into high gear by the Netflix Chaos Monkey) focuses on adding stress to an application by creating disruptive events, observing how the system responds, and implementing improvements. In addition to pointing out the areas for improvements, Chaos
Original URL: http://feedproxy.google.com/~r/AmazonWebServicesBlog/~3/5dAf2jYNI4o/