Fixing production issues is one of the key responsibilities of system and network administrators. In fact, I’ve always found it to be one of the most interesting parts of infrastructure engineering. Diving as deep as needed into the problem at hand, not only do you (eventually) have the satisfaction of solving the issue, you also learn a lot of things along the way, which you probably wouldn’t have been exposed to under normal circumstances.
Operating systems certainly present such opportunities. Over time, they’ve grown ever more complex, forcing administrators to master a zillion configuration files and settings. Although infrastructure as code and automation have greatly improved provisioning and managing servers, there’s always room for mistakes and breakdowns that prevent a system from starting correctly. The list is endless: missing hardware drivers, misconfigured file systems, invalid network configuration, incorrect permissions, and so on. To make things worse, many issues can effectively
Original URL: http://feedproxy.google.com/~r/AmazonWebServicesBlog/~3/B2Bmvkfg6xc/