By Mitchell Kowalchick, DevOps Engineer at CloudCheckr
Actionable Diagnostic Steps You Can Take Right Now
For this blog post I wanted to walk through some easy-to-implement techniques that can strengthen your diagnostic approach. While this may sound like a simple concept, many organizations never put a workable prioritization system in place, and that makes life difficult for the administrative team. In subsequent blog posts I will explore the diagnostic approach in more depth, but for now let's start with the basics.
Naming
Naming strategies, while often overlooked, are an important prioritization tool. When things get chaotic, you want to stabilize your key servers first. In my experience, tiered or ranked designations for server names are the most helpful. Not only do these conventions keep your attention on vital resources, they also let you quickly sort through the infrastructure to where problems are occurring. Naming maps onto the identification phase of the problem-solving cycle: it separates your infrastructure into smaller pieces that you can troubleshoot more effectively.
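To make the sorting idea concrete, here is a minimal Python sketch. The `tierN-role-index` hostname pattern and the alert data are hypothetical stand-ins for whatever convention and monitoring feed you actually use:

```python
import re

# Hypothetical hostnames following a "tierN-role-index" convention;
# your naming scheme will differ, but the parsing idea is the same.
alerts = [
    {"host": "tier3-report-02", "message": "high CPU"},
    {"host": "tier1-db-01", "message": "replication lag"},
    {"host": "tier2-api-04", "message": "elevated 5xx rate"},
]

def tier_of(hostname: str) -> int:
    """Extract the tier number from a tier-prefixed hostname."""
    match = re.match(r"tier(\d+)-", hostname)
    return int(match.group(1)) if match else 99  # unknown tiers sort last

# Triage the most critical (lowest-numbered) tiers first.
for alert in sorted(alerts, key=lambda a: tier_of(a["host"])):
    print(f"tier {tier_of(alert['host'])}: {alert['host']} - {alert['message']}")
```

Because the tier lives in the name itself, the same sort works anywhere hostnames show up: alert emails, log lines, or a console listing.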
Consider Maslow's hierarchy of needs: a psychological model in which one must satisfy each lower level before moving up to the next. Most people, for example, need to be able to breathe before thinking about getting a Frosty at Wendy's. The same idea can be applied to structuring your infrastructure and naming convention so that vital resources are prioritized and core functionality is easy to find. Priority servers might be designated tier 1, while less critical or less profitable servers are dubbed tier 4. This assists both preventative and emergency maintenance: the daily routine becomes spot checking the tiers from most important to least, which also improves your resource management.
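That daily spot-check routine might look something like the following sketch, where `check_health` is a placeholder for whatever monitoring call you actually use (a CloudWatch query, a health endpoint, etc.):

```python
# A sketch of "spot check the tiers from most to least important".
SERVERS_BY_TIER = {
    1: ["tier1-db-01", "tier1-web-01"],
    2: ["tier2-api-01", "tier2-api-02"],
    3: ["tier3-report-01"],
    4: ["tier4-sandbox-01"],
}

def check_health(hostname: str) -> bool:
    # Placeholder: wire this up to your real monitoring system.
    return True

def daily_spot_check():
    for tier in sorted(SERVERS_BY_TIER):  # tier 1 first
        for host in SERVERS_BY_TIER[tier]:
            if not check_health(host):
                print(f"tier {tier} server {host} needs attention")

daily_spot_check()
```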
Timing
Identifying trends can dramatically reduce the amount of investigation required to find a problem, and it is typically much less taxing to perform. Let's take a look at some of CloudCheckr's heat map reports for RDS and one of our servers.
The server shown above is clearly unhealthy, and for many it would be difficult to find a starting point. Look a bit closer, though, and the heat map reveals very clear trends between 5 and 7 am and between 5 and 8 pm. With those smaller windows to work with, we can dig into the logs recorded at those times to get a clearer picture of the problem. Once logged onto the server, I was able to identify the processes, queries, and jobs that were taxing it the most. From there we can determine whether the server simply has too many queries being thrown at it, or whether an indexing or code issue is creating a bottleneck on the database.
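If you don't have a heat map report handy, the same trend-spotting can be approximated with a few lines of code. This sketch buckets metric samples by hour of day so recurring trouble windows stand out; the timestamps and CPU figures are made-up illustrative data:

```python
from collections import defaultdict
from datetime import datetime

# Stand-in for whatever your monitoring export gives you:
# (timestamp, CPU %) pairs.
samples = [
    ("2016-03-01T05:10:00", 92.0),
    ("2016-03-01T12:00:00", 31.0),
    ("2016-03-01T17:30:00", 88.0),
    ("2016-03-02T05:40:00", 95.0),
    ("2016-03-02T18:05:00", 90.0),
]

# Bucket the metric by hour of day, a poor man's heat map.
buckets = defaultdict(list)
for ts, cpu in samples:
    buckets[datetime.fromisoformat(ts).hour].append(cpu)

# Print hours sorted by average load; recurring spikes cluster at the top.
for hour, values in sorted(buckets.items(),
                           key=lambda kv: -sum(kv[1]) / len(kv[1])):
    avg = sum(values) / len(values)
    print(f"{hour:02d}:00  avg CPU {avg:5.1f}%  ({len(values)} samples)")
```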
Gauge the Effectiveness of Your Attempted Fix
After attempting a fix, it is very important to watch the trend fall back to a healthy state. In many cases the initial fix will bring smaller issues to the surface, but they can be diagnosed in the same fashion. Developers often need to prioritize the most beneficial changes before moving on to larger infrastructure work. So, when possible, confirm that the change was effective and, if the server is healthy enough, move on to the next, more costly issue rather than losing time in the nuances of one particular server. Making small, timely, and effective changes will typically deliver far greater performance and stability.
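A quick way to acknowledge the effectiveness of a change is to compare the metric over the same window before and after the fix. The numbers and threshold below are illustrative, not real data:

```python
# Sanity-check a fix by comparing averages over matching windows.
before = [91.0, 88.5, 94.2, 90.1]   # e.g. peak-hour CPU % before the fix
after = [62.3, 58.9, 64.7, 60.2]    # the same window after the fix

HEALTHY_THRESHOLD = 70.0  # whatever "healthy" means for this server

avg_before = sum(before) / len(before)
avg_after = sum(after) / len(after)

print(f"before: {avg_before:.1f}%, after: {avg_after:.1f}%")
if avg_after < HEALTHY_THRESHOLD:
    print("trend looks healthy, move on to the next issue")
else:
    print("still unhealthy, keep digging before switching contexts")
```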
For Mitch's thoughts on the DevOps position and where it is headed, check out his previous blog post here. You can also learn more about the Top 3 Areas You Need to Optimize for DevOps.