Report Highlights Complexity in Implementing Cloud-Native Observability

The use of cloud-native containers and microservices, as well as orchestration platforms like Kubernetes, is becoming increasingly common. A full 75% of companies are now focusing on building cloud-native applications, and that number is set to increase in the coming year. But the move to cloud-native brings new challenges, namely dealing with mounting complexity in metrics and logging.

Many developers report drowning in a sea of observability data. Correlating cloud-native observability data points to issues isn’t always easy, and many engineers are dealing with too many false positives that hinder their work. Yet observability is essential for maintaining SLOs and reducing mean time to resolution (MTTR).

The 2023 Cloud Native Observability Report from Chronosphere sheds light on cloud-native observability and the challenges therein. Below, we’ll identify the key takeaways from the report and consider how technology leaders should respond as they look to improve their observability practices in the year ahead.

The Benefits of Cloud-Native Observability

The move to microservices, containers and multi-cloud has certainly increased the complexity of modern software architecture. It has also expanded the number of data points these systems produce and their cardinality. As such, 87% of engineers say using cloud-native architecture has increased the complexity of discovering and troubleshooting incidents.
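To make the cardinality point concrete, here is a minimal sketch of how the number of distinct time series for a single metric can grow as labels are added. The label names and counts are hypothetical illustrations, not figures from the report, and the product is an upper bound that assumes every combination of label values actually occurs.

```python
from math import prod


def series_cardinality(label_values: dict[str, int]) -> int:
    """Upper bound on distinct time series for one metric:
    the product of the number of distinct values per label."""
    return prod(label_values.values())


# Hypothetical label sets (assumed for illustration only).
monolith = {"host": 10, "endpoint": 20}
cloud_native = {"service": 50, "pod": 30, "endpoint": 20, "region": 3}

print(series_cardinality(monolith))      # 200
print(series_cardinality(cloud_native))  # 90000
```

Under these assumed numbers, adding per-pod and per-region labels takes one metric from a few hundred series to tens of thousands, which is the kind of growth that makes incidents harder to find and troubleshoot.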

Engineers (and end users) all want a quick resolution when something goes haywire, and cloud-native observability has emerged as a necessary way to trace issues to their root cause. This brings net positive outcomes to the business and future-proofs it for further innovation: 67% say having a solid observability function improves the foundation for all business value, and 71% say their business can’t innovate effectively without good observability.

Challenges in Cloud-Native Observability

Engineers are grappling with too many alerts, causing additional toil and performance bottlenecks. The cardinality of infrastructure output data has especially increased for cloud-native adopters. As a result, it’s estimated that engineers spend, on average, 10 hours per week simply triaging incidents. Resolving these low-level issues is taking time away from innovation.

Current observability toolsets also aren’t presenting enough context to address these incidents. Part of the problem is the inundation of alerts: 59% of developers say that half of the incident alerts they receive aren’t actually helpful or usable. Another contributing factor is a lack of context for resolution: 40% of engineers frequently get alerts from their observability solution without enough context to triage the incident.

But it’s not just false positives holding back your typical observability solution. In addition to that, 49% struggle with inconsistent performance. Almost half (45%) also say their observability tool requires a lot of time and manual labor. With so much on their plate, additional manual toil is the last thing DevOps engineers need.

Which is Better: Open Source or Vendor Solutions?

As I’ve previously covered, many powerful open source toolsets are available for implementing cloud-native observability, including Prometheus, Jaeger, OpenTelemetry, Fluentd and others. Yet, there are specific challenges when going the in-house route. For example, organizations using an in-house solution were 50% more likely to report high-severity incidents quarterly or more often, suggesting that in-house observability solutions aren’t actually translating into fewer incidents. According to the report, those using a vendor solution were 65% faster at detecting issues than those without a cohesive approach.

So, when comparing observability tools, what are engineers most looking for? Among those using a vendor solution, 40% say speed and performance are their top priorities. A full 61% would consider using a vendor observability solution to enhance team productivity, and 54% would do so to improve reliability. Although the study doesn’t mention it, other points, like the ability to work with open standards, should also be top of mind when making technological decisions.

The report suggests that it is more strategic to have a unified observability solution with a dedicated observability team to centralize these practices. According to the report, engineers without a cohesive observability approach spend significantly more time (16+ hours) each week troubleshooting incidents. This decreases to about 10 hours a week with in-house solutions and down to about six hours per week using a vendor solution. Therefore, a vendor solution might be a necessary timesaver to reduce pings after hours and increase context to empower efficient triaging.
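As a rough worked example of the figures above, the sketch below computes the weekly hours saved and the percentage reduction between approaches. It treats "16+" as exactly 16 (so the savings are a lower bound) and uses the approximate 10-hour and six-hour figures as given; the dictionary keys and function name are illustrative, not from the report.

```python
# Approximate weekly troubleshooting hours cited in the report.
# "16+" is treated as 16, so computed savings are a lower bound.
HOURS_PER_WEEK = {
    "no cohesive approach": 16,
    "in-house": 10,
    "vendor": 6,
}


def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (hours saved per week, percentage reduction)."""
    saved = before - after
    return saved, 100 * saved / before


baseline = HOURS_PER_WEEK["no cohesive approach"]
for approach in ("in-house", "vendor"):
    saved, pct = reduction(baseline, HOURS_PER_WEEK[approach])
    print(f"{approach}: {saved} hours saved per week ({pct}% less)")
```

On these numbers, moving from no cohesive approach to an in-house solution saves at least 6 hours a week (37.5%), and moving to a vendor solution saves at least 10 hours a week (62.5%).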

Centralize Observability to Overcome Cloud-Native Complexity

Gartner predicts that by 2025, 95% of new digital workloads will be deployed on cloud-native platforms, up from 30% in 2021. This represents a significant and sudden shift toward cloud-native, meaning DevOps must quickly evolve.

Observability could decrease the inherent complexity of cloud-native architecture. Yet, many observability practices have introduced additional burdens. Toil within the engineering workflow is a major contributor to burnout and stress, which could negatively affect employee retention. The majority (88%) of engineers report negative impacts on themselves and their careers from spending so much time troubleshooting. To decrease toil and instill best practices, leadership must centralize observability methods and tooling. The good news is that 50% of companies already have a central observability team, and a quarter plan to create one next year.

It can be surmised from the report that certain observability tools are underperforming others. Yet the study doesn’t cite specific observability solutions by name, and its findings should be taken with a grain of salt, as Chronosphere is itself an observability solution provider.

The 2023 Cloud Native Observability Report was conducted by Chronosphere and Method Communications. Between September and October 2022, they surveyed 500 full-time software developers familiar with observability. For further insights, you can download the full report here.

Bill Doerrfeld

Bill Doerrfeld is a tech journalist and analyst. His beat is cloud technologies, specifically the web API economy. He began researching APIs as an Associate Editor at ProgrammableWeb, and since 2015 has been the Editor at Nordic APIs, a high-impact blog on API strategy for providers. He loves discovering new trends, interviewing key contributors, and researching new technology. He also gets out into the world to speak occasionally.
