Beyond monitoring and logging – into Observability.


Table of Contents

First things first

Monitoring and logging

Observability

Differences and similarities

How to implement Observability

As a design principle

As an Add-On

Closing words

First things first

The major cloud providers implement logging and monitoring for their core services, and many also provide the means to capture logs and health data from virtual machines and external systems. In the on-prem world things get a little trickier, since the entire monitoring and logging stack depends on each company's implementation and policies. All in all, though, data from applications and systems flows into a repository where the relevant stakeholders can consume it. But this influx of data is only that: data. Even creating dashboards and alerts is not enough for actual observability.

Let's first dig into what the terms above mean and how they differ from each other, to better understand why logging and monitoring are not observability, and why it's important to have the latter.

Monitoring and logging

Monitoring and logging are key concepts for observability, but they are not observability itself. To understand why, let's define each concept:

  • Monitoring: in a nutshell, the act of looking at a given output of a process, or at the execution status of a process/pipeline and the logs generated by that execution.
  • Logging: the process of generating output that communicates the status of a process, its errors, and the success or failure of its execution.

The above definitions oversimplify what it takes to accomplish each of these, but they reflect the core purpose of each.

Neither represents observability by itself, but both are key to what we are trying to achieve and are the starting point of our journey.
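To make the two definitions more concrete, here is a minimal Python sketch using only the standard library. run_step logs a step's start and its success or failure as JSON lines (logging), and last_status reads those lines back to report the latest status of a step (monitoring in its simplest form). The logger name, the JSON fields and all function names are hypothetical choices for illustration, not part of any specific platform.

```python
import io
import json
import logging

# Hypothetical logger name for a data pipeline.
logger = logging.getLogger("pipeline")


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line so it is easy to ship and query."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # 'step' and 'status' arrive through the `extra` argument in run_step.
            "step": getattr(record, "step", None),
            "status": getattr(record, "status", None),
        })


def configure_logging(stream) -> None:
    """Send the pipeline's logs, formatted as JSON lines, to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def run_step(name, func):
    """Logging: run one step and report its start and its success or failure."""
    logger.info("step started", extra={"step": name, "status": "started"})
    try:
        result = func()
    except Exception:
        logger.error("step failed", extra={"step": name, "status": "failed"})
        raise
    logger.info("step finished", extra={"step": name, "status": "succeeded"})
    return result


def last_status(log_lines, step):
    """Monitoring in its simplest form: read the logs back and return a step's latest status."""
    records = [json.loads(line) for line in log_lines]
    statuses = [r["status"] for r in records if r["step"] == step]
    return statuses[-1] if statuses else None


if __name__ == "__main__":
    buffer = io.StringIO()
    configure_logging(buffer)
    run_step("extract", lambda: 42)
    print(last_status(buffer.getvalue().splitlines(), "extract"))  # succeeded
```

Even together, these two pieces only tell us what happened to one step; they say nothing about the process as a whole.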

Next, let’s define the core concept we are trying to explain: Observability.

Observability

This concept refers to the ability to see an entire process, a data loading process for instance, from start to end in a single pane of glass: the status of each step, the history of successes and failures, and statistics on data or requests, so that we can understand the process's past and current status and plan ahead. In more detail, we can expect:

  • Process execution history.
  • Step metrics, trends and detailed execution history.
  • Logs for each step.
  • The ability to start the analysis of issues.
  • Alerting and notifications on events.

These capabilities allow any system to respond faster to issues, to plan for growth, and to understand proactively how the process works and where it can be improved.
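As an illustration of the first capabilities in the list, the following sketch assumes each step execution is stored as a record (the StepRun fields are a hypothetical schema) and derives per-step metrics from that execution history, plus a list of failed runs as a starting point for analyzing issues. It is a minimal in-memory Python example; a real platform would read this history from its log or metrics store.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StepRun:
    """One execution of one step: the unit of execution history (hypothetical schema)."""
    run_id: str
    step: str
    succeeded: bool
    duration_s: float
    rows: int  # example data statistic, e.g. rows loaded by the step


def step_metrics(history):
    """Aggregate the history into per-step metrics: runs, success rate, mean duration and rows."""
    by_step = defaultdict(list)
    for run in history:
        by_step[run.step].append(run)
    return {
        step: {
            "runs": len(runs),
            "success_rate": sum(r.succeeded for r in runs) / len(runs),
            "mean_duration_s": mean(r.duration_s for r in runs),
            "mean_rows": mean(r.rows for r in runs),
        }
        for step, runs in by_step.items()
    }


def failures(history, step):
    """Starting point for issue analysis: the ids of the failed runs of a given step."""
    return [r.run_id for r in history if r.step == step and not r.succeeded]
```

Tracking these metrics over successive windows is what turns them into trends.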

Differences and similarities

When talking about differences, we can narrow them down to the following key aspects:

  • Scope – This is the big one. The main difference between the two concepts is scope: monitoring handles a single process or tool, or even just the logs, while observability focuses on getting the big picture of all related processes and should provide the ability to drill down into a more detailed view.
  • Tools – Here both are very similar, since the tools overlap; tools from monitoring can be used as a data source for, or directly as part of, the observability platform.
  • Users – The users also overlap, the main difference being the broader audience observability can have, ranging from purely business users to deeply technical ones. Monitoring is aimed mostly at technical users; this is not a restriction but a usage pattern.

As you can see, the differences are less technical in nature and more business- and process-related: with plain monitoring we have “tunnel vision” of what is happening, while with observability we have the whole view.
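The difference in scope can be sketched in a few lines of Python. Assuming the latest status of every step of every related process is available in one place (StepStatus and the process names are hypothetical), overview gives the big picture with one health flag per process, while drill_down returns the detailed view for a single process. A monitoring view of one tool would correspond to looking at a single StepStatus in isolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepStatus:
    """Latest known status of one step of one process (hypothetical schema)."""
    process: str
    step: str
    ok: bool


def overview(statuses):
    """Big picture: one health flag per process, healthy only if all of its steps are OK."""
    health = {}
    for s in statuses:
        health[s.process] = health.get(s.process, True) and s.ok
    return health


def drill_down(statuses, process):
    """Detailed view: the failing steps of a single process."""
    return [s.step for s in statuses if s.process == process and not s.ok]
```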

How to implement Observability

As a design principle

Whenever possible, we should aim to implement observability from the start, as part of the design and with the same importance we give to networking and security. Building it in from the system's creation gives us that single pane of glass to leverage the information and constantly improve the operation. It's a consideration every architect should include in the system design. Some important concepts are:

  • Define metrics to be captured.
  • Allow the system to report its state.
  • Capture logging information.
    • Allow switching to more detailed (debug-level) logging when necessary.
  • Define SLAs, SLOs and SLIs, and align the observability data with them.
  • Have a central place to look at this information, or at least a clear way to reach the different metrics.
  • Have a plan to act on it.

With these in place, we should have the much-needed observability from the start of the project, allowing greater agility.
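For the SLA, SLO and SLI item, a minimal Python sketch of how observability data can be aligned with an objective is shown below. It assumes the SLI is the fraction of successful runs in an evaluation window and the SLO is a target for that fraction; the SLA, as the external commitment built on top of these, is not modeled here. The Slo class and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Target for the SLI, e.g. 99% of runs succeed in the window (hypothetical)."""
    name: str
    target: float  # fraction between 0 and 1


def availability_sli(successes: int, total: int) -> float:
    """SLI: fraction of successful runs/requests in the evaluation window."""
    if total == 0:
        return 1.0  # no traffic: treated as meeting the objective (a design choice)
    return successes / total


def meets_slo(slo: Slo, successes: int, total: int) -> bool:
    return availability_sli(successes, total) >= slo.target


def error_budget_remaining(slo: Slo, successes: int, total: int) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    allowed_failures = (1 - slo.target) * total
    actual_failures = total - successes
    if allowed_failures == 0:
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1 - actual_failures / allowed_failures
```

For example, with a 99% target and 1,000 runs, 995 successes meet the objective and leave roughly half of the error budget unspent.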

As an Add-On

There are times when a project doesn't have observability as a design principle. This can happen for many reasons, but what matters is that there is no proper observability in place to understand how the project operates.

In cases like this, it's important to identify the low-hanging fruit first and start implementing observability as a complement to the existing architecture, with the goal of eventually having the complete system under a single pane of glass.

Once we have the target process, the design of the observability platform should consider all relevant aspects of that process, but also be flexible enough to plug into the rest of the system as the implementation moves forward.

Some steps to start the observability implementation are:

  • Define your target process.
  • Identify metrics.
  • Define how to capture data.
  • Prepare analysis for the data once it starts flowing, to identify trends.
  • Create an initial set of alerts and notifications based on best practices.
    • This should be tailored later with historical data.
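The last two steps can be combined: start with a static threshold taken from best practices, then replace it with one derived from historical data once enough of it exists. The Python sketch below assumes a duration metric per step, a hypothetical 600-second default, a hypothetical minimum of 10 samples and a mean plus three standard deviations rule; all of these are illustrative choices, not recommendations.

```python
from statistics import mean, stdev

# Hypothetical starting values used until enough history exists.
DEFAULT_MAX_DURATION_S = 600.0
MIN_HISTORY = 10


def duration_threshold(history_s, k=3.0):
    """Alert threshold for a step's duration.

    Falls back to a static default until MIN_HISTORY samples exist, then uses
    mean + k * (sample) standard deviation of the observed durations.
    """
    if len(history_s) < MIN_HISTORY:
        return DEFAULT_MAX_DURATION_S
    return mean(history_s) + k * stdev(history_s)


def should_alert(duration_s, history_s, k=3.0):
    """True when the latest duration exceeds the current threshold."""
    return duration_s > duration_threshold(history_s, k)
```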

The ability to understand the system is crucial, so even when it was not part of the initial implementation, observability should be added as an external agent that can provide the right information.
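One possible reading of an external agent is a wrapper that runs an existing job without modifying it and records what the job itself does not report. The Python sketch below assumes the job can be started as a command; observe captures its exit code, duration and the tail of its error output. The function name and the fields it returns are hypothetical.

```python
import subprocess
import sys
import time


def observe(command):
    """Run an existing command unchanged and record its outcome from the outside."""
    start = time.monotonic()
    completed = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "exit_code": completed.returncode,
        "succeeded": completed.returncode == 0,
        "duration_s": time.monotonic() - start,
        # Keep only the end of stderr, where errors usually surface.
        "stderr_tail": completed.stderr[-500:],
    }


if __name__ == "__main__":
    # Usage: a trivial Python one-liner stands in for the existing job.
    print(observe([sys.executable, "-c", "print('job done')"]))
```

The returned record can then be shipped to the same central place as the rest of the observability data.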

Closing words

Observability is something every platform should aim to provide, not just to watch a process, but to understand how it is behaving now and how it will behave in the future based on its history. Multiple tools can help us achieve observability; explaining them goes beyond the scope of this document. The important thing to remember is that “Observability goes beyond monitoring and logging”: it is about using those concepts to enhance the operation.
