SigNoz Terminology
An introductory guide to get users started with the key concepts of observability and distributed tracing.
Terms specific to SigNoz
APM:
Application Performance Monitoring. An APM tool typically serves as a backend for tracing data.
Instrumentation:
Instrumentation is the process of enabling your application code to generate telemetry data. Instrumenting a piece of software means making it emit relevant data, such as logs, metrics, and traces, so that you can measure its performance and diagnose errors.
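As an illustrative sketch (the service and function names are hypothetical, not SigNoz-specific), manual instrumentation with the OpenTelemetry Python SDK looks roughly like this:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Configure a tracer provider that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def place_order():
    # Every call to place_order now emits a span describing the operation.
    with tracer.start_as_current_span("place_order"):
        pass  # business logic would go here
```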
Observability:
In control theory, a system is said to be observable if its internal states can be determined by looking at its inputs and outputs. For distributed software systems, observability means how well we can troubleshoot performance issues using the data collected from the system. In other words, if you can work out what is happening inside a system purely by observing it from the outside, the system is observable. A simple analogy is using a thermometer to read a body's internal temperature from outside.
HotROD:
A sample application provided by Jaeger and used by SigNoz as a default demo app (click here to remove the HotROD app from SigNoz). For more details on HotROD, refer this.
load-hotrod:
A service that pushes continuous data to the HotROD app so that you can evaluate metrics and build charts.
RED metrics:
In SigNoz, RED metrics stand for:
- Rate of requests
- Error rate of requests
- Duration taken by requests
RBAC:
Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within your organization.
SSO:
Single sign-on (SSO) is an authentication method that enables users to securely authenticate with multiple applications and websites by using just one set of credentials.
General Terminology
Telemetry:
Telemetry is the automated collection and communication of data from multiple sources. Telemetry data is used to improve customer experience and to monitor security, application health, quality, and performance.
OpenTelemetry - OTel:
OpenTelemetry is an open-source project under the Cloud Native Computing Foundation (CNCF) that aims to standardize the generation and collection of telemetry data: logs, metrics, and traces. It is a collection of tools, APIs, and SDKs that you can use to instrument, generate, collect, and export telemetry data for analysis, in order to understand your software's performance and behaviour. OpenTelemetry does not provide a storage or visualization backend; you must bring your own. SigNoz is an excellent backend to store, visualize, and analyse OTel data; others include Jaeger and Prometheus.
Collector:
A vendor-agnostic implementation of how to receive, process, and export telemetry data. A single binary that can be deployed as an agent or gateway.
OpenTelemetry Collector:
A service that collects, processes, and exports OpenTelemetry data. It has three kinds of components: receivers (which can accept data in Kafka, Jaeger, Prometheus, OTLP, and Zipkin formats), processors, and exporters; a sketch of exporting data to a Collector follows the component list below. [link to otel-collector repo]
The Collector is made up of the following components:
- receivers: How to get data into the Collector; these can be push or pull based
- processors: What to do with received data
- exporters: Where to send received data; these can be push or pull based
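As a hedged sketch of how an application hands data to a Collector, the snippet below configures the OpenTelemetry Python SDK to export spans over OTLP; it assumes a Collector listening on the default gRPC port 4317 on localhost:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Send batches of spans to the Collector's OTLP gRPC receiver.
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```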
OpenTelemetry Collector Contrib: [with repo link, state few things that signoz uses from it]
Provides additional receivers for collecting data from your infrastructure in a hassle-free, abstracted way. It includes receivers for Redis, Memcached, NGINX, Kubernetes, Google Cloud Pub/Sub, and more.
OTLP (OpenTelemetry Protocol):
The wire format defined by OpenTelemetry itself for processing and exporting telemetry data.
OLAP Databases:
Online analytical processing, or OLAP, is an approach to answering multi-dimensional analytical queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing, and data mining. Examples: ClickHouse, Apache Druid.
Tools in Observability Space
Jaeger:
Jaeger is a popular open-source distributed tracing tool that was originally built by teams at Uber and then open-sourced. It is used to monitor and troubleshoot applications based on microservices architecture. For Jaeger, Cassandra and Elasticsearch are the primary supported storage backends.
Prometheus:
Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time-series database built using an HTTP pull model, with flexible queries and real-time alerting. Alerts can be written as code.
PromQL:
Prometheus provides a functional query language called PromQL (Prometheus Query Language) that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
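For illustration, a PromQL expression can be evaluated through this HTTP API; the sketch below assumes a Prometheus server at localhost:9090 and uses an illustrative metric name:

```python
import requests

# rate(http_requests_total[5m]) = per-second request rate averaged over 5 minutes
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "rate(http_requests_total[5m])"},
)
print(resp.json()["data"]["result"])
```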
Zipkin:
Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in service architectures.
Grafana:
Grafana is a multi-platform open-source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources. Grafana itself was not built for observability; it is a general-purpose visualization layer over external data sources.
ClickHouse:
A very fast OLAP datastore (see OLAP Databases above). SigNoz earlier used the Druid database but has since moved to ClickHouse.
Elasticsearch:
Elasticsearch is a document-oriented database management system with a full-text search engine at its heart. Built on the Apache Lucene library, it stores data as JSON documents, supports RESTful APIs, and uses a powerful analytical engine for fast data retrieval.
DataDog:
DataDog, a popular proprietary APM tool, gives you the option to select between four data centres, none of which is in India.
Apache Kafka:
Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Apache Druid:
Apache Druid is a real-time database to power modern analytics applications.
Docker:
Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. The service has both free and premium tiers. The software that hosts the containers is called Docker Engine.
Docker Swarm:
Docker Swarm is a container orchestration tool, meaning that it allows the user to manage multiple containers deployed across multiple host machines. One of the key benefits of operating a Docker swarm is the high level of availability offered for applications.
Docker Compose:
Docker Compose is a tool that was developed to help define and share multi-container applications. With Compose, we can create a YAML file to define the services and with a single command, can spin everything up or tear it all down.
Containerization:
Containerization means packaging an application together with its dependencies so that it runs in isolated containers. Containerization technologies like Docker and Kubernetes make it very easy to spin up new services and scale them on demand.
Orchestration:
Orchestration is the automated configuration, management, and coordination of computer systems, applications, and services. Orchestration helps IT to more easily manage complex tasks and workflows. IT teams must manage many servers and applications, but doing so manually isn't a scalable strategy.
Kubernetes:
Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Google originally designed Kubernetes, but the Cloud Native Computing Foundation now maintains the project.
Cortex:
Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
Locust:
Locust is an open-source load testing tool that gauges the number of concurrent users a system can handle. Testers can write simple behaviors using Python before using Locust to simulate user “swarms”. Locusts in the swarm have one or more behaviors (TaskSets) attached to them.
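A minimal locustfile sketch (the target path and timings are illustrative):

```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)  # each simulated user pauses 1-5 s between tasks

    @task
    def index(self):
        self.client.get("/")  # every "locust" in the swarm repeatedly loads this page
```

Run it with `locust -f locustfile.py --host http://localhost:8080` and set the swarm size in the web UI.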
Data Visualisation Terms
Flamegraphs:
A flame graph visualizes a distributed request trace and represents each service call that occurred during the request's execution path with a timed, color-coded, horizontal bar. Flame graphs for distributed traces include error and latency data to help developers identify and fix bottlenecks in their applications.
Helm chart:
Helm uses a packaging format called charts. A chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on.
Gantt charts:
A Gantt chart is a type of bar chart that illustrates a project schedule, named after its popularizer, Henry Gantt, who designed such a chart around the years 1910–1915. Modern Gantt charts also show the dependency relationships between activities and the current schedule status.
Data ingestion:
Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. To ingest something is to take something in or absorb something. Data can be streamed in real time or ingested in batches. In real-time data ingestion, each data item is imported as the source emits it.
Query:
A service that retrieves traces from storage and hosts a UI to display them.
Ingester:
A service that reads data from a queue topic and writes it to a storage backend (ClickHouse, Druid).
Webhooks:
Webhooks are automated messages sent from apps when something happens. They carry a message, or payload, and are sent to a unique URL, essentially the app's phone number or address. Webhooks are almost always faster than polling and require less work on your end; they are much like SMS notifications.
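As an illustrative sketch (the route and payload are hypothetical, not a SigNoz API), a webhook receiver is just an HTTP endpoint that accepts the pushed payload; here is one using Flask:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/hooks/alert", methods=["POST"])
def receive_alert():
    payload = request.get_json()  # the sending app pushes its message as JSON
    print("webhook received:", payload)
    return "", 204  # acknowledge with an empty response

if __name__ == "__main__":
    app.run(port=8080)
```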
Service Maps:
A service map is a real-time visual representation of the instrumented services in your application's architecture. It shows you how these services are connected, along with high-level metrics like average transaction duration, requests per minute, and errors per minute.
Signals:
Metrics, logs, traces, and baggage are examples of signals. Each signal represents a coherent, stand-alone set of functionality. Each signal follows a separate lifecycle, defining its current stability level.
Metrics:
A metric is a measurement about a service, captured at runtime. Examples include aggregates, p99 latency, RPS, and error rates.
Some examples of use cases for metrics include the following; a sketch of recording one of them follows the list:
- Reporting the total number of bytes read by a service, per protocol type.
- Reporting the total number of bytes read and the bytes per request.
- Reporting the duration of a system call.
- Reporting request sizes in order to determine a trend.
- Reporting CPU or memory usage of a process.
- Reporting average balance values from an account.
- Reporting current active requests being handled.
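As referenced above, here is a hedged sketch of the first use case (bytes read per protocol) using the OpenTelemetry Python metrics API; the meter and instrument names are illustrative, and a MeterProvider with an exporter would be configured separately:

```python
from opentelemetry import metrics

meter = metrics.get_meter("io-service")  # hypothetical service name
bytes_read = meter.create_counter(
    "bytes_read", unit="By", description="Total bytes read, by protocol"
)

def on_read(n_bytes: int, protocol: str):
    # The protocol attribute lets the backend break the total down per protocol.
    bytes_read.add(n_bytes, {"protocol": protocol})
```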
Metadata:
A name/value pair added to telemetry data. OpenTelemetry calls these Attributes on Spans, Labels on Metrics, and Fields on Logs.
Metric:
Records a data point, either a raw measurement or a predefined aggregation, as a time series with Metadata.
Events:
Events (or logs) and traces complement, not duplicate, each other. An event is something that happened; how it is represented depends on the Data Source (Spans, for example).
Exporter:
Provides functionality to emit telemetry to consumers. Used by Instrumentation Libraries and the Collector. Exporters can be push or pull based.
Logs:
A log is a timestamped text record, either structured (recommended) or unstructured, with metadata. While logs are an independent data source, they may also be attached to spans. Logs note important events occurring during the execution of code. More here.
Log Record:
A recording of an Event. Typically the record includes a timestamp indicating when the Event happened as well as other data that describes what happened, where it happened, etc.
Structured logging (key-value pairs):
Structured logging is the practice of implementing a consistent, predetermined message format for application logs that allows them to be treated as data sets rather than text. Logs should be structured as key-value pairs; unstructured logs are far harder to query and analyse.
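For illustration, a minimal sketch of structured logging using Python's standard logging module with a JSON formatter (the field names are illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        # Emit each record as a machine-parseable JSON object instead of free text.
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.info("order placed")  # -> {"ts": "...", "level": "INFO", "message": "order placed"}
```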
Trace:
A DAG (Directed Acyclic Graph) of Spans, where the edges between Spans are defined as parent/child relationships.
Tracing:
When a user sends a request, it travels through different microservices to get the user what they want. Tracing is a way to track a user request across services, that is, across the software stack: a sequential record of events tied together by a trace ID and span IDs. One trace corresponds to one request, and a single request can have hundreds of spans under it.
Distributed Tracing:
Tracks the progression of a single Request, called a Trace, as it is handled by the Services that make up an Application. A Distributed Trace traverses process, network, and security boundaries, for example a request moving through Uber's microservice architecture.
Span:
Spans are the fundamental building blocks of distributed tracing. A single trace consists of a series of tagged time intervals known as spans, each representing a logical unit of work in completing a user request or transaction. The span is the smallest unit of a trace; a trace is the logical grouping of spans associated with one single request. A span provides Rate, Error, and Duration (RED) metrics that can be used to debug availability as well as performance issues.
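A hedged sketch of a parent span with a child span, using the OpenTelemetry Python API (it assumes a tracer provider is already configured, as in the Instrumentation example above; names are hypothetical):

```python
from opentelemetry import trace

tracer = trace.get_tracer("order-service")

with tracer.start_as_current_span("handle_request") as parent:  # root span of the trace
    parent.set_attribute("http.method", "POST")                 # attribute attached to the span
    with tracer.start_as_current_span("query_db"):              # child span
        pass  # the DB call would run here; both spans share one trace ID
```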
SpanContext:
Information required to reference a Span. It contains the tracing identifiers and the options that are propagated from parent to child Spans: the TraceId, SpanId, TraceFlags, and Tracestate.
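For illustration, the SpanContext of the current span can be read like this with the OpenTelemetry Python API (run it inside an active span for meaningful IDs):

```python
from opentelemetry import trace

ctx = trace.get_current_span().get_span_context()
# trace_id and span_id are integers; render them in the usual hex form.
print(format(ctx.trace_id, "032x"), format(ctx.span_id, "016x"), int(ctx.trace_flags))
```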
Data Source:
In OpenTelemetry, one of Traces, Metrics, or Logs. More generally, a DataSource is a name given to the connection set up to a database from a server. The name is commonly used when creating a query to the database and need not be the same as the filename for the database; for example, a database file named friends.mdb could be set up with a DSN of school.
Aggregation:
The process of combining multiple measurements into exact or estimated statistics about the measurements that took place during an interval of time during program execution. Used by the Metric Data Source.
RPS:
Requests (or queries) per second is a measure of the amount of search traffic an information-retrieval system, such as a search engine or a database, receives in one second. The term is used more broadly for any request-response system, where it can more correctly be called requests per second.
99th Percentile latencies:
A 99th percentile latency of X milliseconds in stream jobs means that 99% of the items arrived at the end of the pipeline in less than X milliseconds. It tells you the latency within which 99% of users are served.
Same for 90th and 50th percentile latency.
Note: latency + processing time = response time
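A small illustrative computation of percentile latencies from raw samples (the values are made up; one slow outlier dominates the p99):

```python
latencies_ms = [12, 15, 14, 13, 240, 16, 15, 14, 13, 15]

def percentile(samples, p):
    # Nearest-rank percentile: the value below which roughly p% of samples fall.
    ordered = sorted(samples)
    k = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[k]

print("p50:", percentile(latencies_ms, 50), "ms")  # 14 ms: the typical request
print("p99:", percentile(latencies_ms, 99), "ms")  # 240 ms: the slowest 1% of requests
```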
RCA:
Root cause analysis (RCA) is the process of discovering the root causes of problems in order to identify appropriate solutions. RCA assumes that it is much more effective to systematically prevent and solve underlying issues rather than just treating ad hoc symptoms and putting out fires.
Error Rate:
The measurement of the effectiveness of a communications channel. It is the ratio of the number of erroneous units of data to the total number of units of data transmitted.
System Structure
Service:
A component of an Application. Multiple instances of a Service are typically deployed for high availability and scalability. A Service may be deployed in multiple locations.
Microservices:
Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs. These services are owned by small, self-contained teams. Compared to monoliths, microservices-based applications are operationally riskier because failures can occur at many more points.
Monolithic architecture:
In software engineering, a monolithic application describes a single-tiered software application in which the user interface and data access code are combined into a single program from a single platform. A monolithic application is self-contained and independent from other computing applications. We usually switch from monolithic to microservices to improve the agility of the team and to increase the scalability of the system.
Monorepo architecture:
A monorepo architecture means using one repository, rather than multiple repositories. For example, a monorepo can use one repo that contains a directory for a web app project, a directory for a mobile app project, and a directory for a server app project. Monorepo is also known as one-repo or uni-repo.
Context:
A Context is a propagation mechanism which carries execution-scoped values across API boundaries and between logically associated execution units. A Context MUST be immutable, and its write operations MUST result in the creation of a new Context containing the original values and the specified values updated.
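A hedged sketch of carrying a Context across a service boundary using OpenTelemetry's Python propagation API (the default propagator emits W3C Trace Context headers):

```python
from opentelemetry.propagate import inject, extract

# Client side: copy the current Context (including any active span) into a
# carrier, typically the outgoing HTTP request headers.
headers = {}
inject(headers)
print(headers)  # e.g. {'traceparent': '00-<trace-id>-<span-id>-01'} when a span is active

# Server side: rebuild the Context from the received headers so that child
# spans join the caller's trace. Here the same dict stands in for the request.
ctx = extract(headers)
```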
Throughput:
In communication networks, network throughput is the rate of successful message delivery over a communication channel, such as Ethernet or packet radio. The data these messages belong to may be delivered over a physical or logical link, or it can pass through a certain network node.
Queries:
In-network query processing refers to the complete or partial evaluation of database queries at the edges of a network, rather than in a centralized database server.
Endpoints:
An endpoint is one end of a communication channel. When an API interacts with another system, the touchpoints of this communication are considered endpoints. For APIs, an endpoint can include a URL of a server or service.
Package:
The term package describes a set of code which represents a single dependency, which may be imported into a program independently from other packages.
Attribute:
A key-value pair. Used by the Tracing Data Source to attach data to a Span.
Schema File:
A Schema File is a YAML file that describes the schema of a particular version. It defines the transformations that can be used to convert the telemetry data represented in any other older compatible version of the same schema family to this schema version. More here
REST:
Representational State Transfer. A REST API (also known as a RESTful API) is an application programming interface (API or web API) that conforms to the constraints of the REST architectural style and allows for interaction with RESTful web services.
RPC:
Remote Procedure Call. In distributed computing, a remote procedure call is when a computer program causes a procedure to execute in a different address space, coded as if it were a normal procedure call, without the programmer explicitly coding the details for the remote interaction.
gRPC:
A high-performance, open source universal RPC framework initially developed at Google in 2015 as the next generation of the RPC infrastructure.
API:
An Application Programming Interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build or use such a connection or interface is called an API specification.
SDK:
A Software Development Kit (SDK) is a collection of software development tools in one installable package. They facilitate the creation of applications by having a compiler, debugger and sometimes a software framework. They are normally specific to a hardware platform and operating system combination.
WIP:
Work in Progress.
PRD:
Product Requirements Document.
Multi-Tenancy:
Multitenancy is a reference to the mode of operation of software where multiple independent instances of one or multiple applications operate in a shared environment.
Anomaly detection:
In data analysis, anomaly detection is generally understood to be the identification of rare items, events, or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behaviour.
Legal Terminology
CCPA rules:
The CCPA requires business privacy policies to include information on consumers' privacy rights and how to exercise them: the Right to Know, the Right to Delete, the Right to Opt-Out of Sale, and the Right to Non-Discrimination.
GDPR Rules (General Data Protection Regulation):
The UK GDPR sets out seven key principles:
- Lawfulness, fairness and transparency
- Purpose limitation
- Data minimisation
- Accuracy
- Storage limitation
- Integrity and confidentiality (security)
- Accountability