Prometheus Survey

Architecture

Prometheus server

Collects and stores metrics from each target. It also evaluates alert rules and, when a rule fires, sends the alert to Alertmanager.

Target

The thing being monitored. It exposes its metrics over HTTP in a fixed format for the Prometheus server to scrape.

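Pushgateway

Some targets are not suited to scraping, for example short-lived jobs that exit before the Prometheus server gets a chance to scrape them. Such targets can push their metrics to the Pushgateway, and the Prometheus server then scrapes the Pushgateway instead.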

Alertmanager

The Prometheus server evaluates alert rules and sends firing alerts to Alertmanager. Alertmanager consolidates the alerts and forwards them to the configured receiver only after deciding that they should go out.

Concepts

Data Model

Metric Types

The Prometheus client libraries offer four core metric types: counter, gauge, histogram, and summary.

Jobs and Instances

Storage

Local Storage

Operational Aspects

Remote Storage Integrations

Query

Expression Data Types

Time Series Selectors

Instant vector selectors

Selects a set of time series and returns the most recent sample for each. Labels can be added to filter for specific time series. For example:
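As a minimal sketch, the two selectors below assume a hypothetical counter named http_requests_total that carries job and group labels; neither the metric nor the label values come from this survey.

```promql
# Select the latest sample of every series with this metric name.
http_requests_total

# Narrow the selection to series whose labels match exactly.
http_requests_total{job="prometheus", group="canary"}
```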

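Selectors can be combined with matching operators (=, !=, =~, !~) to find labels that match a value or, inversely, labels that do not. For example:

The four usages above are illustrated below: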
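PromQL has four label matching operators: =, !=, =~ and !~. The sketch below shows one query per operator; http_requests_total and its method and environment labels are illustrative assumptions.

```promql
# =  : the label value equals the string exactly.
http_requests_total{method="GET"}

# != : the label value differs from the string.
http_requests_total{method!="GET"}

# =~ : the label value matches the regular expression.
http_requests_total{environment=~"staging|testing|development"}

# !~ : the label value does not match the regular expression.
http_requests_total{environment!~"staging|testing"}
```

Regular expressions in matchers are fully anchored, so "staging|testing" matches only those exact values, not values that merely contain them.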

Range vector selectors

Very similar to instant vector selectors, except that range vector selectors return the samples over a specified length of time. For example:
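A short sketch of a range vector selector, again assuming a hypothetical http_requests_total metric with a job label:

```promql
# All samples from the last 5 minutes for every matching series.
http_requests_total{job="prometheus"}[5m]
```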

A duration consists of a number followed by a unit, for example:
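The selectors below sketch a few duration literals. The units shown (s, m, h, d, w, y) are standard PromQL units; the metric name is an assumption used only for illustration.

```promql
# 30 seconds of samples.
http_requests_total[30s]

# 1 hour of samples.
http_requests_total[1h]

# 1 week of samples.
http_requests_total[1w]
```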

Offset modifier

The offset modifier shifts the evaluation time back from the current query time by the specified duration. For example:
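A sketch of the offset modifier on both selector kinds, assuming the hypothetical metric http_requests_total:

```promql
# Value of each series as it was 5 minutes before the query time.
http_requests_total offset 5m

# Per-second rate over a 5-minute window that ended one week ago.
rate(http_requests_total[5m] offset 1w)
```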

Note that the offset modifier must immediately follow the selector. For example:
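The placement rule can be sketched as follows; the metric and label are illustrative assumptions.

```promql
# Correct: offset is attached directly to the selector.
sum(http_requests_total{method="GET"} offset 5m)

# Incorrect (fails to parse): offset is attached to the aggregation result.
# sum(http_requests_total{method="GET"}) offset 5m
```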

Binary Operators

Arithmetic binary operators

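The arithmetic binary operators are: +, -, *, /, %, ^.

Arithmetic binary operators are defined between the following pairs of data types: scalar/scalar, vector/scalar, and vector/vector.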

Comparison binary operators

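The comparison binary operators are: ==, !=, >, <, >=, <=.

Comparison binary operators are defined between the following pairs of data types: scalar/scalar, vector/scalar, and vector/vector.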

Logical/set binary operators

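The logical/set binary operators are: and, or, unless.

Logical/set binary operators are defined only between instant vectors. For example, consider the following operations: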
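A sketch of the three set operators, where vector1 and vector2 are placeholder names standing in for any two instant vector expressions:

```promql
# Intersection: elements of vector1 that have an element with the
# exact same label set in vector2.
vector1 and vector2

# Union: all elements of vector1, plus the elements of vector2 whose
# label sets do not appear in vector1.
vector1 or vector2

# Complement: elements of vector1 that have no element with the
# exact same label set in vector2.
vector1 unless vector2
```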

Vector Matching

One-to-one vector matches

Two entries match if they have exactly the same set of labels and corresponding values. Two keywords control which labels take part in the comparison: ignoring and on.

Usage:

Example input:

Example query:

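Without ignoring(code) there would be no result, because no pair of entries has exactly the same set of labels. The put and del methods have no matching counterpart, so they do not appear in the result.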

Result:
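As a worked sketch of one-to-one matching with ignoring(code), assume the input samples listed in the comments below. The series names and values are illustrative assumptions, chosen to be consistent with the put and del methods mentioned above.

```promql
# Assumed input samples:
#   method_code:http_errors:rate5m{method="get",  code="500"}   24
#   method_code:http_errors:rate5m{method="get",  code="404"}   30
#   method_code:http_errors:rate5m{method="put",  code="501"}    3
#   method_code:http_errors:rate5m{method="post", code="500"}    6
#   method_code:http_errors:rate5m{method="post", code="404"}   21
#   method:http_requests:rate5m{method="get"}    600
#   method:http_requests:rate5m{method="del"}     34
#   method:http_requests:rate5m{method="post"}   120

# Fraction of requests per method that ended with status code 500.
method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m

# Result under the assumed input:
#   {method="get"}   0.04   (24 / 600)
#   {method="post"}  0.05   (6 / 120)
```

The left side has no put series with code 500 and no del series at all, so neither method finds a partner and both are dropped.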

Many-to-one and one-to-many vector matches

Usage:

Example query:

Result:
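A sketch of many-to-one matching with group_left, where several error series per method on the left are each divided by the single request series for that method on the right. The input samples in the comments are illustrative assumptions.

```promql
# Assumed input samples:
#   method_code:http_errors:rate5m{method="get",  code="500"}   24
#   method_code:http_errors:rate5m{method="get",  code="404"}   30
#   method_code:http_errors:rate5m{method="put",  code="501"}    3
#   method_code:http_errors:rate5m{method="post", code="500"}    6
#   method_code:http_errors:rate5m{method="post", code="404"}   21
#   method:http_requests:rate5m{method="get"}    600
#   method:http_requests:rate5m{method="del"}     34
#   method:http_requests:rate5m{method="post"}   120

# group_left marks the left-hand side as the "many" side.
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m

# Result under the assumed input:
#   {method="get",  code="500"}  0.04    (24 / 600)
#   {method="get",  code="404"}  0.05    (30 / 600)
#   {method="post", code="500"}  0.05    (6 / 120)
#   {method="post", code="404"}  0.175   (21 / 120)
```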

Aggregation Operators

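Prometheus has the following built-in aggregation operators:

These operators accept a without or by clause that selects the labels to aggregate over: without drops the listed labels from the result, while by keeps only the listed labels. The syntax is:

<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]

For example: sum(http_requests_errors) by (code)

The parameter is required only by the following operators: count_values, quantile, topk, bottomk.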

Example:
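A few aggregation queries as a sketch. The metric names http_requests_total and build_version and the labels instance, application, and group are assumptions for illustration only.

```promql
# Sum over all series, dropping only the instance label from the result.
sum(http_requests_total) without (instance)

# Sum over all series, keeping only the application and group labels.
sum(http_requests_total) by (application, group)

# Operators that need a parameter take it as the first argument.
# The 5 series with the largest current value:
topk(5, http_requests_total)

# Number of series per distinct sample value, exposed in a "version" label:
count_values("version", build_version)
```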

Binary Operator Precedence

Binary operator precedence in Prometheus, from highest to lowest:

  1. ^
  2. *, /, %
  3. +, -
  4. ==, !=, <=, <, >=, >
  5. and, unless
  6. or

Operators with the same precedence are left-associative, except ^, which is right-associative. For example:
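The associativity rules can be checked with plain scalar expressions, sketched here:

```promql
# * and % share a precedence level and associate to the left,
# so this is (2 * 3) % 2, which evaluates to 0.
2 * 3 % 2

# ^ associates to the right, so this is 2 ^ (3 ^ 2) = 2 ^ 9 = 512,
# not (2 ^ 3) ^ 2 = 64.
2 ^ 3 ^ 2
```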

Functions

List of Prometheus functions

Because there are too many to cover, only the following two are explained:

Alerting

Grouping

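Grouping collects similar alerts into a single notification. This is especially useful when many systems fail at once.

For example, when half of a service's instances cannot reach the database, many alerts are sent to Alertmanager at the same time. The user wants to receive only one notification while still being able to see which instances are affected. Alertmanager can be configured to group these alerts by their alertname and cluster.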

Inhibition

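Inhibition means that once certain alerts are firing, notifications for certain other alerts are suppressed.

For example, an alert reports that an entire cluster is unreachable. Alertmanager can be configured to suppress all other alerts concerning that cluster.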

Silences

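A silence mutes alerts for a given period of time. A silence is configured with matchers that decide whether an alert is muted. Each incoming alert is checked against all of the silence's equality or regular-expression matchers; if it matches all of them, no notification is sent for that alert.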

Configuration

# The root route with all parameters, which are inherited by the child
# routes if they are not overwritten.
route:
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  group_by: [cluster, alertname]
  # All alerts that do not match the following child routes
  # will remain at the root node and be dispatched to 'default-receiver'.
  routes:
  # All alerts with service=mysql or service=cassandra
  # are dispatched to the database pager.
  - receiver: 'database-pager'
    group_wait: 10s
    match_re:
      service: mysql|cassandra
  # All alerts with the team=frontend label match this sub-route.
  # They are grouped by product and environment rather than cluster
  # and alertname.
  - receiver: 'frontend-pager'
    group_by: [product, environment]
    match:
      team: frontend

Alerting Flow

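  1. When an alert is received, its labels determine which routes it belongs to (there can be multiple routes; a route has multiple groups, and a group has multiple alerts).

  2. The alert is assigned to a group; if no matching group exists, a new one is created.

  3. A new group waits for the duration given by group_wait (alerts for the same group may arrive while it waits), decides from resolve_timeout whether each alert is resolved, and then sends the notification.

  4. An existing group waits for the duration given by group_interval and checks whether its alerts are resolved. A notification is sent when the time since the last notification exceeds repeat_interval or when the group has changed.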

Example: given a service that never comes up

Best Practices

Metric and Label Naming

Metric Names

Labels

Instrumentation

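For monitoring purposes, services can broadly be divided into three types: online-serving, offline-processing, and batch jobs. There is some overlap between them, but every service tends to fit mainly into one of these categories.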

Online-serving Systems

Offline Processing

Batch Jobs

Subsystems

Things to Watch Out For

Histograms and Summaries

Library support

Apdex score

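# Fraction of requests served within 300ms over the last 5 minutes, per job.
sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
sum(rate(http_request_duration_seconds_count[5m])) by (job)

# Apdex score per job with a 300ms target and a 1.2s tolerable duration.
# Buckets are cumulative, so le="1.2" already includes the le="0.3" requests.
(
  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
+
  sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) by (job)
) / 2 / sum(rate(http_request_duration_seconds_count[5m])) by (job)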

Quantiles

Errors of quantile estimation

TL; DR

Alerting

TL; DR

Online serving systems

Offline processing

Batch jobs

Capacity

Metamonitoring

When to Use The Pushgateway

Comparison to Alternatives

vs. Graphite

vs. InfluxDB

There are many similarities between the systems:

Where InfluxDB is better:

Where Prometheus is better:

vs. OpenTSDB

vs. Nagios

FAQ

What Does Prometheus Fit and Not Fit

Fit

Not fit

Can Prometheus be made highly available?

Yes, run identical Prometheus servers on two or more separate machines. Identical alerts will be deduplicated by the Alertmanager.

For high availability of the Alertmanager, you can run multiple instances in a Mesh cluster and configure the Prometheus servers to send notifications to each of them.

How to feed logs into Prometheus?

If you want to extract Prometheus metrics from application logs, Google’s mtail might be helpful.

Can I send alerts?

Yes, with the Alertmanager.

Currently, the following external systems are supported:

Can I monitor machines?

Yes, the Node Exporter exposes an extensive set of machine-level metrics on Linux and other Unix systems such as CPU usage, memory, disk utilization, filesystem fullness, and network bandwidth.

Can I monitor network devices?

Yes, the SNMP Exporter allows monitoring of devices that support SNMP.

Can I monitor batch jobs?

Yes, using the Pushgateway. See also the best practices for monitoring batch jobs.

Can I monitor JVM applications via JMX?

Yes, for applications that you cannot instrument directly with the Java client, you can use the JMX Exporter either standalone or as a Java Agent.

Useful Link
