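Correlating Metrics with Traces
The exemplar mechanism
Prometheus
Prometheus uses exemplars to attach extra information, typically a trace ID, to individual metric samples. Exemplars are exposed through the same metrics endpoint, alongside the metrics themselves, using the OpenMetrics text format: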
https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#exemplars-1
# Everything after " # " on a sample line is the exemplar:
# <exemplar label set> <exemplar value> [<exemplar timestamp>]
foo_bucket{le="0.1"} 8 # {} 0.054
foo_bucket{le="1"} 11 # {trace_id="KOO5S4vxi0o"} 0.67
foo_bucket{le="10"} 17 # {trace_id="oHg5SJYRHA0"} 9.8 1520879607.789
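The following Go sketch makes the line layout above concrete by splitting an exposition line into its sample part and its exemplar part. `splitExemplar` and `Exemplar` are hypothetical names. The parser is deliberately simplified: it assumes no ` # ` or `}` appears inside label values, which a real OpenMetrics parser has to handle.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Exemplar holds the parts of an OpenMetrics exemplar.
type Exemplar struct {
	Labels    string  // raw label set between the braces, e.g. `trace_id="KOO5S4vxi0o"`
	Value     float64 // exemplar value
	Timestamp float64 // exemplar timestamp in seconds; 0 when absent
	HasTS     bool    // whether a timestamp was present
}

// splitExemplar splits one exposition line into the sample part and the
// exemplar part (nil when the line carries no exemplar).
// Simplification: assumes " # " and "}" never occur inside label values.
func splitExemplar(line string) (sample string, ex *Exemplar, err error) {
	i := strings.Index(line, " # ")
	if i < 0 {
		return line, nil, nil
	}
	sample, rest := line[:i], strings.TrimSpace(line[i+3:])
	end := strings.Index(rest, "}")
	if !strings.HasPrefix(rest, "{") || end < 0 {
		return "", nil, fmt.Errorf("malformed exemplar: %q", rest)
	}
	ex = &Exemplar{Labels: rest[1:end]}
	fields := strings.Fields(rest[end+1:])
	if len(fields) == 0 || len(fields) > 2 {
		return "", nil, fmt.Errorf("expected <value> [<timestamp>], got %q", rest[end+1:])
	}
	if ex.Value, err = strconv.ParseFloat(fields[0], 64); err != nil {
		return "", nil, err
	}
	if len(fields) == 2 {
		if ex.Timestamp, err = strconv.ParseFloat(fields[1], 64); err != nil {
			return "", nil, err
		}
		ex.HasTS = true
	}
	return sample, ex, nil
}

func main() {
	for _, l := range []string{
		`foo_bucket{le="0.1"} 8 # {} 0.054`,
		`foo_bucket{le="10"} 17 # {trace_id="oHg5SJYRHA0"} 9.8 1520879607.789`,
	} {
		s, ex, err := splitExemplar(l)
		fmt.Println(s, ex, err)
	}
}
```

An empty label set (`{}`) is still a valid exemplar; it simply carries no trace ID.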
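Injecting exemplars (Go client)
// Assumed imports for this fragment:
//   "strconv"
//   "github.com/prometheus/client_golang/prometheus"
//   "go.opentelemetry.io/otel/trace"
// GetPlayURLTotal is assumed to be a *prometheus.CounterVec; callerType,
// device, videoFormat and ctx come from the surrounding handler.
c := GetPlayURLTotal.WithLabelValues(
	strconv.FormatInt(int64(callerType), 10),
	strconv.FormatInt(int64(device.GetOs()), 10),
	strconv.FormatInt(int64(device.GetNetwork()), 10),
	videoFormat,
)
sp := trace.SpanFromContext(ctx).SpanContext()
// Attach an exemplar only for sampled traces, so the trace ID can actually be
// looked up. More conditions can be added here to keep the exemplars
// representative (e.g. only slow or failed requests).
if sp.IsSampled() {
	// Counters implement prometheus.ExemplarAdder; for a histogram, assert to
	// prometheus.ExemplarObserver and call ObserveWithExemplar instead.
	c.(prometheus.ExemplarAdder).AddWithExemplar(1, prometheus.Labels{
		"traceID": sp.TraceID().String(),
	})
} else {
	c.Inc()
}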
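Two practical constraints apply to the injection code above. OpenMetrics limits the combined length of an exemplar's label names and values to 128 UTF-8 code points, and client_golang's `AddWithExemplar` panics when that limit is exceeded. Exemplars are also only exposed when the handler serves OpenMetrics, e.g. `promhttp.HandlerFor(reg, promhttp.HandlerOpts{EnableOpenMetrics: true})`. The sketch below is a hypothetical stdlib-only guard, `exemplarLabelsOK`, for the length rule only; it does not validate label-name syntax.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxExemplarRunes is the OpenMetrics limit on the combined length of an
// exemplar's label names and values, counted in UTF-8 code points.
const maxExemplarRunes = 128

// exemplarLabelsOK is a hypothetical guard to call before AddWithExemplar or
// ObserveWithExemplar. It checks only the length rule.
func exemplarLabelsOK(labels map[string]string) bool {
	n := 0
	for k, v := range labels {
		n += utf8.RuneCountInString(k) + utf8.RuneCountInString(v)
	}
	return n <= maxExemplarRunes
}

func main() {
	// An OpenTelemetry trace ID renders as 32 hex characters, so "traceID"
	// plus one ID uses 7 + 32 = 39 of the 128 code points.
	ok := exemplarLabelsOK(map[string]string{"traceID": "4bf92f3577b34da6a3ce929d0e0e4736"})
	fmt.Println(ok) // true
}
```

A single trace ID fits comfortably; the limit matters mainly if more labels (user IDs, URLs) are added to the exemplar.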
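OTLP
OTLP defines an Exemplar message in the protocol, so a sampled span can be linked to a metric point when the metric is exported. The OpenTelemetry SDK can inject exemplars automatically: because traces, logs, and metrics share the same OpenTelemetry context, no manual correlation is needed, as long as the measurement is recorded with the context that carries the active span. The Exemplar message from the OTLP metrics proto (an early-2021 revision that still carries the deprecated filtered_labels field):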
// A representation of an exemplar, which is a sample input measurement.
// Exemplars also hold information about the environment when the measurement
// was recorded, for example the span and trace ID of the active span when the
// exemplar was recorded.
message Exemplar {
  // The set of key/value pairs that were filtered out by the aggregator, but
  // recorded alongside the original measurement. Only key/value pairs that were
  // filtered out by the aggregator should be included
  repeated opentelemetry.proto.common.v1.KeyValue filtered_attributes = 7;

  // Labels is deprecated and will be removed soon.
  // 1. Old senders and receivers that are not aware of this change will
  // continue using the `filtered_labels` field.
  // 2. New senders, which are aware of this change MUST send only
  // `filtered_attributes`.
  // 3. New receivers, which are aware of this change MUST convert this into
  // `filtered_labels` by simply converting all int64 values into float.
  //
  // This field will be removed in ~3 months, on July 1, 2021.
  repeated opentelemetry.proto.common.v1.StringKeyValue filtered_labels = 1 [deprecated = true];

  // time_unix_nano is the exact time when this exemplar was recorded
  //
  // Value is UNIX Epoch time in nanoseconds since 00:00:00 UTC on 1 January
  // 1970.
  fixed64 time_unix_nano = 2;

  // The value of the measurement that was recorded. An exemplar is
  // considered invalid when one of the recognized value fields is not present
  // inside this oneof.
  oneof value {
    double as_double = 3;
    sfixed64 as_int = 6;
  }

  // (Optional) Span ID of the exemplar trace.
  // span_id may be missing if the measurement is not recorded inside a trace
  // or if the trace is not sampled.
  bytes span_id = 4;

  // (Optional) Trace ID of the exemplar trace.
  // trace_id may be missing if the measurement is not recorded inside a trace
  // or if the trace is not sampled.
  bytes trace_id = 5;
}
Prometheus storage (the approach tjg uses)
https://github.com/prometheus/prometheus/pull/6635/files
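Prometheus stores exemplars in a fixed-size circular buffer (a contiguous ring in memory) and provides a matching query API, /api/v1/query_exemplars. Exemplar storage is behind a feature flag and must be enabled with --enable-feature=exemplar-storage.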
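This simplified Go sketch illustrates the fixed-size ring idea and why memory stays bounded: once the buffer is full, each new exemplar overwrites the oldest one. It is an illustration only, not Prometheus's implementation. Prometheus's `CircularExemplarStorage` also keeps a per-series index into the buffer instead of scanning it. `Ring` and `Exemplar` are hypothetical names.

```go
package main

import "fmt"

// Exemplar is one stored exemplar: owning series, labels (e.g. traceID),
// value, and timestamp in milliseconds.
type Exemplar struct {
	Series string
	Labels map[string]string
	Value  float64
	TsMs   int64
}

// Ring is a fixed-capacity circular buffer shared by all series.
type Ring struct {
	buf  []Exemplar
	next int  // slot the next Add writes to
	full bool // true once the buffer has wrapped at least once
}

// NewRing allocates the whole buffer up front; capacity must be > 0.
func NewRing(capacity int) *Ring { return &Ring{buf: make([]Exemplar, capacity)} }

// Add stores e, overwriting the oldest entry when the buffer is full.
func (r *Ring) Add(e Exemplar) {
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Query returns the exemplars of one series with start <= TsMs <= end,
// oldest first, by scanning the whole buffer.
func (r *Ring) Query(series string, start, end int64) []Exemplar {
	var out []Exemplar
	n, first := r.next, 0 // not yet wrapped: entries are buf[0:next]
	if r.full {
		n, first = len(r.buf), r.next // wrapped: oldest entry is at next
	}
	for i := 0; i < n; i++ {
		e := r.buf[(first+i)%len(r.buf)]
		if e.Series == series && e.TsMs >= start && e.TsMs <= end {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	r := NewRing(2)
	r.Add(Exemplar{Series: "foo", Labels: map[string]string{"traceID": "a"}, Value: 1, TsMs: 1000})
	r.Add(Exemplar{Series: "foo", Labels: map[string]string{"traceID": "b"}, Value: 2, TsMs: 2000})
	r.Add(Exemplar{Series: "foo", Labels: map[string]string{"traceID": "c"}, Value: 3, TsMs: 3000}) // overwrites "a"
	for _, e := range r.Query("foo", 0, 5000) {
		fmt.Println(e.Labels["traceID"], e.Value) // b 2, then c 3
	}
}
```

The trade-off is that old exemplars are lost once the buffer wraps. That is acceptable because exemplars are samples meant for drill-down, not a complete record.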
$ curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z'
{
  "status": "success",
  "data": [
    {
      "seriesLabels": {
        "__name__": "test_exemplar_metric_total",
        "instance": "localhost:8090",
        "job": "prometheus",
        "service": "bar"
      },
      "exemplars": [
        {
          "labels": {
            "traceID": "EpTxMJ40fUus7aGY"
          },
          "value": "6",
          "timestamp": 1600096945.479
        }
      ]
    },
    {
      "seriesLabels": {
        "__name__": "test_exemplar_metric_total",
        "instance": "localhost:8090",
        "job": "prometheus",
        "service": "foo"
      },
      "exemplars": [
        {
          "labels": {
            "traceID": "Olp9XHlq763ccsfa"
          },
          "value": "19",
          "timestamp": 1600096955.479
        },
        {
          "labels": {
            "traceID": "hCtjygkIHwAN9vs4"
          },
          "value": "20",
          "timestamp": 1600096965.489
        }
      ]
    }
  ]
}
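On the consumer side, a client of this API mostly needs the trace IDs of each series. Below is a minimal Go sketch that decodes a response body of the shape shown above, where `value` is a string and `timestamp` is seconds as a JSON number. The type and function names are hypothetical, and grouping by the `service` label is an arbitrary choice that matches the sample. The HTTP GET itself (for example with `net/http`) is left out so the sketch runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Response types for /api/v1/query_exemplars, following the sample response.
type exemplarResponse struct {
	Status string        `json:"status"`
	Data   []seriesEntry `json:"data"`
}

type seriesEntry struct {
	SeriesLabels map[string]string `json:"seriesLabels"`
	Exemplars    []exemplarEntry   `json:"exemplars"`
}

type exemplarEntry struct {
	Labels    map[string]string `json:"labels"`
	Value     string            `json:"value"`     // sent as a string
	Timestamp float64           `json:"timestamp"` // seconds since the epoch
}

// traceIDsBySeries collects the traceID label of every exemplar, keyed by
// the series' "service" label.
func traceIDsBySeries(body []byte) (map[string][]string, error) {
	var resp exemplarResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query failed: status %q", resp.Status)
	}
	out := make(map[string][]string)
	for _, s := range resp.Data {
		svc := s.SeriesLabels["service"]
		for _, e := range s.Exemplars {
			out[svc] = append(out[svc], e.Labels["traceID"])
		}
	}
	return out, nil
}

func main() {
	body := []byte(`{"status":"success","data":[{"seriesLabels":{"__name__":"test_exemplar_metric_total","service":"foo"},"exemplars":[{"labels":{"traceID":"Olp9XHlq763ccsfa"},"value":"19","timestamp":1600096955.479}]}]}`)
	ids, err := traceIDsBySeries(body)
	fmt.Println(ids, err) // map[foo:[Olp9XHlq763ccsfa]] <nil>
}
```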
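Correlating Logs with Traces
Correlating logs with traces is simpler: when a log line is written, fetch the current trace's TraceId and SpanId and include them in the line. Each log entry can then be linked to its trace.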
Plain-text log:
timestamp=xxxx TraceId=xxxx SpanId=xxxxx
JSON log:
{"trace_id": "xxx", "span_id": "xxx", "log": "xxxx"}
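Finally, during cleaning and ingestion, extract trace_id and span_id and store them as fields of each log record. This enables navigation between logs and traces.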
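For the plain-text format, the cleaning step amounts to pulling the two fields out of each line before indexing. Below is a minimal regexp-based sketch. It assumes the whitespace-separated `Key=value` layout shown above, and `extractTraceFields` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// traceFieldRe matches the TraceId=... and SpanId=... fields; a value runs
// until the next whitespace.
var traceFieldRe = regexp.MustCompile(`\b(TraceId|SpanId)=(\S+)`)

// extractTraceFields returns the trace and span IDs of a log line.
// ok is false when the line has no TraceId and so cannot be linked to a trace.
func extractTraceFields(line string) (traceID, spanID string, ok bool) {
	for _, m := range traceFieldRe.FindAllStringSubmatch(line, -1) {
		switch m[1] {
		case "TraceId":
			traceID = m[2]
		case "SpanId":
			spanID = m[2]
		}
	}
	return traceID, spanID, traceID != ""
}

func main() {
	t, s, ok := extractTraceFields("timestamp=1600096945 TraceId=4bf92f3577b34da6a3ce929d0e0e4736 SpanId=00f067aa0ba902b7 play url requested")
	fmt.Println(t, s, ok)
}
```

Lines without a TraceId can still be ingested; they just stay unlinked.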
With the OpenTelemetry SDK, this correlation can be automatic, because logs and traces share the same context.
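Storing exemplars in the monitoring platform
InfluxDB does not currently support storing exemplars, and writing trace IDs as InfluxDB tags would cause a high-cardinality problem, because every new trace ID would create a new series. Given the current storage architecture, the monitoring platform can store exemplars in ES instead.
Required changes:
- Prometheus data parsing must also parse exemplars and report them.
- transfer must support writing exemplar data to ES.
- The SaaS side must support querying exemplar data.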
Technical solution