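A while back, when I was looking at k8s, I came across Prometheus. At the time its query language, promql, struck me as hard to understand, so a month later I decided to come back and sort out my thinking.
Besides, I find Prometheus very practical: combined with Grafana for slick dashboards, it is a genuinely useful skill.
Below I use the Python client as an example to explain my understanding of Prometheus.
Project
The Python code lives in this project: https://github.com/owenliang/prometheus-py. Here is all of it:

    # -*- coding: utf-8 -*-

    from prometheus_client import Counter, Gauge, Histogram, Summary, start_http_server
    import time
    import random

    # Define one example of each of the 4 metric types
    c = Counter('cc', 'A counter')
    g = Gauge('gg', 'A gauge')
    h = Histogram('hh', 'A histogram', buckets=(-5, 0, 5))
    s = Summary('ss', 'A summary', ['label1', 'label2'])  # metric name, help text, label names

    # Start an HTTP server in a background thread so the metrics can be scraped
    start_http_server(8000)

    while True:
        # counter: can only go up
        c.inc()

        # gauge: any value
        g.set(random.random())

        # histogram: any value; each bucket whose bound covers the value is incremented by 1
        h.observe(random.randint(-10, 10))

        # summary: any value; the Python client does not compute quantiles for summaries.
        # Clients in other languages may, but summaries are generally not recommended:
        # both their performance and their use cases are limited.
        s.labels('a', 'b').observe(17)

        time.sleep(1)

Here start_http_server listens for HTTP on port 8000 so that prometheus can scrape the metrics.
Below we walk through the key concepts of using prometheus, following the code.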
Client usage and internals

    # Define one example of each of the 4 metric types
    c = Counter('cc', 'A counter')
    g = Gauge('gg', 'A gauge')
    h = Histogram('hh', 'A histogram', buckets=(-5, 0, 5))
    s = Summary('ss', 'A summary', ['label1', 'label2'])  # metric name, help text, label names

The client offers 4 metric types; let's go through them one by one.
counter
A counter accumulates monotonically; it can never decrease.
Defining one takes 2 arguments: the first is the metric name, the second is the metric's help text:

    c = Counter('cc', 'A counter')

Its core method is inc, which allows increases and rejects decreases:

    def inc(self, amount=1):
        '''Increment counter by the given amount.'''
        if amount < 0:
            raise ValueError('Counters can only be incremented by non-negative amounts.')
        self._value.inc(amount)

A counter suits things like total request counts. With promql you can compute a counter's rate of increase, which gives you QPS-style metrics.
Calling it is simple:

    # counter: can only go up
    c.inc()

The scraped metrics look like this:

    # HELP cc_total A counter
    # TYPE cc_total counter
    cc_total 46.0
    # TYPE cc_created gauge
    cc_created 1.546424546634121e+09

# HELP is the help text for cc that we supplied at definition time, and # TYPE says that cc is a counter.
Note that the counter cc is exported as cc_total, and its accumulated value is 46.0.
The cc_created output has TYPE gauge and records when the cc metric was created. The gauge type is what we look at next.
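To make the scraped format concrete, here is a minimal sketch of reading it back into Python values. The function name parse_exposition is my own, and the parser assumes simple lines of the form "name value" with no trailing timestamp; it is not a full implementation of the Prometheus text format.

```python
def parse_exposition(text):
    """Parse simple 'name value' lines of the scraped text into a dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last space-separated field.
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


scraped = """\
# HELP cc_total A counter
# TYPE cc_total counter
cc_total 46.0
# TYPE cc_created gauge
cc_created 1.546424546634121e+09
"""

samples = parse_exposition(scraped)
print(samples["cc_total"])
```

The comment lines carry only metadata; the two remaining lines are the actual samples that end up stored by the server.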
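gauge
A gauge differs slightly from a counter.
A gauge can go up or down and can be set to any value; it simply represents the current value of some measurement.
For example, you could record the current CPU temperature or memory usage: these fluctuate up and down rather than only increasing.
The definition is essentially the same as for a counter:

    g = Gauge('gg', 'A gauge')

The first argument is the metric name, the second is the help text.
It supports 3 methods:

    def inc(self, amount=1):
        '''Increment gauge by the given amount.'''
        self._value.inc(amount)

    def dec(self, amount=1):
        '''Decrement gauge by the given amount.'''
        self._value.inc(-amount)

    def set(self, value):
        '''Set gauge to the given value.'''
        self._value.set(float(value))

Because the value is arbitrary, you can inc/dec it or set it directly.
Finally, the call. Here I set a random value each time; the value can be any float:

    # gauge: any value
    g.set(random.random())

The output is similar:

    # HELP gg A gauge
    # TYPE gg gauge
    gg 0.935768437404028

The name is gg and the TYPE is gauge.
In my view, most gauges can simply be plotted as they are, with no promql processing needed.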
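histogram
This type is mainly used to compute percentiles. What is a percentile? In English the general term is quantile.
Say you have the latencies of 100 requests. Sort them in ascending order; if the 90th value is 200ms, then 90% of the requests took at most 200ms. We say "the 90th percentile is 200ms", and it reflects the baseline quality of the service. Of course, the 91st value might be 2000ms, and the 90th percentile tells you nothing about that.
In practice we serve at least several hundred million requests per day, so we cannot store every data point and then sort to find the 90th percentile. Problems like this are therefore handled with estimation algorithms that do not need to keep all the data. The math behind them is fairly advanced, so let's just look at how prometheus is used.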
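As a worked example of the exact (store-everything) approach described above, here is a small sketch using the nearest-rank definition of a percentile. The function name and the random latencies are made up for illustration; this is the naive method that becomes impractical at large volumes, not what prometheus does.

```python
import math
import random


def percentile_nearest_rank(values, p):
    """Return the smallest value v such that at least p percent of values are <= v."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


random.seed(0)
latencies_ms = [random.randint(10, 300) for _ in range(100)]  # made-up sample data
p90 = percentile_nearest_rank(latencies_ms, 90)
print(p90)
```

With 100 samples the 90th percentile is simply the 90th value in sorted order, which matches the description in the text.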
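First define the histogram:

    h = Histogram('hh', 'A histogram', buckets=(-5, 0, 5))

The first argument is the metric name, the second is the help text, and the third is the bucket configuration. Let's focus on buckets.
Here (-5, 0, 5) actually produces 4 buckets: <=-5, <=0, <=5 and <=+Inf.
If we feed it an 8:

    h.observe(8)

then the metrics output looks like this:

    # HELP hh A histogram
    # TYPE hh histogram
    hh_bucket{le="-5.0"} 0.0
    hh_bucket{le="0.0"} 0.0
    hh_bucket{le="5.0"} 0.0
    hh_bucket{le="+Inf"} 1.0
    hh_count 1.0
    hh_sum 8.0

hh_sum records the sum of all observed values, hh_count records the number of observations, and the hh_bucket lines are the buckets; le means "less than or equal to" the given bound.
As you can see, 8 is only <= +Inf, so only the last bucket counted 1. (Note that a bucket is just a counter: it tallies how many samples fell at or below its bound, so the counts are cumulative.)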
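The bucket behaviour above can be reproduced with a toy class. This is only a sketch of the exposed, cumulative view; the class name SketchHistogram is my own and the real prometheus_client may store its counts differently internally.

```python
import math


class SketchHistogram:
    """A toy cumulative histogram; not the prometheus_client implementation."""

    def __init__(self, buckets):
        # Always end with +Inf so that every observation lands in some bucket.
        self.upper_bounds = sorted(float(b) for b in buckets) + [math.inf]
        self.counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:  # "le" means less than or equal
                self.counts[i] += 1


h = SketchHistogram(buckets=(-5, 0, 5))
h.observe(8)
print(list(zip(h.upper_bounds, h.counts)))
```

Observing 8 touches only the +Inf bucket, as in the output above, whereas observing -8 would increment all four buckets because -8 is below every bound.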
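Bucket boundaries have to be chosen by educated guess, based on how the data is distributed. Sensible boundaries let promql estimate percentiles more accurately. When using a histogram, all we need to do is define the buckets up front and then keep recording observations; percentiles can be computed later from the raw histogram data.
We keep feeding the histogram random values between -5 and 5:

    while True:
        # histogram: any value; each bucket whose bound covers the value is incremented by 1
        h.observe(random.randint(-5, 5))

Each time prometheus scrapes, the current metrics look like this:

    # HELP hh A histogram
    # TYPE hh histogram
    hh_bucket{le="-5.0"} 12.0
    hh_bucket{le="0.0"} 83.0
    hh_bucket{le="5.0"} 153.0
    hh_bucket{le="+Inf"} 153.0
    hh_count 153.0
    hh_sum -16.0
    # TYPE hh_created gauge
    hh_created 1.546499508889123e+09

These are all called instant-vectors.
In the prometheus UI you can query the latest set of hh instant-vectors, which differ only in their label (the bucket boundary):
In that query result, 52 observations are <= +Inf, which means 52 observations were recorded in total.
Only 3 are <= -5 and 29 are <= 0, so the number falling in (-5, 0] is 29-3=26.
Likewise, the number falling in (0, 5] is 52-29=23.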
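The subtraction above generalizes to any set of cumulative buckets. A small sketch, using the counts quoted in the text (3, 29, 52, 52); the function name is my own:

```python
import math

# Cumulative "le" counts as quoted in the text above.
cumulative = [(-5.0, 3), (0.0, 29), (5.0, 52), (math.inf, 52)]


def per_interval_counts(cumulative_buckets):
    """Turn cumulative 'le' counts into counts per (lower, upper] interval."""
    result = []
    previous_bound, previous_count = -math.inf, 0
    for upper_bound, count in cumulative_buckets:
        result.append(((previous_bound, upper_bound), count - previous_count))
        previous_bound, previous_count = upper_bound, count
    return result


for (low, high), n in per_interval_counts(cumulative):
    print(f"({low}, {high}]: {n}")
```

Each interval's count is just the difference between two neighbouring cumulative counts, which is why the +Inf bucket always equals the total.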
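We can estimate percentiles from this set of instant-vectors. For example, to estimate the 90th percentile:
The estimated 90th percentile is about 3.9657534246575348, meaning roughly 90% of the observations are below 3.97. That seems reasonable, since the upper limit of our random values is 5.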
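The query behind that number is not preserved in the text; in promql such estimates are normally produced by the histogram_quantile function. Below is a simplified Python emulation of the idea, assuming linear interpolation inside the bucket that contains the target rank. The function name estimate_quantile is my own, and the input uses the bucket counts from the scrape output shown earlier (12, 83, 153, 153), so the result differs from the 3.9657... above, which came from different data.

```python
import math


def estimate_quantile(q, cumulative_buckets):
    """Estimate the q-quantile (0 <= q <= 1) from sorted cumulative (le, count) pairs."""
    total = cumulative_buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(cumulative_buckets) if count >= rank)
    upper = cumulative_buckets[index][0]
    if math.isinf(upper):
        # The rank falls in the +Inf bucket: fall back to the highest finite bound.
        return cumulative_buckets[-2][0]
    if index == 0:
        if upper <= 0:
            return upper  # no sensible lower bound to interpolate from
        lower, below = 0.0, 0
    else:
        lower, below = cumulative_buckets[index - 1]
    in_bucket = cumulative_buckets[index][1] - below
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly between the bucket's bounds.
    return lower + (upper - lower) * (rank - below) / in_bucket


buckets = [(-5.0, 12), (0.0, 83), (5.0, 153), (math.inf, 153)]
print(estimate_quantile(0.9, buckets))
```

The even-spread assumption inside a bucket is exactly why bucket boundaries matter: the wider the bucket that contains the target rank, the coarser the estimate.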
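A histogram counts observations per bucket on the client and leaves the percentile estimate to promql on the prometheus server, so the accuracy of the estimate is limited by how sensible the buckets are. If the buckets are badly chosen the estimate will be far off; this takes some experimenting.
summary
Because a histogram is nothing more than buckets and bucket counts on the client, and the prometheus server estimates percentiles from that limited data, the result is indeed not very accurate. The summary type exists to solve this accuracy problem.
A summary effectively moves the server-side algorithm into the client: the client computes percentiles directly as observations are recorded.
This requires the client to declare in advance which percentiles it wants (just as a histogram declares its bucket boundaries). The client then computes high-precision percentile values that prometheus can scrape and use as they are.
The Python client does not fully implement the summary algorithm. Moving the algorithm into the client brings a serious performance cost: a histogram only needs an atomic counter increment per bucket, whereas a summary has to rerun its algorithm on every observation to update the value of each quantile, and that algorithm needs concurrency protection, so the impact on concurrent programs is large. This may be one of the reasons Python does not implement it.
Another reason may be that summaries are inflexible: the percentiles are fixed in the client in advance, whereas with a histogram you can ask for any percentile through promql. The result is less accurate than a summary's, but you gain flexibility.
So I will not go into summary any further.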
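Server usage and internals
Now for the server side. There are a few insights that I consider very important.
About instant-vector
It is easy to get confused about when to use an instant-vector and when to use a range-vector, so here is my understanding.
The most common misconception is that drawing a graph requires a range-vector. This is a big pitfall when you first learn prometheus.
We assume that a curve needs many points strung together, so we take it for granted that a graph should use a range-vector to fetch N points, something like cc_total[5m]. That is completely wrong!
Graphs are actually drawn from instant-vectors!
When prometheus draws a graph, it repeatedly steps back through time in N-second increments, evaluates the instant-vector at each point, and draws the line from these instant-vectors at different times.
Here is the latest cc_total instant-vector:
Keep the promql unchanged and switch to the Graph tab to draw the curve. Below it you can see the ajax request, which tells prometheus to evaluate the promql once every 14 seconds to obtain an instant-vector, over the time window from start to end:
The returned data looks like this: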
{ "status": "success", "data": { "resultType": "matrix", "result": [{ "metric": { "__name__": "cc_total", "instance": "localhost:8000", "job": "prometheus-py" }, "values": [ [1546499402.612, "39"], [1546499416.612, "53"], [1546499430.612, "67"], [1546499444.612, "81"], [1546499458.612, "95"], [1546499472.612, "109"], [1546499486.612, "123"], [1546499500.612, "137"], [1546499514.612, "5"], [1546499528.612, "19"], [1546499542.612, "33"], [1546499556.612, "47"], [1546499570.612, "61"], [1546499584.612, "75"], [1546499598.612, "89"], [1546499612.612, "103"], [1546499626.612, "117"], [1546499640.612, "131"], [1546499654.612, "145"], [1546499668.612, "159"], [1546499682.612, "173"], [1546499696.612, "187"], [1546499710.612, "201"], [1546499724.612, "214"], [1546499738.612, "228"], [1546499752.612, "242"], [1546499766.612, "256"], [1546499780.612, "270"], [1546499794.612, "284"], [1546499808.612, "298"], [1546499822.612, "312"], [1546499836.612, "326"], [1546499850.612, "340"], [1546499864.612, "354"], [1546499878.612, "368"], [1546499892.612, "382"], [1546499906.612, "396"], [1546499920.612, "410"], [1546499934.612, "424"], [1546499948.612, "438"], [1546499962.612, "452"], [1546499976.612, "466"], [1546499990.612, "480"], [1546500004.612, "494"], [1546500018.612, "508"], [1546500032.612, "522"], [1546500046.612, "536"], [1546500060.612, "550"], [1546500074.612, "564"], [1546500088.612, "578"], [1546500102.612, "591"], [1546500116.612, "605"], [1546500130.612, "619"], [1546500144.612, "633"], [1546500158.612, "647"], [1546500172.612, "661"], [1546500186.612, "675"], [1546500200.612, "689"], [1546500214.612, "703"], [1546500228.612, "717"], [1546500242.612, "731"], [1546500256.612, "745"], [1546500270.612, "759"], [1546500284.612, "773"], [1546500298.612, "787"], [1546500312.612, "801"], [1546500326.612, "815"], [1546500340.612, "829"], [1546500354.612, "843"], [1546500368.612, "857"], [1546500382.612, "871"], [1546500396.612, "885"], [1546500410.612, "899"], 
[1546500424.612, "913"], [1546500438.612, "927"], [1546500452.612, "941"], [1546500466.612, "954"], [1546500480.612, "968"], [1546500494.612, "982"], [1546500508.612, "996"], [1546500522.612, "1010"], [1546500536.612, "1024"], [1546500550.612, "1038"], [1546500564.612, "1052"], [1546500578.612, "1066"], [1546500592.612, "1080"], [1546500606.612, "1094"], [1546500620.612, "1108"], [1546500634.612, "1122"], [1546500648.612, "1136"], [1546500662.612, "1150"], [1546500676.612, "1164"], [1546500690.612, "1178"], [1546500704.612, "1192"], [1546500718.612, "1206"], [1546500732.612, "1220"], [1546500746.612, "1234"], [1546500760.612, "1248"], [1546500774.612, "1262"], [1546500788.612, "1276"], [1546500802.612, "1290"], [1546500816.612, "1304"], [1546500830.612, "1317"], [1546500844.612, "1331"], [1546500858.612, "1345"], [1546500872.612, "1359"], [1546500886.612, "1373"], [1546500900.612, "1387"], [1546500914.612, "1401"], [1546500928.612, "1415"], [1546500942.612, "1429"], [1546500956.612, "1443"], [1546500970.612, "1457"], [1546500984.612, "1471"], [1546500998.612, "1485"], [1546501012.612, "1499"], [1546501026.612, "1513"], [1546501040.612, "1527"], [1546501054.612, "1541"], [1546501068.612, "1555"], [1546501082.612, "1569"], [1546501096.612, "1583"], [1546501110.612, "1597"], [1546501124.612, "1611"], [1546501138.612, "1624"], [1546501152.612, "1638"], [1546501166.612, "1652"], [1546501180.612, "1666"], [1546501194.612, "1680"], [1546501208.612, "1694"], [1546501222.612, "1708"], [1546501236.612, "1722"], [1546501250.612, "1736"], [1546501264.612, "1750"], [1546501278.612, "1764"], [1546501292.612, "1778"], [1546501306.612, "1792"], [1546501320.612, "1806"], [1546501334.612, "1820"], [1546501348.612, "1834"], [1546501362.612, "1848"], [1546501376.612, "1862"], [1546501390.612, "1876"], [1546501404.612, "1890"], [1546501418.612, "1904"], [1546501432.612, "1918"], [1546501446.612, "1932"], [1546501460.612, "1946"], [1546501474.612, "1959"], [1546501488.612, "1973"], 
[1546501502.612, "1987"], [1546501516.612, "2001"], [1546501530.612, "2015"], [1546501544.612, "2029"], [1546501558.612, "2043"], [1546501572.612, "2057"], [1546501586.612, "2071"], [1546501600.612, "2085"], [1546501614.612, "2099"], [1546501628.612, "2113"], [1546501642.612, "2127"], [1546501656.612, "2141"], [1546501670.612, "2155"], [1546501684.612, "2169"], [1546501698.612, "2183"], [1546501712.612, "2197"], [1546501726.612, "2211"], [1546501740.612, "2225"], [1546501754.612, "2239"], [1546501768.612, "2253"], [1546501782.612, "2267"], [1546501796.612, "2281"], [1546501810.612, "2295"], [1546501824.612, "2309"], [1546501838.612, "2323"], [1546501852.612, "2336"], [1546501866.612, "2350"], [1546501880.612, "2364"], [1546501894.612, "2378"], [1546501908.612, "2392"], [1546501922.612, "2406"], [1546501936.612, "2420"], [1546501950.612, "2434"], [1546501964.612, "2448"], [1546501978.612, "2462"], [1546501992.612, "2476"], [1546502006.612, "2490"], [1546502020.612, "2504"], [1546502034.612, "2518"], [1546502048.612, "2532"], [1546502062.612, "2546"], [1546502076.612, "2560"], [1546502090.612, "2574"], [1546502104.612, "2588"], [1546502118.612, "2602"], [1546502132.612, "2616"], [1546502146.612, "2630"], [1546502160.612, "2644"], [1546502174.612, "2658"], [1546502188.612, "2672"], [1546502202.612, "2686"], [1546502216.612, "2699"], [1546502230.612, "2713"], [1546502244.612, "2727"], [1546502258.612, "2741"], [1546502272.612, "2755"], [1546502286.612, "2769"], [1546502300.612, "2783"], [1546502314.612, "2797"] ] }] } } |
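So the prometheus API can evaluate the promql you pass in repeatedly over a time range, at a given resolution (the step, 14 seconds here), producing a series of instant-vectors that are then drawn as a graph.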
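The stepping behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not how the server is implemented: the function names are my own, the scrape history is made up, and the 5-minute lookback window mirrors what I understand to be the server's default.

```python
LOOKBACK_SECONDS = 300  # assumption: mirrors the server's default 5-minute lookback


def instant_value(samples, t):
    """Return the most recent sample value at or before t, or None if it is too old."""
    candidates = [(ts, v) for ts, v in samples if t - LOOKBACK_SECONDS <= ts <= t]
    return max(candidates)[1] if candidates else None


def query_range(samples, start, end, step):
    """Evaluate the instant query once per step from start to end (inclusive)."""
    points = []
    t = start
    while t <= end:
        value = instant_value(samples, t)
        if value is not None:
            points.append((t, value))
        t += step
    return points


# Made-up scrape history: one sample every 5 seconds, whose value equals its timestamp.
scraped = [(ts, float(ts)) for ts in range(0, 101, 5)]
print(query_range(scraped, start=14, end=70, step=14))
```

Each point on the graph is one instant evaluation; the step controls how many evaluations there are, independently of how often the target was scraped.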
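About range-vector
So what is a range-vector for? As far as I understand, it exists to serve as the input from which an instant-vector is computed; I have not found a scenario where a range-vector is used directly.
Open https://prometheus.io/docs/prometheus/latest/querying/functions/ and search for range-vector: you will find that only a subset of promql functions accept a range-vector (rate, irate, increase and the _over_time family, for example).
Take rate: its input is a range-vector. It takes the data from the N minutes before a given time and computes from it an instant-vector holding the average per-second rate of increase:
Since rate outputs an instant-vector, the promql can be evaluated at many different times to get the average rate at each of them, and a graph can be drawn:
The corresponding ajax response looks like this: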
{ "status": "success", "data": { "resultType": "matrix", "result": [{ "metric": { "instance": "localhost:8000", "job": "prometheus-py" }, "values": [ [1546499398.215, "0.026396267733028656"], [1546499412.215, "0.07305640195518313"], [1546499426.215, "0.11972317449434353"], [1546499440.215, "0.166386525184505"], [1546499454.215, "0.2130632889961165"], [1546499468.215, "0.25971655888285766"], [1546499482.215, "0.30638659048037653"], [1546499496.215, "0.35305326730473147"], [1546499510.215, "0.396371639525178"], [1546499524.215, "0.44303290027667513"], [1546499538.215, "0.48970388031408923"], [1546499552.215, "0.5363713689466918"], [1546499566.215, "0.5830453701322819"], [1546499580.215, "0.6297159495742595"], [1546499594.215, "0.6763665075604202"], [1546499608.215, "0.7230336145812694"], [1546499622.215, "0.7697039929353097"], [1546499636.215, "0.8163743280804222"], [1546499650.215, "0.8630446270227975"], [1546499664.215, "0.9097015910342392"], [1546499678.215, "0.9563751580971893"], [1546499692.215, "0.9966321858685584"], [1546499706.215, "0.9966355190196854"], [1546499720.215, "0.9966388521931071"], [1546499734.215, "0.9932944265146069"], [1546499748.215, "0.9932711730632883"], [1546499762.215, "0.9932844606164385"], [1546499776.215, "0.9932944265146069"], [1546499790.215, "0.9932944265146069"], [1546499804.215, "0.9932977485251032"], [1546499818.215, "0.9966221865489449"], [1546499832.215, "0.9966321858685584"], [1546499846.215, "0.9966255196331881"], [1546499860.215, "0.9966355190196853"], [1546499874.215, "0.9966321858685584"], [1546499888.215, "0.9966788520132578"], [1546499902.215, "0.9966288527397262"], [1546499916.215, "0.9966255196331881"], [1546499930.215, "0.9966321858685584"], [1546499944.215, "0.9966355190196853"], [1546499958.215, "0.9966388521931071"], [1546499972.215, "0.9966321858685584"], [1546499986.215, "0.9966321858685584"], [1546500000.215, "0.9966388521931071"], [1546500014.215, "0.9966255196331881"], [1546500028.215, "0.9999732448630139"], 
[1546500042.215, "0.9999832778716072"], [1546500056.215, "0.9999832778716075"], [1546500070.215, "0.9999665563024648"], [1546500084.215, "0.9999732448630139"], [1546500098.215, "0.9966288527397262"], [1546500112.215, "0.9966221865489449"], [1546500126.215, "0.9966388521931071"], [1546500140.215, "0.9966321858685584"], [1546500154.215, "0.9966355190196854"], [1546500168.215, "0.9966221865489446"], [1546500182.215, "0.9966221865489449"], [1546500196.215, "0.996628852739726"], [1546500210.215, "0.9966388521931071"], [1546500224.215, "0.9966388521931071"], [1546500238.215, "0.9966255196331881"], [1546500252.215, "0.9966321858685584"], [1546500266.215, "0.9966288527397262"], [1546500280.215, "0.9966255196331881"], [1546500294.215, "0.996642185388824"], [1546500308.215, "0.9966388521931071"], [1546500322.215, "0.996628852739726"], [1546500336.215, "0.996628852739726"], [1546500350.215, "0.9966355190196853"], [1546500364.215, "0.9966288527397262"], [1546500378.215, "0.9966388521931071"], [1546500392.215, "0.9999732448630139"], [1546500406.215, "0.9999832778716075"], [1546500420.215, "0.9999765891768421"], [1546500434.215, "0.9999732448630139"], [1546500448.215, "0.99997993351304"], [1546500462.215, "0.9999765891768421"], [1546500476.215, "0.9966255196331881"], [1546500490.215, "0.9966188534869955"], [1546500504.215, "0.996642185388824"], [1546500518.215, "0.9966355190196853"], [1546500532.215, "0.996638852193107"], [1546500546.215, "0.9966355190196853"], [1546500560.215, "0.9966221865489449"], [1546500574.215, "0.9966321858685584"], [1546500588.215, "0.9966388521931071"], [1546500602.215, "0.9966221865489449"], [1546500616.215, "0.9966355190196853"], [1546500630.215, "0.9966388521931071"], [1546500644.215, "0.9966355190196853"], [1546500658.215, "0.9966355190196854"], [1546500672.215, "0.9966321858685584"], [1546500686.215, "0.9966355190196854"], [1546500700.215, "0.9966255196331881"], [1546500714.215, "0.9966255196331881"], [1546500728.215, "0.9966288527397262"], 
[1546500742.215, "0.9966321858685584"], [1546500756.215, "0.9966421853888243"], [1546500770.215, "0.9999765891768421"], [1546500784.215, "0.9999665563024648"], [1546500798.215, "0.9999665563024648"], [1546500812.215, "0.9999699005715545"], [1546500826.215, "0.9966288527397262"], [1546500840.215, "0.9966221865489449"], [1546500854.215, "0.9966321858685584"], [1546500868.215, "0.9966255196331881"], [1546500882.215, "0.9966188534869955"], [1546500896.215, "0.9966355190196854"], [1546500910.215, "0.9966288527397262"], [1546500924.215, "0.9966288527397262"], [1546500938.215, "0.9966355190196854"], [1546500952.215, "0.9966255196331881"], [1546500966.215, "0.9966188534869955"], [1546500980.215, "0.9966188534869955"], [1546500994.215, "0.9966221865489446"], [1546501008.215, "0.9966188534869955"], [1546501022.215, "0.9966255196331881"], [1546501036.215, "0.9966255196331881"], [1546501050.215, "0.9966121874299779"], [1546501064.215, "0.9966288527397262"], [1546501078.215, "0.9966188534869955"], [1546501092.215, "0.9966221865489449"], [1546501106.215, "0.9966255196331881"], [1546501120.215, "0.9999799335130399"], [1546501134.215, "0.9999765891768421"], [1546501148.215, "0.996628852739726"], [1546501162.215, "0.9966388521931071"], [1546501176.215, "0.9966355190196854"], [1546501190.215, "0.996628852739726"], [1546501204.215, "0.9966321858685584"], [1546501218.215, "0.9966321858685584"], [1546501232.215, "0.9966221865489449"], [1546501246.215, "0.9966321858685584"], [1546501260.215, "0.9966221865489449"], [1546501274.215, "0.9966321858685584"], [1546501288.215, "0.9966321858685584"], [1546501302.215, "0.9966388521931071"], [1546501316.215, "0.9966321858685584"], [1546501330.215, "0.9966355190196853"], [1546501344.215, "0.9966321858685584"], [1546501358.215, "0.9966355190196854"], [1546501372.215, "0.9966355190196854"], [1546501386.215, "0.9966355190196854"], [1546501400.215, "0.9966355190196853"], [1546501414.215, "0.9966321858685584"], [1546501428.215, "0.9966355190196853"], 
[1546501442.215, "0.9999732448630139"], [1546501456.215, "0.9999732448630139"], [1546501470.215, "0.9966255196331884"], [1546501484.215, "0.9966321858685584"], [1546501498.215, "0.9966321858685584"], [1546501512.215, "0.9966321858685584"], [1546501526.215, "0.9966321858685584"], [1546501540.215, "0.9966355190196853"], [1546501554.215, "0.9966355190196853"], [1546501568.215, "0.9966321858685584"], [1546501582.215, "0.9966321858685584"], [1546501596.215, "0.9966355190196854"], [1546501610.215, "0.9966355190196854"], [1546501624.215, "0.9966321858685584"], [1546501638.215, "0.9966355190196854"], [1546501652.215, "0.996642185388824"], [1546501666.215, "0.996628852739726"], [1546501680.215, "0.9966388521931071"], [1546501694.215, "0.9966455186068367"], [1546501708.215, "0.9966355190196854"], [1546501722.215, "0.9966288527397262"], [1546501736.215, "0.9966321858685584"], [1546501750.215, "0.9966321858685584"], [1546501764.215, "0.9966388521931071"], [1546501778.215, "0.9999799335130399"], [1546501792.215, "0.9999832778716072"], [1546501806.215, "0.9999799335130399"], [1546501820.215, "0.999989966655853"], [1546501834.215, "0.9999765891768421"], [1546501848.215, "0.9966388521931071"], [1546501862.215, "0.9966355190196854"], [1546501876.215, "0.9966355190196853"], [1546501890.215, "0.9966321858685584"], [1546501904.215, "0.9966388521931071"], [1546501918.215, "0.9966388521931071"], [1546501932.215, "0.9966288527397262"], [1546501946.215, "0.9966355190196854"], [1546501960.215, "0.996628852739726"], [1546501974.215, "0.996628852739726"], [1546501988.215, "0.9966355190196853"], [1546502002.215, "0.9966321858685584"], [1546502016.215, "0.9966255196331881"], [1546502030.215, "0.9966388521931071"], [1546502044.215, "0.9966321858685584"], [1546502058.215, "0.996628852739726"], [1546502072.215, "0.9966221865489449"], [1546502086.215, "0.9966388521931071"], [1546502100.215, "0.9966321858685584"], [1546502114.215, "0.9966321858685584"], [1546502128.215, "0.9966355190196854"], 
[1546502142.215, "0.9999832778716075"], [1546502156.215, "0.9999732448630135"], [1546502170.215, "0.9999765891768421"], [1546502184.215, "0.9999665563024648"], [1546502198.215, "0.9999699005715545"], [1546502212.215, "0.9966288527397262"], [1546502226.215, "0.9966188534869955"], [1546502240.215, "0.996628852739726"], [1546502254.215, "0.996628852739726"], [1546502268.215, "0.9966321858685584"], [1546502282.215, "0.9966255196331884"], [1546502296.215, "0.9966355190196853"], [1546502310.215, "0.9966188534869955"], [1546502324.215, "0.9966188534869957"], [1546502338.215, "0.9966188534869957"], [1546502352.215, "0.9966255196331881"], [1546502366.215, "0.996628852739726"], [1546502380.215, "0.9966255196331884"], [1546502394.215, "0.9966455186068367"], [1546502408.215, "0.996642185388824"], [1546502422.215, "0.9966288527397262"], [1546502436.215, "0.996628852739726"], [1546502450.215, "0.9966388521931071"], [1546502464.215, "0.9966455186068367"], [1546502478.215, "0.9966355190196854"], [1546502492.215, "0.9966388521931071"], [1546502506.215, "0.999989966655853"], [1546502520.215, "0.9999765891768421"], [1546502534.215, "0.9999832778716075"], [1546502548.215, "0.999989966655853"], [1546502562.215, "0.999989966655853"], [1546502576.215, "0.9999933110815313"], [1546502590.215, "0.999989966655853"], [1546502604.215, "0.9966321858685584"], [1546502618.215, "0.9966355190196853"], [1546502632.215, "0.996642185388824"], [1546502646.215, "0.9966221865489449"], [1546502660.215, "0.9966355190196853"], [1546502674.215, "0.9966488518471448"], [1546502688.215, "0.9966288527397262"], [1546502702.215, "0.9966455186068367"], [1546502716.215, "0.9966221865489449"], [1546502730.215, "0.9966355190196854"], [1546502744.215, "0.9966388521931071"], [1546502758.215, "0.9966221865489449"], [1546502772.215, "0.9966288527397262"], [1546502786.215, "0.9966355190196853"], [1546502800.215, "0.9966421853888243"], [1546502814.215, "0.9966355190196853"], [1546502828.215, "0.9966321858685584"], 
[1546502842.215, "0.9966188534869955"], [1546502856.215, "0.9966355190196854"], [1546502870.215, "0.9966221865489449"], [1546502884.215, "0.9966255196331884"] ] }] } } |
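To illustrate how a window of counter samples turns into a single rate value, here is a deliberately simplified sketch. The function name simple_rate is my own; the real rate() also extrapolates to the window boundaries, which this sketch does not attempt. The sample points are taken from the cc_total response shown earlier in the article, including the drop from 137 to 5 where the counter reset (presumably because the script was restarted).

```python
def simple_rate(samples):
    """Average per-second increase over a window of (timestamp, value) samples.

    Handles counter resets by treating any drop as a restart from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts at 0, so the whole new value is growth.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


window = [
    (1546499486.612, 123.0),
    (1546499500.612, 137.0),
    (1546499514.612, 5.0),   # counter reset
    (1546499528.612, 19.0),
]
print(simple_rate(window))
```

The result is one number per evaluation time, which is why the output of rate is an instant-vector that can be graphed like any other.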
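Another easy misconception is to assume that avg, sum and similar functions operate on a range-vector. That is completely wrong!
Aggregation operators such as avg/sum all operate on instant-vectors: they compute the sum or avg across a set of instant-vectors with different labels, not over a range-vector of a single metric. (Aggregating one series over time is the job of separate functions such as avg_over_time and sum_over_time.)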
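A sketch of what aggregating across series at one instant means. The data structure (a list of label-dict and value pairs) and the function names are my own choices for illustration; the values are the hh_bucket counts from the scrape output shown earlier, and the instance label comes from the API responses above.

```python
from collections import defaultdict

# One instant-vector: several series sampled at the same moment, told apart by labels.
instant_vector = [
    ({"le": "-5.0", "instance": "localhost:8000"}, 12.0),
    ({"le": "0.0", "instance": "localhost:8000"}, 83.0),
    ({"le": "5.0", "instance": "localhost:8000"}, 153.0),
    ({"le": "+Inf", "instance": "localhost:8000"}, 153.0),
]


def aggregate(vector, by, func):
    """Group series by the `by` labels and reduce each group's values with func."""
    groups = defaultdict(list)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


print(aggregate(instant_vector, by=["instance"], func=sum))
print(aggregate(instant_vector, by=["instance"], func=mean))
```

No time dimension is involved: all four inputs belong to the same instant, and the aggregation collapses the label dimension only.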
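Arithmetic between instant-vectors
Different instant-vectors can be combined with addition, subtraction, multiplication and division, for example: rate(cc_total[5m]) / rate(cc_total[6m]). The calculation itself is meaningless; it is only an example.
An operation between vectors happens in 2 phases: match first, then compute.
Matching means using the on/ignoring/group_left/group_right syntax so that elements of the left instant-vector pair up with elements of the right one that have the same labels and label values. For example, if the left vector has a label port=8080 while the right vector lacks that label or has port=9999, you can use ignoring(port) to drop port from the comparison so that the two sides match. (This is just how the syntax works; how you match should follow your actual need, not matching for its own sake.)
Matched vectors can then be combined arithmetically, much like a cross-table join in mysql. If one left vector can match multiple right vectors, use group_right; this is called one-to-many. If one right vector can match multiple left vectors, use group_left; this is called many-to-one. If each left vector matches exactly one right vector, no group modifier is needed and you can do the arithmetic directly; this is one-to-one.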
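The match-then-compute process for the one-to-one case can be sketched as follows. This is a toy model, not the server's implementation: the function name is my own, the series are hypothetical (the job label and its value are made up), and the real server also removes the metric name from the result.

```python
def match_one_to_one(left, right, ignoring=()):
    """Divide left by right for series whose labels agree once `ignoring` is dropped."""

    def signature(labels):
        return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

    right_by_sig = {}
    for labels, value in right:
        sig = signature(labels)
        if sig in right_by_sig:
            raise ValueError("several right-hand series share one signature")
        right_by_sig[sig] = value

    result = []
    seen = set()
    for labels, value in left:
        sig = signature(labels)
        if sig in seen:
            raise ValueError("several left-hand series share one signature")
        seen.add(sig)
        if sig in right_by_sig:
            # Phase 2: compute, keeping only the labels used for matching.
            result.append((dict(sig), value / right_by_sig[sig]))
    return result


# Hypothetical series: label names and values are made up for illustration.
left = [({"job": "api", "port": "8080"}, 10.0)]
right = [({"job": "api", "port": "9999"}, 4.0)]

print(match_one_to_one(left, right))                      # no pair: port differs
print(match_one_to_one(left, right, ignoring=("port",)))  # pairs up on job alone
```

Without ignoring, the differing port values keep the two series apart; with it, both reduce to the same signature and the division goes ahead.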
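prometheus does not support Cartesian-product computation; that is, many-to-many is not supported:
The left side, hh_bucket, has 4 instant-vectors, and the right side is the same hh_bucket with its 4 instant-vectors.
If they are matched on the instance label, the 4 vectors on the left and the 4 on the right form a many-to-many match, so the result cannot be computed.
But if I use avg to collapse the 4 vectors on the right into 1, and then match the 4 vectors on the left many-to-one against that 1 vector on the right, the error goes away:
Of course, the example above is meaningless; it is merely syntactically valid.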
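The queries in this example are not preserved in the text; my guess is that the working one was something like hh_bucket / on(instance) group_left avg(hh_bucket) by (instance), but that is an assumption. The sketch below models the same idea in Python with a function name of my own: four left-hand series divided by one averaged right-hand series, and a refusal when both sides have several series per instance.

```python
def divide_many_to_one(left, right, on):
    """Divide each left series by the single right series sharing its `on` labels."""

    def signature(labels):
        return tuple((name, labels.get(name, "")) for name in on)

    right_by_sig = {}
    for labels, value in right:
        sig = signature(labels)
        if sig in right_by_sig:
            # Several right-hand series per signature would make this many-to-many.
            raise ValueError("many-to-many matching not allowed")
        right_by_sig[sig] = value

    return [
        (labels, value / right_by_sig[signature(labels)])
        for labels, value in left
        if signature(labels) in right_by_sig
    ]


# Left side: the four hh_bucket series from the scrape output shown earlier.
left = [
    ({"le": "-5.0", "instance": "localhost:8000"}, 12.0),
    ({"le": "0.0", "instance": "localhost:8000"}, 83.0),
    ({"le": "5.0", "instance": "localhost:8000"}, 153.0),
    ({"le": "+Inf", "instance": "localhost:8000"}, 153.0),
]
# Right side: the same four series collapsed into one per instance by averaging.
average = sum(value for _, value in left) / len(left)
right = [({"instance": "localhost:8000"}, average)]

for labels, value in divide_many_to_one(left, right, on=["instance"]):
    print(labels["le"], value)
```

Calling divide_many_to_one(left, left, on=["instance"]) raises instead, because four series on each side share the same instance and no single partner can be picked.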
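Closing
That is my summary of prometheus for now. Later on, grafana calls the prometheus API to run promql and obtain data points, from which it can draw attractive charts directly. With a basic understanding of prometheus in place, that becomes much easier.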