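0. Links to the articles in this series
SLS Machine Learning Introduction (01): Time-Series Statistical Modeling
SLS Machine Learning Introduction (02): Time-Series Clustering Modeling
SLS Machine Learning Introduction (03): Time-Series Anomaly Detection Modeling
SLS Machine Learning Introduction (04): Rule Pattern Mining
SLS Machine Learning Introduction (05): Time-Series Forecasting
Seeing Hundreds of Millions of Logs at a Glance: SLS Intelligent Clustering (LogReduce) Released
SLS Machine Learning Best Practices: Time-Series Anomaly Detection and Alerting
SLS Machine Learning Best Practices: Time-Series Forecasting
1. What tools do we have at hand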
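Our team has always focused on mining more value out of logs. On top of the existing real-time log query, this year SLS rounded out the following capabilities for DevOps:
Contextual query, real-time Tail, and intelligent clustering, to make troubleshooting more efficient
A range of anomaly detection and forecasting functions for time-series data, for smarter checks and predictions
Visualization of data analysis results
Powerful alert configuration and notifications, with follow-up actions triggered by calling a webhook
Today we focus on how intelligent log clustering and anomaly alerting work together to detect and report anomalies more effectively.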
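2. Platform experiment
2.1 Experiment data
The data is a set of raw Sys Log entries with the log clustering service enabled; the screenshot below shows its state.
Adjusting the value in red box 1 in the screenshot below changes the result shown in red box 2, but the finest-grained patterns are not affected. In other words, the sub-pattern results are stable and unique, and the Signature of a sub-pattern can be used to find the corresponding raw log entries.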
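2.2 Generating time-series data for a sub-pattern
Suppose we want to monitor this sub-pattern:
msg:vm-111932.tc su: pam_unix(*:session): session closed for user root
The corresponding signature_id is:
__log_signature__: 1814836459146662485
With this we get the raw logs that match the pattern, and can look at the histogram of their count along the time axis.
The figure above shows that logs for this pattern are unevenly distributed, and some intervals have none at all. Counting directly per time window gives the following time series: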
__log_signature__: 1814836459146662485 |
select date_trunc('minute', __time__) as time, COUNT(*) as num
from log GROUP BY time order by time ASC limit 10000
The figure above shows that the series is not continuous in time, so we need to fill in the missing points:
__log_signature__: 1814836459146662485 |
select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc )
GROUP by time order by time ASC limit 10000
2.3 Anomaly detection on the time series
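The series fed to detection is the gap-filled one from 2.2. As a rough illustration outside SLS, the Python sketch below shows the idea of that step: bucket raw timestamps into one-minute windows, then emit a zero for every window with no logs. The function names `bucket` and `fill_missing_minutes` and the sample timestamps are hypothetical, and the sketch only fills between the first and last observed bucket; it is not a description of how `time_series` is implemented.

```python
from typing import Dict, Iterable, List, Tuple


def bucket(timestamps: Iterable[int], step: int = 60) -> Dict[int, int]:
    """Count events per `step`-second window (same idea as __time__ - __time__ % 60)."""
    counts: Dict[int, int] = {}
    for t in timestamps:
        start = t - t % step  # align to the start of the window
        counts[start] = counts.get(start, 0) + 1
    return counts


def fill_missing_minutes(
    counts: Dict[int, int], step: int = 60, fill: int = 0
) -> List[Tuple[int, int]]:
    """Return (window_start, count) for every window between the first and
    last observed one, using `fill` where no events were seen.

    Assumes the keys of `counts` are already aligned to `step`,
    as produced by `bucket` above.
    """
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(ts, counts.get(ts, fill)) for ts in range(first, last + step, step)]


# Hypothetical unix timestamps: nothing is logged in the window starting at 120.
raw = [0, 10, 59, 60, 185, 190]
series = fill_missing_minutes(bucket(raw))
print(series)  # [(0, 3), (60, 1), (120, 0), (180, 2)]
```

Filling with zero matches the choice made in the query in 2.2; a different fill value would change what the detector sees as normal.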
Use the time-series anomaly detection function ts_predicate_arma:
__log_signature__: 1814836459146662485 |
select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg')
from (
  select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
  from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc )
  GROUP by time order by time ASC
) limit 10000
2.4 How to configure the alert
Unpack the result of the machine learning function:
__log_signature__: 1814836459146662485 |
select t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
from (
  select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
  from (
    select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
    from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc )
    GROUP by time order by time ASC
  )
), unnest(res) as t(t1)
Alert on the results of the last two minutes:
__log_signature__: 1814836459146662485 |
select unixtime, src, pred, up, lower, prob
from (
  select t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
  from (
    select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
    from (
      select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
      from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc )
      GROUP by time order by time ASC
    )
  ), unnest(res) as t(t1)
) where is_nan(src) = false order by unixtime desc limit 2
Alert on rising points and set a fallback policy:
__log_signature__: 1814836459146662485 |
select sum(prob) as sumProb, max(src) as srcMax, max(up) as upMax
from (
  select unixtime, src, pred, up, lower, prob
  from (
    select t1[1] as unixtime, t1[2] as src, t1[3] as pred, t1[4] as up, t1[5] as lower, t1[6] as prob
    from (
      select ts_predicate_arma(to_unixtime(time), num, 5, 1, 1, 1, 'avg') as res
      from (
        select time_series(time, '1m', '%Y-%m-%d %H:%i:%s', '0') as time, avg(num) as num
        from ( select __time__ - __time__ % 60 as time, COUNT(*) as num from log GROUP BY time order by time desc )
        GROUP by time order by time ASC
      )
    ), unnest(res) as t(t1)
  ) where is_nan(src) = false order by unixtime desc limit 2
)
The alert itself is configured as follows:
3. Shameless plug
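3.1 Going further with logs
Demos of the various Log Service features are available here: Log Service overview and demos.
Original article link. This article is original content from the Yunqi Community and may not be reproduced without permission.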