First Steps with Telegraf + InfluxDB
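Telegraf has a large number of plugins and a very flexible configuration; consult the documentation for the plugins you need.
Before writing to InfluxDB, the database's retention policy (RP) has to be configured manually: https://docs.influxdata.com/influxdb/v1.8/query_language/manage-database/#retention-policy-management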
CREATE RETENTION POLICY "7d" on "telegraf" DURATION 168h REPLICATION 1 DEFAULT
Telegraf config
[agent]
flush_interval = "3s"
precision = "1ms" # timestamp precision
[[inputs.kafka_consumer]]
brokers = [ "kafka0:9092", "kafka1:9092", "kafka2:9092" ]
topics = ["uplog_log_hub"]
consumer_group = "telegraf_log_hub_20220511"
offset = "newest"
name_override = "log-hub" # override the measurement name
data_format = "json"
json_time_key = "@timestamp"
json_time_format = "unix_ms"
tag_keys = ["log_time", "ip"]
json_string_fields = ["error", "date", "h", "m", "path"]
[[outputs.influxdb]]
urls = ["http://influxdb.prometheus.svc.cluster.fud3:8086"]
retention_policy = "7d" # name of the retention policy to write to
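To make the parser options concrete, here is a rough Python model of how one Kafka message might be split into tags, fields, and a timestamp under the settings above. It is a simplified sketch, not Telegraf's code: it handles flat JSON objects only, `to_point` is a hypothetical helper, and the sample message (including its `status` and `ua` keys) is invented for illustration.

```python
import json
from datetime import datetime, timezone

# Values copied from the [[inputs.kafka_consumer]] section above.
MEASUREMENT = "log-hub"
TIME_KEY = "@timestamp"
TAG_KEYS = ["log_time", "ip"]
STRING_FIELDS = ["error", "date", "h", "m", "path"]


def to_point(raw: str) -> dict:
    """Simplified model of the JSON parser settings (flat objects only)."""
    doc = json.loads(raw)
    # json_time_format = "unix_ms": the time key holds milliseconds since the epoch.
    ts = datetime.fromtimestamp(doc.pop(TIME_KEY) / 1000, tz=timezone.utc)
    tags, fields = {}, {}
    for key, value in doc.items():
        if key in TAG_KEYS:
            tags[key] = str(value)          # tag values are always strings
        elif isinstance(value, bool):
            continue                        # booleans are left out of this sketch
        elif isinstance(value, (int, float)):
            fields[key] = float(value)      # JSON numbers become float fields
        elif isinstance(value, str) and key in STRING_FIELDS:
            fields[key] = value             # strings kept only if listed
        # any other string key is dropped
    return {"measurement": MEASUREMENT, "tags": tags, "fields": fields, "time": ts}


# Hypothetical message; the real payload layout is not shown in this post.
sample = json.dumps({
    "@timestamp": 1652581293000,
    "log_time": "2022-05-15 10:21:33",
    "ip": "10.0.0.1",
    "path": "/upload",
    "status": 200,
    "ua": "curl",
})
point = to_point(sample)
```

In this model `log_time` and `ip` end up as tags, `path` and `status` as fields, and `ua` is discarded because it is a string that is not listed in `json_string_fields`.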
Continuous Query
To make alerting in Grafana easier, a continuous query aggregates the data into 5-minute windows and writes the result to a new measurement.
CREATE CONTINUOUS QUERY "query5m" ON "telegraf"
BEGIN
SELECT count(date)
INTO telegraf."7d"."log-hub-count-5m"
FROM telegraf."7d"."log-hub"
GROUP BY time(5m) fill(0)
END
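What `GROUP BY time(5m) fill(0)` computes can be sketched in plain Python: timestamps are bucketed into 5-minute windows aligned to the epoch, and windows without data report 0. The function name and the sample timestamps are made up for illustration, and the sketch says nothing about how or when InfluxDB schedules the continuous query itself.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # GROUP BY time(5m), in milliseconds


def count_per_window(timestamps_ms, start_ms, end_ms, window_ms=WINDOW_MS):
    """Count points per window over [start_ms, end_ms); empty windows get 0."""
    counts = Counter(
        (ts // window_ms) * window_ms       # start of the window containing ts
        for ts in timestamps_ms
        if start_ms <= ts < end_ms
    )
    first = (start_ms // window_ms) * window_ms
    # Walk every window in the range so that gaps are filled with 0.
    return {w: counts.get(w, 0) for w in range(first, end_ms, window_ms)}


# Hypothetical timestamps (ms): three in the first window, none in the third.
result = count_per_window([0, 1000, 299_999, 300_000, 900_001], 0, 1_200_000)
```

Each key of `result` is a window start in milliseconds, which corresponds to the `time` column of the rows written to "log-hub-count-5m".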
Errors
After running for three days, Telegraf started reporting the following errors:
2022-05-15T02:21:33Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request): partial write: max-series-per-database limit exceeded: (1000000) dropped=11
2022-05-15T02:21:36Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request): partial write: max-series-per-database limit exceeded: (1000000) dropped=13
2022-05-15T02:21:39Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request): partial write: max-series-per-database limit exceeded: (1000000) dropped=15
2022-05-15T02:21:42Z E! [outputs.influxdb] E! [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request): partial write: max-series-per-database limit exceeded: (1000000) dropped=13
Temporary workaround:
Remove InfluxDB's max-series-per-database limit by setting the environment variable INFLUXDB_DATA_MAX_SERIES_PER_DATABASE=0 at startup.
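Lifting the limit does not reduce the number of series. In InfluxDB a series is one distinct combination of measurement and tag set, so a tag with many distinct values multiplies the series count. The sketch below illustrates this under an assumption the post does not confirm: that `log_time` holds a per-event time string. The traffic pattern and the `series_count` helper are hypothetical.

```python
def series_count(points, tag_keys):
    """Number of distinct (measurement, tag set) combinations, i.e. series."""
    return len({
        (p["measurement"], tuple((k, p.get(k)) for k in tag_keys))
        for p in points
    })


# Hypothetical traffic: 3 client IPs, one log line per second for an hour,
# each with its own log_time value (assumed, not taken from real data).
points = [
    {"measurement": "log-hub", "ip": f"10.0.0.{i % 3}", "log_time": f"t{i}"}
    for i in range(3600)
]

with_log_time = series_count(points, ["log_time", "ip"])  # tag_keys as configured
ip_only = series_count(points, ["ip"])                    # log_time not a tag
```

Under that assumption every new `log_time` value creates a new series, while tagging by `ip` alone keeps the count at the number of distinct IPs. On a live instance, the InfluxQL statement SHOW SERIES CARDINALITY ON "telegraf" should report the real figure.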