Introduction to the Logging Operator

The Logging Operator defines and manages the log collection architecture through CRDs. With these custom resources, we can easily deploy the log collector, the log forwarder, and the related log routing rules in Kubernetes.

Custom Resources

Whenever you modify one of these custom resources, it takes a while before the change propagates to the affected components, so allow some time before checking the result.

Logging

A Logging resource defines the logging infrastructure of the cluster that collects and transports the log messages.

  • A Logging resource contains the configuration of the Fluent Bit log collector (deployed as a DaemonSet) and of the Fluentd or Syslog-ng log forwarder (deployed as a StatefulSet). In newer versions, the Fluent Bit configuration can be moved into a separate FluentbitAgent resource, isolating it from the log forwarder.
  • The log collector (Fluent Bit) is deployed as a DaemonSet on the nodes; it mainly collects the logs on each node and passes them to the log forwarder.
  • The log forwarder receives, filters, and transforms the incoming logs and transmits them to one or more outputs. The Logging Operator supports Fluentd and Syslog-ng as log forwarders: Syslog-ng supports multithreaded processing and offers higher performance, while Fluentd provides rich input and output sources plus a wide range of plugins, so you can choose whichever forwarder fits your needs.
  • Creating a Logging resource establishes the controlNamespace, the management namespace of the Logging Operator, where Fluentd (or Syslog-ng) and Fluent Bit are deployed. By default, global resources such as ClusterOutput and ClusterFlow are evaluated only in this namespace and are ignored in any other namespace, unless allowClusterResourcesFromAllNamespaces is set to true.

Flow

A flow routes the selected log messages to the specified outputs. Two variants exist: Flow and ClusterFlow.

  • Flow is a namespaced resource, so it only collects logs from its own namespace. match statements can be specified to select or exclude logs based on Kubernetes labels, container names, and host names (match statements are evaluated in the order they are defined, until the first matching select or exclude rule applies).
  • ClusterFlow defines a Flow without namespace restrictions; it is effective only in the controlNamespace. A ClusterFlow selects logs from all namespaces.
  • Flow and ClusterFlow are CRDs for the Fluentd forwarder. To use Syslog-ng as the forwarder, use the corresponding SyslogNGFlow and SyslogNGClusterFlow resources instead.

Output

An output is the destination the log forwarder sends log messages to, such as Elasticsearch, Loki, or Kafka.

  • Output is likewise a namespaced resource that defines the outputs a Flow can send log messages to, meaning only Flows in the same namespace can reference it. Secrets may be used in these definitions, but they must also live in the same namespace. The output is the final stage of log forwarding.
  • ClusterOutput defines an output without namespace restrictions.
  • As with flows, to use Syslog-ng, use the corresponding SyslogNGOutput and SyslogNGClusterOutput resources instead.

Official Documentation and Architecture Diagram

https://kube-logging.dev/docs/

Installing the Logging Operator

Helm needs to be installed beforehand.

helm upgrade --install --wait \
--create-namespace --namespace logging \
--set testReceiver.enabled=true \
logging-operator oci://ghcr.io/kube-logging/helm-charts/logging-operator

Besides installing the Logging Operator, the command above also deploys a test deployment, logging-operator-test-receiver. It listens on an HTTP port, receives JSON messages, and writes them to standard output (stdout). While configuring the log forwarder, we can point it at this service to check whether our log format is correct.

# Check that the services are running
kubectl get deploy -n logging
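
If a resource does not seem to take effect, the operator's own logs usually explain why (a minimal check; the deployment name matches the Helm release installed above):

# Inspect the operator logs
kubectl logs -n logging deploy/logging-operator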

Configuring the Log Collector and Log Forwarder

In this article, we use Fluentd as the log forwarder.

Fluentd (CRD: Logging)

Here we configure Fluentd with three replicas, set pod anti-affinity so the replicas are not scheduled onto the same node, and persist the buffer data with a PVC.

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: logging-collector
spec:
  controlNamespace: logging
  fluentd:
    scaling:
      replicas: 3
    bufferStorageVolume:
      pvc:
        spec:
          storageClassName: <your-storage-class>  # replace with your cluster's StorageClass
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 5Gi
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - fluentd
            topologyKey: kubernetes.io/hostname
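
Once the pods are up, a quick way to confirm that the anti-affinity rule spread the three replicas across different nodes is to list them with their node assignments (the label selector is the one used in the manifest above):

# Each fluentd pod should land on a different node
kubectl get pods -n logging -l app.kubernetes.io/name=fluentd -o wide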

Fluent Bit (CRD: FluentbitAgent)

The extra /data/docker/containers mount is configured here because the Docker data-root directory on our nodes was changed; if Docker keeps its data in the default /var/lib/docker, this mount is not needed.
The tolerations let the DaemonSet be scheduled onto the master nodes, so that we can later collect kube-system logs.

apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: logging-collector-agent
spec:
  extraVolumeMounts:
    - source: /data/docker/containers/
      destination: /data/docker/containers/
      readOnly: true
  tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"

When we apply the Logging and FluentbitAgent resources, a Fluentd StatefulSet named logging-collector-fluentd and a Fluent Bit DaemonSet named logging-collector-agent-fluentbit are started, and Fluent Bit begins shipping the logs collected on each node to Fluentd.
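
A quick check that both workloads were created:

# The StatefulSet and DaemonSet described above should appear here
kubectl get statefulset,daemonset -n logging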

Verification

Checking the Volume Mounts

Describe the DaemonSet to verify that the extra mount we configured has taken effect:

kubectl describe daemonsets.apps -n logging logging-collector-agent-fluentbit
Volumes:
 varlibcontainers:
  Type:          HostPath (bare host directory volume)
  Path:          /var/lib/docker/containers
  HostPathType:
 varlogs:
  Type:          HostPath (bare host directory volume)
  Path:          /var/log
  HostPathType:
 extravolumemount0:
  Type:          HostPath (bare host directory volume)
  Path:          /data/docker/containers/
  HostPathType:
 config:
  Type:        Secret (a volume populated by a Secret)
  SecretName:  logging-collector-agent-fluentbit
  Optional:    false
 positiondb:
  Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
  Medium:
  SizeLimit:  <unset>
 buffers:
  Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
  Medium:
  SizeLimit:  <unset>

Checking the Generated Configuration

kubectl get secret logging-collector-agent-fluentbit -n logging -o jsonpath='{.data.fluent-bit\.conf}'|base64 --decode
[SERVICE]
    Flush 1
    Grace 5
    Daemon Off
    Log_Level info
    Parsers_File /fluent-bit/etc/parsers.conf
    Coro_Stack_Size 24576
    storage.path /buffers
[INPUT]
    Name tail
    DB /tail-db/tail-containers-state.db
    DB.locking true
    Mem_Buf_Limit 5MB
    Parser docker
    Path /var/log/containers/*.log
    Refresh_Interval 5
    Skip_Long_Lines On
    Tag kubernetes.*
[FILTER]
    Name kubernetes
    Buffer_Size 0
    K8S-Logging.Exclude On
    Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Tag_Prefix kubernetes.var.log.containers
    Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Token_TTL 600
    Kube_URL https://kubernetes.default.svc:443
    Match kubernetes.*
    Merge_Log On
    Use_Kubelet Off
[OUTPUT]
    Name tcp
    Match *
    Host logging-collector-syslog-ng.logging.svc.cluster.local.
    Port 601
    Format json_lines
    json_date_key ts
    json_date_format iso8601

Log Level (Optional)

Raising the log level helps check whether collection has problems; the example below sets logLevel to trace.

apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: logging-collector-agent
spec:
  logLevel: trace
  ......

The same applies to Fluentd:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: logging-collector
spec:
  controlNamespace: logging
  fluentd:
    logLevel: trace
    ......
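
With trace enabled, the collector and forwarder logs can then be inspected directly (using the workload names created earlier):

kubectl logs -n logging daemonset/logging-collector-agent-fluentbit
kubectl logs -n logging statefulset/logging-collector-fluentd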

Flow and Output

Deploying a Container That Prints Test Logs

For testing, we write a Go program that keeps printing multi-line error logs and run it as a Deployment. The log format is as follows:

2024-03-20 06:31:44.934 ERROR log_test/main.go:19 main.main 日志测试 2024-03-20 06:31:44.934
main.main
        log_test/main.go:19
runtime.main
        runtime/proc.go:271

The code is as follows:

package main

import (
    "os"
    "time"

    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

func init() {
    InitZapLogger()
}

var Logger *zap.Logger

// InitZapLogger builds a console logger that writes to stdout and attaches
// a stack trace to every entry at error level or above.
func InitZapLogger() {
    Logger = zap.New(
        zapcore.NewTee(
            zapcore.NewCore(
                encoderConfig(),
                zapcore.AddSync(os.Stdout),
                zapcore.DebugLevel,
            ),
        ),
        zap.Development(),
        zap.AddCaller(),
        zap.AddStacktrace(zap.ErrorLevel),
    )
}

func encoderConfig() zapcore.Encoder {
    zapEncode := zapcore.EncoderConfig{
        MessageKey:          "Message",
        LevelKey:            "Level",
        TimeKey:             "Timestamp",
        NameKey:             "Name",
        CallerKey:           "Caller",
        FunctionKey:         "Function",
        StacktraceKey:       "Stacktrace",
        SkipLineEnding:      false,
        LineEnding:          zapcore.DefaultLineEnding,
        EncodeLevel:         zapcore.CapitalLevelEncoder,
        EncodeTime:          encodeTime,
        EncodeDuration:      zapcore.SecondsDurationEncoder,
        EncodeCaller:        zapcore.ShortCallerEncoder,
        EncodeName:          zapcore.FullNameEncoder,
        NewReflectedEncoder: nil,
        ConsoleSeparator:    " ",
    }
    return zapcore.NewConsoleEncoder(zapEncode)
}

func encodeTime(t time.Time, enc zapcore.PrimitiveArrayEncoder) {
    enc.AppendString(t.Format("2006-01-02 15:04:05.000"))
}

// main prints one multi-line error log every 5 seconds.
func main() {
    ticker := time.NewTicker(5 * time.Second)
    defer func() {
        ticker.Stop()
    }()
    for range ticker.C {
        Logger.Sugar().Errorf("日志测试 %s", time.Now().Format("2006-01-02 15:04:05.000"))
    }
}
The Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: print-logs
  labels:
    app: print-logs
    logging: golang
spec:
  selector:
    matchLabels:
      app: print-logs
  replicas: 1
  template:
    metadata:
      labels:
        app: print-logs
        logging: golang
    spec:
      containers:
        - name: print-logs
          image: print-test-log
      restartPolicy: Always
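
After building and pushing the image, apply the manifest and make sure the test logs are being printed (the manifest filename here is only an example):

kubectl apply -f print-logs.yaml
kubectl logs -f deploy/print-logs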

Configuring the Flow and Output (testing with logging-operator-test-receiver)

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: test-logging
spec:
  match:
    - select:
        labels:
          logging: golang
  localOutputRefs:
    - test-receiver
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: test-receiver
spec:
  http:
    endpoint: http://logging-operator-test-receiver:8080
    content_type: application/json
    buffer:
      type: memory
      tags: time
      timekey: 1s
      timekey_wait: 0s

After these are created, the Flow test-logging and the Output test-receiver exist in the default namespace, and logs carrying the Kubernetes label logging=golang are forwarded to logging-operator-test-receiver, which prints them.
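
To watch the messages as they arrive, tail the test receiver's stdout:

kubectl logs -n logging -f deploy/logging-operator-test-receiver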

Formatting the Logs

Initial Merging of Multi-line Logs

When Docker is used as the Kubernetes container runtime, every printed line of a log message becomes a separate record in the container log.
The actual log:

2024-03-20 06:31:44.934 ERROR log_test/main.go:19 main.main 日志测试 2024-03-20 06:31:44.934
main.main
        log_test/main.go:19
runtime.main
        runtime/proc.go:271

is split into:

{"log":"2024-03-20 06:31:44.934 ERROR log_test/main.go:19 main.main 日志测试 2024-03-20 06:31:44.934\n","stream":"stdout","time":"2024-03-20T06:29:59.935462228Z"}
{"log":"main.main\n","stream":"stdout","time":"2024-03-20T06:29:59.935581674Z"}
{"log":"\u0009log_test/main.go:19\n","stream":"stdout","time":"2024-03-20T06:29:59.935610097Z"}
{"log":"runtime.main\n","stream":"stdout","time":"2024-03-20T06:29:59.935633418Z"}
{"log":"\u0009runtime/proc.go:271\n","stream":"stdout","time":"2024-03-20T06:29:59.93565774Z"}

In the log forwarder, we first need to merge the lines that belong to the same log entry and only then format the merged record. We keep using logging-operator-test-receiver for testing.

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: test-logging
spec:
  match:
    - select:
        labels:
          logging: golang
  filters:
    - concat:
        key: log
        multiline_start_regexp: '/^\d{4}-\d{2}-\d{2}/'
        multiline_end_regexp: '/\Z/'
        separator: ''
        flush_interval: 5
  localOutputRefs:
    - test-receiver
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: test-receiver
spec:
  http:
    endpoint: http://logging-operator-test-receiver:8080
    content_type: application/json
    buffer:
      type: memory
      tags: time
      timekey: 1s
      timekey_wait: 0s

After applying this, the multi-line message has been merged into the log key:

[0] http.0: [[1710923000.932884258, {}], {"log"=>"2024-03-20 08:23:14.937 ERROR log_test/main.go:19 main.main 日志测试 2024-03-20 08:23:14.937
main.main
        D:/GolandProjects/log_test/main.go:19
runtime.main
        C:/Program Files/Go/src/runtime/proc.go:271
", "stream"=>"stdout", "time"=>"2024-03-20T08:23:14.937900447Z", "kubernetes"=>{"pod_name"=>"print-logs-64fb98db85-c4zdz", "namespace_name"=>"default", "pod_id"=>"d72c4042-cb45-4fe1-a684-eb4e56861049", "labels"=>{"app"=>"print-logs", "logging"=>"golang", "pod-template-hash"=>"64fb98db85"}, "annotations"=>{"cni.projectcalico.org/containerID"=>"24c125288decf36a19b5d333569258a45b2f41b2dfbbef407854145234ec7323", "cni.projectcalico.org/podIP"=>"10.244.166.182/32", "cni.projectcalico.org/podIPs"=>"10.244.166.182/32"}, "host"=>"node1", "container_name"=>"print-logs", "docker_id"=>"ffb6313adb1347f828348f128852ba110f608124f3a8aca68b400f238bfde71a","container_hash"=>"print-test-log@sha256:ad1a3e5bb60d81a6b13e8085c618244055c35807e7a05caabf50f77adc7a11e0", "container_image"=>"print-test-log"}}]

If containerd is the container runtime, the corresponding raw log lines look like this instead:

2024-04-10T02:15:17.527436711Z stdout F 2024-04-10 02:15:17.527 ERROR log_test/main.go:19 main.main 日志测试 2024-04-10 02:15:17.527
2024-04-10T02:15:17.527496993Z stdout F main.main
2024-04-10T02:15:17.527501224Z stdout F         D:/GolandProjects/log_test/main.go:19
2024-04-10T02:15:17.527503316Z stdout F runtime.main
2024-04-10T02:15:17.527505153Z stdout F         C:/Program Files/Go/src/runtime/proc.go:271

In this case, Fluent Bit stores the log content under the message key, so the concat filter in the Flow must be changed to merge on message:

filters:
  - concat:
      key: message

Formatting the Log Fields

2024-03-20 06:31:44.934 ERROR log_test/main.go:19 main.main 日志测试 2024-03-20 06:31:44.934
main.main
        log_test/main.go:19
runtime.main
        runtime/proc.go:271

Using the log format above as a reference, we write a regular expression that splits each record into the following parts (the full Flow configuration follows below):

  • time: 2024-03-20 06:31:44.934
  • loglevel: ERROR
  • line: log_test/main.go:19
  • func: main.main
  • the log field itself stays unchanged

The regular expression is as follows (a handy testing site: https://regex101.com/):

/^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(\s+)(?<loglevel>\w+)(\s+)(?<line>[\w\.\:\/\-]+)(\s+)(?<func>[\w\.\:\/\-]+)/

Configuring the Field Formatting

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: test-logging
spec:
  match:
    - select:
        labels:
          logging: golang
  filters:
    - concat:
        key: log
        multiline_start_regexp: '/^\d{4}-\d{2}-\d{2}/'
        multiline_end_regexp: '/\Z/'
        separator: ''
        flush_interval: 5
    - parser:
        remove_key_name_field: false
        reserve_data: true
        parse:
          type: multi_format
          patterns:
            - format: regexp
              expression: /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(\s+)(?<loglevel>\w+)(\s+)(?<line>[\w\.\:\/\-]+)(\s+)(?<func>[\w\.\:\/\-]+)/
  localOutputRefs:
    - test-receiver
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: test-receiver
spec:
  http:
    endpoint: http://logging-operator-test-receiver:8080
    content_type: application/json
    buffer:
      type: memory
      tags: time
      timekey: 1s
      timekey_wait: 0s

The result:

    [0] http.0: [[1710928830.558128299, {}], {"log"=>"2024-03-20 10:00:24.938 ERROR log_test/main.go:19 main.main 日志测试 2024-03-20 10:00:24.938
    main.main
            D:/GolandProjects/log_test/main.go:19
    runtime.main
            C:/Program Files/Go/src/runtime/proc.go:271
    ", "stream"=>"stdout", "time"=>"2024-03-20T10:00:24.939195916Z", "kubernetes"=>{"pod_name"=>"print-logs-64fb98db85-c4zdz", "namespace_name"=>"default", "pod_id"=>"d72c4042-cb45-4fe1-a684-eb4e56861049", "labels"=>{"app"=>"print-logs", "logging"=>"golang", "pod-template-hash"=>"64fb98db85"}, "annotations"=>{"cni.projectcalico.org/containerID"=>"24c125288decf36a19b5d333569258a45b2f41b2dfbbef407854145234ec7323", "cni.projectcalico.org/podIP"=>"10.244.166.182/32", "cni.projectcalico.org/podIPs"=>"10.244.166.182/32"}, "host"=>"node1", "container_name"=>"print-logs", "docker_id"=>"ffb6313adb1347f828348f128852ba110f608124f3a8aca68b400f238bfde71a", "container_hash"=>"print-test-log@sha256:ad1a3e5bb60d81a6b13e8085c618244055c35807e7a05caabf50f77adc7a11e0", "container_image"=>"print-test-log"}, "loglevel"=>"ERROR", "line"=>"log_test/main.go:19", "func"=>"main.main"}]

Removing Unneeded Fields

Looking at the output above, there is a lot of data we do not need; the record_transformer filter can delete the unwanted fields.

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: test-logging
spec:
  match:
    - select:
        labels:
          logging: golang
  filters:
    - concat:
        key: log
        multiline_start_regexp: '/^\d{4}-\d{2}-\d{2}/'
        multiline_end_regexp: '/\Z/'
        separator: ''
        flush_interval: 5
    - parser:
        remove_key_name_field: false
        reserve_data: true
        parse:
          type: multi_format
          patterns:
            - format: regexp
              expression: /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(\s+)(?<loglevel>\w+)(\s+)(?<line>[\w\.\:\/\-]+)(\s+)(?<func>[\w\.\:\/\-]+)/
    - record_transformer:
        remove_keys: '$.kubernetes.pod_id,$.kubernetes.annotations,$.kubernetes.labels,$.kubernetes.docker_id,$.kubernetes.container_hash,$.kubernetes.container_image'
  localOutputRefs:
    - test-receiver
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: test-receiver
spec:
  http:
    endpoint: http://logging-operator-test-receiver:8080
    content_type: application/json
    buffer:
      type: memory
      tags: time
      timekey: 1s
      timekey_wait: 0s

Sending Logs to Elasticsearch

Elasticsearch Output Configuration

Create a secret containing the elastic password in the default namespace:

kubectl create secret generic elastic-password --from-literal=password='<password>'
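
The stored value can be verified the same way we decoded the Fluent Bit configuration earlier:

kubectl get secret elastic-password -o jsonpath='{.data.password}' | base64 --decode

The Flow and the Output are then configured as follows:
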
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: test-logging
spec:
  match:
    - select:
        labels:
          logging: golang
  filters:
    - concat:
        key: log
        multiline_start_regexp: '/^\d{4}-\d{2}-\d{2}/'
        multiline_end_regexp: '/\Z/'
        separator: ''
        flush_interval: 5
    - parser:
        remove_key_name_field: false
        reserve_data: true
        parse:
          type: multi_format
          patterns:
            - format: regexp
              expression: /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(\s+)(?<loglevel>\w+)(\s+)(?<line>[\w\.\:\/\-]+)(\s+)(?<func>[\w\.\:\/\-]+)/
    - record_transformer:
        remove_keys: '$.kubernetes.pod_id,$.kubernetes.annotations,$.kubernetes.labels,$.kubernetes.docker_id,$.kubernetes.container_hash,$.kubernetes.container_image'
  localOutputRefs:
    - elastic-receiver
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: elastic-receiver
spec:
  elasticsearch:
    host: 10.0.16.2
    port: 9200
    logstash_format: true
    logstash_prefix: my-test
    scheme: http
    user: elastic
    password:
      valueFrom:
        secretKeyRef:
          name: elastic-password
          key: password
    buffer:
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true
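
Once logs start flowing, you can check on the Elasticsearch side that the Logstash-style index was created (a minimal check, assuming the host, port, and credentials from the Output above):

curl -u elastic 'http://10.0.16.2:9200/_cat/indices/my-test-*?v'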

Using Dynamic Index Names

To use dynamic index names, the referenced keys must be added to the buffer section's tags. For example, to include the namespace and the container name in the index name:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: test-logging
spec:
  match:
    - select:
        labels:
          logging: golang
  filters:
    - concat:
        key: log
        multiline_start_regexp: '/^\d{4}-\d{2}-\d{2}/'
        multiline_end_regexp: '/\Z/'
        separator: ''
        flush_interval: 5
    - parser:
        remove_key_name_field: false
        reserve_data: true
        parse:
          type: multi_format
          patterns:
            - format: regexp
              expression: /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(\s+)(?<loglevel>\w+)(\s+)(?<line>[\w\.\:\/\-]+)(\s+)(?<func>[\w\.\:\/\-]+)/
    - record_transformer:
        remove_keys: '$.kubernetes.pod_id,$.kubernetes.annotations,$.kubernetes.labels,$.kubernetes.docker_id,$.kubernetes.container_hash,$.kubernetes.container_image'
  localOutputRefs:
    - elastic-receiver
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: elastic-receiver
spec:
  elasticsearch:
    host: 10.0.16.2
    port: 9200
    logstash_format: true
    logstash_prefix: my-test-${$.kubernetes.namespace_name}-${$.kubernetes.container_name}
    scheme: http
    user: elastic
    password:
      valueFrom:
        secretKeyRef:
          name: elastic-password
          key: password
    buffer:
      tags: tag,time,$.kubernetes.namespace_name,$.kubernetes.container_name
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true
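
With this configuration, the prefix is expanded per namespace and container, so for our test pod the daily index should look something like my-test-default-print-logs-2024.03.20 (an assumed example based on the logstash_prefix above). The same curl check as before can list them:

curl -u elastic 'http://10.0.16.2:9200/_cat/indices/my-test-default-*?v'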

Using Elasticsearch Data Streams

apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: elastic-receiver
spec:
  elasticsearch:
    host: 10.0.16.2
    port: 9200
    logstash_format: false
    index_name: my-test-${$.kubernetes.namespace_name}-${$.kubernetes.container_name}
    include_timestamp: true
    data_stream_enable: true
    data_stream_name: my-test-${$.kubernetes.namespace_name}-${$.kubernetes.container_name}
    data_stream_ilm_name: my-test
    data_stream_template_name: my-test
    scheme: https
    ssl_verify: false
    ssl_version: TLSv1_2
    user: elastic
    log_es_400_reason: true
    default_elasticsearch_version: "8.10.4"
    password:
      valueFrom:
        secretKeyRef:
          name: elastic-password
          key: password
    buffer:
      tags: tag,time,$.kubernetes.namespace_name,$.kubernetes.container_name
      timekey: 1m
      timekey_wait: 1m
      timekey_use_utc: true
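
To verify that the data streams were created, query the Elasticsearch data stream API (a sketch using the host from the Output above; -k mirrors ssl_verify: false):

curl -k -u elastic 'https://10.0.16.2:9200/_data_stream/my-test-*'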