环境准备
已经部署好的Kubernetes环境(这里不在阐述部署过程)
主机环境:
192.168.1.20 kube-master
192.168.1.21 node1
192.168.1.22 node2
部署ingress
1、ingress必须避暑,否则部署Prometheus、Grafana、node_exporter无法访问,如果贝蒂有Harbor私有仓库,可将此镜像上转到私有仓库,这里未部署所以各个节点需要下载此镜像
docker pull k8s.gcr.io/ingress-nginx/controller:v0.41.2
2、准备ingress的yaml文件,如下
cat /root/nginx-ingress-controller.yaml
apiVersion: v1
kind: Namespace
metadata:
name: ingress-nginx
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
---
# Source: ingress-nginx/templates/controller-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
data:
---
# Source: ingress-nginx/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
name: ingress-nginx
rules:
- apiGroups:
- ''
resources:
- configmaps
- endpoints
- nodes
- pods
- secrets
verbs:
- list
- watch
- apiGroups:
- ''
resources:
- nodes
verbs:
- get
- apiGroups:
- ''
resources:
- services
verbs:
- get
- list
- update
- watch
- apiGroups:
- extensions
- networking.k8s.io # k8s 1.14+
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- events
verbs:
- create
- patch
- apiGroups:
- extensions
- networking.k8s.io # k8s 1.14+
resources:
- ingresses/status
verbs:
- update
- apiGroups:
- networking.k8s.io # k8s 1.14+
resources:
- ingressclasses
verbs:
- get
- list
- watch
---
# Source: ingress-nginx/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
name: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress-nginx
subjects:
- kind: ServiceAccount
name: ingress-nginx
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx
namespace: ingress-nginx
rules:
- apiGroups:
- ''
resources:
- namespaces
verbs:
- get
- apiGroups:
- ''
resources:
- configmaps
- pods
- secrets
- endpoints
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- services
verbs:
- get
- list
- update
- watch
- apiGroups:
- extensions
- networking.k8s.io # k8s 1.14+
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- extensions
- networking.k8s.io # k8s 1.14+
resources:
- ingresses/status
verbs:
- update
- apiGroups:
- networking.k8s.io # k8s 1.14+
resources:
- ingressclasses
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- configmaps
resourceNames:
- ingress-controller-leader-nginx
verbs:
- get
- update
- apiGroups:
- ''
resources:
- configmaps
verbs:
- create
- apiGroups:
- ''
resources:
- endpoints
verbs:
- create
- get
- update
- apiGroups:
- ''
resources:
- events
verbs:
- create
- patch
---
# Source: ingress-nginx/templates/controller-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx
namespace: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress-nginx
subjects:
- kind: ServiceAccount
name: ingress-nginx
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/controller-service-webhook.yaml
apiVersion: v1
kind: Service
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller-admission
namespace: ingress-nginx
spec:
type: ClusterIP
ports:
- name: https-webhook
port: 443
targetPort: webhook
selector:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
annotations:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
type: NodePort
externalTrafficPolicy: Local
ports:
- name: http
port: 80
protocol: TCP
targetPort: http
- name: https
port: 443
protocol: TCP
targetPort: https
selector:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
selector:
matchLabels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
revisionHistoryLimit: 10
minReadySeconds: 0
template:
metadata:
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
spec:
hostNetwork: true
dnsPolicy: ClusterFirst
containers:
- name: controller
image: k8s.gcr.io/ingress-nginx/controller:v0.41.2
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
runAsUser: 101
allowPrivilegeEscalation: true
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: LD_PRELOAD
value: /usr/local/lib/libmimalloc.so
livenessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 3
ports:
- name: http
containerPort: 80
protocol: TCP
- name: https
containerPort: 443
protocol: TCP
- name: webhook
containerPort: 8443
protocol: TCP
volumeMounts:
- name: webhook-cert
mountPath: /usr/local/certificates/
readOnly: true
resources:
requests:
cpu: 100m
memory: 90Mi
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: ingress-nginx
terminationGracePeriodSeconds: 300
volumes:
- name: webhook-cert
secret:
secretName: ingress-nginx-admission
---
# Source: ingress-nginx/templates/admission-webhooks/validating-webhook.yaml
# before changing this value, check the required kubernetes version
# https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#prerequisites
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
name: ingress-nginx-admission
webhooks:
- name: validate.nginx.ingress.kubernetes.io
matchPolicy: Equivalent
rules:
- apiGroups:
- networking.k8s.io
apiVersions:
- v1beta1
operations:
- CREATE
- UPDATE
resources:
- ingresses
failurePolicy: Fail
sideEffects: None
admissionReviewVersions:
- v1
- v1beta1
clientConfig:
service:
namespace: ingress-nginx
name: ingress-nginx-controller-admission
path: /networking/v1beta1/ingresses
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
rules:
- apiGroups:
- admissionregistration.k8s.io
resources:
- validatingwebhookconfigurations
verbs:
- get
- update
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: ingress-nginx-admission
subjects:
- kind: ServiceAccount
name: ingress-nginx-admission
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
namespace: ingress-nginx
rules:
- apiGroups:
- ''
resources:
- secrets
verbs:
- get
- create
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ingress-nginx-admission
annotations:
helm.sh/hook: pre-install,pre-upgrade,post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
namespace: ingress-nginx
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ingress-nginx-admission
subjects:
- kind: ServiceAccount
name: ingress-nginx-admission
namespace: ingress-nginx
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-createSecret.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: ingress-nginx-admission-create
annotations:
helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
namespace: ingress-nginx
spec:
template:
metadata:
name: ingress-nginx-admission-create
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
spec:
containers:
- name: create
image: docker.io/jettech/kube-webhook-certgen:v1.5.0
imagePullPolicy: IfNotPresent
args:
- create
- --host=ingress-nginx-controller-admission,ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
- --namespace=$(POD_NAMESPACE)
- --secret-name=ingress-nginx-admission
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
restartPolicy: OnFailure
serviceAccountName: ingress-nginx-admission
securityContext:
runAsNonRoot: true
runAsUser: 2000
---
# Source: ingress-nginx/templates/admission-webhooks/job-patch/job-patchWebhook.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: ingress-nginx-admission-patch
annotations:
helm.sh/hook: post-install,post-upgrade
helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
namespace: ingress-nginx
spec:
template:
metadata:
name: ingress-nginx-admission-patch
labels:
helm.sh/chart: ingress-nginx-3.10.1
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.41.2
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: admission-webhook
spec:
containers:
- name: patch
image: docker.io/jettech/kube-webhook-certgen:v1.5.0
imagePullPolicy: IfNotPresent
args:
- patch
- --webhook-name=ingress-nginx-admission
- --namespace=$(POD_NAMESPACE)
- --patch-mutating=false
- --secret-name=ingress-nginx-admission
- --patch-failure-policy=Fail
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
restartPolicy: OnFailure
serviceAccountName: ingress-nginx-admission
securityContext:
runAsNonRoot: true
runAsUser: 2000
3、启动ingress
kubectl apply -f /root/nginx-ingress-controller.yaml
4、查看ingress启动pod(注意如果出现如下ingress的ingress-nginx-admission-patch的pod运行一会儿自己删除了,自己不必在乎这个原因)
kubectl get pod -n ingress-nginx
准备Prometheus、node_exporter、Grafana镜像
这里各个节点都需要进行下载如下镜像(也可以一台主机下载后,docker save -o镜像打包,然后拷贝到别的节点主机执行docker load -i 导入镜像)
docker pull prom/prometheus:v2.26.0
docker pull prom/node-exporter:latest
docker pull grafana/grafana:9.3.2
准备需要的yaml文件
创建prometheus目录并进入
mkdir /var/lib/grafana
chmod 777 /var/lib/grafana
mkdir /prometheus
cd /prometheus
1、alertmanager-configmap.yaml
cat > alertmanager-configmap.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
# 配置文件名称
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
data:
alertmanager.yml: |
global:
resolve_timeout: 5m
# 告警自定义邮件
smtp_smarthost: 'smtp.163.com:25'
smtp_from: 'wgkgood@163.com'
smtp_auth_username: 'wgkgood@163.com'
smtp_auth_password: 'FNGDCYDBFSXRLUUC'
receivers:
- name: default-receiver
email_configs:
- to: "wgkgood@163.com"
route:
group_interval: 1m
group_wait: 10s
receiver: default-receiver
repeat_interval: 1m
EOF
2、alertmanager-pvc.yaml
cat > alertmanager-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanager
namespace: kube-system
labels:
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: EnsureExists
spec:
# 使用自己的动态PV
storageClassName: managed-nfs-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: "2Gi"
EOF
3、alertmanager-deployment.yaml
cat > alertmanager-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
name: alertmanager
name: alertmanager
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: alertmanager
template:
metadata:
labels:
app: alertmanager
spec:
containers:
- image: prom/alertmanager:v0.16.1
name: alertmanager
ports:
- containerPort: 9093
protocol: TCP
volumeMounts:
- mountPath: "/alertmanager"
name: data
- mountPath: "/etc/alertmanager"
name: config-volume
resources:
requests:
cpu: 50m
memory: 50Mi
limits:
cpu: 200m
memory: 200Mi
volumes:
- name: data
emptyDir: {}
- name: config-volume
configMap:
name: alertmanager
---
apiVersion: v1
kind: Service
metadata:
labels:
app: alertmanager
annotations:
prometheus.io/scrape: 'true'
name: alertmanager
namespace: kube-system
spec:
type: NodePort
ports:
- port: 9093
targetPort: 9093
nodePort: 31113
selector:
app: alertmanager
EOF
4、grafana-deploy.yaml
cat > grafana-deploy.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana-core
namespace: kube-system
labels:
app: grafana
component: core
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
component: core
spec:
containers:
- image: grafana/grafana:9.3.2
name: grafana-core
imagePullPolicy: IfNotPresent
# env:
resources:
# keep request = limit to keep this container in guaranteed class
limits:
cpu: 100m
memory: 500Mi
requests:
cpu: 100m
memory: 500Mi
env:
# The following env variables set up basic auth twith the default admin user and admin password.
- name: GF_AUTH_BASIC_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "false"
# - name: GF_AUTH_ANONYMOUS_ORG_ROLE
# value: Admin
# does not really work, because of template variables in exported dashboards:
# - name: GF_DASHBOARDS_JSON_ENABLED
# value: "true"
readinessProbe:
httpGet:
path: /login
port: 3000
# initialDelaySeconds: 30
# timeoutSeconds: 1
volumeMounts:
- name: grafana-persistent-storage
mountPath: /var
- name: grafana-plugins
mountPath: /var/lib/grafana
volumes:
- name: grafana-persistent-storage
- name: grafana-plugins
emptyDir: {}
EOF
5、grafana-service.yaml
cat > grafana-service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
name: grafana
namespace: kube-system
labels:
app: grafana
component: core
spec:
type: NodePort
ports:
- port: 3000
selector:
app: grafana
component: core
EOF
6、configmap.yaml
cat > configmap.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: kube-system
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
- job_name: 'kubernetes-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-services'
kubernetes_sd_configs:
- role: service
metrics_path: /probe
params:
module: [http_2xx]
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
- job_name: 'kubernetes-ingresses'
kubernetes_sd_configs:
- role: ingress
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
EOF
7、node-exporter.yaml
cat > node-exporter.yaml << EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: kube-system
labels:
k8s-app: node-exporter
spec:
selector:
matchLabels:
k8s-app: node-exporter
template:
metadata:
labels:
k8s-app: node-exporter
spec:
containers:
- image: prom/node-exporter
name: node-exporter
ports:
- containerPort: 9100
protocol: TCP
name: http
---
apiVersion: v1
kind: Service
metadata:
labels:
k8s-app: node-exporter
name: node-exporter
namespace: kube-system
spec:
ports:
- name: http
port: 9100
nodePort: 31672
protocol: TCP
type: NodePort
selector:
k8s-app: node-exporter
EOF
8、prometheus.deploy.yml
cat > prometheus.deploy.yml << EOF
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
name: prometheus-deployment
name: prometheus
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
containers:
- image: prom/prometheus:v2.26.0
name: prometheus
command:
- "/bin/prometheus"
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--storage.tsdb.retention=24h"
ports:
- containerPort: 9090
protocol: TCP
volumeMounts:
- mountPath: "/prometheus"
name: data
- mountPath: "/etc/prometheus"
name: config-volume
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 500m
memory: 2500Mi
serviceAccountName: prometheus
volumes:
- name: data
emptyDir: {}
- name: config-volume
configMap:
name: prometheus-config
EOF
9、prometheus.svc.yaml
cat > prometheus.svc.yaml << EOF
kind: Service
apiVersion: v1
metadata:
labels:
app: prometheus
name: prometheus
namespace: kube-system
spec:
type: NodePort
ports:
- port: 9090
targetPort: 9090
nodePort: 30003
selector:
app: prometheus
EOF
10、prometheus-rules.yaml
cat > prometheus-rules.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: kube-system
data:
# 通用角色
general.rules: |
groups:
- name: general.rules
rules:
- alert: InstanceDown
expr: up == 0
for: 1m
labels:
severity: error
annotations:
summary: "Instance {{ $labels.instance }} 停止工作"
description: "{{ $labels.instance }} job {{ $labels.job }} 已经停止5分钟以上."
# Node对所有资源的监控
node.rules: |
groups:
- name: node.rules
rules:
- alert: NodeFilesystemUsage
expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})"
- alert: NodeMemoryUsage
expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} 内存使用率过高"
description: "{{ $labels.instance }}内存使用大于80% (当前值: {{ $value }})"
- alert: NodeCPUUsage
expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 60
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} CPU使用率过高"
description: "{{ $labels.instance }}CPU使用大于60% (当前值: {{ $value }})"
EOF
11、rbac-setup.yaml
cat > rbac-setup.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: kube-system
EOF
执行yaml启动pod
操作如下:
for i in alertmanager-configmap.yaml alertmanager-deployment.yaml alertmanager-pvc.yaml configmap.yaml grafana-deploy.yaml grafana-service.yaml node-exporter.yaml prometheus.deploy.yml prometheus-rules.yaml prometheus.svc.yaml rbac-setup.yaml ;do kubectl apply -f $i ;sleep 3 ;done
查看启动pod过程状态
kubectl get pod -n kube-system -o wide
查看service
kubectl get service -n kube-system -o wide
访问端口查看服务
web查看node_exporter
浏览器访问:http://192.168.1.20:31672/metrics
如下图为node_exporter的信息
web查看Prometheus服务
浏览器访问:http://192.168.1.20:30003/classic/graph
如下图为Prometheus的web界面,可以看到node_exporter的注册信息,显示的集群状态
web访问Grafana服务
浏览器访问:http://192.168.1.20:30569
如下图为Grafana的界面,注意初始账号和密码均为“admin”,首次登录会提示修改密码,按照操作即可
Grafana设置
选择数据源
1、添加数据源,点击下图所示
2、点击Prometheus数据源
3、填写数据源信息,如下所示
4、添加模板,这里选择官方定义好的模板,按照选题输入url或者id也可以导入json文件,这里采用3id方式进行导入模板(选择官方模板地址:https://grafana.com/grafana/dashboards)
5、查看模板,创建成功