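After deploying Prometheus with kube-prometheus in a Kubernetes cluster, I tried to monitor a Spring Boot service, but the target never appeared on the Prometheus Targets page. After several rounds of troubleshooting, the root causes turned out to be: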
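- The ServiceMonitor's `namespaceSelector` and `selector` must match the Service's namespace and labels.
- The Spring Boot service must expose the `/actuator/prometheus` endpoint.
- The `prometheus-k8s` ClusterRole lacked `list` and `watch` permissions on `services`, `endpoints`, and `pods`. Permissions can be checked with:

      kubectl auth can-i get services --as=system:serviceaccount:monitoring:prometheus-k8s -n <namespace-to-monitor>

- Deleting the Prometheus Pod with the `app=prometheus` label failed, because the actual label is `app.kubernetes.io/name=prometheus`.

Update the `prometheus-k8s` ClusterRole to grant access to the key resources: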

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus-k8s
      labels:
        app.kubernetes.io/component: prometheus
        app.kubernetes.io/instance: k8s
        app.kubernetes.io/name: prometheus
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 2.54.1
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources:
      - configmaps
      verbs: ["get"]
    - nonResourceURLs: ["/metrics", "/metrics/slis"]
      verbs: ["get"]

Apply the updated ClusterRole:

    kubectl apply -f prometheus-clusterrole.yaml --force

Make sure the Prometheus ServiceAccount is allowed to access the target namespace:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: prometheus-k8s-epoch-test
      namespace: epoch-test
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-k8s
    subjects:
    - kind: ServiceAccount
      name: prometheus-k8s
      namespace: monitoring

Service configuration:

    apiVersion: v1
    kind: Service
    metadata:
      name: kexin-module-system-test-svc
      namespace: epoch-test
      labels:
        project: epoch
        run: kexin-module-system-test
    spec:
      selector:
        project: epoch
        run: kexin-module-system-test
      ports:
      - name: http-metrics
        port: 8080
        targetPort: 8080

ServiceMonitor configuration:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kexin-module-system-test-monitor
      namespace: monitoring
    spec:
      endpoints:
      - port: http-metrics
        path: /actuator/prometheus
        interval: 15s
      selector:
        matchLabels:
          project: epoch
          run: kexin-module-system-test
      namespaceSelector:
        matchNames:
        - epoch-test

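
One way to confirm that the ServiceMonitor's `selector` really matches the Service is to run the same label selector against the Services in the target namespace and look at the port names that come back. The following is a minimal sketch, not part of the original setup: the helper name `services_matching` is made up for illustration, and it only wraps a plain `kubectl get services` call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the Services in a namespace that a label
# selector matches, together with the names of their ports.
# Usage: services_matching <namespace> <label-selector>
services_matching() {
  local ns="$1" selector="$2"
  # The jsonpath template prints one line per Service: name, then port names.
  kubectl get services -n "$ns" -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" ports="}{.spec.ports[*].name}{"\n"}{end}'
}
```

Calling `services_matching epoch-test project=epoch,run=kexin-module-system-test` should list the Service from the manifest above. If nothing is printed, the labels do not match; if the port names do not include `http-metrics`, the ServiceMonitor's `endpoints.port` will not resolve.
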
Delete the Prometheus Pod with the correct label selector:

    kubectl delete pod -n monitoring -l app.kubernetes.io/name=prometheus

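
Because a selector that matches nothing is easy to miss, it can help to check that the label selects at least one Pod before deleting. The sketch below assumes nothing beyond standard `kubectl get` and `kubectl delete` flags; the function name `restart_pods_by_label` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete Pods by label only when the selector
# actually matches something, otherwise point at the real labels.
# Usage: restart_pods_by_label <namespace> <label-selector>
restart_pods_by_label() {
  local ns="$1" selector="$2" pods
  pods="$(kubectl get pods -n "$ns" -l "$selector" -o name)"
  if [ -z "$pods" ]; then
    echo "no pods in $ns match '$selector'" >&2
    echo "inspect the labels with: kubectl get pods -n $ns --show-labels" >&2
    return 1
  fi
  kubectl delete pod -n "$ns" -l "$selector"
}
```

With this in place, `restart_pods_by_label monitoring app=prometheus` would stop with a hint instead of silently doing nothing, while `restart_pods_by_label monitoring app.kubernetes.io/name=prometheus` goes ahead with the delete.
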
Check the Prometheus targets:

    kubectl port-forward svc/prometheus-k8s -n monitoring 9090:9090

Open http://localhost:9090/targets and confirm that the target's state is UP.
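
The same check can be scripted against the Prometheus HTTP API instead of the web page. This is a rough sketch that assumes the port-forward above is still running in another terminal and that the API returns compact JSON; the function name `target_health_summary` is made up here, and the parsing is a plain `grep`, not a real JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count active targets per health state
# (up / down / unknown) using the Prometheus targets API.
# Usage: target_health_summary [prometheus-base-url]
target_health_summary() {
  local base_url="${1:-http://localhost:9090}"
  curl -fsS "$base_url/api/v1/targets?state=active" \
    | grep -o '"health":"[a-z]*"' \
    | sort | uniq -c
}
```

If the Spring Boot target is missing entirely, querying the same endpoint with `state=dropped` may show whether Prometheus discovered it and then dropped it during relabeling.
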
Verify the permissions:

    kubectl auth can-i get services --as=system:serviceaccount:monitoring:prometheus-k8s -n epoch-test

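
The command above only checks `get` on `services`, whereas the missing permissions were `list` and `watch` on `services`, `endpoints`, and `pods`. A small loop can cover every combination. This is a sketch under the assumption that the ServiceAccount is `prometheus-k8s` in the `monitoring` namespace, as in the manifests above; the function name `check_prometheus_rbac` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the `kubectl auth can-i` answer for each
# verb/resource pair that Prometheus service discovery relies on.
# Usage: check_prometheus_rbac <target-namespace>
check_prometheus_rbac() {
  local ns="$1"
  local sa="system:serviceaccount:monitoring:prometheus-k8s"
  local resource verb answer
  for resource in services endpoints pods; do
    for verb in get list watch; do
      # can-i prints "yes" or "no"; a denied check may also return a
      # non-zero exit status, which is deliberately ignored here.
      answer="$(kubectl auth can-i "$verb" "$resource" --as="$sa" -n "$ns" 2>/dev/null)" || true
      printf '%-5s %-9s %s\n' "$verb" "$resource" "${answer:-unknown}"
    done
  done
}
```

Running `check_prometheus_rbac epoch-test` prints nine lines, one per verb and resource; any line ending in `no` points at a rule that is still missing.
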
Check the Endpoints:

    kubectl get endpoints -n epoch-test kexin-module-system-test-svc

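
Even when the Endpoints object lists Pod addresses, the application itself still has to answer on `/actuator/prometheus`. A quick probe is sketched below. It assumes the Service has been forwarded locally in another terminal, for example with `kubectl port-forward svc/kexin-module-system-test-svc -n epoch-test 8080:8080`; the function name `check_metrics_endpoint` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a metrics URL and report whether the body
# looks like the Prometheus text exposition format.
# Usage: check_metrics_endpoint [url]
check_metrics_endpoint() {
  local url="${1:-http://localhost:8080/actuator/prometheus}"
  local body
  if ! body="$(curl -fsS "$url")"; then
    echo "request to $url failed" >&2
    return 1
  fi
  # The exposition format annotates metric families with "# TYPE" lines.
  if printf '%s\n' "$body" | grep -q '^# TYPE '; then
    echo "ok: $url serves Prometheus metrics"
  else
    echo "unexpected response from $url" >&2
    return 1
  fi
}
```

A failed request here usually means the problem is on the application side (endpoint not exposed or dependency missing) and not in the ServiceMonitor.
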
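Problem | Solution |
---|---|
Insufficient permissions | Update the ClusterRole to add `list`/`watch` permissions on `services`, `endpoints`, and `pods` |
Label mismatch | Use `app.kubernetes.io/name=prometheus` instead of `app=prometheus` |
ServiceMonitor not taking effect | Make sure the Service's port name matches the ServiceMonitor's `endpoints.port` |
Network policy restrictions | Allow traffic from the `monitoring` namespace |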
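
For the network policy row, a policy along the following lines could admit scrapes from the `monitoring` namespace. This is only a sketch: the policy name `allow-prometheus-scrape` and the function name are made up, the Pod labels and port are taken from the Service shown earlier, and it assumes the cluster sets the `kubernetes.io/metadata.name` label on namespaces automatically (recent Kubernetes versions do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a NetworkPolicy that allows ingress on the
# metrics port from any Pod in the monitoring namespace.
allow_monitoring_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape   # hypothetical name
  namespace: epoch-test
spec:
  podSelector:
    matchLabels:
      project: epoch
      run: kexin-module-system-test
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - protocol: TCP
      port: 8080
EOF
}
```

The manifest can be reviewed first and then applied with `allow_monitoring_policy | kubectl apply -f -`. It is only relevant if the target namespace already restricts ingress with other NetworkPolicies.
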

Dependency required on the Spring Boot side: `micrometer-registry-prometheus`.