Bug 1679500
Summary: Failed to attach PVs for monitoring
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.1.0
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: medium
Priority: high
Type: Bug
Reporter: Junqi Zhao <juzhao>
Assignee: Sergiusz Urbaniak <surbania>
QA Contact: Junqi Zhao <juzhao>
CC: fan-wxa, fbranczy, hongkliu, juzhao, mloibl, surbania
Last Closed: 2019-06-04 10:44:14 UTC
Description (Junqi Zhao, 2019-02-21 09:35:06 UTC)
PVs can now be attached, but this fix brings other problems: the kubelet can no longer be scraped from the worker nodes, and the 10250/metrics/cadvisor and 10250/metrics targets on the worker nodes fail with "x509: certificate signed by unknown authority".

As shown below, the alertmanager-main and prometheus-k8s pods are recreated after the PVs are attached, and are allocated to worker nodes.

```
$ oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-f509119d-398e-11e9-8827-0e9060eacf7c   2Gi        RWO            gp2            62m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-05d73872-398f-11e9-8827-0e9060eacf7c   2Gi        RWO            gp2            62m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-166daa08-398f-11e9-8827-0e9060eacf7c   2Gi        RWO            gp2            61m
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-d16161d8-398e-11e9-8827-0e9060eacf7c   4Gi        RWO            gp2            63m
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-d16637ec-398e-11e9-8827-0e9060eacf7c   4Gi        RWO            gp2            63m

$ oc -n openshift-monitoring get pod -o wide | grep -e alertmanager-main -e prometheus-k8s
alertmanager-main-0   3/3   Running   0   116m   10.129.2.34   ip-10-0-174-68.us-east-2.compute.internal    <none>
alertmanager-main-1   3/3   Running   0   115m   10.128.2.11   ip-10-0-143-223.us-east-2.compute.internal   <none>
alertmanager-main-2   3/3   Running   0   115m   10.131.0.93   ip-10-0-146-225.us-east-2.compute.internal   <none>
prometheus-k8s-0      6/6   Running   1   117m   10.128.2.10   ip-10-0-143-223.us-east-2.compute.internal   <none>
prometheus-k8s-1      6/6   Running   1   117m   10.131.0.92   ip-10-0-146-225.us-east-2.compute.internal   <none>

$ oc get node -o wide | grep worker | awk '{print $1" "$3" "$6}'
ip-10-0-143-223.us-east-2.compute.internal worker 10.0.143.223
ip-10-0-146-225.us-east-2.compute.internal worker 10.0.146.225
ip-10-0-174-68.us-east-2.compute.internal worker 10.0.174.68
```

As the attached picture shows, "x509: certificate signed by unknown authority" is reported for all the worker nodes.

BTW, since Bug 1678645 is not fixed yet, the following was used to check the targets:

```
$ prometheus_route=$(oc -n openshift-monitoring get route | grep prometheus-k8s | awk '{print $2}')
$ curl -k -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" https://${prometheus_route}/targets > page_targets.html
```

```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-25-194625   True        False         6h32m   Cluster version is 4.0.0-0.nightly-2019-02-25-194625
```

RHCOS build: 47.330

Created attachment 1538705 [details]
"x509: certificate signed by unknown authority" for worker nodes
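An "x509: certificate signed by unknown authority" failure means the scraper's trust bundle does not contain the CA that issued the kubelet's serving certificate. The effect can be reproduced in miniature with openssl (a local sketch only; the file paths are illustrative and nothing here touches the cluster):

```shell
# Create a self-signed "kubelet" cert as a stand-in for the kubelet serving cert.
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=kubelet" \
  -keyout /tmp/kubelet.key -out /tmp/kubelet.crt -days 1 2>/dev/null

# Verifying without the issuing CA fails -- the "unknown authority" case.
openssl verify /tmp/kubelet.crt || true

# Verifying against a bundle that contains the issuer succeeds.
openssl verify -CAfile /tmp/kubelet.crt /tmp/kubelet.crt
```

In the cluster, the analogous fix is ensuring Prometheus mounts a kubelet-serving CA bundle that actually contains the kubelet's issuing CA.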
Add info for Comment 3: all targets were UP before attaching PVs for monitoring, and there was no "x509: certificate signed by unknown authority" error for the 10250/metrics/cadvisor and 10250/metrics targets on the worker nodes.

You should be able to `kubectl port-forward` just fine to the Prometheus pod, for testing :)

Looking at the attachment, I find it striking that this only applies to compute nodes. Did this maybe resolve itself after a few minutes? We may just need to wait for the kubelet serving certs CA to be (re-)mounted. Could you share the Prometheus StatefulSet as well as the content of the "openshift-monitoring/kubelet-serving-ca-bundle" and "openshift-config-managed/kubelet-serving-ca" ConfigMaps? Thanks!

For what it's worth, I just tested the exact same thing on an origin cluster and was not able to reproduce. I feel like what you saw was unrelated to this bug.

Created attachment 1538804 [details]
info for Comment 6

BTW: the PVs are already attached to the pods, e.g.:

```
volumes:
- name: alertmanager-main-db
  persistentVolumeClaim:
    claimName: alertmanager-main-db-alertmanager-main-0
```

Comment 18: I agree having Kubernetes apply defaults (phase: Pending in status and the creationTimestamp default) is a bit of a distraction, but the functionality works as expected. Should these beauty marks be an issue, please file an RFE that we can schedule for later improvement. The TLS issue is distinct from using/provisioning persistence and is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1683913. Due to all of these facts, I'm moving this concrete issue to MODIFIED.
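For context, the `volumes:` snippet quoted above is one half of how a PVC surfaces in a pod spec; a minimal sketch of both halves of the wiring follows (the claimName matches the snippet, while the container name and mountPath are hypothetical illustrations, not taken from the actual StatefulSet):

```yaml
# Sketch: PVC-backed storage in a pod spec.
volumes:
- name: alertmanager-main-db
  persistentVolumeClaim:
    claimName: alertmanager-main-db-alertmanager-main-0
containers:
- name: alertmanager            # hypothetical container name
  volumeMounts:
  - name: alertmanager-main-db
    mountPath: /alertmanager    # hypothetical mount path
```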
(In reply to Frederic Branczyk from comment #18) Agree, will verify this bug.

For the RFE mentioned in Comment 18, please see bug 1684352.

Since PVs can now be attached for monitoring, closing this issue.

```
$ for i in $(oc -n openshift-monitoring get pod | grep -e alertmanager-main -e prometheus-k8s | grep -v NAME | awk '{print $1}'); do echo $i; oc -n openshift-monitoring get po $i -oyaml | grep -i claim; done
alertmanager-main-0
    persistentVolumeClaim:
      claimName: alertmanager-main-db-alertmanager-main-0
alertmanager-main-1
    persistentVolumeClaim:
      claimName: alertmanager-main-db-alertmanager-main-1
alertmanager-main-2
    persistentVolumeClaim:
      claimName: alertmanager-main-db-alertmanager-main-2
prometheus-k8s-0
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
prometheus-k8s-1
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-1

$ oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-2adefe29-3bd1-11e9-8b6c-0ac2ab4d1ff2   2Gi        RWO            gp2            25m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-3bb52eb9-3bd1-11e9-8b6c-0ac2ab4d1ff2   2Gi        RWO            gp2            24m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-4c427826-3bd1-11e9-8b6c-0ac2ab4d1ff2   2Gi        RWO            gp2            24m
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-3208ccb8-3bd1-11e9-8b6c-0ac2ab4d1ff2   4Gi        RWO            gp2            24m
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-3214392e-3bd1-11e9-8b6c-0ac2ab4d1ff2   4Gi        RWO            gp2            24m

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-27-213933   True        False         80m     Cluster version is 4.0.0-0.nightly-2019-02-27-213933
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

*** Bug 1801023 has been marked as a duplicate of this bug. ***
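The verification loop above pairs each pod name with the claim found in its spec. A minimal local analogue of that pattern, run against hypothetical saved pod specs under /tmp/specs instead of a live cluster:

```shell
# Sketch of the claim-check pattern from the verification comment, using
# hypothetical saved pod-spec fragments instead of `oc get po -oyaml`.
mkdir -p /tmp/specs
for p in alertmanager-main-0 prometheus-k8s-0; do
  # Claim names follow the <workload>-db-<pod> convention seen above.
  printf '  persistentVolumeClaim:\n    claimName: %s-db-%s\n' \
    "${p%-*}" "$p" > "/tmp/specs/$p.yaml"
done
for f in /tmp/specs/*.yaml; do
  b=${f##*/}; echo "${b%.yaml}"   # pod name
  grep -i claim "$f"              # its PVC claim
done
```

Against a real cluster the inner `grep -i claim` is the same; only the source of the pod spec differs.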