Bug 1612783
Summary: | atomic-openshift-node complains "no fc disk information found" | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Franck Grosjean <fgrosjea> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Status: | CLOSED ERRATA | QA Contact: | Liang Xia <lxia> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | 3.9.0 | CC: | aos-bugs, aos-storage-staff, bchilds, fcami, fgrosjea, jsafrane, lxia |
Target Milestone: | --- | Flags: | lxia: needinfo- |
Target Release: | 3.9.z | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-12-13 19:27:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Franck Grosjean
2018-08-06 09:19:44 UTC
Franck, great analysis. The culprit is unimplemented FC volume reconstruction here: https://github.com/openshift/ose/blob/36e0be5a6226393c5745264bb727dafc496e6d6e/vendor/k8s.io/kubernetes/pkg/volume/fc/fc.go#L231 It returns an empty FC{}, and newMounter then fails with the error you can see.

This message is mostly harmless. It matters only if you restart openshift-node and delete a pod that uses the volume while openshift-node is down. In that case openshift-node sees a mounted volume with no pod for it, so the volume should be unmounted. It tries to reconstruct the volume and fails with the error the customer sees. As a result, the volume stays mounted forever.

It has been fixed upstream in https://github.com/kubernetes/kubernetes/pull/58089 and in 3.10. Is the customer considering moving to 3.10?

QE set up an OCP cluster with version v3.9.33 and followed the steps in comment #0. We see exactly the same errors in the node logs as in comment #1. We then downloaded the RPMs from the repo (as in comment #9) and replaced the openshift binary on the nodes (we cannot do this via upgrade because the package names are different). After restarting the related services and checking again, the error message did not appear (within 10 minutes).

PR not merged yet. Setting needinfo on myself as a reminder that this bug needs follow-up actions.

Moving back to dev to debug.

Checked again on a new OCP 3.9 cluster with different FC storage. The issue described in this bug no longer occurs, so moving the bug to VERIFIED.

    # oc version
    oc v3.9.57
    kubernetes v1.9.1+a0ce1bc657
    features: Basic-Auth GSSAPI Kerberos SPNEGO

    Server https://storageqe-05
    openshift v3.9.57
    kubernetes v1.9.1+a0ce1bc657

We can also see the following messages every 81 seconds, which indicates that it is working fine:

    Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.042175 38878 desired_state_of_world_populator.go:368] Found bound PV for PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02): pvName="fibrechannel"
    Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044083 38878 desired_state_of_world_populator.go:386] Extracted volumeSpec (0x28ab610) from bound PV (pvName "fibrechannel") and PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02)
    Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044123 38878 desired_state_of_world_populator.go:298] Added volume "pvol" (volSpec="fibrechannel") for pod "8caa85f1-f905-11e8-a86b-0026552abd02" to desired state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
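The analysis in this bug traces the error to a reconstruction step that returns an empty FC{} instead of recovering the disk identity from what is still on the node. The Go snippet below is a minimal, hypothetical sketch of that idea and is not the code from the upstream fix. It assumes that each FC disk's global mount directory is named `<wwn>-lun-<lun>`, or is the bare WWID when no WWN was given; treat that layout as an assumption. The type `fcDisk`, the function `parseFCGlobalDir` and the sample paths are all made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// fcDisk holds the identifiers a reconstructed FC volume would need.
// Hypothetical type for this sketch; not the FC plugin's real struct.
type fcDisk struct {
	WWN  string
	LUN  string
	WWID string
}

// parseFCGlobalDir recovers disk identifiers from the last element of a
// global mount path. ASSUMPTION: the directory is named "<wwn>-lun-<lun>",
// or is a bare WWID when there is no "-lun-" marker. Unlike returning an
// empty struct, it reports an error when the name cannot be parsed.
func parseFCGlobalDir(globalPath string) (fcDisk, error) {
	name := filepath.Base(globalPath)
	// filepath.Base returns "." for an empty path and "/" for the root.
	if name == "." || name == string(filepath.Separator) {
		return fcDisk{}, errors.New("empty FC mount directory name")
	}
	if i := strings.LastIndex(name, "-lun-"); i >= 0 {
		wwn, lun := name[:i], name[i+len("-lun-"):]
		if wwn == "" || lun == "" {
			return fcDisk{}, fmt.Errorf("malformed FC mount directory %q", name)
		}
		return fcDisk{WWN: wwn, LUN: lun}, nil
	}
	// No "-lun-" marker: treat the whole name as a WWID (assumption).
	return fcDisk{WWID: name}, nil
}

func main() {
	// Hypothetical paths; the WWN value is invented for the example.
	for _, p := range []string{
		"/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/fc/50060e801049cfd1-lun-0",
		"/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/fc/-lun-0",
	} {
		d, err := parseFCGlobalDir(p)
		fmt.Printf("%+v err=%v\n", d, err)
	}
}
```

The function returns an error instead of a zero-value struct. That lets the caller log the problem and skip the volume, rather than passing an unusable spec to newMounter and leaving the mount behind.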