Description of problem:
The atomic-openshift-node journal logs an error every 3 minutes, complaining about an FC mount that is working correctly.

Version-Release number of selected component (if applicable):
3.9.33

How reproducible:

Steps to Reproduce:
1. Create an FC PV.
2. Create a PVC.
3. Create a pod that uses this PVC.

Actual results:
The atomic-openshift-node journal logs an error every 3 minutes, complaining about an FC mount that is working correctly.

Expected results:
No errors in the logs.
Franck, great analysis. The culprit is the unimplemented FC volume reconstruction here:
https://github.com/openshift/ose/blob/36e0be5a6226393c5745264bb727dafc496e6d6e/vendor/k8s.io/kubernetes/pkg/volume/fc/fc.go#L231

It returns an empty FC{} and newMounter fails with the error you can see. This message is mostly harmless. It matters only if you restart openshift-node and delete a pod that uses the volume while openshift-node is down. openshift-node will then see a mounted volume but no pod for it, so the volume should be unmounted. It will try to reconstruct the volume and fail with the error that the customer sees, so the volume will stay mounted forever.

It has been fixed upstream in https://github.com/kubernetes/kubernetes/pull/58089 and in 3.10. Is the customer considering moving to 3.10?
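The harmful case described above (pod deleted while the node service is down) can be sketched in Go roughly as follows. This is a simplified model under stated assumptions, not the kubelet code: `volumeSpec`, `reconstructFunc`, `emptyReconstruct`, `cleanupOrphans`, the mount path, and the error text are all hypothetical names introduced for illustration, and `newMounter` here only mimics the idea that an empty spec is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// volumeSpec is a hypothetical, stripped-down stand-in for the data an
// FC volume plugin needs in order to build a mounter or unmounter.
type volumeSpec struct {
	TargetWWNs []string
	Lun        *int32
}

// reconstructFunc models a plugin hook that rebuilds a spec from a mount path.
type reconstructFunc func(mountPath string) (*volumeSpec, error)

// emptyReconstruct models the unimplemented reconstruction: it returns an
// empty spec instead of recovering the WWNs and LUN from the mount.
func emptyReconstruct(mountPath string) (*volumeSpec, error) {
	return &volumeSpec{}, nil
}

// newMounter models the validation step that rejects an empty spec.
// The error text is illustrative, not the real log message.
func newMounter(spec *volumeSpec) error {
	if len(spec.TargetWWNs) == 0 || spec.Lun == nil {
		return errors.New("no FC disk information in reconstructed spec")
	}
	return nil
}

// cleanupOrphans looks at every mount found on disk. A mount whose pod is
// gone should be unmounted, but that is only possible if the spec can be
// reconstructed; otherwise the mount is reported as stuck.
func cleanupOrphans(mounts []string, podExists map[string]bool, reconstruct reconstructFunc) (unmounted, stuck []string) {
	for _, m := range mounts {
		if podExists[m] {
			continue // the volume is still in use, nothing to clean up
		}
		spec, err := reconstruct(m)
		if err == nil {
			err = newMounter(spec)
		}
		if err != nil {
			fmt.Printf("could not reconstruct volume at %s: %v\n", m, err)
			stuck = append(stuck, m)
			continue
		}
		unmounted = append(unmounted, m)
	}
	return unmounted, stuck
}

func main() {
	// One mount left behind by a pod that was deleted while the node was down.
	mounts := []string{"/mnt/fc-orphan"}
	_, stuck := cleanupOrphans(mounts, map[string]bool{}, emptyReconstruct)
	fmt.Println("still mounted:", stuck)
}
```

With a reconstruction function that returns a populated spec, the same loop would place the orphaned mount in `unmounted` instead of `stuck`, which is the behaviour the upstream fix is meant to restore.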
QE set up an OCP cluster with version v3.9.33 and followed the steps in #comment 0; we can see exactly the same errors in the node logs as in #comment 1. We then downloaded the RPMs from the repo (as in #comment 9) and replaced the openshift binary on the nodes (we cannot do this via an upgrade because the package names are different), restarted the related services, and checked again. The error message did not appear within 10 minutes.
PR not merged yet.
Setting needinfo on myself as a reminder that this bug needs follow-up actions.
Moving back to dev for debugging.
Checked again on a new OCP 3.9 cluster with a different FC; the issue described in this bug no longer occurs. Moving the bug to verified.

# oc version
oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://storageqe-05
openshift v3.9.57
kubernetes v1.9.1+a0ce1bc657

We can also see the messages below every 81 seconds, which indicates that it is working fine.

Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.042175 38878 desired_state_of_world_populator.go:368] Found bound PV for PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02): pvName="fibrechannel"
Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044083 38878 desired_state_of_world_populator.go:386] Extracted volumeSpec (0x28ab610) from bound PV (pvName "fibrechannel") and PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02)
Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044123 38878 desired_state_of_world_populator.go:298] Added volume "pvol" (volSpec="fibrechannel") for pod "8caa85f1-f905-11e8-a86b-0026552abd02" to desired state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748