Bug 1612783 - atomic-openshift-node complains "no fc disk information found"
Summary: atomic-openshift-node complains "no fc disk information found"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Jan Safranek
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-06 09:19 UTC by Franck Grosjean
Modified: 2019-03-11 17:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-13 19:27:05 UTC
Target Upstream Version:
Embargoed:
lxia: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:3748 0 None None None 2018-12-13 19:27:21 UTC

Description Franck Grosjean 2018-08-06 09:19:44 UTC
Description of problem:

The atomic-openshift-node journal logs an error every 3 minutes, complaining about an FC mount that is working correctly.

Version-Release number of selected component (if applicable):
3.9.33

How reproducible:

Steps to Reproduce:
1. Create an FC PV
2. Create a PVC
3. Create a pod that uses this PVC

Actual results:
The atomic-openshift-node journal logs an error every 3 minutes, complaining about an FC mount that is working correctly.

Expected results:
No error in logs

Comment 6 Jan Safranek 2018-08-16 09:09:40 UTC
Franck, great analysis. The culprit is unimplemented FC volume reconstruction here: https://github.com/openshift/ose/blob/36e0be5a6226393c5745264bb727dafc496e6d6e/vendor/k8s.io/kubernetes/pkg/volume/fc/fc.go#L231

It returns an empty FC{}, and newMounter then fails with the error you can see.

This message is mostly harmless. It matters only if you restart openshift-node and delete a pod that uses the volume while openshift-node is down. In that case openshift-node will see a mounted volume but no pod for it, so the volume should be unmounted. It will try to reconstruct the volume and fail with the error that the customer sees, so the volume will stay mounted forever.
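The failure mode described above can be illustrated with a minimal Go sketch. It is not the actual fc.go code: the type and function names below (FCVolumeSource, constructVolumeSpec, newMounter) are simplified stand-ins, and the mount path and WWN values are hypothetical. It only shows why a reconstructed spec with no target WWNs, LUN, or WWIDs cannot yield a mounter.

```go
package main

import (
	"errors"
	"fmt"
)

// FCVolumeSource is a simplified stand-in for the Kubernetes FC volume
// source; only the fields relevant to this sketch are kept.
type FCVolumeSource struct {
	TargetWWNs []string
	Lun        *int32
	WWIDs      []string
}

// errNoDiskInfo carries the message quoted in the bug summary.
var errNoDiskInfo = errors.New("no fc disk information found")

// constructVolumeSpec mimics the unimplemented reconstruction: it ignores
// the mount path and returns an empty FC source (assumption: simplified).
func constructVolumeSpec(volumeName, mountPath string) *FCVolumeSource {
	return &FCVolumeSource{}
}

// newMounter mimics the validation step: a mounter needs either
// target WWNs plus a LUN, or at least one WWID.
func newMounter(fc *FCVolumeSource) error {
	hasWWNAndLun := len(fc.TargetWWNs) > 0 && fc.Lun != nil
	hasWWIDs := len(fc.WWIDs) > 0
	if !hasWWNAndLun && !hasWWIDs {
		return errNoDiskInfo
	}
	return nil
}

func main() {
	// Reconstructed spec: empty, so creating a mounter fails.
	spec := constructVolumeSpec("pvol", "/hypothetical/mount/path")
	if err := newMounter(spec); err != nil {
		fmt.Println("reconstruction failed:", err)
	}

	// Spec taken from a real PV (hypothetical values): validation passes.
	lun := int32(0)
	full := &FCVolumeSource{TargetWWNs: []string{"hypothetical-wwn"}, Lun: &lun}
	fmt.Println("mounter for full spec ok:", newMounter(full) == nil)
}
```

In this sketch the error arises only on the reconstruction path; a spec built from the bound PV carries the disk information and passes the check, which matches the report that the mount itself keeps working.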

It has been fixed upstream in https://github.com/kubernetes/kubernetes/pull/58089 and in 3.10.

Is the customer considering moving to 3.10?

Comment 11 Liang Xia 2018-08-30 08:34:36 UTC
QE set up an OCP cluster with version v3.9.33 and followed the steps in comment 0. We can see exactly the same errors in the node logs as in comment 1.

We then downloaded the RPMs from the repo (as in comment 9) and replaced the openshift binary on the nodes (we could not do this via upgrade because the package names are different), restarted the related services, and checked again. The error message no longer appeared (within 10 minutes).

Comment 16 Liang Xia 2018-11-13 09:44:34 UTC
PR not merged yet.

Comment 29 Liang Xia 2018-11-30 06:10:44 UTC
Setting needinfo on myself as a reminder that this bug needs follow-up actions.

Comment 32 Liang Xia 2018-12-04 01:42:12 UTC
Moving back for dev to debug.

Comment 34 Liang Xia 2018-12-06 03:47:07 UTC
Checked again on a new OCP 3.9 cluster with a different FC; the issue described in this bug no longer occurs. Moving the bug to verified.

# oc version
oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://storageqe-05
openshift v3.9.57
kubernetes v1.9.1+a0ce1bc657



We can also see the info messages below every 81 seconds, which indicates that it is working fine.

Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.042175   38878 desired_state_of_world_populator.go:368] Found bound PV for PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02): pvName="fibrechannel"
Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044083   38878 desired_state_of_world_populator.go:386] Extracted volumeSpec (0x28ab610) from bound PV (pvName "fibrechannel") and PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02)
Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044123   38878 desired_state_of_world_populator.go:298] Added volume "pvol" (volSpec="fibrechannel") for pod "8caa85f1-f905-11e8-a86b-0026552abd02" to desired state.

Comment 36 errata-xmlrpc 2018-12-13 19:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748

