Bug 1612783

Summary:	atomic-openshift-node complains "no fc disk information found"
Product:	OpenShift Container Platform	Reporter:	Franck Grosjean <fgrosjea>
Component:	Storage	Assignee:	Jan Safranek <jsafrane>
Status:	CLOSED ERRATA	QA Contact:	Liang Xia <lxia>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.9.0	CC:	aos-bugs, aos-storage-staff, bchilds, fcami, fgrosjea, jsafrane, lxia
Target Milestone:	---	Flags:	lxia: needinfo-
Target Release:	3.9.z
Hardware:	x86_64
OS:	Linux
Last Closed:	2018-12-13 19:27:05 UTC	Type:	Bug

Description Franck Grosjean 2018-08-06 09:19:44 UTC

Description of problem:

The atomic-openshift-node journal logs an error every 3 minutes about an FC mount that is working correctly.

Version-Release number of selected component (if applicable):
3.9.33

How reproducible:

Steps to Reproduce:
1. Create an FC PV
2. Create a PVC
3. Create a pod that uses this PVC

Actual results:
The atomic-openshift-node journal logs an error every 3 minutes about an FC mount that is working correctly.

Expected results:
No error in logs

Comment 6 Jan Safranek 2018-08-16 09:09:40 UTC

Franck, great analysis. The culprit is the unimplemented FC volume reconstruction here: https://github.com/openshift/ose/blob/36e0be5a6226393c5745264bb727dafc496e6d6e/vendor/k8s.io/kubernetes/pkg/volume/fc/fc.go#L231

It returns an empty FC{}, and newMounter then fails with the error you can see.

This message is mostly harmless. It matters only if you restart openshift-node and delete a pod that uses the volume while openshift-node is down. When openshift-node comes back, it sees a mounted volume but no pod for it, so the volume should be unmounted. It tries to reconstruct the volume and fails with the error that the customer sees, so the volume stays mounted forever.
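
As a rough illustration of the failure described above, here is a minimal, self-contained Go sketch. It is a simplified model, not the actual Kubernetes code: the type FCVolumeSource and the functions constructVolumeSpec, newMounter and reconcileOrphan are hypothetical stand-ins, and the sketch assumes that a mounter needs either target WWNs plus a LUN, or WWIDs, to locate the disk.

```go
package main

import (
	"errors"
	"fmt"
)

// FCVolumeSource is a simplified, hypothetical stand-in for an FC volume
// source: a disk is identified by target WWNs plus a LUN, or by WWIDs.
type FCVolumeSource struct {
	TargetWWNs []string
	Lun        *int32
	WWIDs      []string
}

// errNoDiskInfo mirrors the wording of the message reported in this bug.
var errNoDiskInfo = errors.New("no fc disk information found")

// constructVolumeSpec models the unimplemented reconstruction: whatever is
// actually mounted on the node, it returns an empty FC source.
func constructVolumeSpec(volumeName string) *FCVolumeSource {
	return &FCVolumeSource{}
}

// newMounter models the validation step (assumption): a mounter can be built
// only when the source carries enough information to locate the disk.
func newMounter(fc *FCVolumeSource) error {
	hasWWNAndLun := fc.Lun != nil && len(fc.TargetWWNs) > 0
	if !hasWWNAndLun && len(fc.WWIDs) == 0 {
		return errNoDiskInfo
	}
	return nil
}

// reconcileOrphan models one pass over a mounted volume that has no pod.
// It reports true only if the volume could be reconstructed and unmounted.
func reconcileOrphan(volumeName string) (bool, error) {
	if err := newMounter(constructVolumeSpec(volumeName)); err != nil {
		// Reconstruction failed: the error is logged and the volume stays mounted.
		return false, fmt.Errorf("reconstruct %q: %w", volumeName, err)
	}
	return true, nil
}

func main() {
	unmounted, err := reconcileOrphan("pvol")
	fmt.Println(unmounted, err)
}
```

In this model every pass fails the same way, which matches a message that repeats periodically while the mount itself keeps working.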

It has been fixed upstream in https://github.com/kubernetes/kubernetes/pull/58089 and in 3.10.

Is the customer considering moving to 3.10?

Comment 11 Liang Xia 2018-08-30 08:34:36 UTC

QE set up an OCP cluster running v3.9.33 and followed the steps in comment 0. We see exactly the same errors in the node logs as in comment 1.

We then downloaded the RPMs from the repo (as in comment 9) and replaced the openshift binary on the nodes (we could not do this via upgrade because the package names are different), restarted the related services, and checked again. The error message did not appear within 10 minutes.

Comment 16 Liang Xia 2018-11-13 09:44:34 UTC

PR not merged yet.

Comment 29 Liang Xia 2018-11-30 06:10:44 UTC

Setting needinfo on myself as a reminder that this bug needs follow-up actions.

Comment 32 Liang Xia 2018-12-04 01:42:12 UTC

Moving back to dev for debugging.

Comment 34 Liang Xia 2018-12-06 03:47:07 UTC

Checked again on a new OCP 3.9 cluster with a different FC. The issue described in this bug no longer occurs, so I am moving the bug to verified.

# oc version
oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://storageqe-05
openshift v3.9.57
kubernetes v1.9.1+a0ce1bc657

We also see the following info messages every 81 seconds, which indicate that it is working fine.

Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.042175   38878 desired_state_of_world_populator.go:368] Found bound PV for PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02): pvName="fibrechannel"
Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044083   38878 desired_state_of_world_populator.go:386] Extracted volumeSpec (0x28ab610) from bound PV (pvName "fibrechannel") and PVC (ClaimName "default"/"claim-fc" pvcUID 64d07827-f905-11e8-a86b-0026552abd02)
Dec 05 22:37:58 storageqe-25 atomic-openshift-node[38878]: I1205 22:37:58.044123   38878 desired_state_of_world_populator.go:298] Added volume "pvol" (volSpec="fibrechannel") for pod "8caa85f1-f905-11e8-a86b-0026552abd02" to desired state.

Comment 36 errata-xmlrpc 2018-12-13 19:27:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748