Bug 1669560
| Summary: | [3.11] mounting fails with multipath iscsi when one path is down | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Niels de Vos <ndevos> |
| Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.11.0 | CC: | aos-bugs, aos-storage-staff, hchiramm, jsafrane, ndevos, rgeorge, rhs-bugs, sankarshan, sponnaga |
| Target Milestone: | --- | Keywords: | Regression, ZStream |
| Target Release: | 3.11.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1669403 | | |
| | 1680012 | Environment: | |
| Last Closed: | 2019-04-11 05:38:26 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1669403, 1680012 | | |
Comment 1
Humble Chirammal
2019-01-27 06:55:23 UTC
I can't reproduce it with v3.11.86. Kubelet tries to log into the shut-down node several times and times out, but it eventually gives up and continues with just 2 paths. My pod is running in ~45 seconds. Can you please retry with a current OCP version and leave your cluster running for investigation? Find me on IRC in #aos during CET (UTC+1) business hours.

I admit that the iscsi volume plugin may have issues with this:
> IMO, attach disk fails when we collect the portal-host map for the mentioned target from the host buses.
> It fails because of an existing session entry in sysfs while collecting the map. Attach disk failing soon after we fail to collect the session's address is problematic in certain scenarios.
> One other thought is that the teardown of the mount and the wiping of the session entries should have happened during detach or pod teardown. Maybe it was attempted and failed, or there is a race between the teardown and the setup.
> Some more details on the questions above, plus logs or a reproducer, should confirm this thought.
But the plugin should never set up and tear down a single volume in parallel. It does things in sequence. How did you reach such a state? A simple shutdown of a gluster node plus deletion of the pod was not enough for me.
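To illustrate the multipath behavior described above (kubelet times out on the dead portal but continues with the remaining paths), here is a minimal Go sketch. This is not the actual kubelet plugin code; `loginToPortal` is a hypothetical stand-in for the iscsiadm login step, and every portal address except 10.0.77.71 is made up.

```go
// Sketch only: a single unreachable portal is logged and skipped; the
// mount fails only when no portal responds at all.
package main

import (
	"fmt"
	"log"
)

// loginToPortal stands in for the "iscsiadm ... --portal <p> --login" step.
// A real implementation would shell out and time out on dead portals.
func loginToPortal(portal string) error {
	return nil
}

// attachDisk tries to log in on every portal and returns the ones that answered.
func attachDisk(portals []string) ([]string, error) {
	var up []string
	for _, p := range portals {
		if err := loginToPortal(p); err != nil {
			// One dead path is not fatal; record it and keep going.
			log.Printf("portal %s unreachable: %v", p, err)
			continue
		}
		up = append(up, p)
	}
	if len(up) == 0 {
		return nil, fmt.Errorf("failed to log in to any of %d portals", len(portals))
	}
	return up, nil
}

func main() {
	// 10.0.77.71 is from this report; the other addresses are placeholders.
	paths, err := attachDisk([]string{"10.0.77.71:3260", "10.0.77.72:3260", "10.0.77.73:3260"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("usable paths:", paths)
}
```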
Upstream PR: https://github.com/kubernetes/kubernetes/pull/74306

(In reply to Jan Safranek from comment #7)
> But the plugin should never set up and tear down a single volume in parallel. It does things in sequence. How did you reach such a state?

Yeah, it looks like a race to me, as mentioned above.

Origin 3.11 PR: https://github.com/openshift/origin/pull/22137

> After 10.0.77.71 is down, "TargetPortal" in the PV info is still 10.0.77.71. Do we need to update the PV info?
This is OK. The PV object contains the long-term properties of the volume, not what is actually in use by a pod (and there is no Kubernetes API to get that). When 10.0.77.71 comes back to life, it may still be used when a new pod is started.
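For reference, a sketch of how such a multipath iSCSI PV looks when built with the Kubernetes API types (k8s.io/api/core/v1). TargetPortal, Portals, IQN, Lun, and FSType are real API fields; the IQN and the 10.0.77.72/.73 portal addresses are placeholders, since the report only names 10.0.77.71.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// examplePV builds a multipath iSCSI PersistentVolume. TargetPortal and
// Portals are static spec fields: they describe where the volume can be
// reached, not which paths kubelet is currently using, which is why the PV
// still shows 10.0.77.71 after that portal goes down.
func examplePV() *v1.PersistentVolume {
	return &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "iscsi-multipath-pv"},
		Spec: v1.PersistentVolumeSpec{
			Capacity: v1.ResourceList{
				v1.ResourceStorage: resource.MustParse("1Gi"),
			},
			AccessModes: []v1.PersistentVolumeAccessMode{v1.ReadWriteOnce},
			PersistentVolumeSource: v1.PersistentVolumeSource{
				ISCSI: &v1.ISCSIPersistentVolumeSource{
					TargetPortal: "10.0.77.71:3260",
					// Additional paths; these addresses are placeholders.
					Portals: []string{"10.0.77.72:3260", "10.0.77.73:3260"},
					IQN:     "iqn.2016-12.org.gluster-block:example", // placeholder IQN
					Lun:     0,
					FSType:  "xfs",
				},
			},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", examplePV().Spec.ISCSI)
}
```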
Based on the above comments, updating the bug status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636