Bug 1672492

Summary: [Tracking] [OCS 3.11.1]: Few block PVCs fails to mount with the error 'failed to get any path for iscsi disk'
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Ashmitha Ambastha <asambast>
Component: gluster-block
Assignee: Xiubo Li <xiubli>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Ashmitha Ambastha <asambast>
Severity: high
Docs Contact:
Priority: unspecified
Version: ocs-3.11
CC: asambast, hchiramm, jahernan, knarra, kramdoss, madam, pkarampu, pprakash, prasanna.kalever, rhs-bugs, rtalur, sankarshan, sselvan, vbellur, xiubli
Target Milestone: ---
Keywords: ZStream
Target Release: OCS 3.11.z Batch Update 4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1701858 (view as bug list)
Environment:
Last Closed: 2019-07-22 10:39:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1701858
Bug Blocks: 1707226

Description Ashmitha Ambastha 2019-02-05 06:55:52 UTC
Description of problem:

On a 4-node OCS 3.11.1 cluster, created 50 Cirros app pods. Of the 50 app pods, only 20 reached the Running (1/1) state. The rest hit a FailedMount event because the mount 'failed to get any path for iscsi disk'.


Version-Release number of selected component (if applicable): OCS 3.11.1

How reproducible: Seen on 25-30 of the block-backed Cirros pods

Steps to Reproduce:
1. Create a setup with OCP 3.11 (live) and OCS 3.11.1 builds.
2. Create 50 Cirros pods and PVCs. All 50 PVCs reached the Bound state, but only 20 pods reached the Running state, even after deleting and restarting pod creation three times.
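For context, step 2 can be scripted roughly as below. This is a sketch only: the StorageClass name "glusterfs-block", the image name, and the sizes are illustrative assumptions, not values taken from this bug.

```shell
# Sketch: create 50 block-backed PVCs and matching Cirros pods.
# Assumes a gluster-block StorageClass named "glusterfs-block" exists.
for i in $(seq 1 50); do
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cirros-pvc-$i
spec:
  storageClassName: glusterfs-block
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: cirros-pod-$i
spec:
  containers:
  - name: cirros
    image: cirros
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: cirros-pvc-$i
EOF
done

# Count how many pods actually reach Running:
oc get pods | grep -c Running
```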

Actual results: The mount failed and the Cirros pods didn't come up.

Expected results: The mount should pass and the Cirros pods should come up.
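For reference, the 'failed to get any path for iscsi disk' message is raised by the kubelet's iSCSI volume plugin when it cannot log in to any portal of the target. A minimal triage sketch from the node hosting a failed pod might look like the following; the pod name, portal IP, and IQN are placeholders, not values from this bug.

```shell
# Pod events show the FailedMount reason and the target the kubelet tried:
oc describe pod <pod> | grep -i -A3 failedmount

# On the node where the pod was scheduled, list active iSCSI sessions
# (the failed volume is expected to be missing here):
iscsiadm -m session

# Discover targets exposed by a gluster-block node and attempt a manual login:
iscsiadm -m discovery -t sendtargets -p <portal-ip>:3260
iscsiadm -m node -T iqn.2016-12.org.gluster-block:<block-id> -p <portal-ip>:3260 --login
```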

Comment 3 Ashmitha Ambastha 2019-02-05 07:13:12 UTC
The pod has not come up at all, even after deleting and recreating it.

Comment 4 Prasanna Kumar Kalever 2019-02-05 09:54:00 UTC
Can you please attach the sosreports and other relevant details, such as the block volume name, 'targetcli ls' output, etc.?
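The requested information can typically be gathered on the gluster/target nodes with something like the following sketch; the volume and block names are placeholders, not values from this bug.

```shell
# Full target configuration as seen by the LIO/tcmu-runner stack:
targetcli ls

# Block volumes hosted on a given block-hosting volume:
gluster-block list <block-hosting-volume>

# Details (IQN, portals, size, status) for one block volume:
gluster-block info <block-hosting-volume>/<block-name>

# Collect a sosreport on each gluster node:
sosreport --batch
```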

Comment 12 Ashmitha Ambastha 2019-02-11 09:20:26 UTC
Hi, 

I've attached the gluster-block provisioner pod logs, the list of all necessary versions, and the 'oc describe' output of the pod where the issue was seen. While taking sosreports of the gluster nodes, I was hitting this error:

 [plugin:lvm2] command 'pvs -a -v -o +pv_mda_free,pv_mda_size,pv_mda_count,pv_mda_used_count,pe_start --config="global{locking_type=0}"' timed out after 300s

and hence was not able to collect all the sosreports.

Xiubo is looking into the setup now.
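As a possible workaround for the sosreport failure above (a sketch, not a procedure from this bug): since only the lvm2 plugin timed out, it can be skipped so the rest of the report is still collected.

```shell
# Skip the lvm2 plugin, which timed out after 300s on these nodes:
sosreport --batch -n lvm2

# Newer sos versions also allow raising the per-plugin timeout instead:
sosreport --batch --plugin-timeout 900
```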

Comment 15 Prasanna Kumar Kalever 2019-03-05 07:28:29 UTC
Ashmitha, do we have any update here?

Comment 16 Ashmitha Ambastha 2019-03-15 08:06:26 UTC
Prasanna, I have not been able to hit the issue with the package which Xiubo added, and I have shared setups showing the issue with Xiubo. The setup gets ruined, with nodes going NotReady while collecting logs; this is the reason I'm unable to add logs to this bug. I have not tried this on a fresh setup, as those setups are in use for 3.11.2 testing.

Comment 19 Prasanna Kumar Kalever 2019-03-28 05:43:55 UTC
Hello Ashmitha,

By any chance, were the BHVs used by the block volumes here created on a version lower than OCS 3.11.1?

Thanks!

Comment 22 RamaKasturi 2019-03-28 17:23:54 UTC
Hello Vignesh,

   IIRC, you said that once the custom packages with some debug logs were added, we were no longer able to reproduce the issue. Is that true? If yes, can you please update the bug with how we are applying the custom package on the setup?

Thanks
kasturi