Bug 825126

Summary: vgs hangs due to the failing mpath
Product: Red Hat Enterprise Linux 6 Reporter: Xiaowei Li <xiaoli>
Component: device-mapper-multipath Assignee: Ben Marzinski <bmarzins>
Status: CLOSED NOTABUG QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.2 CC: agk, bmarzins, dwysocha, dyasny, heinzm, mbroz, msnitzer, prajnoha, prockai, qcai, zkabelac
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-25 10:53:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vgs.log none

Description Xiaowei Li 2012-05-25 08:04:08 UTC
Description of problem:
If the multipathed devices are failing, the LVM command 'vgs' hangs on the failed multipathed devices.

Version-Release number of selected component (if applicable):
2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
lvm2-2.02.87-6.el6.x86_64
device-mapper-multipath-0.4.9-46.el6.x86_64
device-mapper-1.02.66-6.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure the iSCSI initiator to log in to the iSCSI target.
2. Enable multipath and create a partition on the iSCSI LUN.
>>lvm.conf, only scan the multipathed devices
    filter = [ "a/mpath/" "r/.*/" ]
>>
# multipath -ll
mpathbq (360060e801047103004f2c4b30000001f) dm-2 HITACHI,DF600F
size=200G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 2:0:0:1 sda 8:0    active ready running
# vgs -o+devices
  VG            #PV #LV #SN Attr   VSize   VFree  Devices                     
  iscsi_vg   1   3   0 wz--n- 199.80g 78.80g /dev/mapper/mpathbqp2(0)    
  iscsi_vg   1   3   0 wz--n- 199.80g 78.80g /dev/mapper/mpathbqp2(5120) 
  iscsi_vg   1   3   0 wz--n- 199.80g 78.80g /dev/mapper/mpathbqp2(30720)

3. sysctl kernel.hung_task_timeout_secs=10 (this makes the hung task easy to reproduce)
4. ifdown the NIC used by the default iscsi iface
5. execute 'vgs'
  
Actual results:
INFO: task vgs:1781 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
....

Expected results:
vgs should report an error for iscsi_vg, still display any other VGs, and exit gracefully.

Additional info:

Comment 2 Xiaowei Li 2012-05-25 08:19:03 UTC
Created attachment 586798 [details]
vgs.log

Comment 3 Alasdair Kergon 2012-05-25 08:23:59 UTC
queue_if_no_path is set, so is this not correct behaviour?

- LVM needs to probe the device.  (But see also lvmetad as a future alternative.)

- You configured your system to say: if this device is unavailable and something tries to access it, wait indefinitely.
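
To check which maps are configured this way, the feature line of `multipath -ll` can be scanned for queue_if_no_path. The following is a minimal sketch only: the function name list_queueing_maps is made up for illustration, and it assumes the `multipath -ll` layout shown in the description (a header line containing the dm-N name, followed by a features= line).

```shell
# Print the name of every multipath map whose feature list contains
# queue_if_no_path. Reads `multipath -ll` output on stdin.
# (list_queueing_maps is a hypothetical helper, not part of any package.)
list_queueing_maps() {
    awk '
        # Map header line, e.g. "mpathbq (3600...) dm-2 HITACHI,DF600F"
        / dm-[0-9]+ / { name = $1; next }
        # Feature line that follows the header
        /features=/ && /queue_if_no_path/ { print name }
    '
}

# Usage (as root, on a host with multipath configured):
#   multipath -ll | list_queueing_maps
#
# For a one-off test, queueing can be switched off on a live map with the
# multipath target message below; the change is not persistent.
#   dmsetup message mpathbq 0 fail_if_no_path
```

With the map from the description as input, this would list mpathbq, because its feature string is '1 queue_if_no_path'.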

Comment 4 Peter Rajnoha 2012-05-25 08:35:02 UTC
(In reply to comment #0)
> Expected results:
> vgs should report an error for iscsi_vg, still display any other VGs,
> and exit gracefully.

I'd say this is just a misconfiguration - if "error" is expected instead, you should consider using the "error if no path" policy that does exactly that ("no_path_retry=fail" setting).
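
As a rough illustration of that setting, a fragment along these lines in /etc/multipath.conf would request failing I/O instead of queueing it when no path is left. This is a sketch under assumptions: it places the option in the defaults section, which only takes effect if no device-specific or per-multipath section sets no_path_retry for the LUN, and multipathd has to re-read its configuration afterwards.

```
# /etc/multipath.conf (fragment)
defaults {
    # Fail I/O immediately when all paths are down instead of queueing it.
    no_path_retry fail
}
```

After the new setting is active, the features field in the `multipath -ll` output should no longer list queue_if_no_path for the map.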

If you encounter this problem during system shutdown, that is already tracked in bug #800801 (see also original bug #672530 comment #15).

Comment 5 Xiaowei Li 2012-05-25 10:53:52 UTC
Thanks for the clarification. I got the expected behavior after setting no_path_retry=fail, so I am closing this bug.