Bug 798635

Summary: 3.1 - getVGInfo returns a partial LUN list on domains with more than one LUN, which causes HSMs to fail in ConnectStorageServer
Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: medium
Reporter: Dafna Ron <dron>
Assignee: Eduardo Warszawski <ewarszaw>
QA Contact: Daniel Paikov <dpaikov>
CC: abaron, aburden, ashoham, bazulay, bsanford, cpelland, danken, dpaikov, hateya, iheim, ilvovsky, jbiddle, lyarwood, mgoldboi, mkrcmari, pvine, ykaul
Target Milestone: rc
Keywords: Regression, ZStream
Whiteboard: storage
Fixed In Version: vdsm-4.9.6-41.0
Doc Type: Bug Fix
Doc Text:
Previously, the getVGInfo call would show only a partial list of LUNs when adding storage domains consisting of more than one LUN. Subsequently, the HSM would log in only to the LUNs returned, and HSM hosts became non-operational. Now, storage domains are added correctly and all LUNs are always returned.
Clones: 896505
Last Closed: 2012-12-04 18:54:24 UTC
Bug Depends On: 788096, 836663
Bug Blocks: 896505
Attachments: logs

Description Dafna Ron 2012-02-29 13:07:47 UTC
Created attachment 566546 [details]
logs

Description of problem:

When adding domains consisting of more than one LUN, the backend sends getVGInfo and the query results show only a partial list of the LUNs.
As a result, the backend sends ConnectStorageServer only for the LUNs returned, and the HSM logs in only to those LUNs.
The HSM hosts then become non-operational, since not all of the LUNs were connected.

Version-Release number of selected component (if applicable):

vdsm-4.9-112.7.el6_2.x86_64
lvm2-2.02.87-6.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, create and attach a new domain consisting of several LUNs.

Actual results:

getVGInfo returns a partial list of the LUNs, so the HSM does not log in to all of the LUNs and the host becomes non-operational.

Expected results:

getVGInfo should return the full list of LUNs.

Additional info: full logs attached

Comment 1 Haim 2012-02-29 13:12:45 UTC
The following script reproduces the problem without the engine.

To run it, replace the LUN list in the createVG command with your own (make sure multipath recognizes the devices):

vdsClient 0 createVG d55e3a64-0615-47ba-a500-4d81cacff52c \
    1avihai_lun1-700G41337842,1avihai_lun2-700G41337843,1avihai_newdc_lun1-800G41 &&
sleep 5 &&
vdsClient 0 createStorageDomain 3 d55e3a64-0615-47ba-a500-4d81cacff52c avihai-domain \
    `vgs d55e3a64-0615-47ba-a500-4d81cacff52c -o+uuid | awk '!/VG/{print $8}'` 1 2 &&
sleep 5 &&
vdsClient 0 getVGInfo \
    `vgs d55e3a64-0615-47ba-a500-4d81cacff52c -o+uuid | awk '!/VG/{print $8}'`
1JOAgI-25y9-BdbI-Zfm1-NaJM-UQYi-XOtxxk

Comment 2 Dafna Ron 2012-02-29 14:27:31 UTC
Haim and Eduardo investigated this issue further, and it seems that it is caused by the lvm cache issue described in bug

https://bugzilla.redhat.com/show_bug.cgi?id=788096

However, we need a workaround in vdsm until the lvm bug is fixed, since currently users cannot create VGs from multiple targets on our hosts without manually logging in to the LUNs.

Comment 3 Bill Sanford 2012-03-16 18:17:03 UTC
I just installed ic155.1 on a RHEL 6.2 GA server and added two hosts. I then added iSCSI storage, and as soon as the SPM was determined, the other host went non-operational.

Both hosts have vdsm-4.9-112.6.el6_2.x86_64 installed.

Comment 4 Bill Sanford 2012-03-16 18:20:09 UTC
Also, the message in RHEV-M was: "Host <hostname> cannot access one of the Storage Domains attached to it, or the Data Center object. Setting the Host state to Non-Operational."

Comment 6 Eduardo Warszawski 2012-07-03 15:32:06 UTC
*** Bug 836720 has been marked as a duplicate of this bug. ***

Comment 7 Eduardo Warszawski 2012-07-03 16:13:22 UTC
In getVGInfo, the output of pvs is parsed.
This output can be inconsistent due to lvm BZ#836663.

When a device list is passed as an argument to pvs and not all of the devices have metadata in use (as in vdsm), the device order changes the pvs output.
(See the examples below.)

Possible solutions:
1) Require an lvm version that includes the BZ#836663 fix for use with vdsm.
2) Select the PV with the mda in use as the first parameter in the vdsm pvs command.
3) Parse the output of vgs -o +pv_name instead of the pvs response (a sketch follows the examples below).

[root@derez ~]# vgs -o pv_name 07622fad-381b-4b1d-b534-f9db364032c2
  PV                                           
  /dev/mapper/3600144f09dbd050000004e1994c60005
  /dev/mapper/3600144f09dbd050000004ddbe989001b
[root@derez ~]# 
[root@derez ~]# pvs -o pv_name,vg_name,pv_attr,pv_mda_used_count /dev/mapper/3600144f09dbd050000004e1994c60005 /dev/mapper/3600144f09dbd050000004ddbe989001b
  PV                                            VG                                   Attr #PMdaUse
  /dev/mapper/3600144f09dbd050000004ddbe989001b 07622fad-381b-4b1d-b534-f9db364032c2 a--         0
  /dev/mapper/3600144f09dbd050000004e1994c60005 07622fad-381b-4b1d-b534-f9db364032c2 a--         2
[root@derez ~]# 
[root@derez ~]# pvs -o pv_name,vg_name,pv_attr,pv_mda_used_count /dev/mapper/3600144f09dbd050000004ddbe989001b /dev/mapper/3600144f09dbd050000004e1994c60005
  PV                                            VG                                   Attr #PMdaUse
  /dev/mapper/3600144f09dbd050000004ddbe989001b                                      a--         0
  /dev/mapper/3600144f09dbd050000004e1994c60005 07622fad-381b-4b1d-b534-f9db364032c2 a--         2
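
A minimal sketch of option 3, assuming the LUN list is built by asking vgs for the PVs of the VG by name rather than by passing a device list (the function name and the "|" separator are choices made for this example, not the actual vdsm code):

import subprocess

def get_vg_pv_names(vg_name):
    """Return the PV device paths of vg_name, as reported by vgs."""
    # Querying by VG name instead of passing a device list means the
    # order of devices on the command line cannot change the result.
    out = subprocess.check_output(
        ["vgs", "--noheadings", "--separator", "|",
         "-o", "vg_name,pv_name", vg_name]).decode()
    pvs = []
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # vgs prints one row per PV; keep rows that belong to the requested VG.
        if len(fields) == 2 and fields[0] == vg_name:
            pvs.append(fields[1])
    return pvs

print(get_vg_pv_names("07622fad-381b-4b1d-b534-f9db364032c2"))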

Comment 18 Eduardo Warszawski 2012-10-17 16:51:10 UTC
Change-Id: I9fdf09a2637f096201f094918654e3f52663bc2d

Comment 20 Daniel Paikov 2012-11-06 17:01:53 UTC
Checked on si24.

Comment 22 errata-xmlrpc 2012-12-04 18:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

Comment 23 Allon Mureinik 2013-01-20 20:48:52 UTC
*** Bug 896505 has been marked as a duplicate of this bug. ***