Bug 1269338

Summary: RHEV hypervisor installation fails if any LUN mapped to the server contains a partial volume group
Product: Red Hat Enterprise Virtualization Manager Reporter: nijin ashok <nashok>
Component: rhev-hypervisor-ng    Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED ERRATA QA Contact: cshao <cshao>
Severity: medium Docs Contact:
Priority: high    
Version: 3.5.4    CC: cshao, dfediuck, fdeutsch, lsurette, melewis, mgoldboi, nashok, ycui, ykaul
Target Milestone: ovirt-4.0.0-beta    Flags: cshao: testing_plan_complete+
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, assigning disks with partial volume groups to the Red Hat Virtualization Host (RHVH) caused an exception in the installer because LVM returned non-zero exit codes. Now, the -P flag is passed to LVM so that the installer attempts to work around errors related to partial volume groups. It is still advised to clean those LUNs or provide complete volume groups.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-23 21:04:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Failed-3luns.png (flags: none)

Description nijin ashok 2015-10-07 04:36:51 UTC
Description of problem:

The installation of the RHEV hypervisor fails if any of the LUNs mapped to the server contains a partial volume group. The installation fails with the error "couldn't find any device with uuid".


Version-Release number of selected component (if applicable):
rhevh-6.7-20150911.0.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Map a LUN that contains a partial volume group to the server (a minimal way to set this up is sketched below)
2. Install the RHEV hypervisor on any disk other than the LUN that contains the partial volume group
3. The installation will fail just after allocating the space.
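
For illustration, a partial volume group can be produced with something like the following; the device names /dev/sdb and /dev/sdc and the VG/LV names are placeholders, not taken from this report:

# pvcreate /dev/sdb /dev/sdc
# vgcreate testVG /dev/sdb /dev/sdc
# lvcreate -L 3000 -n testlv testVG

Then unmap, unplug, or blacklist /dev/sdb so that testVG is missing one PV, and start the RHEV-H installation on a different disk.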

Actual results:
Installation fails if any mapped LUN contains a partial volume group


Expected results:
Installation should succeed

Additional info:

Comment 3 Ryan Barry 2015-10-07 15:08:35 UTC
(In reply to nijin ashok from comment #0)
> Description of problem:
> 
> The installation of RHEV hypervisor is getting failed if any of the LUN
> which is mapped to the server contains a partial volume group. The
> installation failed with error "couldn't find any device with uuid".
> 
> 
> Version-Release number of selected component (if applicable):
> rhevh-6.7-20150911.0.el6ev
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. Map a LUN to the server which is having a partial volume group
> 2. Install the RHEV hypervisor in any of the disk other than the LUN which
> contains partial volume group
> 3. Installation will fail just after allocating the space.
> 
> Actual results:
> Installation fails if the mapped LUN contains partial volume group
> 
> 
> Expected results:
> Installation should succeed
> 
> Additional info:

After reading the case and the sosreport, the failure to install is unrelated to any number of devices in the /dev directory.

I'll provide a "best effort" patch which attempts to use -P on nondestructive LVM commands (vgs, vgdisplay, pvdisplay, etc) and does not throw an exception if there's a non-zero exit code in this case.
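
Roughly, the difference looks like this on a host with a partial VG (testVG is a placeholder name; exact messages and exit codes vary by LVM version):

# vgs testVG; echo "rc=$?"
  Couldn't find device with uuid <...>.
  rc=5
# vgs -P testVG; echo "rc=$?"
  PARTIAL MODE. Incomplete logical volumes will be processed.
  rc=0

The patch only applies -P to read-only reporting commands; destructive operations are left unchanged.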

However, since removing/masking the problem LUNs from the customer's system resolves it, there is a clear fix.

I am very hesitant to ignore LVM warnings in potentially destructive cases elsewhere in the installer (removing vgs/pvs/lvs, creating them), and no changes will be made there. If LVM also warns about not being able to find a device in those cases and the installer catches the exception, the recommended solution will be to mask or unmap the problematic LUNs, since any changes there could lead to a destructive overwrite of data the customer does care about.

If it continues to throw exceptions and a behavior change is desired, I'd suggest you open a bug against LVM, since RHEV-H cannot be expected to know the return codes, risks, and behaviors of all the LVM utilities, what is dangerous, and what is simply a warning.

Comment 7 cshao 2015-12-03 10:32:11 UTC
I can't reproduce this issue.

Test version:
rhevh-6.7-20150911.0.el6ev + ovirt-node-3.2.3-20.el6.noarch
rhev-hypervisor6-6.7-20150828.0  +  ovirt-node-3.2.3-20.el6.noarch

Test machine:
multipath fc machine: dell-per510-01

Test scenario 1- auto install.
Test steps:
1. Install RHEV-H on lun 1.
2. Register to RHEV-M
3. Add fc/iscsi storage (lun 2)
4. Reboot
5. Auto install RHEV-H on lun 1.

Test result:
It reports "Device specified in storage_init does not exist" and the automatic installation fails.

Test scenario 2 - tui install.
Test steps:
1. TUI Install RHEV-H on lun 1.
2. Register to RHEV-M
3. Add fc/iscsi storage (lun 2)
4. Reboot
5. TUI install RHEV-H on lun 1.


Test result:
Installation succeeds.


Test scenario 3 - manually create a PV.
Test steps:
1. TUI Install RHEV-H on lun 1.
2. Register to RHEV-M
3. Add fc/iscsi storage (lun 2).
4. Manually create a PV on lun 3.
5. Reboot
6. Auto install RHEV-H on lun 1.


Test result:
It reports "Device specified in storage_init does not exist" and the automatic installation fails.



Hi nashok,

could you please provide detailed test steps to reproduce this bug?

Thanks!

Comment 8 Fabian Deutsch 2015-12-03 11:20:42 UTC
To my understanding the bug should appear in the following setup:

1. Get a SAN with at least 3 LUNs
2. Create a VG (using RHEL or any other distro) over the first two LUNs (both LUNs become PVs)
3. Try to install RHEV-H on LUN 3

During step 4 the RHEV-H Installer will raise an exception

Comment 9 Fabian Deutsch 2015-12-03 11:21:04 UTC
Instead of a SAN you can also use a host with three (3) disks

Comment 10 cshao 2015-12-04 04:42:27 UTC
Yes, the multipath FC machine has just 3 LUNs.

[root@unused admin]# multipath -ll
360050763008084e6e000000000000058 dm-7 IBM,2145
size=100G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:0:3 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 5:0:1:3 sdg 8:96 active ready running
360050763008084e6e000000000000057 dm-6 IBM,2145
size=100G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:1:2 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 5:0:0:2 sdc 8:32 active ready running
36782bcb03cdfa200174636ff055184dc dm-1 DELL,PERC 6/i
size=544G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 4:2:0:0 sda 8:0  active ready running
360050763008084e6e000000000000056 dm-0 IBM,2145
size=200G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 5:0:0:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 5:0:1:1 sde 8:64 active ready running

[root@unused admin]# pvs
  PV                                              VG                                   Fmt  Attr PSize   PFree 
  /dev/mapper/360050763008084e6e000000000000056p4 HostVG                               lvm2 a--  199.28g 28.00m
  /dev/mapper/360050763008084e6e000000000000057   988c2726-07b7-43bb-adec-f421f7fb3969 lvm2 a--   99.62g 95.75g
[root@unused admin]# vgs
  VG                                   #PV #LV #SN Attr   VSize   VFree 
  988c2726-07b7-43bb-adec-f421f7fb3969   1   6   0 wz--n-  99.62g 95.75g
  HostVG                                 1   4   0 wz--n- 199.28g 28.00m

Test steps:
1. Get a SAN with at least 3 LUNs
2. Create a VG (using RHEL or any other distro) over the first two LUNs (both LUNs become PVs) (360050763008084e6e000000000000056+360050763008084e6e000000000000057)
3. Try to auto install RHEV-H on LUN 3(360050763008084e6e000000000000058)

Test result:
Auto install failed; it reports "HostVG/APPVG exists on separate disk", manual intervention required.

Comment 11 cshao 2015-12-04 04:43:58 UTC
Created attachment 1102105 [details]
Failed-3luns.png

Comment 12 Fabian Deutsch 2015-12-04 12:06:48 UTC
My test steps were wrong; they should have been:

1. Get a SAN with at least 3 LUNs
2. Create a VG (using RHEL or any other distro) over the first two LUNs (both LUNs become PVs)
3. (NEW STEP) Blacklist/remove/disconnect one of the first two LUNs to make the VG lose one of its PVs
4. Try to install RHEV-H on LUN 3

During step 4 the RHEV-H Installer will raise an exception
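
If physically unplugging a LUN is impractical, one way to simulate step 3 is to drop the SCSI paths of that LUN via sysfs; sdX below is a placeholder for a path belonging to one of the VG's PVs, and with multipath every path of that LUN has to be removed:

# echo 1 > /sys/block/sdX/device/delete
# vgs    # the VG should now warn about a missing PV and show the partial 'p' attribute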

Comment 13 cshao 2015-12-08 11:26:30 UTC
(In reply to Fabian Deutsch from comment #12)
> My test steps were wrong, they should have been:
> 
> 1. Get a SAN with at least 3 LUNs
> 2. Create a VG (using RHEL or any other distro) over the first two LUNs
> (both LUNs become PVs)
> 3. (NEW STEP) Blacklist/remove/disconnect one of the first two LUNs to make
> the VG losse one of it's PVs
> 4. Try to install RHEV-H on LUN 3
> 
> During step 4 the RHEV-H Installer will raise an exception

Still can't reproduce this issue with the above steps.

It still reports "HostVG/APPVG exists on separate disk", manual intervention required.

Comment 14 Fabian Deutsch 2015-12-08 16:38:35 UTC
Please try it with a VG name different than HostVG.

Comment 15 cshao 2015-12-10 08:25:00 UTC
(In reply to Fabian Deutsch from comment #14)
> Please try it with a VG name different than HostVG.

Reproduced it now.

Test version:
rhevh-6.7-20150911.0.el6ev
ovirt-node-3.2.3-20.el6.noarch

Test steps:
1. Insert a USB disk into the local machine.
2. Create a VG over the two disks (disk 1 + disk 2):
# pvcreate /dev/sdb; pvcreate /dev/sdc
# vgcreate testVG /dev/sdb /dev/sdc
# lvcreate -L 3000 -n testlv testVG

3. Pull one of the disks (disk 1) to make the VG lose one of its PVs (the partial state can be confirmed as sketched below).
4. Try to install RHEV-H on disk 2.
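
As a sanity check after step 3, the partial state can be confirmed with a plain reporting command (output abbreviated; exact wording varies by LVM version, and the 'p' in the attribute string marks the VG as partial):

# vgs -o vg_name,pv_count,vg_attr testVG
  Couldn't find device with uuid <...>.
  VG     #PV Attr
  testVG   2 wz-pn-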

Thanks!

Comment 17 Fabian Deutsch 2015-12-23 13:49:21 UTC
This is non-critical, thus moving it out to a z-stream

Comment 21 Fabian Deutsch 2016-05-17 09:26:43 UTC
Chen,

could you please check how anaconda behaves with partial volume groups?

Comment 22 cshao 2016-05-18 10:22:09 UTC
OK, I will find time to trace this issue.

Comment 23 cshao 2016-05-19 11:07:13 UTC
(In reply to Fabian Deutsch from comment #21)
> Chen,
> 
> could you please check how anaconda behaves with partial volume groups?


Anaconda can handle this correctly.

Test version:
rhev-hypervisor7-ng-3.6-20160518.0
imgbased-0.6-0.1.el7ev.noarch


Test steps:
1. Insert a USB disk into the local machine.
2. Anaconda interactive install NGN.
3. Press Ctrl+Alt+F2 to enter shell mode
4. Create a VG over the two disks (disk 1 + disk 2)
# pvcreate /dev/sdb;  pvcreate /dev/sdc; 
# vgcreate testVG /dev/sdb /dev/sdc
# lvcreate -L3000 testVG
5. Pull one of the disks (disk 1) to make the VG lose one of its PVs
6. Anaconda interactive install NGN on disk 2.

Test result:
Anaconda interactive install of NGN on disk 2 succeeds.

Comment 24 Fabian Deutsch 2016-05-19 11:22:10 UTC
Does anaconda raise a warning or the like?

In any case, I am moving this bug to ON_QA according to comment 23

Comment 25 cshao 2016-05-20 01:27:35 UTC
(In reply to Fabian Deutsch from comment #24)
> Does anaconda raise a warning or alike?
> 
> At least I am moving this bug to ON_QA according to comment 23

No warning; anaconda removes all partial volume groups smoothly.

Comment 27 cshao 2016-05-31 06:03:17 UTC
Test version:
rhev-hypervisor7-ng-4.0-20160527.0 
imgbased-0.7.0-0.1.el7ev.noarch


Test steps:
1. Insert a USB disk into the local machine.
2. Anaconda interactive install NGN.
3. Press Ctrl+Alt+F2 to enter shell mode
4. Create a VG over the two disks (disk 1 + disk 2)
# pvcreate /dev/sdb;  pvcreate /dev/sdc; 
# vgcreate testVG /dev/sdb /dev/sdc
# lvcreate -L3000 testVG
5. Pull one of the disks (disk 1) to make the VG lose one of its PVs
6. Anaconda interactive install NGN on disk 2.

Test result:
Anaconda interactive install of NGN on disk 2 succeeds.

So the bug is fixed; changing bug status to VERIFIED.

Comment 29 errata-xmlrpc 2016-08-23 21:04:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1702.html