Bug 1204212

Summary: Half of the paths to HostVG lost post-upgrade of the rhev-h host
Product: Red Hat Enterprise Virtualization Manager
Reporter: akotov
Component: ovirt-node
Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: cshao <cshao>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.0
CC: amureini, bmarzins, cshao, ecohen, fdeutsch, gklein, jbuchta, lsurette, mgoldboi, pstehlik, rhodain, ycui, yeylon
Target Milestone: ovirt-3.6.2
Keywords: OtherQA
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: node
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-02 06:37:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1051742, 1235965
Bug Blocks:

Description akotov 2015-03-20 15:37:22 UTC
Description of problem:

After performing the standard upgrade procedure for a RHEV-H host, 2 paths out of 4 were missing. Neither a reboot nor scsi-rescan commands were able to reveal the missing paths.
 
AFTER UPGRADE PROCEDURE:

Mar 18 10:20:47 | *word = A, len = 1
Mar 18 10:20:47 | *word = 0, len = 1
360060e8007e2b9000030e2b90000xxxx dm-8 HITACHI,OPEN-V
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 0:0:0:0  sda  8:0     active ready running
  `- 1:0:0:0  sdk  8:160   active ready running
Mar 18 10:20:47 | params = 1 queue_if_no_path 0 1 1 round-robin 0 4 1 65:96 1 130:128 1 69:80 1 134:112 1 

AFTER CLEAN REINSTALL OF SAME HOST:

360060e8007e2b9000030e2b90000xxxx dm-15 HITACHI,OPEN-V
size=500G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 1:0:0:0  sda  8:0     active ready running
  |- 4:0:0:0  sdeg 128:128 active ready running
  |- 0:0:0:0  sdjw 65:416  active ready running
  `- 5:0:0:0  sdeq 129:32  active ready running



Version-Release number of selected component (if applicable):

Red Hat Enterprise Virtualization Hypervisor 6.6 (20150128.0.el6ev)

How reproducible:

Unknown; we do not have the hardware for a reproducer yet.

Steps to Reproduce:
1. Upgrade rhev-h

Actual results:

2 paths for HostVG

Expected results:

4 paths for HostVG

Additional info:

A noticeable difference is the addition of the mpath.wwid parameter to the grub configuration after the clean reinstall was complete:

mpath.wwid=360060e8007e2b9000030e2b90000xxxx
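As a hedged sketch of what the manual fix would look like (appending the missing mpath.wwid argument to the kernel command line when it is absent), here is an illustrative helper. The function name is an assumption for illustration; on a real RHEV-H host the change has to land in the persisted grub configuration, not a plain string.

```shell
#!/bin/sh
# Hypothetical helper: make sure a given mpath.wwid=... argument is present
# in a kernel command-line string, appending it only if missing (idempotent).
# The WWID below is the (partially masked) one from this bug.
ensure_mpath_wwid() {
    cmdline="$1"
    wwid="$2"
    case "$cmdline" in
        *"mpath.wwid=$wwid"*) printf '%s\n' "$cmdline" ;;
        *) printf '%s mpath.wwid=%s\n' "$cmdline" "$wwid" ;;
    esac
}

ensure_mpath_wwid "ro root=live:LABEL=Root rd_NO_LUKS" \
    360060e8007e2b9000030e2b90000xxxx
```

Running the helper a second time on its own output leaves the command line unchanged, which matters when the same script runs on every upgrade.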

Comment 4 Fabian Deutsch 2015-04-01 14:05:29 UTC
The post-upgrade sosreport from comment 1 only shows messages from a RHEV-H 6.5 runtime.

It would be necessary to get the logs from the RHEV-H 6.6 runtime, after the upgrade.

Thanks to Ben's findings, it is quite clear that the mpath.wwid= argument is missing from the kernel command line.

This argument is normally added when a user upgrades from 6.5 to 6.6.

In case the line is missing it is a bug, but it can be fixed manually.

Alexander, can you provide the logs after the upgrade to 6.6?

Comment 5 akotov 2015-04-06 09:35:11 UTC
Fabian, I see that the first sosreport is from the 6.6 runtime:

[cash@dhcp-26-166 mtafscpq2800bh01-2015031810181426673917]$ cat etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150128.0.el6ev)


$ cat etc/multipath/wwids  | grep 360060e8007e2b9000030e2b900001003
/360060e8007e2b9000030e2b900001003/
[cash@dhcp-26-166 mtafscpq2800bh01-2015031810181426673917]$ pwd
/home/cash/sos/01385391/mtafscpq2800bh01-2015031810181426673917

I also want to add that the customer tried to add the boot LUN (360060e8007e2b9000030e2b900001003) back into /etc/multipath/wwids - it was missing after the upgrade - cleared the LVM cache (/etc/lvm/cache/.cache), and rebooted the hypervisor. That did not resolve the issue; only a fresh reinstall and adding the wwid to the kernel cmdline fixed it.
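For illustration, the manual step the customer attempted amounts to the following sketch: the wwids file stores one /WWID/ line per device, so the fix is to append that line if it is missing. The function name is hypothetical, and the demo writes to a scratch file rather than the live /etc/multipath/wwids.

```shell
#!/bin/sh
# Illustrative sketch: ensure a WWID is listed in a multipath wwids file.
# Entries in the file use the /WWID/ format, one per line.
add_wwid() {
    wwids_file="$1"
    wwid="$2"
    # Already present? Nothing to do.
    grep -q "^/$wwid/\$" "$wwids_file" 2>/dev/null && return 0
    printf '/%s/\n' "$wwid" >> "$wwids_file"
}

# Demonstrate against a scratch copy, not the live /etc/multipath/wwids.
tmp=$(mktemp)
add_wwid "$tmp" 360060e8007e2b9000030e2b900001003
grep 360060e8007e2b9000030e2b900001003 "$tmp"
rm -f "$tmp"
```

As the next comment explains, on this system the edit alone is not sufficient, because the initramfs carries its own copy of the file.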

Comment 6 Ben Marzinski 2015-04-08 20:21:37 UTC
(In reply to akotov from comment #5)
> I also want to add, that customer tried to add boot LUN 
> (360060e8007e2b9000030e2b900001003) back into /etc/multipath/wwids - it was
> missing after the upgrade, cleared the LVM cache (/etc/lvm/cache/.cache),
> and rebooted the hypervisor. It did not resolve the issue, only fresh
> reinstall and adding wwid to kernel cmdline fixed it.

The issue with adding it to the wwids file is that if those paths are claimed in the initramfs, and the wwids file there doesn't have it, it won't help.  You would have to either remake the initramfs, or edit the kernel cmdline to make the wwid appear in the initramfs.

Comment 18 Ying Cui 2015-11-24 08:27:41 UTC
Chen, could we check this bug description and try to reproduce this issue on our test env.?

Comment 19 cshao 2015-11-27 02:47:04 UTC
(In reply to Ying Cui from comment #18)
> Chen, could we check this bug description and try to reproduce this issue on
> our test env.?

Hi ycui, 

 It seems the machine that has NetApp FC Storage (2 paths) + an Emulex HBA can't be accessed now. I will send a ticket to the admins to ask them to fix this asap, and then I will try to reproduce this issue on our test env.

Thanks!

Comment 20 Yaniv Kaul 2015-11-29 11:58:52 UTC
Still needinfo on QE to reproduce.

Comment 21 cshao 2015-11-30 07:06:38 UTC
Still can't reproduce this issue in the VIRT-QE env.

Test version:
RHEV-H 6.5 20150115 + ovirt-node-3.0.1-19.el6_5.18.noarch 
RHEV-H 6.6 20150128.0.el6ev + ovirt-node-3.2.1-6.el6.noarch

Test machine:
dell-per510-01 multipath FC
NetApp FC Storage(2 paths) + Emulex HBA.


Test steps:
1. Install RHEV-H 6.5 20150115.
2. Upgrade to RHEV-H 6.6 20150128.0.el6ev
3. Check all paths

Test result:
Before upgrade
# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor release 6.5 (20150115.0.el6ev)
[root@unused admin]# multipath -ll
360050763008084e6e000000000000058 dm-1 IBM,2145
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:0:1:3 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:0:0:3 sdd 8:48 active ready running
360050763008084e6e000000000000057 dm-0 IBM,2145
size=100G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:0:0:2 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:0:1:2 sdf 8:80 active ready running
36782bcb03cdfa200174636ff055184dc dm-7 DELL,PERC 6/i
size=544G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 0:2:0:0 sda 8:0  active ready running
360050763008084e6e000000000000056 dm-2 IBM,2145
size=200G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:0:1:1 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:0:0:1 sdb 8:16 active ready running


After upgrade
# cat /etc/redhat-release 
Red Hat Enterprise Virtualization Hypervisor 6.6 (20150128.0.el6ev) 
[root@unused admin]# 
[root@unused admin]# multipath -ll
360050763008084e6e000000000000058 dm-1 IBM,2145
size=100G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:1:3 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:0:3 sdc 8:32 active ready running
360050763008084e6e000000000000057 dm-4 IBM,2145
size=100G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:2 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:1:2 sde 8:64 active ready running
36782bcb03cdfa200174636ff055184dc dm-12 DELL,PERC 6/i
size=544G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 2:2:0:0 sdg 8:96 active ready running
360050763008084e6e000000000000056 dm-0 IBM,2145
size=200G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:1:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:0:1 sda 8:0  active ready running