Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1450831

Summary: Failed to upgrade RHVH host on RHVM side
Product: [oVirt] ovirt-node
Reporter: Qin Yuan <qiyuan>
Component: Installation & Update
Assignee: Ryan Barry <rbarry>
Status: CLOSED CURRENTRELEASE
QA Contact: Huijuan Zhao <huzhao>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 4.1
CC: bugs, cshao, dguo, huzhao, jiawu, mgoldboi, mperina, pbrilla, qiyuan, rbarry, sbonazzo, stirabos, weiwang, yaniwang, ycui, yzhao
Target Milestone: ovirt-4.1.3
Flags: rule-engine: ovirt-4.1+
       rule-engine: blocker+
       huzhao: testing_plan_complete+
       rbarry: devel_ack+
       cshao: testing_ack+
Target Release: 4.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: imgbased-0.9.27-0.1.el7ev
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-06 14:04:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1455667
Bug Blocks:

Attachments:
  /var/log*,/tmp/* (flags: none)
  Comment 1: All logs in /var/log, /tmp and sosreport from host (flags: none)
  comment 1: log from engine (flags: none)

Description Qin Yuan 2017-05-15 09:15:05 UTC
Created attachment 1278892 [details]
/var/log*,/tmp/*

Description of problem:
On the RHVM side, upgrading a RHVH host failed. The following error messages appear on RHVM:

Failed to upgrade Host atu_amd (User: admin@internal-authz).
Failed to install Host atu_amd. Processing stopped due to timeout.

Tested on 3 Dell machines: dell-per510-01 (multipath FC), dell-pet105-01 (single-path iSCSI), and dell-per515-01 (multipath iSCSI); all of them failed for the same reason.
On the ibm-3650m5-04 machine, however, upgrading the host from the RHVM side succeeded.


Version-Release number of selected component (if applicable):
From:
redhat-virtualization-host-4.1-20170421.0
To:
redhat-virtualization-host-4.1-20170506.0


How reproducible:
Always reproducible on the Dell machines; the issue does not occur on ibm-3650m5-04.


Steps to Reproduce:
1. Install redhat-virtualization-host-4.1-20170421.0 on a Dell machine, such as dell-per515-01.
2. Set up a local repo that provides the rhvh-4.1-20170506.0 update (a sample repo file is sketched after these steps).
3. Add the host to RHVM.
4. Click "Check for Upgrade" on RHVM.
5. Click "Upgrade" on RHVM when it becomes available.


Actual results:
1. After step 5, the following failure messages appear on RHVM:
Failed to upgrade Host atu_amd (User: admin@internal-authz).
Failed to install Host atu_amd. Processing stopped due to timeout.


Expected results:
1. After step 5, upgrading the RHVH host should succeed.


Additional info:
1. For the Dell machines, upgrading the host to redhat-virtualization-host-4.1-20170421.0 from the RHVM side succeeds.

2. For the Dell machines, upgrading the host to redhat-virtualization-host-4.1-20170506.0 directly on the host side, using "yum update", succeeds (a rough sketch of that host-side procedure follows).
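The host-side upgrade in item 2 amounts to roughly the following, run as root (a minimal sketch; the explicit image-update package name and the post-reboot checks are assumptions based on typical RHVH upgrades, not taken from this bug's logs):

  # yum update redhat-virtualization-host-image-update
  # reboot
  (after the host boots into the new image layer)
  # imgbase layout
  # nodectl info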

Comment 1 Huijuan Zhao 2017-05-19 09:18:35 UTC
Still encountered this issue with imgbased-0.9.26-0.1.el7ev on the dell-per515-01 machine (multipath iSCSI).

Test version:
From:
redhat-virtualization-host-4.1-20170421.0
To:
redhat-virtualization-host-4.1-20170518.0
imgbased-0.9.26-0.1.el7ev.noarch

Test steps:
Same as comment 0

Actual results:
1. After step 5, the following failure messages appear on RHVM:
Failed to upgrade Host dell-515-01 (User: admin@internal-authz).
Failed to install Host dell-515-01. Processing stopped due to timeout.

Expected results:
1. After step 5, upgrading the RHVH host should succeed.


Additional info:
No such issue on the dell-optiplex-9010 machine (local disk).


So this issue is not fixed in imgbased-0.9.26-0.1.el7ev.noarch; change the status to ASSIGNED.

Comment 3 Red Hat Bugzilla Rules Engine 2017-05-19 09:19:47 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 4 Huijuan Zhao 2017-05-19 09:25:06 UTC
Created attachment 1280342 [details]
Comment 1: All logs in /var/log, /tmp and sosreport from host

Comment 5 Huijuan Zhao 2017-05-19 09:26:27 UTC
Created attachment 1280343 [details]
comment 1: log from engine

Comment 6 Huijuan Zhao 2017-05-23 12:15:34 UTC
Still encountered this issue with imgbased-0.9.27-0.1.el7ev, but with lower probability compared to imgbased-0.9.26-0.1.el7ev.noarch.


Test version:
From:
redhat-virtualization-host-4.1-20170421.0
To:
redhat-virtualization-host-4.1-20170522.0
imgbased-0.9.27-0.1.el7ev.noarch


Test steps:
Same as comment 0.


Actual results:
1. The issue is more likely to reproduce when RHVH is installed onto more than 3 iSCSI/FC LUNs.
2. Cannot reproduce this issue when RHVH is installed onto only 1 iSCSI/FC LUN (200GB).
3. Cannot reproduce this issue when RHVH is installed onto 1 local disk (1TB SATA or 600GB SAS).


So this bug is not completely fixed in imgbased-0.9.27-0.1.el7ev.noarch; change the status to ASSIGNED.

Comment 7 Red Hat Bugzilla Rules Engine 2017-05-23 12:15:43 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 9 Huijuan Zhao 2017-06-14 10:43:48 UTC
Test version:
From:
redhat-virtualization-host-4.1-20170421.0
To:
redhat-virtualization-host-4.1-20170609.2
imgbased-0.9.31-0.1.el7ev.noarch


Test steps:
Same as comment 0.


Test results: 
1. Upgrade succeeds on a machine with 3 FC disks (300GB, 150GB, 150GB).

2. Upgrade fails, the same as in comment 0, on a machine with 21 iSCSI disks (200GB, 100GB*20) and 1 local disk (3TB).
   This machine needs about 30 minutes to complete the upgrade process via "# yum update". Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1368420#c5.


So for test result 2, it seems that more disks require a longer upgrade time, and the upgrade fails.


Ryan, I am not sure whether QE should verify this bug. Thanks!
The test ENVs have already been sent to you via email.

Comment 10 Huijuan Zhao 2017-06-26 07:07:22 UTC
According to comment 9, there are still failing scenarios, so QE cannot verify this bug currently; waiting for Dev's view.

So change the status to MODIFIED.

Comment 11 Ryan Barry 2017-06-26 11:03:50 UTC
From my point of view, this can be VERIFIED, since systems with a large number of disks are essentially a separate bug around Anaconda's "put all disks into one VG when autopart is used" logic.

Even with a number of large, time-consuming operations added to imgbased update, we still succeed on one disk. imgbased cannot solve mkfs taking ~20 minutes.
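(To illustrate the autopart point: whether an installation ended up with every disk in a single volume group can be checked with standard LVM tools. The sketch below is illustrative only; the volume group name on RHVH is typically the "onn" group, but neither it nor any output is taken from this bug's logs.)

  # pvs
  # vgs -o vg_name,pv_count,vg_size

A pv_count greater than 1 for the RHVH volume group means autopart spread it across multiple disks/LUNs, which is the slow-upgrade case described above.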

Comment 12 Huijuan Zhao 2017-06-27 05:27:57 UTC
(In reply to Ryan Barry from comment #11)
> From my point of view, this can be VERIFIED, since systems with large number
> of disks are essentially a separate bug around Anaconda's "put all disks
> into one VG when autopart is used" logic.
> 
> Even with a number of large, time-consuming operations added to imgbased
> update, we still succeed on one disk. imgbased cannot solve mkfs taking ~20
> minutes

Thanks Ryan. After discussion within the QE team, the issue of a large number of disks consuming too much time during upgrade is tracked by Bug 1461457.

So according to comment 9 and comment 11, change the status to VERIFIED.