Bug 1368329 - Update takes very long, triggering a second update fails
Summary: Update takes very long, triggering a second update fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-node
Classification: oVirt
Component: Installation & Update
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.0-rc
Target Release: 4.1
Assignee: Douglas Schilling Landgraf
QA Contact: Huijuan Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-19 05:54 UTC by Roman Hodain
Modified: 2017-02-15 15:03 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-15 15:03:57 UTC
oVirt Team: Node
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
rule-engine: devel_ack+
cshao: testing_ack+


Attachments
First failure (70.49 KB, text/plain), 2016-08-19 05:57 UTC, Roman Hodain
Second failure (183.16 KB, text/plain), 2016-08-19 05:58 UTC, Roman Hodain

Description Roman Hodain 2016-08-19 05:54:59 UTC
Description of problem:
When clicking the Upgrade button, the host is switched to maintenance mode and the installation fails.

Version-Release number of selected component (if applicable):
from       RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso
to         redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch

How reproducible:
Tried just once

Steps to Reproduce:
1. Click the Upgrade button in the UI

Actual results:
The hypervisor is marked as "Installation Failed"
2016-08-19 05:27:35 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-zuExrQSj1l/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-zuExrQSj1l/otopi-plugins/otopi/packagers/yumpackager.py", line 216, in _setup
    with self._miniyum.transaction():
  File "/tmp/ovirt-zuExrQSj1l/pythonlib/otopi/miniyum.py", line 336, in __enter__
    self._managed.beginTransaction()
  File "/tmp/ovirt-zuExrQSj1l/pythonlib/otopi/miniyum.py", line 719, in beginTransaction
    self._yb.doLock()
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 2208, in doLock
    raise Errors.LockError(0, msg, oldpid)
LockError: Existing lock /var/run/yum.pid: another copy is running as pid 947.
2016-08-19 05:27:35 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Environment setup': Existing lock /var/run/yum.pid: another copy is running as pid 947.

The installation continues by downloading the image and then times out on:
    Installing Host 10.34.84.222. Yum obsoleting: 1/2: redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch.

Expected results:
The system is upgraded and switched to the Up state

Additional info:
The first failure is in ovirt-host-mgmt-20160819090742-10.34.84.222-adc53fa.log, the second in ovirt-host-mgmt-20160819052734-wyt41r.log.
I checked whether vdsm was running after the first failure, and it was not.
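
The process holding the lock can be confirmed by reading /var/run/yum.pid (the lock file named in the LockError above) and checking whether that pid is still alive. A minimal Python sketch, assuming that standard lock location:

import os

# Path named in the LockError above; the standard yum lock file location.
YUM_PID_FILE = "/var/run/yum.pid"

def yum_lock_holder():
    """Return the pid holding the yum lock, or None if there is no live holder."""
    try:
        with open(YUM_PID_FILE) as f:
            pid = int(f.read().strip())
    except (IOError, ValueError):
        return None  # no lock file, or unreadable contents
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except OSError:
        return None  # stale lock file, the process is already gone
    return pid

if __name__ == "__main__":
    holder = yum_lock_holder()
    if holder:
        print("yum lock held by pid %d" % holder)
    else:
        print("no live yum lock holder")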

Comment 1 Roman Hodain 2016-08-19 05:57:08 UTC
Created attachment 1192033 [details]
First failure

Comment 2 Roman Hodain 2016-08-19 05:58:55 UTC
Created attachment 1192035 [details]
Second failure

Comment 3 Fabian Deutsch 2016-08-19 07:49:29 UTC
The RC was the first build which allowed upgrades, thus it is expected that upgrades from RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso to anything do not work.

However, the traceback looks unrelated to this fact.

Sandro, any idea?

Comment 4 Fabian Deutsch 2016-08-19 10:27:34 UTC
It might be that a previously started upgrade is still running.
Trying to trigger a second update causes this bug.

The problem is that Node updates take quite a long time.

Comment 5 Sandro Bonazzola 2016-08-23 13:59:24 UTC
(In reply to Fabian Deutsch from comment #3)
> Sandro, any idea?

Nothing more than what you suggested on comment #4 after our discussion.

Comment 6 Fabian Deutsch 2016-08-24 09:25:38 UTC
Moran, we discussed this: we can improve the update speed on the Node side (bug 1368420), but we could also consider making updates a long-running, async task; that, however, would be a change on the engine side.

Where should it be moved to?

Comment 7 Huijuan Zhao 2016-12-30 06:44:46 UTC
I upgraded RHVH (same versions as in comment 0) from the RHVM side with the latest RHVM, and did not hit this issue. However, after the upgrade succeeded, the RHVH status was "Maintenance" in the RHVM UI.

Test version:
1. RHVH
from       RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso
to         redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch
2. RHVM
Red Hat Virtualization Manager Version: 4.0.6.3-0.1.el7ev

Test steps:
1. Install RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso
2. Setup local repo in RHVH, and add RHVH to RHVM
3. Click "Upgrade" button in RHVM UI

Test results:
After step 3, the upgrade succeeded.
However, the RHVH status was "Maintenance" in the RHVM UI. After clicking the "Activate" button, RHVH came up successfully on the RHVM side.

Comment 8 Huijuan Zhao 2016-12-30 08:14:46 UTC
Fabian, please review comment 7; the test results differ from comment 0.
Could you please help to check whether this bug has been partly fixed on the RHVM side? There is still a small issue, listed in comment 7.

Comment 9 Fabian Deutsch 2017-01-03 09:16:49 UTC
From the data we have it looks like the root cause is that the rpm we download for the upgrade is large, and it takes a long time to download (and install) it.

To reproduce this you need to reduce the bandwidth between the host and the repository (rhn).

For comment 7 you could try in step 3
a) subscribe to CDN
OR
b) use a traffic shaping technique to reduce the bandwidth, to simulate a slow internet connection (see the sketch below)
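
A minimal sketch of option b), wrapping the standard tc token bucket filter from Python; the interface name eth0 and the rate/burst/latency values are only illustrative assumptions:

import subprocess

# Assumption: the interface is eth0; 1mbit/32kbit/400ms are example values only.
IFACE = "eth0"

def limit_bandwidth(rate="1mbit", burst="32kbit", latency="400ms"):
    """Attach a token bucket filter to throttle traffic leaving IFACE."""
    subprocess.check_call([
        "tc", "qdisc", "add", "dev", IFACE, "root",
        "tbf", "rate", rate, "burst", burst, "latency", latency,
    ])

def restore_bandwidth():
    """Remove the throttling qdisc again once the upgrade test is done."""
    subprocess.check_call(["tc", "qdisc", "del", "dev", IFACE, "root"])

if __name__ == "__main__":
    limit_bandwidth()
    # ... click Upgrade in RHVM while the link is throttled ...
    # restore_bandwidth()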

Comment 10 cshao 2017-02-07 03:20:35 UTC
Huzhao,

Please have a try according #c9.
Thanks.

Comment 11 Douglas Schilling Landgraf 2017-02-07 06:28:12 UTC
(In reply to shaochen from comment #10)
> Huzhao,
> 
> Please have a try according #c9.
> Thanks.

For the record only:
I have been trying to reproduce this report and have been unable to so far. I will try a few more times.

Comment 12 Douglas Schilling Landgraf 2017-02-07 23:19:36 UTC
I followed the available RHEVM upgrade process to try to reproduce this report, and it looks like at this moment we have a good locking check in place.

#1) I have tried to simulate this race with a REST API script calling upgrade several times during an upgrade task (the behavior on low or high network speed should be similar).

Every time I got the error below, which makes sense (no upgrade is possible while an upgrade is in progress):

host id 92c62349-fdee-466d-9081-aea88274b66a
Connecting to: https://192.168.122.5:443/ovirt-engine/api/hosts/92c62349-fdee-466d-9081-aea88274b66a/upgrade
HTTP Error 409: Conflict
Are you trying to add an existing item?
Traceback (most recent call last):
  File "upgrade-node.py", line 91, in <module>
    ret = urllib2.urlopen(request, xml_request, context=context)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 409: Conflict

Here you can find the script:
https://raw.githubusercontent.com/dougsland/ovirt-restapi-scripts/master/upgrade-node.py
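
The core of the script is a single POST to the host's upgrade action, as the traceback above shows. A minimal Python 2 sketch along the same lines (the engine URL and host id are taken from the output above; the credentials and the empty <action/> body are assumptions, and the linked script remains the reference):

import base64
import ssl
import urllib2

# Engine URL and host id from the output above; credentials are placeholders.
ENGINE = "https://192.168.122.5:443/ovirt-engine/api"
HOST_ID = "92c62349-fdee-466d-9081-aea88274b66a"
USER, PASSWORD = "admin@internal", "secret"

def trigger_upgrade():
    """POST the upgrade action for one host; HTTP 409 means an upgrade is already running."""
    url = "%s/hosts/%s/upgrade" % (ENGINE, HOST_ID)
    xml_request = "<action/>"  # assumed empty action body
    request = urllib2.Request(url)
    request.add_header("Content-Type", "application/xml")
    request.add_header("Authorization",
                       "Basic " + base64.b64encode("%s:%s" % (USER, PASSWORD)))
    # Test setup only: skip certificate verification (Python 2.7.9+).
    context = ssl._create_unverified_context()
    return urllib2.urlopen(request, xml_request, context=context)

if __name__ == "__main__":
    try:
        print(trigger_upgrade().read())
    except urllib2.HTTPError as e:
        # 409 Conflict is the expected answer while another upgrade is in progress.
        print("HTTP Error %d: %s" % (e.code, e.msg))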


#2) Via the RHEVM interface, during an upgrade in progress I cannot trigger a new upgrade by clicking the Upgrade button several times:

"""
Error while executing action:
Cannot upgrade Host. Valid Host statuses for upgrade are Up, Maintenance or Non-Operational.
"""

The only way I was able to reproduce it:
=======================================
#1) Install redhat-virtualization-host-4.1-20170111.0.x86_64.liveimg.squashfs

#2) Create a repo with a newer RPM to make an upgrade available:
    # mkdir /var/www/html/host-update
    # cp redhat-virtualization-host-image-update-4.1-20170202.0.el7_3.noarch.rpm /var/www/html/host-update
    # createrepo /var/www/html/

#3) On the RHEVM machine, set the upgrade check interval to 1 hour:
    # engine-config -s HostPackagesUpdateTimeInHours=1 
    # Restart ovirt-engine

#4) In RHEVM, put the host in maint.
   (After 1 hour, the upgrade button will appear)

#5) Execute an install or upgrade command via yum and do not confirm it:
    On the RHEV-H, type: yum upgrade (do not confirm, and do not use -y)

#6) In RHEVM, click Upgrade

Result: It fails as expected, since another yum instance is already running.

Versions:

RHVM: 
4.0.4-0.1.el7ev

RHVH: 
From: redhat-virtualization-host-4.1-20170111.0.x86_64.liveimg.squashfs
To: redhat-virtualization-host-image-update-4.1-20170202.0.el7_3.noarch.rpm 

Moving to ON_QA for QA's analysis.

Comment 13 Huijuan Zhao 2017-02-08 06:30:16 UTC
Douglas, thanks a lot for your testing.

And regarding comment 12, I tested and got the same results as you:
during an upgrade in progress, I cannot trigger a new upgrade by clicking the Upgrade button several times.

So does this mean the issue has been fixed in RHVH 4.1? But the Target Milestone is 4.2.0 and the bug is ON_QA. Could you please change the Target Milestone, or should I verify this bug in RHVH 4.2.0?

Comment 14 Sandro Bonazzola 2017-02-08 06:40:57 UTC
Retargeted, thanks.

Comment 15 Huijuan Zhao 2017-02-08 09:16:48 UTC
Test version:
1. RHVH
from       
redhat-virtualization-host-4.1-20170116.0.x86_64.liveimg.squashfs
to         
redhat-virtualization-host-4.1-20170202.0.x86_64.liveimg.squashfs
imgbased-0.9.6-0.1.el7ev.noarch
2. RHVM
Red Hat Virtualization Manager Version: 4.1.0.4-0.1.el7

Test steps:
1. Install redhat-virtualization-host-4.1-20170116.0
2. Setup local repo in RHVH, and add RHVH to RHVM
3. In RHVM UI, set RHVH to maintenance, and click "Check for Upgrade", the upgrade button will appear
4. Log in to RHVH, execute an install or upgrade command via yum and do not confirm it:
   On the RHEV-H, type: # yum update (do not confirm, and do not use -y)
5. Click "Upgrade" button in RHVM UI
6. On the RHEV-H, stop and quit the yum update
7. Click "Upgrade" button in RHVM UI  

Test results:
1. After step 5, it fails as expected, since another yum instance is already running:
LockError: Existing lock /var/run/yum.pid: another copy is running as pid 22378.
2. After step 7, the upgrade succeeds.

According to the test results here and in comment 12, this bug is fixed in RHVH 4.1 (redhat-virtualization-host-4.1-20170202.0), so I am changing the status to VERIFIED.

