Description of problem:
When clicking on the Upgrade button, the host is flipped to maintenance mode and the installation fails.

Version-Release number of selected component (if applicable):
From RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso to redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch

How reproducible:
Tried just once

Steps to Reproduce:
1. Click on the Upgrade button in the UI

Actual results:
The hypervisor is marked as "Installation Failed":

2016-08-19 05:27:35 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-zuExrQSj1l/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-zuExrQSj1l/otopi-plugins/otopi/packagers/yumpackager.py", line 216, in _setup
    with self._miniyum.transaction():
  File "/tmp/ovirt-zuExrQSj1l/pythonlib/otopi/miniyum.py", line 336, in __enter__
    self._managed.beginTransaction()
  File "/tmp/ovirt-zuExrQSj1l/pythonlib/otopi/miniyum.py", line 719, in beginTransaction
    self._yb.doLock()
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 2208, in doLock
    raise Errors.LockError(0, msg, oldpid)
LockError: Existing lock /var/run/yum.pid: another copy is running as pid 947.
2016-08-19 05:27:35 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Environment setup': Existing lock /var/run/yum.pid: another copy is running as pid 947.

The installation continues with downloading the image and then times out on "Installing Host 10.34.84.222. Yum obsoleting: 1/2: redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch."

Expected results:
The system is upgraded and flipped to the Up state.

Additional info:
The first failure is in ovirt-host-mgmt-20160819090742-10.34.84.222-adc53fa.log, the second in ovirt-host-mgmt-20160819052734-wyt41r.log. I checked whether vdsm was running after the first failure, and it was not.
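For context: the LockError above is yum's standard single-instance lock. /var/run/yum.pid holds the pid of the running yum process, and any second invocation refuses to start while that pid is alive. A minimal sketch for inspecting the lock on the host before retrying (pid 947 is taken from the log above; substitute whatever the file contains):

# cat /var/run/yum.pid           <- pid of the yum process holding the lock
947
# ps -p 947 -o pid,cmd           <- see what that process actually is
# kill -0 947 && echo "lock holder still alive"

If the holder is a stale pid (no such process), yum normally removes the lock itself on the next run.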
Created attachment 1192033 [details] First failure
Created attachment 1192035 [details] Second failure
The RC was the first build which allowed upgrades, thus it is expected that upgrades from RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso to anything do not work. However, the traceback looks unrelated to this fact. Sandro, any idea?
It might be that a previously started upgrade is still running. Trying to trigger a second update causes this bug. The problem is that updates of Node take quite long.
(In reply to Fabian Deutsch from comment #3)
> Sandro, any idea?

Nothing more than what you suggested in comment #4 after our discussion.
Moran, we discussed this - we can improve the update speed on the Node side (bug 1368420), but we could also consider making updates a long-running, async task; that, however, would be a change on the engine side. Where should this bug be moved to?
I upgraded RHVH (same version as comment 0) from the RHVM side with the latest RHVM, and hit no such issue. However, after the upgrade succeeded, the RHVH status was "Maintenance" in the RHVM UI.

Test version:
1. RHVH from RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso to redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch
2. RHVM Red Hat Virtualization Manager Version: 4.0.6.3-0.1.el7ev

Test steps:
1. Install RHVH-7.2-20160718.1-RHVH-x86_64-dvd1.iso
2. Set up a local repo in RHVH, and add RHVH to RHVM (a sketch of the repo setup follows below)
3. Click the "Upgrade" button in the RHVM UI

Test results:
After step 3, the upgrade succeeded, but the RHVH status was "Maintenance" in the RHVM UI. After clicking the "Activate" button, RHVH came up successfully on the RHVM side.
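For reference, a minimal sketch of the local-repo setup assumed in step 2; the paths, <repo-host> placeholder, and rpm name are illustrative, and comment 12 below shows the variant actually used in reproduction:

# mkdir /var/www/html/host-update
# cp redhat-virtualization-host-image-update-4.0-20160812.0.el7_2.noarch.rpm /var/www/html/host-update/
# createrepo /var/www/html/host-update/
# cat > /etc/yum.repos.d/local-update.repo <<EOF
[local-update]
name=Local RHVH update repo
baseurl=http://<repo-host>/host-update/
enabled=1
gpgcheck=0
EOF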
Fabian, please review comment 7; the test results there are different from comment 0. Could you please help check whether this bug has been partly fixed on the RHVM side? There is still the small issue noted in comment 7.
From the data we have, it looks like the root cause is that the rpm we download for the upgrade is large and takes a long time to download (and install). To reproduce this, you need to reduce the bandwidth between the host and the repository (RHN).

For comment 7 you could try, in step 3:
a) subscribe to CDN, OR
b) use a traffic shaping technique to reduce the bandwidth and simulate a slow internet connection (see the sketch below)
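A minimal traffic-shaping sketch for option b), run on the machine serving the repo (or a gateway in between); eth0 and the rate/latency values are illustrative only:

# tc qdisc add dev eth0 root tbf rate 512kbit burst 16kbit latency 400ms     <- throttle egress from the repo
# tc qdisc show dev eth0                                                     <- verify the qdisc is in place
# tc qdisc del dev eth0 root                                                 <- remove the throttle afterwards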
Huzhao,

Please have a try according to comment 9.
Thanks.
(In reply to shaochen from comment #10)
> Huzhao,
>
> Please have a try according to comment 9.
> Thanks.

For the record only: I have been trying to reproduce this report and have been unable to so far. I will try a few more times.
I followed the available RHEVM upgrade process to try to reproduce this report, and it looks like at this moment the lock handling behaves correctly.

#1) I tried to simulate this race with a REST API script calling upgrade several times during an upgrade task (low or high network speed should behave similarly). Every time I got the error below, which makes sense (no upgrade is possible while an upgrade is in progress):

host id 92c62349-fdee-466d-9081-aea88274b66a
Connecting to: https://192.168.122.5:443/ovirt-engine/api/hosts/92c62349-fdee-466d-9081-aea88274b66a/upgrade
HTTP Error 409: Conflict
Are you trying to add an existing item?
Traceback (most recent call last):
  File "upgrade-node.py", line 91, in <module>
    ret = urllib2.urlopen(request, xml_request, context=context)
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 409: Conflict

Here you can find the script:
https://raw.githubusercontent.com/dougsland/ovirt-restapi-scripts/master/upgrade-node.py

#2) Via the RHEVM interface, during an upgrade in progress I cannot trigger a new upgrade by clicking the upgrade button several times:
"""
Error while executing action: Cannot upgrade Host. Valid Host statuses for upgrade are Up, Maintenance or Non-Operational.
"""

The only way I was able to reproduce:
=======================================
#1) Install redhat-virtualization-host-4.1-20170111.0.x86_64.liveimg.squashfs
#2) Create a repo with a higher RPM to make an upgrade available:
# mkdir /var/www/html/host-update
# cp redhat-virtualization-host-image-update-4.1-20170202.0.el7_3.noarch.rpm /var/www/html/host-update
# createrepo /var/www/html/
#3) On the RHEVM machine, set the upgrade check interval to 1 hour:
# engine-config -s HostPackagesUpdateTimeInHours=1
# Restart ovirt-engine
#4) In RHEVM, put the host in maintenance. (After 1 hour, the upgrade button will appear.)
#5) Execute an install or upgrade command via yum and do not accept. On the RHEV-H, type: yum upgrade (do not accept or use -y)
#6) In RHEVM, click upgrade.

Result: it fails as expected, since another yum instance is already running.

Versions:
RHVM: 4.0.4-0.1.el7ev
RHVH:
From: redhat-virtualization-host-4.1-20170111.0.x86_64.liveimg.squashfs
To: redhat-virtualization-host-image-update-4.1-20170202.0.el7_3.noarch.rpm

Moving to ON_QA for their analyses.
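For reference, a rough curl equivalent of the upgrade action the script posts (the host id is taken from the log above; credentials are placeholders, and the linked upgrade-node.py is the authoritative version):

# curl -k -u 'admin@internal:PASSWORD' \
    -H 'Content-Type: application/xml' \
    -d '<action/>' \
    https://192.168.122.5/ovirt-engine/api/hosts/92c62349-fdee-466d-9081-aea88274b66a/upgrade

Running it a second time while the first upgrade is still in progress should return the same HTTP 409 Conflict shown above.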
Douglas, thanks a lot for your testing. Regarding comment 12, I tested with the same results as you: during an upgrade in progress, I cannot trigger a new upgrade by clicking the upgrade button several times. So does this mean the issue has been fixed in RHVH 4.1? But the Target Milestone is 4.2.0 and the bug is ON_QA. Could you please change the Target Milestone? Or should I verify this bug on RHVH 4.2.0?
Retargeted, thanks.
Test version:
1. RHVH from redhat-virtualization-host-4.1-20170116.0.x86_64.liveimg.squashfs to redhat-virtualization-host-4.1-20170202.0.x86_64.liveimg.squashfs
   imgbased-0.9.6-0.1.el7ev.noarch
2. RHVM Red Hat Virtualization Manager Version: 4.1.0.4-0.1.el7

Test steps:
1. Install redhat-virtualization-host-4.1-20170116.0
2. Set up a local repo in RHVH, and add RHVH to RHVM
3. In the RHVM UI, set RHVH to maintenance and click "Check for Upgrade"; the upgrade button will appear
4. Log in to RHVH and execute an install or upgrade command via yum without accepting. On the RHEV-H, type: # yum update (do not accept or use -y)
5. Click the "Upgrade" button in the RHVM UI
6. On the RHEV-H, stop and quit the running yum update
7. Click the "Upgrade" button in the RHVM UI again

Test results:
1. After step 5, it fails as expected, since another yum instance is already running:
   LockError: Existing lock /var/run/yum.pid: another copy is running as pid 22378.
2. After step 7, the upgrade succeeds.

According to the test results here and comment 12, this bug is fixed in RHVH 4.1 (redhat-virtualization-host-4.1-20170202.0), so I am changing the status to VERIFIED.