Bug 1200467 - Vdsm upgrade 3.4 >> 3.5.1 doesn't restart vdsmd service
Summary: Vdsm upgrade 3.4 >> 3.5.1 doesn't restart vdsmd service
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Assignee: Yaniv Bronhaim
QA Contact: Petr Kubica
URL:
Whiteboard:
Depends On:
Blocks: 1208752
 
Reported: 2015-03-10 15:50 UTC by Michael Burman
Modified: 2016-03-09 19:33 UTC
CC List: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1208752 (view as bug list)
Environment:
Last Closed: 2016-03-09 19:33:17 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
Logs (393.23 KB, application/x-gzip)
2015-03-10 15:50 UTC, Michael Burman


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:0362 0 normal SHIPPED_LIVE vdsm 3.6.0 bug fix and enhancement update 2016-03-09 23:49:32 UTC
oVirt gerrit 39408 0 ovirt-3.5 MERGED Hack vdsm >=4.16.x for vdsm-tool import issues during upgrade Never
oVirt gerrit 39410 0 master MERGED Hack vdsm >=4.16.x for vdsm-tool import issues during upgrade Never
oVirt gerrit 39496 0 ovirt-3.5 MERGED Using vdsm-tool restore-conf in init script instead of direct call Never
oVirt gerrit 39497 0 master MERGED Using vdsm-tool restore-conf in init script instead of direct call Never

Description Michael Burman 2015-03-10 15:50:57 UTC
Created attachment 1000030 [details]
Logs

Description of problem:
Vdsm upgrade 3.4 >> 3.5.1 does not restart the vdsmd service.
Trying to activate the host in a 3.5 cluster then fails with the error:
Host navy-vds1.qa.lab.tlv.redhat.com is compatible with versions (3.0,3.1,3.2,3.3,3.4) and cannot join Cluster mb3_4_repro which is set to version 3.5.

- While upgrading vdsm from 3.4 to 3.5.1 using the 'yum update' command, the vdsmd service does not get restarted after the update.

- vdsClient -s 0 getVdsCaps after update:
supportedENGINEs = ['3.0', '3.1', '3.2', '3.3', '3.4']

- /var/log/messages:
Mar 10 16:52:54 navy-vds1 python: module upgrade_300_networks could not load to vdsm-tool: Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 81, in load_modules
    mod_absp, mod_desc)
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/upgrade_300_networks.py", line 30, in <module>
    from netconf import ifcfg
ImportError: No module named netconf

- Looks like after the upgrade the old code/script (from vdsm_python) is still running and vdsmd is not restarted (see the loader sketch below these notes).

- Looks like a critical issue.
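For illustration only, here is a generic Python sketch (not vdsm's actual implementation) of a plugin loader that skips modules whose imports fail instead of letting one broken module take the whole tool down; the package name vdsm.tool and the failing module name are taken from the log above:

import pkgutil
import traceback

def load_tool_modules(package_name='vdsm.tool'):
    # Import the tool package and then each of its submodules, tolerating
    # failures such as the stale upgrade_300_networks importing the removed
    # 'netconf' module.
    package = __import__(package_name, fromlist=['tool'])
    loaded = []
    for _, name, _ in pkgutil.iter_modules(package.__path__):
        full_name = '%s.%s' % (package_name, name)
        try:
            loaded.append(__import__(full_name, fromlist=[name]))
        except ImportError:
            # Log and keep going; one broken module must not abort the tool.
            print('module %s could not be loaded:' % full_name)
            traceback.print_exc()
    return loaded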



Version-Release number of selected component (if applicable):
3.5.1-0.1.el6ev
vdsm 4.14 >> vdsm 4.16.12


Steps to Reproduce:
1. Start with a clean 3.4 server (vdsm 4.14)
2. Run 'yum update' to 3.5.1 (vdsm 4.16.12-2), using the right repos
3. Try to activate the host in a 3.5 cluster

Actual results:
Failed. vdsmd is not restarted.

Expected results:
Should succeed. The vdsmd service should be restarted during the vdsm update.
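As a quick way to confirm the failure mode, a generic Python sketch of a post-upgrade sanity check (hypothetical helper, not part of vdsm) could compare the vdsm package installed by yum with the engine versions the running vdsmd still reports; 'vdsClient -s 0 getVdsCaps' is the same call quoted in the description above:

import subprocess

def installed_vdsm_package():
    return subprocess.check_output(['rpm', '-q', 'vdsm']).decode().strip()

def running_supported_engines():
    caps = subprocess.check_output(['vdsClient', '-s', '0', 'getVdsCaps']).decode()
    for line in caps.splitlines():
        if 'supportedENGINEs' in line:
            return line.strip()
    return ''

if __name__ == '__main__':
    print('installed package: %s' % installed_vdsm_package())
    engines = running_supported_engines()
    print('running vdsmd:     %s' % engines)
    if '3.5' not in engines:
        print('WARNING: running vdsmd does not report 3.5 - it was probably not restarted')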

Additional info:

Comment 1 Dan Kenigsberg 2015-03-10 15:57:31 UTC
Note that for some reason, the post-un script executed old code from
/usr/lib64/python2.6/site-packages/vdsm/tool/upgrade_300_networks.py, while the new vdsm should take it from /usr/lib/python2.6... due to http://gerrit.ovirt.org/33738.

I suppose that it is a race, where the lib64 code is still there and comes earlier in the search path than the new code.
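A quick diagnostic sketch for the suspected race (illustration only, not part of any fix): print sys.path and where the interpreter actually resolves vdsm.tool.upgrade_300_networks from, to see whether a stale /usr/lib64/... copy shadows the new /usr/lib/... one. The module and directory names come from the comment and log above.

import sys

def show_resolution(module_name='vdsm.tool.upgrade_300_networks'):
    try:
        # a non-empty fromlist makes __import__ return the leaf module itself
        module = __import__(module_name, fromlist=['upgrade_300_networks'])
        print('%s loaded from %s' % (module_name, module.__file__))
    except ImportError as exc:
        print('%s failed to import: %s' % (module_name, exc))

if __name__ == '__main__':
    for entry in sys.path:
        print(entry)  # earlier entries win; lib64 before lib means the old code runs
    show_resolution()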

Comment 2 Yaniv Bronhaim 2015-03-11 11:09:25 UTC
Regarding comment #1 - it doesn't really matter, although it's an issue that sounds risky - it happens only in the %post script, which still uses vdsm-tool. We should merge https://gerrit.ovirt.org/#/c/34394/ already and such reports won't appear anymore.

Regarding the vdsmd restart - we do perform a stop and start on upgrade if vdsmd is up - this happens in the %postun script.
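For illustration, a rough Python rendering of the "stop and start vdsmd only if it is up" idea follows; the real logic lives in the RPM %postun shell scriptlet, and this sketch only uses the plain 'service' command available on el6 hosts:

import subprocess

def service_is_running(name):
    # 'service <name> status' exits 0 when the service is running
    return subprocess.call(['service', name, 'status']) == 0

def restart_if_running(name='vdsmd'):
    if service_is_running(name):
        subprocess.call(['service', name, 'stop'])
        subprocess.call(['service', name, 'start'])
    else:
        print('%s is not running, leaving it alone' % name)

if __name__ == '__main__':
    restart_if_running()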

Yeela - as you are already working on fixing the multipath configure on upgrade, please validate this issue as well and update on what's going wrong there.

Comment 3 Yeela Kaplan 2015-03-24 17:23:41 UTC
This issue is reproducible.
This is a different issue than the multipath one,
as vdsm does not even try to stop at all,
but keeps running as 4.14.

Comment 4 Yaniv Bronhaim 2015-03-31 11:21:40 UTC
Brrrrr.. It is a much worse regression than I thought..

The cause is actually upgrade_300_networks.py, which used to import netconf, which exists in 3.4 and was removed in 3.5 :( That leads to an exception while running the %postun script part when we call "vdsm-tool service-status vdsmd". This exception crashes the script, so no restart and no configure happen at all. The user must do it manually (running vdsm-tool configure --force), which is a workaround, but one that requires full access to the host.
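A sketch of a more defensive status check around the failure described here (hypothetical illustration, not the fix that was actually merged): if 'vdsm-tool service-status vdsmd' dies with a traceback, fall back to the init script directly instead of treating the crash as "vdsmd is down", so the restart/configure steps are not silently skipped.

import subprocess

def vdsmd_is_up():
    proc = subprocess.Popen(['vdsm-tool', 'service-status', 'vdsmd'],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    if b'Traceback' in err:
        # vdsm-tool itself is broken mid-upgrade; ask the init system directly.
        return subprocess.call(['service', 'vdsmd', 'status']) == 0
    return proc.returncode == 0

if __name__ == '__main__':
    print('vdsmd running: %s' % vdsmd_is_up())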

To fix that we must backport https://gerrit.ovirt.org/#/c/23456 to 3.4, which I have no idea how we missed so far...

Will we be able to respin, Barak?

The removal of netconf caused this, so it will happen on upgrade from any 4.14.* vdsm to any 4.16 and above. A respin, or returning the netconf folder, could solve the current problem. Actually we must separate vdsm-tool from the vdsm code entirely to avoid such bugs in the long run - the current setup mixes old and new code during installation.

Comment 5 Yaniv Bronhaim 2015-03-31 15:17:56 UTC
My comment is not fully right. The vdsm-tool code comes from the installed version and not the upgraded one.

I explain it in depth in the commit message - please review https://gerrit.ovirt.org/39408

I removed the needinfo; a respin of 3.4 will not be needed, but it must be fixed in 3.5.

We need to consider whether this needs to get into 3.6 as well, to allow an upgrade from 3.4 to 3.6.

Comment 6 Michal Skrivanek 2015-04-02 12:20:07 UTC
Note that the documentation still says you'd better reboot the host after the upgrade.

Comment 7 Oved Ourfali 2015-04-02 12:25:44 UTC
Michael - do you see this issue also when upgrading from 3.4 to 3.5.0?

Comment 8 Michael Burman 2015-04-02 13:36:43 UTC
Hi Oved,

vdsm-4.14.18-7.el6ev.x86_64 >> vdsm-4.16.8.1-5.el6ev.x86_64 (vt13.7)
Same issue

Comment 9 Oved Ourfali 2015-04-02 13:39:56 UTC
Gil - anything you've done differently?
As Michael saw it happens also when upgrading from 3.4 to 3.5.0.

Thanks,
Oved

Comment 10 Gil Klein 2015-04-02 13:55:23 UTC
(In reply to Oved Ourfali from comment #9)
> Gil - anything you've done differently?
> As Michael saw it happens also when upgrading from 3.4 to 3.5.0.
> 
> Thanks,
> Oved
I'm pretty sure my flow was:
1. Move the host to maintenance
2. Update repos 
3. Run "yum update" on the host

And I did not see this issue on the QE setup.

Comment 11 Michael Burman 2015-04-02 13:59:08 UTC
Gil, did you try to activate this host in a 3.5 cluster after the upgrade?

Comment 12 Petr Kubica 2015-04-02 14:21:54 UTC
I tried upgrading from 3.4 to 3.5.0 (latest_av >> vt13.7 -> 3.5.0) and (latest_av >> vt13.13 -> 3.5.0-2), and in both cases I see this issue.

My flow was the same as Gil's:
1. Move the host to maintenance
2. Update repos
3. Run yum update on the host
4. Set the cluster level to 3.5
5. Activate the host

and after activating the host I see the failure:
Host 10.34.60.156 is compatible with versions (3.0,3.1,3.2,3.3,3.4) and cannot join Cluster Default which is set to version 3.5.

Comment 13 Petr Kubica 2015-04-02 14:55:50 UTC
Adding to comment #12: the vdsmd service doesn't get restarted after the update.

Comment 14 Michael Burman 2015-04-02 15:07:56 UTC
Petr, same here.

Comment 15 Gil Klein 2015-04-02 19:06:54 UTC
(In reply to Michael Burman from comment #11)
> Gil, did you tried to activate this host after upgrade in 3.5 cluster?
In fact I did not activate the host in a 3.5 cluster after the upgrade. I kept it on 3.4 and upgraded after a week or so.

Comment 17 Petr Kubica 2015-04-29 13:33:12 UTC
Verified in
vdsm.x86_64 0:4.16.14-0.el7 >> vdsm.x86_64 0:4.17.0-732.git57b00f9.el7

Comment 20 errata-xmlrpc 2016-03-09 19:33:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html

