Created attachment 1000030
Description of problem:
Vdsm upgrade 3.4 >> 3.5.1 doesn't restart the vdsmd service.
Trying to activate the host in a 3.5 cluster then fails with the error:
Host navy-vds1.qa.lab.tlv.redhat.com is compatible with versions (3.0,3.1,3.2,3.3,3.4) and cannot join Cluster mb3_4_repro which is set to version 3.5.
- While upgrading vdsm from 3.4 to 3.5.1 with the 'yum update' command, the vdsmd service doesn't get restarted after the update.
- vdsClient -s 0 getVdsCaps after update:
supportedENGINEs = ['3.0', '3.1', '3.2', '3.3', '3.4']
Mar 10 16:52:54 navy-vds1 python: module upgrade_300_networks could not load to vdsm-tool: Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 81, in load_modules
    mod_absp, mod_desc)
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/upgrade_300_networks.py", line 30, in <module>
    from netconf import ifcfg
ImportError: No module named netconf
- Looks like after the upgrade the old code/script (from vdsm_python) is still running and vdsmd is not restarted.
- Looks like a critical issue.
Version-Release number of selected component (if applicable):
vdsm 4.14 >> vdsm 4.16.12
Steps to Reproduce:
1. Clean 3.4 server (vdsm 4.14)
2. Run 'yum update' to 3.5.1 (vdsm 4.16.12-2), using the right repos
3. Try to activate the host in a 3.5 cluster
Failed. vdsmd was not restarted.
Should succeed. The vdsmd service should be restarted during the vdsm update.
Note that for some reason, the post-un script executed old code from
/usr/lib64/python2.6/site-packages/vdsm/tool/upgrade_300_networks.py, while the new vdsm should take it from /usr/lib/python2.6... due to http://gerrit.ovirt.org/33738.
I suppose it is a race, where the lib64 code is still there and comes before the new code in the search path.
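The search-path precedence suspected above can be illustrated with a minimal Python sketch. It is only an illustration of how Python resolves a module name that exists in two directories, not vdsm code: the temporary directories stand in for the lib64 and lib locations, and the module name demo_upgrade_mod and all helper names are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, body):
    """Create <directory>/<name>.py containing body."""
    with open(os.path.join(directory, name + ".py"), "w") as f:
        f.write(body)


def import_with_path(name, search_dirs):
    """Import `name` with search_dirs placed first on sys.path, in order."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)      # forget any earlier import of this name
    importlib.invalidate_caches()    # make sure new files are noticed
    sys.path[:0] = search_dirs
    try:
        return importlib.import_module(name)
    finally:
        sys.path[:] = saved_path     # restore the original search path
        sys.modules.pop(name, None)


def demo():
    # Hypothetical stand-ins for the old (lib64) and new (lib) locations.
    old_dir = tempfile.mkdtemp(prefix="lib64_")
    new_dir = tempfile.mkdtemp(prefix="lib_")
    write_module(old_dir, "demo_upgrade_mod", "ORIGIN = 'old'\n")
    write_module(new_dir, "demo_upgrade_mod", "ORIGIN = 'new'\n")
    old_first = import_with_path("demo_upgrade_mod", [old_dir, new_dir]).ORIGIN
    new_first = import_with_path("demo_upgrade_mod", [new_dir, old_dir]).ORIGIN
    return old_first, new_first
```

Whichever directory appears first on the path wins, so a leftover copy in an earlier directory would shadow the newly installed one for as long as it exists.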
Regarding comment #1 - it doesn't really matter, although it's an issue that sounds risky - it happens only in the %post script, which still uses vdsm-tool. We should merge https://gerrit.ovirt.org/#/c/34394/ already and such reports won't appear anymore.
Regarding the vdsmd restart - we do perform stop and start on upgrade if vdsmd is up - this happens in the %postun script.
Yeela - as you are already fixing multipath configure on upgrade, please validate this issue as well and update on what's going wrong there.
This issue is reproducible.
This is a different issue than the multipath one,
as vdsm does not even try to stop at all,
but keeps running as 4.14.
Brrrrr.. It's a much worse regression than I thought..
The cause is actually upgrade_300_networks.py, which imports netconf; netconf exists in 3.4 and was removed in 3.5 :( This leads to an exception in the part of the %postun script where we call "vdsm-tool service-status vdsmd". The exception crashes the script, so neither the restart nor the configure happens at all. The user must do it manually (running vdsm-tool configure --force), which is a workaround, but one that requires full access to the host.
To fix that we must backport to 3.4 https://gerrit.ovirt.org/#/c/23456 which I have no idea how we missed so far...
Will we be able to respin, Barak?
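To illustrate the failure mode described above (one tool module whose import fails taking the whole command down with it), here is a minimal sketch of a loader that records import failures and carries on. This is not vdsm-tool's actual loader: it is written for Python 3 rather than the 2.6 in the traceback, load_modules here is a hypothetical helper, and netconf_missing_demo is a made-up module name standing in for the removed netconf package.

```python
import importlib.util
import os
import tempfile


def load_modules(module_dir):
    """Load every *.py file in module_dir; collect failures instead of raising."""
    loaded, failed = {}, {}
    for filename in sorted(os.listdir(module_dir)):
        if not filename.endswith(".py"):
            continue
        name = filename[:-3]
        path = os.path.join(module_dir, filename)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except ImportError as exc:
            # One broken module must not abort the remaining commands.
            failed[name] = str(exc)
            continue
        loaded[name] = module
    return loaded, failed


def demo():
    module_dir = tempfile.mkdtemp()
    # A healthy module (hypothetical stand-in for a service-status command).
    with open(os.path.join(module_dir, "service.py"), "w") as f:
        f.write("def status():\n    return 'running'\n")
    # A module importing a package that no longer exists (hypothetical name).
    with open(os.path.join(module_dir, "upgrade_300_networks.py"), "w") as f:
        f.write("from netconf_missing_demo import ifcfg\n")
    return load_modules(module_dir)
```

With this shape, the healthy module stays usable and the broken one is only reported, which is the behaviour the upgrade scriptlet would need for the restart to still happen.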
The removal of netconf caused this, so it will occur on upgrade from any 4.14.* vdsm to any 4.16 and above. A respin, or bringing back the netconf folder, can solve the current problem. In the long run we must separate vdsm-tool from the vdsm code entirely to avoid such bugs, since this mixes old and new code during installation.
My comment is not fully right. The vdsm-tool code comes from the installed version and not the upgraded one.
I explain it in depth in the commit message - please review https://gerrit.ovirt.org/39408
I removed the needinfo; a respin of 3.4 will not be needed, but this must be fixed in 3.5.
We need to consider if this needs to get into 3.6 as well to allow upgrade from 3.4 to 3.6
Note that the documentation still says you had better reboot the host after an upgrade.
Michael - do you see this issue also when upgrading from 3.4 to 3.5.0?
vdsm-4.14.18-7.el6ev.x86_64 >> vdsm-18.104.22.168-5.el6ev.x86_64 (vt13.7)
Gil - anything you've done differently?
As Michael saw it happens also when upgrading from 3.4 to 3.5.0.
(In reply to Oved Ourfali from comment #9)
> Gil - anything you've done differently?
> As Michael saw it happens also when upgrading from 3.4 to 3.5.0.
I'm pretty sure my flow was:
1. Move the host to maintenance
2. Update repos
3. Run "yum update" on the host
And did not see this issue on the QE setup.
Gil, did you try to activate this host in a 3.5 cluster after the upgrade?
I tried upgrading from 3.4 to 3.5.0 (latest_av >> vt13.7 -> 3.5.0) and (latest_av >> vt13.13 -> 3.5.0-2) and in both cases I see this issue.
My flow was the same as Gil's, plus two more steps:
1. Move host to maintenance
2. Update repos
3. Run yum update on the host
4. Set cluster level to 3.5
5. Activate host
After activating the host I see the failure:
Host 10.34.60.156 is compatible with versions (3.0,3.1,3.2,3.3,3.4) and cannot join Cluster Default which is set to version 3.5.
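The error above amounts to the cluster version not appearing in the list the host reports as supportedENGINEs in getVdsCaps. A minimal sketch of that comparison follows, assuming a plain membership test; the engine's real check is not shown in this report, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn '3.5' into (3, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def can_join_cluster(supported_engines, cluster_version):
    """True if the cluster version is among the versions the host reports."""
    wanted = parse_version(cluster_version)
    return any(parse_version(v) == wanted for v in supported_engines)


# Values as reported by the stale (not restarted) vdsmd in this bug.
stale_caps = ['3.0', '3.1', '3.2', '3.3', '3.4']
print(can_join_cluster(stale_caps, '3.5'))
```

Because the old vdsmd process keeps answering with the 3.4-era list, the check fails even though the 3.5 packages are already installed.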
Adding to comment #12: the vdsmd service doesn't get restarted after the update.
Petr, same here.
(In reply to Michael Burman from comment #11)
> Gil, did you try to activate this host in a 3.5 cluster after the upgrade?
In fact I did not activate the host in a 3.5 cluster after the upgrade. I kept it on 3.4 and upgraded the cluster after a week or so.
vdsm.x86_64 0:4.16.14-0.el7 >> vdsm.x86_64 0:4.17.0-732.git57b00f9.el7
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
If the solution does not work for you, open a new bug report.