Description of problem:

After upgrading rhev-hypervisor from version 20120510.0.el6_2 to 20140725.0.el6ev, the host goes into "Install Failed" in the RHEV-M GUI and the 'vdsmd' service is not running on the hypervisor. Manually starting vdsmd fails with the following error:

------
# /etc/init.d/vdsmd start
initctl: Job is already running: libvirtd
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running run_init_hooks
vdsm: Running gencerts
hostname: Unknown host
vdsm: Running check_is_configured
libvirt is already configured for vdsm
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 260, in isconfigured
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 162, in isconfigured
KeyError: 'getpwnam(): name not found: sanlock'
vdsm: stopped during execute check_is_configured task (task returned with error code 1).
vdsm start [FAILED]
------

Version-Release number of selected component (if applicable):
rhevm 3.4.1
rhevh-6.5-20140725.0.el6ev.iso

How reproducible:
100%

Steps to Reproduce:
1. Upgrade a RHEV-H host from rhevh-20120510.0.el6_2 to rhevh-20140725.0.el6; it will be marked as "Install Failed" in the RHEV-M webadmin GUI.
2. On the RHEV-H command line, the 'vdsmd' service is not running.
3. Trying to start the vdsmd service produces the error message above.

Actual results:
The upgrade fails because vdsmd is not able to start.

Expected results:
The upgrade should work.
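For anyone hitting the same failure, a quick check (added here as a sketch, not part of the original report) confirms whether the account that vdsm-tool is looking up actually exists:

---
# Both commands print nothing on an affected host, which matches the
# KeyError about getpwnam() in the traceback above.
getent passwd sanlock
getent group sanlock
---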
Hi,

During the analysis I observed that 'vdsmd' is unable to start because of this line in particular:

--
KeyError: 'getpwnam(): name not found: sanlock'  <===
--

It indicates that the 'sanlock' user cannot be found in /etc/passwd. So I manually added the following line to /etc/passwd:

---
sanlock:x:179:179:sanlock:/var/run/sanlock:/sbin/nologin
---

After that, the vdsmd service started as expected.

So the bug is: while upgrading RHEV-H from 6.2 to 6.5, the upgrade neither checks for nor creates the 'sanlock' user in /etc/passwd.

Thanks,
Udayendu
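For reference, a safer equivalent of editing /etc/passwd by hand would be to create the account with the standard tools. This is only a sketch based on the uid/gid 179 values in the line above (the exact values should be confirmed against the installed sanlock package), and on RHEV-H the changed files may additionally need to be persisted to survive a reboot:

---
# Create the sanlock group and user only if they are missing.
# uid/gid 179, home directory and shell mirror the /etc/passwd line above.
getent group sanlock >/dev/null || groupadd -g 179 sanlock
getent passwd sanlock >/dev/null || useradd -u 179 -g 179 -c sanlock \
    -d /var/run/sanlock -s /sbin/nologin sanlock
---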
The corresponding vdsm versions are:

vdsm-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:23 2012, Key ID 199e2f91fd431d51)
vdsm-cli-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:25 2012, Key ID 199e2f91fd431d51)
vdsm-hook-vhostmd-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:26 2012, Key ID 199e2f91fd431d51)
vdsm-reg-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:27 2012, Key ID 199e2f91fd431d51)
Moving this to vdsm, as vdsm is creating the sanlock user.
The user is created by yum installing sanlock - shouldn't yum upgrading vdsm have taken care of it?
(In reply to Allon Mureinik from comment #7)
> The user is created by yum installing sanlock -

Yes, you are right. I misinterpreted a line in the specfile.

> shouldn't yum upgrading vdsm
> have taken care of it?

That might be the problem on RHEV-H.

The post scriptlets of rpms - where I'd expect this case to be handled - are not run between RHEV-H image upgrades. And if the post scriptlets were not run, then it's likely that the migration/handling did not happen.
(In reply to Fabian Deutsch from comment #8)
> (In reply to Allon Mureinik from comment #7)
> > The user is created by yum installing sanlock -
>
> Yes, you are right. I misinterpreted a line in the specfile.
>
> > shouldn't yum upgrading vdsm
> > have taken care of it?
>
> That might be the problem on RHEV-H.
>
> The post scriptlets of rpms - where I'd expect this case to be handled -
> are not run between RHEV-H image upgrades. And if the post scriptlets were
> not run, then it's likely that the migration/handling did not happen.

Thanks, Fabian.
According to this, I think the bug should be returned to whiteboard=node - I don't see what we can do from the storage team's side, no?
(In reply to Allon Mureinik from comment #9)
...
> Thanks, Fabian.
> According to this, I think the bug should be returned to whiteboard=node -
> I don't see what we can do from the storage team's side, no?

Even if it already happened, yes - it should return to node.
Node now has a hook which is run when the image changes (i.e. when an update is run).

Please add the logic required for this bug as a script which is run when that hook is triggered. Please use the on-changed-boot-image hook for this.

See src/ovirt/node/utils/hooks.py for more details.
(In reply to Fabian Deutsch from comment #17)
> Please use the on-changed-boot-image hook for this.

Please use the on-boot hook for this, as the on-changed-boot-image hook is only available in master.
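A minimal sketch of what such an on-boot hook script could look like, assuming the hook mechanism simply executes scripts placed in its hook directory and reusing the check-and-create logic from the workaround above (the script body, its install path, and the uid/gid 179 values are illustrative, not taken from the actual fix):

---
#!/bin/sh
# Hypothetical on-boot hook: recreate the sanlock account if the persisted
# /etc/passwd predates the sanlock-enabled image. uid/gid 179 mirror the
# passwd entry quoted earlier in this bug.
getent group sanlock >/dev/null || groupadd -g 179 sanlock
getent passwd sanlock >/dev/null || useradd -u 179 -g 179 -c sanlock \
    -d /var/run/sanlock -s /sbin/nologin sanlock
exit 0
---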
Hi Pavel Stehlik,

I noticed that you have changed the QA contact to huiwa. Since this bug is in the ovirt-node-plugin-vdsm component, I wonder why the QA contact is me. If you need the virt-qe team's help to verify this bug, then according to comment #15, hadong would be the correct person to track this bug.

Thanks,
Hui Wang
Taking to RHEVM System.
Verified in rhev-hypervisor6-6.6-20141007.0.iso
RHEV 3.5.0 has been released. I am closing this bug because it has been VERIFIED.