Bug 1127122

Summary: RHEVH upgrade should create the sanlock user if it does not exist
Product: Red Hat Enterprise Virtualization Manager
Reporter: Udayendu Sekhar Kar <ukar>
Component: ovirt-node-plugin-vdsm
Assignee: Douglas Schilling Landgraf <dougsland>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Kubica <pkubica>
Severity: high
Docs Contact:
Priority: medium
Version: 3.4.1-1
CC: aberezin, amureini, bazulay, chhudson, cshao, danken, dfediuck, dougsland, ecohen, fdeutsch, hadong, huiwa, iheim, leiwang, lpeer, mkalinin, pstehlik, rpai, scohen, yaniwang, ycui, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard: node
Fixed In Version: ovirt-node-plugin-vdsm-0.2.0-9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-12 14:10:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Udayendu Sekhar Kar 2014-08-06 08:25:22 UTC
Description of problem:
After upgrading rhev-hypervisor from version 20120510.0.el6_2 to version 20140725.0.el6ev, the host goes into Install_failed in the rhevm GUI, and the 'vdsmd' service is not running on the hypervisor. Manually starting vdsmd gives the error message below:

------
# /etc/init.d/vdsmd start
initctl: Job is already running: libvirtd
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running run_init_hooks
vdsm: Running gencerts
hostname: Unknown host
vdsm: Running check_is_configured
libvirt is already configured for vdsm
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 260, in isconfigured
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 162, in isconfigured
KeyError: 'getpwnam(): name not found: sanlock'
vdsm: stopped during execute check_is_configured task (task returned with error code 1).
vdsm start                                                 [FAILED]
------


Version-Release number of selected component (if applicable):
rhevm 3.4.1
rhevh-6.5-20140725.0.el6ev.iso

How reproducible:
100%

Steps to Reproduce:
1. Upgrade any RHEV-H host from rhevh-20120510.0.el6_2 to rhevh-20140725.0.el6; it is marked as Install Failed in the rhevm webadmin GUI.
2. Check on the RHEV-H command line: the 'vdsmd' service is not running.
3. Try to start the vdsmd service: it fails with the error message above.

Actual results:
The upgrade fails because vdsmd is not able to start.

Expected results:
The upgrade should succeed.

Comment 1 Udayendu Sekhar Kar 2014-08-06 08:32:04 UTC
Hi,

During the analysis I observed that 'vdsmd' is unable to start, specifically because of the line below:

--
KeyError: 'getpwnam(): name not found: sanlock'  <===
--

The above line indicates that the user 'sanlock' could not be found in the /etc/passwd file.

So I manually added the line below to the /etc/passwd file:
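
The failing lookup can be reproduced outside vdsm. The following is a minimal sketch, not the vdsm code itself: it assumes only Python's standard pwd module, and the helper name user_exists is made up for illustration. pwd.getpwnam() raises KeyError when the name cannot be resolved, which is the exception type shown in the traceback.

```python
import pwd


def user_exists(name):
    """Return True if getpwnam() can resolve `name`, False otherwise.

    Hypothetical helper for illustration; it is not part of vdsm.
    """
    try:
        pwd.getpwnam(name)
    except KeyError:
        # Same exception type that vdsm-tool reported for 'sanlock'.
        return False
    return True


if __name__ == "__main__":
    for candidate in ("root", "sanlock"):
        print("%s: %s" % (candidate, user_exists(candidate)))
```

On an affected host the 'sanlock' lookup would be expected to report False until the account is added.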

---
sanlock:x:179:179:sanlock:/var/run/sanlock:/sbin/nologin
---

I then tried to start the vdsmd service, and it started as expected.
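
For reference, the entry above follows the usual seven-field /etc/passwd layout. The sketch below only splits that one line into named fields; the PasswdEntry type and the parse_passwd_line function are illustrative names, not part of any RHEV-H tooling.

```python
from collections import namedtuple

# Field order of a passwd(5) entry: name:passwd:uid:gid:gecos:home:shell
PasswdEntry = namedtuple(
    "PasswdEntry", "name passwd uid gid gecos home shell")


def parse_passwd_line(line):
    """Split one /etc/passwd line into a PasswdEntry (illustrative helper)."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(
            "expected 7 colon-separated fields, got %d" % len(fields))
    name, passwd, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


# The line that was added manually in this comment.
entry = parse_passwd_line(
    "sanlock:x:179:179:sanlock:/var/run/sanlock:/sbin/nologin")

if __name__ == "__main__":
    print(entry)
```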


So the bug is that, while upgrading RHEV-H from 6.2 to 6.5, the upgrade neither checks the /etc/passwd file for the existence of the 'sanlock' user nor adds it.

Thanks,
Udayendu

Comment 2 Fabian Deutsch 2014-08-06 13:36:02 UTC
The corresponding vdsm versions are:
vdsm-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:23 2012, Key ID 199e2f91fd431d51)
vdsm-cli-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:25 2012, Key ID 199e2f91fd431d51)
vdsm-hook-vhostmd-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:26 2012, Key ID 199e2f91fd431d51)
vdsm-reg-4.9-112.12.el6_2.x86_64 (RSA/8, Mon Apr 16 20:02:27 2012, Key ID 199e2f91fd431d51)

Comment 5 Fabian Deutsch 2014-08-06 13:51:26 UTC
Moving this to vdsm, as vdsm is creating the sanlock user.

Comment 7 Allon Mureinik 2014-08-13 08:23:48 UTC
The user is created by yum installing sanlock - shouldn't yum upgrading vdsm have taken care of it?

Comment 8 Fabian Deutsch 2014-08-13 09:35:08 UTC
(In reply to Allon Mureinik from comment #7)
> The user is created by yum installing sanlock - 

Yes. You are right. I misinterpreted a line in the specfile.

> shouldn't yum upgrading vdsm
> have taken care of it?

That might be the problem on RHEV-H.

The post scriptlets of the RPMs - where I'd expect the handling of this case - are not run between RHEV-H image upgrades.
And if the scriptlets were not run, then it's likely that the migration/handling did not happen.

Comment 9 Allon Mureinik 2014-08-13 10:32:49 UTC
(In reply to Fabian Deutsch from comment #8)
> (In reply to Allon Mureinik from comment #7)
> > The user is created by yum installing sanlock - 
> 
> Yes. You are right. I misinterpreted a line in the specfile.
> 
> > shouldn't yum upgrading vdsm
> > have taken care of it?
> 
> That might be the problem on RHEV-H.
> 
> The post scriptlets of the RPMs - where I'd expect the handling of this case -
> are not run between RHEV-H image upgrades.
> And if the scriptlets were not run, then it's likely that the
> migration/handling did not happen.

Thanks Fabian.
Based on this, I think the bug should be returned to whiteboard=node - I don't see what we can do from the storage team's side, no?

Comment 14 Fabian Deutsch 2014-09-02 11:06:30 UTC
(In reply to Allon Mureinik from comment #9)
...

> Thanks Fabian.
> Based on this, I think the bug should be returned to whiteboard=node - I
> don't see what we can do from the storage team's side, no?

Even if it already happened, yes - it should return to node.

Comment 17 Fabian Deutsch 2014-09-05 14:55:10 UTC
Node now has a hook which is run when the image changes (i.e. when an update is run).

Please add the logic required for this bug as a script which is run when the hook is triggered.

Please use the on-changed-boot-image hook for this.

See src/ovirt/node/utils/hooks.py for more details.

Comment 18 Fabian Deutsch 2014-09-05 14:57:11 UTC
(In reply to Fabian Deutsch from comment #17)

> Please use the on-changed-boot-image hook for this.

Please use the on-boot hook for this, as the on-changed-boot-image hook is only available in master.
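
A script run from such a hook might look roughly like the sketch below. This is an assumption-laden illustration, not the fix that shipped in ovirt-node-plugin-vdsm: the function names are made up, the UID, GID, home and shell values are taken from the /etc/passwd line in comment 1, and the sketch does not address how the hook is registered or how the change is persisted on the node. It relies only on the standard groupadd and useradd commands.

```python
import pwd
import subprocess

# Values taken from the /etc/passwd line quoted in comment 1.
SANLOCK = {"name": "sanlock", "uid": 179, "gid": 179,
           "home": "/var/run/sanlock", "shell": "/sbin/nologin"}


def creation_commands(name, uid, gid, home, shell):
    """Build the groupadd/useradd command lines for a system account."""
    return [
        # -f: exit successfully if the group already exists.
        ["groupadd", "-f", "-r", "-g", str(gid), name],
        # -r: system account, -M: do not create the home directory.
        ["useradd", "-r", "-M", "-u", str(uid), "-g", name,
         "-d", home, "-s", shell, "-c", name, name],
    ]


def ensure_user(spec, lookup=pwd.getpwnam, run=subprocess.check_call):
    """Create the account described by `spec` if the lookup fails.

    Returns True if creation commands were run, False if the user
    already existed. `lookup` and `run` are injectable for dry runs.
    """
    try:
        lookup(spec["name"])
        return False
    except KeyError:
        pass
    for command in creation_commands(**spec):
        run(command)
    return True


if __name__ == "__main__":
    # Dry run: pretend the user is missing and only print the commands.
    def always_missing(name):
        raise KeyError(name)

    def show(command):
        print(" ".join(command))

    ensure_user(SANLOCK, lookup=always_missing, run=show)
```

Calling ensure_user(SANLOCK) with the defaults would perform the real lookup and run the commands, so it needs root privileges; the dry run in the main block only prints what would be executed.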

Comment 22 wanghui 2014-09-11 07:23:01 UTC
Hi Pavel Stehlik,

I noticed that you have changed the QA contact to huiwa. Since this bug is in the ovirt-node-plugin-vdsm component, I wonder why the QA contact is me.

If you need virt-qe team's help to verify this bug, then according to comment#15, hadong will be the correct one to track this bug.

Thanks
Hui Wang

Comment 23 Pavel Stehlik 2014-09-11 07:46:40 UTC
taking to RHEVM System

Comment 24 Petr Kubica 2014-10-16 14:24:23 UTC
Verified in rhev-hypervisor6-6.6-20141007.0.iso

Comment 27 Fabian Deutsch 2015-02-12 14:10:29 UTC
RHEV 3.5.0 has been released. I am closing this bug, because it has been VERIFIED.