Bug 1692685 - Bad/missing home directory for user vdsm causes a failure
Summary: Bad/missing home directory for user vdsm causes a failure
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Packaging.rpm
Version: 4.30.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ovirt-4.4.0
Target Release: ---
Assignee: Marcin Sobczyk
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-26 08:56 UTC by Yedidyah Bar David
Modified: 2020-05-20 20:04 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:04:09 UTC
oVirt Team: Infra
Embargoed:
mperina: ovirt-4.4?


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 105852 | 0 | master | MERGED | vdsmd: Make sure 'vdsm' user has proper home dir | 2020-03-26 14:55:29 UTC

Description Yedidyah Bar David 2019-03-26 08:56:31 UTC
Description of problem:

Not sure what flow led to the current state, but we have:

$ grep vdsm /etc/passwd
vdsm:x:36:36:Node Virtualization Manager:/:/sbin/nologin

With this, many things worked well (including e.g. creating a VM), but the ovirt-engine-metrics installation failed. I didn't look exactly where, but it seems to try to create a template or something like that. The vdsm log has:

2019-03-26 09:04:56,397+0200 INFO  (jsonrpc/5) [api.host] FINISH getJobs return={'status': {'message': 'Done', 'code': 0}, 'jobs': {u'61c75a07-ba9a-43da-9743-d7aca66b8a11': {'status': 'failed', 'error': {'message': 'General Exception: (\'Command [\\\'/usr/bin/virt-sysprep\\\', \\\'-a\\\', u\\\'/rhev/data-center/mnt/vserver-spider.eng.lab.tlv.redhat.com:_pub_eslutsky_rhev-setup-01_rhev/1ac36da8-d085-468c-9341-0805dc9863fe/images/3e20471d-8f03-4308-a889-8d325481b294/2c5041db-2200-48cc-945f-1fe1a88528a3\\\'] failed with rc=1 out=[\\\'[   0.0] Examining the guest ...\\\'] err=["libvirt: XML-RPC error : Cannot create user runtime directory \\\'//.cache/libvirt\\\': Permission denied", \\\'virt-sysprep: error: libguestfs error: could not connect to libvirt (URI = \\\', "qemu:///session): Cannot create user runtime directory \\\'//.cache/libvirt\\\': ", \\\'Permission denied [code=38 int1=13]\\\', \\\'\\\', \\\'If reporting bugs, run virt-sysprep with debugging enabled and include the \\\', \\\'complete output:\\\', \\\'\\\', \\\'  virt-sysprep -v -x [...]\\\']\',)', 'code': 100}, 'job_type': 'virt', 'id': u'61c75a07-ba9a-43da-9743-d7aca66b8a11', 'description': 'seal_vm'}}} from=::ffff:10.35.17.99,34428, flow_id=ed1fcca8-ed54-4305-9368-8fb25116f7a1 (api:52)

Not sure what's the best solution, if any - perhaps check (where? in rpm scripts? on service start?) that the user has a writable home directory or something like that. Or perhaps explicitly set the libvirt cache dir, etc.
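
For illustration, a minimal Python sketch of why the failing path starts with a double slash. It parses the passwd entry quoted above (the sixth colon-separated field is the home directory) and appends the cache suffix seen in the error. The helper name parse_passwd_line is hypothetical, and the way the path is joined is an assumption inferred from the error message, not taken from libvirt's source.

```python
# Illustration only: parse the passwd entry quoted in this report and show
# how a home directory of "/" yields the path seen in the error message.
PASSWD_LINE = "vdsm:x:36:36:Node Virtualization Manager:/:/sbin/nologin"


def parse_passwd_line(line):
    """Split one /etc/passwd line into its seven colon-separated fields."""
    name, _password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return {
        "name": name,
        "uid": int(uid),
        "gid": int(gid),
        "gecos": gecos,
        "home": home,
        "shell": shell,
    }


entry = parse_passwd_line(PASSWD_LINE)

# Assumption: the directory is formed by appending "/.cache/libvirt" to the
# home directory, which matches the path reported in the error.
cache_dir = entry["home"] + "/.cache/libvirt"

print(entry["home"])
print(cache_dir)
```

With a home directory of '/', the resulting directory would have to be created directly under the filesystem root, which an unprivileged user such as vdsm normally cannot write to.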

Version-Release number of selected component (if applicable):

4.3, but I think it goes back much further than that as well

How reproducible:
Always

Steps to Reproduce:
1. Create a user vdsm with non-writable (by it) home directory (e.g. '/')
2. Add the machine as a host to an engine
3. Try to deploy metrics. Perhaps it's enough to try to create a template.

Actual results:
Fails as above

Expected results:
Should fail much earlier, perhaps on vdsm package installation (an error, at least) or on vdsm service start

Additional info:

Comment 1 Martin Perina 2019-03-26 09:52:19 UTC
AFAIK vdsm user is created during vdsm package installation:

https://github.com/oVirt/vdsm/blob/master/vdsm.spec.in#L756

So are you sure that vdsm user was not manually altered afterwards? Is this really always reproducible?

Comment 2 Yedidyah Bar David 2019-03-26 09:55:25 UTC
(In reply to Martin Perina from comment #1)
> AFAIK vdsm user is created during vdsm package installation:
> 
> https://github.com/oVirt/vdsm/blob/master/vdsm.spec.in#L756

Previous line is:
/usr/bin/getent passwd %{vdsm_user} >/dev/null || \

So if you pre-create it, it's not created again.

> 
> So are you sure that vdsm user was not manually altered afterwards? Is this
> really always reproducible?

I didn't try to reproduce myself, but I think it is, based on above.
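
The guard described in this comment can be modelled in a few lines of Python. This is only a toy model of the `getent passwd ... || useradd ...` logic, using a dictionary in place of the real passwd database; the function name ensure_user is hypothetical.

```python
def ensure_user(passwd, name, home):
    """Toy model of `getent passwd NAME || useradd -d HOME NAME`.

    `passwd` maps user names to home directories. Returns True if the
    user was created, False if it already existed and was left untouched.
    """
    if name in passwd:
        # getent succeeded, so the useradd branch is skipped and any
        # pre-existing (possibly wrong) home directory is kept.
        return False
    passwd[name] = home
    return True


# A pre-created user keeps its home directory of "/".
precreated = {"vdsm": "/"}
created = ensure_user(precreated, "vdsm", "/var/lib/vdsm")
print(created, precreated["vdsm"])
```

The point of the model is that the package scriptlet never repairs an existing account, so a bad home directory survives installation.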

Comment 3 Yedidyah Bar David 2019-03-26 09:58:19 UTC
BTW, on the specific machine where this happened, I failed to find who/what created the user. It might have been edited manually, or a result of a series of attempts to install/reinstall/upgrade RHEL/RHVH.

Comment 4 Dmitri Rubinstein 2019-07-24 11:14:11 UTC
I hit exactly this situation, which prevented me from creating a template from a VM. virt-sysprep failed with the following error message:

Error: Command ['/usr/bin/virt-sysprep', '-a', u'/rhev/data-center/mnt/192.168.84.114:_volume1_ovirt-asr-storage/5a5194ce-bb85-4371-831f-59980a263c82/images/bbd56095-6a14-494e-9a18-de4414064d7b/a2f49922-1dea-45f9-a27b-fb9febb26967'] failed with rc=1 out='[   0.0] Examining the guest ...\n' err="libvirt: XML-RPC error : Cannot create user runtime directory '//.cache/libvirt': Permission denied\nvirt-sysprep: error: libguestfs error: could not connect to libvirt (URI = \nqemu:///session): Cannot create user runtime directory '//.cache/libvirt': \nPermission denied [code=38 int1=13]\n\nIf reporting bugs, run virt-sysprep with debugging enabled and include the \ncomplete output:\n\n  virt-sysprep -v -x [...]\n"

The oVirt installation was done on a fresh CentOS 7.6.1810, in self-hosted engine mode. Before the self-hosted engine installation I tried the non-self-hosted mode but uninstalled it afterwards.
oVirt version: 4.3.4.3-1.el7.

$ grep vdsm /etc/passwd
vdsm:x:36:36:Node Virtualization Manager:/:/sbin/nologin

My solution was to set up a home directory for the vdsm user:

hosted-engine --set-maintenance --mode=global
systemctl stop ovirt-ha-broker vdsmd ovirt-imageio-daemon && \
  usermod -d /home/vdsm vdsm && \
  mkdir /home/vdsm && \
  chown vdsm:kvm /home/vdsm && \
  chmod 0700 /home/vdsm && \
  sudo -u vdsm /usr/bin/bash -c 'ls -la $HOME' # <-- Last line for check
hosted-engine --set-maintenance --mode=none

I would like to know if there is a better way to fix this, since even after switching to maintenance mode, the ovirt-ha-broker, vdsm and ovirt-imageio-daemon services still run and prevent me from running usermod.

Comment 5 Martin Perina 2019-07-24 12:03:19 UTC
There is no easy way to remove an oVirt installation, which is why we don't support installing oVirt on previously installed hosts. So the only way around is to reprovision the host using a clean CentOS installation and start the hosted-engine installation from scratch.

Comment 6 Dmitri Rubinstein 2019-07-24 12:12:28 UTC
Before reinstalling in self-hosted engine mode I used engine-cleanup as documented here: https://www.ovirt.org/documentation/install-guide/chap-oVirt_Engine_Related_Tasks.html

Comment 7 Martin Perina 2019-07-24 13:21:40 UTC
(In reply to Dmitri Rubinstein from comment #6)
> Before reinstalling in self-hosted engine mode I used engine-cleanup as
> documented here:
> https://www.ovirt.org/documentation/install-guide/chap-oVirt_Engine_Related_Tasks.html

engine-cleanup can be used only to clean up the engine host; it cannot be used to clean up a hypervisor host, where VDSM is installed.

Comment 8 Dmitri Rubinstein 2019-07-24 13:40:56 UTC
So the first time I installed oVirt in the not self-hosted engine mode, VDSM was installed without a home directory ?
As far as I understand virt-sysprep would not work in this case, so why it was done this way ?

Comment 9 Yedidyah Bar David 2019-07-25 06:20:55 UTC
(In reply to Dmitri Rubinstein from comment #8)
> So the first time I installed oVirt in the not self-hosted engine mode, VDSM
> was installed without a home directory ?

Perhaps. As I wrote above, I failed to find the root cause in my case. If you find it in yours, please provide details. Thanks.

To clarify - even fixing the root cause that made vdsm have '/' as home does not prevent manually doing that, thus leading to the current bug.

IMO the fix to current bug is to identify this state and report/prevent/fail setup etc. This was still not decided upon, and even if the test itself is trivial, I am not sure what's the best place to add it. Currently the bug is on vdsm packaging, but perhaps it makes more sense to do during host deploy (which is going to be rewritten in ansible in 4.4). And like many other bugs, doing this is the small part - the large one is to test the patch in all relevant flows and see that it works as expected and does not break anything.

> As far as I understand virt-sysprep would not work in this case, so why it
> was done this way ?

Not sure what you ask. If it's 'Why didn't this bug get solved already?', then I guess it's simply considered low priority, mainly since no-one knows the exact flow leading to it and since we assume that it's a rare flow.

Also, we do not explicitly pass the cache directory to virt-sysprep, and I do not see an option for this. Perhaps one should be added there and we should use it. Using the home directory should still usually be safe and legitimate, imo.
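
As a sketch of the "explicitly set the cache directory" idea, the environment passed to the child process could override the cache location. This assumes, without verification against libvirt or libguestfs, that the tools follow the XDG Base Directory convention, in which XDG_CACHE_HOME replaces the default $HOME/.cache. The helper name sysprep_env and the directory /var/lib/vdsm/.cache are hypothetical.

```python
import os


def sysprep_env(cache_dir, base_env=None):
    """Return a copy of the environment with XDG_CACHE_HOME set to cache_dir.

    Assumption: the child process honours XDG_CACHE_HOME. This is not
    confirmed anywhere in this report.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XDG_CACHE_HOME"] = cache_dir
    return env


# Example with a minimal base environment in which HOME is "/".
env = sysprep_env("/var/lib/vdsm/.cache", base_env={"HOME": "/"})
print(env["XDG_CACHE_HOME"])
```

The returned dictionary could then be passed as the env argument of subprocess.run() when starting virt-sysprep. Copying the environment rather than modifying os.environ keeps the override local to that one command.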

Comment 10 Dmitri Rubinstein 2019-07-25 10:07:32 UTC
Btw, your link https://github.com/oVirt/vdsm/blob/master/vdsm.spec.in#L756
is now pointing to this line:

Allows to use custom device properties to connect a guest vNIC to a host

I assume we are talking about this line:

    /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \
        -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}

> > So the first time I installed oVirt in the not self-hosted engine mode, VDSM
> > was installed without a home directory ?
>
> Perhaps. As I wrote above, I failed to find the root cause in my case. If you
> find it in yours, please provide details. Thanks.
>

I definitely did not manually change the vdsm account, and started the oVirt
installation directly after installing CentOS. Thus I assume the home directory
was changed as part of the installation process.
The next time I perform the installation, I will check after which step
vdsm user gets / as home directory.


>
> To clarify - even fixing the root cause that made vdsm have '/' as home does not
> prevent manually doing that, thus leading to the current bug.
>
> IMO the fix to current bug is to identify this state and report/prevent/fail
> setup etc. This was still not decided upon, and even if the test itself is
> trivial, I am not sure what's the best place to add it. Currently the bug is on
> vdsm packaging, but perhaps it makes more sense to do during host deploy (which
> is going to be rewritten in ansible in 4.4). And like many other bugs, doing
> this is the small part - the large one is to test the patch in all relevant
> flows and see that it works as expected and does not break anything.
>

IMHO, it would also be helpful if vdsm service would check for such
situations at startup, so that it is also displayed in oVirt UI. oVirt UI
only told me that the template creation failed, but it wasn't clear where
to look for the cause.


>
> > As far as I understand virt-sysprep would not work in this case, so why it
> > was done this way ?
>
> Not sure what you ask. If it's 'Why didn't this bug get solved already?', then
> I guess it's simply considered low priority, mainly since no-one knows the
> exact flow leading to it and since we assume that it's a rare flow.
>

No, I would like to know if it makes sense to create a vdsm account with a
missing home directory because virt-sysprep needs an existing home
directory. I assume that there is an installation routine that does this
because I didn't do it.

Comment 11 Yedidyah Bar David 2019-07-25 10:14:02 UTC
(In reply to Dmitri Rubinstein from comment #10)
> Btw, your link https://github.com/oVirt/vdsm/blob/master/vdsm.spec.in#L756
> is now pointing to this line:
> 
> Allows to use custom device properties to connect a guest vNIC to a host
> 
> I assume we are talking about this line:
> 
>     /usr/sbin/useradd -r -u 36 -g %{vdsm_group} -d /var/lib/vdsm \
>         -s /sbin/nologin -c "Node Virtualization Manager" %{vdsm_user}

I assumed so too, and see my comment 2.

> 
> > > So the first time I installed oVirt in the not self-hosted engine mode, VDSM
> > > was installed without a home directory ?
> >
> > Perhaps. As I wrote above, I failed to find the root cause in my case. If you
> > find it in yours, please provide details. Thanks.
> >
> 
> I definitely did not manually change the vdsm account, and started the oVirt
> installation directly after installing CentOS. Thus I assume the home directory
> was changed as part of the installation process.

This makes sense, but I failed to find what causes this.

> The next time I perform the installation, I will check after which step
> vdsm user gets / as home directory.

Thanks.

> 
> 
> >
> > To clarify - even fixing the root cause that made vdsm have '/' as home does not
> > prevent manually doing that, thus leading to the current bug.
> >
> > IMO the fix to current bug is to identify this state and report/prevent/fail
> > setup etc. This was still not decided upon, and even if the test itself is
> > trivial, I am not sure what's the best place to add it. Currently the bug is on
> > vdsm packaging, but perhaps it makes more sense to do during host deploy (which
> > is going to be rewritten in ansible in 4.4). And like many other bugs, doing
> > this is the small part - the large one is to test the patch in all relevant
> > flows and see that it works as expected and does not break anything.
> >
> 
> IMHO, it would also be helpful if vdsm service would check for such
> situations at startup, so that it is also displayed in oVirt UI. oVirt UI
> only told me that the template creation failed, but it wasn't clear where
> to look for the cause.

Makes sense to me.

> 
> 
> >
> > > As far as I understand virt-sysprep would not work in this case, so why it
> > > was done this way ?
> >
> > Not sure what you ask. If it's 'Why didn't this bug get solved already?', then
> > I guess it's simply considered low priority, mainly since no-one knows the
> > exact flow leading to it and since we assume that it's a rare flow.
> >
> 
> No, I would like to know if it makes sense to create a vdsm account with a
> missing home directory because virt-sysprep needs an existing home
> directory. I assume that there is an installation routine that does this
> because I didn't do it.

OK, so I agree it does not make sense, but failed to find where this happens...

Anyway, thanks for your report!

Comment 12 Marcin Sobczyk 2020-03-02 14:00:40 UTC
Since we don't have a reproducible path for this, and it's likely that the problem was the result of a misconfigured or partially configured host, the fix consists only of checking whether the home directory for the 'vdsm' user exists and has proper permissions. If not, we raise an error as early as possible during the startup of vdsm.
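
A rough sketch of what such a startup check could look like follows. It is not the merged patch (gerrit 105852), which is not reproduced in this report. The names HomeDirError, check_home_dir and check_user_home are hypothetical, and the permission test is deliberately simplified: it looks only at the classic owner/group/other mode bits and ignores supplementary groups and ACLs.

```python
import os
import pwd
import stat


class HomeDirError(Exception):
    """Raised when a user's home directory is missing or unusable."""


def check_home_dir(home, uid, gid):
    """Raise HomeDirError unless `home` is a directory that the given
    uid/gid may write to and search, judged by mode bits only."""
    try:
        st = os.stat(home)
    except OSError as e:
        raise HomeDirError("cannot stat home directory %r: %s" % (home, e))

    if not stat.S_ISDIR(st.st_mode):
        raise HomeDirError("%r is not a directory" % home)

    # Pick the permission triplet that applies to this user.
    if st.st_uid == uid:
        bits = (st.st_mode >> 6) & 0o7
    elif st.st_gid == gid:
        bits = (st.st_mode >> 3) & 0o7
    else:
        bits = st.st_mode & 0o7

    needed = 0o3  # write (2) + search/execute (1)
    if (bits & needed) != needed:
        raise HomeDirError(
            "home directory %r is not writable by uid %d" % (home, uid))


def check_user_home(name):
    """Look up `name` in the passwd database and check its home directory."""
    entry = pwd.getpwnam(name)
    check_home_dir(entry.pw_dir, entry.pw_uid, entry.pw_gid)
```

Calling check_user_home("vdsm") early in service startup would turn the late virt-sysprep failure into an immediate, explicit error. For the passwd entry reported here (home '/', uid 36), a typical root-owned '/' with mode 0755 would fall into the "other" triplet and fail the write test.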

Comment 13 Petr Matyáš 2020-03-11 11:16:21 UTC
Verified on vdsm-4.40.5-1.el8ev.x86_64

Comment 14 Sandro Bonazzola 2020-05-20 20:04:09 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

