Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1152137

Summary: VDSM fails to start on ppc host due to `user not found: unable to canonify user and get auxprops` error
Product: Red Hat Enterprise Virtualization Manager Reporter: Lukas Svaty <lsvaty>
Component: vdsmAssignee: Mooli Tayer <mtayer>
Status: CLOSED CURRENTRELEASE QA Contact: Lukas Svaty <lsvaty>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.4.1-1CC: bazulay, danken, dwalsh, ecohen, gklein, iheim, lpeer, lsurette, lsvaty, mgrepl, michal.skrivanek, oourfali, pstehlik, sherold, ybronhei, yeylon
Target Milestone: ---Keywords: Triaged
Target Release: 3.5.0   
Hardware: ppc64   
OS: Linux   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-26 12:00:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
host deploy log none

Description Lukas Svaty 2014-10-13 13:34:30 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. service vdsmd start
2.
3.

Actual results:
Host deployment fails on start of vdsm

service vdsmd start:
# service vdsmd start
Redirecting to /bin/systemctl start  vdsmd.service
Job for vdsmd.service failed. See 'systemctl status vdsmd.service' and 'journalctl -xn' for details.

# service vdsmd status
Redirecting to /bin/systemctl status  vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: activating (start-pre) since Mon 2014-10-13 09:32:27 EDT; 7s ago
  Process: 79742 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Control: 81164 (vdsmd_init_comm)
   CGroup: name=systemd:/system/vdsmd.service
           └─control
             ├─81164 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
             └─81215 /usr/bin/python /usr/bin/vdsm-tool nwfilter

Oct 13 09:32:34 ibm-p8-rhevm-04.rhts.eng.bos.redhat.com vdsmd_init_common.sh[81164]: libvirt: XML-RPC error : authentication failed: authentication failed


Expected results:
Deployed successfully

Additional info:
libvirtd log spammed by this:
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=0 value=root
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=1 value=0
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=2 value=root
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=3 value=0
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=4 value=80575
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=5 value=140013
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=8 value=system_u:system_r:initrc_t:s0
2014-10-13 13:30:38.083+0000: 69127: error : virNetSASLSessionServerStep:578 : authentication failed: Failed to start SASL negotiation: -20 (SASL(-13): user not found: unable to canonify user and get auxprops)
2014-10-13 13:30:38.083+0000: 69127: error : remoteDispatchAuthSaslStep:2679 : authentication failed: authentication failed
2014-10-13 13:30:38.084+0000: 68896: error : virNetSocketReadWire:1409 : End of file while reading data: Input/output error

Comment 1 Lukas Svaty 2014-10-13 14:23:53 UTC
trying to reinstall vdsm:

Running transaction
  Installing : vdsm-4.14.17-1.pkvm2_1.ppc64                                                                                                                                                                    1/1 

Checking configuration status...


Running configure...

Done configuring modules to VDSM.
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/seboolsetup.py", line 59, in sebool_config
    setup_booleans(True)
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/seboolsetup.py", line 41, in setup_booleans
    sebool_obj = seobject.booleanRecords()
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 2015, in __init__
    semanageRecords.__init__(self, store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 175, in __init__
    self.sh = self.get_handle(store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 202, in get_handle
    raise ValueError(_("SELinux policy is not managed or store cannot be accessed."))
ValueError: SELinux policy is not managed or store cannot be accessed.
/usr/sbin/semanage: SELinux policy is not managed or store cannot be accessed.
  Verifying  : vdsm-4.14.17-1.pkvm2_1.ppc64                                                                                                                                                                    1/1 

Installed:
  vdsm.ppc64 0:4.14.17-1.pkvm2_1

Comment 2 Michal Skrivanek 2014-10-13 14:40:20 UTC
is this a clean install?
what's the status of kimchi?
what's in libvirt credentials (/etc/libvirt/auth.conf) configuration?

Comment 3 Lukas Svaty 2014-10-13 15:00:19 UTC
1. clean installation

3. # /etc/libvirt/auth.conf
cat: /etc/libvirt/auth.conf: No such file or directory

2. # service kimchi status:
kimchid.service - Kimchi server
   Loaded: loaded (/usr/lib/systemd/system/kimchid.service; enabled)
   Active: failed (Result: exit-code) since Mon 2014-10-13 10:13:30 EDT; 41min ago
 Main PID: 113660 (code=exited, status=1/FAILURE)
   CGroup: name=systemd:/system/kimchid.service

Oct 13 10:13:20 `hostname` systemd[1]: Started Kimchi server.
Oct 13 10:13:22 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:22 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:22 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:24 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:24 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:24 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:26 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:26 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:26 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:28 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:28 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:28 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:30 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:30 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:30 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:30 `hostname` kimchid[113660]: Libvirt is not available, exiting.
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE Bus STOPPING
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE HTTP Server None already shut down
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE No thread running for None.
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE No thread running for None.
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE Bus STOPPED
Oct 13 10:13:30 `hostname` kimchid[113660]: Traceback (most recent call last):
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/bin/kimchid", line 83, in <module>
Oct 13 10:13:30 `hostname` kimchid[113660]: sys.exit(main(sys.argv[1:]))
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/bin/kimchid", line 80, in main
Oct 13 10:13:30 `hostname` kimchid[113660]: kimchi.server.main(options)
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/lib/python2.7/site-packages/kimchi/server.py", line 175, in main
Oct 13 10:13:30 `hostname` kimchid[113660]: srv = Server(options)
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/lib/python2.7/site-packages/kimchi/server.py", line 121, in __init__
Oct 13 10:13:30 `hostname` kimchid[113660]: model_instance = model.Model()
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/lib/python2.7/site-packages/kimchi/model/model.py", line 41, in __init__
Oct 13 10:13:30 `hostname` kimchid[113660]: self._default_pool_check()
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/lib/python2.7/site-packages/kimchi/model/model.py", line 61, in _default_pool_check
Oct 13 10:13:30 `hostname` kimchid[113660]: conn = self.conn.get()
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/lib/python2.7/site-packages/kimchi/model/libvirtconnection.py", line 91, in get
Oct 13 10:13:30 `hostname` kimchid[113660]: conn = libvirt.open(self.uri)
Oct 13 10:13:30 `hostname` kimchid[113660]: File "/usr/lib64/python2.7/site-packages/libvirt.py", line 247, in open
Oct 13 10:13:30 `hostname` kimchid[113660]: if ret is None:raise libvirtError('virConnectOpen() failed')
Oct 13 10:13:30 `hostname` kimchid[113660]: libvirt.libvirtError: authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:30 `hostname` systemd[1]: kimchid.service: main process exited, code=exited, status=1/FAILURE
Oct 13 10:13:30 `hostname` systemd[1]: kimchid.service: control process exited, code=exited status=1
Oct 13 10:13:30 `hostname` systemd[1]: Unit kimchid.service entered failed state.

Comment 4 Lukas Svaty 2014-10-13 15:14:06 UTC
Created attachment 946453 [details]
host deploy log

Adding host deploy log of engine.

Comment 5 Lukas Svaty 2014-10-13 15:34:56 UTC
I was able to successfully add this host to engine by this steps:

1. Remove from engine.
2. Reinstall vdsm (with same traceback as in comment#1)
3. Deploy host to engine

VDSM seems to be working now.

Comment 6 Yaniv Bronhaim 2014-10-14 12:49:14 UTC
is it happened after every deploy? if you would run manually "vdsm-tool sebool-config" we would see why it failed
the core problem is 

ValueError: SELinux policy is not managed or store cannot be accessed.
/usr/sbin/semanage: SELinux policy is not managed or store cannot be accessed.

and i guess its due to policycoreutils-python bug, so can you tell us the output of "rpm -q policycoreutils-python" on that machine?

Comment 8 Michal Skrivanek 2014-10-14 12:54:19 UTC
(In reply to Lukas Svaty from comment #5)
> I was able to successfully add this host to engine by this steps:
> 
> 1. Remove from engine.
> 2. Reinstall vdsm (with same traceback as in comment#1)
> 3. Deploy host to engine
> 
> VDSM seems to be working now.

so, IOW if you deploy twice it works?

Comment 9 Lukas Svaty 2014-10-14 13:10:19 UTC
yes, it seems after second deployment it works as desired

what would be the steps for reproduction of this

yum remove vdsm*
yum remove libvirt*
yum remove qemu*

anything else?

Comment 10 Lukas Svaty 2014-10-14 13:12:59 UTC
# rpm -q policycoreutils-python
policycoreutils-python-2.1.14-46.6.pkvm2_1.4.ppc64

should not be problem of this package, other host with same package succeeded during deployment

Comment 11 Oved Ourfali 2014-10-14 13:21:45 UTC
So it happens only on this specific host?

Comment 12 Lukas Svaty 2014-10-14 13:26:11 UTC
it seems like it, I was able to reproduce it only first time
after vdsm-tool configure and restart of vdsm and libvirt  I was able to deploy this host

I can provide the host if necessary

Comment 13 Michal Skrivanek 2014-10-14 13:41:03 UTC
I think "vdsm-tool configure" is a sufficient workaround. Still need to find root cause…

Lukas, what's the engine version?

Comment 14 Lukas Svaty 2014-10-14 14:11:44 UTC
Host was deployed on engine from av12.1

Comment 15 Mooli Tayer 2014-10-15 15:05:09 UTC
My current 5 cents:

1. The policycoreutils-python missing attribute was found & fixed  on 2.0.83 so I believe you are right and it is unrelated.

2. Maybe related to the reason why at spec we swallow vdsm-tool sebool-config
errors, another policycoreutils issue, see[1][2].
That one happenes when selinux is off, is selinux off?

This should tell:
$ sudo python
...
>>> import seobject
>>> seobject.booleanRecords().modify('virt_use_samba','on')

if this succeeds:
sudo setenforce 0 

and try again, just in case it is now on and was off when bug came
down upon this earch


Most important if not:
Will this reproduce each time we have a fresh install?
we met symptoms like this before(e.g [3]) where vdsm fails on fresh,
then after vdsmd/second vdsm-tool X run works.
Only this will tell us if this is a real bug.

tl;dr:
1.) suspected: policycoreutils, build is unavailable to me,
but we have a tell (above)
    
2.) does this reproduce freshly? if so do not fix and let's examine.

Email me if you need anything, I'm on holiday be will try to respond when
I can.

ps what os release do you runs on ppc?

[1] http://gerrit.ovirt.org/#/c/9840/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=883355
although the error is different and policycoreutils is 2.1.12 and not 2.1.14, but this was discovered in 2.1.13 so it might be on 2.1.14 as well who knows.
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1127877

Comment 17 Mooli Tayer 2014-10-15 15:18:32 UTC
sadly the selinux vdsm-tool might not be the cause issue;

Since we ignore errors when rpm runs vdsm-tool,
and it happens when it needs not do anything (selinux disabled).
That's why you saw this when running vdsm-tool manually.

@Yaniv
This makes me think that we need to change flow at configurator
(since this is planned to go into configurator soon).

try catch, and run all configurators (except for dependent ones).
if error happend throw at the end.

also we should run at spec same why like user will run.

Comment 18 Lukas Svaty 2014-10-15 15:19:16 UTC
>>> import seobject
ValueError: SELinux policy is not managed or store cannot be accessed.
>>> seobject.booleanRecords().modify('virt_use_samba','on')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 2015, in __init__
    semanageRecords.__init__(self, store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 175, in __init__
    self.sh = self.get_handle(store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 202, in get_handle
    raise ValueError(_("SELinux policy is not managed or store cannot be accessed."))
ValueError: SELinux policy is not managed or store cannot be accessed.

[~]# cat /etc/*release
RHEV for IBM POWER release 3.4 build 35 service (pkvm2_1)

I can provide the host for questioning. Once we'll try clean installation and try to reproduce the bug we might loose some data.

Comment 20 Lukas Svaty 2014-10-15 15:20:41 UTC
re-adding needinfo from comment#17

Comment 21 Mooli Tayer 2014-10-15 15:28:42 UTC
I think this isn't the root issue, see comment 17.

Seems we are back to square one so we need second option.

We should nofity people at selinux about this regardless
and open them a bz (afaiu from [1] at comment 15 it should  not fail
if no selinux).

Comment 22 Mooli Tayer 2014-10-19 11:50:25 UTC
I've been pinged about the status of this,
are we ready to try and reproduce?

Comment 23 Mooli Tayer 2014-10-21 08:15:04 UTC
also selinux on the machine is disabled right?

Comment 24 Lukas Svaty 2014-10-21 09:48:15 UTC
tried with Enforcing and Permissive not disabled

Comment 25 Lukas Svaty 2014-10-21 09:51:02 UTC
re-adding needinfo for ybronhei from comment#17

Comment 28 Lukas Svaty 2014-10-22 15:17:28 UTC
Tried on fresh installation today of newest image. Deployment of the host to engine successfully no issue with vdsm/selinux nor anything related.

I would suggest CLOSING as vdsm part is fixed in current release. If I'll be able to reproduce this in the future I can reopen.

However we should close after we get opinion from selinux guys on this topic.

Comment 29 Mooli Tayer 2014-10-22 15:33:00 UTC
Hello dwalsh, 

is the following a bug?
[root@ibm-p8-rhevm-hv-01 ~]# sestatus                                                  |
SELinux status:                 disabled                                               |
[root@ibm-p8-rhevm-hv-01 ~]# python -c "import seobject;seobject.booleanRecords().modif
y('virt_use_samba','on')"                                                              |
SELinux:  Could not downgrade policy file /etc/selinux/targeted/policy/policy.29, searc|
hing for an older version.                                                             |
SELinux:  Could not open policy file <= /etc/selinux/targeted/policy/policy.29:  No suc|
h file or directory                                                                    
/sbin/load_policy:  Can't load policy:  No such file or directory                      
libsemanage.semanage_reload_policy: load_policy returned error code 2.                 
SELinux:  Could not downgrade policy file /etc/selinux/targeted/policy/policy.29, searc
hing for an older version.                                                             
SELinux:  Could not open policy file <= /etc/selinux/targeted/policy/policy.29:  No suc
h file or directory                                                                    
/sbin/load_policy:  Can't load policy:  No such file or directory
libsemanage.semanage_reload_policy: load_policy returned error code 2.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 2093, in modify
    self.commit()
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 248, in commit
    raise ValueError(_("Could not commit semanage transaction"))
ValueError: Could not commit semanage transaction

Comment 30 Mooli Tayer 2014-10-22 15:35:36 UTC
see comment 29

Comment 31 Mooli Tayer 2014-10-22 15:43:15 UTC
I'm pretty sure it's not, asking because of the simillar:

https://bugzilla.redhat.com/show_bug.cgi?id=883355

Comment 32 Miroslav Grepl 2014-10-24 09:17:58 UTC
This kind of these errors happen on a system with disabled SELinux. The point is the policycoreutils is not crashing. I thought you had been catching it in vdsm to not show it?

Comment 33 Mooli Tayer 2014-10-26 12:00:53 UTC
I agree it's not a crash but a well placed exception
(semanage boolean --on 'virt_use_samba' will also return failure).

Currently we print a warning and ignore the error.

I will suggest upstream that vdsm also stops printing the warning.

Thanks mgrepl.

We can now close this one.