Bug 1152137
| Summary: | VDSM fails to start on ppc host due to `user not found: unable to canonify user and get auxprops` error | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Lukas Svaty <lsvaty> |
| Component: | vdsm | Assignee: | Mooli Tayer <mtayer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lukas Svaty <lsvaty> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.4.1-1 | CC: | bazulay, danken, dwalsh, ecohen, gklein, iheim, lpeer, lsurette, lsvaty, mgrepl, michal.skrivanek, oourfali, pstehlik, sherold, ybronhei, yeylon |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 3.5.0 | | |
| Hardware: | ppc64 | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-10-26 12:00:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
trying to reinstall vdsm:

```
Running transaction
  Installing : vdsm-4.14.17-1.pkvm2_1.ppc64                                1/1
Checking configuration status...

Running configure...
Done configuring modules to VDSM.
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/seboolsetup.py", line 59, in sebool_config
    setup_booleans(True)
  File "/usr/lib64/python2.7/site-packages/vdsm/tool/seboolsetup.py", line 41, in setup_booleans
    sebool_obj = seobject.booleanRecords()
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 2015, in __init__
    semanageRecords.__init__(self, store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 175, in __init__
    self.sh = self.get_handle(store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 202, in get_handle
    raise ValueError(_("SELinux policy is not managed or store cannot be accessed."))
ValueError: SELinux policy is not managed or store cannot be accessed.
/usr/sbin/semanage: SELinux policy is not managed or store cannot be accessed.
  Verifying  : vdsm-4.14.17-1.pkvm2_1.ppc64                                1/1

Installed:
  vdsm.ppc64 0:4.14.17-1.pkvm2_1
```
is this a clean install? what's the status of kimchi? what's in the libvirt credentials (/etc/libvirt/auth.conf) configuration?

1. Clean installation.

3. `# cat /etc/libvirt/auth.conf`:

```
cat: /etc/libvirt/auth.conf: No such file or directory
```
2. `# service kimchi status`:

```
kimchid.service - Kimchi server
   Loaded: loaded (/usr/lib/systemd/system/kimchid.service; enabled)
   Active: failed (Result: exit-code) since Mon 2014-10-13 10:13:30 EDT; 41min ago
 Main PID: 113660 (code=exited, status=1/FAILURE)
   CGroup: name=systemd:/system/kimchid.service

Oct 13 10:13:20 `hostname` systemd[1]: Started Kimchi server.
Oct 13 10:13:22 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:22 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:22 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:24 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:24 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:24 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:26 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:26 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:26 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:28 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:28 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:28 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:30 `hostname` python[113660]: No worthy mechs found
Oct 13 10:13:30 `hostname` kimchid[113660]: libvirt: XML-RPC error : authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:30 `hostname` kimchid[113660]: Unable to connect to libvirt.
Oct 13 10:13:30 `hostname` kimchid[113660]: Libvirt is not available, exiting.
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE Bus STOPPING
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE HTTP Server None already shut down
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE No thread running for None.
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE No thread running for None.
Oct 13 10:13:30 `hostname` kimchid[113660]: [13/Oct/2014:10:13:30] ENGINE Bus STOPPED
Oct 13 10:13:30 `hostname` kimchid[113660]: Traceback (most recent call last):
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/bin/kimchid", line 83, in <module>
Oct 13 10:13:30 `hostname` kimchid[113660]:     sys.exit(main(sys.argv[1:]))
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/bin/kimchid", line 80, in main
Oct 13 10:13:30 `hostname` kimchid[113660]:     kimchi.server.main(options)
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/lib/python2.7/site-packages/kimchi/server.py", line 175, in main
Oct 13 10:13:30 `hostname` kimchid[113660]:     srv = Server(options)
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/lib/python2.7/site-packages/kimchi/server.py", line 121, in __init__
Oct 13 10:13:30 `hostname` kimchid[113660]:     model_instance = model.Model()
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/lib/python2.7/site-packages/kimchi/model/model.py", line 41, in __init__
Oct 13 10:13:30 `hostname` kimchid[113660]:     self._default_pool_check()
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/lib/python2.7/site-packages/kimchi/model/model.py", line 61, in _default_pool_check
Oct 13 10:13:30 `hostname` kimchid[113660]:     conn = self.conn.get()
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/lib/python2.7/site-packages/kimchi/model/libvirtconnection.py", line 91, in get
Oct 13 10:13:30 `hostname` kimchid[113660]:     conn = libvirt.open(self.uri)
Oct 13 10:13:30 `hostname` kimchid[113660]:   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 247, in open
Oct 13 10:13:30 `hostname` kimchid[113660]:     if ret is None:raise libvirtError('virConnectOpen() failed')
Oct 13 10:13:30 `hostname` kimchid[113660]: libvirt.libvirtError: authentication failed: Failed to start SASL negotiation: -4 (SASL(-4): no mechanism available: No worthy mechs found)
Oct 13 10:13:30 `hostname` systemd[1]: kimchid.service: main process exited, code=exited, status=1/FAILURE
Oct 13 10:13:30 `hostname` systemd[1]: kimchid.service: control process exited, code=exited status=1
Oct 13 10:13:30 `hostname` systemd[1]: Unit kimchid.service entered failed state.
```
Created attachment 946453 [details]
host deploy log
Adding host deploy log of engine.
I was able to successfully add this host to engine by these steps:

1. Remove from engine.
2. Reinstall vdsm (with the same traceback as in comment#1).
3. Deploy host to engine.

VDSM seems to be working now.

Does it happen after every deploy? If you run "vdsm-tool sebool-config" manually, we would see why it failed. The core problem is:

```
ValueError: SELinux policy is not managed or store cannot be accessed.
/usr/sbin/semanage: SELinux policy is not managed or store cannot be accessed.
```

and I guess it's due to a policycoreutils-python bug, so can you tell us the output of "rpm -q policycoreutils-python" on that machine?

(In reply to Lukas Svaty from comment #5)
> I was able to successfully add this host to engine by these steps:
> 1. Remove from engine.
> 2. Reinstall vdsm (with the same traceback as in comment#1)
> 3. Deploy host to engine
> VDSM seems to be working now.

So, IOW, if you deploy twice it works?

Yes, it seems that after the second deployment it works as desired.

What would be the steps for reproduction of this?

```
yum remove vdsm*
yum remove libvirt*
yum remove qemu*
```

Anything else?

```
# rpm -q policycoreutils-python
policycoreutils-python-2.1.14-46.6.pkvm2_1.4.ppc64
```

This should not be a problem of this package; another host with the same package succeeded during deployment.

So it happens only on this specific host?

It seems like it; I was able to reproduce it only the first time. After "vdsm-tool configure" and a restart of vdsm and libvirt I was able to deploy this host. I can provide the host if necessary.

I think "vdsm-tool configure" is a sufficient workaround. Still need to find the root cause… Lukas, what's the engine version?

Host was deployed on engine from av12.1.

My current 5 cents:

1. The policycoreutils-python missing attribute was found & fixed in 2.0.83, so I believe you are right and it is unrelated.
2. Maybe related to the reason why in the spec we swallow vdsm-tool sebool-config errors, another policycoreutils issue, see [1][2]. That one happens when selinux is off. Is selinux off?
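The spec behaviour mentioned above, where vdsm-tool sebool-config errors are swallowed so an SELinux hiccup cannot abort the rpm transaction, can be sketched roughly as follows (the function and parameter names are hypothetical, not vdsm's actual code):

```python
def run_sebool_config(configure, log):
    """Rough sketch (hypothetical names): run the SELinux boolean setup but
    never let its failure abort the caller, mirroring how the vdsm spec
    swallows vdsm-tool sebool-config errors during %post."""
    try:
        configure()
    except ValueError as exc:
        # e.g. "SELinux policy is not managed or store cannot be accessed."
        log("warning: sebool-config failed: %s" % exc)
        return False
    return True
```

On a host where the semanage store is inaccessible, as in this bug, the call reports a warning and returns False instead of propagating the error, which is why yum finishes the install while a manual vdsm-tool run shows the traceback.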
This should tell:

```
$ sudo python
...
>>> import seobject
>>> seobject.booleanRecords().modify('virt_use_samba','on')
```

If this succeeds: `sudo setenforce 0` and try again, just in case it is on now and was off when the bug came down upon this earth.

Most important if not: will this reproduce on every fresh install? We have met symptoms like this before (e.g. [3]) where vdsm fails on a fresh install, then works after vdsmd/a second vdsm-tool run. Only this will tell us if this is a real bug.

tl;dr:

1. Suspected: policycoreutils; the build is unavailable to me, but we have a tell (above).
2. Does this reproduce on a fresh install? If so, do not fix it and let's examine.

Email me if you need anything; I'm on holiday but will try to respond when I can. PS: what OS release do you run on ppc?

[1] http://gerrit.ovirt.org/#/c/9840/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=883355 (although the error is different, and policycoreutils is 2.1.12 and not 2.1.14; but this was discovered in 2.1.13, so it might be in 2.1.14 as well, who knows)
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1127877

Sadly the selinux vdsm-tool might not be the cause of the issue, since we ignore errors when rpm runs vdsm-tool, and it happens when it needs not do anything (selinux disabled). That's why you saw this when running vdsm-tool manually.

@Yaniv: This makes me think that we need to change the flow at the configurator (since this is planned to go into the configurator soon): try/catch and run all configurators (except for dependent ones); if an error happened, throw at the end. Also, in the spec we should run the same way a user would run.

```
>>> import seobject
ValueError: SELinux policy is not managed or store cannot be accessed.
```
```
>>> seobject.booleanRecords().modify('virt_use_samba','on')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 2015, in __init__
    semanageRecords.__init__(self, store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 175, in __init__
    self.sh = self.get_handle(store)
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 202, in get_handle
    raise ValueError(_("SELinux policy is not managed or store cannot be accessed."))
ValueError: SELinux policy is not managed or store cannot be accessed.

[~]# cat /etc/*release
RHEV for IBM POWER release 3.4 build 35 service (pkvm2_1)
```

I can provide the host for questioning. Once we try a clean installation to reproduce the bug, we might lose some data.

re-adding needinfo from comment#17

I think this isn't the root issue, see comment 17. It seems we are back to square one, so we need the second option. We should notify the selinux people about this regardless and open them a bz (afaiu from [1] in comment 15 it should not fail if there is no selinux).

I've been pinged about the status of this; are we ready to try and reproduce? Also, selinux on the machine is disabled, right?

Tried with Enforcing and Permissive, not disabled.

re-adding needinfo for ybronhei from comment#17

Tried today on a fresh installation of the newest image. Deployment of the host to engine succeeded; no issue with vdsm/selinux nor anything related. I would suggest CLOSING, as the vdsm part is fixed in the current release. If I'm able to reproduce this in the future I can reopen.

However, we should close after we get an opinion from the selinux guys on this topic. Hello dwalsh,
is the following a bug?
```
[root@ibm-p8-rhevm-hv-01 ~]# sestatus
SELinux status:                 disabled
[root@ibm-p8-rhevm-hv-01 ~]# python -c "import seobject;seobject.booleanRecords().modify('virt_use_samba','on')"
SELinux:  Could not downgrade policy file /etc/selinux/targeted/policy/policy.29, searching for an older version.
SELinux:  Could not open policy file <= /etc/selinux/targeted/policy/policy.29:  No such file or directory
/sbin/load_policy:  Can't load policy:  No such file or directory
libsemanage.semanage_reload_policy: load_policy returned error code 2.
SELinux:  Could not downgrade policy file /etc/selinux/targeted/policy/policy.29, searching for an older version.
SELinux:  Could not open policy file <= /etc/selinux/targeted/policy/policy.29:  No such file or directory
/sbin/load_policy:  Can't load policy:  No such file or directory
libsemanage.semanage_reload_policy: load_policy returned error code 2.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 2093, in modify
    self.commit()
  File "/usr/lib64/python2.7/site-packages/seobject.py", line 248, in commit
    raise ValueError(_("Could not commit semanage transaction"))
ValueError: Could not commit semanage transaction
```
See comment 29.

I'm pretty sure it's not; asking because of the similar https://bugzilla.redhat.com/show_bug.cgi?id=883355.

These kinds of errors happen on a system with disabled SELinux. The point is that policycoreutils is not crashing. I thought you had been catching it in vdsm so as not to show it?

I agree it's not a crash but a well-placed exception (`semanage boolean --on 'virt_use_samba'` will also return failure). Currently we print a warning and ignore the error. I will suggest upstream that vdsm also stops printing the warning.

Thanks mgrepl. We can now close this one.
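As the discussion above concludes, seobject raises a well-placed ValueError on a SELinux-disabled host, so a caller that only wants to touch booleans when a policy is actually loaded has to check first. A minimal sketch of that guard (the enabled-check and records factory are passed in here so the example is self-contained; real code would use the `selinux` and `seobject` bindings directly):

```python
def set_boolean_if_enabled(records_factory, selinux_enabled, name, value):
    """Hypothetical sketch: only modify SELinux booleans when SELinux is
    enabled; otherwise skip quietly instead of surfacing seobject's
    'SELinux policy is not managed' ValueError."""
    if not selinux_enabled:
        return "skipped"          # disabled host: nothing to configure
    records = records_factory()   # seobject.booleanRecords() in real code
    records.modify(name, value)
    return "configured"
```

With this shape, a disabled host produces neither the warning nor the failed semanage transaction shown in the outputs above.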
Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. service vdsmd start
2.
3.

Actual results:
Host deployment fails on start of the vdsm service.

`service vdsmd start`:

```
# service vdsmd start
Redirecting to /bin/systemctl start vdsmd.service
Job for vdsmd.service failed. See 'systemctl status vdsmd.service' and 'journalctl -xn' for details.

# service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: activating (start-pre) since Mon 2014-10-13 09:32:27 EDT; 7s ago
  Process: 79742 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Control: 81164 (vdsmd_init_comm)
   CGroup: name=systemd:/system/vdsmd.service
           └─control
             ├─81164 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
             └─81215 /usr/bin/python /usr/bin/vdsm-tool nwfilter

Oct 13 09:32:34 ibm-p8-rhevm-04.rhts.eng.bos.redhat.com vdsmd_init_common.sh[81164]: libvirt: XML-RPC error : authentication failed: authentication failed
```

Expected results:
Deployed successfully.

Additional info:
libvirtd log spammed by this:

```
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=0 value=root
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=1 value=0
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=2 value=root
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=3 value=0
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=4 value=80575
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=5 value=140013
2014-10-13 13:30:38.077+0000: 69130: debug : virIdentitySetAttr:282 : ident=0x3fff6c000b80 attribute=8 value=system_u:system_r:initrc_t:s0
2014-10-13 13:30:38.083+0000: 69127: error : virNetSASLSessionServerStep:578 : authentication failed: Failed to start SASL negotiation: -20 (SASL(-13): user not found: unable to canonify user and get auxprops)
2014-10-13 13:30:38.083+0000: 69127: error : remoteDispatchAuthSaslStep:2679 : authentication failed: authentication failed
2014-10-13 13:30:38.084+0000: 68896: error : virNetSocketReadWire:1409 : End of file while reading data: Input/output error
```