Created attachment 816998
This is a vdsm log of the affected host.
Description of problem:
Can't activate a new host in the existing cluster.
Version-Release number of selected component (if applicable):
VDSM: vdsm-4.10.2-27.0.el6ev.x86_64
How reproducible:
Steps to Reproduce:
1. Adding a new host to the existing cluster on manager running 3.2.4.
Actual results:
The host becomes "Non Operational"
Expected results:
The state should be "UP".
Additional info:
Please attach the engine log showing the reason for the non-operational state. (Regardless, we need to make sure the supervdsm errors in the vdsm log are OK.)
No need for additional information. The reason vdsm didn't start properly is the sudoers.d configuration. Somehow the vdsm config was placed in the appropriate location with the right rules for the vdsm user, but sudo operations were still rejected (starting the supervdsm process with sudo and running multipath commands both failed). Please let me know whether reinstalling the host helped; if not, we need to figure out why the sudoers file was not enough.
The sudoers file was autogenerated by some tool we use, so it was missing the #includedir /etc/sudoers.d directive. Once the directive was added, the issue was fixed. I'd suggest verifying that it exists in the installation.
After some thinking, you're right. We should verify it during pre-start of vdsmd. vdsm adds a 50_vdsm file under /etc/sudoers.d; without #includedir /etc/sudoers.d in /etc/sudoers, that file won't be read and all sudo operations by the vdsm user will be rejected. The implications: multipath commands fail, supervdsmServer cannot start, and more. vdsm cannot operate without this line, so it should verify it. Please ack the bug.
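A minimal sketch of the kind of check proposed above, in Python since vdsm is written in Python. This is not vdsm code: the function name `sudoers_includes_dir` is hypothetical, and the matching is deliberately strict (directive at the start of the line, unquoted path). It also accepts the `@includedir` spelling that newer sudo releases allow. Note that `##includedir` is a comment, not a directive.

```python
import re

# Matches "#includedir <path>" or "@includedir <path>" at the start of a line.
# A leading "##" does not match, because sudoers treats that line as a comment.
# Assumption of this sketch: no leading whitespace and no quoted paths.
_INCLUDEDIR = re.compile(r'^[#@]includedir\s+(\S+)\s*$')


def sudoers_includes_dir(sudoers_text, directory="/etc/sudoers.d"):
    """Return True if sudoers_text contains an includedir directive for directory."""
    wanted = directory.rstrip("/")
    for line in sudoers_text.splitlines():
        match = _INCLUDEDIR.match(line)
        if match and match.group(1).rstrip("/") == wanted:
            return True
    return False
```

A pre-start hook could read /etc/sudoers (which needs root) and pass the text to this function, refusing to start or logging a clear error when it returns False.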
There could be soooo many configuration files that can be ruined by a local admin or "by some tool we use". We can never expect to check that every configuration file on the host meets our expectations. We require a sudoers version that has #includedir by default. If the local admin removes it, he should have a really good reason to do so, and should also have inlined the contents of /etc/sudoers.d/50_vdsm. It's not vdsm's business to force an #includedir upon him. My vote: CLOSE|NOTABUG after verifying that our logs are clear, complaining about missing sudo rights.
That's just it. We weren't able to understand what's wrong from the logs. Making it clear that the sudo rights need to be checked is good enough (IMHO).
Verified in 3.4.0-0.7.beta2.el6
[root@slot-7 ~]# grep '\.d' /etc/sudoers
## Read drop-in files from /etc/sudoers.d (the # here does not mean a comment)
##includedir /etc/sudoers.d
[root@slot-7 ~]# grep sudoer /var/log/messages | tail -n 3
Feb 18 13:15:08 slot-7 python: vdsm user could not manage to run sudo operation: (stderr: ['sudo: sorry, you must have a tty to run sudo']). Verify sudoer rules configuration
Feb 18 13:15:08 slot-7 python: vdsm user could not manage to run sudo operation: (stderr: ['sudo: sorry, you must have a tty to run sudo']). Verify sudoer rules configuration
Feb 18 13:15:09 slot-7 python: vdsm user could not manage to run sudo operation: (stderr: ['sudo: sorry, you must have a tty to run sudo']). Verify sudoer rules configuration
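For readers wondering how a message like the one in the log above could be produced, here is a rough sketch. It is not the actual vdsm fix: the function names `probe_sudo` and `sudo_failure_message` are hypothetical, and the default probe command `sudo -n true` is an assumption (`-n` makes sudo fail instead of prompting). Only the message wording is taken from the log.

```python
import subprocess


def probe_sudo(cmd=("sudo", "-n", "true")):
    """Run a harmless command through sudo; return (ok, stderr_lines)."""
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as exc:
        # For example, the sudo binary is not installed.
        return False, [str(exc)]
    return proc.returncode == 0, proc.stderr.splitlines()


def sudo_failure_message(stderr_lines):
    """Format the failure in the same wording as the log lines above."""
    return ("vdsm user could not manage to run sudo operation: "
            "(stderr: %s). Verify sudoer rules configuration" % (stderr_lines,))
```

The caller would run `probe_sudo()` as the vdsm user and, if it fails, write `sudo_failure_message(stderr_lines)` to syslog so the admin is pointed at the sudoers rules.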
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0504.html