Description of problem:

On a RHEL 6.4 system that is a RHEV host with either 'vdsm-4.10.2-1.6' or 'vdsm-4.10.2-11.0' installed, if '/etc/rsyslog.conf' contains invalid entries then 'vdsm' will fail to start.

For example, if line 8 of '/etc/rsyslog.conf' is modified as follows (the leading "$ModLoad" was removed) and the rsyslog service is restarted, then the next time vdsmd is started it will fail to start:

1  # rsyslog v5 configuration file
2
3  # For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
4  # If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
5
6  #### MODULES ####
7
8  imuxsock # provides support for local system logging (e.g. via logger command)
9  $ModLoad imklog # provides kernel logging support (previously done by rklogd)

This was seen on a customer's system. We don't know yet how or why this happened, but it did. They upgraded to RHEL 6.4 with 'vdsm-4.10.2-1.6'. When they tried to register this system as a RHEV host, 'vdsm' failed to start on the subsequent reboot. I have reproduced this here myself. I also tried 'vdsm-4.10.2-11.0', with the same result. However, on a RHEL 6.3 system with 'vdsm-4.9.6-45.2' I cannot recreate this condition.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-1.6
vdsm-4.10.2-11.0

How reproducible:
Every time with 'vdsm-4.10.2-1.6' or 'vdsm-4.10.2-11.0'.

Steps to Reproduce:
1. Modify '/etc/rsyslog.conf', e.g. remove the leading "$ModLoad" from line 8:

   8 imuxsock # provides support for local system logging (e.g. via logger command)

2. Restart rsyslogd (service rsyslog restart).
3. Restart vdsmd (service vdsmd restart). vdsm will not start.

Or, as the customer did, instead of steps 2 and 3 above, reboot.

Actual results:

After restarting rsyslogd and vdsmd:

- vdsm procs:

# ps -ef|grep vdsm
vdsm     18520     1  0 16:19 ?
00:00:00 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm
vdsm     20790 18520  0 16:20 ?        00:00:00 sleep 900

- vdsm.log:

# tail -n 33 /var/log/vdsm/vdsm.log
Thread-269::DEBUG::2013-03-28 10:59:03,615::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676'
Thread-269::DEBUG::2013-03-28 10:59:03,616::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676' (0 active users)
Thread-269::DEBUG::2013-03-28 10:59:03,616::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676' is free, finding out if anyone is waiting for it.
Thread-269::DEBUG::2013-03-28 10:59:03,616::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676', Clearing records.
Thread-269::DEBUG::2013-03-28 10:59:03,616::task::957::TaskManager.Task::(_decref) Task=`d375525d-6af1-4fc4-9954-b636d3ac9a25`::ref 0 aborting False
MainThread::INFO::2013-03-28 10:59:06,052::vmChannels::135::vds::(stop) VM channels listener was stopped.
MainThread::DEBUG::2013-03-28 10:59:06,052::task::568::TaskManager.Task::(_updateState) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::moving from state init -> state preparing
MainThread::INFO::2013-03-28 10:59:06,053::logUtils::37::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
Thread-11::DEBUG::2013-03-28 10:59:06,053::storageServer::674::ConnectionMonitor::(_monitorConnections) Monitoring stopped
MainThread::WARNING::2013-03-28 10:59:06,113::hsm::3344::Storage.HSM::(__releaseLocks) Found lease locks, releasing
Thread-18::DEBUG::2013-03-28 10:59:07,174::domainMonitor::160::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3
Thread-18::DEBUG::2013-03-28 10:59:07,174::clusterlock::85::SafeLease::(releaseHostId) Host id for domain c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3 released successfully (id: 1)
MainThread::DEBUG::2013-03-28 10:59:07,174::taskManager::80::TaskManager::(prepareForShutdown) Request to stop all tasks
MainThread::INFO::2013-03-28 10:59:07,174::logUtils::39::dispatcher::(wrapper) Run and protect: prepareForShutdown, Return response: None
MainThread::DEBUG::2013-03-28 10:59:07,174::task::1151::TaskManager.Task::(prepare) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::finished: None
MainThread::DEBUG::2013-03-28 10:59:07,175::task::568::TaskManager.Task::(_updateState) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::moving from state preparing -> state finished
MainThread::DEBUG::2013-03-28 10:59:07,175::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
MainThread::DEBUG::2013-03-28 10:59:07,175::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
MainThread::DEBUG::2013-03-28 10:59:07,175::task::957::TaskManager.Task::(_decref) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::ref 0 aborting False
MainThread::INFO::2013-03-28 10:59:07,175::vdsm::95::vds::(run) VDSM main thread ended.
Waiting for 12 other threads...
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <_MainThread(MainThread, started 140544775120640)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <Thread(libvirtEventLoop, started daemon 140544684652288)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-6, started daemon 140544508409600)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-9, started daemon 140544070117120)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-5, started daemon 140544518899456)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(414d32e8-ac0e-44d0-962a-b363426b41ff, started daemon 140544631383808)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-8, started daemon 140544080606976)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-7, started daemon 140544497919744)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-3, started daemon 140544539879168)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 140543610844928)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-2, started daemon 140544550369024)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-10, started daemon 140544059627264)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-4, started daemon 140544529389312)>

- '/var/log/messages':

# tail /var/log/messages
Mar 27 16:17:50 dhcp-190-228 vdsm Storage.StorageDomain WARNING Resource namespace c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3_imageNS already registered
Mar 27 16:17:50 dhcp-190-228 vdsm Storage.StorageDomain WARNING Resource namespace c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3_volumeNS already registered
Mar 27 16:18:24
dhcp-190-228 kernel: Kernel logging (proc) stopped.
Mar 27 16:18:24 dhcp-190-228 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="16437" x-info="http://www.rsyslog.com"] exiting on signal 15.
Mar 27 16:18:24 dhcp-190-228 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Mar 27 16:18:24 dhcp-190-228 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="17751" x-info="http://www.rsyslog.com"] start
Mar 27 16:18:24 dhcp-190-228 rsyslogd-3000: unknown priority name "g." [try http://www.rsyslog.com/e/3000 ]
Mar 27 16:18:24 dhcp-190-228 rsyslogd: the last error occured in /etc/rsyslog.conf, line 8:" imuxsock # provides support for local system logging (e.g. via logger command)"
Mar 27 16:18:24 dhcp-190-228 rsyslogd: warning: selector line without actions will be discarded
Mar 27 16:18:24 dhcp-190-228 rsyslogd-2124: CONFIG ERROR: could not interpret master config file '/etc/rsyslog.conf'. [try http://www.rsyslog.com/e/2124 ]

No further messages are written to the 'messages' file after this, e.g. running 'logger' does not update the file.

- RHEV-M:

RHEV-M sees the host as "Non Responsive".

Expected results:

As in previous versions, 'vdsm' should be able to start, even if in a degraded mode. If so, a warning should be reported back to RHEV-M.

Additional info:

Obviously there is a simple workaround here, in that the '/etc/rsyslog.conf' file can be corrected. However, discovering that this is the cause is the real problem. 'vdsm' should be more resilient. At the very least, some error should be reported or some clue provided. All we knew in the customer's case was that a host failed to register (the customer's description) or activate successfully. It was not obvious for a while where the problem lay.

If I have omitted anything important, please let me know.

Regards, GFW.
1) Can you please attach an SOS report?
2) What is the status of rsyslogd — is it running?

I suspect rsyslogd is not running at all. In that case, vdsm should alert that one of its dependent services is down, and might give a hint to the user.
First, I submitted a patch that adds the rsyslog service to vdsm's service dependencies. Second, when loading the logger.conf file, the logger module tries to open a socket to the rsyslog service (which keeps running even if its configuration is corrupted); because of the configuration failure, an exception is raised and is not handled in vdsm. We can't print this error to vdsm.log or syslog with the current logger configuration, so it happens over and over and can't be fixed until rsyslog.conf is fixed. We could split logger.conf and, if this exception is raised, try to load the configuration without the use of syslog and print the error to vdsm.log. But IMO that is too messy and unclear for this specific case; requiring the service is enough.
rsyslog listens on the /dev/log socket, and vdsm's logger tries to open it; if rsyslog runs with a corrupted configuration, this socket is never created. I propose a patch that checks for this file before starting the vdsm process. Together, the two patches ensure that the rsyslog daemon is installed and operating as needed.
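The proposed pre-start check could be sketched as follows (function name and shape are illustrative, not the actual patch, which lives in vdsm's init scripts): verify that /dev/log exists and is a unix socket before launching the daemon.

```python
# Hedged sketch of the pre-start check: rsyslog creates /dev/log only
# when its configuration parses, so an existing unix socket there is a
# reasonable proxy for "syslog is up and usable".
import os
import stat


def syslog_socket_ok(path="/dev/log"):
    """Return True if `path` exists and is a unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False
```

If the check fails, the init script can refuse to start vdsm and print an actionable error, instead of leaving the host "Non Responsive" with no clue.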
Verified. vdsm fails to start, but it now throws a message (console output interleaved with the [FAILED] status markers):

Cannot communicate with syslog deamon for reports. Check r[FAILED]onf
vdsm: one of the dependent services did not start, error co[FAILED]
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0886.html