Bug 928861 - VDSM will fail to start if rsyslogd's configuration is invalid.
Summary: VDSM will fail to start if rsyslogd's configuration is invalid.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.1.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.2.0
Assignee: Yaniv Bronhaim
QA Contact: Tareq Alayan
URL:
Whiteboard: infra
Depends On:
Blocks:
 
Reported: 2013-03-28 15:31 UTC by Gordon Watson
Modified: 2022-07-09 05:58 UTC
CC List: 10 users

Fixed In Version: vdsm-4.10.2-16.0.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, VDSM did not start properly because an exception was raised while applying the logger's syslog configuration. Part of that setup connects to /dev/log, which does not exist when the rsyslog configuration is corrupted. Now, during startup of the vdsmd service, VDSM verifies that the /dev/log socket exists and is accessible. If the /dev/log socket does not exist, starting the vdsmd service fails.
Clone Of:
Environment:
Last Closed: 2013-06-10 20:47:23 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments: none


Links
  Red Hat Issue Tracker RHV-47080 (last updated 2022-07-09 05:58:54 UTC)
  Red Hat Knowledge Base (Solution) 340283
  Red Hat Product Errata RHSA-2013:0886 - SHIPPED_LIVE - Moderate: rhev 3.2 - vdsm security and bug fix update (2013-06-11 00:25:02 UTC)
  oVirt gerrit 13509
  oVirt gerrit 13547

Description Gordon Watson 2013-03-28 15:31:23 UTC
Description of problem:

On a RHEL 6.4 system that is a RHEV host with either 'vdsm-4.10.2-1.6' or 'vdsm-4.10.2-11.0' installed, if '/etc/rsyslog.conf' contains invalid entries then 'vdsm' will fail to start.

For example, if line 8 of '/etc/rsyslog.conf' is modified as follows (the leading "$ModLoad" was removed) and the rsyslog service is restarted, then vdsmd will fail the next time it is started;

      1 # rsyslog v5 configuration file
      2 
      3 # For more information see /usr/share/doc/rsyslog-*/rsyslog_conf.html
      4 # If you experience problems, see http://www.rsyslog.com/doc/troubleshoot.html
      5 
      6 #### MODULES ####
      7 
      8  imuxsock # provides support for local system logging (e.g. via logger command)
      9 $ModLoad imklog   # provides kernel logging support (previously done by rklogd)


This was seen on a customer's system. We don't know yet how or why this happened, but it did. They upgraded to RHEL 6.4 with 'vdsm-4.10.2-1.6'. When they tried to register this system as a RHEV host, on the subsequent reboot 'vdsm' failed to start.

I have reproduced this here myself. I also tried 'vdsm-4.10.2-11.0', with the same result.

However, on a RHEL 6.3 system with 'vdsm-4.9.6-45.2' I cannot recreate this condition.


Version-Release number of selected component (if applicable):

vdsm-4.10.2-1.6
vdsm-4.10.2-11.0


How reproducible:

Every time with 'vdsm-4.10.2-1.6' or 'vdsm-4.10.2-11.0'.


Steps to Reproduce:

1. Modify '/etc/rsyslog.conf', e.g. remove the leading "$ModLoad" from line 8;

      8  imuxsock # provides support for local system logging (e.g. via logger command)

2. Restart rsyslogd (service rsyslog restart).

3. Restart vdsmd (service vdsmd restart). vdsm will not start.

Or, as the customer did, instead of 2 and 3 above, reboot.
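
For convenience, steps 1-3 can be scripted roughly as follows. This is only a sketch against the stock RHEL 6 '/etc/rsyslog.conf' shown above, and it keeps a backup first:

      # Drop the leading "$ModLoad" from the imuxsock line (keep a backup)
      cp /etc/rsyslog.conf /etc/rsyslog.conf.bak
      sed -i 's/^\$ModLoad imuxsock/ imuxsock/' /etc/rsyslog.conf

      # rsyslogd restarts but rejects its configuration; vdsmd then fails to start
      service rsyslog restart
      service vdsmd restart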



Actual results:

After restarting rsyslogd and vdsmd;

-  vdsm procs;

# ps -ef|grep vdsm
vdsm     18520     1  0 16:19 ?        00:00:00 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid /usr/share/vdsm/vdsm
vdsm     20790 18520  0 16:20 ?        00:00:00 sleep 900


-  vdsm.log;

# tail -n 33 /var/log/vdsm/vdsm.log
Thread-269::DEBUG::2013-03-28 10:59:03,615::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676'
Thread-269::DEBUG::2013-03-28 10:59:03,616::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676' (0 active users)
Thread-269::DEBUG::2013-03-28 10:59:03,616::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676' is free, finding out if anyone is waiting for it.
Thread-269::DEBUG::2013-03-28 10:59:03,616::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.2c110819-c5fa-423d-a855-569af43b7676', Clearing records.
Thread-269::DEBUG::2013-03-28 10:59:03,616::task::957::TaskManager.Task::(_decref) Task=`d375525d-6af1-4fc4-9954-b636d3ac9a25`::ref 0 aborting False
MainThread::INFO::2013-03-28 10:59:06,052::vmChannels::135::vds::(stop) VM channels listener was stopped.
MainThread::DEBUG::2013-03-28 10:59:06,052::task::568::TaskManager.Task::(_updateState) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::moving from state init -> state preparing
MainThread::INFO::2013-03-28 10:59:06,053::logUtils::37::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
Thread-11::DEBUG::2013-03-28 10:59:06,053::storageServer::674::ConnectionMonitor::(_monitorConnections) Monitoring stopped
MainThread::WARNING::2013-03-28 10:59:06,113::hsm::3344::Storage.HSM::(__releaseLocks) Found lease locks, releasing
Thread-18::DEBUG::2013-03-28 10:59:07,174::domainMonitor::160::Storage.DomainMonitorThread::(_monitorLoop) Stopping domain monitor for c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3
Thread-18::DEBUG::2013-03-28 10:59:07,174::clusterlock::85::SafeLease::(releaseHostId) Host id for domain c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3 released successfully (id: 1)
MainThread::DEBUG::2013-03-28 10:59:07,174::taskManager::80::TaskManager::(prepareForShutdown) Request to stop all tasks
MainThread::INFO::2013-03-28 10:59:07,174::logUtils::39::dispatcher::(wrapper) Run and protect: prepareForShutdown, Return response: None
MainThread::DEBUG::2013-03-28 10:59:07,174::task::1151::TaskManager.Task::(prepare) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::finished: None
MainThread::DEBUG::2013-03-28 10:59:07,175::task::568::TaskManager.Task::(_updateState) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::moving from state preparing -> state finished
MainThread::DEBUG::2013-03-28 10:59:07,175::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
MainThread::DEBUG::2013-03-28 10:59:07,175::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
MainThread::DEBUG::2013-03-28 10:59:07,175::task::957::TaskManager.Task::(_decref) Task=`ac615ed1-dbe6-464e-85f2-6e45e2b79292`::ref 0 aborting False
MainThread::INFO::2013-03-28 10:59:07,175::vdsm::95::vds::(run) VDSM main thread ended. Waiting for 12 other threads...
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <_MainThread(MainThread, started 140544775120640)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <Thread(libvirtEventLoop, started daemon 140544684652288)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-6, started daemon 140544508409600)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-9, started daemon 140544070117120)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-5, started daemon 140544518899456)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(414d32e8-ac0e-44d0-962a-b363426b41ff, started daemon 140544631383808)>
MainThread::INFO::2013-03-28 10:59:07,176::vdsm::99::vds::(run) <WorkerThread(Thread-8, started daemon 140544080606976)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-7, started daemon 140544497919744)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-3, started daemon 140544539879168)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <KsmMonitorThread(KsmMonitor, started daemon 140543610844928)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-2, started daemon 140544550369024)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-10, started daemon 140544059627264)>
MainThread::INFO::2013-03-28 10:59:07,177::vdsm::99::vds::(run) <WorkerThread(Thread-4, started daemon 140544529389312)>


-  '/var/log/messages';

# tail /var/log/messages
Mar 27 16:17:50 dhcp-190-228 vdsm Storage.StorageDomain WARNING Resource namespace c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3_imageNS already registered
Mar 27 16:17:50 dhcp-190-228 vdsm Storage.StorageDomain WARNING Resource namespace c1c7fcf2-f7bc-4884-b632-d6c64a30d2c3_volumeNS already registered
Mar 27 16:18:24 dhcp-190-228 kernel: Kernel logging (proc) stopped.
Mar 27 16:18:24 dhcp-190-228 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="16437" x-info="http://www.rsyslog.com"] exiting on signal 15.
Mar 27 16:18:24 dhcp-190-228 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Mar 27 16:18:24 dhcp-190-228 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="17751" x-info="http://www.rsyslog.com"] start
Mar 27 16:18:24 dhcp-190-228 rsyslogd-3000: unknown priority name "g." [try http://www.rsyslog.com/e/3000 ]
Mar 27 16:18:24 dhcp-190-228 rsyslogd: the last error occured in /etc/rsyslog.conf, line 8:" imuxsock # provides support for local system logging (e.g. via logger command)"
Mar 27 16:18:24 dhcp-190-228 rsyslogd: warning: selector line without actions will be discarded
Mar 27 16:18:24 dhcp-190-228 rsyslogd-2124: CONFIG ERROR: could not interpret master config file '/etc/rsyslog.conf'. [try http://www.rsyslog.com/e/2124 ]


No further messages get written to the 'messages' file after this, e.g. running 'logger' does not update the file.


-  RHEV-M;

RHEV-M sees the host as "Non Responsive".



Expected results:

As in previous versions, 'vdsm' should be able to start, even if in a degraded mode. If it starts in a degraded mode, a warning should be reported back to RHEV-M.



Additional info:

Obviously there is a simple workaround here, in that the '/etc/rsyslog.conf' file can be corrected. However, identifying that as the cause is the real problem. 'vdsm' should be more resilient; at the very least, some error should be reported or some clue provided. All we knew in the customer's case was that a host failed to register (the customer's description) or activate successfully. It wasn't obvious for a while where the problem was.
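
To recover, it is enough to restore a valid '/etc/rsyslog.conf' and restart the services; for example, assuming the backup taken in the reproduction sketch above:

      # Restore the valid configuration (or re-add the leading "$ModLoad") and restart
      cp /etc/rsyslog.conf.bak /etc/rsyslog.conf
      service rsyslog restart
      service vdsmd restart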

If I have omitted anything important, please let me know. 

Regards, GFW.

Comment 1 Haim 2013-03-29 19:50:30 UTC
1) Can you please attach an SOS report?
2) What is the status of rsyslogd? Is it running?

I suspect rsyslogd is not running at all; in that case, vdsm should alert that one of the dependent services is down, which might give the user a hint.

Comment 5 Yaniv Bronhaim 2013-04-02 12:31:03 UTC
First, I submitted a patch that adds the rsyslog service to vdsm's dependencies.

Second, when loading the logger.conf file, the logger module tries to open a socket to the rsyslog service (which runs even if its configuration is corrupted). Because of the configuration failure, an exception is raised that is not handled in vdsm, and we can't print this error to vdsm.log or syslog with the current logger configuration.
This happens over and over and can't be resolved until rsyslog.conf is fixed.

We could split logger.conf and, if this exception is raised, try to load the configuration without syslog and print the error to vdsm.log. But IMO that is too messy and unclear for this specific case, and requiring the service is enough.
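
For reference, the failure can be seen directly on a host in this state; vdsm's logger setup fails in roughly the same way as this one-liner (a sketch, not vdsm's actual code path, and the exact error text may differ):

      # Python's SysLogHandler connects to /dev/log when it is constructed, and
      # raises socket.error if the socket is missing or nothing is listening on it.
      python -c "import logging.handlers; logging.handlers.SysLogHandler(address='/dev/log')"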

Comment 8 Yaniv Bronhaim 2013-04-03 08:44:47 UTC
syslog listens on the /dev/log socket and the logger tries to open it; if syslog runs with a corrupted configuration, this socket won't be created. I propose a patch that checks for this socket before starting the vdsm process.

Both patches together ensure that the rsyslog daemon is installed and operating as needed.
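
For illustration, the pre-start check described above amounts to something like the following. This is only a sketch, not the actual vdsmd init-script code, and the error message is illustrative:

      # Refuse to start vdsmd if the syslog socket is missing (or not a socket),
      # so the failure is reported up front instead of as an unhandled exception.
      if [ ! -S /dev/log ]; then
          echo "vdsmd: /dev/log socket is missing; check the rsyslog configuration" >&2
          exit 1
      fi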

Comment 11 Tareq Alayan 2013-05-23 16:15:14 UTC
Verified.

vdsm fails to start, but it prints a message:

Cannot communicate with syslog deamon for reports.  Check r[FAILED]onf
vdsm: one of the dependent services did not start, error co[FAILED]

Comment 13 errata-xmlrpc 2013-06-10 20:47:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html

