Bug 480997 - collectd does not re-connect to libvirtd properly if libvirtd is not running when collectd is started
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: collectd
Version: 10
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Alan Pevec
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-01-21 16:52 UTC by Perry Myers
Modified: 2009-09-15 07:48 UTC (History)

Fixed In Version: 4.5.4-2.fc11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-15 07:48:06 UTC
Type: ---
Embargoed:



Description Perry Myers 2009-01-21 16:52:31 UTC
Description of problem:
If collectd is started while libvirtd is not running, it is supposed to retry the connection at increasing timeout intervals. However, the reconnect never succeeds, even after libvirtd has been started.

Version-Release number of selected component (if applicable):
4.5.1-2.1.fc10

How reproducible:
Every time

Steps to Reproduce:
1. service libvirtd stop
2. service collectd start
3. service libvirtd start
  
Actual results:
Step 2 results in:

[root@localhost log]# service collectd start
Starting collectd: libvir: Remote error : unable to connect to '/var/run/libvirt/libvirt-sock-ro': Connection refused
                                                           [  OK  ]

It would be good if that error message were not printed to stdout, since collectd is expected to retry once the libvirt socket is available.

After Step 2, /var/log/collectd.log says:
[2009-01-21 16:45:30] connection failed: unable to connect to '/var/run/libvirt/libvirt-sock-ro': Connection refused
[2009-01-21 16:45:30] libvirt plugin: Not connected. Use Connection in config file to supply connection URI.  For more information see <http://libvirt.org/uri.html>
[2009-01-21 16:45:30] read-function of plugin `libvirt' failed. Will suspend it for 10 seconds.

After Step 3 the log says:
[2009-01-21 16:47:54] libvirt plugin: Not connected. Use Connection in config file to supply connection URI.  For more information see <http://libvirt.org/uri.html>
[2009-01-21 16:47:54] read-function of plugin `libvirt' failed. Will suspend it for 20 seconds.
[2009-01-21 16:48:14] libvirt plugin: Not connected. Use Connection in config file to supply connection URI.  For more information see <http://libvirt.org/uri.html>
[2009-01-21 16:48:14] read-function of plugin `libvirt' failed. Will suspend it for 40 seconds.
...

This continues even though /var/run/libvirt/libvirt-sock-ro is now present.
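
For reference, the suspension times in the log excerpts (10, 20, 40 seconds) are consistent with an interval that doubles after each failed read. The following is only a minimal sketch of such a schedule, not collectd's actual code; the function name and its parameters are hypothetical.

```python
def suspend_intervals(initial=10, factor=2, count=3):
    """Yield successive suspension intervals in seconds.

    Assumption: the interval is multiplied by `factor` after every failed
    read, which matches the three values visible in the log excerpt.
    """
    interval = initial
    for _ in range(count):
        yield interval
        interval *= factor


print(list(suspend_intervals()))
```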

Expected results:
After libvirtd is started, the connection should be successful

Additional info:
Also, if libvirtd is running when collectd is started and libvirtd is later stopped and started again, collectd never reconnects.
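
The expected behaviour in both cases could be sketched roughly as follows: the read path retries the connection whenever no live handle exists, and drops the handle when a read fails so that the next read reconnects. This is an illustration only and not the collectd plugin (which is written in C). The class name, the `connect` callable, and the use of `OSError` as the failure signal are all assumptions made for the example.

```python
class ReconnectingReader:
    """Hypothetical wrapper that (re)connects lazily on every read."""

    def __init__(self, connect):
        # `connect` is an assumed callable: it returns a connection object
        # with a read() method, or raises OSError if the daemon is down.
        self._connect = connect
        self._conn = None

    def read(self):
        if self._conn is None:
            try:
                self._conn = self._connect()
            except OSError:
                # Still unavailable; the caller may back off and call read() again.
                return None
        try:
            return self._conn.read()
        except OSError:
            # Drop the dead handle so that the next read() reconnects.
            self._conn = None
            return None
```

With this structure, a failed connection at startup and a daemon restart later on are handled by the same code path: the next read simply tries to connect again.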

Comment 1 Richard W.M. Jones 2009-01-21 17:03:40 UTC
Agreed, this is going to be a problem.  Any idea what priority
this fix has for oVirt?

Comment 2 Perry Myers 2009-01-22 03:48:04 UTC
For oVirt this is generally a problem on startup of the Node: collectd starts right after libvirtd, and occasionally the libvirtd read-only socket is not ready yet when collectd starts.  So it's an intermittent problem.

What we can do is put something in rc.local to restart collectd, in the hope that the read-only socket will be available the second time.  But this is not an optimal solution, since there is still a chance that the connection will be dead.
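
A slightly more deterministic variant of that workaround would be to wait until the socket actually accepts connections before (re)starting collectd. The following is a minimal sketch under that assumption; the function name, timeout, and polling interval are hypothetical, and it does not help if libvirtd is restarted later.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=30.0, interval=0.5):
    """Return True once a connection to the UNIX socket at `path` succeeds.

    Returns False if no connection could be made before `timeout` seconds
    have elapsed. A missing socket file and a refused connection both
    raise OSError subclasses and are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            try:
                s.connect(path)
                return True
            except OSError:
                pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A startup script could call `wait_for_unix_socket('/var/run/libvirt/libvirt-sock-ro')` and only start collectd once it returns True.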

Comment 3 Fedora Update System 2009-08-11 23:04:51 UTC
collectd-4.5.4-2.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/collectd-4.5.4-2.fc11

Comment 4 Fedora Update System 2009-08-12 20:52:26 UTC
collectd-4.5.4-2.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update collectd'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-8503

Comment 5 Fedora Update System 2009-09-15 07:47:56 UTC
collectd-4.5.4-2.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

