Bug 480997

Summary: collectd does not re-connect to libvirtd properly if libvirtd is not running when collectd is started
Product: Fedora
Reporter: Perry Myers <pmyers>
Component: collectd
Assignee: Alan Pevec <apevec>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 10
CC: apevec, apevec, berrange, dpierce, rjones, virt-maint
Hardware: All   
OS: Linux   
Fixed In Version: 4.5.4-2.fc11
Doc Type: Bug Fix
Last Closed: 2009-09-15 03:48:06 EDT

Description Perry Myers 2009-01-21 11:52:31 EST
Description of problem:
If collectd is started while libvirtd is not running, it should reconnect to libvirtd once libvirtd becomes available, retrying at increasing intervals. This does not happen.

Version-Release number of selected component (if applicable):
4.5.1-2.1.fc10

How reproducible:
Every time

Steps to Reproduce:
1. service libvirtd stop
2. service collectd start
3. service libvirtd start
  
Actual results:
Step 2 results in:

[root@localhost log]# service collectd start
Starting collectd: libvir: Remote error : unable to connect to '/var/run/libvirt/libvirt-sock-ro': Connection refused
                                                           [  OK  ]

It would be good if that error message were not printed to stdout, since collectd will retry once the libvirt socket is available.

After step 2, /var/log/collectd.log says:
[2009-01-21 16:45:30] connection failed: unable to connect to '/var/run/libvirt/libvirt-sock-ro': Connection refused
[2009-01-21 16:45:30] libvirt plugin: Not connected. Use Connection in config file to supply connection URI.  For more information see <http://libvirt.org/uri.html>
[2009-01-21 16:45:30] read-function of plugin `libvirt' failed. Will suspend it for 10 seconds.

After step 3, the log says:
[2009-01-21 16:47:54] libvirt plugin: Not connected. Use Connection in config file to supply connection URI.  For more information see <http://libvirt.org/uri.html>
[2009-01-21 16:47:54] read-function of plugin `libvirt' failed. Will suspend it for 20 seconds.
[2009-01-21 16:48:14] libvirt plugin: Not connected. Use Connection in config file to supply connection URI.  For more information see <http://libvirt.org/uri.html>
[2009-01-21 16:48:14] read-function of plugin `libvirt' failed. Will suspend it for 40 seconds.
...

These failures continue even though /var/run/libvirt/libvirt-sock-ro is now present.

Expected results:
After libvirtd is started, collectd should connect to it successfully.

Additional info:
Also, if libvirtd is running when collectd starts and libvirtd is later stopped and started again, collectd never reconnects.
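The fix shipped in collectd-4.5.4-2.fc11 is not shown in this report. As a rough illustration of the behaviour requested above, here is a hypothetical C model of a read callback that (a) tries to connect whenever it has no connection, and (b) drops a connection that has stopped working so the next call reconnects. The daemon is simulated with two variables; none of these names are collectd or libvirt APIs, and a real plugin would call the libvirt client library instead of try_connect() and query_stats().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libvirtd; not a real API. */
static bool daemon_running = false; /* does the read-only socket accept? */
static int  daemon_epoch = 0;       /* bumped on every daemon (re)start  */

static void daemon_start(void) { daemon_running = true; daemon_epoch++; }
static void daemon_stop(void)  { daemon_running = false; }

/* The "connection" records which daemon instance it was opened against.
 * 0 means "not connected" (daemon epochs start at 1). */
static int conn_epoch = 0;

static bool try_connect(void)
{
    if (!daemon_running)
        return false;            /* e.g. "Connection refused" */
    conn_epoch = daemon_epoch;
    return true;
}

/* A request over a connection to a daemon that has since restarted fails. */
static bool query_stats(void)
{
    return daemon_running && conn_epoch == daemon_epoch;
}

/* Read callback: 0 on success, -1 on failure. On failure the caller is
 * expected to back off and call it again later. */
static int plugin_read(void)
{
    /* (a) Not connected: try again instead of failing permanently. */
    if (conn_epoch == 0 && !try_connect())
        return -1;

    /* (b) Connection went stale: forget it so the next call reconnects. */
    if (!query_stats()) {
        conn_epoch = 0;
        return -1;
    }
    return 0;
}
```

In this sketch a stale connection costs one failed read before a fresh connection is opened; reconnecting immediately inside the same call would also work, but leaving the retry to the caller's back-off keeps the callback simple.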
Comment 1 Richard W.M. Jones 2009-01-21 12:03:40 EST
Agreed, this is going to be a problem. Any idea what priority this fix has for oVirt?
Comment 2 Perry Myers 2009-01-21 22:48:04 EST
For oVirt, this is mainly a problem when the Node boots: collectd starts right after libvirtd, and occasionally libvirtd's read-only socket is not yet ready when collectd starts. So the problem is intermittent.

As a workaround, we can put something in rc.local to restart collectd, in the hope that the read-only socket will be available the second time. This is not an optimal solution, since there is still a chance that the connection will fail.
Comment 3 Fedora Update System 2009-08-11 19:04:51 EDT
collectd-4.5.4-2.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/collectd-4.5.4-2.fc11
Comment 4 Fedora Update System 2009-08-12 16:52:26 EDT
collectd-4.5.4-2.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:
  su -c 'yum --enablerepo=updates-testing update collectd'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-8503
Comment 5 Fedora Update System 2009-09-15 03:47:56 EDT
collectd-4.5.4-2.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.