Created attachment 479087 [details]
vdsm log

Description of problem:
If libvirt crashes, the vdsm watchdog stops trying to restart vdsm after 30 seconds:

Feb 16 11:40:20 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:24 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:29 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:34 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:39 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:43 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:48 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly, respawning slave
Feb 16 11:40:53 gold-vdsc respawn: slave '/usr/share/vdsm//vdsm' died too quickly for more than 30 seconds, exiting master

As a result, when libvirt is restarted later, vdsm has to be restarted manually as well.

Version-Release number of selected component (if applicable):
vdsm-4.9-48.el6.x86_64
libvirt-0.8.7-5.el6.x86_64
device-mapper-multipath-0.4.9-32.el6.x86_64
qemu-kvm-0.12.1.2-2.144.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. run vm
2. kill libvirt
3.

Actual results:

Expected results:
vdsm watchdog should keep trying to restart vdsm (maybe at longer wait intervals)

Additional info:
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
I expect libvirt to be restarted automatically within 5 seconds (see https://www.redhat.com/archives/libvir-list/2011-February/msg00789.html), so I believe 30 seconds of attempts is fine. However, I'm willing to extend this to 60 seconds if that's enough in your opinion. Is it?
Ayal suggests having vdsmd/respawn sleep for 10 minutes instead of giving up on respawning. We should make sure that `service vdsmd status` reflects this sleeping state between attempts.
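To make the suggestion above concrete, here is a minimal Python sketch of a respawn loop that backs off for 10 minutes instead of exiting. It is not the actual vdsm respawn script. The function name `supervise`, its parameters, the 5-second "died too quickly" threshold, and the 1-second pause between attempts are all assumptions made for illustration; only the 30-second window and the 10-minute sleep come from this report.

```python
import time

QUICK_DEATH = 5.0     # assumed threshold: a shorter run counts as "died too quickly"
GIVE_UP_AFTER = 30.0  # from the log: quick deaths for more than 30 seconds
LONG_SLEEP = 600.0    # suggested here: sleep 10 minutes instead of exiting
SHORT_PAUSE = 1.0     # assumed pause between ordinary respawn attempts


def supervise(run_slave, clock=time.monotonic, sleep=time.sleep, max_runs=None):
    """Respawn run_slave() repeatedly and return the number of long sleeps taken.

    run_slave is a hypothetical callable that starts the slave and blocks
    until it exits. max_runs=None loops forever; a number bounds the loop,
    which is only meant for testing.
    """
    long_sleeps = 0
    first_quick_death = None  # start time of the current streak of quick deaths
    runs = 0
    while max_runs is None or runs < max_runs:
        started = clock()
        run_slave()
        runs += 1
        now = clock()
        if now - started >= QUICK_DEATH:
            # The slave ran long enough: forget the streak and respawn at once.
            first_quick_death = None
            continue
        if first_quick_death is None:
            first_quick_death = started
        if now - first_quick_death > GIVE_UP_AFTER:
            # The old behaviour exited the master here; sleep and retry instead.
            sleep(LONG_SLEEP)
            long_sleeps += 1
            first_quick_death = None
        else:
            sleep(SHORT_PAUSE)
    return long_sleeps
```

In real use `run_slave` would wrap something like `subprocess.call` on the vdsm executable, and `max_runs` would stay `None`. The sketch does not cover reporting the sleeping state through `service vdsmd status`.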
Verified on vdsm-4.9-59.el6: the vdsm/respawn command sleeps for the amount of time described above. In the normal scenario (libvirt accidentally killed, then started again after a few minutes), the vdsmd service does not have to be started manually.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html