Created attachment 648291 [details]
Attaching vdsm log from the host

Description of problem:
------------------------------------------------------------------------------
After a host is added to a cluster from the UI, it goes to 'non-responsive'
state following reboot. It is found that vdsmd is not running on the host
after the reboot. The following is seen in the vdsm logs on the host:
------------------------------------------------------------------------------
MainThread::DEBUG::2012-11-20 06:20:03,163::task::588::TaskManager.Task::(_updateState) Task=`f7e839c0-af8e-4f40-bd72-324f296d855d`::moving from state preparing -> state finished
MainThread::DEBUG::2012-11-20 06:20:03,163::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
MainThread::DEBUG::2012-11-20 06:20:03,164::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
MainThread::DEBUG::2012-11-20 06:20:03,164::task::978::TaskManager.Task::(_decref) Task=`f7e839c0-af8e-4f40-bd72-324f296d855d`::ref 0 aborting False
MainThread::ERROR::2012-11-20 06:20:03,164::vdsm::73::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 71, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 39, in serve_clients
    cif = clientIF.clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 87, in __init__
    caps.CpuTopology().cores())
  File "/usr/share/vdsm/caps.py", line 87, in __init__
    self._topology = _getCpuTopology(capabilities)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 799, in __call__
    value = self.func(*args)
  File "/usr/share/vdsm/caps.py", line 115, in _getCpuTopology
    'sockets': int(cpu.getElementsByTagName('topology')[0].
IndexError: list index out of range

Version-Release number of selected component (if applicable):
2.1-qa18.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Add a host that has RHS installed on it, with glusterfs version glusterfs-3.4.0qa2-1.el6rhs.x86_64.

Actual results:
Host goes to 'non-responsive' state after reboot.

Expected results:
Host should be up after reboot.

Additional info:
This bug doesn't appear on all VM systems; the same setup works fine on VMs hosted on F17 and on ESXi servers. I discussed this in #vdsm and was told to upgrade libvirt to fix the error, but that didn't work. I am checking the root cause of the problem before fixing it in the vdsm code.
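For context, the traceback above comes from vdsm parsing libvirt's capabilities XML: indexing `[0]` on the result of `getElementsByTagName('topology')` raises IndexError when the `<topology>` element is absent from `<cpu>`, which is what happened on the affected hosts. Below is a minimal, hypothetical sketch (not vdsm's actual code; the function name, fallback values, and XML snippets are illustrative) of the failure mode and a defensive way to handle it:

```python
from xml.dom import minidom

# Illustrative capabilities XML as libvirt returned it on the broken hosts:
# <cpu> has no <topology> child, so a bare [0] index raises IndexError.
CAPS_MISSING_TOPOLOGY = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
  </host>
</capabilities>"""

# The same XML as a healthy libvirt would report it.
CAPS_WITH_TOPOLOGY = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <topology sockets='2' cores='4' threads='1'/>
    </cpu>
  </host>
</capabilities>"""

def get_cpu_topology(caps_xml):
    """Return (sockets, cores) from capabilities XML, falling back to
    (1, 1) when the <topology> element is missing (hypothetical fallback)."""
    doc = minidom.parseString(caps_xml)
    cpu = doc.getElementsByTagName('cpu')[0]
    topology = cpu.getElementsByTagName('topology')
    if not topology:  # guard against the missing element instead of [0]
        return (1, 1)
    t = topology[0]
    return (int(t.getAttribute('sockets')), int(t.getAttribute('cores')))
```

With the guard in place, the missing-topology XML yields the fallback instead of crashing vdsmd at startup, while well-formed capabilities are parsed normally.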
A vdsm fix has been submitted upstream at http://gerrit.ovirt.org/#/c/9386/
After some discussion on http://gerrit.ovirt.org/#/c/9386/, I was told it is a libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=866999) and that the fix is available in libvirt-0.9.10-21.el6_3.6.x86_64.rpm
After upgrading to libvirt-0.9.10-21.el6_3.6.x86_64.rpm, things work fine.
Review at https://code.engineering.redhat.com/gerrit/1631
Current QE drop has this fix.
Verified in Red Hat Storage Console Version: 2.1-qa18.el6ev, vdsm version: vdsm-4.9.6-32.0.qa3.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html