Created attachment 648291 [details]
Attaching vdsm log from the host

Description of problem:
------------------------------------------------------------------------------
After a host is added to a cluster from the UI, it goes to 'non-responsive'
state following reboot. It is found that vdsmd is not running on the host
after the reboot. The following is seen in the vdsm logs on the host:
------------------------------------------------------------------------------
MainThread::DEBUG::2012-11-20 06:20:03,163::task::588::TaskManager.Task::(_updateState) Task=`f7e839c0-af8e-4f40-bd72-324f296d855d`::moving from state preparing -> state finished
MainThread::DEBUG::2012-11-20 06:20:03,163::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
MainThread::DEBUG::2012-11-20 06:20:03,164::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
MainThread::DEBUG::2012-11-20 06:20:03,164::task::978::TaskManager.Task::(_decref) Task=`f7e839c0-af8e-4f40-bd72-324f296d855d`::ref 0 aborting False
MainThread::ERROR::2012-11-20 06:20:03,164::vdsm::73::vds::(run) Exception raised
Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 71, in run
    serve_clients(log)
  File "/usr/share/vdsm/vdsm", line 39, in serve_clients
    cif = clientIF.clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 87, in __init__
    caps.CpuTopology().cores())
  File "/usr/share/vdsm/caps.py", line 87, in __init__
    self._topology = _getCpuTopology(capabilities)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 799, in __call__
    value = self.func(*args)
  File "/usr/share/vdsm/caps.py", line 115, in _getCpuTopology
    'sockets': int(cpu.getElementsByTagName('topology')[0].
IndexError: list index out of range

Version-Release number of selected component (if applicable):
2.1-qa18.el6ev

How reproducible:
Always

Steps to Reproduce:
1. Add a host that has RHS installed on it, with glusterfs version glusterfs-3.4.0qa2-1.el6rhs.x86_64.

Actual results:
Host goes to 'non-responsive' state after reboot.

Expected results:
Host should be up after reboot.

Additional info:
This bug doesn't appear on all VM systems; the same setup works fine on VMs hosted on F17 and on ESXi servers. I discussed this in #vdsm and was told to upgrade libvirt to fix the error, but that didn't work. I am checking the root cause of the problem before fixing it in the vdsm code.
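For context, the traceback above comes from vdsm parsing libvirt's capabilities XML: indexing `[0]` on the result of `getElementsByTagName('topology')` raises IndexError when the `<topology>` element is absent from `<cpu>`, which is what happened on the affected hosts. Below is a minimal, hypothetical sketch (not vdsm's actual code; the function name, fallback values, and XML snippets are illustrative) of the failure mode and a defensive way to handle it:

```python
from xml.dom import minidom

# Illustrative capabilities XML as libvirt returned it on the broken hosts:
# <cpu> has no <topology> child, so a bare [0] index raises IndexError.
CAPS_MISSING_TOPOLOGY = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
  </host>
</capabilities>"""

# The same XML as a healthy libvirt would report it.
CAPS_WITH_TOPOLOGY = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <topology sockets='2' cores='4' threads='1'/>
    </cpu>
  </host>
</capabilities>"""

def get_cpu_topology(caps_xml):
    """Return (sockets, cores) from capabilities XML, falling back to
    (1, 1) when the <topology> element is missing (hypothetical fallback)."""
    doc = minidom.parseString(caps_xml)
    cpu = doc.getElementsByTagName('cpu')[0]
    topology = cpu.getElementsByTagName('topology')
    if not topology:  # guard against the missing element instead of [0]
        return (1, 1)
    t = topology[0]
    return (int(t.getAttribute('sockets')), int(t.getAttribute('cores')))
```

With the guard in place, the missing-topology XML yields the fallback instead of crashing vdsmd at startup, while well-formed capabilities are parsed normally.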
A vdsm fix has been submitted upstream at http://gerrit.ovirt.org/#/c/9386/
After some discussion on http://gerrit.ovirt.org/#/c/9386/, I was told it is a libvirt bug (https://bugzilla.redhat.com/show_bug.cgi?id=866999) and that the fix is available in libvirt-0.9.10-21.el6_3.6.x86_64.rpm
After upgrading to libvirt-0.9.10-21.el6_3.6.x86_64.rpm, things work fine.
Review at https://code.engineering.redhat.com/gerrit/1631
Current QE drop has this fix.
Verified in Red Hat Storage Console Version: 2.1-qa18.el6ev, vdsm version: vdsm-4.9.6-32.0.qa3.el6rhs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html