Bug 851936 - Occasionally gluster storage domain goes offline, but RHS nodes are all online.
Summary: Occasionally gluster storage domain goes offline, but RHS nodes are all online.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: vdsm
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-27 04:53 UTC by Gowrishankar Rajaiyan
Modified: 2012-09-16 06:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-16 06:00:18 UTC
Embargoed:


Attachments
vdsm.log (72.55 KB, text/x-log)
2012-08-27 04:53 UTC, Gowrishankar Rajaiyan

Description Gowrishankar Rajaiyan 2012-08-27 04:53:13 UTC
Created attachment 607114
vdsm.log

Description of problem:
During RHEV3.1-RHS2.0+ testing, I noticed that gluster storage domains go offline even though all the RHS nodes are online.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0rhs-26.el6rhs.x86_64
rhev-hypervisor-6.1-20120607.0.el6_1.noarch
vdsm-4.9.6-28.0.el6_3.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. Set up a RHEV environment.
2. Create a Datacenter with Storage as a gluster mount (POSIX compliant FS).
3. Create a virtual machine on this storage.

Actual results:
After some time, the storage domains go offline. Restarting vdsm brings them back online.

Expected results:
Storage domain should never go offline.

Additional info:

Comment 2 Bala.FA 2012-09-14 05:21:54 UTC
Thread-194253::DEBUG::2011-10-11 14:01:15,351::task::978::TaskManager.Task::(_decref) Task=`4dc31781-bb5b-4a71-8d72-5e12ea60ef2e`::ref 0 aborting False
Thread-194254::DEBUG::2011-10-11 14:01:15,384::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`6de348d7-bf46-4881-ab23-e5d41d13f42e`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,385::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d0f58ab3-7202-45a0-a9b9-6088e00a65f1`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,387::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Disk hdc stats not available
Thread-25::ERROR::2011-10-11 14:01:15,465::utils::399::vm.Vm::(collect) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-22::ERROR::2011-10-11 14:01:19,192::utils::399::vm.Vm::(collect) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-21::ERROR::2011-10-11 14:01:19,205::utils::399::vm.Vm::(collect) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed

Comment 3 Bala.FA 2012-09-14 05:23:02 UTC
I do not understand this exception. Could someone help explain it?

Comment 4 Dan Kenigsberg 2012-09-14 07:41:26 UTC
The attached log is unrelated to storage - this is not what made Engine believe that the storage is offline. Do you see a prepareForShutdown somewhere? Could you find some other clues in engine and vdsm logs?

Has the VM crashed, or did it appear up after vdsm was restarted? Which libvirt version is used? Do its logs have clues about vdsm disconnecting from it? Has the libvirt process crashed?
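
For anyone gathering the requested clues, the search could be sketched roughly as below. This is only an illustration, not part of vdsm: the function name find_clues and the RECORD pattern are hypothetical, the "::"-separated record layout is inferred from the log lines pasted in comment 2, and it assumes that a shutdown would show up in the log as the literal string prepareForShutdown.

```python
import re

# Assumed record layout, inferred from the excerpt in comment 2:
# Thread-25::ERROR::2011-10-11 14:01:15,465::utils::399::vm.Vm::(collect) ...
RECORD = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<time>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d+)::"
    r"(?P<module>[^:]+)::(?P<line>\d+)::(?P<rest>.*)$"
)


def find_clues(lines, keywords=("prepareForShutdown", "libvirtError")):
    """Return (timestamp, level, text) tuples for lines worth a closer look.

    A line is kept if it is an ERROR record or mentions one of the keywords.
    Lines without a record header (for example traceback continuation lines)
    are kept only on a keyword match, with timestamp and level set to None.
    """
    clues = []
    for raw in lines:
        text = raw.rstrip("\n")
        has_keyword = any(keyword in text for keyword in keywords)
        match = RECORD.match(text)
        if match is None:
            if has_keyword:
                clues.append((None, None, text))
            continue
        if match.group("level") == "ERROR" or has_keyword:
            clues.append((match.group("time"), match.group("level"), text))
    return clues
```

It can be fed an open file object for a vdsm log, for example find_clues(open(path)) with path pointing at a copy of the attached vdsm.log, and the timestamps it returns can then be compared against the time at which Engine marked the storage domain offline.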

Comment 5 Gowrishankar Rajaiyan 2012-09-14 08:36:54 UTC
Haven't seen this behaviour after upgrading to si17.

Comment 6 Dan Kenigsberg 2012-09-16 06:00:18 UTC
Please reopen with requested info if it ever reproduces.

