Bug 851936 - Occasionally gluster storage domain goes offline, but RHS nodes are all online.
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: vdsm
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Dan Kenigsberg
QA Contact: Sudhir D
Depends On:
Blocks:
Reported: 2012-08-27 00:53 EDT by Gowrishankar Rajaiyan
Modified: 2012-09-16 02:00 EDT
CC: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-16 02:00:18 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
vdsm.log (72.55 KB, text/x-log)
2012-08-27 00:53 EDT, Gowrishankar Rajaiyan

Description Gowrishankar Rajaiyan 2012-08-27 00:53:13 EDT
Created attachment 607114
vdsm.log

Description of problem:
During RHEV 3.1 + RHS 2.0 testing, I noticed that gluster storage domains go offline even though all the RHS nodes are online.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0rhs-26.el6rhs.x86_64
rhev-hypervisor-6.1-20120607.0.el6_1.noarch
vdsm-4.9.6-28.0.el6_3.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. Set up a RHEV environment.
2. Create a Data Center with storage configured as a gluster mount (POSIX-compliant FS).
3. Create a virtual machine on this storage.

Actual results:
After some time, the storage domains go offline. Restarting vdsm brings them back online.

Expected results:
The storage domain should remain online as long as the RHS nodes are online.
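
(For context on why the engine flags the domain: vdsm periodically probes each storage domain's mount and reports the result to the engine, which marks the domain inactive when the probe fails or hangs. The snippet below is only a minimal sketch of that kind of liveness probe; the mount path, timeout and function names are assumptions for illustration, not taken from vdsm or from this bug.)

# Hypothetical liveness probe for a gluster POSIX mount (illustrative only;
# the mount point and timeout below are assumptions, not values from this bug).
import multiprocessing
import os

MOUNT = "/rhev/data-center/mnt/rhs-node1:_vmstore"  # assumed mount point
TIMEOUT = 10                                        # seconds before giving up


def _probe(path, queue):
    try:
        os.statvfs(path)      # blocks indefinitely if the gluster client hangs
        queue.put(True)
    except OSError:
        queue.put(False)


def mount_responds(path=MOUNT, timeout=TIMEOUT):
    """Return True only if statvfs() on the mount answers within `timeout` seconds."""
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_probe, args=(path, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():       # the probe is stuck: treat the domain as down
        proc.terminate()
        return False
    return queue.get()


if __name__ == "__main__":
    print("mount responds: %s" % mount_responds())
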

Additional info:
Comment 2 Bala.FA 2012-09-14 01:21:54 EDT
Thread-194253::DEBUG::2011-10-11 14:01:15,351::task::978::TaskManager.Task::(_decref) Task=`4dc31781-bb5b-4a71-8d72-5e12ea60ef2e`::ref 0 aborting False
Thread-194254::DEBUG::2011-10-11 14:01:15,384::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`6de348d7-bf46-4881-ab23-e5d41d13f42e`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,385::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d0f58ab3-7202-45a0-a9b9-6088e00a65f1`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,387::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Disk hdc stats not available
Thread-25::ERROR::2011-10-11 14:01:15,465::utils::399::vm.Vm::(collect) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-22::ERROR::2011-10-11 14:01:19,192::utils::399::vm.Vm::(collect) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-21::ERROR::2011-10-11 14:01:19,205::utils::399::vm.Vm::(collect) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
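
(To make the traceback above easier to read: the failing call is libvirt's per-NIC statistics query, which vdsm runs from its VM sampling threads; "client socket is closed" means the connection to libvirtd dropped, not that the storage path failed. Below is a standalone sketch of the same call, assuming libvirt-python is installed; the NIC device name is a placeholder, the VM UUID is taken from the log.)

# Standalone illustration of the call that fails above; "vnet0" is a
# placeholder device name, not taken from this bug.
import libvirt

VM_UUID = "d9e0053c-0e7b-442f-abb5-733f31da97b4"
NIC_DEV = "vnet0"

conn = libvirt.openReadOnly("qemu:///system")
dom = conn.lookupByUUIDString(VM_UUID)
try:
    # Returns (rx_bytes, rx_packets, rx_errs, rx_drop,
    #          tx_bytes, tx_packets, tx_errs, tx_drop)
    stats = dom.interfaceStats(NIC_DEV)
    print("rx_bytes=%d tx_bytes=%d" % (stats[0], stats[4]))
except libvirt.libvirtError as err:
    # The error seen in this bug ("client socket is closed") is raised here
    # when libvirtd goes away or the connection is torn down mid-call.
    print("interfaceStats failed: %s" % err)
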
Comment 3 Bala.FA 2012-09-14 01:23:02 EDT
I am not clear about this exception. Could you help with it?
Comment 4 Dan Kenigsberg 2012-09-14 03:41:26 EDT
The attached log is unrelated to storage - this is not what made Engine believe that the storage is offline. Do you see a prepareForShutdown somewhere? Could you find some other clues in the engine and vdsm logs?

Did the VM crash, or did it appear up after vdsm was restarted? Which libvirt version is in use? Do its logs have clues about vdsm disconnecting from it? Has the libvirt process crashed?
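
(One way to gather the clues asked for above is to scan the vdsm and libvirtd logs for prepareForShutdown and connection-drop messages. A rough sketch follows; the log paths are the usual defaults and may differ on RHEV-H, and libvirtd file logging may not be enabled on every host.)

# Rough log scan for the clues requested above; paths and patterns are
# assumptions for illustration only.
import re

CHECKS = [
    ("/var/log/vdsm/vdsm.log", r"prepareForShutdown"),
    ("/var/log/vdsm/vdsm.log", r"libvirt.*(closed|disconnect)"),
    ("/var/log/libvirt/libvirtd.log", r"error|warning"),
]

for path, pattern in CHECKS:
    try:
        with open(path) as log:
            lines = log.readlines()
    except IOError:
        print("cannot read %s" % path)
        continue
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [line.rstrip() for line in lines if rx.search(line)]
    print("%s: %d lines match %r" % (path, len(hits), pattern))
    for line in hits[-3:]:   # show the most recent few matches
        print("  %s" % line)
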
Comment 5 Gowrishankar Rajaiyan 2012-09-14 04:36:54 EDT
I haven't seen this behaviour after upgrading to si17.
Comment 6 Dan Kenigsberg 2012-09-16 02:00:18 EDT
Please reopen with the requested info if this ever reproduces.
