Bug 851936

Summary: Occasionally gluster storage domain goes offline, but RHS nodes are all online.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Gowrishankar Rajaiyan <grajaiya>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Sudhir D <sdharane>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 2.0
CC: barumuga, grajaiya, hchiramm, perfbz, rhs-bugs, vbellur
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-16 06:00:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: vdsm.log

Description Gowrishankar Rajaiyan 2012-08-27 04:53:13 UTC
Created attachment 607114 [details]
vdsm.log

Description of problem:
During RHEV3.1-RHS2.0+ testing, I noticed that gluster storage domains go offline even though all the RHS nodes are online.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0rhs-26.el6rhs.x86_64
rhev-hypervisor-6.1-20120607.0.el6_1.noarch
vdsm-4.9.6-28.0.el6_3.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. Setup RHEV environment
2. Create Datacenter with Storage as gluster mount (POSIX compliant FS).
3. Create a virtual machine on this storage.

Actual results:
After some time, the storage domains go offline. Restarting vdsm brings them back online.

Expected results:
Storage domain should never go offline.

Additional info:

Comment 2 Bala.FA 2012-09-14 05:21:54 UTC
Thread-194253::DEBUG::2011-10-11 14:01:15,351::task::978::TaskManager.Task::(_decref) Task=`4dc31781-bb5b-4a71-8d72-5e12ea60ef2e`::ref 0 aborting False
Thread-194254::DEBUG::2011-10-11 14:01:15,384::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`6de348d7-bf46-4881-ab23-e5d41d13f42e`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,385::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d0f58ab3-7202-45a0-a9b9-6088e00a65f1`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,387::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Disk hdc stats not available
Thread-25::ERROR::2011-10-11 14:01:15,465::utils::399::vm.Vm::(collect) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-22::ERROR::2011-10-11 14:01:19,192::utils::399::vm.Vm::(collect) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-21::ERROR::2011-10-11 14:01:19,205::utils::399::vm.Vm::(collect) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
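
[Editorial note] The repeated traceback above shows one call chain: vdsm's AdvancedStats collector invokes a wrapped libvirt call (`interfaceStats`), and because the client socket to libvirtd is closed, libvirt raises `libvirtError`, which the collector logs as "Stats function failed" and moves on. A minimal, hypothetical sketch of that pattern (stand-in names only, not vdsm's actual implementation):

```python
class libvirtError(Exception):
    """Stand-in for libvirt.libvirtError."""


class ClosedConnectionDomain:
    """Simulates a libvirt domain handle whose client socket has closed."""

    def interfaceStats(self, nic_name):
        # Mirrors the failure seen in the log: the RPC layer is gone,
        # so every call raises instead of returning stats.
        raise libvirtError("internal error client socket is closed")


def collect(stats_function):
    """Mimics the collect() loop in vdsm/utils.py: log the failure of a
    single stats function and keep sampling rather than crashing."""
    try:
        stats_function()
        return True
    except libvirtError as e:
        print("Stats function failed: %s" % e)
        return False


dom = ClosedConnectionDomain()
ok = collect(lambda: dom.interfaceStats("vnet0"))
```

This is why the errors repeat per thread and per vmId: each sampling cycle retries the call on the same dead connection until vdsm is restarted and reconnects to libvirtd.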

Comment 3 Bala.FA 2012-09-14 05:23:02 UTC
I am not clear about this exception. Please help with this.

Comment 4 Dan Kenigsberg 2012-09-14 07:41:26 UTC
The attached log is unrelated to storage - this is not what made Engine believe that the storage is offline. Do you see a prepareForShutdown somewhere? Could you find some other clues in engine and vdsm logs?

Has the VM crashed, or did it appear up after vdsm was restarted? Which libvirt version is used? Do its logs have clues about vdsm disconnecting from it? Has the libvirt process crashed?

Comment 5 Gowrishankar Rajaiyan 2012-09-14 08:36:54 UTC
Haven't seen this behaviour after upgrading to si17.

Comment 6 Dan Kenigsberg 2012-09-16 06:00:18 UTC
Please reopen with requested info if it ever reproduces.