Created attachment 607114 [details]
vdsm.log

Description of problem:
During RHEV3.1-RHS2.0+ testing, I noticed that gluster storage domains go offline even though all the RHS nodes are online.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0rhs-26.el6rhs.x86_64
rhev-hypervisor-6.1-20120607.0.el6_1.noarch
vdsm-4.9.6-28.0.el6_3.x86_64

How reproducible:
Occasionally

Steps to Reproduce:
1. Setup RHEV environment
2. Create Datacenter with Storage as gluster mount (POSIX compliant FS).
3. Create a virtual machine on this storage.

Actual results:
After some time, the storage domains go offline. Restarting vdsm brings them back online.

Expected results:
Storage domain should never go offline.

Additional info:
Thread-194253::DEBUG::2011-10-11 14:01:15,351::task::978::TaskManager.Task::(_decref) Task=`4dc31781-bb5b-4a71-8d72-5e12ea60ef2e`::ref 0 aborting False
Thread-194254::DEBUG::2011-10-11 14:01:15,384::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`6de348d7-bf46-4881-ab23-e5d41d13f42e`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,385::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d0f58ab3-7202-45a0-a9b9-6088e00a65f1`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,386::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Disk hdc stats not available
Thread-194254::DEBUG::2011-10-11 14:01:15,387::libvirtvm::240::vm.Vm::(_getDiskStats) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Disk hdc stats not available
Thread-25::ERROR::2011-10-11 14:01:15,465::utils::399::vm.Vm::(collect) vmId=`d9e0053c-0e7b-442f-abb5-733f31da97b4`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-22::ERROR::2011-10-11 14:01:19,192::utils::399::vm.Vm::(collect) vmId=`4f1d9317-46ab-4eb3-8f90-e0b258513d19`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
Thread-21::ERROR::2011-10-11 14:01:19,205::utils::399::vm.Vm::(collect) vmId=`816ac538-6aa6-44af-8ad0-7c35d149d6c0`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x29b0a78>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 395, in collect
    statsFunction()
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 272, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 179, in _sampleNet
    netSamples[nic.name] = self._vm._dom.interfaceStats(nic.name)
  File "/usr/share/vdsm/libvirtvm.py", line 491, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1762, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: internal error client socket is closed
I do not understand this exception. Could you please help with it?
The attached log is unrelated to storage; this is not what made Engine believe that the storage is offline. Do you see a prepareForShutdown somewhere? Could you find some other clues in the engine and vdsm logs? Did the VM crash, or did it appear up after vdsm was restarted? Which libvirt version is used? Do its logs have any clues about vdsm disconnecting from it? Has the libvirt process crashed?
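If it helps with collecting the requested info, here is a minimal sketch for scanning a vdsm log for the two strings mentioned in this thread (prepareForShutdown and "client socket is closed"). The helper names find_markers and scan_file are hypothetical, and the path /var/log/vdsm/vdsm.log in the usage note is an assumption about where the log lives on the host; adjust as needed.

```python
# Strings taken from this bug thread; extend the tuple with other clues as needed.
MARKERS = ("prepareForShutdown", "client socket is closed")


def find_markers(lines, markers=MARKERS):
    """Return (line_number, marker, text) for every line containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker in markers:
            if marker in line:
                hits.append((number, marker, line.rstrip("\n")))
    return hits


def scan_file(path, markers=MARKERS):
    """Scan one log file; the path is supplied by the caller (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        return find_markers(log, markers)
```

For example, scan_file("/var/log/vdsm/vdsm.log") returns the matching lines with their line numbers, which makes it easier to see whether a prepareForShutdown entry precedes the libvirt socket errors.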
Haven't seen this behaviour after upgrading to si17.
Please reopen with requested info if it ever reproduces.