Bug 865382

Summary: [RHEV-RHS] glusterd restart pauses VMs
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterfs
Version: 2.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: medium
Reporter: Anush Shetty <ashetty>
Assignee: krishnan parthasarathi <kparthas>
QA Contact: Anush Shetty <ashetty>
CC: grajaiya, kaushal, nsathyan, rhs-bugs, rwheeler, sdharane, vbellur
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.3.0rhsvirt1-8.el6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-10 07:43:41 UTC

Description Anush Shetty 2012-10-11 10:07:34 UTC
Description of problem: When we restarted glusterd, the active VMs moved to a paused state.

We were able to start them again from RHEV-M.

Version-Release number of selected component (if applicable):

RHEV-M
rhevm-webadmin-portal-3.1.0-20.el6ev.noarch

RHS
# rpm -qa | grep glusterfs
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-devel-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64


How reproducible: Consistently 



Steps to Reproduce:
1. Create a gluster volume and add it as a storage domain in RHEV-M
2. Create VMs on the storage domain and activate them
3. Restart glusterd
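
A minimal sketch of the gluster-side commands for these steps (volume name and brick paths taken from the Additional info below; the storage domain and VMs are set up through the RHEV-M UI, and glusterd is then restarted on one of the RHS nodes):

# gluster volume create dist-replica replica 2 \
    rhs-client36.lab.eng.blr.redhat.com:/dist-replica1 \
    rhs-client37.lab.eng.blr.redhat.com:/dist-replica1 \
    rhs-client43.lab.eng.blr.redhat.com:/dist-replica1 \
    rhs-client44.lab.eng.blr.redhat.com:/dist-replica1
# gluster volume start dist-replica
# service glusterd restart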
  
Actual results:

VM has been paused due to a storage error

Expected results:

VMs shouldn't pause

Additional info:

Volume Name: dist-replica
Type: Distributed-Replicate
Volume ID: 39e0c10c-12d8-4484-b21d-a3be0cd0b7aa
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client36.lab.eng.blr.redhat.com:/dist-replica1
Brick2: rhs-client37.lab.eng.blr.redhat.com:/dist-replica1
Brick3: rhs-client43.lab.eng.blr.redhat.com:/dist-replica1
Brick4: rhs-client44.lab.eng.blr.redhat.com:/dist-replica1
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
storage.linux-aio: disable
cluster.eager-lock: enable
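
For reference, the non-default options above correspond to volume set commands of this form (a sketch; option names and values exactly as listed):

# gluster volume set dist-replica performance.quick-read disable
# gluster volume set dist-replica performance.io-cache disable
# gluster volume set dist-replica performance.stat-prefetch disable
# gluster volume set dist-replica performance.read-ahead disable
# gluster volume set dist-replica storage.linux-aio disable
# gluster volume set dist-replica cluster.eager-lock enable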

vdsm log:
Thread-72275::DEBUG::2012-10-11 15:31:41,666::libvirtvm::243::vm.Vm::(_getDiskStats) vmId=`9539050b-1955-4e94-815d-7b7f4d40a9b7`::Disk hdc stats not available
Thread-72275::DEBUG::2012-10-11 15:31:41,666::BindingXMLRPC::880::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Paused', 'username': 'Unknown', 'memUsage': '0', 'acpiEnable': 'true', 'pid': '9725', 'displayIp': '0', 'displayPort': u'5904', 'session': 'Unknown', 'displaySecurePort': u'5905', 'timeOffset': '-43200', 'hash': '8932327113078622514', 'pauseCode': 'EOTHER', 'clientIp': '', 'kvmEnable': 'true', 'network': {u'vnet2': {'macAddr': '52:54:00:35:c9:ab', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet2'}}, 'vmId': '9539050b-1955-4e94-815d-7b7f4d40a9b7', 'displayType': 'qxl', 'cpuUser': '0.00', 'disks': {u'vda': {'readLatency': '0', 'apparentsize': '2555904', 'writeLatency': '0', 'imageID': 'bf486cb9-96f7-423e-8a64-30204da9a689', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '2555904', 'writeRate': '0.00'}, u'hdc': {'flushLatency': '0', 'readLatency': '0', 'writeLatency': '0'}}, 'monitorResponse': '0', 'statsAge': '0.10', 'cpuIdle': '100.00', 'elapsedTime': '612276', 'vmType': 'kvm', 'cpuSys': '0.00', 'appsList': [], 'guestIPs': '', 'nice': ''}]}
Thread-26::ERROR::2012-10-11 15:31:44,237::domainMonitor::204::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain fa2ed64f-be0f-4b89-b29f-6e7642b0ffd3 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 184, in _monitorDomain
    self.nextStatus.readDelay = self.domain.getReadDelay()
  File "/usr/share/vdsm/storage/fileSD.py", line 166, in getReadDelay
    oop.getProcessPool(self.sdUUID).directReadLines(self.metafile)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 272, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 190, in callCrabRPCFunction
    raise err
OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/rhs-client36.lab.eng.blr.redhat.com:_pure-dist/fa2ed64f-be0f-4b89-b29f-6e7642b0ffd3/dom_md/metadata'
Thread-26::DEBUG::2012-10-11 15:31:44,238::domainMonitor::212::Storage.DomainMonitorThread::(_monitorDomain) Domain fa2ed64f-be0f-4b89-b29f-6e7642b0ffd3 changed its status to Invalid
Thread-26::WARNING::2012-10-11 15:31:44,238::domainMonitor::219::Storage.DomainMonitorThread::(_monitorDomain) Could not emit domain state change event
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 215, in _monitorDomain
    self.onDomainConnectivityStateChange.emit(
AttributeError: 'DomainMonitorThread' object has no attribute 'onDomainConnectivityStateChange'
Thread-28::ERROR::2012-10-11 15:31:44,550::domainMonitor::204::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain a6f4ac9a-8aa5-452e-8747-151d57ffe3ef monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 182, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 134, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 370, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 272, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 190, in callCrabRPCFunction
    raise err
OSError: [Errno 107] Transport endpoint is not connected: '/rhev/data-center/mnt/rhs-client43.lab.eng.blr.redhat.com:_pure-replica/a6f4ac9a-8aa5-452e-8747-151d57ffe3ef'
Thread-28::DEBUG::2012-10-11 15:31:44,551::domainMonitor::212::Storage.DomainMonitorThread::(_monitorDomain) Domain a6f4ac9a-8aa5-452e-8747-151d57ffe3ef changed its status to Invalid
Thread-28::WARNING::2012-10-11 15:31:44,551::domainMonitor::219::Storage.DomainMonitorThread::(_monitorDomain) Could not emit domain state change event
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 215, in _monitorDomain
    self.onDomainConnectivityStateChange.emit(
AttributeError: 'DomainMonitorThread' object has no attribute 'onDomainConnectivityStateChange'
libvirtEventLoop::INFO::2012-10-11 15:31:44,620::libvirtvm::2027::vm.Vm::(_onAbnormalStop) vmId=`3cd7c575-8e95-4b81-b937-a189f6dc4604`::abnormal vm stop device virtio-disk0 error eperm
libvirtEventLoop::DEBUG::2012-10-11 15:31:44,621::libvirtvm::2481::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`3cd7c575-8e95-4b81-b937-a189f6dc4604`::event Suspended detail 2 opaque None
Thread-67885::DEBUG::2012-10-11 15:31:45,562::task::588::TaskManager.Task::(_updateState) Task=`13a1ee82-af8c-4e67-a1de-663366965ee3`::moving from state init -> state preparing
Thread-67885::INFO::2012-10-11 15:31:45,562::logUtils::37::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='7746e77b-7475-4fb8-ab7f-fd85773c5762', spUUID='20175cc4-e804-4434-a851-b1315510c5e5', imgUUID='bf486cb9-96f7-423e-8a64-30204da9a689', volUUID='6cde00c4-5e77-4d7e-a771-3774a766f080', options=None)
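
Note that the selftest in fileSD.py above is essentially a statvfs() on the domain directory, so the "Transport endpoint is not connected" symptom can be checked by hand on the hypervisor while the fuse mount is stale (a sketch; the mount path is taken from the traceback above):

# stat -f /rhev/data-center/mnt/rhs-client43.lab.eng.blr.redhat.com:_pure-replica
(fails with errno 107, Transport endpoint is not connected, until the mount recovers)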

Comment 2 Amar Tumballi 2012-10-11 10:37:53 UTC
KP, can you lend a helping hand here?

Comment 4 Anush Shetty 2012-11-20 13:23:07 UTC
Verified with glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64