Description of problem:
When we restart glusterd, we saw that active VMs moved to the paused state. We were able to start them again from RHEV-M.

Version-Release number of selected component (if applicable):

RHEV-M:
rhevm-webadmin-portal-3.1.0-20.el6ev.noarch

RHS:
# rpm -qa | grep glusterfs
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-devel-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_6

How reproducible:
Consistently

Steps to Reproduce:
1. Create a gluster volume and add it as a storage domain in RHEV-M
2. Create VMs on the storage domain and activate them
3. Restart glusterd

Actual results:
VM has been paused due to a storage error

Expected results:
VMs shouldn't pause

Additional info:

Volume Name: dist-replica
Type: Distributed-Replicate
Volume ID: 39e0c10c-12d8-4484-b21d-a3be0cd0b7aa
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client36.lab.eng.blr.redhat.com:/dist-replica1
Brick2: rhs-client37.lab.eng.blr.redhat.com:/dist-replica1
Brick3: rhs-client43.lab.eng.blr.redhat.com:/dist-replica1
Brick4: rhs-client44.lab.eng.blr.redhat.com:/dist-replica1
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
storage.linux-aio: disable
cluster.eager-lock: enable

vdsm log:

Thread-72275::DEBUG::2012-10-11 15:31:41,666::libvirtvm::243::vm.Vm::(_getDiskStats) vmId=`9539050b-1955-4e94-815d-7b7f4d40a9b7`::Disk hdc stats not available
Thread-72275::DEBUG::2012-10-11 15:31:41,666::BindingXMLRPC::880::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'status': 'Paused', 'username': 'Unknown', 'memUsage': '0',
'acpiEnable': 'true', 'pid': '9725', 'displayIp': '0', 'displayPort': u'5904', 'session': 'Unknown', 'displaySecurePort': u'5905', 'timeOffset': '-43200', 'hash': '8932327113078622514', 'pauseCode': 'EOTHER', 'clientIp': '', 'kvmEnable': 'true', 'network': {u'vnet2': {'macAddr': '52:54:00:35:c9:ab', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet2'}}, 'vmId': '9539050b-1955-4e94-815d-7b7f4d40a9b7', 'displayType': 'qxl', 'cpuUser': '0.00', 'disks': {u'vda': {'readLatency': '0', 'apparentsize': '2555904', 'writeLatency': '0', 'imageID': 'bf486cb9-96f7-423e-8a64-30204da9a689', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '2555904', 'writeRate': '0.00'}, u'hdc': {'flushLatency': '0', 'readLatency': '0', 'writeLatency': '0'}}, 'monitorResponse': '0', 'statsAge': '0.10', 'cpuIdle': '100.00', 'elapsedTime': '612276', 'vmType': 'kvm', 'cpuSys': '0.00', 'appsList': [], 'guestIPs': '', 'nice': ''}]}
Thread-26::ERROR::2012-10-11 15:31:44,237::domainMonitor::204::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain fa2ed64f-be0f-4b89-b29f-6e7642b0ffd3 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 184, in _monitorDomain
    self.nextStatus.readDelay = self.domain.getReadDelay()
  File "/usr/share/vdsm/storage/fileSD.py", line 166, in getReadDelay
    oop.getProcessPool(self.sdUUID).directReadLines(self.metafile)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 272, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 190, in callCrabRPCFunction
    raise err
OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/rhs-client36.lab.eng.blr.redhat.com:_pure-dist/fa2ed64f-be0f-4b89-b29f-6e7642b0ffd3/dom_md/metadata'
Thread-26::DEBUG::2012-10-11 15:31:44,238::domainMonitor::212::Storage.DomainMonitorThread::(_monitorDomain) Domain fa2ed64f-be0f-4b89-b29f-6e7642b0ffd3 changed its status to Invalid
Thread-26::WARNING::2012-10-11 15:31:44,238::domainMonitor::219::Storage.DomainMonitorThread::(_monitorDomain) Could not emit domain state change event
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 215, in _monitorDomain
    self.onDomainConnectivityStateChange.emit(
AttributeError: 'DomainMonitorThread' object has no attribute 'onDomainConnectivityStateChange'
Thread-28::ERROR::2012-10-11 15:31:44,550::domainMonitor::204::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain a6f4ac9a-8aa5-452e-8747-151d57ffe3ef monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 182, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 134, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 370, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 272, in callCrabRPCFunction
    *args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 190, in callCrabRPCFunction
    raise err
OSError: [Errno 107] Transport endpoint is not connected: '/rhev/data-center/mnt/rhs-client43.lab.eng.blr.redhat.com:_pure-replica/a6f4ac9a-8aa5-452e-8747-151d57ffe3ef'
Thread-28::DEBUG::2012-10-11 15:31:44,551::domainMonitor::212::Storage.DomainMonitorThread::(_monitorDomain) Domain a6f4ac9a-8aa5-452e-8747-151d57ffe3ef changed its status to Invalid
Thread-28::WARNING::2012-10-11 15:31:44,551::domainMonitor::219::Storage.DomainMonitorThread::(_monitorDomain) Could not emit domain state change event
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 215, in _monitorDomain
    self.onDomainConnectivityStateChange.emit(
AttributeError: 'DomainMonitorThread' object has no attribute 'onDomainConnectivityStateChange'
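Note that the AttributeError above is a secondary vdsm bug, independent of the gluster disconnect: _monitorDomain reaches for an event attribute that was apparently never assigned on the thread object, so the state-change notification itself crashes. A minimal standalone sketch of that failure mode (the class body here is illustrative, not vdsm's actual code):

```python
class DomainMonitorThread:
    """Stripped-down illustration of the vdsm bug; the real class is larger."""

    def _monitorDomain(self):
        # The event attribute was never assigned in __init__, so the
        # attribute lookup itself raises AttributeError instead of
        # notifying any listeners about the domain state change.
        self.onDomainConnectivityStateChange.emit()

monitor = DomainMonitorThread()
try:
    monitor._monitorDomain()
except AttributeError as e:
    print(e)  # 'DomainMonitorThread' object has no attribute 'onDomainConnectivityStateChange'
```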
libvirtEventLoop::INFO::2012-10-11 15:31:44,620::libvirtvm::2027::vm.Vm::(_onAbnormalStop) vmId=`3cd7c575-8e95-4b81-b937-a189f6dc4604`::abnormal vm stop device virtio-disk0 error eperm
libvirtEventLoop::DEBUG::2012-10-11 15:31:44,621::libvirtvm::2481::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`3cd7c575-8e95-4b81-b937-a189f6dc4604`::event Suspended detail 2 opaque None
Thread-67885::DEBUG::2012-10-11 15:31:45,562::task::588::TaskManager.Task::(_updateState) Task=`13a1ee82-af8c-4e67-a1de-663366965ee3`::moving from state init -> state preparing
Thread-67885::INFO::2012-10-11 15:31:45,562::logUtils::37::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='7746e77b-7475-4fb8-ab7f-fd85773c5762', spUUID='20175cc4-e804-4434-a851-b1315510c5e5', imgUUID='bf486cb9-96f7-423e-8a64-30204da9a689', volUUID='6cde00c4-5e77-4d7e-a771-3774a766f080', options=None)
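For reference, the two OSError codes in the monitoring tracebacks are standard errno values, which matches what a glusterd restart does to the client side: the dom_md/metadata path disappears from the mount (Errno 2) and the FUSE mount loses its connection to the bricks (Errno 107). A quick check, purely illustrative and not part of vdsm:

```python
import errno
import os

# Errno 2: the dom_md/metadata read fails because the mount no longer
# exposes the path while glusterd is restarting.
assert errno.ENOENT == 2
print(os.strerror(errno.ENOENT))    # No such file or directory

# Errno 107: the statvfs() self-test fails because the FUSE client's
# transport to the bricks dropped.
assert errno.ENOTCONN == 107
print(os.strerror(errno.ENOTCONN))  # Transport endpoint is not connected
```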
KP, can you lend a helping hand here?
Verified with glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64