Created attachment 856063 [details] engine.log

Description of problem:
Running VMs go into a paused state when the storage connection is lost (as expected). However, when the connection comes back, they do not resume. This sounds very similar to BZ 1036358, but I believe my nightly version is newer than that patch.

Version-Release number of selected component (if applicable):
vdsm-4.14.0-33.gitf029545.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Stop the gluster volume which hosts the VM images
2. Wait for VMs to enter the paused state
3. Re-start the gluster volume and wait for the SPM to be re-established
4. Wait a while to see if the VMs will resume automatically
5. Try to manually resume a VM

Actual results:
No VMs are resumed; they need to be forced off and then powered back on.

Expected results:
VMs should resume

Additional info:
In my logs, I attempt to manually resume dev1-melb-example-net
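For reference, a command-level sketch of the reproduction above, assuming the gluster volume is named VM-DATA (as it appears in the glusterSD mount paths later in the logs); the volume and VM names are placeholders taken from this report.

gluster volume stop VM-DATA          # storage goes away; VMs should pause
virsh list --all                     # confirm the VMs show as "paused"
gluster volume start VM-DATA         # bring storage back; SPM re-establishes
# After a while, try a manual resume (via the engine UI, or directly):
virsh resume dev1-melb-example-net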
Created attachment 856064 [details] vdsm.log
Strange update: in 3.3.2, when VMs went into a paused state, I used to be able to ungracefully power them off and start them again. This time, however, when I ungracefully powered them off through the oVirt UI they appeared to be down in the UI, but virsh list still showed them as paused. Running virsh destroy <vm> and then starting them again through the UI seemed to be the only way to recover. I did not expect this to creep up, so I only have about 5 MB of logs if you want me to upload them - sorry.
Nir, it looks very similar to BZ 1036358, but for Gluster.
(In reply to Sergey Gotliv from comment #3)
> Nir, it looks very similar to BZ 1036358, but for Gluster.

Looking in the vdsm log, we don't see any vm that paused due to an EIO error, so we don't expect any vm in this log to be unpaused when storage becomes accessible again. And indeed, when storage becomes accessible again and the domain monitor correctly invokes the contEIOVms method, we can see that no vm is found as paused because of an EIO error, and no vm is unpaused, as expected.

We can see lots of libvirt "eother" errors, like this:

libvirtEventLoop::INFO::2014-01-28 00:13:17,614::vm::4507::vm.Vm::(_onAbnormalStop) vmId=`b3cde33c-f4c5-4b42-8e9d-820b80abc2fb`::abnormal vm stop device virtio-disk0 error eother

The current code unpauses only vms that paused because of an EIO error. This is not related to bug 1036358, which was about unpausing vms paused because of EIO when vdsm starts.
Andrew, is the vdsm log complete, showing the time when the vms were paused? If not, please reproduce again and include a vdsm log showing the vms getting paused.
To understand better why the vms were stopped abnormally, please also attach the libvirt and qemu logs. The logs are:
/var/log/libvirtd.log
/var/log/libvirt/qemu/vmname.log
Created attachment 858008 [details] libvirt.log

Attached are the three logs you requested. The VM in question in this case is: rabbitmq1.melb.example.net
Created attachment 858009 [details] qemu/vm.log

The last two "block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)" entries were from my attempt to manually resume the VM through the engine.
Created attachment 858010 [details] vdsm.log
Setting target release to current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, to make sure bugs are reviewed for relevancy, fix, closure, etc.
Nir, any update on this?
Andrew, This is a hosted engine setup, right?
Correct.
Andrew,

Attached logs are too short and don't match. For example:

Engine log starts at 2014-01-28 00:13:04,495 and ends at 2014-01-28 00:16:13,468 (3 minutes)
VDSM log starts at 2014-02-01 11:58:12,437 and ends at 2014-02-01 12:00:54,668 (2 minutes)

The same goes for libvirt.

So I see this:

OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/172.16.1.5:_VM-DATA/ff3a0c94-11dc-446d-b093-5f68ad81520d'

And that:

Exception: path to storage domain 562f0160-7b80-42e2-b248-5754455c40fc not found in /rhev/data-center/mnt

But I can't find the root cause because I don't see the entire flow. Can you attach a zip with the full logs?
Sorry for the delay in getting back to you; naturally, testing this requires some sort of downtime, so it was hard to get this in motion.

Steps:
- I stopped all the other VMs and nulled all the logs so you could get a clean log view.
- Waited a few minutes, then shut off the volume hosting the VM data.
- I attempted to restart it manually, both through the engine and through virsh.

The VM here is nginxtemp. I hope this helps.
Created attachment 867646 [details] ovirt-host logs (vdsm/libvirt)
Created attachment 867647 [details] ovirt-engine logs
This looks like a virt issue.

1. Domain becomes valid:

Thread-5162::DEBUG::2014-02-26 07:48:20,994::domainMonitor::247::Storage.DomainMonitorThread::(_monitorDomain) Domain ff3a0c94-11dc-446d-b093-5f68ad81520d changed its status to Valid

2. Domain monitor emits an event:

Thread-2496084::DEBUG::2014-02-26 07:48:20,996::misc::884::Event.Storage.DomainMonitor.onDomainStateChange::(_emit) Emitting event

3. Event handler looks for paused vms:

Thread-2496084::DEBUG::2014-02-26 07:48:20,997::misc::894::Event.Storage.DomainMonitor.onDomainStateChange::(_emit) Calling registered method `contEIOVms`
Thread-2496084::DEBUG::2014-02-26 07:48:20,998::misc::904::Event.Storage.DomainMonitor.onDomainStateChange::(_emit) Event emitted
Thread-2496085::INFO::2014-02-26 07:48:20,999::clientIF::126::vds::(contEIOVms) vmContainerLock acquired
Thread-2496085::INFO::2014-02-26 07:48:21,001::clientIF::133::vds::(contEIOVms)

4. A paused vm is found and unpaused:

Cont vm 66874a89-8745-4eff-b3d2-5fde7625f8dc in EIO
libvirtEventLoop::DEBUG::2014-02-26 07:48:21,057::vm::5096::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::event Resumed detail 0 opaque None
libvirtEventLoop::DEBUG::2014-02-26 07:48:21,065::vm::5096::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::event Resumed detail 0 opaque None

5. Vm stopped abnormally:

libvirtEventLoop::INFO::2014-02-26 07:48:21,068::vm::4505::vm.Vm::(_onAbnormalStop) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::abnormal vm stop device virtio-disk0 error eother
libvirtEventLoop::DEBUG::2014-02-26 07:48:21,068::vm::5096::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::event Suspended detail 2 opaque None
libvirtEventLoop::INFO::2014-02-26 07:48:21,069::vm::4505::vm.Vm::(_onAbnormalStop) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::abnormal vm stop device virtio-disk0 error eother

Michal, can you check this?
This is an automated message. Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.
(In reply to Nir Soffer from comment #18)
> 5. Vm stopped abnormally
> libvirtEventLoop::INFO::2014-02-26 07:48:21,068::vm::4505::vm.Vm::(_onAbnormalStop) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::abnormal vm stop device virtio-disk0 error eother
> libvirtEventLoop::DEBUG::2014-02-26 07:48:21,068::vm::5096::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::event Suspended detail 2 opaque None
> libvirtEventLoop::INFO::2014-02-26 07:48:21,069::vm::4505::vm.Vm::(_onAbnormalStop) vmId=`66874a89-8745-4eff-b3d2-5fde7625f8dc`::abnormal vm stop device virtio-disk0 error eother
>
> Michal, can you check this?

Michal, can anyone from your team take a look please?
So in other words, the VM was in EIO, and after the resume it immediately got paused again with an EOTHER error.
2014-02-25 20:48:21.034+0000: 2852: debug : qemuProcessHandleResume:776 : Transitioned guest xxx out of paused into resumed state

followed immediately by

2014-02-25 20:48:21.041+0000: 2852: debug : qemuMonitorIOProcess:354 : QEMU_MONITOR_IO_PROCESS: mon=0x7f61600c5dd0 buf={"timestamp": {"seconds": 1393361301, "microseconds": 38479}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "Bad file descriptor", "errno": 9}, "__com.redhat_reason": "eother", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1393361301, "microseconds": 41019}, "event": "STOP"}
{"timestamp": {"seconds": 1393361301, "microseconds": 41368}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "Bad file descriptor", "errno": 9}, "__com.redhat_reason": "eother", "operation": "write", "action": "stop"}} len=640
And anything relevant in the QEMU logs (is it the right one, though? I'm missing the cmdline) is:

block I/O error in device 'drive-virtio-disk0': Read-only file system (30)
block I/O error in device 'drive-virtio-disk0': Read-only file system (30)
block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)

So nothing virt here...
(In reply to Michal Skrivanek from comment #23)
> and anything relevant in QEMU logs (is it the right one, though? I'm missing
> the cmdline) is:
> block I/O error in device 'drive-virtio-disk0': Read-only file system (30)
> block I/O error in device 'drive-virtio-disk0': Read-only file system (30)
> block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
> block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
> block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
> block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
> block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
> block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)
>
> so nothing virt here...

We (storage) did our part: we detected that the storage became valid and tried to continue the vm. Now that the vm got stopped, I don't see how we can help in this case. Maybe the qemu folks can help?
Maybe. I don't really know. My area starts when I get the "Bad file descriptor" error again after the resume.
Michael, can you look at this?

A vm was paused by an EIO error, storage became available, the paused vm was unpaused, and it failed during the unpause. See comment 18, comment 22, and comment 23.
(In reply to Nir Soffer from comment #26)
> A vm was paused by an EIO error, storage became available, the paused vm was
> unpaused, and it failed during the unpause. See comment 18, comment 22, and
> comment 23.

The QEMU process is getting EROFS and EBADF when accessing the file for 'drive-virtio-disk0' on a Gluster FUSE mount:

/rhev/data-center/mnt/glusterSD/172.16.1.5:_VM-DATA/ff3a0c94-11dc-446d-b093-5f68ad81520d/images/a746f514-51eb-4926-80d5-545108438f01/ba8685ec-668a-4780-88ba-0b114e1a7e7d

QEMU just makes preadv(2)/pwritev(2)/fdatasync(2) syscalls, but the error is coming from the Gluster FUSE daemon. I suspect the Gluster logs and maybe dmesg on the KVM host will contain more information on why the Gluster FUSE daemon was unable to process I/O requests. Perhaps the process died and the FUSE mount went away?

Adding Vijay for advice on which logs to collect from Gluster.
Can you please provide a tarball of /var/log/glusterfs on the hypervisor when this happens?
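A sketch of gathering what the two comments above ask for on the KVM host; the output file names are just examples.

dmesg | tail -n 200 > dmesg-tail.txt                 # kernel-side hints, if any
tar czf glusterfs-logs.tar.gz /var/log/glusterfs     # gluster client/brick logs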
Hi,

Attached are my gluster logs; my hypervisors run both vdsm and gluster-server, btw.

I also did some extra testing on this. It seems that if the gluster storage is removed/lost and then returned within the SPM timeout, the VM will automatically resume. I believe this is 5 minutes. However, even within that 5 minutes, if the user attempts to manually resume the VM while there is no storage, it will attempt the resume and fail. This is probably what causes the resume-too-early issue. I think the engine should prevent the user from resuming the VM if the datastore is unavailable. If there is an ISO attached to the VM, it will not allow the resume because the ISO domain is inactive (as it relies on the data domain, which is inactive). It's a little weird that only the ISO domain gets checked, but the data domain doesn't.

Back to the point: after the 5 or so minute timeout, when the gluster domain returns I'm seeing it attempt to resume the instance, as my cursor (within the console) resumes for about 5 seconds and then times out again, with the engine reporting that the VM resume failed.

HTH
Created attachment 884735 [details] gluster.tar
I recently tried the network.ping-timeout option:

gluster volume set vm-data network.ping-timeout 10

The qemu logs now report:

block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
> However even within that 5 minutes, if the user attempts to manually resume the VM when there is no storage it will attempt to resume it and fail.

If the storage is unavailable, attempting to resume a 'paused' vm is not 'expected to work' anyway.

> I think the engine should prevent the user from resuming the VM if the datastore is unavailable. ... if there is an ISO attached to the VM it will not allow the resume because the ISO domain is inactive (as it relies on the data domain which is inactive). That's a little weird that only the ISO domain gets checked, but the data domain doesn't..

Someone from the vdsm team can comment on this.

> Back to the point, after the 5 or so minute timeout, when the gluster domain returns I'm seeing it attempt to resume the instance as my cursor (within the console) resumes for about 5 seconds and then times out again with the engine reporting the VM resume failed.

This error mostly indicates that glusterfs was unable to serve I/O requests. Common causes of this error are a network interruption or a brick not being available.

Client log:
------------------
[2014-04-10 03:35:58.620542] W [socket.c:514:__socket_rwv] 0-vm-sata-client-0: readv failed (No data available)
[2014-04-10 03:35:58.620709] W [socket.c:1962:__socket_proto_state_machine] 0-vm-sata-client-0: reading from socket failed. Error (No data available), peer (172.16.2.11:50153)
[2014-04-10 03:35:58.620832] I [client.c:2098:client_rpc_notify] 0-vm-sata-client-0: disconnected
[2014-04-10 03:35:59.646518] W [afr-inode-write.c:457:afr_writev] 0-vm-sata-replicate-0: failing writev due to lack of quorum
[2014-04-10 03:35:59.646588] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 41153: WRITE => -1 (Read-only file system)
[2014-04-10 03:36:00.237352] W [afr-inode-write.c:457:afr_writev] 0-vm-sata-replicate-0: failing writev due to lack of quorum
[2014-04-10 03:36:00.237419] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 41163: WRITE => -1 (Read-only file system)
[2014-04-10 03:36:00.829441] W [afr-inode-write.c:457:afr_writev] 0-vm-sata-replicate-0: failing writev due to lack of quorum
[2014-04-10 03:36:00.829524] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 41173: WRITE => -1 (Read-only file system)
[2014-04-10 03:36:00.829827] W [socket.c:514:__socket_rwv] 0-vm-sata-client-1: readv failed (No data available)

Brick log says:
[2014-04-10 03:35:58.614444] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x7fc2c4123b6d] (-->/lib64/libpthread.so.0(+0x79d1) [0x7fc2c47b69d1] (-->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xcd) [0x40533d]))) 0-: received signum (15), shutting down
[2014-04-10 03:35:58.614494] W [glusterfsd.c:1002:cleanup_and_exit] (-->/lib64/libc.so.6(+0x43bf0) [0x7fc2c407ebf0] (-->/usr/lib64/libglusterfs.so.0(synctask_wrap+0x12) [0x7fc2c4e3eaa2] (-->/usr/sbin/glusterfsd(glusterfs_handle_terminate+0x15) [0x40a5f5]))) 0-: received signum (15), shutting down
[2014-04-10 03:43:18.258953] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.4.3 (/usr/sbin/glusterfsd -s 172.16.2.11 --volfile-id vm-sata.172.16.2.11.data1-vm-sata -p /var/lib/glusterd/vols/vm-sata/run/172.16.2.11-data1-vm-sata.pid -S /var/run/880d32267c8ea6e63cb8edaff26364f3.socket --brick-name /data1/vm-sata -l /var/log/glusterfs/bricks/data1-vm-sata.log --xlator-option *-posix.glusterd-uuid=986c80cc-3ad9-4012-81fd-a18b559245a8 --brick-port 50153 --xlator-option vm-sata-server.listen-port=50153)

It looks like the brick process was down, so the client failed on I/O.

If I understand correctly, you have 2 servers (172.16.2.11 & 172.16.2.12) which serve the bricks. Are you sure the storage domain came back and was available when you tried the vm 'resume'?

If yes, can you provide the logs from the other server as well and give more details on:
*) Gluster configuration of your setup
*) Steps to reproduce this issue
*) Version of the packages in question
*) Glusterd logs from both servers; also provide qemu/vm logs at the problematic timestamp
I tried this again with the following setup:

3 hosts
gluster volume: vm-data
2 bricks (replicated), only on two hosts (172.16.2.11, 172.16.2.12)
third host (172.16.2.13) part of the gluster peer group, but not a brick
Gluster storage volume mounted as localhost:/vm-data
VM running on the third host

Replicating a switch failure by just running a reboot; the outage was about 1 minute tops. When the network returned, the storage domain was set back to active. However, the vm 'foreman1' was set to the paused state. Only two lines exist in 'qemu/foreman1.log':

block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)

Attempting to resume it manually would just result in another entry.

Gluster Version:
glusterfs-libs-3.5.0-2.el6.x86_64
glusterfs-cli-3.5.0-2.el6.x86_64
glusterfs-3.5.0-2.el6.x86_64
glusterfs-api-3.5.0-2.el6.x86_64
glusterfs-fuse-3.5.0-2.el6.x86_64
glusterfs-server-3.5.0-2.el6.x86_64
glusterfs-rdma-3.5.0-2.el6.x86_64
vdsm-gluster-4.14.9-0.el6.noarch

Brick Info:
[root@ov-hv1-2a-08-23 bricks]# gluster volume info vm-data

Volume Name: vm-data
Type: Replicate
Volume ID: 9c10ce6c-5d94-42a6-81be-d628e813da48
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.16.2.11:/data1/vm-data
Brick2: 172.16.2.12:/data1/vm-data
Options Reconfigured:
network.ping-timeout: 10
nfs.disable: yes
storage.owner-gid: 36
storage.owner-uid: 36
server.allow-insecure: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: 172.16.*.*
cluster.server-quorum-ratio: 51%

Logs are attached; the failure occurred at 09:52.
Created attachment 909506 [details] gluster brick logs

Attached are the logs from the two gluster storage hosts (hv1, hv2). hv3-client contains the log from the virt host (/var/log/glusterfs/rhev-data-center-mnt-glusterSD-localhost\:_vm-data.log).
Thanks for providing the details and logs. From them, it looks like 'quorum' is playing a role here:

[2014-04-10 03:35:59.646518] W [afr-inode-write.c:457:afr_writev] 0-vm-sata-replicate-0: failing writev due to lack of quorum
[2014-04-10 03:35:59.646588] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 41153: WRITE => -1 (Read-only file system)
[2014-04-10 03:36:00.237352] W [afr-inode-write.c:457:afr_writev] 0-vm-sata-replicate-0: failing writev due to lack of quorum
[2014-04-10 03:36:00.237419] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 41163: WRITE => -1 (Read-only file system)
[2014-04-10 03:36:00.829441] W [afr-inode-write.c:457:afr_writev] 0-vm-sata-replicate-0: failing writev due to lack of quorum
[2014-04-10 03:36:00.829524] W [fuse-bridge.c:2167:fuse_writev_cbk] 0-glusterfs-fuse: 41173: WRITE => -1 (Read-only file system)
[2014-04-10 03:36:00.829827] W [socket.c:514:__socket_rwv] 0-vm-sata-client-1: readv failed (No data available)

I am involving one of the developers (who works on this component) from the glusterfs team to check further on this.

Btw, if I understand correctly, you have/had the 'virt' group profile [1] enabled on the volume in question at the problematic time. Can you please confirm?

[1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Storage/2.1/html-single/Quick_Start_Guide/index.html#configure_vols_cli
Quorum would kick in, as the switch failure would remove all network connectivity between the servers for that period of time.

I have applied the 'virt' group profile as well as the extra options:
storage.owner-gid: 36
storage.owner-uid: 36
server.allow-insecure: on
network.ping-timeout: 10

The network.ping-timeout was recommended for the virt timeout as well. It appears to have changed the qemu output from the more "serious"

block I/O error in device 'drive-virtio-disk0': Bad file descriptor (9)

to

block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)

Note that the earlier logs (from last year) were from oVirt 3.3 and Gluster 3.4.
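For reference, a sketch of how the options above are typically applied, assuming the vm-data volume from this setup; 'group virt' applies the option set shipped with glusterfs (usually under /var/lib/glusterd/groups/virt).

gluster volume set vm-data group virt                # apply the 'virt' profile
gluster volume set vm-data storage.owner-uid 36
gluster volume set vm-data storage.owner-gid 36
gluster volume set vm-data server.allow-insecure on
gluster volume set vm-data network.ping-timeout 10
gluster volume info vm-data                          # verify under "Options Reconfigured"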
I have tried a new setup, WITHOUT the 'recommended' parameter network.ping-timeout: 10.

I broke quorum by killing gluster on one host. After restoring the connection, one out of three VMs resumed from the paused state. The VM which resumed was doing active writes at the time gluster was killed and was paused first. It was the only one which resumed.
Did some more research on this topic, and here it is:

In your environment, client- and server-side quorum have been enabled. Your configuration looks like this:

Number of Bricks: 1 x 2 = 2
Brick1: 172.16.2.11:/data1/vm-data
Brick2: 172.16.2.12:/data1/vm-data

If client quorum is enabled and the type is 'auto', then for an odd replica count, quorum is met if the number of up bricks is more than 50%. For an even replica count (which is the case here), it is met if more than 50% of the bricks are up. If exactly 50% of the bricks are up, then the first brick (here Brick1: 172.16.2.11:/data1/vm-data) has to be up.

For server-side quorum, as you have 3 servers, you lose quorum only if 2 nodes are down.

> I broke quorum by killing gluster on one host. After restoring the connection, one out of three VMs resumed from the paused state. The VM which resumed was doing active writes at the time gluster was killed and was paused first. It was the only one which resumed.

In this scenario, it looks like only *one* node was in trouble. Can you please let us know which node that was? If it was the node which serves the first brick, a read-only fs, vm pause, etc. are expected. However, I don't know why the other 2 vms failed to resume once quorum (which also means the volume started working again) came back. I think only the logs can tell us something about this.
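A sketch for inspecting and tuning the quorum options discussed above, assuming the vm-data volume from this setup; note that cluster.server-quorum-ratio is a cluster-wide option, hence 'all'.

gluster volume info vm-data                            # shows the current quorum options
gluster volume set vm-data cluster.quorum-type auto    # client-side quorum
gluster volume set vm-data cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%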
The bug is still present in oVirt 3.5. I have a gluster storage domain with no replica and client and server quorum disabled. The VM was working fine until I stopped and restarted the gluster volume via ovirt-engine. The VM got paused and it is impossible to resume it. Shutting down and restarting the VM works, but it will destroy the VM.

VDSM says:
- abnormal vm stop device virtio-disk0 error eother
- desktopLock failed

Please increase severity, since VMs are paused on network errors and will fail to resume.
Please ignore my comment. I noticed that another VM resumed successfully, so I will have to research deeper.
Ok. One VM is unable to resume. The reason in my case seems to be "Transport endpoint is not connected" (http://www.gluster.org/community/documentation/index.php/Transport_endpoint_is_not_connected)

To reproduce, I'm just stopping and restarting a gluster volume (with no replica).

These are the libvirtd logs:

2015-02-24 22:02:05.034+0000: 19733: debug : virDomainGetXMLDesc:4363 : dom=0x7fe6e8003760, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1), flags=0
2015-02-24 22:02:05.034+0000: 19733: debug : virDomainFree:2281 : dom=0x7fe6e8003760, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.036+0000: 19731: debug : virDomainResume:2393 : dom=0x7fe6f000ea20, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.036+0000: 19731: debug : qemuDomainObjBeginJobInternal:847 : Starting job: modify (async=none)
2015-02-24 22:02:05.036+0000: 19731: debug : qemuProcessStartCPUs:2922 : Using lock state '(null)'
2015-02-24 22:02:05.036+0000: 19731: debug : qemuMonitorStartCPUs:1217 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.036+0000: 19731: debug : qemuMonitorSend:911 : QEMU_MONITOR_SEND_MSG: mon=0x7fe6f00d6de0 msg={"execute":"cont","id":"libvirt-238"}
2015-02-24 22:02:05.037+0000: 19726: debug : qemuMonitorIOWrite:463 : QEMU_MONITOR_IO_WRITE: mon=0x7fe6f00d6de0 buf={"execute":"cont","id":"libvirt-238"}
2015-02-24 22:02:05.046+0000: 19726: debug : qemuMonitorIOProcess:355 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6f00d6de0 buf={"timestamp": {"seconds": 1424815325, "microseconds": 46322}, "event": "RESUME"}
2015-02-24 22:02:05.046+0000: 19726: debug : qemuMonitorEmitResume:1035 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.046+0000: 19726: debug : qemuProcessHandleResume:777 : Transitioned guest Backup-Server out of paused into resumed state
2015-02-24 22:02:05.046+0000: 19726: debug : qemuProcessHandleResume:785 : Using lock state '(null)' on resume event
2015-02-24 22:02:05.113+0000: 19726: debug : virDomainFree:2281 : dom=0x114a130, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.113+0000: 19726: debug : qemuMonitorIOProcess:355 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6f00d6de0 buf={"timestamp": {"seconds": 1424815325, "microseconds": 47842}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "Transport endpoint is not connected", "errno": 107}, "__com.redhat_reason": "eother", "operation": "write", "action": "stop"}}
2015-02-24 22:02:05.114+0000: 19726: debug : qemuMonitorEmitIOError:1077 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.145+0000: 19731: debug : qemuDomainObjEndJob:992 : Stopping job: modify (async=none)
2015-02-24 22:02:05.145+0000: 19731: debug : virDomainFree:2281 : dom=0x7fe6f000ea20, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.145+0000: 19726: debug : qemuProcessHandleIOError:945 : Transitioned guest Backup-Server to paused state due to IO error
2015-02-24 22:02:05.146+0000: 19726: debug : qemuProcessHandleIOError:955 : Preserving lock state '(null)'
2015-02-24 22:02:05.195+0000: 19726: debug : qemuMonitorEmitStop:1025 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.196+0000: 19726: debug : qemuMonitorEmitIOError:1077 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.196+0000: 19726: debug : qemuMonitorEmitIOError:1077 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.196+0000: 19726: debug : virDomainFree:2281 : dom=0x115b160, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.196+0000: 19726: debug : virDomainFree:2281 : dom=0x115b160, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.196+0000: 19726: debug : virDomainFree:2281 : dom=0x115b160, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.196+0000: 19726: debug : virDomainFree:2281 : dom=0x115b160, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.196+0000: 19726: debug : virDomainFree:2281 : dom=0x115b160, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.197+0000: 19726: debug : qemuMonitorIOProcess:355 : QEMU_MONITOR_IO_PROCESS: mon=0x7fe6f00d6de0 buf={"timestamp": {"seconds": 1424815325, "microseconds": 50560}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "__com.redhat_debug_info": {"message": "Transport endpoint is not connected", "errno": 107}, "__com.redhat_reason": "eother", "operation": "write", "action": "stop"}}
2015-02-24 22:02:05.197+0000: 19726: debug : qemuMonitorEmitIOError:1077 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.197+0000: 19726: debug : qemuMonitorEmitIOError:1077 : mon=0x7fe6f00d6de0
2015-02-24 22:02:05.197+0000: 19726: debug : virDomainFree:2281 : dom=0x114a050, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.197+0000: 19726: debug : virDomainFree:2281 : dom=0x114a050, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.198+0000: 19730: debug : virDomainGetInfo:4213 : dom=0x7fe6fc00cd20, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1), info=0x7fe710746ad0
2015-02-24 22:02:05.198+0000: 19730: debug : qemudGetProcessInfo:1351 : Got status for 2205/0 user=1409 sys=537 cpu=1 rss=73001
2015-02-24 22:02:05.198+0000: 19730: debug : virDomainFree:2281 : dom=0x7fe6fc00cd20, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.299+0000: 19734: debug : virDomainGetXMLDesc:4363 : dom=0x7fe7040e8500, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1), flags=0
2015-02-24 22:02:05.299+0000: 19734: debug : virDomainFree:2281 : dom=0x7fe7040e8500, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
2015-02-24 22:02:05.300+0000: 19732: debug : virDomainGetXMLDesc:4363 : dom=0x7fe6ec0d92d0, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1), flags=0
2015-02-24 22:02:05.301+0000: 19732: debug : virDomainFree:2281 : dom=0x7fe6ec0d92d0, (VM: name=Backup-Server, uuid=56c30083-9248-4145-92be-38109b3536e1)
The other message I get is:

virDomainObjGetMetadata:16665 : metadata not found: Requested metadata element is not present

Sometimes stopping the gluster volume + restarting the gluster volume + resuming the VM works, and sometimes it does not. In the latter case, shutting down the VM is the only way to bring it up again.
I discovered something important: the qemu process of the paused VM is accessing a wrong image path.

lsof returns:

qemu-kvm 1404 qemu 11u unknown /b55d0462-0dec-41a3-b0fd-6225fc5cf248/images/393a14c3-99df-4ffd-8ded-0705a22d3304/dc9d4128-3172-4259-9923-54db19347fd2 (stat: No such file or directory)

After restarting the VM, the paths are correct:

qemu-kvm 7781 qemu 11u REG 0,34 647364608 11256586019595226674 /rhev/data-center/mnt/glusterSD/h1.imatronix.com:gluster-storage/b55d0462-0dec-41a3-b0fd-6225fc5cf248/images/393a14c3-99df-4ffd-8ded-0705a22d3304/dc9d4128-3172-4259-9923-54db19347fd2

This is the reason why the VM cannot be started again.
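A possible cross-check for the lsof output above, reading the descriptor straight from /proc; the PID 1404 and fd 11 are taken from the lsof line and will differ on other runs.

ls -l /proc/1404/fd             # list the qemu process's open file descriptors
readlink /proc/1404/fd/11       # resolve the suspect descriptor's target path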
No. False alarm. It seems that lsof does not always return the correct path. Lsof's FAQ says "3.1.2 Why does lsof report the wrong path names?"

Additional info:

1) When stopping and restarting the gluster volume, it is also not possible to resume the VM via virsh (libvirt). In the libvirt/qemu/VM logs, I get:
block I/O error in device 'drive-virtio-disk0': Transport endpoint is not connected (107)

2) If the VM is suspended in oVirt before stopping and restarting the gluster volume, it can be resumed via oVirt.

3) If the VM is suspended in virsh before stopping and restarting the gluster volume, it can be resumed via virsh.

Maybe the file descriptors pointing to the gluster volume must be reopened after restarting the volume.
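Based on observations 2) and 3) above, a sketch of the workaround the reporter describes for a planned volume restart; the VM and volume names are the ones mentioned earlier in this bug and are only examples.

virsh suspend Backup-Server              # pause the guest before touching storage
gluster volume stop gluster-storage
gluster volume start gluster-storage
virsh resume Backup-Server               # resumed fine in the reporter's tests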
I confirmed with people from gluster that the file descriptors pointing to the gluster mount should be reopened in QEMU after restarting the volume.

Can someone please confirm whether the file descriptors are reopened in QEMU when libvirt resumes a VM?

I'm not sure if this also applies to network problems, but IMO we should provide a stable solution for resuming VMs even when glusterd or the volume is restarted.
Stefan from qemu-devel confirmed that "QEMU doesn't reopen file descriptors". Thus, if all assumptions are correct, QEMU must be patched to reopen file descriptors when resuming VMs in order to fix this issue.
I reported the problem to QEMU here: https://bugs.launchpad.net/qemu/+bug/1450891

Please update this BZ, as it won't be fixed in oVirt nor GlusterFS, but in QEMU.

Note that this issue was triggered by this other one (fixed in Gerrit): https://bugzilla.redhat.com/show_bug.cgi?id=1201355
It's not possible to reopen the image file in QEMU, because it could lead to data corruption. If the glusterfs endpoint must be shut down for some reason, the VM must be suspended before the file descriptors are invalidated.

Anyway, the main reason the file descriptors got invalidated was BZ 1172905: https://bugzilla.redhat.com/show_bug.cgi?id=1172905

This issue is just the consequence.
Very interesting. Thanks Christopher. I'm an onsite consultant doing PoC and we're seeing the same issue here. Using 3 replicas with default quorum ratio renders our volume read-only during a test kill of 2 bricks (by design). We're debating adjusting quorum ratio manually to allow a single replica to be considered quorate for read/write. Storage is fairly stable RAID6 with hot spare, and customer is aware that a single brick in rw mode will not be considered unsafe with no replication. Looking forward to your fix in 3.6A.
Hi John,

We dropped gluster replication after finding out that writes are synchronous (i.e. performance will be limited by the storage network bandwidth). We are waiting for async write support in the stable gluster branches. A 1 Gbps network would restrict disk writes to 125 MB/s, and having multiple VMs means sharing those slow resources.

Besides, gluster doesn't support SEEK_HOLE reads, giving very bad performance with raw sparse files. There were some patches to implement SEEK_HOLE support, but they probably won't be merged in the short term: https://bugzilla.redhat.com/show_bug.cgi?id=1220173
Tested on 3.6-rc1 on CentOS 7. The patches have been verified in production for some months. This can be easily verified by checking that the glusterd service is NOT running inside the VDSM group.

The main issue was: https://bugzilla.redhat.com/show_bug.cgi?id=1201355#c7
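A hedged way to check which control group glusterd runs in, per the verification note above; exact slice/unit names will vary per setup, and interpreting "VDSM group" as the vdsm control group is an assumption here.

systemctl status glusterd | grep CGroup              # shows the unit's control group
cat /proc/$(pgrep -x glusterd | head -n1)/cgroup     # per-process cgroup membership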
Looks like the latest glusterfs releases work totally fine with oVirt releases. Please reopen if these issues are seen with glusterfs 5.5 or 6.0.