Description of problem:
-----------------------
When VMs are created with QEMU's native driver for GlusterFS (which uses libgfapi), I/O errors are seen.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHEL 7.2
RHGS 3.2.0 interim build (glusterfs-3.8.4-3.el7rhgs)
libvirt-1.2.17-13.el7_2.5.x86_64
qemu-img-1.5.3-105.el7_2.7.x86_64
qemu-kvm-1.5.3-105.el7_2.7.x86_64
qemu-kvm-common-1.5.3-105.el7_2.7.x86_64

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a replica 3 volume and optimize the volume for storing VM images
2. Create a VM image on the volume
3. Create a VM that uses that image file and start the VM

Actual results:
---------------
Unable to install an OS on the VM; I/O errors are observed.

Expected results:
-----------------
No I/O errors.
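The steps to reproduce can be sketched as shell commands. This is a minimal sketch under assumptions: the server names (server1..3) are placeholders, the brick path is taken from the logs below, and "optimize the volume for storing VM images" is assumed to mean applying the RHGS virt tuning profile; adjust for the actual cluster.

```shell
# Sketch only -- server names are hypothetical placeholders.
# 1. Create a replica 3 volume and apply the virt tuning profile
gluster volume create rep3vol replica 3 \
    server1:/gluster/brick1/b1 server2:/gluster/brick1/b1 server3:/gluster/brick1/b1
gluster volume set rep3vol group virt
gluster volume start rep3vol

# 2. Create a raw VM image on the volume through libgfapi
qemu-img create -f raw gluster://server1:24007/rep3vol/vm3.img 20G

# 3. Boot a VM that accesses the image via QEMU's native gluster driver
/usr/libexec/qemu-kvm -m 2048 \
    -drive file=gluster://server1:24007/rep3vol/vm3.img,if=virtio,format=raw \
    -cdrom /path/to/install.iso -boot d
```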
Error messages as reported in the QEMU log: /var/log/libvirt/qemu/vm2.log

2016-10-25 15:03:55.357+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.5 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-05-26-07:48:46, x86-020.build.eng.bos.redhat.com), qemu version: 1.5.3 (qemu-kvm-1.5.3-105.el7_2.7)

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name vm2 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu SandyBridge -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid abf54c1a-e1be-4c8e-a3ef-fd04191395ba -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=gluster://dhcp37-172.lab.eng.blr.redhat.com:24007/rep3vol/vm3.img,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ac:13:40,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -global qxl-vga.vgamem_mb=16 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

char device redirected to /dev/pts/1 (label charserial0)

[2016-10-25 15:03:56.483026] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 7268732d-636c-6965-6e74-31302e6c6162 (0) coming up
[2016-10-25 15:03:56.483076] I [MSGID: 114020] [client.c:2356:notify] 0-rep3vol-client-0: parent translators are ready, attempting connect on transport
[2016-10-25 15:03:56.487260] I [MSGID: 114020] [client.c:2356:notify] 0-rep3vol-client-1: parent translators are ready, attempting connect on transport
[2016-10-25 15:03:56.489211] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-rep3vol-client-0: changing port to 49152 (from 0)
[2016-10-25 15:03:56.490961] I [MSGID: 114020] [client.c:2356:notify] 0-rep3vol-client-2: parent translators are ready, attempting connect on transport
[2016-10-25 15:03:56.495305] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-rep3vol-client-1: changing port to 49152 (from 0)
[2016-10-25 15:03:56.496954] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-rep3vol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-25 15:03:56.498682] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-rep3vol-client-2: changing port to 49152 (from 0)
[2016-10-25 15:03:56.500015] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-rep3vol-client-0: Connected to rep3vol-client-0, attached to remote volume '/gluster/brick1/b1'.
[2016-10-25 15:03:56.500040] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-rep3vol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-25 15:03:56.500112] I [MSGID: 108005] [afr-common.c:4430:afr_notify] 0-rep3vol-replicate-0: Subvolume 'rep3vol-client-0' came back up; going online.
[2016-10-25 15:03:56.500882] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-rep3vol-client-0: Server lk version = 1
[2016-10-25 15:03:56.501540] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-rep3vol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-25 15:03:56.503869] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-rep3vol-client-1: Connected to rep3vol-client-1, attached to remote volume '/gluster/brick1/b1'.
[2016-10-25 15:03:56.503889] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-rep3vol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-25 15:03:56.504367] I [MSGID: 114057] [client-handshake.c:1446:select_server_supported_programs] 0-rep3vol-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-10-25 15:03:56.504522] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-rep3vol-client-1: Server lk version = 1
[2016-10-25 15:03:56.506745] I [MSGID: 114046] [client-handshake.c:1222:client_setvolume_cbk] 0-rep3vol-client-2: Connected to rep3vol-client-2, attached to remote volume '/gluster/brick1/b1'.
[2016-10-25 15:03:56.506767] I [MSGID: 114047] [client-handshake.c:1233:client_setvolume_cbk] 0-rep3vol-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-10-25 15:03:56.525522] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-rep3vol-client-2: Server lk version = 1
[2016-10-25 15:03:56.529824] I [MSGID: 104041] [glfs-resolve.c:885:__glfs_active_subvol] 0-rep3vol: switched to graph 7268732d-636c-6965-6e74-31302e6c6162 (0)

block I/O error in device 'drive-virtio-disk0': Input/output error (5)
main_channel_link: add main channel client
main_channel_handle_parsed: net test: latency 537.039000 ms, bitrate 372552 bps (0.355293 Mbps) LOW BANDWIDTH
red_dispatcher_set_cursor_peer:
inputs_connect: inputs channel client create
block I/O error in device 'drive-virtio-disk0': Input/output error (5)
[the last message is repeated many more times]
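The failing device above is the raw image attached through QEMU's native gluster driver via the -drive file=gluster://... option. The URL format that driver accepts is gluster://HOST[:PORT]/VOLUME/PATH. A small sketch of composing such a URL (the helper function name is invented for illustration):

```shell
#!/bin/sh
# Hypothetical helper: build the gluster:// URL that QEMU's native
# gluster block driver accepts: gluster://HOST[:PORT]/VOLUME/PATH
gluster_drive_url() {
    host=$1; port=$2; volume=$3; image=$4
    printf 'gluster://%s:%s/%s/%s\n' "$host" "$port" "$volume" "$image"
}

# Recreates the URL seen in the command line above
gluster_drive_url dhcp37-172.lab.eng.blr.redhat.com 24007 rep3vol vm3.img
# -> gluster://dhcp37-172.lab.eng.blr.redhat.com:24007/rep3vol/vm3.img
```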
Created attachment 1213992 [details] QEMU logs from the RHEL 7.2 hypervisor
I have disabled compound-fops and client-io-threads, then re-tested. I still see the same problem: I/O errors.
> [...] 0-rep3vol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [...] 0-rep3vol-client-0: Server and Client lk-version numbers are not same, reopening the fds

Is it possible that simply the client version in the test setup is too old?
(In reply to Michael Adam from comment #5)
> > [...] 0-rep3vol-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> > [...] 0-rep3vol-client-0: Server and Client lk-version numbers are not same, reopening the fds
>
> Is it possible that simply the client version in the test setup is too old?

Ok, I confused the client version with the client LK version... and that message is only informational. But still, the client seems very old. Could there be some incompatible changes between 3.3 and 3.8?
I'm seeing almost the same issue when trying to connect a QCOW2 drive to a VM. This seems to be limited to the glusterfs 3.8.5-1.el7 package; 3.8.4-1.el7 works fine when I downgrade back to it. I am using a replica 3 arbiter volume. I've been seeing those informational client version and client LK version alerts since I started working with gluster.

GlusterFS installed when it works:
glusterfs.x86_64 3.8.4-1.el7
glusterfs-api.x86_64 3.8.4-1.el7
glusterfs-client-xlators.x86_64 3.8.4-1.el7
glusterfs-fuse.x86_64 3.8.4-1.el7
glusterfs-libs.x86_64

GlusterFS installed when it doesn't work:
glusterfs x86_64 3.8.5-1.el7
glusterfs-api x86_64 3.8.5-1.el7
glusterfs-client-xlators x86_64 3.8.5-1.el7
glusterfs-fuse x86_64 3.8.5-1.el7
glusterfs-libs x86_64 3.8.5-1.el7

Error:
----------------------------------------------------------------------------------
2016-11-02T17:28:51.970295Z qemu-kvm: -drive file=gluster://gluster1:24007/opennebula/361d9f69c43ca458f037b8afb23eed5a,if=none,id=drive-ide0-1-0,format=qcow2,cache=none: could not open disk image gluster://gluster1:24007/opennebula/361d9f69c43ca458f037b8afb23eed5a: Could not read L1 table: Input/output error
2016-11-02 17:28:51.996+0000: shutting down
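One way to check whether the failure is in gfapi itself, independent of libvirt and the VM configuration, is to probe the image directly with qemu-img over the same URL. A sketch, reusing the host/volume/image names from the error above:

```shell
# Sketch -- requires a reachable gluster volume; names taken from the error above.
# If gfapi access is broken, this fails with the same Input/output error
# without involving libvirt or a running VM.
qemu-img info gluster://gluster1:24007/opennebula/361d9f69c43ca458f037b8afb23eed5a
```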
(In reply to Daryl Lee from comment #7)
> I'm seeing almost the same issue when trying to connect a QCOW2 drive to a
> VM. This seems to be limited to glusterfs package 3.8.5-1.el7. 3.8.4-1.el7
> works fine when I downgrade back to it. I am using a replica 3 arbiter
> volume.
> [...]
> 2016-11-02T17:28:51.970295Z qemu-kvm: -drive file=gluster://gluster1:24007/opennebula/361d9f69c43ca458f037b8afb23eed5a,if=none,id=drive-ide0-1-0,format=qcow2,cache=none: could not open disk image gluster://gluster1:24007/opennebula/361d9f69c43ca458f037b8afb23eed5a: Could not read L1 table: Input/output error
> 2016-11-02 17:28:51.996+0000: shutting down

Hi Daryl,

This issue is reported against 'Red Hat Gluster Storage', which is the downstream version of the GlusterFS product. The same issue is reported against the upstream 'GlusterFS' product in [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1390521

Please follow up on that bug for the resolution of this issue.
(In reply to Michael Adam from comment #6)
> But still, the client seems very old.
> Would it be possible that there are some incompatible changes between 3.3
> and 3.8?

No, I am using the latest client binaries on RHEL 7.3. I did not see this issue with the previous interim build, glusterfs-3.8.4-2.el7rhgs. This is a regression in the glusterfs-3.8.4-3.el7rhgs build.
From the initial investigation, this issue looks similar to BZ 1391086. The fix for that bug is already merged downstream. Can we retest this with the latest build?
(In reply to rjoseph from comment #11)
> From the initial investigation, this issue looks similar to BZ 1391086. The
> fix for that bug is already merged downstream. Can we retest this with the
> latest build?

I don't see any new downstream build after the interim build glusterfs-3.8.4-3.el7rhgs. I will check this issue with the next downstream build.
All,

I have tested with the latest downstream RHGS 3.2.0 interim build, glusterfs-3.8.4-5.el7rhgs, on RHEL 7.3.

Versions of the other components:
--------------------------------
qemu-kvm-rhev-2.6.0-27.el7.x86_64
qemu-img-rhev-2.6.0-27.el7.x86_64
qemu-kvm-tools-rhev-2.6.0-27.el7.x86_64
qemu-kvm-common-rhev-2.6.0-27.el7.x86_64
qemu-guest-agent-2.5.0-3.el7.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
libvirt-2.0.0-10.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7.x86_64
libvirt-daemon-config-network-2.0.0-10.el7.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64

I am no longer seeing this issue. Please provide the patch URL for the fix and move the bug to ON_QA with the proper fixed-in-version, so that this bug can be VERIFIED.
The fix for BZ 1391093 also fixes this issue. The corresponding downstream patch is: https://code.engineering.redhat.com/gerrit/89229 Therefore, moving the bug to ON_QA.
Tested with the RHGS 3.2.0 interim build glusterfs-3.8.4-5.el7rhgs installed on RHEL 7.3, with the following components:

qemu-img-1.5.3-126.el7.x86_64
qemu-kvm-1.5.3-126.el7.x86_64
qemu-kvm-common-1.5.3-126.el7.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
libvirt-2.0.0-10.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7.x86_64
libvirt-daemon-config-network-2.0.0-10.el7.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64

VMs could access the disks through gfapi without any I/O errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html