Description of problem:
Once a template is created, any new VM created from that template fails to start. Attaching a screenshot of the error.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-41.el7rhgs.x86_64
vdsm-4.19.28-1.el7ev.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure gluster and install Hosted Engine using the Cockpit UI.
2. Add additional hosts and create storage domains.
3. Create a VM.
4. Make a template out of the created VM.
5. Create a new VM from the template.

Actual results:
VM creation succeeds, but the VM fails to boot with the error shown in the screenshot.

Expected results:
The VM should boot successfully.

Additional info:
No errors seen in the qemu logs:
=====================================
2017-08-18 13:33:37.463+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-07-18-09:40:48, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-16.el7_4.4), hostname: tettnang.lab.eng.blr.redhat.com LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=testvm_shard_64MB,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-14-testvm_shard_64MB/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m size=8388608k,slots=16,maxmem=33554432k -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=8192 -uuid fea25030-23af-4069-98da-e74a544cb04f -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.4-18.el7rhgs,serial=36383737-3538-4753-4835-30375832434A,uuid=fea25030-23af-4069-98da-e74a544cb04f' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-14-testvm_shard_64MB/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc
base=2017-08-18T13:33:36,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=gluster://10.70.36.78/data/2edc6cc1-7b31-425d-a969-c5df2287c84a/images/3ef6ed85-9647-4b48-a041-77d9a5a1e9f5/e07f8c83-b365-49f1-b276-fad2f8afc9dc,file.debug=4,format=qcow2,if=none,id=drive-scsi0-0-0-0,serial=3ef6ed85-9647-4b48-a041-77d9a5a1e9f5,cache=none,werror=stop,rerror=stop,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:45:55:21:48:15,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/fea25030-23af-4069-98da-e74a544cb04f.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/fea25030-23af-4069-98da-e74a544cb04f.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice tls-port=5900,addr=10.70.34.41,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device 
qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on [2017-08-18 13:33:37.791278] I [afr.c:94:fix_quorum_options] 0-data-replicate-0: reindeer: incoming qtype = auto [2017-08-18 13:33:37.791329] I [afr.c:116:fix_quorum_options] 0-data-replicate-0: reindeer: quorum_count = 2147483647 [2017-08-18 13:33:37.792934] W [MSGID: 101174] [graph.c:360:_log_if_unknown_option] 0-data-readdir-ahead: option 'parallel-readdir' is not recognized [2017-08-18 13:33:37.792997] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 74657474-6e61-6e67-2e6c-61622e656e67 (0) coming up [2017-08-18 13:33:37.793027] I [MSGID: 114020] [client.c:2356:notify] 0-data-client-0: parent translators are ready, attempting connect on transport [2017-08-18 13:33:37.797036] I [MSGID: 114020] [client.c:2356:notify] 0-data-client-1: parent translators are ready, attempting connect on transport [2017-08-18 13:33:37.797683] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-data-client-0: changing port to 49153 (from 0) [2017-08-18 13:33:37.801016] I [MSGID: 114020] [client.c:2356:notify] 0-data-client-2: parent translators are ready, attempting connect on transport [2017-08-18 13:33:37.805314] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-data-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-08-18 13:33:37.805424] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-data-client-1: changing port to 49153 (from 0) [2017-08-18 13:33:37.808857] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-data-client-0: Connected to data-client-0, attached to remote volume '/gluster_bricks/data/data'. 
[2017-08-18 13:33:37.808885] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-data-client-0: Server and Client lk-version numbers are not same, reopening the fds [2017-08-18 13:33:37.808965] I [MSGID: 108005] [afr-common.c:4706:afr_notify] 0-data-replicate-0: Subvolume 'data-client-0' came back up; going online. [2017-08-18 13:33:37.809303] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-data-client-0: Server lk version = 1 [2017-08-18 13:33:37.809652] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-data-client-2: changing port to 49153 (from 0) [2017-08-18 13:33:37.812611] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-data-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-08-18 13:33:37.813306] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-data-client-1: Connected to data-client-1, attached to remote volume '/gluster_bricks/data/data'. [2017-08-18 13:33:37.813347] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-data-client-1: Server and Client lk-version numbers are not same, reopening the fds [2017-08-18 13:33:37.813489] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-data-client-1: Server lk version = 1 [2017-08-18 13:33:37.816687] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-data-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) 
[2017-08-18 13:33:37.817323] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-data-client-2: Connected to data-client-2, attached to remote volume '/gluster_bricks/data/data'. [2017-08-18 13:33:37.817342] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-data-client-2: Server and Client lk-version numbers are not same, reopening the fds [2017-08-18 13:33:37.817531] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-data-client-2: Server lk version = 1 [2017-08-18 13:33:37.820304] I [MSGID: 108031] [afr-common.c:2264:afr_local_discovery_cbk] 0-data-replicate-0: selecting local read_child data-client-1 [2017-08-18 13:33:37.820727] I [MSGID: 104041] [glfs-resolve.c:970:__glfs_active_subvol] 0-data: switched to graph 74657474-6e61-6e67-2e6c-61622e656e67 (0) [2017-08-18 13:33:37.825282] I [MSGID: 133017] [shard.c:4900:shard_seek] 0-data-shard: seek called on f7e757ae-e231-4e79-8dea-d9ff84da555d. [Operation not supported] [2017-08-18 13:33:37.832548] I [MSGID: 133017] [shard.c:4900:shard_seek] 0-data-shard: seek called on 7771a6d6-6171-4a09-ac83-e2cf723833bf. 
[Operation not supported] 2017-08-18T13:33:37.832765Z qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 2017-08-18T13:33:37.832797Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config main_channel_link: add main channel client main_channel_handle_parsed: net test: latency 1.825000 ms, bitrate 1369899665 bps (1306.438127 Mbps) inputs_connect: inputs channel client create

errors seen in brick logs:
================================
[2017-08-18 13:28:24.415782] E [MSGID: 113002] [posix.c:267:posix_lookup] 0-data-posix: buf->ia_gfid is null for /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.785 [No data available] [2017-08-18 13:28:24.415845] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-data-server: 114652: LOOKUP /.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.785 (be318638-e8a0-4c6d-977d-7a937aa84806/b1a937e5-4a69-4b41-96cb-07a7f562db55.785) ==> (No data available) [No data available] [2017-08-18 13:28:24.472442] W [MSGID: 113096] [posix-handle.c:773:posix_handle_hard] 0-data-posix: link /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.786 -> /gluster_bricks/data/data/.glusterfs/ce/13/ce1330fe-e550-43d2-92ea-caa2c9510971failed [File exists] [2017-08-18 13:28:24.472492] E [MSGID: 113020] [posix.c:1395:posix_mknod] 0-data-posix: setting gfid on /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.786 failed [2017-08-18 13:28:24.534927] E [MSGID: 113002] [posix.c:267:posix_lookup] 0-data-posix: buf->ia_gfid is null for /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.787 [No data available] [2017-08-18 13:28:24.534981] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-data-server: 114743: LOOKUP /.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.787 (be318638-e8a0-4c6d-977d-7a937aa84806/b1a937e5-4a69-4b41-96cb-07a7f562db55.787) ==> (No data available) [No data available] [2017-08-18 
13:28:24.592544] W [MSGID: 113096] [posix-handle.c:773:posix_handle_hard] 0-data-posix: link /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.788 -> /gluster_bricks/data/data/.glusterfs/83/00/830084a6-423b-4a29-a0bc-2ea90e2374a2failed [File exists] [2017-08-18 13:28:24.592562] E [MSGID: 113020] [posix.c:1395:posix_mknod] 0-data-posix: setting gfid on /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.788 failed [2017-08-18 13:28:24.700997] W [MSGID: 113096] [posix-handle.c:773:posix_handle_hard] 0-data-posix: link /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.790 -> /gluster_bricks/data/data/.glusterfs/fb/a9/fba964dc-2b80-4c97-a84a-0fd4970d12f4failed [File exists] [2017-08-18 13:28:24.701032] E [MSGID: 113020] [posix.c:1395:posix_mknod] 0-data-posix: setting gfid on /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.790 failed [2017-08-18 13:28:24.761620] W [MSGID: 113096] [posix-handle.c:773:posix_handle_hard] 0-data-posix: link /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.791 -> /gluster_bricks/data/data/.glusterfs/a7/f5/a7f5cc30-f0fd-49a0-91d6-2233563e9c65failed [File exists] [2017-08-18 13:28:24.761621] W [MSGID: 113096] [posix-handle.c:773:posix_handle_hard] 0-data-posix: link /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.791 -> /gluster_bricks/data/data/.glusterfs/a7/f5/a7f5cc30-f0fd-49a0-91d6-2233563e9c65failed [File exists] [2017-08-18 13:28:24.761638] E [MSGID: 113020] [posix.c:1395:posix_mknod] 0-data-posix: setting gfid on /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.791 failed [2017-08-18 13:28:24.890108] E [MSGID: 
113002] [posix.c:267:posix_lookup] 0-data-posix: buf->ia_gfid is null for /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.793 [No data available] [2017-08-18 13:28:24.890172] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-data-server: 115002: LOOKUP /.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.793 (be318638-e8a0-4c6d-977d-7a937aa84806/b1a937e5-4a69-4b41-96cb-07a7f562db55.793) ==> (No data available) [No data available] [2017-08-18 13:28:25.003037] E [MSGID: 113002] [posix.c:267:posix_lookup] 0-data-posix: buf->ia_gfid is null for /gluster_bricks/data/data/.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.795 [No data available] [2017-08-18 13:28:25.003093] E [MSGID: 115050] [server-rpc-fops.c:158:server_lookup_cbk] 0-data-server: 115090: LOOKUP /.shard/b1a937e5-4a69-4b41-96cb-07a7f562db55.795 (be318638-e8a0-4c6d-977d-7a937aa84806/b1a937e5-4a69-4b41-96cb-07a7f562db55.795) ==> (No data available) [No data available]
sosreports can be found in the link below:
==========================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1482994/
Moving from oVirt to RHGS
Niels,

The only error we see in the client logs is from shard, reporting that SEEK_DATA and SEEK_HOLE are not supported. Technically things should still work even if shard doesn't support seek: qemu should fall back to its earlier mode of operation from before seek was implemented in gluster, at least until it is implemented in shard, which is a non-trivial effort.

I checked block/gluster.c in the qemu codebase and found that with sharding enabled, the supports_seek_data flag should be set to false by virtue of shard returning ENOTSUP. The only place where this flag is read seems to be find_allocation(), which will jump to "exit" in this case and return -ENOTSUP to qemu_gluster_co_get_block_status(). Unfortunately I do not understand what happens next: ret seems to be set to BDRV_BLOCK_DATA, and I don't understand what that is supposed to mean.

A git-blame shows that this particular piece of code was written by you as part of commit 947eb2030eef73b7a66167a6244b11eeb4f109ff. Could you please clarify what qemu does next, and why VMs won't boot because of this, given the lack of any logging of fop failures in gfapi itself?

Thanks in advance,
Krutika
I don't see any seek problems in the logs here... not sure if this problem is the same as bz#1357919.

There have been some changes to how glfs_lseek() is handled by QEMU, but those were made months ago. Unfortunately the qemu version is not mentioned in the bug. Rafi provided the versions over IRC:

15:56 < rafi> ndevos, qemu-kvm-tools-rhev-2.6.0-28.el7_3.12.x86_64
15:56 < rafi> libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64
15:56 < rafi> qemu-kvm-rhev-2.6.0-28.el7_3.12.x86_64
15:56 < rafi> qemu-kvm-common-rhev-2.6.0-28.el7_3.12.x86_64
Rafi,

I checked the logs. None of them indicates anything about the issue seen in the dev cluster; all of these logs represent a transient state.

-Krutika
(In reply to Niels de Vos from comment #11)
> I don't see any seek problems in the logs here... not sure if this problem
> is the same as bz#1357919.
>
> There have been some changes to how glfs_lseek() is handled by QEMU, but
> that has been months ago. Unfortunately the version of qemu is not mentioned
> in the bug. Rafi provided versions over IRC:
>
> 15:56 < rafi> ndevos, qemu-kvm-tools-rhev-2.6.0-28.el7_3.12.x86_64
> 15:56 < rafi> libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64
> 15:56 < rafi> qemu-kvm-rhev-2.6.0-28.el7_3.12.x86_64
> 15:56 < rafi> qemu-kvm-common-rhev-2.6.0-28.el7_3.12.x86_64

Kasturi,

What version of qemu are you using? Could you please provide the output? Niels suggests in comment 11 that there were changes to how QEMU handles the seek fop in the past.

-Krutika
Hi Niels & Krutika,

Apologies, I missed including that information. Here is the qemu version:

[root@yarrow ~]# rpm -qa | grep qemu
qemu-kvm-rhev-2.9.0-16.el7_4.4.x86_64
qemu-kvm-tools-rhev-2.9.0-16.el7_4.4.x86_64
qemu-kvm-common-rhev-2.9.0-16.el7_4.4.x86_64
libvirt-daemon-driver-qemu-3.2.0-14.el7_4.2.x86_64
ipxe-roms-qemu-20170123-1.git4e85b27.el7.noarch
qemu-img-rhev-2.9.0-16.el7_4.4.x86_64

Thanks,
Kasturi
In case of problems related to seek, we need to verify whether this upstream change was backported:

commit 223a23c198787328ae75bc65d84edf5fde33c0b6
Author: Jeff Cody <jcody>
Date: Tue May 23 13:27:50 2017 -0400

    block/gluster: glfs_lseek() workaround

https://git.qemu.org/gitweb.cgi?p=qemu.git;a=commit;h=223a23c198787328ae75bc65d84edf5fde33c0b6
(In reply to Niels de Vos from comment #15)
> In case of problems related to seek, we need to verify if this upstream
> change was backported.
>
> commit 223a23c198787328ae75bc65d84edf5fde33c0b6
> Author: Jeff Cody <jcody>
> Date: Tue May 23 13:27:50 2017 -0400
>
>     block/gluster: glfs_lseek() workaround
>
> https://git.qemu.org/gitweb.cgi?p=qemu.git;a=commit;h=223a23c198787328ae75bc65d84edf5fde33c0b6

Bug 1451191 was used to get the fix into qemu-kvm-rhev-2.9.0-8.el7.x86_64. The version this was reported against is higher, so it should contain the backport as well.

Bug 1425293#c21 mentions that the seek problem has been fixed with the combination of:
libvirt-3.2.0-10.el7.x86_64
qemu-kvm-rhev-2.9.0-10.el7.x86_64
glusterfs-3.8.4-28.el7rhgs.x86_64

There has been a fix for seek on the gluster side as well: http://review.gluster.org/15943 was backported to upstream glusterfs-3.8.7. This matches downstream bug 1454558 (git tag v3.8.4-26).

At the moment I do not think we are missing anything on the seek side. Is it possible to verify that qemu continues with <something> after trying glfs_lseek()? You could do this by setting diagnostics.client-log-level=TRACE for the volume; the QEMU/gfapi logs should then be much more verbose.
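The suggested tracing step, as a CLI fragment (assuming the volume is named "data", as in the logs above; not meant to be run outside the test cluster):

```shell
# Enable client-side trace logging for the volume, reproduce the VM start,
# then lower the log level again to avoid flooding the logs:
gluster volume set data diagnostics.client-log-level TRACE
# ... start the VM and collect the qemu/gfapi log ...
gluster volume set data diagnostics.client-log-level INFO
```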
Sure Niels, I will set the option on the volume and update the bug.
Hi Niels,

I have set the volume option as you suggested, and I see errors only from seek returning 'Operation not supported':

[2017-08-23 10:32:42.999569] D [MSGID: 0] [shard.c:4901:shard_seek] 0-stack-trace: stack-address: 0x5594f725a020, data-shard returned -1 error: Operation not supported [Operation not supported]
[2017-08-23 10:32:42.999585] D [MSGID: 0] [io-stats.c:2529:io_stats_seek_cbk] 0-stack-trace: stack-address: 0x5594f725a020, data returned -1 error: Operation not supported [Operation not supported]

The complete qemu log file is available at:
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1482994/vm_data_trace1.log

Thanks,
Kasturi
Thanks for testing! After seek failed with ENOTSUP, qemu does some reads from the VM image on the volume. I therefore doubt that the problem is caused by seek.

It would be helpful to have similar logs from a test where sharding is disabled. If that also does not work, it may really not be a problem with seek (or, if we're unlucky, a different seek issue).
Hi Niels,

I tested this without sharding enabled, and below is what I see:

[root@yarrow qemu]# cat vm_noshard_1.log
2017-08-18 07:47:56.357+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-07-18-09:40:48, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-16.el7_4.4), hostname: yarrow.lab.eng.blr.redhat.com LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name guest=vm_noshard_1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-vm_noshard_1/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX -m size=8388608k,slots=16,maxmem=33554432k -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=8192 -uuid 6c540118-b51e-4599-958e-a7a80e5d4f9c -smbios 'type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.4-18.el7rhgs,serial=36383737-3538-4753-4835-303758324348,uuid=6c540118-b51e-4599-958e-a7a80e5d4f9c' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-vm_noshard_1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-08-18T07:47:55,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive 
file=gluster://10.70.36.78/data/1cc3328f-0462-4b9e-9cd2-b22b1bce0bcc/images/6cd3f0a7-3e81-46db-a25d-7a92d9234a8f/c308f044-d17d-47b0-a164-cea091b908b9,file.debug=4,format=qcow2,if=none,id=drive-scsi0-0-0-0,serial=6cd3f0a7-3e81-46db-a25d-7a92d9234a8f,cache=none,werror=stop,rerror=stop,aio=threads -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:45:55:21:48:0d,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/6c540118-b51e-4599-958e-a7a80e5d4f9c.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/6c540118-b51e-4599-958e-a7a80e5d4f9c.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice tls-port=5901,addr=10.70.34.43,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on [2017-08-18 07:47:56.512892] I [afr.c:94:fix_quorum_options] 0-data-replicate-0: reindeer: incoming qtype = auto [2017-08-18 07:47:56.512931] I [afr.c:116:fix_quorum_options] 0-data-replicate-0: reindeer: 
quorum_count = 2147483647 [2017-08-18 07:47:56.514517] W [MSGID: 101174] [graph.c:360:_log_if_unknown_option] 0-data-readdir-ahead: option 'parallel-readdir' is not recognized [2017-08-18 07:47:56.514580] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 79617272-6f77-2e6c-6162-2e656e672e62 (0) coming up [2017-08-18 07:47:56.514613] I [MSGID: 114020] [client.c:2356:notify] 0-data-client-0: parent translators are ready, attempting connect on transport [2017-08-18 07:47:56.518721] I [MSGID: 114020] [client.c:2356:notify] 0-data-client-1: parent translators are ready, attempting connect on transport [2017-08-18 07:47:56.519082] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-data-client-0: changing port to 49153 (from 0) [2017-08-18 07:47:56.522525] I [MSGID: 114020] [client.c:2356:notify] 0-data-client-2: parent translators are ready, attempting connect on transport [2017-08-18 07:47:56.526842] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-data-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-08-18 07:47:56.527094] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-data-client-1: changing port to 49153 (from 0) [2017-08-18 07:47:56.530333] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-data-client-0: Connected to data-client-0, attached to remote volume '/gluster_bricks/data/data'. [2017-08-18 07:47:56.530364] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-data-client-0: Server and Client lk-version numbers are not same, reopening the fds [2017-08-18 07:47:56.530464] I [MSGID: 108005] [afr-common.c:4706:afr_notify] 0-data-replicate-0: Subvolume 'data-client-0' came back up; going online. 
[2017-08-18 07:47:56.530893] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-data-client-0: Server lk version = 1 [2017-08-18 07:47:56.531269] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-data-client-2: changing port to 49153 (from 0) [2017-08-18 07:47:56.534429] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-data-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-08-18 07:47:56.535133] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-data-client-1: Connected to data-client-1, attached to remote volume '/gluster_bricks/data/data'. [2017-08-18 07:47:56.535157] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-data-client-1: Server and Client lk-version numbers are not same, reopening the fds [2017-08-18 07:47:56.535348] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-data-client-1: Server lk version = 1 [2017-08-18 07:47:56.537986] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-data-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-08-18 07:47:56.538625] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-data-client-2: Connected to data-client-2, attached to remote volume '/gluster_bricks/data/data'. 
[2017-08-18 07:47:56.538644] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-data-client-2: Server and Client lk-version numbers are not same, reopening the fds [2017-08-18 07:47:56.538873] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-data-client-2: Server lk version = 1 [2017-08-18 07:47:56.541463] I [MSGID: 108031] [afr-common.c:2264:afr_local_discovery_cbk] 0-data-replicate-0: selecting local read_child data-client-0 [2017-08-18 07:47:56.542170] I [MSGID: 104041] [glfs-resolve.c:970:__glfs_active_subvol] 0-data: switched to graph 79617272-6f77-2e6c-6162-2e656e672e62 (0) [2017-08-18 07:47:56.547327] W [MSGID: 114031] [client-rpc-fops.c:2211:client3_3_seek_cbk] 0-data-client-0: remote operation failed [No such device or address] [2017-08-18 07:47:56.548794] I [MSGID: 114021] [client.c:2365:notify] 0-data-client-0: current graph is no longer active, destroying rpc_client [2017-08-18 07:47:56.548845] I [MSGID: 114021] [client.c:2365:notify] 0-data-client-1: current graph is no longer active, destroying rpc_client [2017-08-18 07:47:56.548878] I [MSGID: 114021] [client.c:2365:notify] 0-data-client-2: current graph is no longer active, destroying rpc_client [2017-08-18 07:47:56.548871] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-data-client-0: disconnected from data-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2017-08-18 07:47:56.548916] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-data-client-2: disconnected from data-client-2. Client process will keep trying to connect to glusterd until brick's port is available [2017-08-18 07:47:56.548910] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-data-client-1: disconnected from data-client-1. 
Client process will keep trying to connect to glusterd until brick's port is available [2017-08-18 07:47:56.548935] W [MSGID: 108001] [afr-common.c:4820:afr_notify] 0-data-replicate-0: Client-quorum is not met [2017-08-18 07:47:56.549058] E [MSGID: 108006] [afr-common.c:4731:afr_notify] 0-data-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. [2017-08-18 07:47:56.549468] I [io-stats.c:4006:fini] 0-data: io-stats translator unloaded [2017-08-18 07:47:56.549951] I [MSGID: 101191] [event-epoll.c:633:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 1 [2017-08-18 07:47:56.549979] I [MSGID: 101191] [event-epoll.c:633:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 2 2017-08-18T07:47:57.496671Z qemu-kvm: -drive file=gluster://10.70.36.78/data/1cc3328f-0462-4b9e-9cd2-b22b1bce0bcc/images/6cd3f0a7-3e81-46db-a25d-7a92d9234a8f/c308f044-d17d-47b0-a164-cea091b908b9,file.debug=4,format=qcow2,if=none,id=drive-scsi0-0-0-0,serial=6cd3f0a7-3e81-46db-a25d-7a92d9234a8f,cache=none,werror=stop,rerror=stop,aio=threads: Could not read L1 table: Input/output error 2017-08-18 07:47:57.514+0000: shutting down, reason=failed Thanks kasturi
Sahina mentioned that a very similar problem has been reported earlier:

- https://bugs.launchpad.net/bugs/1644754
- https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg04742.html

The same "Could not read L1 table: Input/output error" is mentioned there.

# qemu-img info gluster://gluster01/gv0/bar2.qcow2
qemu-img: Could not open 'gluster://gluster01/gv0/bar2.qcow2': Could not read L1 table: Input/output error

The trace log from comment #18 has reasonable return values for the *_read_cbk functions. There is one exception though:

[2017-08-23 10:32:43.003671] T [MSGID: 0] [client-rpc-fops.c:3011:client3_3_readv_cbk] 0-stack-trace: stack-address: 0x5594f710f020, data-client-2 returned 8
[2017-08-23 10:32:43.003691] T [MSGID: 0] [afr-inode-read.c:1838:afr_readv_cbk] 0-stack-trace: stack-address: 0x5594f710f020, data-replicate-0 returned 8
[2017-08-23 10:32:43.003706] T [MSGID: 0] [dht-inode-read.c:502:dht_readv_cbk] 0-stack-trace: stack-address: 0x5594f710f020, data-dht returned 8
[2017-08-23 10:32:43.003720] T [MSGID: 0] [shard.c:3122:shard_readv_do_cbk] 0-stack-trace: stack-address: 0x5594f710f020, data-shard returned 512

Krutika, this suggests that shard changed the amount of data that was read (I do not see the requested size in the log) from 8 bytes to 512. It looks rather unexpected to me that more bytes are returned than could have been requested. Do you have an idea whether that can be intentional?

Kasturi, is it possible to capture a tcpdump on the hypervisor while starting the VM? That should give the exact requests and replies that go over the network.

The logs from comment #20 do not have tracing enabled, so we cannot see whether the same behaviour happens without sharding. Those logs have "remote operation failed [No such device or address]" as the reply to lseek. This should only be returned in the following case:

ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond the end of the file.
With a packet capture, we can see what seek operation is executed, and verify the size of the file on the brick(s).
Kasturi, is this an arbiter volume? Possibly the SEEK FOP is sent to the arbiter brick, where the size of the file is always 0, so any seek operation will not behave correctly. It seems that there is no 'support' for SEEK in arbiter yet.
Arbiter bricks are always a little confusing to me, so this patch has been sent to get it clarified (either with an implementation or a comment in the sources): https://review.gluster.org/18103
(In reply to Niels de Vos from comment #21)
> Sahina mentioned that a very similar problem has been reported earlier:
>
> - https://bugs.launchpad.net/bugs/1644754
> - https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg04742.html
>
> The same "Could not read L1 table: Input/output error" is mentioned there.
>
> # qemu-img info gluster://gluster01/gv0/bar2.qcow2
> qemu-img: Could not open 'gluster://gluster01/gv0/bar2.qcow2': Could not read
> L1 table: Input/output error
>
> The trace log from comment #18 has reasonable return values for the
> *_read_cbk functions. There is one exception though:
>
> [2017-08-23 10:32:43.003671] T [MSGID: 0]
> [client-rpc-fops.c:3011:client3_3_readv_cbk] 0-stack-trace: stack-address:
> 0x5594f710f020, data-client-2 returned 8
> [2017-08-23 10:32:43.003691] T [MSGID: 0]
> [afr-inode-read.c:1838:afr_readv_cbk] 0-stack-trace: stack-address:
> 0x5594f710f020, data-replicate-0 returned 8
> [2017-08-23 10:32:43.003706] T [MSGID: 0]
> [dht-inode-read.c:502:dht_readv_cbk] 0-stack-trace: stack-address:
> 0x5594f710f020, data-dht returned 8
> [2017-08-23 10:32:43.003720] T [MSGID: 0] [shard.c:3122:shard_readv_do_cbk]
> 0-stack-trace: stack-address: 0x5594f710f020, data-shard returned 512
>
> Krutika, this suggests that shard changed the returned data that was read (I
> do not see the requested size in the log) from 8 bytes to 512. It looks
> rather unexpected to me that more bytes are returned than possibly have been
> requested. Do you have an idea if that can be intentional?
>
> Kasturi, it is possible to capture a tcpdump on the hypervisor while
> starting the VM? That should give the exact requests and replies that go
> over the network.

Niels, sure I can do that. I will capture it tomorrow and update the bug with the results. This is with sharding enabled, right?

> The logs from comment #20 do not have tracing enabled, so we can not see if
> the same behaviour is happening without sharding. Those logs have "remote
> operation failed [No such device or address]" as a reply on lseek. This
> should only be returned in the following case:
>
> ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond the
> end of the file.
>
> With a packet capture, we can see what seek operation is executed, and
> verify the size of the file on the brick(s).
(In reply to Niels de Vos from comment #22)
> Kasturi, is this an arbiter volume? Possibly the SEEK FOP is sent to the
> arbiter and there the size of the file will always be 0. Any seek operations
> will not behave correctly. It seems that there is no 'support' for SEEK in
> arbiter yet.

Niels, I am observing this with both replicate and arbiter volumes.
Setting the needinfos back as they got cleared with my update.
(In reply to Niels de Vos from comment #21)
> Sahina mentioned that a very similar problem has been reported earlier:
>
> - https://bugs.launchpad.net/bugs/1644754
> - https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg04742.html
>
> The same "Could not read L1 table: Input/output error" is mentioned there.
>
> # qemu-img info gluster://gluster01/gv0/bar2.qcow2
> qemu-img: Could not open 'gluster://gluster01/gv0/bar2.qcow2': Could not read
> L1 table: Input/output error
>
> The trace log from comment #18 has reasonable return values for the
> *_read_cbk functions. There is one exception though:
>
> [2017-08-23 10:32:43.003671] T [MSGID: 0]
> [client-rpc-fops.c:3011:client3_3_readv_cbk] 0-stack-trace: stack-address:
> 0x5594f710f020, data-client-2 returned 8
> [2017-08-23 10:32:43.003691] T [MSGID: 0]
> [afr-inode-read.c:1838:afr_readv_cbk] 0-stack-trace: stack-address:
> 0x5594f710f020, data-replicate-0 returned 8
> [2017-08-23 10:32:43.003706] T [MSGID: 0]
> [dht-inode-read.c:502:dht_readv_cbk] 0-stack-trace: stack-address:
> 0x5594f710f020, data-dht returned 8
> [2017-08-23 10:32:43.003720] T [MSGID: 0] [shard.c:3122:shard_readv_do_cbk]
> 0-stack-trace: stack-address: 0x5594f710f020, data-shard returned 512
>
> Krutika, this suggests that shard changed the returned data that was read (I
> do not see the requested size in the log) from 8 bytes to 512. It looks
> rather unexpected to me that more bytes are returned than possibly have been
> requested. Do you have an idea if that can be intentional?

Gah! Too bad we don't log the readv parameters in the wind path in trace.

But that's a good point, Niels. I checked the code in shard, and shard seems to return the requested size in the callback too, irrespective of how many bytes could actually be read from the backend. There have also been two patches since May 2016 that touch shard's readv codepath. Need to see if they are causing any problems.

I'm not sure to what extent tcpdump output will help here, because it won't give the exact details of what offset and length shard received and what it wound further down. If tcpdump doesn't help, we can change the volfile manually to load trace translators below and above shard and then test again.

-Krutika

> Kasturi, it is possible to capture a tcpdump on the hypervisor while
> starting the VM? That should give the exact requests and replies that go
> over the network.
>
> The logs from comment #20 do not have tracing enabled, so we can not see if
> the same behaviour is happening without sharding. Those logs have "remote
> operation failed [No such device or address]" as a reply on lseek. This
> should only be returned in the following case:
>
> ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond the
> end of the file.
>
> With a packet capture, we can see what seek operation is executed, and
> verify the size of the file on the brick(s).
Restoring the needinfo on Kasturi for the tcpdump output.
(In reply to Niels de Vos from comment #23)
> Arbiter bricks are always a little confusing to me. So this patch has been
> sent to get it clarified https://review.gluster.org/18103 (either with an
> implementation or a comment in the sources).

This is now tracked in bz1484722. AFR should prevent SEEK and other FOPs from reaching the arbiter.
(In reply to Krutika Dhananjay from comment #27)
> (In reply to Niels de Vos from comment #21)
..
> > Krutika, this suggests that shard changed the returned data that was read (I
> > do not see the requested size in the log) from 8 bytes to 512. It looks
> > rather unexpected to me that more bytes are returned than possibly have been
> > requested. Do you have an idea if that can be intentional?
>
> Gah! Too bad we don't log the readv parameters in the wind path in trace.
>
> But that's a good point, Niels. I checked code in shard and shard seems to
> return the requested size in the call back too, irrespective of how many
> bytes could actually be read from the backend. And then there have been two
> patches since May 2016 that touch shard's readv codepath.
> Need to see if they are causing any problems.

I do not suspect shard to be the problem here, because it also happens without it. That does not mean there are no potential problems with shard, though; maybe there are two different bugs being hit here.

> I'm not sure to what extent tcpdump output will help here because it won't
> give the exact details of what offset and length shard received and what it
> wound further down.
> If tcpdump doesn't help, we can change the volfile manually to load trace
> translators below and above shard and then test again.

I guess my request was not clear enough. The tcpdump would be most helpful when sharding is not used. Otherwise the requests might get split over different bricks, and things are a little more difficult to debug.
Kasturi provided the tcpdump output for analysis. Based on this, https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13918723 is the build containing the fix for the incorrect shard size being returned; it has been given to Kasturi for testing. Let's see if it fixes the issue. Clearing needinfos.

-Krutika
Kasturi, I checked the logs associated with kasturi_newbuild, and this time the number-of-bytes-read is correctly returned by shard as 8. So there is definitely more to this than shard converting 8 to 512 while unwinding readvs.

-Krutika
This issue doesn't seem to be consistently reproducible - I tried creating a template and starting the VM on your setup and was able to. I was also able to clone the VM from which you created the template, and start it.

For the VM that fails to start from template, when mounting and checking the disk:

[root@dhcp46-21 ~]# xfs_repair -n /dev/sdb1
Phase 1 - find and verify superblock...
error reading superblock 1 -- seek to offset 268435456 failed
couldn't verify primary superblock - attempted to perform I/O beyond EOF !!!

attempting to find secondary superblock...
................................................Sorry, could not find valid secondary superblock
Exiting now.

Not sure how the disk got to this state.
(In reply to Sahina Bose from comment #33)
> For the VM that fails to start from template, when mounting and checking the
> disk:
>
> [root@dhcp46-21 ~]# xfs_repair -n /dev/sdb1
> Phase 1 - find and verify superblock...
> error reading superblock 1 -- seek to offset 268435456 failed
> couldn't verify primary superblock - attempted to perform I/O beyond EOF !!!
>
> attempting to find secondary superblock...
> ................................................Sorry, could not find valid
> secondary superblock
> Exiting now.
>
> Not sure how the disk got to this state.

If you specify which hypervisor this is, and how "xfs_repair" can be run inside the VM, we could run ltrace against the qemu process (needs qemu + glusterfs debuginfo) and see what libgfapi functions are called and what they return. There has been a report (bz1425293) where glfs_seek() returns bogus values, but I have not been able to reproduce that yet. glfs_seek() should return that it is not supported, and QEMU is expected to handle that gracefully.
(In reply to Niels de Vos from comment #35)
> (In reply to Sahina Bose from comment #33)
> > For the VM that fails to start from template, when mounting and checking the
> > disk:
> >
> > [root@dhcp46-21 ~]# xfs_repair -n /dev/sdb1
> > Phase 1 - find and verify superblock...
> > error reading superblock 1 -- seek to offset 268435456 failed
> > couldn't verify primary superblock - attempted to perform I/O beyond EOF !!!
> >
> > attempting to find secondary superblock...
> > ................................................Sorry, could not find valid
> > secondary superblock
> > Exiting now.
> >
> > Not sure how the disk got to this state.
>
> If you specify which hypervisor this is, and how "xfs_repair" can be run
> inside the VM, we could run ltrace against the qemu process (needs qemu +
> glusterfs debuginfo) and see what libgfapi functions are called and what
> they return. There has been a report (bz1425293) where glfs_seek() returns
> bogus values, but I have not been able to reproduce that yet. glfs_seek()
> should return that it is not supported, and QEMU is expected to handle that
> gracefully.

Have sent you the details of the hypervisor in a separate email.
So I loaded the trace translator above the md-cache translator on Kasturi's setup to trace the different file system calls, the offsets and lengths of the IO, etc.

Right after we create "New VM" from the template, the UUID directory under images gets created, and another UUID file (call it UUID2), its .meta and .lease files also get created under this directory, along with hard links to the original template file.

When I checked the file operations on the UUID2 file, it seems that for some time writes are sent on it at meaningful offsets and lengths - like offset=0,size=65536, offset=131072,size=65536, etc. At this point, the total file size is 131072 + 65536 = 196608, a number exactly divisible by 512, O_DIRECT compliant. All good till now.

But after this, there are a bunch of writes that gluster receives that are at rather odd offsets and sizes (like 8 bytes, 12 bytes, etc.) and NOT O_DIRECT compliant, such as the following:

[2017-08-30 09:33:56.320058] I [trace.c:2310:trace_writev] 0-vmstore-md-cache-parent: 3721: gfid=e69cc192-5207-45ee-bfb1-e98414f3ab4a fd=0x7f0734077630, count=1, offset=196608 flags=08002 write_size=8
[2017-08-30 09:33:56.320144] I [trace.c:2310:trace_writev] 0-vmstore-md-cache-parent: 3723: gfid=e69cc192-5207-45ee-bfb1-e98414f3ab4a fd=0x7f0734077630, count=1, offset=36 flags=08002 write_size=12
[2017-08-30 09:33:56.320306] I [trace.c:2310:trace_writev] 0-vmstore-md-cache-parent: 3725: gfid=e69cc192-5207-45ee-bfb1-e98414f3ab4a fd=0x7f0734077630, count=1, offset=24 flags=08002 write_size=8

This doesn't seem normal to me. Denis, could you confirm if these writes are normal?

-Krutika
Hi,

I tried to compare a setup where creating a VM from a template was working with mine. The difference I found was the storage allocation policy. A VM created from a template with storage allocation policy 'thin' and disk format 'qcow2' fails to boot. If the VM being created from the template has storage allocation policy 'clone' and disk format 'raw', it boots fine without any issues. Attaching screenshots for the same.

So I feel that the issue does not lie in libgfapi, as once I change the storage allocation policy the VM boots on a setup where libgfapi is enabled. However, I am not sure why creating a VM from a template with storage allocation policy 'thin' and disk format 'qcow2' does not work.

Thanks
kasturi
Created attachment 1320108 [details] screenshot where storage allocation policy was clone and disk format is qcow2
Created attachment 1320111 [details] screenshot where storage allocation policy was clone and disk format is raw
Created attachment 1320112 [details] screenshot where storage allocation policy was thin and disk format is qcow2
Sahina, I'm not quite sure what info is requested from me here. All RHV does here is orchestrate the call to qemu-img convert that creates the disk (either as a raw or qcow2 volume). Given that, I'm not quite sure what VDSM can do here.

- It could be we're using the wrong command line to create the volume, or the wrong parameters to access it. If so, we should fix it, but we'd need some guidance as to what we should do differently.
- If we're using the correct calls, the bug is somewhere between qemu and gluster.
Hi Allon,

What we're suspecting at the moment is the occurrence of writes, during VM creation from a template, which are not O_DIRECT-aligned and also appear to be really short:

[2017-08-30 09:33:56.320058] I [trace.c:2310:trace_writev] 0-vmstore-md-cache-parent: 3721: gfid=e69cc192-5207-45ee-bfb1-e98414f3ab4a fd=0x7f0734077630, count=1, offset=196608 flags=08002 write_size=8
[2017-08-30 09:33:56.320144] I [trace.c:2310:trace_writev] 0-vmstore-md-cache-parent: 3723: gfid=e69cc192-5207-45ee-bfb1-e98414f3ab4a fd=0x7f0734077630, count=1, offset=36 flags=08002 write_size=12
[2017-08-30 09:33:56.320306] I [trace.c:2310:trace_writev] 0-vmstore-md-cache-parent: 3725: gfid=e69cc192-5207-45ee-bfb1-e98414f3ab4a fd=0x7f0734077630, count=1, offset=24 flags=08002 write_size=8

The first write was of size 8 bytes. The second write was at offset 36 and size 12. The third was at offset 24 and size 8.

At the moment this is the only oddity we could recognise. If you think this is the job of qemu and the right persons to ask are qemu developers, just let us know and we'll reach out to them.

-Krutika
I have tried to create VMs from a Windows template as well, and while booting the Windows VM I see the error "A disk read error occurred". Attaching a screenshot for the same.
Created attachment 1320912 [details] Attaching screenshot while booting windows vm
The issue is happening by virtue of md-cache being enabled. Disabling md-cache causes the vms created from template to boot fine. Cancelling needinfo on Allon.
The bug is in shard.

When VM creation from a template is initiated, RHV creates a hard link of the template under /xxxx../images/$UUID, where UUID is the unique identifier of the VM being created. In glusterfs, the LINK fop is overloaded to also return the attributes of the file in the callback, so that this information can be cached at various layers - md-cache being one of them - and a subsequent stat/lookup can then be served directly off the cache, thereby helping performance.

During LINK in this scenario, the following is what happened:

<snip1>
[2017-09-05 06:41:18.080640] I [trace.c:2012:trace_link] 0-vmstore-parent-mdc: 602: oldgfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 oldpath=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/7807fb2b-420e-4dc7-b477-ba7796faa205/e52e483d-174d-4d95-89b5-41e0e37853c0 --> newgfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 newpath=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/8e8d9148-4867-4d63-b6de-3092da32b1e8/e52e483d-174d-4d95-89b5-41e0e37853c0
[2017-09-05 06:41:18.080654] I [trace.c:2012:trace_link] 0-vmstore-parent-shard: 602: oldgfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 oldpath=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/7807fb2b-420e-4dc7-b477-ba7796faa205/e52e483d-174d-4d95-89b5-41e0e37853c0 --> newgfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 newpath=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/8e8d9148-4867-4d63-b6de-3092da32b1e8/e52e483d-174d-4d95-89b5-41e0e37853c0
[2017-09-05 06:41:18.080664] I [trace.c:2012:trace_link] 0-vmstore-child-shard: 602: oldgfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 oldpath=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/7807fb2b-420e-4dc7-b477-ba7796faa205/e52e483d-174d-4d95-89b5-41e0e37853c0 --> newgfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 newpath=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/8e8d9148-4867-4d63-b6de-3092da32b1e8/e52e483d-174d-4d95-89b5-41e0e37853c0
[2017-09-05 06:41:18.081413] I [trace.c:789:trace_link_cbk] 0-vmstore-child-shard: 602: (op_ret=0, *stbuf =
{gfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 ino=13616751834765032693, mode=100660, nlink=2, uid=36, gid=36, ************size=67108864***********, blocks=129104, atime=2017-09-05-05:59:36 mtime=2017-09-05-05:59:37 ctime=2017-09-05-06:41:18 atime_sec=1504591176, atime_nsec=290615297, mtime_sec=1504591177, mtime_nsec=297623454, ctime_sec=1504593678, ctime_nsec=79878645}, *prebuf = {gfid=68d302a5-7fed-411e-9f81-bb2efcc282e4 ino=11493673534509712100, mode=40755, nlink=2, uid=36, gid=36, size=4096, blocks=8, atime=2017-09-05-06:41:18 mtime=2017-09-05-06:41:18 ctime=2017-09-05-06:41:18 atime_sec=1504593678, atime_nsec=27878000, mtime_sec=1504593678, mtime_nsec=27878000, ctime_sec=1504593678, ctime_nsec=29878241}, *postbuf = {gfid=68d302a5-7fed-411e-9f81-bb2efcc282e4 ino=11493673534509712100, mode=40755, nlink=2, uid=36, gid=36, size=4096, blocks=8, atime=2017-09-05-06:41:18 mtime=2017-09-05-06:41:18 ctime=2017-09-05-06:41:18 atime_sec=1504593678, atime_nsec=27878000, mtime_sec=1504593678, mtime_nsec=79878645, ctime_sec=1504593678, ctime_nsec=79878645}) [2017-09-05 06:41:18.081439] I [trace.c:789:trace_link_cbk] 0-vmstore-parent-shard: 602: (op_ret=0, *stbuf = {gfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 ino=13616751834765032693, mode=100660, nlink=2, uid=36, gid=36, *******size=67108864***********, blocks=129104, atime=2017-09-05-05:59:36 mtime=2017-09-05-05:59:37 ctime=2017-09-05-06:41:18 atime_sec=1504591176, atime_nsec=290615297, mtime_sec=1504591177, mtime_nsec=297623454, ctime_sec=1504593678, ctime_nsec=79878645}, *prebuf = {gfid=68d302a5-7fed-411e-9f81-bb2efcc282e4 ino=11493673534509712100, mode=40755, nlink=2, uid=36, gid=36, size=4096, blocks=8, atime=2017-09-05-06:41:18 mtime=2017-09-05-06:41:18 ctime=2017-09-05-06:41:18 atime_sec=1504593678, atime_nsec=27878000, mtime_sec=1504593678, mtime_nsec=27878000, ctime_sec=1504593678, ctime_nsec=29878241}, *postbuf = {gfid=68d302a5-7fed-411e-9f81-bb2efcc282e4 ino=11493673534509712100, mode=40755, nlink=2, uid=36, gid=36, 
size=4096, blocks=8, atime=2017-09-05-06:41:18 mtime=2017-09-05-06:41:18 ctime=2017-09-05-06:41:18 atime_sec=1504593678, atime_nsec=27878000, mtime_sec=1504593678, mtime_nsec=79878645, ctime_sec=1504593678, ctime_nsec=79878645})
[2017-09-05 06:41:18.081480] I [trace.c:789:trace_link_cbk] 0-vmstore-parent-mdc: 602: (op_ret=0, *stbuf = {gfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 ino=13616751834765032693, mode=100660, nlink=2, uid=36, gid=36, size=67108864, blocks=129104, atime=2017-09-05-05:59:36 mtime=2017-09-05-05:59:37 ctime=2017-09-05-06:41:18 atime_sec=1504591176, atime_nsec=290615297, mtime_sec=1504591177, mtime_nsec=297623454, ctime_sec=1504593678, ctime_nsec=79878645}, *prebuf = {gfid=68d302a5-7fed-411e-9f81-bb2efcc282e4 ino=11493673534509712100, mode=40755, nlink=2, uid=36, gid=36, size=4096, blocks=8, atime=2017-09-05-06:41:18 mtime=2017-09-05-06:41:18 ctime=2017-09-05-06:41:18 atime_sec=1504593678, atime_nsec=27878000, mtime_sec=1504593678, mtime_nsec=27878000, ctime_sec=1504593678, ctime_nsec=29878241}, *postbuf = {gfid=68d302a5-7fed-411e-9f81-bb2efcc282e4 ino=11493673534509712100, mode=40755, nlink=2, uid=36, gid=36, size=4096, blocks=8, atime=2017-09-05-06:41:18 mtime=2017-09-05-06:41:18 ctime=2017-09-05-06:41:18 atime_sec=1504593678, atime_nsec=27878000, mtime_sec=1504593678, mtime_nsec=79878645, ctime_sec=1504593678, ctime_nsec=79878645})
</snip1>

Basically, shard is supposed to return, among other attributes, the aggregated size of the image, not the physical size of just the base shard. While this is correctly handled in all other fops, in LINK it continued to return only the physical size of the base shard, which is nothing but the shard-block-size. So instead of returning 20GB in the iatt, it ended up returning 67108864, which is equal to 64MB. The problem is not so much that it returned the incorrect size.
The real problem starts when md-cache caches this and uses it to serve the immediate stat on the file, which is exactly what happened:

<snip2>
[2017-09-05 06:41:18.088070] I [trace.c:1746:trace_stat] 0-vmstore-parent-mdc: 611: gfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 path=/84d1d265-a58b-4951-a8f3-8bc7669b33ab/images/8e8d9148-4867-4d63-b6de-3092da32b1e8/e52e483d-174d-4d95-89b5-41e0e37853c0
[2017-09-05 06:41:18.088087] I [trace.c:182:trace_stat_cbk] 0-vmstore-parent-mdc: 611: gfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 op_ret=0 buf=gfid=2ba927dc-d4ab-4ca0-bcf8-6ba629d17cf5 ino=13616751834765032693, mode=100660, nlink=2, uid=36, gid=36, ***********size=67108864*************, blocks=129104, atime=2017-09-05-05:59:36 mtime=2017-09-05-05:59:37 ctime=2017-09-05-06:41:18 atime_sec=1504591176, atime_nsec=290615297, mtime_sec=1504591177, mtime_nsec=297623454, ctime_sec=1504593678, ctime_nsec=79878645
</snip2>

The problem is that all the remaining steps towards creating the VM from the template depend on the correctness of the size returned in STAT. And getting the size wrong implies everything else will go wrong.

I fixed this bug in shard, made a brew build and tested it on Kasturi's setup, and VMs booted fine even with stat-prefetch on. I tried this on 3 VMs and all 3 of them ran fine. The build that contains the fix is available here - https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14003344 - in case anyone is interested. The patch itself isn't all that big - 42 lines of code added in shard.

Big thanks to Kasturi for her cooperation and patience in resolving the bug. :)

I'll prettify the patch and add a test case if possible and send it upstream soon.

-Krutika
https://review.gluster.org/18209
I am proposing this bug as a blocker.

Without the fix, one can't create VMs from a thin-provisioned template. It also affects the classic RHV-RHGS use case as well as RHHI 1.1.
(In reply to SATHEESARAN from comment #49)
> I am proposing this bug as a blocker.
>
> Without the fix, one can't create VMs out of thin template.
> It also affects classic RHV-RHGS usecase as well as RHHI 1.1

This bug is denied blocker status for RHGS 3.3.0. The workaround will be documented. Proposing it for 3.3.1.
upstream mainline patch: https://review.gluster.org/#/c/18209/
upstream 3.12 patch: https://review.gluster.org/#/c/18211/
downstream patch: https://code.engineering.redhat.com/gerrit/117426
Looks good.
Verified and works fine with build glusterfs-3.8.4-45.el7rhgs.x86_64. With stat-prefetch on, I see that VMs created from a template boot fine and there is no issue.

[root@rhsqa-grafton1 ~]# gluster volume get vmstore performance.stat-prefetch
Option                                  Value
------                                  -----
performance.stat-prefetch               on
Hi Krutika, I've edited the Doc Text for its associated errata. Please review it and let me know in case of any concerns. If no changes are required, please provide your approval.
Looks good to me.
Minor changes to Doc Text based on peer review.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3276