+++ This bug was initially created as a clone of Bug #1663431 +++

Description of problem:
Get the error "Could not read qcow2 header" when reading a qcow2 file on glusterfs.

Version-Release number of selected component (if applicable):
gluster server: glusterfs-3.12.2-19.el7rhgs.x86_64
client: glusterfs-3.12.2-15.4.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Mount the gluster volume at /mnt:
# mount.glusterfs 10.66.4.119:/gv0 /mnt/

2. Create a new qcow2 file:
# qemu-img create -f qcow2 /mnt/qcow2mnt.img 10M

3. Check it with qemu-img over the gluster:// protocol:
[root@localhost ~]# qemu-img info gluster://10.66.4.119/gv0/qcow2mnt.img
qemu-img: Could not open 'gluster://10.66.4.119/gv0/qcow2mnt.img': Could not read L1 table: Input/output error

Actual results:
As above.

Expected results:
qemu-img reports the correct info for the qcow2 file.

Additional info:
1. A "raw" image works fine in this scenario.
2. `qemu-img info /mnt/qcow2mnt.img` (over the FUSE mount) works well.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2019-01-04 10:08:17 UTC ---

This bug is automatically being proposed for a Z-stream release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
--- Additional comment from on 2019-02-01 10:09:43 UTC ---

Hit the same issue:

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# rpm -qa | grep gluster
libvirt-daemon-driver-storage-gluster-4.5.0-19.module+el8+2712+4c318da1.x86_64
glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
qemu-kvm-block-gluster-2.12.0-59.module+el8+2714+6d9351dd.x86_64
glusterfs-cli-3.12.2-32.1.el8.x86_64
glusterfs-libs-3.12.2-32.1.el8.x86_64
glusterfs-api-3.12.2-32.1.el8.x86_64
glusterfs-3.12.2-32.1.el8.x86_64
glusterfs-fuse-3.12.2-32.1.el8.x86_64

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# qemu-img info gluster://10.66.7.98/gluster-vol1/aaa.qcow2
qemu-img: Could not open 'gluster://10.66.7.98/gluster-vol1/aaa.qcow2': Could not read L1 table: Input/output error

This also blocks a VM from using the gluster disk, so escalating the priority:

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# cat gdisk
<disk device="disk" type="network"><driver cache="none" name="qemu" type="qcow2" /><target bus="virtio" dev="vdb" /><source name="gluster-vol1/aaa.qcow2" protocol="gluster"><host name="10.66.7.98" port="24007" /></source></disk>

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# virsh attach-device avocado-vt-vm1 gdisk
error: Failed to attach device from gdisk
error: internal error: unable to execute QEMU command 'device_add': Property 'virtio-blk-device.drive' can't find value 'drive-virtio-disk1'

--- Additional comment from Amar Tumballi on 2019-03-13 13:38:36 UTC ---

Moving the bug to Krutika as she is more experienced with Virt workloads. Meanwhile, looking at the glusterfs version, these are RHGS 3.4 builds.

--- Additional comment from Krutika Dhananjay on 2019-03-14 06:49:47 UTC ---

Could you share the following two pieces of information -

1. output of `gluster volume info $VOLNAME`
2. Are the glusterfs client and server running the same version of gluster/RHGS?
-Krutika

--- Additional comment from Krutika Dhananjay on 2019-03-14 06:53:36 UTC ---

(In reply to Krutika Dhananjay from comment #4)
> Could you share the following two pieces of information -
>
> 1. output of `gluster volume info $VOLNAME`
> 2. Are the glusterfs client and server running the same version of
> gluster/RHGS?

Let me clarify why I'm asking about the versions. The bug's "Description" section says:

gluster server: glusterfs-3.12.2-19.el7rhgs.x86_64
client: glusterfs-3.12.2-15.4.el8.x86_64

but comment 2 lists the client package as glusterfs-client-xlators-3.12.2-32.1.el8.x86_64.

I want to be sure about the exact versions being used so I can recreate it. (Looked at the logs; not much of a clue there.)

-Krutika

--- Additional comment from gaojianan on 2019-03-15 01:51:43 UTC ---

(In reply to Krutika Dhananjay from comment #4)
> Could you share the following two pieces of information -
>
> 1. output of `gluster volume info $VOLNAME`
> 2. Are the glusterfs client and server running the same version of
> gluster/RHGS?
>
> -Krutika

1. `gluster volume info $VOLNAME`:

[root@node1 ~]# gluster volume info gv1

Volume Name: gv1
Type: Distribute
Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.4.119:/br2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

2. Server version:

[root@node1 ~]# rpm -qa | grep gluster
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-4.el7.x86_64
glusterfs-3.12.2-19.el7rhgs.x86_64
python2-gluster-3.12.2-19.el7rhgs.x86_64
glusterfs-server-3.12.2-19.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
glusterfs-api-3.12.2-19.el7rhgs.x86_64
glusterfs-devel-3.12.2-19.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-18.el7.x86_64
glusterfs-libs-3.12.2-19.el7rhgs.x86_64
glusterfs-cli-3.12.2-19.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
glusterfs-events-3.12.2-19.el7rhgs.x86_64
samba-vfs-glusterfs-4.8.3-4.el7.x86_64

Client version:

[root@nssguest ~]# rpm -qa | grep gluster
qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
glusterfs-3.12.2-32.1.el8.x86_64
glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
glusterfs-libs-3.12.2-32.1.el8.x86_64
glusterfs-cli-3.12.2-32.1.el8.x86_64
glusterfs-api-3.12.2-32.1.el8.x86_64

--- Additional comment from Krutika Dhananjay on 2019-03-18 05:59:49 UTC ---

(In reply to gaojianan from comment #6)
> [volume info and server/client package versions quoted above]

Thanks.

I tried the same set of steps with the same versions of gluster client and server, and the test works for me every time.
Are you hitting this issue even with a FUSE mount, i.e., when you run `qemu-img info` this way - `qemu-img info $FUSE_MOUNT_PATH/aaa.qcow2`?

If yes, could you run both the `qemu-img create` and `qemu-img info` commands under strace for a fresh file:

# strace -ff -T -v -o /tmp/qemu-img-create.out qemu-img create -f qcow2 $IMAGE_PATH 10M
# strace -ff -T -v -o /tmp/qemu-img-info.out qemu-img info $IMAGE_PATH_OVER_FUSE_MOUNT

and share all of the resultant output files matching qemu-img-create.out* and qemu-img-info.out*?

-Krutika

--- Additional comment from gaojianan on 2019-03-18 07:07:51 UTC ---

--- Additional comment from gaojianan on 2019-03-18 07:10:05 UTC ---

(In reply to Krutika Dhananjay from comment #7)
> (In reply to gaojianan from comment #6)
> > (In reply to Krutika Dhananjay from comment #4)
> > > Could you share the following two pieces of information -
> > >
> > > 1. output of `gluster volume info $VOLNAME`
> > > 2. Are the glusterfs client and server running the same version of
> > > gluster/RHGS?
> > >
> > > -Krutika
>
> > [gaojianan's volume info and package versions, quoted in full above]
>
> Thanks.
>
> I tried the same set of steps with the same versions of gluster client and
> server and the test works for me every time.
> Perhaps the ONLY difference between your configuration and mine is that my
> gluster-client is also on rhel7 unlike yours where you're running rhel8 on
> the client machine. Also the qemu-img versions could be different.
>
> Are you hitting this issue even with fuse mount, i.e., when you run
> `qemu-img info` this way - `qemu-img info $FUSE_MOUNT_PATH/aaa.qcow2`?
>
> If yes, could you run both `qemu-img create` and `qemu-img info` commands
> with strace for a fresh file:
>
> # strace -ff -T -v -o /tmp/qemu-img-create.out qemu-img create -f qcow2
> $IMAGE_PATH 10M
> # strace -ff -T -v -o /tmp/qemu-img-info.out info $IMAGE_PATH_OVER_FUSE_MOUNT
>
> and share all of the resultant output files having format
> qemu-img-create.out* and qemu-img-info.out*?
>
> -Krutika

I think this bug only happens when we create a file on the mounted path and check it with `qemu-img info gluster://$ip/filename`; `qemu-img info $FUSE_MOUNT_PATH/filename` works well.

--- Additional comment from Krutika Dhananjay on 2019-03-20 05:20:47 UTC ---

OK, I took a look at the traces. Unfortunately, in the libgfapi-access case we need ltrace output instead of strace, since all the calls are made in userspace.

I did test the ltrace command before sharing it with you just to be sure it works, but I see that the arguments to the library calls are not printed as symbols.

Since you're seeing this issue only with gfapi, I'm passing it over to the gfapi experts for a faster resolution.

Poornima/Soumya/Jiffin,

Could one of you help?

-Krutika

--- Additional comment from Soumya Koduri on 2019-03-20 17:38:10 UTC ---

To start with, getting the logs exclusive to gfapi access, and a tcpdump taken while the below command is run, would be helpful -

qemu-img info gluster://$ip/filename

--- Additional comment from Krutika Dhananjay on 2019-03-21 05:45:15 UTC ---

Setting needinfo on the reporter to get the info requested in comment 11.
--- Additional comment from gaojianan on 2019-03-22 06:55:49 UTC ---

--- Additional comment from Yaniv Kaul on 2019-04-22 07:19:24 UTC ---

Status?

--- Additional comment from PnT Account Manager on 2019-11-04 22:30:24 UTC ---

Employee 'pgurusid' has left the company.

--- Additional comment from Mohit Agrawal on 2019-11-19 13:52:39 UTC ---

@Soumya

Did you get a chance to analyze the logs and tcpdump?

Thanks,
Mohit Agrawal

--- Additional comment from Soumya Koduri on 2019-11-20 18:01:32 UTC ---

(In reply to Mohit Agrawal from comment #16)
> @Soumya
>
> Did you get a chance to analyze the logs and tcpdump?
>
> Thanks,
> Mohit Agrawal

Hi,

I just looked at the files uploaded. The tcpdump doesn't have any gluster traffic captured. Please ensure the command was issued on the right machine (the one where the qemu-img command is being executed) and verify the filters (the right interface, IP, etc.).

From the logs, I see there is a failure for the SEEK() fop -

[2019-03-22 06:47:34.557047] T [MSGID: 0] [dht-hashfn.c:94:dht_hash_compute] 0-gv1-dht: trying regex for test.img
[2019-03-22 06:47:34.557059] D [MSGID: 0] [dht-common.c:3675:dht_lookup] 0-gv1-dht: Calling fresh lookup for /test.img on gv1-client-0
[2019-03-22 06:47:34.557067] T [MSGID: 0] [dht-common.c:3679:dht_lookup] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-dht to gv1-client-0
[2019-03-22 06:47:34.557079] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gv1-client-0: Auth Info: pid: 10233, uid: 0, gid: 0, owner:
[2019-03-22 06:47:34.557086] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 420, payload: 348, rpc hdr: 72
[2019-03-22 06:47:34.557110] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0xb Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.557513] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gv1-client-0: received rpc message (RPC XID: 0xb Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.557536] T [MSGID: 0] [client-rpc-fops.c:2873:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-client-0 returned 0
[2019-03-22 06:47:34.557549] D [MSGID: 0] [dht-common.c:3228:dht_lookup_cbk] 0-gv1-dht: fresh_lookup returned for /test.img with op_ret 0

>> LOOKUP on /test.img was successful

[2019-03-22 06:47:34.563416] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-read-ahead to gv1-write-behind
[2019-03-22 06:47:34.563424] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-write-behind to gv1-dht
[2019-03-22 06:47:34.563432] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-dht to gv1-client-0
[2019-03-22 06:47:34.563443] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gv1-client-0: Auth Info: pid: 10233, uid: 0, gid: 0, owner:
[2019-03-22 06:47:34.563451] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 112, payload: 40, rpc hdr: 72
[2019-03-22 06:47:34.563478] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0xc Program: GlusterFS 3.3, ProgVers: 330, Proc: 48) to rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.563990] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gv1-client-0: received rpc message (RPC XID: 0xc Program: GlusterFS 3.3, ProgVers: 330, Proc: 48) from rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.564008] W [MSGID: 114031] [client-rpc-fops.c:2156:client3_3_seek_cbk] 0-gv1-client-0: remote operation failed [No such device or address]
[2019-03-22 06:47:34.564028] D [MSGID: 0] [client-rpc-fops.c:2160:client3_3_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-client-0 returned -1 error: No such device or address [No such device or address]
[2019-03-22 06:47:34.564041] D [MSGID: 0] [defaults.c:1531:default_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-io-threads returned -1 error: No such device or address [No such device or address]
[2019-03-22 06:47:34.564051] D [MSGID: 0] [io-stats.c:2548:io_stats_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1 returned -1 error: No such device or address [No such device or address]

client3_3_seek_cbk() received '-1'. We may first need to check why the fop failed on the server. If it's reproducible, it should be fairly easy to check.

--- Additional comment from Mohit Agrawal on 2019-11-21 02:53:42 UTC ---

@gaojianan
Can you share the data Soumya asked for, along with the brick logs (plus the client logs and tcpdump)?

--- Additional comment from gaojianan on 2019-11-21 06:55:31 UTC ---

(In reply to Mohit Agrawal from comment #18)
> @gaojianan
> Can you share the data asked by Soumya and share the brick logs along with
> data(client-logs and tcpdump)?

Client version:
glusterfs-client-xlators-6.0-20.el8.x86_64
glusterfs-libs-6.0-20.el8.x86_64
qemu-kvm-block-gluster-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64
glusterfs-fuse-6.0-20.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.6.0-7.module+el8.1.1+4483+2f45aaa2.x86_64
glusterfs-api-6.0-20.el8.x86_64
glusterfs-cli-6.0-20.el8.x86_64
glusterfs-6.0-20.el8.x86_64

Tried again with the steps from comment 1.

Steps to Reproduce:
1. Mount the gluster volume at /tmp/gluster:
# mount.glusterfs 10.66.85.243:/jgao-vol1 /tmp/gluster

2. Create a new qcow2 file:
# qemu-img create -f qcow2 /tmp/gluster/test.img 100M

3. Check it with qemu-img over the gluster:// protocol:
[root@localhost ~]# qemu-img info gluster://10.66.85.243/jgao-vol1/test.img
qemu-img: Could not open 'gluster://10.66.85.243/jgao-vol1/test.img': Could not read L1 table: Input/output error

More detailed info is in the attachment. If there are any other questions, you can needinfo me again.
--- Additional comment from Han Han on 2019-11-21 07:11:05 UTC ---

(In reply to gaojianan from comment #19)
> Created attachment 1638327 [details]
> tcpdump log and gfapi log of the client

The tcpdump file contains too much other protocol data. It would be better to use a filter to capture only glusterfs-related network traffic.

BTW, I have a question: what ports are used by glusterfs by default for gluster-server-6.0.x? 24007-24009? 49152?

> [client versions and reproduction steps quoted above]

--- Additional comment from Han Han on 2019-11-21 07:14:37 UTC ---

What's more, please upload the brick logs as comment 18 asked. That log is located in /var/log/glusterfs/bricks/ on the glusterfs server.

--- Additional comment from gaojianan on 2019-11-21 07:57:53 UTC ---

In the brick log, "gluster-vol1" is the same as "jgao-vol1" in the other two files, because I destroyed my env and set it up again.
--- Additional comment from Mohit Agrawal on 2019-11-22 04:45:08 UTC ---

@Soumya
Could you check the latest logs and tcpdump?

Thanks,
Mohit Agrawal

--- Additional comment from Soumya Koduri on 2019-11-22 06:57:08 UTC ---

From the latest debug.log provided, I see this error -

[2019-11-21 06:34:15.127610] D [MSGID: 0] [client-helpers.c:427:client_get_remote_fd] 0-jgao-vol1-client-0: not a valid fd for gfid: 59ca8bf2-f75a-427f-857e-98843a85dbac [Bad file descriptor]
[2019-11-21 06:34:15.127620] W [MSGID: 114061] [client-common.c:1288:client_pre_seek] 0-jgao-vol1-client-0: (59ca8bf2-f75a-427f-857e-98843a85dbac) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-11-21 06:34:15.127628] D [MSGID: 0] [client-rpc-fops.c:5949:client3_3_seek] 0-stack-trace: stack-address: 0x5625eed41b08, jgao-vol1-client-0 returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2019-11-21 06:34:15.127636] D [MSGID: 0] [defaults.c:1617:default_seek_cbk] 0-stack-trace: stack-address: 0x5625eed41b08, jgao-vol1-io-threads returned -1 error: File descriptor in bad state [File descriptor in bad state]

The client3_3_seek fop got an EBADFD error: the fd used in the fop may have been flushed and is no longer valid. On further code reading, I found what looks like a bug in the glfs_seek() fop: there is a missing ref on the glfd, which may have led to this issue. I will send a patch to fix that.

However, I am unable to reproduce this issue to test it. On my system the test always passes -

[root@dhcp35-198 ~]# qemu-img create -f qcow2 /fuse-mnt/test.img 100M
Formatting '/fuse-mnt/test.img', fmt=qcow2 size=104857600 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@dhcp35-198 ~]#
[root@dhcp35-198 ~]# qemu-img info gluster://localhost/rep_vol/test.img
[2019-11-22 06:36:43.703941] E [MSGID: 108006] [afr-common.c:5322:__afr_handle_child_down_event] 0-rep_vol-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-11-22 06:36:43.705035] I [io-stats.c:4027:fini] 0-rep_vol: io-stats translator unloaded
image: gluster://localhost/rep_vol/test.img
file format: qcow2
virtual size: 100M (104857600 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
[root@dhcp35-198 ~]#

I am using the latest master branch of gluster. I shall post the fix for the bug in glfs_seek() mentioned above, but if someone could test it, that would be helpful.
REVIEW: https://review.gluster.org/23739 (gfapi/seek: Fix missing ref on glfd) posted (#1) for review on master by soumya k
As Niels pointed out, glfs_seek() is not a public API and is called only from glfs_lseek(), which already takes a ref on the glfd. This is not a bug.