Bug 1775512 - "Could not read qcow2 header" errors when reading a qcow2 file with glusterfs
Summary: "Could not read qcow2 header" errors when reading a qcow2 file with glusterfs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1663431
 
Reported: 2019-11-22 06:58 UTC by Soumya Koduri
Modified: 2019-11-25 06:24 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1663431
Environment:
Last Closed: 2019-11-25 06:24:54 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links:
Gluster.org Gerrit 23739 (Abandoned): gfapi/seek: Fix missing ref on glfd (last updated 2019-11-25 06:23:44 UTC)

Description Soumya Koduri 2019-11-22 06:58:12 UTC
+++ This bug was initially created as a clone of Bug #1663431 +++

Description of problem:
"Could not read qcow2 header" errors when reading a qcow2 file in glusterfs

Version-Release number of selected component (if applicable):
gluster server: glusterfs-3.12.2-19.el7rhgs.x86_64
client: glusterfs-3.12.2-15.4.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Mount the gluster volume on local /mnt
# mount.glusterfs 10.66.4.119:/gv0  /mnt/

2. Create a new qcow2 file
# qemu-img create -f qcow2 /mnt/qcow2mnt.img 10M

3. Check it with qemu-img over the gluster:// protocol
[root@localhost ~]# qemu-img info gluster://10.66.4.119/gv0/qcow2mnt.img
qemu-img: Could not open 'gluster://10.66.4.119/gv0/qcow2mnt.img': Could not read L1 table: Input/output error

Actual results:
As above

Expected results:
The correct info of the qcow2 file can be read.

Additional info:
1. A "raw" image is OK in this scenario.
2. qemu-img info /mnt/qcow2mnt.img (over the FUSE mount) works well.
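
For anyone who wants to exercise the failing libgfapi path without going through qemu-img, here is a minimal sketch (not part of the original report). It opens the image over gfapi and issues a SEEK_DATA, i.e. the seek fop that shows up failing in the client logs later in this bug. The volume name, host, and file name are taken from the reproduction steps above; the program name and log path are illustrative.

/* gfapi-seek.c - minimal sketch; build e.g. with:
 *   gcc -o gfapi-seek gfapi-seek.c $(pkg-config --cflags --libs glusterfs-api)
 */
#define _GNU_SOURCE          /* for SEEK_DATA */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("gv0");
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "10.66.4.119", 24007);
    glfs_set_logging(fs, "/tmp/gfapi-seek.log", 7);    /* 7 = TRACE */

    if (glfs_init(fs) != 0) {
        fprintf(stderr, "glfs_init: %s\n", strerror(errno));
        return 1;
    }

    glfs_fd_t *fd = glfs_open(fs, "/qcow2mnt.img", O_RDONLY);
    if (!fd) {
        fprintf(stderr, "glfs_open: %s\n", strerror(errno));
        glfs_fini(fs);
        return 1;
    }

    /* Probe for the first data region, like the qemu gluster driver does. */
    off_t off = glfs_lseek(fd, 0, SEEK_DATA);
    if (off < 0)
        fprintf(stderr, "glfs_lseek(SEEK_DATA): %s\n", strerror(errno));
    else
        printf("first data offset: %lld\n", (long long)off);

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}

The resulting /tmp/gfapi-seek.log then contains only the gfapi-side trace for this one open/seek, which is easier to read than a full qemu-img run.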

--- Additional comment from Red Hat Bugzilla Rules Engine on 2019-01-04 10:08:17 UTC ---

This bug is automatically being proposed for a Z-stream release of Red Hat Gluster Storage 3 under active development and open for bug fixes, by setting the release flag 'rhgs-3.4.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from  on 2019-02-01 10:09:43 UTC ---

hit the same issue:
(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# rpm -qa | grep gluster
libvirt-daemon-driver-storage-gluster-4.5.0-19.module+el8+2712+4c318da1.x86_64
glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
qemu-kvm-block-gluster-2.12.0-59.module+el8+2714+6d9351dd.x86_64
glusterfs-cli-3.12.2-32.1.el8.x86_64
glusterfs-libs-3.12.2-32.1.el8.x86_64
glusterfs-api-3.12.2-32.1.el8.x86_64
glusterfs-3.12.2-32.1.el8.x86_64
glusterfs-fuse-3.12.2-32.1.el8.x86_64

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# qemu-img info gluster://10.66.7.98/gluster-vol1/aaa.qcow2
qemu-img: Could not open 'gluster://10.66.7.98/gluster-vol1/aaa.qcow2': Could not read L1 table: Input/output error

And this will block a VM from using the gluster disk, so escalating the priority.
(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# cat gdisk
<disk device="disk" type="network"><driver cache="none" name="qemu" type="qcow2" /><target bus="virtio" dev="vdb" /><source name="gluster-vol1/aaa.qcow2" protocol="gluster"><host name="10.66.7.98" port="24007" /></source></disk>

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# virsh attach-device avocado-vt-vm1 gdisk
error: Failed to attach device from gdisk
error: internal error: unable to execute QEMU command 'device_add': Property 'virtio-blk-device.drive' can't find value 'drive-virtio-disk1'

--- Additional comment from Amar Tumballi on 2019-03-13 13:38:36 UTC ---

Moving bug to Krutika as she is more experienced in Virt workloads. 

Meanwhile, looking at the glusterfs version, these are RHGS 3.4 builds.

--- Additional comment from Krutika Dhananjay on 2019-03-14 06:49:47 UTC ---

Could you share the following two pieces of information -

1. output of `gluster volume info $VOLNAME`
2. Are the glusterfs client and server running the same version of gluster/RHGS?

-Krutika

--- Additional comment from Krutika Dhananjay on 2019-03-14 06:53:36 UTC ---

(In reply to Krutika Dhananjay from comment #4)
> Could you share the following two pieces of information -
> 
> 1. output of `gluster volume info $VOLNAME`
> 2. Are the glusterfs client and server running the same version of
> gluster/RHGS?

Let me clarify why I'm asking about the versions - the bug's "Description" section says this -
gluster server: glusterfs-3.12.2-19.el7rhgs.x86_64
client: glusterfs-3.12.2-15.4.el8.x86_64

but comment 2 lists the client package as glusterfs-client-xlators-3.12.2-32.1.el8.x86_64

Want to be sure about the exact versions being used so I can recreate it.
(Looked at the logs, not much clue there)

-Krutika

> 
> -Krutika

--- Additional comment from gaojianan on 2019-03-15 01:51:43 UTC ---

(In reply to Krutika Dhananjay from comment #4)
> Could you share the following two pieces of information -
> 
> 1. output of `gluster volume info $VOLNAME`
> 2. Are the glusterfs client and server running the same version of
> gluster/RHGS?
> 
> -Krutika

1.`gluster volume info $VOLNAME`
[root@node1 ~]# gluster volume info gv1
 
Volume Name: gv1
Type: Distribute
Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.4.119:/br2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet


2.Server version:
[root@node1 ~]# rpm -qa |grep gluster
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-4.el7.x86_64
glusterfs-3.12.2-19.el7rhgs.x86_64
python2-gluster-3.12.2-19.el7rhgs.x86_64
glusterfs-server-3.12.2-19.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
glusterfs-api-3.12.2-19.el7rhgs.x86_64
glusterfs-devel-3.12.2-19.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-18.el7.x86_64
glusterfs-libs-3.12.2-19.el7rhgs.x86_64
glusterfs-cli-3.12.2-19.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
glusterfs-events-3.12.2-19.el7rhgs.x86_64
samba-vfs-glusterfs-4.8.3-4.el7.x86_64

Client version:
[root@nssguest ~]# rpm -qa |grep gluster
qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
glusterfs-3.12.2-32.1.el8.x86_64
glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
glusterfs-libs-3.12.2-32.1.el8.x86_64
glusterfs-cli-3.12.2-32.1.el8.x86_64
glusterfs-api-3.12.2-32.1.el8.x86_64

--- Additional comment from Krutika Dhananjay on 2019-03-18 05:59:49 UTC ---

(In reply to gaojianan from comment #6)
> (In reply to Krutika Dhananjay from comment #4)
> > Could you share the following two pieces of information -
> > 
> > 1. output of `gluster volume info $VOLNAME`
> > 2. Are the glusterfs client and server running the same version of
> > gluster/RHGS?
> > 
> > -Krutika
> 
> 1.`gluster volume info $VOLNAME`
> [root@node1 ~]# gluster volume info gv1
>  
> Volume Name: gv1
> Type: Distribute
> Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.66.4.119:/br2
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> 
> 
> 2.Server version:
> [root@node1 ~]# rpm -qa |grep gluster
> libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
> glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
> pcp-pmda-gluster-4.1.0-4.el7.x86_64
> glusterfs-3.12.2-19.el7rhgs.x86_64
> python2-gluster-3.12.2-19.el7rhgs.x86_64
> glusterfs-server-3.12.2-19.el7rhgs.x86_64
> glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
> glusterfs-api-3.12.2-19.el7rhgs.x86_64
> glusterfs-devel-3.12.2-19.el7rhgs.x86_64
> glusterfs-debuginfo-3.12.2-18.el7.x86_64
> glusterfs-libs-3.12.2-19.el7rhgs.x86_64
> glusterfs-cli-3.12.2-19.el7rhgs.x86_64
> glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
> glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
> glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
> glusterfs-events-3.12.2-19.el7rhgs.x86_64
> samba-vfs-glusterfs-4.8.3-4.el7.x86_64
> 
> Client version:
> [root@nssguest ~]# rpm -qa |grep gluster
> qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
> glusterfs-3.12.2-32.1.el8.x86_64
> glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
> libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
> glusterfs-libs-3.12.2-32.1.el8.x86_64
> glusterfs-cli-3.12.2-32.1.el8.x86_64
> glusterfs-api-3.12.2-32.1.el8.x86_64

Thanks.

I tried the same set of steps with the same versions of gluster client and server, and the test works for me every time.
Perhaps the ONLY difference between your configuration and mine is that my gluster client is also on rhel7, unlike yours where you're running rhel8 on the client machine. Also, the qemu-img versions could be different.

Are you hitting this issue even with fuse mount, i.e., when you run `qemu-img info` this way - `qemu-img info $FUSE_MOUNT_PATH/aaa.qcow2`?

If yes, could you run both `qemu-img create` and `qemu-img info` commands with strace for a fresh file:

# strace -ff -T -v -o /tmp/qemu-img-create.out qemu-img create -f qcow2 $IMAGE_PATH 10M
# strace -ff -T -v -o /tmp/qemu-img-info.out qemu-img info $IMAGE_PATH_OVER_FUSE_MOUNT


and share all of the resulting output files matching qemu-img-create.out* and qemu-img-info.out*?

-Krutika

--- Additional comment from gaojianan on 2019-03-18 07:07:51 UTC ---



--- Additional comment from gaojianan on 2019-03-18 07:10:05 UTC ---

(In reply to Krutika Dhananjay from comment #7)
> (In reply to gaojianan from comment #6)
> > (In reply to Krutika Dhananjay from comment #4)
> > > Could you share the following two pieces of information -
> > > 
> > > 1. output of `gluster volume info $VOLNAME`
> > > 2. Are the glusterfs client and server running the same version of
> > > gluster/RHGS?
> > > 
> > > -Krutika
> > 
> > 1.`gluster volume info $VOLNAME`
> > [root@node1 ~]# gluster volume info gv1
> >  
> > Volume Name: gv1
> > Type: Distribute
> > Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1
> > Transport-type: tcp
> > Bricks:
> > Brick1: 10.66.4.119:/br2
> > Options Reconfigured:
> > nfs.disable: on
> > transport.address-family: inet
> > 
> > 
> > 2.Server version:
> > [root@node1 ~]# rpm -qa |grep gluster
> > libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
> > glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
> > pcp-pmda-gluster-4.1.0-4.el7.x86_64
> > glusterfs-3.12.2-19.el7rhgs.x86_64
> > python2-gluster-3.12.2-19.el7rhgs.x86_64
> > glusterfs-server-3.12.2-19.el7rhgs.x86_64
> > glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
> > glusterfs-api-3.12.2-19.el7rhgs.x86_64
> > glusterfs-devel-3.12.2-19.el7rhgs.x86_64
> > glusterfs-debuginfo-3.12.2-18.el7.x86_64
> > glusterfs-libs-3.12.2-19.el7rhgs.x86_64
> > glusterfs-cli-3.12.2-19.el7rhgs.x86_64
> > glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
> > glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
> > glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
> > glusterfs-events-3.12.2-19.el7rhgs.x86_64
> > samba-vfs-glusterfs-4.8.3-4.el7.x86_64
> > 
> > Client version:
> > [root@nssguest ~]# rpm -qa |grep gluster
> > qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
> > glusterfs-3.12.2-32.1.el8.x86_64
> > glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
> > libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
> > glusterfs-libs-3.12.2-32.1.el8.x86_64
> > glusterfs-cli-3.12.2-32.1.el8.x86_64
> > glusterfs-api-3.12.2-32.1.el8.x86_64
> 
> Thanks.
> 
> I tried the same set of steps with the same versions of gluster client and
> server and the test works for me everytime.
> Perhaps the ONLY difference between your configuration and mine is that my
> gluster-client is also on rhel7 unlike yours where you're running rhel8 on
> the client machine. Also the qemu-img versions could be different.
> 
> Are you hitting this issue even with fuse mount, i.e., when you run
> `qemu-img info` this way - `qemu-img info $FUSE_MOUNT_PATH/aaa.qcow2`?
> 
> If yes, could you run both `qemu-img create` and `qemu-img info` commands
> with strace for a fresh file:
> 
> # strace -ff -T -v -o /tmp/qemu-img-create.out qemu-img create -f qcow2
> $IMAGE_PATH 10M
> # strace -ff -T -v -o /tmp/qemu-img-info.out info $IMAGE_PATH_OVER_FUSE_MOUNT
> 
> 
> and share all of the resultant output files having format
> qemu-img-create.out* and qemu-img-info.out*?
> 
> -Krutika

I think this bug only happens when we create a file on the mounted path and check it with `qemu-img info gluster://$ip/filename`; `qemu-img info $FUSE_MOUNT_PATH/filename` works well.

--- Additional comment from Krutika Dhananjay on 2019-03-20 05:20:47 UTC ---

OK, I took a look at the traces. Unfortunately, in the libgfapi-access case we need ltrace output instead of strace, since all the calls are made in userspace.
I did test the ltrace command before sharing it with you, just to be sure it works. But I see that the arguments to the library calls are not printed as symbols.

Since you're seeing this issue only with gfapi, I'm passing this issue over to gfapi experts for a faster resolution.

Poornima/Soumya/Jiffin,

Could one of you help?

-Krutika
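
For reference, a hedged example of the kind of ltrace invocation meant here (the flag choices are an assumption, not from the thread; the output path and the URI placeholders follow the style used elsewhere in this bug):

# ltrace -f -tt -S -o /tmp/qemu-img-info.ltrace qemu-img info gluster://$ip/$volname/$filename

-S records the system calls interleaved with the library calls, which helps correlate the gfapi calls with the network activity seen in tcpdump.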

--- Additional comment from Soumya Koduri on 2019-03-20 17:38:10 UTC ---

To start with, getting the logs exclusive to gfapi access, and a tcpdump taken while the below command is run, would be helpful -

qemu-img info gluster://$ip/filename
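
A hedged sketch of how that data could be captured (the port values are assumptions based on gluster defaults - glusterd on 24007/tcp and brick processes from 49152 upward; the exact brick port can be confirmed with 'gluster volume status'):

# tcpdump -i any -s 0 -w /tmp/gluster-seek.pcap "host $ip and (tcp port 24007 or tcp portrange 49152-49251)"

To make the client-side (gfapi) log verbose for just this test, the diagnostics.client-log-level volume option can be raised before running qemu-img:

# gluster volume set $VOLNAME diagnostics.client-log-level TRACE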

--- Additional comment from Krutika Dhananjay on 2019-03-21 05:45:15 UTC ---

Setting needinfo on the reporter to get the info requested in comment 11.

--- Additional comment from gaojianan on 2019-03-22 06:55:49 UTC ---



--- Additional comment from Yaniv Kaul on 2019-04-22 07:19:24 UTC ---

Status?

--- Additional comment from PnT Account Manager on 2019-11-04 22:30:24 UTC ---

Employee 'pgurusid' has left the company.

--- Additional comment from Mohit Agrawal on 2019-11-19 13:52:39 UTC ---

@Soumya

Did you get a chance to analyze the logs and tcpdump?

Thanks,
Mohit Agrawal

--- Additional comment from Soumya Koduri on 2019-11-20 18:01:32 UTC ---

(In reply to Mohit Agrawal from comment #16)
> @Soumya
> 
> Did you get a chance to analyze the logs and tcpdump?
> 
> Thanks,
> Mohit Agrawal

Hi,

I just looked at the files uploaded. The tcpdump doesn't have any gluster traffic captured. Please ensure the command was issued on the right machine (the one where qemu-img is being executed) and verify the filters (right interface, IP, etc.).

From the logs, I see there is a failure for SEEK() fop -



[2019-03-22 06:47:34.557047] T [MSGID: 0] [dht-hashfn.c:94:dht_hash_compute] 0-gv1-dht: trying regex for test.img
[2019-03-22 06:47:34.557059] D [MSGID: 0] [dht-common.c:3675:dht_lookup] 0-gv1-dht: Calling fresh lookup for /test.img on gv1-client-0
[2019-03-22 06:47:34.557067] T [MSGID: 0] [dht-common.c:3679:dht_lookup] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-dht to gv1-client-0
[2019-03-22 06:47:34.557079] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gv1-client-0: Auth Info: pid: 10233, uid: 0, gid: 0, owner: 
[2019-03-22 06:47:34.557086] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 420, payload: 348, rpc hdr: 72
[2019-03-22 06:47:34.557110] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0xb Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.557513] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gv1-client-0: received rpc message (RPC XID: 0xb Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.557536] T [MSGID: 0] [client-rpc-fops.c:2873:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-client-0 returned 0
[2019-03-22 06:47:34.557549] D [MSGID: 0] [dht-common.c:3228:dht_lookup_cbk] 0-gv1-dht: fresh_lookup returned for /test.img with op_ret 0


>> LOOKUP on  /test.img was successful



[2019-03-22 06:47:34.563416] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-read-ahead to gv1-write-behind
[2019-03-22 06:47:34.563424] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-write-behind to gv1-dht
[2019-03-22 06:47:34.563432] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-dht to gv1-client-0
[2019-03-22 06:47:34.563443] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gv1-client-0: Auth Info: pid: 10233, uid: 0, gid: 0, owner: 
[2019-03-22 06:47:34.563451] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 112, payload: 40, rpc hdr: 72
[2019-03-22 06:47:34.563478] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0xc Program: GlusterFS 3.3, ProgVers: 330, Proc: 48) to rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.563990] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gv1-client-0: received rpc message (RPC XID: 0xc Program: GlusterFS 3.3, ProgVers: 330, Proc: 48) from rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.564008] W [MSGID: 114031] [client-rpc-fops.c:2156:client3_3_seek_cbk] 0-gv1-client-0: remote operation failed [No such device or address]
[2019-03-22 06:47:34.564028] D [MSGID: 0] [client-rpc-fops.c:2160:client3_3_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-client-0 returned -1 error: No such device or address [No such device or address]
[2019-03-22 06:47:34.564041] D [MSGID: 0] [defaults.c:1531:default_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-io-threads returned -1 error: No such device or address [No such device or address]
[2019-03-22 06:47:34.564051] D [MSGID: 0] [io-stats.c:2548:io_stats_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1 returned -1 error: No such device or address [No such device or address]

client3_3_seek_cbk() received '-1'. We may first need to check why the fop was failed by the server. If it's reproducible, it should be fairly easy to check.
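
As a side note on the errno itself: "No such device or address" is ENXIO, which is what lseek(2) with SEEK_DATA/SEEK_HOLE returns when the requested offset is at or beyond end-of-file, and the brick-side seek fop essentially boils down to such an lseek on the backend file. That is one plausible source of this error; whether it is what actually happened here still needs the server-side check suggested above. A minimal illustration (not from the report):

/* seek-enxio.c - demonstrates the ENXIO case documented in lseek(2):
 * SEEK_DATA at (or past) end-of-file fails with "No such device or address". */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    off_t end = lseek(fd, 0, SEEK_END);

    /* Asking for the next data region starting at EOF is defined to fail
     * with ENXIO, i.e. "No such device or address". */
    if (lseek(fd, end, SEEK_DATA) < 0)
        printf("SEEK_DATA at offset %lld: %s\n", (long long)end, strerror(errno));

    close(fd);
    return 0;
}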

--- Additional comment from Mohit Agrawal on 2019-11-21 02:53:42 UTC ---

@gaojianan
Can you share the data Soumya asked for, along with the brick logs, client logs, and tcpdump?

--- Additional comment from gaojianan on 2019-11-21 06:55:31 UTC ---

(In reply to Mohit Agrawal from comment #18)
> @gaojianan
> Can you share the data asked by Soumya and share the brick logs along with
> data(client-logs and tcpdump)?
client version:
glusterfs-client-xlators-6.0-20.el8.x86_64
glusterfs-libs-6.0-20.el8.x86_64
qemu-kvm-block-gluster-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64
glusterfs-fuse-6.0-20.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.6.0-7.module+el8.1.1+4483+2f45aaa2.x86_64
glusterfs-api-6.0-20.el8.x86_64
glusterfs-cli-6.0-20.el8.x86_64
glusterfs-6.0-20.el8.x86_64



Tried again with the steps from comment 1.
Steps to Reproduce:
1. Mount the gluster volume on local /tmp/gluster
# mount.glusterfs 10.66.85.243:/jgao-vol1 /tmp/gluster

2. Create a new qcow2 file
# qemu-img create -f qcow2 /tmp/gluster/test.img 100M

3. Check it with qemu-img over the gluster:// protocol
[root@localhost ~]# qemu-img info gluster://10.66.85.243/jgao-vol1/test.img
qemu-img: Could not open 'gluster://10.66.85.243/jgao-vol1/test.img': Could not read L1 table: Input/output error

More detailed info is in the attachment.
If there are any other questions, you can needinfo me again.

--- Additional comment from Han Han on 2019-11-21 07:11:05 UTC ---

(In reply to gaojianan from comment #19)
> Created attachment 1638327 [details]
> tcpdump log and gfapi log of the client
The tcpdump file contains too much other protocol data. It is better to use a filter to capture only glusterfs-related network traffic.

BTW, I have a question: what ports are used by glusterfs by default for gluster-server-6.0.x?
24007-24009? 49152?

> 
> (In reply to Mohit Agrawal from comment #18)
> > @gaojianan
> > Can you share the data asked by Soumya and share the brick logs along with
> > data(client-logs and tcpdump)?
> client version:
> glusterfs-client-xlators-6.0-20.el8.x86_64
> glusterfs-libs-6.0-20.el8.x86_64
> qemu-kvm-block-gluster-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64
> glusterfs-fuse-6.0-20.el8.x86_64
> libvirt-daemon-driver-storage-gluster-5.6.0-7.module+el8.1.1+4483+2f45aaa2.
> x86_64
> glusterfs-api-6.0-20.el8.x86_64
> glusterfs-cli-6.0-20.el8.x86_64
> glusterfs-6.0-20.el8.x86_64
> 
> 
> 
> Try again with the step as comment1.
> Steps to Reproduce:
> 1.Mount the gluster directory to local /tmp/gluster
> # mount.glusterfs 10.66.85.243:/jgao-vol1 /tmp/gluster
> 
> 2.Create a new qcow2 file 
> # qemu-img create -f qcow2 /tmp/gluster/test.img 100M
> 
> 3.check it with qemu-img with gluster
> [root@localhost ~]# qemu-img info gluster://10.66.85.243/jgao-vol1/test.img
> qemu-img: Could not open 'gluster://10.66.85.243/jgao-vol1/test.img': Could
> not read L1 table: Input/output error
> 
> More detail info in the attachment
> If any other question,you can needinfo me again.

--- Additional comment from Han Han on 2019-11-21 07:14:37 UTC ---

What's more, please upload the brick logs as comment 18 said. Those logs are located in /var/log/glusterfs/bricks/ on the glusterfs server.

--- Additional comment from gaojianan on 2019-11-21 07:57:53 UTC ---

In the brick logs, "gluster-vol1" is the same as "jgao-vol1" in the other two files, because I destroyed my env and set it up again.

--- Additional comment from Mohit Agrawal on 2019-11-22 04:45:08 UTC ---

@Soumya

Could you please check the latest logs and tcpdump?

Thanks,
Mohit Agrawal

--- Additional comment from Soumya Koduri on 2019-11-22 06:57:08 UTC ---

From the latest debug.log provided, I see this error -

[2019-11-21 06:34:15.127610] D [MSGID: 0] [client-helpers.c:427:client_get_remote_fd] 0-jgao-vol1-client-0: not a valid fd for gfid: 59ca8bf2-f75a-427f-857e-98843a85dbac [Bad file descriptor]
[2019-11-21 06:34:15.127620] W [MSGID: 114061] [client-common.c:1288:client_pre_seek] 0-jgao-vol1-client-0:  (59ca8bf2-f75a-427f-857e-98843a85dbac) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-11-21 06:34:15.127628] D [MSGID: 0] [client-rpc-fops.c:5949:client3_3_seek] 0-stack-trace: stack-address: 0x5625eed41b08, jgao-vol1-client-0 returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2019-11-21 06:34:15.127636] D [MSGID: 0] [defaults.c:1617:default_seek_cbk] 0-stack-trace: stack-address: 0x5625eed41b08, jgao-vol1-io-threads returned -1 error: File descriptor in bad state [File descriptor in bad state]

The client3_seek fop got an EBADFD error. The fd used in the fop may have been flushed and is no longer valid. On further code reading, I found that there is a bug in the glfs_seek() fop: there is a missing ref on glfd, which may have led to this issue. I will send a patch to fix that.

However, I am unable to reproduce this issue to test it. On my system the test always passes -


[root@dhcp35-198 ~]# qemu-img create -f qcow2 /fuse-mnt/test.img 100M
Formatting '/fuse-mnt/test.img', fmt=qcow2 size=104857600 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@dhcp35-198 ~]# 
[root@dhcp35-198 ~]# 
[root@dhcp35-198 ~]# qemu-img info gluster://localhost/rep_vol/test.img
[2019-11-22 06:36:43.703941] E [MSGID: 108006] [afr-common.c:5322:__afr_handle_child_down_event] 0-rep_vol-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-11-22 06:36:43.705035] I [io-stats.c:4027:fini] 0-rep_vol: io-stats translator unloaded
image: gluster://localhost/rep_vol/test.img
file format: qcow2
virtual size: 100M (104857600 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
[root@dhcp35-198 ~]# 

I am using the latest master branch of gluster. I shall post the fix for the glfs_seek() issue mentioned above, but if someone could test it, that would be helpful.

Comment 1 Worker Ant 2019-11-22 07:00:41 UTC
REVIEW: https://review.gluster.org/23739 (gfapi/seek: Fix missing ref on glfd) posted (#1) for review on master by soumya k

Comment 2 Soumya Koduri 2019-11-25 06:24:54 UTC
As Niels pointed out, glfs_seek() is not a public API and gets called only from glfs_lseek(), which already takes a ref on glfd. This is not a bug.
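
To illustrate the pattern this resolution describes, here is a hedged sketch with simplified, hypothetical names (not the actual gluster source): the public wrapper pins the fd object for the whole call, so the internal helper can rely on the caller's reference instead of taking its own.

#include <stdio.h>

struct glfd {
    int refcount;        /* stand-in for the real refcounted fd object */
};

static void glfd_ref(struct glfd *fd)   { fd->refcount++; }
static void glfd_unref(struct glfd *fd) { fd->refcount--; }

/* internal helper (analogous to glfs_seek): assumes the caller already
 * holds a reference that keeps fd alive for the duration of the fop */
static long long seek_internal(struct glfd *fd, long long offset, int whence)
{
    (void)fd;
    (void)whence;
    return offset;       /* placeholder for the real wind/unwind of the seek */
}

/* public API analogue (like glfs_lseek): takes the protecting reference */
static long long lseek_public(struct glfd *fd, long long offset, int whence)
{
    long long ret;

    glfd_ref(fd);                              /* pin fd across the fop */
    ret = seek_internal(fd, offset, whence);
    glfd_unref(fd);                            /* drop it once done */
    return ret;
}

int main(void)
{
    struct glfd fd = { .refcount = 1 };

    printf("seek returned %lld, refcount back to %d\n",
           lseek_public(&fd, 4096, 0), fd.refcount);
    return 0;
}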

