Bug 1663431

Summary: Get "Could not read qcow2 header" errors when reading a qcow2 file with glusterfs
Product: Red Hat Gluster Storage [Red Hat Storage]
Reporter: gaojianan <jgao>
Component: libgfapi
Assignee: Michael Adam <madam>
Status: CLOSED WORKSFORME
QA Contact: Vivek Das <vdas>
Severity: high
Priority: high
Version: rhgs-3.4
CC: dyuan, hhan, h.moeller, jgao, jthottan, kdhananj, lmen, madam, moagrawa, ndevos, pasik, pkarampu, rgowdapp, rhs-bugs, skoduri, storage-qa-internal, vbellur, xuzhang, yafu, yalzhang, yisun
Keywords: ZStream
Hardware: x86_64
OS: Linux
Type: Bug
Clones: 1775512 (view as bug list)
Bug Depends On: 1775512
Last Closed: 2019-12-03 07:15:33 UTC
Attachments:
  about gluster server log and local glusterfs log
  The info of `qemu-img create` and `info`
  There are the gfapi log and tcpdump in the attachment
  tcpdump log and gfapi log of the client
  update brick log and tcp log for last log file

Description gaojianan 2019-01-04 10:08:13 UTC
Created attachment 1518327 [details]
about gluster server log and local glusterfs log

Description of problem:
Get errors"Could not read qcow2 header" when  read  qcow2 file in glusterfs

Version-Release number of selected component (if applicable):
gluster server: glusterfs-3.12.2-19.el7rhgs.x86_64
client: glusterfs-3.12.2-15.4.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Mount the gluster volume at /mnt:
# mount.glusterfs 10.66.4.119:/gv0  /mnt/

2. Create a new qcow2 file:
# qemu-img create -f qcow2 /mnt/qcow2mnt.img 10M

3. Check it with qemu-img over the gluster:// protocol:
[root@localhost ~]# qemu-img info gluster://10.66.4.119/gv0/qcow2mnt.img
qemu-img: Could not open 'gluster://10.66.4.119/gv0/qcow2mnt.img': Could not read L1 table: Input/output error

Actual results:
As above

Expected results:
qemu-img reports the correct info for the qcow2 file.

Additional info:
1."raw" image is ok in this scenario.
2.qemu-img info /mnt/qcow2mnt.img works well

Comment 2 yisun 2019-02-01 10:09:43 UTC
hit the same issue:
(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# rpm -qa | grep gluster
libvirt-daemon-driver-storage-gluster-4.5.0-19.module+el8+2712+4c318da1.x86_64
glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
qemu-kvm-block-gluster-2.12.0-59.module+el8+2714+6d9351dd.x86_64
glusterfs-cli-3.12.2-32.1.el8.x86_64
glusterfs-libs-3.12.2-32.1.el8.x86_64
glusterfs-api-3.12.2-32.1.el8.x86_64
glusterfs-3.12.2-32.1.el8.x86_64
glusterfs-fuse-3.12.2-32.1.el8.x86_64

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# qemu-img info gluster://10.66.7.98/gluster-vol1/aaa.qcow2
qemu-img: Could not open 'gluster://10.66.7.98/gluster-vol1/aaa.qcow2': Could not read L1 table: Input/output error

This also blocks a VM from using the gluster disk, so escalating the priority.
(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# cat gdisk
<disk device="disk" type="network"><driver cache="none" name="qemu" type="qcow2" /><target bus="virtio" dev="vdb" /><source name="gluster-vol1/aaa.qcow2" protocol="gluster"><host name="10.66.7.98" port="24007" /></source></disk>

(.libvirt-ci-venv-ci-runtest-sBLGCJ) [root@hp-dl380g9-02 virtual_disks]# virsh attach-device avocado-vt-vm1 gdisk
error: Failed to attach device from gdisk
error: internal error: unable to execute QEMU command 'device_add': Property 'virtio-blk-device.drive' can't find value 'drive-virtio-disk1'

Comment 3 Amar Tumballi 2019-03-13 13:38:36 UTC
Moving bug to Krutika as she is more experienced in Virt workloads. 

Meanwhile, looking at the glusterfs version, these are RHGS 3.4 builds.

Comment 4 Krutika Dhananjay 2019-03-14 06:49:47 UTC
Could you share the following two pieces of information -

1. output of `gluster volume info $VOLNAME`
2. Are the glusterfs client and server running the same version of gluster/RHGS?

-Krutika

Comment 5 Krutika Dhananjay 2019-03-14 06:53:36 UTC
(In reply to Krutika Dhananjay from comment #4)
> Could you share the following two pieces of information -
> 
> 1. output of `gluster volume info $VOLNAME`
> 2. Are the glusterfs client and server running the same version of
> gluster/RHGS?

Let me clarify why I'm asking about the versions - the bug's "Description" section says this:
"gluster server: glusterfs-3.12.2-19.el7rhgs.x86_64
client: glusterfs-3.12.2-15.4.el8.x86_64"

but comment 2 lists the client package as glusterfs-client-xlators-3.12.2-32.1.el8.x86_64.

I want to be sure about the exact versions being used so I can recreate it.
(I looked at the logs; not much of a clue there.)

-Krutika

> 
> -Krutika

Comment 6 gaojianan 2019-03-15 01:51:43 UTC
(In reply to Krutika Dhananjay from comment #4)
> Could you share the following two pieces of information -
> 
> 1. output of `gluster volume info $VOLNAME`
> 2. Are the glusterfs client and server running the same version of
> gluster/RHGS?
> 
> -Krutika

1.`gluster volume info $VOLNAME`
[root@node1 ~]# gluster volume info gv1
 
Volume Name: gv1
Type: Distribute
Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.66.4.119:/br2
Options Reconfigured:
nfs.disable: on
transport.address-family: inet


2.Server version:
[root@node1 ~]# rpm -qa |grep gluster
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-4.el7.x86_64
glusterfs-3.12.2-19.el7rhgs.x86_64
python2-gluster-3.12.2-19.el7rhgs.x86_64
glusterfs-server-3.12.2-19.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
glusterfs-api-3.12.2-19.el7rhgs.x86_64
glusterfs-devel-3.12.2-19.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-18.el7.x86_64
glusterfs-libs-3.12.2-19.el7rhgs.x86_64
glusterfs-cli-3.12.2-19.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
glusterfs-events-3.12.2-19.el7rhgs.x86_64
samba-vfs-glusterfs-4.8.3-4.el7.x86_64

Client version:
[root@nssguest ~]# rpm -qa |grep gluster
qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
glusterfs-3.12.2-32.1.el8.x86_64
glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
glusterfs-libs-3.12.2-32.1.el8.x86_64
glusterfs-cli-3.12.2-32.1.el8.x86_64
glusterfs-api-3.12.2-32.1.el8.x86_64

Comment 7 Krutika Dhananjay 2019-03-18 05:59:49 UTC
(In reply to gaojianan from comment #6)
> (In reply to Krutika Dhananjay from comment #4)
> > Could you share the following two pieces of information -
> > 
> > 1. output of `gluster volume info $VOLNAME`
> > 2. Are the glusterfs client and server running the same version of
> > gluster/RHGS?
> > 
> > -Krutika
> 
> 1.`gluster volume info $VOLNAME`
> [root@node1 ~]# gluster volume info gv1
>  
> Volume Name: gv1
> Type: Distribute
> Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: 10.66.4.119:/br2
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet
> 
> 
> 2.Server version:
> [root@node1 ~]# rpm -qa |grep gluster
> libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
> glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
> pcp-pmda-gluster-4.1.0-4.el7.x86_64
> glusterfs-3.12.2-19.el7rhgs.x86_64
> python2-gluster-3.12.2-19.el7rhgs.x86_64
> glusterfs-server-3.12.2-19.el7rhgs.x86_64
> glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
> glusterfs-api-3.12.2-19.el7rhgs.x86_64
> glusterfs-devel-3.12.2-19.el7rhgs.x86_64
> glusterfs-debuginfo-3.12.2-18.el7.x86_64
> glusterfs-libs-3.12.2-19.el7rhgs.x86_64
> glusterfs-cli-3.12.2-19.el7rhgs.x86_64
> glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
> glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
> glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
> glusterfs-events-3.12.2-19.el7rhgs.x86_64
> samba-vfs-glusterfs-4.8.3-4.el7.x86_64
> 
> Client version:
> [root@nssguest ~]# rpm -qa |grep gluster
> qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
> glusterfs-3.12.2-32.1.el8.x86_64
> glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
> libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
> glusterfs-libs-3.12.2-32.1.el8.x86_64
> glusterfs-cli-3.12.2-32.1.el8.x86_64
> glusterfs-api-3.12.2-32.1.el8.x86_64

Thanks.

I tried the same set of steps with the same versions of gluster client and server, and the test works for me every time.
Perhaps the ONLY difference between your configuration and mine is that my gluster client is also on RHEL 7, unlike yours, where the client machine is running RHEL 8. The qemu-img versions could also be different.

Are you hitting this issue even with fuse mount, i.e., when you run `qemu-img info` this way - `qemu-img info $FUSE_MOUNT_PATH/aaa.qcow2`?

If yes, could you run both the `qemu-img create` and `qemu-img info` commands with strace for a fresh file:

# strace -ff -T -v -o /tmp/qemu-img-create.out qemu-img create -f qcow2 $IMAGE_PATH 10M
# strace -ff -T -v -o /tmp/qemu-img-info.out qemu-img info $IMAGE_PATH_OVER_FUSE_MOUNT

and share all of the resultant output files matching qemu-img-create.out* and qemu-img-info.out*?

-Krutika

Comment 8 gaojianan 2019-03-18 07:07:51 UTC
Created attachment 1545120 [details]
The info of `qemu-img create` and `info`

Comment 9 gaojianan 2019-03-18 07:10:05 UTC
(In reply to Krutika Dhananjay from comment #7)
> (In reply to gaojianan from comment #6)
> > (In reply to Krutika Dhananjay from comment #4)
> > > Could you share the following two pieces of information -
> > > 
> > > 1. output of `gluster volume info $VOLNAME`
> > > 2. Are the glusterfs client and server running the same version of
> > > gluster/RHGS?
> > > 
> > > -Krutika
> > 
> > 1.`gluster volume info $VOLNAME`
> > [root@node1 ~]# gluster volume info gv1
> >  
> > Volume Name: gv1
> > Type: Distribute
> > Volume ID: de5d9272-e237-4a4e-8a30-a7c737f393db
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1
> > Transport-type: tcp
> > Bricks:
> > Brick1: 10.66.4.119:/br2
> > Options Reconfigured:
> > nfs.disable: on
> > transport.address-family: inet
> > 
> > 
> > 2.Server version:
> > [root@node1 ~]# rpm -qa |grep gluster
> > libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64
> > glusterfs-api-devel-3.12.2-19.el7rhgs.x86_64
> > pcp-pmda-gluster-4.1.0-4.el7.x86_64
> > glusterfs-3.12.2-19.el7rhgs.x86_64
> > python2-gluster-3.12.2-19.el7rhgs.x86_64
> > glusterfs-server-3.12.2-19.el7rhgs.x86_64
> > glusterfs-geo-replication-3.12.2-19.el7rhgs.x86_64
> > glusterfs-api-3.12.2-19.el7rhgs.x86_64
> > glusterfs-devel-3.12.2-19.el7rhgs.x86_64
> > glusterfs-debuginfo-3.12.2-18.el7.x86_64
> > glusterfs-libs-3.12.2-19.el7rhgs.x86_64
> > glusterfs-cli-3.12.2-19.el7rhgs.x86_64
> > glusterfs-client-xlators-3.12.2-19.el7rhgs.x86_64
> > glusterfs-fuse-3.12.2-19.el7rhgs.x86_64
> > glusterfs-rdma-3.12.2-19.el7rhgs.x86_64
> > glusterfs-events-3.12.2-19.el7rhgs.x86_64
> > samba-vfs-glusterfs-4.8.3-4.el7.x86_64
> > 
> > Client version:
> > [root@nssguest ~]# rpm -qa |grep gluster
> > qemu-kvm-block-gluster-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
> > glusterfs-3.12.2-32.1.el8.x86_64
> > glusterfs-client-xlators-3.12.2-32.1.el8.x86_64
> > libvirt-daemon-driver-storage-gluster-5.0.0-6.virtcov.el8.x86_64
> > glusterfs-libs-3.12.2-32.1.el8.x86_64
> > glusterfs-cli-3.12.2-32.1.el8.x86_64
> > glusterfs-api-3.12.2-32.1.el8.x86_64
> 
> Thanks.
> 
> I tried the same set of steps with the same versions of gluster client and
> server and the test works for me everytime.
> Perhaps the ONLY difference between your configuration and mine is that my
> gluster-client is also on rhel7 unlike yours where you're running rhel8 on
> the client machine. Also the qemu-img versions could be different.
> 
> Are you hitting this issue even with fuse mount, i.e., when you run
> `qemu-img info` this way - `qemu-img info $FUSE_MOUNT_PATH/aaa.qcow2`?
> 
> If yes, could you run both `qemu-img create` and `qemu-img info` commands
> with strace for a fresh file:
> 
> # strace -ff -T -v -o /tmp/qemu-img-create.out qemu-img create -f qcow2
> $IMAGE_PATH 10M
> # strace -ff -T -v -o /tmp/qemu-img-info.out info $IMAGE_PATH_OVER_FUSE_MOUNT
> 
> 
> and share all of the resultant output files having format
> qemu-img-create.out* and qemu-img-info.out*?
> 
> -Krutika

I think this bug only happens when we create a file on the FUSE-mounted path and then check it with `qemu-img info gluster://$ip/filename`; `qemu-img info $FUSE_MOUNT_PATH/filename` works well.

Comment 10 Krutika Dhananjay 2019-03-20 05:20:47 UTC
OK, I took a look at the traces. Unfortunately, in the libgfapi-access case we need ltrace output instead of strace, since all the calls are made in userspace.
I did test the ltrace command before sharing it with you, just to be sure it works, but I see that the arguments to the library calls are not printed as symbols.
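
For what it's worth, an ltrace invocation for the gfapi path could look roughly like the following. This is only a sketch: it assumes an ltrace version that supports the -l library filter, and the output path and URL are placeholders.

# ltrace -f -tt -o /tmp/qemu-img-info.ltrace -l 'libgfapi*' qemu-img info gluster://$ip/filename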

Since you're seeing this issue only with gfapi, I'm passing this issue over to gfapi experts for a faster resolution.

Poornima/Soumya/Jiffin,

Could one of you help?

-Krutika

Comment 11 Soumya Koduri 2019-03-20 17:38:10 UTC
To start with, it would be helpful to get the logs exclusive to gfapi access, plus a tcpdump capture taken while the command below is run -

qemu-img info gluster://$ip/filename
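
For example, something along these lines captures the traffic around that command (a sketch only: substitute the real server IP and volume, and adjust the capture interface if needed):

# tcpdump -i any -s 0 -w /tmp/gfapi-seek.pcap host $ip &
# qemu-img info gluster://$ip/filename
# kill %1

The kill just stops the background capture once qemu-img has returned. The resulting .pcap can then be attached here; wireshark's GlusterFS dissector can decode the RPC traffic.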

Comment 12 Krutika Dhananjay 2019-03-21 05:45:15 UTC
Setting needinfo on the reporter to get the info requested in comment 11.

Comment 13 gaojianan 2019-03-22 06:55:49 UTC
Created attachment 1546788 [details]
There are the gfapi log and tcpdump in the attachment

Comment 14 Yaniv Kaul 2019-04-22 07:19:24 UTC
Status?

Comment 16 Mohit Agrawal 2019-11-19 13:52:39 UTC
@Soumya

Did you get a chance to analyze the logs and tcpdump?

Thanks,
Mohit Agrawal

Comment 17 Soumya Koduri 2019-11-20 18:01:32 UTC
(In reply to Mohit Agrawal from comment #16)
> @Soumya
> 
> Did you get a chance to analyze the logs and tcpdump?
> 
> Thanks,
> Mohit Agrawal

Hi,

I just looked at the files uploaded. The tcpdump doesn't have any gluster traffic captured. Please ensure the command was issued on the right machine (the one where qemu-img is being executed) and verify the filters (right interface, IP, etc.).

From the logs, I see there is a failure for the SEEK() fop -



[2019-03-22 06:47:34.557047] T [MSGID: 0] [dht-hashfn.c:94:dht_hash_compute] 0-gv1-dht: trying regex for test.img
[2019-03-22 06:47:34.557059] D [MSGID: 0] [dht-common.c:3675:dht_lookup] 0-gv1-dht: Calling fresh lookup for /test.img on gv1-client-0
[2019-03-22 06:47:34.557067] T [MSGID: 0] [dht-common.c:3679:dht_lookup] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-dht to gv1-client-0
[2019-03-22 06:47:34.557079] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gv1-client-0: Auth Info: pid: 10233, uid: 0, gid: 0, owner: 
[2019-03-22 06:47:34.557086] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 420, payload: 348, rpc hdr: 72
[2019-03-22 06:47:34.557110] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0xb Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.557513] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gv1-client-0: received rpc message (RPC XID: 0xb Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.557536] T [MSGID: 0] [client-rpc-fops.c:2873:client3_3_lookup_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-client-0 returned 0
[2019-03-22 06:47:34.557549] D [MSGID: 0] [dht-common.c:3228:dht_lookup_cbk] 0-gv1-dht: fresh_lookup returned for /test.img with op_ret 0


>> LOOKUP on  /test.img was successful



[2019-03-22 06:47:34.563416] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-read-ahead to gv1-write-behind
[2019-03-22 06:47:34.563424] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-write-behind to gv1-dht
[2019-03-22 06:47:34.563432] T [MSGID: 0] [defaults.c:2927:default_seek] 0-stack-trace: stack-address: 0x55ce03dd1720, winding from gv1-dht to gv1-client-0
[2019-03-22 06:47:34.563443] T [rpc-clnt.c:1496:rpc_clnt_record] 0-gv1-client-0: Auth Info: pid: 10233, uid: 0, gid: 0, owner: 
[2019-03-22 06:47:34.563451] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 112, payload: 40, rpc hdr: 72
[2019-03-22 06:47:34.563478] T [rpc-clnt.c:1699:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0xc Program: GlusterFS 3.3, ProgVers: 330, Proc: 48) to rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.563990] T [rpc-clnt.c:675:rpc_clnt_reply_init] 0-gv1-client-0: received rpc message (RPC XID: 0xc Program: GlusterFS 3.3, ProgVers: 330, Proc: 48) from rpc-transport (gv1-client-0)
[2019-03-22 06:47:34.564008] W [MSGID: 114031] [client-rpc-fops.c:2156:client3_3_seek_cbk] 0-gv1-client-0: remote operation failed [No such device or address]
[2019-03-22 06:47:34.564028] D [MSGID: 0] [client-rpc-fops.c:2160:client3_3_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-client-0 returned -1 error: No such device or address [No such device or address]
[2019-03-22 06:47:34.564041] D [MSGID: 0] [defaults.c:1531:default_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1-io-threads returned -1 error: No such device or address [No such device or address]
[2019-03-22 06:47:34.564051] D [MSGID: 0] [io-stats.c:2548:io_stats_seek_cbk] 0-stack-trace: stack-address: 0x55ce03dd1720, gv1 returned -1 error: No such device or address [No such device or address]

client3_3_seek_cbk() received '-1'. We may first need to check why the fop was failed by the server. If it's reproducible, it should be fairly easy to check.
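
Since the SEEK fop is what gluster uses to implement SEEK_DATA/SEEK_HOLE, and lseek(2) returns ENXIO ("No such device or address") when such a seek lands at or beyond end-of-file, one quick way to see what the brick is actually doing would be to trace the brick process's lseek calls on the server while the command is rerun. A sketch only; glusterfsd is assumed to be the only brick process on that node:

# strace -f -tt -e trace=lseek -o /tmp/brick-lseek.out -p "$(pgrep -f glusterfsd)"

and then, from the client:

# qemu-img info gluster://$ip/filename

The errno recorded for the lseek calls in /tmp/brick-lseek.out should show whether the ENXIO really comes from the brick's backing filesystem.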

Comment 18 Mohit Agrawal 2019-11-21 02:53:42 UTC
@gaojianan
Can you share the data Soumya asked for, and include the brick logs along with the client logs and tcpdump?

Comment 19 gaojianan 2019-11-21 06:55:31 UTC
Created attachment 1638327 [details]
tcpdump log and gfapi log of the client

(In reply to Mohit Agrawal from comment #18)
> @gaojianan
> Can you share the data asked by Soumya and share the brick logs along with
> data(client-logs and tcpdump)?
client version:
glusterfs-client-xlators-6.0-20.el8.x86_64
glusterfs-libs-6.0-20.el8.x86_64
qemu-kvm-block-gluster-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64
glusterfs-fuse-6.0-20.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.6.0-7.module+el8.1.1+4483+2f45aaa2.x86_64
glusterfs-api-6.0-20.el8.x86_64
glusterfs-cli-6.0-20.el8.x86_64
glusterfs-6.0-20.el8.x86_64



Tried again with the steps from comment 1.
Steps to Reproduce:
1. Mount the gluster volume at /tmp/gluster:
# mount.glusterfs 10.66.85.243:/jgao-vol1 /tmp/gluster

2. Create a new qcow2 file:
# qemu-img create -f qcow2 /tmp/gluster/test.img 100M

3. Check it with qemu-img over the gluster:// protocol:
[root@localhost ~]# qemu-img info gluster://10.66.85.243/jgao-vol1/test.img
qemu-img: Could not open 'gluster://10.66.85.243/jgao-vol1/test.img': Could not read L1 table: Input/output error

More detailed info is in the attachment.
If there are any other questions, you can needinfo me again.

Comment 20 Han Han 2019-11-21 07:11:05 UTC
(In reply to gaojianan from comment #19)
> Created attachment 1638327 [details]
> tcpdump log and gfapi log of the client
The tcpdump file contains too much traffic from other protocols. It would be better to use a filter so that only glusterfs-related network traffic is captured.

BTW, I have a question: what ports does glusterfs use by default for gluster-server-6.0.x?
24007-24009? 49152?
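
A filter along these lines should keep the capture down to gluster traffic only. A sketch: 24007 is the glusterd management port, bricks are normally assigned ports from 49152 upwards, and `gluster volume status` on the server shows the exact brick port in use, so the port range below is only an example.

# gluster volume status jgao-vol1
# tcpdump -i any -s 0 -w /tmp/gluster-only.pcap 'host 10.66.85.243 and (port 24007 or portrange 49152-49251)'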

> 
> (In reply to Mohit Agrawal from comment #18)
> > @gaojianan
> > Can you share the data asked by Soumya and share the brick logs along with
> > data(client-logs and tcpdump)?
> client version:
> glusterfs-client-xlators-6.0-20.el8.x86_64
> glusterfs-libs-6.0-20.el8.x86_64
> qemu-kvm-block-gluster-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64
> glusterfs-fuse-6.0-20.el8.x86_64
> libvirt-daemon-driver-storage-gluster-5.6.0-7.module+el8.1.1+4483+2f45aaa2.
> x86_64
> glusterfs-api-6.0-20.el8.x86_64
> glusterfs-cli-6.0-20.el8.x86_64
> glusterfs-6.0-20.el8.x86_64
> 
> 
> 
> Try again with the step as comment1.
> Steps to Reproduce:
> 1.Mount the gluster directory to local /tmp/gluster
> # mount.glusterfs 10.66.85.243:/jgao-vol1 /tmp/gluster
> 
> 2.Create a new qcow2 file 
> # qemu-img create -f qcow2 /tmp/gluster/test.img 100M
> 
> 3.check it with qemu-img with gluster
> [root@localhost ~]# qemu-img info gluster://10.66.85.243/jgao-vol1/test.img
> qemu-img: Could not open 'gluster://10.66.85.243/jgao-vol1/test.img': Could
> not read L1 table: Input/output error
> 
> More detail info in the attachment
> If any other question,you can needinfo me again.

Comment 21 Han Han 2019-11-21 07:14:37 UTC
Also, please upload the brick logs as comment 18 asked. Those logs are located in /var/log/glusterfs/bricks/ on the glusterfs server.

Comment 22 gaojianan 2019-11-21 07:57:53 UTC
Created attachment 1638331 [details]
update brick log and tcp log for last log file

In the brick log, "gluster-vol1" is the same volume as "jgao-vol1" in the other two files, because I destroyed my environment and set it up again.

Comment 23 Mohit Agrawal 2019-11-22 04:45:08 UTC
@Soumya

Could you please check the latest logs and tcpdump?

Thanks,
Mohit Agrawal

Comment 24 Soumya Koduri 2019-11-22 06:57:08 UTC
From the latest debug.log provided, I see this error -

[2019-11-21 06:34:15.127610] D [MSGID: 0] [client-helpers.c:427:client_get_remote_fd] 0-jgao-vol1-client-0: not a valid fd for gfid: 59ca8bf2-f75a-427f-857e-98843a85dbac [Bad file descriptor]
[2019-11-21 06:34:15.127620] W [MSGID: 114061] [client-common.c:1288:client_pre_seek] 0-jgao-vol1-client-0:  (59ca8bf2-f75a-427f-857e-98843a85dbac) remote_fd is -1. EBADFD [File descriptor in bad state]
[2019-11-21 06:34:15.127628] D [MSGID: 0] [client-rpc-fops.c:5949:client3_3_seek] 0-stack-trace: stack-address: 0x5625eed41b08, jgao-vol1-client-0 returned -1 error: File descriptor in bad state [File descriptor in bad state]
[2019-11-21 06:34:15.127636] D [MSGID: 0] [defaults.c:1617:default_seek_cbk] 0-stack-trace: stack-address: 0x5625eed41b08, jgao-vol1-io-threads returned -1 error: File descriptor in bad state [File descriptor in bad state]

The client3_3_seek fop got an EBADFD error. The fd used in the call may have been flushed and is no longer valid. On further code reading I found that there is a bug in the glfs_seek() fop: there is a missing ref on the glfd, which may have led to this issue. I will send a patch to fix that.

However, I am unable to reproduce this issue to test it. On my system the test always passes -


[root@dhcp35-198 ~]# qemu-img create -f qcow2 /fuse-mnt/test.img 100M
Formatting '/fuse-mnt/test.img', fmt=qcow2 size=104857600 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@dhcp35-198 ~]# 
[root@dhcp35-198 ~]# 
[root@dhcp35-198 ~]# qemu-img info gluster://localhost/rep_vol/test.img
[2019-11-22 06:36:43.703941] E [MSGID: 108006] [afr-common.c:5322:__afr_handle_child_down_event] 0-rep_vol-replicate-0: All subvolumes are down. Going offline until at least one of them comes back up.
[2019-11-22 06:36:43.705035] I [io-stats.c:4027:fini] 0-rep_vol: io-stats translator unloaded
image: gluster://localhost/rep_vol/test.img
file format: qcow2
virtual size: 100M (104857600 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
[root@dhcp35-198 ~]# 

I am using the latest master branch of gluster. I shall post the fix for the glfs_seek bug mentioned above, but if someone could test it, that would be helpful.
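
For whoever picks up the testing: with a build carrying the glfs_seek change installed on the client, rerunning the reporter's sequence should be enough to confirm. A sketch; substitute the actual FUSE mount path, server, and volume:

# qemu-img create -f qcow2 /tmp/gluster/test.img 100M
# qemu-img info gluster://10.66.85.243/jgao-vol1/test.img

If the missing glfd ref was indeed the culprit, the second command should print the image details instead of failing with 'Could not read L1 table: Input/output error'.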

Comment 25 Mohit Agrawal 2019-11-22 07:04:57 UTC
We can share the test build if Jianan agrees to test the same.

@gaojianan

Would it be possible for you to test the patch?
Can you please confirm if you are able to reproduce the issue on rhgs 3.5?

Thanks,
Mohit Agrawal

Comment 26 Soumya Koduri 2019-11-22 07:32:02 UTC
https://review.gluster.org/#/c/glusterfs/+/23739/ is the patch posted for the glfs_seek fix.
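
For anyone building a test client from source, one way to pull that change into a local tree might be the following. A rough sketch only: the refspec follows the usual Gerrit layout for change 23739, and <patchset> stands for the patchset number shown on the review page.

# git clone https://github.com/gluster/glusterfs.git && cd glusterfs
# git fetch https://review.gluster.org/glusterfs refs/changes/39/23739/<patchset> && git cherry-pick FETCH_HEAD
# ./autogen.sh && ./configure && make

That said, a test build from Mohit (comment 25) is probably the easier route for RHGS packages.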

Comment 27 gaojianan 2019-11-22 08:25:45 UTC
(In reply to Mohit Agrawal from comment #25)
> We can share the test build if Jianan agrees to test the same.
> 
> @gaojianan
> 
> Would it be possible for you to test the patch?
> Can you please confirm if you are able to reproduce the issue on rhgs 3.5?
> 
> Thanks,
> Mohit Agrawal

OK, I will try it as soon as possible.