Bug 1665145 - Writes on Gluster 5 volumes fail with EIO when "cluster.consistent-metadata" is set
Summary: Writes on Gluster 5 volumes fail with EIO when "cluster.consistent-metadata" is set
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 5
Hardware: Unspecified
OS: Linux
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1670253
 
Reported: 2019-01-10 15:03 UTC by hansmi
Modified: 2022-03-13 16:43 UTC (History)
4 users

Fixed In Version: glusterfs-5.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1670253 (view as bug list)
Environment:
Last Closed: 2019-02-04 14:53:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
* Gluster.org Gerrit 22139 (Merged): readdir-ahead: do not zero-out iatt in fop cbk - last updated 2019-02-04 14:53:27 UTC

Description hansmi 2019-01-10 15:03:46 UTC
After upgrading a Gluster cluster from version 3.12.15 to 5.2, writes to volumes with "cluster.consistent-metadata" enabled fail with EIO. Reproduction steps follow.

## Package information

Package source: http://mirror.centos.org/centos/7/storage/x86_64/gluster-5/

# gluster --version
glusterfs 5.2

# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 7.6 (Maipo)
Release:        7.6
Codename:       Maipo

## Cluster information

Peers:

* storage1
* storage2
* storage3

# gluster peer status
Number of Peers: 2

Hostname: storage2
Uuid: 0acf152f-4cba-4757-bdab-113c34333eeb
State: Peer in Cluster (Connected)

Hostname: storage3
Uuid: c0302a36-149a-44ca-984f-8120a1fa360a
State: Peer in Cluster (Connected)

## Reproduction steps

# On each storage server
mkdir /data/testpv1

# Create volume
gluster volume create testpv1 replica 3 storage1:/data/testpv1/brick storage2:/data/testpv1/brick storage3:/data/testpv1/brick force
gluster volume start testpv1

# Mount volume on a client
mount -t glusterfs -o log-level=DEBUG storage1:/testpv1 /mnt/gluster

# Writes succeed
$ echo foo > /mnt/gluster/a; ls -l /mnt/gluster/a
-rw-r--r--. 1 root root 4 10. Jan 15:53 /mnt/gluster/a

# Enable "cluster.consistent-metadata"
gluster volume set testpv1 cluster.consistent-metadata on

# Writes fail with EIO after truncation
$ echo foo > /mnt/gluster/a; ls -l /mnt/gluster/a
bash: /mnt/gluster/a: Input/output error
-rw-r--r--. 1 root root 0 10. Jan 15:53 /mnt/gluster/a

# Disable "cluster.consistent-metadata" again
gluster volume set testpv1 cluster.consistent-metadata off

# Writes work again
$ echo foo > /mnt/gluster/a; ls -l /mnt/gluster/a
-rw-r--r--. 1 root root 4 10. Jan 15:54 /mnt/gluster/a
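
# Not part of the original report: the option's current value can be
# verified between steps with "gluster volume get" (available in this
# Gluster release)
gluster volume get testpv1 cluster.consistent-metadata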

# Debug logs from FUSE client

[2019-01-10 14:56:30.725669] D [MSGID: 0] [dht-common.c:1559:dht_revalidate_cbk] 0-testpv1-dht: revalidate lookup of / returned with op_ret 0 [Structure needs cleaning]
[2019-01-10 14:56:30.725747] D [MSGID: 0] [dht-common.c:1648:dht_revalidate_cbk] 0-testpv1-dht: internal xattr trusted.glusterfs.dht.mds is not present on path / gfid is 00000000-0000-0000-0000-000000000001
[2019-01-10 14:56:30.726483] D [write-behind.c:1764:wb_process_queue] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcad4e13f1b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x9a0b)[0x7fcac6b20a0b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0xd288)[0x7fcac6b24288] (--> /lib64/libglusterfs.so.0(default_lookup+0xdd)[0x7fcad4ea5c1d] (--> /lib64/libglusterfs.so.0(default_lookup+0xdd)[0x7fcad4ea5c1d] ))))) 0-testpv1-write-behind: processing queues
[2019-01-10 14:56:30.726519] D [MSGID: 0] [write-behind.c:1710:__wb_pick_winds] 0-testpv1-write-behind: (unique=24, fop=LOOKUP, gfid=60fcb860-2470-45b4-9060-853979a8bc4d, gen=0): picking the request for winding
[2019-01-10 14:56:30.726572] D [MSGID: 0] [dht-common.c:3374:dht_do_revalidate] 0-testpv1-dht: path = /a. Calling revalidate lookup on testpv1-replicate-0
[2019-01-10 14:56:30.726939] D [write-behind.c:419:__wb_request_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcad4e13f1b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x26b2)[0x7fcac6b196b2] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x504c)[0x7fcac6b1c04c] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x98c1)[0x7fcac6b208c1] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x99cb)[0x7fcac6b209cb] ))))) 0-testpv1-write-behind: (unique = 24, fop=LOOKUP, gfid=60fcb860-2470-45b4-9060-853979a8bc4d, gen=0): destroying request, removing from all queues
[2019-01-10 14:56:30.727875] D [MSGID: 0] [dht-common.c:1559:dht_revalidate_cbk] 0-testpv1-dht: revalidate lookup of /a returned with op_ret 0
[2019-01-10 14:56:30.728405] D [write-behind.c:1764:wb_process_queue] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcad4e13f1b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x9a0b)[0x7fcac6b20a0b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0xc268)[0x7fcac6b23268] (--> /usr/lib64/glusterfs/5.2/xlator/performance/read-ahead.so(+0x7331)[0x7fcac690e331] (--> /usr/lib64/glusterfs/5.2/xlator/performance/readdir-ahead.so(+0x44f2)[0x7fcac66fb4f2] ))))) 0-testpv1-write-behind: processing queues
[2019-01-10 14:56:30.728442] D [MSGID: 0] [write-behind.c:1710:__wb_pick_winds] 0-testpv1-write-behind: (unique=27, fop=TRUNCATE, gfid=60fcb860-2470-45b4-9060-853979a8bc4d, gen=0): picking the request for winding
[2019-01-10 14:56:30.728722] D [write-behind.c:419:__wb_request_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcad4e13f1b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x26b2)[0x7fcac6b196b2] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x504c)[0x7fcac6b1c04c] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x98c1)[0x7fcac6b208c1] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x99cb)[0x7fcac6b209cb] ))))) 0-testpv1-write-behind: (unique = 27, fop=TRUNCATE, gfid=60fcb860-2470-45b4-9060-853979a8bc4d, gen=0): destroying request, removing from all queues
[2019-01-10 14:56:30.729207] D [MSGID: 0] [afr-transaction.c:2070:afr_post_nonblocking_inodelk_cbk] 0-testpv1-replicate-0: Non blocking inodelks done. Proceeding to FOP
[2019-01-10 14:56:30.729245] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.dirty' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729271] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-2' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729286] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-1' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729297] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-0' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729345] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.dirty' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729368] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-2' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729380] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-1' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729396] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-0' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729435] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.dirty' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729450] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-2' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729461] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-1' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.729472] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-0' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.730173] D [MSGID: 0] [client-rpc-fops_v2.c:1553:client4_0_xattrop_cbk] 0-testpv1-client-0: resetting op_ret to 0 from 0
[2019-01-10 14:56:30.730289] D [MSGID: 0] [client-rpc-fops_v2.c:1553:client4_0_xattrop_cbk] 0-testpv1-client-2: resetting op_ret to 0 from 0
[2019-01-10 14:56:30.730516] D [MSGID: 0] [client-rpc-fops_v2.c:1553:client4_0_xattrop_cbk] 0-testpv1-client-1: resetting op_ret to 0 from 0
[2019-01-10 14:56:30.731446] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.dirty' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731483] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-2' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731499] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-1' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731513] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-0' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731582] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.dirty' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731609] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-2' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731624] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-1' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731637] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-0' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731680] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.dirty' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731710] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-2' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731726] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-1' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.731740] D [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.afr.testpv1-client-0' would not be sent on wire in the future [Invalid argument]
[2019-01-10 14:56:30.732161] D [MSGID: 0] [client-rpc-fops_v2.c:1553:client4_0_xattrop_cbk] 0-testpv1-client-0: resetting op_ret to 0 from 0
[2019-01-10 14:56:30.732480] D [MSGID: 0] [client-rpc-fops_v2.c:1553:client4_0_xattrop_cbk] 0-testpv1-client-2: resetting op_ret to 0 from 0
[2019-01-10 14:56:30.732548] D [MSGID: 0] [client-rpc-fops_v2.c:1553:client4_0_xattrop_cbk] 0-testpv1-client-1: resetting op_ret to 0 from 0
[2019-01-10 14:56:30.735749] D [write-behind.c:1764:wb_process_queue] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcad4e13f1b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x9a0b)[0x7fcac6b20a0b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0xd288)[0x7fcac6b24288] (--> /lib64/libglusterfs.so.0(default_lookup+0xdd)[0x7fcad4ea5c1d] (--> /lib64/libglusterfs.so.0(default_lookup+0xdd)[0x7fcad4ea5c1d] ))))) 0-testpv1-write-behind: processing queues
[2019-01-10 14:56:30.735787] D [MSGID: 0] [write-behind.c:1710:__wb_pick_winds] 0-testpv1-write-behind: (unique=29, fop=LOOKUP, gfid=60fcb860-2470-45b4-9060-853979a8bc4d, gen=0): picking the request for winding
[2019-01-10 14:56:30.735827] D [MSGID: 0] [dht-common.c:3374:dht_do_revalidate] 0-testpv1-dht: path = /a. Calling revalidate lookup on testpv1-replicate-0
[2019-01-10 14:56:30.736354] D [write-behind.c:419:__wb_request_unref] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fcad4e13f1b] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x26b2)[0x7fcac6b196b2] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x504c)[0x7fcac6b1c04c] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x98c1)[0x7fcac6b208c1] (--> /usr/lib64/glusterfs/5.2/xlator/performance/write-behind.so(+0x99cb)[0x7fcac6b209cb] ))))) 0-testpv1-write-behind: (unique = 29, fop=LOOKUP, gfid=60fcb860-2470-45b4-9060-853979a8bc4d, gen=0): destroying request, removing from all queues
[2019-01-10 14:56:30.738253] D [MSGID: 0] [dht-common.c:1559:dht_revalidate_cbk] 0-testpv1-dht: revalidate lookup of /a returned with op_ret 0
[2019-01-10 14:56:30.738311] D [fuse-bridge.c:53:fuse_invalidate] 0-fuse: Invalidate inode id 140508580080760.
[2019-01-10 14:56:30.738390] E [fuse-bridge.c:219:check_and_dump_fuse_W] 0-glusterfs-fuse: writing to fuse device failed: No such file or directory

Comment 2 Ravishankar N 2019-01-28 07:37:55 UTC
> $ echo foo > /mnt/gluster/a; ls -l /mnt/gluster/a
> bash: /mnt/gluster/a: Input/output error
> -rw-r--r--. 1 root root 0 10. Jan 15:53 /mnt/gluster/a

It is the truncate that is failing. This seems to be a regression introduced by commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac (performance/readdir-ahead: keep stats of cached dentries in sync with modifications). When cluster.consistent-metadata is set to on, AFR zeroes out the ctime (commit 4c4624c9bad2edf27128cb122c64f15d7d63bbc8). When the ctime is zero, the readdir-ahead xlator seems to zero out the entire stat buffer, including ia_type and ia_gfid.
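
For illustration only (simplified stand-in types, not the actual GlusterFS sources), a minimal sketch of the failure mode described above and of the direction a fix would take: if the iatt arriving in the fop callback carries a zeroed ctime, the cached stat should be left untouched instead of being overwritten with zeroes.

/* Illustrative sketch only -- simplified types, not the real readdir-ahead code. */
#include <stdint.h>

typedef struct {
    uint8_t  ia_gfid[16];  /* file identity */
    int      ia_type;      /* regular file, directory, ... */
    int64_t  ia_ctime;     /* zeroed by AFR when cluster.consistent-metadata is on */
    int64_t  ia_size;
} iatt_t;

/* Pre-fix behaviour as described above: the callback's iatt is copied into
 * the readdir-ahead cache unconditionally, so an all-zero iatt wipes out
 * ia_type and ia_gfid and subsequent fops fail with EIO. */
static void cache_update_buggy(iatt_t *cached, const iatt_t *from_cbk)
{
    *cached = *from_cbk;
}

/* Direction of the fix ("do not zero-out iatt in fop cbk"): skip the update
 * when the incoming iatt has no valid ctime, keeping the cached entry intact. */
static void cache_update_fixed(iatt_t *cached, const iatt_t *from_cbk)
{
    if (from_cbk->ia_ctime == 0)
        return;
    *cached = *from_cbk;
}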

Let me attempt a fix. As a workaround, if you want to enable cluster.consistent-metadata, you can disable 'performance.readdir-ahead'; that should solve the issue in the interim.
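
For reference, that interim workaround (using the volume name from the reproduction steps above) would be:

gluster volume set testpv1 performance.readdir-ahead off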

Comment 3 Worker Ant 2019-02-01 08:32:56 UTC
REVIEW: https://review.gluster.org/22139 (readdir-ahead: do not zero-out iatt in fop cbk) posted (#1) for review on release-5 by Ravishankar N

Comment 4 Worker Ant 2019-02-04 14:53:29 UTC
REVIEW: https://review.gluster.org/22139 (readdir-ahead: do not zero-out iatt in fop cbk) merged (#2) on release-5 by Shyamsundar Ranganathan

Comment 5 Shyamsundar 2019-03-27 13:44:02 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-5.5, please open a new bug report.

glusterfs-5.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000119.html
[2] https://www.gluster.org/pipermail/gluster-users/

