+++ This bug was initially created as a clone of Bug #1684385 +++

Description of problem:

When gluster bits were upgraded in a hyperconverged ovirt-gluster setup, one node at a time in online mode from 3.12.5 to 5.3, the following log messages were seen -

[2019-02-26 16:24:25.126963] E [shard.c:556:shard_modify_size_and_block_count] (-->/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x82a45) [0x7ff71d05ea45] -->/usr/lib64/glusterfs/5.3/xlator/features/shard.so(+0x5c77) [0x7ff71cdb4c77] -->/usr/lib64/glusterfs/5.3/xlator/features/shard.so(+0x592e) [0x7ff71cdb492e] ) 0-engine-shard: Failed to get trusted.glusterfs.shard.file-size for 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7

Version-Release number of selected component (if applicable):

How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
The trusted.glusterfs.shard.file-size xattr should always be accessible.

Additional info:

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:13:48 UTC ---

[root@tendrl25 glusterfs]# gluster v info engine

Volume Name: engine
Type: Replicate
Volume ID: bb26f648-2842-4182-940e-6c8ede02195f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: tendrl27.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Brick2: tendrl26.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Brick3: tendrl25.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:23:02 UTC ---

On further investigation, it was found that the shard xattrs were genuinely missing on all 3 replicas -

[root@tendrl27 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473
[root@localhost ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x0000000e0000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473

[root@tendrl25 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000100000000000000000
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473

Also from the logs, it appears the file underwent metadata self-heal moments before these errors started to appear -

[2019-02-26 13:35:37.253896] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
[2019-02-26 13:35:37.254734] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.file-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.254749] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.block-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.255777] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7. sources=[0] sinks=2
[2019-02-26 13:35:37.258032] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
[2019-02-26 13:35:37.258792] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.file-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.258807] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.block-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.259633] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7. sources=[0] sinks=2

Metadata heal, as we know, does three things:
1. bulk getxattr from the source brick;
2. removexattr on the sink bricks;
3. bulk setxattr on the sink bricks.

What is clear from these logs is that the dict_to_xdr() warnings appear at the time of metadata heal, indicating that the shard xattrs were possibly not "sent on wire" as part of step 3. This turns out to be due to the newly introduced dict_to_xdr() code in 5.3, which is absent in 3.12.5.
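To make the failure mode concrete, here is a minimal, self-contained sketch. The type and function names (xattr_entry, pack_for_wire) are hypothetical and are not the actual GlusterFS dict_t/dict_to_xdr() implementation; the sketch only models how a typed dict-to-wire conversion can skip entries whose value type it does not recognise, while the enclosing setxattr request still goes out and succeeds:

/* Hypothetical model of the failure mode, NOT the real GlusterFS code. */
#include <stdio.h>

/* In 5.x every dict value carries an explicit type tag; data filled in
 * from an older peer's response may carry no recognised tag at all. */
typedef enum { TYPE_UNKNOWN = 0, TYPE_STR, TYPE_UINT64 } value_type_t;

typedef struct {
    const char  *key;
    value_type_t type;
    const char  *value;
} xattr_entry;

/* Convert in-memory entries to the on-wire list. Entries whose type the
 * converter does not recognise are skipped with a warning -- the request
 * itself still goes out and succeeds, minus those keys. */
static int pack_for_wire(const xattr_entry *in, int n_in,
                         xattr_entry *out, int out_cap)
{
    int n_out = 0;
    for (int i = 0; i < n_in; i++) {
        if (in[i].type == TYPE_UNKNOWN) {
            fprintf(stderr, "W: key '%s' is not sent on wire\n", in[i].key);
            continue;               /* silently dropped from the request */
        }
        if (n_out < out_cap)
            out[n_out++] = in[i];
    }
    return n_out;
}

int main(void)
{
    /* Dict assembled from a bulk getxattr against an old (3.12.x) brick:
     * the shard keys come back without a recognised value type. */
    xattr_entry from_old_brick[] = {
        { "trusted.gfid",                       TYPE_STR,     "3ad3f0c6..." },
        { "trusted.glusterfs.shard.file-size",  TYPE_UNKNOWN, "..."         },
        { "trusted.glusterfs.shard.block-size", TYPE_UNKNOWN, "..."         },
    };
    xattr_entry wire[3];

    int sent = pack_for_wire(from_old_brick, 3, wire, 3);
    printf("setxattr request carries %d of 3 keys\n", sent);  /* prints 1 */
    return 0;
}

In this model the setxattr call as a whole still returns success, which is exactly why the heal below was reported as completed even though the shard xattrs never reached the sinks.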
The bricks were upgraded to 5.3 in the order tendrl25 followed by tendrl26, with tendrl27 still at 3.12.5 when this issue was hit -

Tendrl25:
[2019-02-26 12:47:53.595647] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.3 (args: /usr/sbin/glusterfsd -s tendrl25.lab.eng.blr.redhat.com --volfile-id engine.tendrl25.lab.eng.blr.redhat.com.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/tendrl25.lab.eng.blr.redhat.com-gluster_bricks-engine-engine.pid -S /var/run/gluster/aae83600c9a783dd.socket --brick-name /gluster_bricks/engine/engine -l /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log --xlator-option *-posix.glusterd-uuid=9373b871-cfce-41ba-a815-0b330f6975c8 --process-name brick --brick-port 49153 --xlator-option engine-server.listen-port=49153)

Tendrl26:
[2019-02-26 13:35:05.718052] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.3 (args: /usr/sbin/glusterfsd -s tendrl26.lab.eng.blr.redhat.com --volfile-id engine.tendrl26.lab.eng.blr.redhat.com.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/tendrl26.lab.eng.blr.redhat.com-gluster_bricks-engine-engine.pid -S /var/run/gluster/8010384b5524b493.socket --brick-name /gluster_bricks/engine/engine -l /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log --xlator-option *-posix.glusterd-uuid=18fa886f-8d1a-427c-a5e6-9a4e9502ef7c --process-name brick --brick-port 49153 --xlator-option engine-server.listen-port=49153)

Tendrl27:
[root@tendrl27 bricks]# rpm -qa | grep gluster
glusterfs-fuse-3.12.15-1.el7.x86_64
glusterfs-libs-3.12.15-1.el7.x86_64
glusterfs-3.12.15-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64
glusterfs-client-xlators-3.12.15-1.el7.x86_64
glusterfs-api-3.12.15-1.el7.x86_64
glusterfs-events-3.12.15-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.4.x86_64
glusterfs-gnfs-3.12.15-1.el7.x86_64
glusterfs-geo-replication-3.12.15-1.el7.x86_64
glusterfs-cli-3.12.15-1.el7.x86_64
vdsm-gluster-4.20.46-1.el7.x86_64
python2-gluster-3.12.15-1.el7.x86_64
glusterfs-rdma-3.12.15-1.el7.x86_64

As per the metadata heal logs, the source was brick 0 (corresponding to tendrl27) and the sink was brick 2 (corresponding to tendrl25). This means step 1 of metadata heal did a getxattr on tendrl27, which was still at 3.12.5, and got back dicts in the old format without the "value" type (which is only introduced in 5.3). The same dict was then used for the setxattr in step 3, which silently failed to add the "trusted.glusterfs.shard.block-size" and "trusted.glusterfs.shard.file-size" xattrs to the setxattr request because of the dict_to_xdr() conversion failure in protocol/client, while the overall operation still succeeded. So AFR thought the heal succeeded although the xattrs that needed heal were never sent over the wire. This left one or more files with their shard xattrs removed on-disk, failing pretty much every subsequent operation on them.

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:29:29 UTC ---

So backward compatibility was broken with the introduction of the following patch - https://review.gluster.org/c/glusterfs/+/19098

commit 303cc2b54797bc5371be742543ccb289010c92f2
Author: Amar Tumballi <amarts>
Date: Fri Dec 22 13:12:42 2017 +0530

protocol: make on-wire-change of protocol using new XDR definition.
With this patchset, some major things are changed in XDR, mainly:

* Naming: Instead of gfs3/gfs4 settle for gfx_ for xdr structures
* add iattx as a separate structure, and add conversion methods
* the *_rsp structure is now changed, and is also reduced in number (ie, no need for different structures if it is similar to other response).
* use proper XDR methods for sending dict on wire.

Also, with the change of xdr structure, there are changes needed outside of xlator protocol layer to handle these properly. Mainly because the abstraction was broken to support 0-copy RDMA with payload for write and read FOP. This made transport layer know about the xdr payload, hence with the change of xdr payload structure, transport layer needed to know about the change.

Updates #384
Change-Id: I1448fbe9deab0a1b06cb8351f2f37488cefe461f
Signed-off-by: Amar Tumballi <amarts>

Any operation in a heterogeneous cluster which reads xattrs on-disk and subsequently writes them back (like metadata heal, for instance) will cause one or more on-disk xattrs to disappear. In fact, the logs suggest even the dht on-disk layouts vanished -

[2019-02-26 13:35:30.253348] I [MSGID: 109092] [dht-layout.c:744:dht_layout_dir_mismatch] 0-engine-dht: /36ea5b11-19fb-4755-b664-088f6e5c4df2: Disk layout missing, gfid = d0735acd-14ec-4ef9-8f5f-6a3c4ae12c08

--- Additional comment from Worker Ant on 2019-03-05 03:16:15 UTC ---

REVIEW: https://review.gluster.org/22300 (dict: handle STR_OLD data type in xdr conversions) posted (#1) for review on master by Amar Tumballi
REVIEW: https://review.gluster.org/22317 (dict: handle STR_OLD data type in xdr conversions) posted (#1) for review on release-6 by Amar Tumballi
REVIEW: https://review.gluster.org/22317 (dict: handle STR_OLD data type in xdr conversions) merged (#2) on release-6 by Shyamsundar Ranganathan
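Judging by its title, the merged fix handles the legacy STR_OLD data type during the xdr conversion instead of rejecting it. Continuing the hypothetical model from the earlier sketch (again, illustrative names only, not the actual GlusterFS dict code), the converter would map a legacy-typed entry onto the string type rather than dropping the key:

/* Hypothetical continuation of the earlier model, NOT the real fix. */
#include <stdio.h>

typedef enum { TYPE_LEGACY_STR = 0, TYPE_STR, TYPE_UINT64 } value_type_t;

typedef struct {
    const char  *key;
    value_type_t type;
    const char  *value;
} xattr_entry;

/* Treat values still carrying the legacy tag as plain strings instead of
 * skipping them, so xattrs read from an older brick survive the round trip. */
static int pack_for_wire(const xattr_entry *in, int n_in,
                         xattr_entry *out, int out_cap)
{
    int n_out = 0;
    for (int i = 0; i < n_in && n_out < out_cap; i++) {
        xattr_entry e = in[i];
        if (e.type == TYPE_LEGACY_STR)
            e.type = TYPE_STR;      /* convert, don't drop */
        out[n_out++] = e;
    }
    return n_out;
}

int main(void)
{
    xattr_entry from_old_brick[] = {
        { "trusted.glusterfs.shard.file-size",  TYPE_LEGACY_STR, "..." },
        { "trusted.glusterfs.shard.block-size", TYPE_LEGACY_STR, "..." },
    };
    xattr_entry wire[2];

    printf("setxattr request carries %d of 2 keys\n",
           pack_for_wire(from_old_brick, 2, wire, 2));  /* prints 2 */
    return 0;
}

With a change along these lines, a heal that sources its dict from an older-version brick no longer loses keys on the way to the sinks.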
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/