Bug 1686364 - [ovirt-gluster] Rolling gluster upgrade from 3.12.5 to 5.3 led to shard on-disk xattrs disappearing
Summary: [ovirt-gluster] Rolling gluster upgrade from 3.12.5 to 5.3 led to shard on-disk xattrs disappearing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1684385
Blocks: glusterfs-6.0 1732875
 
Reported: 2019-03-07 10:38 UTC by Amar Tumballi
Modified: 2019-07-24 15:04 UTC
CC List: 4 users

Fixed In Version: glusterfs-6.0
Clone Of: 1684385
Environment:
Last Closed: 2019-03-08 14:09:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links:
Gluster.org Gerrit 22317 (Merged): dict: handle STR_OLD data type in xdr conversions - Last Updated: 2019-03-08 14:09:18 UTC

Description Amar Tumballi 2019-03-07 10:38:51 UTC
+++ This bug was initially created as a clone of Bug #1684385 +++

Description of problem:

When the gluster bits in a hyperconverged ovirt-gluster setup were upgraded online, one node at a time, from 3.12.5 to 5.3, the following log messages were seen -

[2019-02-26 16:24:25.126963] E [shard.c:556:shard_modify_size_and_block_count] (-->/usr/lib64/glusterfs/5.3/xlator/cluster/distribute.so(+0x82a45) [0x7ff71d05ea45] -->/usr/lib64/glusterfs/5.3/xlator/features/shard.so(+0x5c77) [0x7ff71cdb4c77] -->/usr/lib64/glusterfs/5.3/xlator/features/shard.so(+0x592e) [0x7ff71cdb492e] ) 0-engine-shard: Failed to get trusted.glusterfs.shard.file-size for 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
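
For context, here is a standalone, simplified sketch of the kind of check that produces the error above: the shard translator looks up its file-size xattr in the dict it gets back and has to fail the operation when the key is absent. All types and helper names below are illustrative stand-ins, not the actual shard.c code or the real GlusterFS dict API.

/* Simplified, self-contained sketch of the failure mode logged above. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

struct xattr_entry {
    const char *key;
    const void *value;
    size_t len;
};

struct xattr_dict {
    struct xattr_entry *entries;
    size_t count;
};

/* Hypothetical lookup helper standing in for the dict get routines. */
static const void *
dict_lookup(const struct xattr_dict *d, const char *key, size_t *len)
{
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->entries[i].key, key) == 0) {
            *len = d->entries[i].len;
            return d->entries[i].value;
        }
    }
    return NULL;
}

static int
shard_get_file_size(const struct xattr_dict *xattrs, const char *gfid)
{
    size_t len = 0;
    const void *size_attr =
        dict_lookup(xattrs, "trusted.glusterfs.shard.file-size", &len);

    if (!size_attr) {
        /* Corresponds to the E-level message in the log above. */
        fprintf(stderr,
                "Failed to get trusted.glusterfs.shard.file-size for %s\n",
                gfid);
        return -ENODATA;
    }
    /* ... otherwise parse the real file size out of size_attr ... */
    return 0;
}

int main(void)
{
    /* An xattr dict with the shard keys missing, as on the affected file. */
    struct xattr_dict missing = { NULL, 0 };
    return shard_get_file_size(&missing,
                               "3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7") ? 1 : 0;
}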


Version-Release number of selected component (if applicable):


How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
The trusted.glusterfs.shard.file-size xattr should always be accessible.

Additional info:

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:13:48 UTC ---

[root@tendrl25 glusterfs]# gluster v info engine
 
Volume Name: engine
Type: Replicate
Volume ID: bb26f648-2842-4182-940e-6c8ede02195f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: tendrl27.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Brick2: tendrl26.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Brick3: tendrl25.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: enable

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:23:02 UTC ---

On further investigation, it was found that the shard xattrs were genuinely missing on all 3 replicas -

[root@tendrl27 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473

[root@localhost ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x0000000e0000000000000000
trusted.afr.engine-client-2=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473

[root@tendrl25 ~]# getfattr -d -m . -e hex /gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
getfattr: Removing leading '/' from absolute path names
# file: gluster_bricks/engine/engine/36ea5b11-19fb-4755-b664-088f6e5c4df2/dom_md/ids
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-0=0x000000100000000000000000
trusted.afr.engine-client-1=0x000000000000000000000000
trusted.gfid=0x3ad3f0c6a4e64b17bd2997c32ecc54d7
trusted.gfid2path.5f2a4f417210b896=0x64373265323737612d353761642d343136322d613065332d6339346463316231366230322f696473


Also from the logs, it appears the file underwent metadata self-heal moments before these errors started to appear -
[2019-02-26 13:35:37.253896] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
[2019-02-26 13:35:37.254734] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.file-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.254749] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.block-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.255777] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7. sources=[0]  sinks=2
[2019-02-26 13:35:37.258032] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7
[2019-02-26 13:35:37.258792] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.file-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.258807] W [MSGID: 101016] [glusterfs3.h:752:dict_to_xdr] 0-dict: key 'trusted.glusterfs.shard.block-size' is not sent on wire [Invalid argument]
[2019-02-26 13:35:37.259633] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on 3ad3f0c6-a4e6-4b17-bd29-97c32ecc54d7. sources=[0]  sinks=2 


Metadata heal, as we know, does three things: 1. bulk getxattr from the source brick; 2. removexattr on the sink bricks; 3. bulk setxattr on the sink bricks.

But what is clear from these logs is that the dict_to_xdr() warnings appear at the time of the metadata heal, indicating that the shard xattrs were possibly not "sent on wire" as part of step 3.
This turns out to be due to the newly introduced dict_to_xdr() code in 5.3, which is absent in 3.12.5. A simplified sketch of this drop-on-conversion behavior follows below.
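
To illustrate the mechanism (a standalone sketch with made-up types, not the actual dict_to_xdr() code in glusterfs3.h): a serializer that switches on a per-value type tag will quietly drop any entry whose tag it does not recognize, while the conversion as a whole still reports success. That matches the "is not sent on wire [Invalid argument]" warnings for values that were fetched from the 3.12.5 peer.

/* Standalone sketch: entries whose type tag the converter does not
 * handle are skipped with only a warning, yet the conversion as a
 * whole still "succeeds", so callers never see an error. */
#include <stdio.h>

enum data_type {
    TYPE_INT,
    TYPE_STR,       /* new-style string type set by 5.x peers */
    TYPE_STR_OLD    /* legacy tag on values that came from a 3.12 peer */
};

struct dict_entry {
    const char *key;
    enum data_type type;
    const char *value;
};

/* Convert entries to their on-wire form; returns the number sent. */
static int dict_to_wire(const struct dict_entry *entries, int count)
{
    int sent = 0;
    for (int i = 0; i < count; i++) {
        switch (entries[i].type) {
        case TYPE_INT:
        case TYPE_STR:
            /* ... serialize entries[i] ... */
            sent++;
            break;
        default:
            /* Unhandled legacy tag: the key is quietly dropped. */
            fprintf(stderr, "key '%s' is not sent on wire [Invalid argument]\n",
                    entries[i].key);
            break;
        }
    }
    return sent;   /* caller treats any return >= 0 as success */
}

int main(void)
{
    /* Values fetched from the 3.12.5 brick carry the legacy tag. */
    struct dict_entry heal_xattrs[] = {
        { "trusted.glusterfs.shard.file-size",  TYPE_STR_OLD, "..." },
        { "trusted.glusterfs.shard.block-size", TYPE_STR_OLD, "..." },
    };
    dict_to_wire(heal_xattrs, 2);   /* both keys dropped, no error returned */
    return 0;
}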

The bricks were upgraded to 5.3 in the order tendrl25 followed by tendrl26, with tendrl27 still at 3.12.5 when this issue was hit -

Tendrl25:
[2019-02-26 12:47:53.595647] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.3 (args: /usr/sbin/glusterfsd -s tendrl25.lab.eng.blr.redhat.com --volfile-id engine.tendrl25.lab.eng.blr.redhat.com.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/tendrl25.lab.eng.blr.redhat.com-gluster_bricks-engine-engine.pid -S /var/run/gluster/aae83600c9a783dd.socket --brick-name /gluster_bricks/engine/engine -l /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log --xlator-option *-posix.glusterd-uuid=9373b871-cfce-41ba-a815-0b330f6975c8 --process-name brick --brick-port 49153 --xlator-option engine-server.listen-port=49153)


Tendrl26:
[2019-02-26 13:35:05.718052] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 5.3 (args: /usr/sbin/glusterfsd -s tendrl26.lab.eng.blr.redhat.com --volfile-id engine.tendrl26.lab.eng.blr.redhat.com.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/tendrl26.lab.eng.blr.redhat.com-gluster_bricks-engine-engine.pid -S /var/run/gluster/8010384b5524b493.socket --brick-name /gluster_bricks/engine/engine -l /var/log/glusterfs/bricks/gluster_bricks-engine-engine.log --xlator-option *-posix.glusterd-uuid=18fa886f-8d1a-427c-a5e6-9a4e9502ef7c --process-name brick --brick-port 49153 --xlator-option engine-server.listen-port=49153)

Tendrl27:
[root@tendrl27 bricks]# rpm -qa | grep gluster
glusterfs-fuse-3.12.15-1.el7.x86_64
glusterfs-libs-3.12.15-1.el7.x86_64
glusterfs-3.12.15-1.el7.x86_64
glusterfs-server-3.12.15-1.el7.x86_64
glusterfs-client-xlators-3.12.15-1.el7.x86_64
glusterfs-api-3.12.15-1.el7.x86_64
glusterfs-events-3.12.15-1.el7.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.4.x86_64
glusterfs-gnfs-3.12.15-1.el7.x86_64
glusterfs-geo-replication-3.12.15-1.el7.x86_64
glusterfs-cli-3.12.15-1.el7.x86_64
vdsm-gluster-4.20.46-1.el7.x86_64
python2-gluster-3.12.15-1.el7.x86_64
glusterfs-rdma-3.12.15-1.el7.x86_64

And as per the metadata heal logs, the source was brick 0 (corresponding to tendrl27) and the sink was brick 2 (corresponding to tendrl25).
This means step 1 of the metadata heal did a getxattr on tendrl27, which was still at 3.12.5, and got back the dict in the old format, which lacks the "value" type information introduced in 5.3.
This same dict was then used for the setxattr in step 3. Because of the dict_to_xdr() conversion failure in protocol/client, the "trusted.glusterfs.shard.block-size" and "trusted.glusterfs.shard.file-size" keys were silently dropped from the setxattr request, yet the operation as a whole succeeded. So AFR considered the heal successful even though the xattrs that needed healing were never sent over the wire. This left one or more files with their shard xattrs removed on-disk, causing pretty much every subsequent operation on them to fail. The sketch after this paragraph walks through that sequence.
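
For completeness, here is a standalone sketch (again with illustrative stand-in types, not the real AFR or protocol/client code) of why the heal reports success while losing the xattrs: the setxattr in step 3 only ever receives whatever survived the wire conversion, and it returns 0, so the healer clears its pending markers even though the shard keys never made it across.

/* Why the heal "succeeds" while the shard xattrs are lost. */
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 8

struct xdict {
    const char *keys[MAX_KEYS];
    int count;
};

/* Step 1: bulk getxattr from the source brick (3.12.5). */
static struct xdict getxattr_source(void)
{
    struct xdict d = { { "trusted.glusterfs.shard.file-size",
                         "trusted.glusterfs.shard.block-size",
                         "user.foo" }, 3 };
    return d;
}

/* Wire conversion: legacy-typed keys are dropped (see sketch above). */
static struct xdict wire_convert(const struct xdict *in)
{
    struct xdict out = { { 0 }, 0 };
    for (int i = 0; i < in->count; i++) {
        if (strncmp(in->keys[i], "trusted.glusterfs.shard.", 24) == 0)
            continue;   /* dropped; only a W-level log is emitted */
        out.keys[out.count++] = in->keys[i];
    }
    return out;
}

/* Step 3: setxattr on the sink; it stores whatever it was given. */
static int setxattr_sink(const struct xdict *d)
{
    printf("sink stored %d xattr(s)\n", d->count);
    return 0;   /* success, even though the shard keys are missing */
}

int main(void)
{
    struct xdict src = getxattr_source();
    /* Step 2 (removexattr on the sink) wipes the old values first. */
    struct xdict onwire = wire_convert(&src);
    if (setxattr_sink(&onwire) == 0)
        printf("heal marked as completed; shard xattrs are gone on disk\n");
    return 0;
}

The linked Gerrit change ("dict: handle STR_OLD data type in xdr conversions") presumably closes this gap by teaching the conversion to handle the legacy type tag instead of dropping such keys.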

--- Additional comment from Krutika Dhananjay on 2019-03-01 07:29:29 UTC ---

So backward compatibility was broken with the introduction of the following patch -

https://review.gluster.org/c/glusterfs/+/19098

commit 303cc2b54797bc5371be742543ccb289010c92f2
Author: Amar Tumballi <amarts>
Date:   Fri Dec 22 13:12:42 2017 +0530

    protocol: make on-wire-change of protocol using new XDR definition.
    
    With this patchset, some major things are changed in XDR, mainly:
    
    * Naming: Instead of gfs3/gfs4 settle for gfx_ for xdr structures
    * add iattx as a separate structure, and add conversion methods
    * the *_rsp structure is now changed, and is also reduced in number
      (ie, no need for different structures if it is similar to other response).
    * use proper XDR methods for sending dict on wire.
    
    Also, with the change of xdr structure, there are changes needed
    outside of xlator protocol layer to handle these properly. Mainly
    because the abstraction was broken to support 0-copy RDMA with payload
    for write and read FOP. This made transport layer know about the xdr
    payload, hence with the change of xdr payload structure, transport layer
    needed to know about the change.
    
    Updates #384
    
    Change-Id: I1448fbe9deab0a1b06cb8351f2f37488cefe461f
    Signed-off-by: Amar Tumballi <amarts>


Any operation in a heterogeneous cluster which reads xattrs on-disk and subsequently writes them back (like metadata heal, for instance) will cause one or more on-disk xattrs to disappear.

In fact, the logs suggest even the dht on-disk layouts vanished -

[2019-02-26 13:35:30.253348] I [MSGID: 109092] [dht-layout.c:744:dht_layout_dir_mismatch] 0-engine-dht: /36ea5b11-19fb-4755-b664-088f6e5c4df2: Disk layout missing, gfid = d0735acd-14ec-4ef9-8f5f-6a3c4ae12c08

--- Additional comment from Worker Ant on 2019-03-05 03:16:15 UTC ---

REVIEW: https://review.gluster.org/22300 (dict: handle STR_OLD data type in xdr conversions) posted (#1) for review on master by Amar Tumballi

Comment 1 Worker Ant 2019-03-07 10:41:44 UTC
REVIEW: https://review.gluster.org/22317 (dict: handle STR_OLD data type in xdr conversions) posted (#1) for review on release-6 by Amar Tumballi

Comment 2 Worker Ant 2019-03-08 14:09:19 UTC
REVIEW: https://review.gluster.org/22317 (dict: handle STR_OLD data type in xdr conversions) merged (#2) on release-6 by Shyamsundar Ranganathan

Comment 3 Shyamsundar 2019-03-25 16:33:26 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

