Bug 1566579

Summary: DHT Layout is missing on few bricks of a disperse sub-vol when rm -rf and mkdir are run in parallel
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasad Desala <tdesala>
Component: disperse
Assignee: Ashish Pandey <aspandey>
Status: CLOSED NOTABUG
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: amukherj, nbalacha, rhs-bugs, sheggodu, storage-qa-internal, tdesala, ubansal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Flags: tdesala: needinfo-
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-11-18 11:12:31 UTC
Type: Bug
Attachments:
Description: getfattr output of foo directory from all bricks (Flags: none)

Description Prasad Desala 2018-04-12 14:51:37 UTC
Description of problem:
=======================
The DHT layout is missing on a few bricks of a disperse sub-vol when rm -rf and mkdir are run in parallel from multiple clients.

getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/ec-b1/foo
trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
trusted.glusterfs.dht.mds=0x00000000

getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
getfattr: /bricks/brick1/ec-b1/foo: No such file or directory  ---> layout missing on this brick

getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/ec-b1/foo
trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
trusted.glusterfs.dht.mds=0x00000000

getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/ec-b1/foo
trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
trusted.glusterfs.dht.mds=0x00000000

getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/ec-b1/foo
trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
trusted.glusterfs.dht.mds=0x00000000

getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick1/ec-b1/foo
trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
trusted.glusterfs.dht.mds=0x00000000
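
To make the layout values above easier to compare across bricks, the hex string can be split into its four 32-bit words. This is a minimal sketch, not part of the original report. The function name decode_dht is hypothetical, and reading the last two words as the start and stop of the hash range is an assumption about the on-disk format that the report itself does not state.

```shell
# decode_dht (hypothetical helper): split a trusted.glusterfs.dht value
# into four 32-bit words. Labelling words 3 and 4 as the hash range
# start/stop is an assumption, not something confirmed by the report.
decode_dht() {
    hex=${1#0x}                      # drop the leading 0x if present
    if [ "${#hex}" -ne 32 ]; then
        echo "expected 32 hex digits, got ${#hex}" >&2
        return 1
    fi
    w0=$(printf '%s' "$hex" | cut -c1-8)
    w1=$(printf '%s' "$hex" | cut -c9-16)
    w2=$(printf '%s' "$hex" | cut -c17-24)
    w3=$(printf '%s' "$hex" | cut -c25-32)
    echo "word0=0x$w0 word1=0x$w1 start=0x$w2 stop=0x$w3"
}
```

For the value shown on the bricks above, decode_dht 0x00000001000000006ffffffc7ffffffa prints word0=0x00000001 word1=0x00000000 start=0x6ffffffc stop=0x7ffffffa.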


Version-Release number of selected component (if applicable):
3.12.2-7.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a Distributed-Disperse volume and start it.
2) FUSE mount it on multiple clients.
3) Create the following directory structure:
mkdir -p foo/bar/goo
4) Run rm -rf * and mkdir foo at the same time:
Client-1: rm -rf *
Client-2: mkdir foo
Both commands must be started at the same moment.
After they have finished, run "mkdir foo" repeatedly from the client until it succeeds.
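
The steps above can be sketched as a small shell function. This is only an approximation under stated assumptions: the function name race_once is hypothetical, its two arguments are assumed to be two mount points of the same volume, and running both commands from one host is a simplification of the two-client setup in the report. The cap of 50 retries is arbitrary.

```shell
# race_once MNT1 MNT2 (hypothetical helper)
# MNT1 and MNT2 are assumed to be mount points of the same volume.
race_once() {
    mnt1=$1
    mnt2=$2
    # Step 3: create the nested directory structure.
    mkdir -p "$mnt1/foo/bar/goo" || return 1
    # Step 4: start both operations together, then wait for both.
    ( cd "$mnt1" && rm -rf -- * ) &
    ( cd "$mnt2" && mkdir foo ) 2>/dev/null &
    wait
    # Retry mkdir until foo exists; give up after 50 attempts.
    tries=0
    until mkdir "$mnt2/foo" 2>/dev/null || [ -d "$mnt2/foo" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 50 ] && return 1
        sleep 1
    done
    return 0
}
```

Calling race_once in a loop with the two mount points corresponds to the "after some iterations" wording in the results below; the brick xattrs can then be inspected with getfattr after each round.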

Actual results:
===============
After some iterations:
--> Layout is missing on a few bricks of the disperse sub-vol
--> rm -rf foo fails with an Input/output error:
rm: cannot remove ‘foo’: Input/output error

Expected results:
=================
The layout should be present on all the bricks of the disperse sub-vol.

Comment 5 Prasad Desala 2018-04-12 16:53:56 UTC
Created attachment 1420954 [details]
getfattr output of foo directory from all bricks

Comment 12 Ashish Pandey 2018-04-16 11:43:08 UTC
(In reply to Prasad Desala from comment #0)
> Description of problem:
> =======================
> DHT Layout is missing on few bricks of a disperse sub-vol when rm -rf and
> mkdir are run in parallel from multiple clients.
> 
> getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
> getfattr: Removing leading '/' from absolute path names
> # file: bricks/brick1/ec-b1/foo
> trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
> trusted.glusterfs.dht.mds=0x00000000
> 
> getfattr -d -e hex -m glusterfs.dht /bricks/brick1/ec-b1/foo
> getfattr: /bricks/brick1/ec-b1/foo: No such file or directory  ---> layout
> missing on this brick

Just a note:

This directory is also present on the remaining brick and has a layout.
# file: bricks/brick0/ec1-b1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.ec.version=0x00000000000001e7000000000000024a
trusted.gfid=0x3749c883f72b4e9ab7ed214729c326ab
trusted.glusterfs.dht=0x00000001000000006ffffffc7ffffffa
trusted.glusterfs.dht.mds=0x00000000

This is the only *brick* in this volume that has a different path.

Its path is bricks/brick0/ec1-b1/foo, while it should be bricks/brick1/ec-b1/foo.

That is why it did not show up when we used brick*/ec-b* to get the xattrs.
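
One way to avoid missing a brick because of a path glob is to take the brick paths from the volume definition instead. The following is a sketch under assumptions: the function name brick_paths is hypothetical, and it expects the "BrickN: host:/path" lines that gluster volume info prints on its standard input.

```shell
# brick_paths (hypothetical helper): read "gluster volume info" output on
# stdin and print the path part of every "BrickN: host:/path" line.
# The "Bricks:" heading line is skipped because it has no brick number.
brick_paths() {
    sed -n 's|^Brick[0-9][0-9]*: [^:]*:\(/.*\)$|\1|p'
}
```

A possible use on each node, with VOLNAME standing in for the real volume name, is to pipe gluster volume info VOLNAME into brick_paths and run getfattr -d -e hex -m glusterfs.dht on "$path/foo" for every printed path. Paths belonging to bricks on other nodes will simply not exist locally.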







Comment 19 Atin Mukherjee 2018-11-09 11:13:39 UTC
Has this been hit during RHGS 3.4 regression testing? If not, can this be closed, please?