Description of problem:
=======================
Try to rename directories from multiple clients at the same time. E.g. from one
mount run 'mv <src> <dest>' and from another 'mv <src> <dest1>' (where dest and
dest1 do not hash to the same subvolume), or simply run 'mv <src> <dest>' from
one mount point and 'mv <src> <dest>' from another. If the renames are not done
one after the other, then due to the race, after the renames finish:
- the same gfid is assigned to different directories (at the same and/or
  different levels)
- files inside those directories are not listed on the mount, and sometimes
  even the directory itself is not listed (so the data is inaccessible)
- sometimes 'ls' output shows question marks in the attribute fields

Version-Release number:
=======================
3.6.0.24-1.el6rhs.x86_64

How reproducible:
=================
Intermittent

Steps to Reproduce:
===================
1. Create and mount a distributed volume (mount on multiple clients).
2. Create a few files and directories on the mount point:

   [root@OVM5 race]# mkdir src dest
   [root@OVM5 race]# touch src/f{1..5}

   (To reproduce the race, we put a breakpoint at dht_rename_hashed_dir_cbk
   for both rename operations.)
3. From one mount point execute 'mv src dest' and from the other mount point
   execute 'mv dest src'. Now continue execution from the breakpoint on both
   mount points.
4. Verify the data from another mount point and on the bricks.

Mount:

[root@OVM5 race]# ls -l
total 0
drwxr-xr-x 3 root root 69 Jul 10 12:32 src    <--- only src present
[root@OVM5 race]# mkdir dest
mkdir: cannot create directory `dest': File exists    <--------
[root@OVM5 race]# ls -l    <------- dest is not shown
total 0
drwxr-xr-x 3 root root 69 Jul 10 12:32 src
[root@OVM5 race]# touch src/f1
touch: cannot touch `src/f1': Input/output error
[root@OVM5 race]# ls -l src    <------- f1, f2 and f5 are not listed
total 0
drwxr-xr-x 3 root root 38 Jul 10 12:32 dest    <--- directories with the same name
drwxr-xr-x 3 root root 38 Jul 10 12:32 dest
-rw-r--r-- 1 root root  0 Jul 10 12:32 f3
-rw-r--r-- 1 root root  0 Jul 10 12:32 f4

Another mount point:

[root@OVM5 race]# ls -lR
.:
total 0
drwxr-xr-x 3 root root 69 Jul 10 12:32 src

./src:
total 0
drwxr-xr-x 3 root root 70 Jul 10 12:32 dest
drwxr-xr-x 3 root root 70 Jul 10 12:32 dest
-rw-r--r-- 1 root root  0 Jul 10 12:32 f3
-rw-r--r-- 1 root root  0 Jul 10 12:32 f4

./src/dest:
ls: cannot access ./src/dest/src: No such file or directory
ls: cannot access ./src/dest/src: No such file or directory
total 0
?????????? ? ? ? ? ? src    <--- '?' in the attribute fields
?????????? ? ? ? ? ? src

Bricks:

[root@OVM5 race]# tree /brick*/race/
/brick1/race/
├── dest
│   └── src
│       ├── f1
│       ├── f2
│       └── f5
└── src
    └── dest
/brick2/race/
└── src
    ├── dest
    │   └── src
    └── f4
/brick3/race/
├── dest
└── src
    ├── dest
    │   └── src
    └── f3

11 directories, 5 files

Actual results:
===============
- the same gfid is assigned to different directories (at the same and/or
  different levels)
- files inside those directories are not listed on the mount, and sometimes
  even the directory itself is not listed (so the data is inaccessible)
- sometimes 'ls' output shows question marks in the attribute fields
Expected results:
=================
- no two directories should have the same gfid
- whether both renames fail, both succeed, or one succeeds and the other
  fails, all files inside those directories should remain accessible from
  the mount point
Fixed by https://code.engineering.redhat.com/gerrit/71596
bz 1092510 happens while taking snapshots. Though the symptoms are similar, this issue has a different RCA than bz 1092510.
bz 1118770 is still in POST.
Two tests were run to verify the fix, with a breakpoint on dht_rename_hashed_dir_cbk from two mount points:

1) 'mv src dst' and 'mv src dst1' --> Passed
2) 'mv src dst' and 'mv dst src' --> Failed

In test 1, the second operation fails as expected. After test 2, only 'dst' was listed from the mount point; however, 'src' was still seen under 'dst' on the backend bricks. This leaves the directories in an inconsistent state. Moving the bug back to Assigned.
https://code.engineering.redhat.com/gerrit/#/c/74661/ addresses the issue that caused FailedQA
BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA
The issue reported in the bug is still seen with build glusterfs-3.7.9-6. The same steps updated in 'Steps to Reproduce' were used, i.e., with a breakpoint on dht_rename_hashed_dir_cbk from two mount points, 'mv src dst' and 'mv dst src' were run. From the mount point only the dst directory was seen; however, from the backend, dst/src was seen. Moving the bug back to Assigned.
The issue is not consistently reproducible as it was in earlier builds. However, with the same steps, the following issue is seen.

Steps:
- mv abcd dcba
- mv dcba abcd

From the mount point, only dir 'abcd' is seen; 'dcba' is not seen from the mount point:

[root@dhcp46-9 yamaha]# ll
total 20582408
drwxr-xr-x. 3 root root 4096 May 24 13:56 abcd
[root@dhcp46-9 yamaha]# ll abcd/
total 0

However, from the backend, the following directory structure is seen on all the bricks:

[root@dhcp46-103 ~]# ll /bricks/brick0/yamaha/abcd/
total 0
drwxr-xr-x. 2 root root 6 May 24 08:24 dcba

The directory structure seems to be broken:

[root@dhcp47-171 glusterfs]# getfattr -d -m . -e hex /bricks/brick0/yamaha/abcd/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/yamaha/abcd/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xfa38d55bedd1400d8f2504d766bd372a
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc

[root@dhcp47-171 glusterfs]# getfattr -d -m . -e hex /bricks/brick0/yamaha/abcd/dcba/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/yamaha/abcd/dcba/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x1f8b735b340447f68f35f3deac835b12
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@dhcp47-171 glusterfs]# ls -l /bricks/brick0/yamaha/.glusterfs/1f/8b
total 0
lrwxrwxrwx. 1 root root 53 May 24 08:26 1f8b735b-3404-47f6-8f35-f3deac835b12 -> ../../fa/38/fa38d55b-edd1-400d-8f25-04d766bd372a/dcba

Brick logs shall be uploaded.
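The trusted.glusterfs.dht values above encode each brick's hash range. A minimal decoder sketch, assuming the on-disk value is four big-endian 32-bit fields (count, a type/commit-hash field, range start, range stop) -- this field interpretation is an assumption, not something confirmed by the output above:

```python
import struct

def decode_dht_layout(hex_value):
    """Split a trusted.glusterfs.dht xattr value into four big-endian
    32-bit fields. Field meanings here are an assumption:
    (count, type-or-commit-hash, hash range start, hash range stop)."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    cnt, typ, start, stop = struct.unpack(">IIII", raw)
    return {"cnt": cnt, "type": typ, "start": start, "stop": stop}

# The two values seen on the bricks above:
abcd = decode_dht_layout("0x00000001000000007ffffffebffffffc")
dcba = decode_dht_layout("0x0000000100000000bffffffdffffffff")
print("abcd range: %#010x - %#010x" % (abcd["start"], abcd["stop"]))
print("dcba range: %#010x - %#010x" % (dcba["start"], dcba["stop"]))
```

Under that assumption, abcd covers 0x7ffffffe-0xbffffffc and dcba covers 0xbffffffd-0xffffffff on this brick.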
On two of the bricks, the gfid handle <brick-export>/.glusterfs/fa/38/fa38d55b-edd1-400d-8f25-04d766bd372a was not present (ls said ENOENT). One of those bricks was the hashed subvolume of directory /mnt/glusterfs/abcd/dcba, so the stat sent along with dentry "dcba" from the hashed subvolume was invalid (zeroed out). This resulted in readdirp not listing the directory, and hence rmdir never came on "dcba". When rm crawled up to the parent directory "abcd", rmdir failed with ENOTEMPTY, as "dcba" was present on all bricks.

I assume the gfid handle for fa38d55b-edd1-400d-8f25-04d766bd372a was absent due to a race in handling gfid handles during rename at the brick. I will confirm the hypothesis once I figure out the exact sequence of steps that might result in such a loss of the handle.
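The handle path discussed above follows the .glusterfs/<first two hex chars>/<next two>/<full gfid> scheme visible in the brick listing in the previous comment. A small sketch of that path derivation (the helper name is made up, not GlusterFS's actual code):

```python
import os

def gfid_handle_path(brick_export, gfid):
    """Compute the .glusterfs handle path for a gfid, following the
    .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid> scheme seen on the bricks.
    (Hypothetical helper for illustration only.)"""
    g = gfid.lower()
    return os.path.join(brick_export, ".glusterfs", g[0:2], g[2:4], g)

p = gfid_handle_path("/bricks/brick0/yamaha",
                     "fa38d55b-edd1-400d-8f25-04d766bd372a")
print(p)
```

For a directory, this path is a symlink into the parent's handle directory, which is why a missing handle leaves the brick unable to fill in the stat for the dentry.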
From one of the bricks on which the handle fa38d55b-edd1-400d-8f25-04d766bd372a is missing, I could see the following error messages:

[2016-05-24 08:26:22.422505] E [MSGID: 113071] [posix.c:2414:posix_rename] 0-yamaha-posix: rename of /bricks/brick0/yamaha/abcd to /bricks/brick0/yamaha/abcd/dcba/abcd failed [Invalid argument]
[2016-05-24 08:26:22.422670] W [MSGID: 113001] [posix.c:2464:posix_rename] 0-yamaha-posix: modification of parent gfid xattr failed (gfid:fa38d55b-edd1-400d-8f25-04d766bd372a)
[2016-05-24 08:26:22.422745] I [MSGID: 115061] [server-rpc-fops.c:1032:server_rename_cbk] 0-yamaha-server: 385: RENAME /abcd (00000000-0000-0000-0000-000000000000/abcd) -> /abcd/dcba/abcd (00000000-0000-0000-0000-000000000000/abcd) ==> (Invalid argument) [Invalid argument]

In posix_rename (storage/posix), for rename (src, dst) we:
1. unset the gfid handle of src
2. call sys_rename
3. on failure, unwind

As can be seen, when sys_rename fails, the gfid handle of src is never recreated. The rename logs above indicate a sys_rename failure with fa38d55b-edd1-400d-8f25-04d766bd372a as src. So, incomplete failure handling in posix_rename is the root cause of this issue.

Please note that this issue can happen even on a single-brick, non-dht setup. So, I assume we can file a new bug for it (also, this is a failure-path issue, unlike this bug, which uncovers inconsistencies even in the success path).
A new bug has been raised based on comment#20 https://bugzilla.redhat.com/show_bug.cgi?id=1339501
Moving this bug to Verified, as the issue in comment#17 turns out to have a different root cause and a different bug has been filed for it. The actual issue found by this bug is no longer seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240