Description of problem:
=======================
Try to rename directories from multiple clients at the same time. E.g. from one
mount run 'mv <src> <dest>' and from another 'mv <src> <dest1>' (where dest and
dest1 do not hash to the same subvolume), or simply run 'mv <src> <dest>' from
one mount point and 'mv <src> <dest>' from another. If the renames are not done
one after the other, then due to the race, after the renames finish:
- the same gfid is assigned to different directories (at the same and/or
  different levels)
- files inside those directories are not listed on the mount, and sometimes
  even the directory itself is not listed (so the data is inaccessible)
- sometimes 'ls' output shows question marks in the attribute fields

Version-Release number:
=======================
3.6.0.24-1.el6rhs.x86_64

How reproducible:
=================
Intermittent

Steps to Reproduce:
===================
1. Create and mount a distributed volume (mount on multiple clients).
2. Create a few files and directories on the mount point:

   [root@OVM5 race]# mkdir src dest
   [root@OVM5 race]# touch src/f{1..5}

   (To reproduce the race, we put a breakpoint at dht_rename_hashed_dir_cbk
   for both rename operations.)
3. From one mount point execute 'mv src dest' and from the other mount point
   execute 'mv dest src'. Now continue execution from the breakpoint on both
   mount points.
4. Verify the data from another mount point and on the bricks.

Mount:

[root@OVM5 race]# ls -l
total 0
drwxr-xr-x 3 root root 69 Jul 10 12:32 src    <--- only src present
[root@OVM5 race]# mkdir dest
mkdir: cannot create directory `dest': File exists    <--------
[root@OVM5 race]# ls -l    <------- dest is not shown
total 0
drwxr-xr-x 3 root root 69 Jul 10 12:32 src
[root@OVM5 race]# touch src/f1
touch: cannot touch `src/f1': Input/output error
[root@OVM5 race]# ls -l src    <------- f1, f2 and f5 are not listed
total 0
drwxr-xr-x 3 root root 38 Jul 10 12:32 dest    <--- directories with the same name
drwxr-xr-x 3 root root 38 Jul 10 12:32 dest
-rw-r--r-- 1 root root  0 Jul 10 12:32 f3
-rw-r--r-- 1 root root  0 Jul 10 12:32 f4

Another mount point:

[root@OVM5 race]# ls -lR
.:
total 0
drwxr-xr-x 3 root root 69 Jul 10 12:32 src

./src:
total 0
drwxr-xr-x 3 root root 70 Jul 10 12:32 dest
drwxr-xr-x 3 root root 70 Jul 10 12:32 dest
-rw-r--r-- 1 root root  0 Jul 10 12:32 f3
-rw-r--r-- 1 root root  0 Jul 10 12:32 f4

./src/dest:
ls: cannot access ./src/dest/src: No such file or directory
ls: cannot access ./src/dest/src: No such file or directory
total 0
?????????? ? ? ? ? ? src    <--- '?' in the attribute fields
?????????? ? ? ? ? ? src

Bricks:

[root@OVM5 race]# tree /brick*/race/
/brick1/race/
├── dest
│   └── src
│       ├── f1
│       ├── f2
│       └── f5
└── src
    └── dest
/brick2/race/
└── src
    ├── dest
    │   └── src
    └── f4
/brick3/race/
├── dest
└── src
    ├── dest
    │   └── src
    └── f3

11 directories, 5 files

Actual results:
===============
- the same gfid is assigned to different directories (at the same and/or
  different levels)
- files inside those directories are not listed on the mount, and sometimes
  even the directory itself is not listed (so the data is inaccessible)
- sometimes 'ls' output shows question marks in the attribute fields
Expected results:
=================
- no two directories should have the same gfid
- whether both renames fail, both succeed, or one succeeds and the other
  fails, all files inside those directories should remain accessible from
  the mount point
Fixed by https://code.engineering.redhat.com/gerrit/71596
bz 1092510 happens while taking snapshots. Though the symptoms are similar, this issue has a different RCA than bz 1092510.
bz 1118770 is still in POST.
Two tests were run to verify the fix, with a breakpoint on dht_rename_hashed_dir_cbk from two mount points:

1) 'mv src dst' and 'mv src dst1' --> Passed
2) 'mv src dst' and 'mv dst src' --> Failed

In test 1, the second operation fails as expected. After test 2, only 'dst' was listed from the mount point; however, 'src' was still seen under 'dst' on the backend bricks. This leaves the directories in an inconsistent state. Moving the bug back to Assigned.
https://code.engineering.redhat.com/gerrit/#/c/74661/ addresses the issue that caused FailedQA
BZ added to RHEL 6 Errata for RHGS 3.1.3 ... moving to ON_QA
The issue reported in the bug is still seen with build glusterfs-3.7.9-6. The same steps updated in 'Steps to Reproduce' were used, i.e., with a breakpoint on dht_rename_hashed_dir_cbk from two mount points, 'mv src dst' and 'mv dst src' were run. From the mount point only the dst directory was seen; however, from the backend, dst/src was seen. Moving the bug back to Assigned.
The issue is not consistently reproducible as it was in earlier builds. However, with the same steps, the following issue is seen.

Steps:
- mv abcd dcba
- mv dcba abcd

From the mount point, only dir 'abcd' is seen; 'dcba' is not seen from the mount point:

[root@dhcp46-9 yamaha]# ll
total 20582408
drwxr-xr-x. 3 root root 4096 May 24 13:56 abcd
[root@dhcp46-9 yamaha]# ll abcd/
total 0

However, from the backend, the following directory structure is seen on all the bricks:

[root@dhcp46-103 ~]# ll /bricks/brick0/yamaha/abcd/
total 0
drwxr-xr-x. 2 root root 6 May 24 08:24 dcba

The directory structure seems to be broken:

[root@dhcp47-171 glusterfs]# getfattr -d -m . -e hex /bricks/brick0/yamaha/abcd/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/yamaha/abcd/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xfa38d55bedd1400d8f2504d766bd372a
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc

[root@dhcp47-171 glusterfs]# getfattr -d -m . -e hex /bricks/brick0/yamaha/abcd/dcba/
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick0/yamaha/abcd/dcba/
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0x1f8b735b340447f68f35f3deac835b12
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff

[root@dhcp47-171 glusterfs]# ls -l /bricks/brick0/yamaha/.glusterfs/1f/8b
total 0
lrwxrwxrwx. 1 root root 53 May 24 08:26 1f8b735b-3404-47f6-8f35-f3deac835b12 -> ../../fa/38/fa38d55b-edd1-400d-8f25-04d766bd372a/dcba

Brick logs shall be uploaded.
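The trusted.glusterfs.dht values above encode each brick's hash range. A minimal decoder sketch, assuming the on-disk value is four big-endian 32-bit fields (count, a type/commit-hash field, range start, range stop) -- this field interpretation is an assumption, not something confirmed by the output above:

```python
import struct

def decode_dht_layout(hex_value):
    """Split a trusted.glusterfs.dht xattr value into four big-endian
    32-bit fields. Field meanings here are an assumption:
    (count, type-or-commit-hash, hash range start, hash range stop)."""
    raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
    cnt, typ, start, stop = struct.unpack(">IIII", raw)
    return {"cnt": cnt, "type": typ, "start": start, "stop": stop}

# The two values seen on the bricks above:
abcd = decode_dht_layout("0x00000001000000007ffffffebffffffc")
dcba = decode_dht_layout("0x0000000100000000bffffffdffffffff")
print("abcd range: %#010x - %#010x" % (abcd["start"], abcd["stop"]))
print("dcba range: %#010x - %#010x" % (dcba["start"], dcba["stop"]))
```

Under that assumption, abcd covers 0x7ffffffe-0xbffffffc and dcba covers 0xbffffffd-0xffffffff on this brick.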
On two of the bricks, the gfid handle <brick-export>/.glusterfs/fa/38/fa38d55b-edd1-400d-8f25-04d766bd372a was not present (ls said ENOENT). One of those bricks was the hashed subvolume of directory /mnt/glusterfs/abcd/dcba, so the stat sent along with dentry "dcba" from the hashed subvolume was invalid (zeroed out). This resulted in readdirp not listing the directory, and hence rmdir never came on "dcba". When rm crawled up to the parent directory "abcd", rmdir failed with ENOTEMPTY, as "dcba" was present on all bricks.

I assume the gfid handle for fa38d55b-edd1-400d-8f25-04d766bd372a was absent due to a race in handling gfid handles during rename at the brick. I will confirm the hypothesis once I figure out the exact sequence of steps that might result in such a loss of the handle.
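The handle path discussed above follows the .glusterfs/<first two hex chars>/<next two>/<full gfid> scheme visible in the brick listing in the previous comment. A small sketch of that path derivation (the helper name is made up, not GlusterFS's actual code):

```python
import os

def gfid_handle_path(brick_export, gfid):
    """Compute the .glusterfs handle path for a gfid, following the
    .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid> scheme seen on the bricks.
    (Hypothetical helper for illustration only.)"""
    g = gfid.lower()
    return os.path.join(brick_export, ".glusterfs", g[0:2], g[2:4], g)

p = gfid_handle_path("/bricks/brick0/yamaha",
                     "fa38d55b-edd1-400d-8f25-04d766bd372a")
print(p)
```

For a directory, this path is a symlink into the parent's handle directory, which is why a missing handle leaves the brick unable to fill in the stat for the dentry.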
From one of the bricks on which the handle fa38d55b-edd1-400d-8f25-04d766bd372a is missing, I could see the following error messages:

[2016-05-24 08:26:22.422505] E [MSGID: 113071] [posix.c:2414:posix_rename] 0-yamaha-posix: rename of /bricks/brick0/yamaha/abcd to /bricks/brick0/yamaha/abcd/dcba/abcd failed [Invalid argument]
[2016-05-24 08:26:22.422670] W [MSGID: 113001] [posix.c:2464:posix_rename] 0-yamaha-posix: modification of parent gfid xattr failed (gfid:fa38d55b-edd1-400d-8f25-04d766bd372a)
[2016-05-24 08:26:22.422745] I [MSGID: 115061] [server-rpc-fops.c:1032:server_rename_cbk] 0-yamaha-server: 385: RENAME /abcd (00000000-0000-0000-0000-000000000000/abcd) -> /abcd/dcba/abcd (00000000-0000-0000-0000-000000000000/abcd) ==> (Invalid argument) [Invalid argument]

In posix_rename (storage/posix), for rename (src, dst) we:
1. unset the gfid handle of src
2. call sys_rename
3. on failure, unwind

As can be seen, when sys_rename fails, the gfid handle of src is never recreated. The rename logs above indicate a sys_rename failure with fa38d55b-edd1-400d-8f25-04d766bd372a as src. So, incomplete failure handling in posix_rename is the root cause of this issue.

Please note that this issue can happen even on a single-brick, non-dht setup. So, I assume we can file a new bug for it (also, this is a failure-path issue, unlike this bug, which uncovers inconsistencies even in the success path).
A new bug has been raised based on comment#20 https://bugzilla.redhat.com/show_bug.cgi?id=1339501
Moving this bug to Verified, as the issue in comment#17 turns out to have a different root cause and a different bug has been filed for it. The actual issue found by this bug is no longer seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240