Bug 1400093
Summary: | ls and move hung on disperse volume
---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage
Reporter: | Nag Pavan Chilakam <nchilaka>
Component: | disperse
Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED ERRATA
QA Contact: | Nag Pavan Chilakam <nchilaka>
Severity: | urgent
Docs Contact: |
Priority: | unspecified
Version: | rhgs-3.2
CC: | amukherj, aspandey, rcyriac, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: | ---
Target Release: | RHGS 3.2.0
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | glusterfs-3.8.4-9
Doc Type: | If docs needed, set a value
Doc Text: |
Story Points: | ---
Clone Of: |
: | 1402710 (view as bug list)
Environment: |
Last Closed: | 2017-03-23 05:53:24 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 1351528, 1402710, 1404572, 1412922
Attachments: |
Description
Nag Pavan Chilakam
2016-11-30 12:49:58 UTC
The client log where mv got hung was as below:

```
[2016-11-30 12:18:40.514765] W [MSGID: 122015] [ec-common.c:1774:ec_unlocked] 3-disperse-disperse-0: entry/inode unlocking failed (RENAME) [Invalid argument]
[2016-11-30 12:18:40.534720] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 3-disperse-disperse-1: Operation failed on some subvolumes (up=3F, mask=2E, remaining=0, good=2E, bad=11)
[2016-11-30 12:18:40.538990] W [MSGID: 122053] [ec-common.c:116:ec_check_status] 3-disperse-disperse-0: Operation failed on some subvolumes (up=3F, mask=2E, remaining=0, good=2E, bad=11)
[2016-11-30 12:18:32.135232] W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 3-disperse-disperse-1: Heal failed [Invalid argument]
The message "W [MSGID: 122002] [ec-common.c:71:ec_heal_report] 3-disperse-disperse-0: Heal failed [Invalid argument]" repeated 3 times between [2016-11-30 12:18:32.108297] and [2016-11-30 12:18:32.135236]
```

sosreports:

```
[qe@rhsqe-repo nchilaka]$ pwd
/var/www/html/sosreports/nchilaka
[qe@rhsqe-repo nchilaka]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo nchilaka]$ cd bug.1400093
[qe@rhsqe-repo bug.1400093]$ ls
clients  servers
[qe@rhsqe-repo bug.1400093]$
```

This bug is easily reproducible on a plain dispersed volume too, without killing any brick.

```
[root@apandey gluster]# gvi
Volume Name: vol
Type: Disperse
Volume ID: c3e903e0-e7b5-42a3-9e75-798c4e3268a0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: apandey:/brick/gluster/vol-1
Brick2: apandey:/brick/gluster/vol-2
Brick3: apandey:/brick/gluster/vol-3
Brick4: apandey:/brick/gluster/vol-4
Brick5: apandey:/brick/gluster/vol-5
Brick6: apandey:/brick/gluster/vol-6
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
```

Steps to reproduce:

1. Mount this volume on 3 mount points: c1, c2, and c3.
2. On c1: mkdir /c1/dir ; cd dir
3. On c2: touch 4000 files on the mount point, i.e. "/".
4. After step 3, start touching the next 4000 files on c2 on the mount point, i.e. "/".
5. On c1: start touching 10000 files on /dir/.
6. On c3: start moving the 4000 files created in step 3 from the mount point to /dir/.
7. On c3, from a different console, start ls in a loop.

All the clients hang after some time (within 3-4 minutes). A scripted sketch of this workload is given below.
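The following is a minimal sketch of the reproduction workload, assuming the 1 x (4 + 2) disperse volume "vol" shown above already exists on host "apandey" and that /c1, /c2 and /c3 are free mount points on the client; file name prefixes and paths are illustrative, not taken from the report.

```bash
#!/bin/bash
# Sketch of the reproduction steps 1-7 above (illustrative names and paths).

SERVER=apandey
VOL=vol

# Step 1: mount the same volume on three mount points (c1, c2, c3).
for m in /c1 /c2 /c3; do
    mkdir -p "$m"
    mount -t glusterfs "${SERVER}:/${VOL}" "$m"
done

# Step 2: create the target directory via c1.
mkdir -p /c1/dir

# Step 3: 4000 files on the root of the volume from c2.
touch /c2/a-{1..4000}

# Step 4: the next 4000 files on the root from c2, in the background.
touch /c2/b-{1..4000} &

# Step 5: 10000 files under /dir from c1, in the background.
( cd /c1/dir && touch f-{1..10000} ) &

# Step 6: move the step-3 files from the root into /dir from c3.
( for f in /c3/a-*; do mv "$f" /c3/dir/; done ) &

# Step 7: ls in a loop from c3; with the affected build, ls and mv hang
# within 3-4 minutes.
while true; do ls /c3 > /dev/null; sleep 1; done
```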
I took a statedump and found the following inode lock contention:

```
lock-dump.domain.domain=dht.file.migrate
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 12892, owner=c4582de9117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, granted at 2016-12-05 07:37:05
lock-dump.domain.domain=vol-disperse-0

[conn.0.bound_xl./brick/gluster/vol-1.active.2]
gfid=6a7c71ba-9814-41ad-9975-3a7ef4e97d60
nlookup=2528
fd-count=0
ref=9179
ia_type=2
[xlator.features.locks.vol-locks.inode]
path=/dir
mandatory=0
inodelk-count=2
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 12892, owner=6c20b6d5117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, blocked at 2016-12-05 07:37:05, granted at 2016-12-05 07:37:05
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 12894, owner=e870234b277f0000, client=0x7f0c800ac5f0, connection-id=apandey-29029-2016/12/05-07:32:51:253616-vol-client-0-0-0, blocked at 2016-12-05 07:37:05

[conn.0.bound_xl./brick/gluster/vol-1.active.3]
gfid=00000000-0000-0000-0000-000000000001
nlookup=1
fd-count=1
ref=1
ia_type=2
[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0:self-heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 12885, owner=6c20b6d5117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, blocked at 2016-12-05 07:37:05, granted at 2016-12-05 07:37:05
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 12890, owner=ac4c67f4257f0000, client=0x7f0c800a87b0, connection-id=apandey-28922-2016/12/05-07:32:35:356027-vol-client-0-0-0, blocked at 2016-12-05 07:37:05
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 12892, owner=6825b6d5117f0000, client=0x7f0c800b0430, connection-id=apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, blocked at 2016-12-05 07:37:05

[conn.0.bound_xl./brick/gluster/vol-1.lru.1]
gfid=00000000-0000-0000-0000-000000000005
nlookup=6
fd-count=0
ref=0
ia_type=2
```

On the first brick's log I see the following failed unlock request:

```
[2016-12-05 07:32:58.401408] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-vol-server: accepted client from apandey-29029-2016/12/05-07:32:51:253616-vol-client-0-0-0 (version: 3.8.5)
[2016-12-05 07:33:10.774116] I [login.c:76:gf_auth] 0-auth/login: allowed user names: 0b9b19e0-5b6f-4401-8e1e-53fb21a2fd9d
[2016-12-05 07:33:10.774138] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-vol-server: accepted client from apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0 (version: 3.8.5)
[2016-12-05 07:37:05.212110] E [inodelk.c:404:__inode_unlock_lock] 0-vol-locks: Matching lock not found for unlock 0-9223372036854775807, by 6825b6d5117f0000 on 0x7f0c800b0430
[2016-12-05 07:37:05.212146] E [MSGID: 115053] [server-rpc-fops.c:278:server_inodelk_cbk] 0-vol-server: 45050: INODELK / (00000000-0000-0000-0000-000000000001), client: apandey-29101-2016/12/05-07:33:03:610275-vol-client-0-0-0, error-xlator: vol-locks [Invalid argument]
```

Only on the c3 client logs I found this error:

```
[2016-12-05 07:37:05.212211] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-1: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212268] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-2: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212315] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-3: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212345] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-0: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212381] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-5: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212387] E [MSGID: 114031] [client-rpc-fops.c:1549:client3_3_inodelk_cbk] 0-vol-client-4: remote operation failed [Invalid argument]
[2016-12-05 07:37:05.212414] W [MSGID: 122015] [ec-common.c:1749:ec_unlocked] 0-vol-disperse-0: entry/inode unlocking failed (RENAME) [Invalid argument]
[2016-12-05 07:37:05.222999] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-vol-dht: renaming /a-2501 (hash=vol-disperse-0/cache=vol-disperse-0) => /dir/a-2501 (hash=vol-disperse-0/cache=<nul>)
```
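As a side note, a statedump like the one above can be captured while the hang is in progress roughly as sketched below. The volume name "vol" and the default dump directory /var/run/gluster are assumptions; the dump location can differ per installation.

```bash
#!/bin/bash
# Sketch: capture brick statedumps and look for blocked inodelks.

# Ask every brick process of the volume to write a statedump.
gluster volume statedump vol

# On the brick nodes, inspect the dumps for inodelks stuck in BLOCKED state,
# like the ones quoted in the comment above.
grep -A2 'BLOCKED' /var/run/gluster/*.dump.*
```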
Created attachment 1228478 [details]
possible fix
I found one race which could cause this bug:
i.e. lk-owner changes just before inodelk unlock.
Rename takes two locks: 1) on the parent directory of the source file, and 2) on the parent directory of the destination file/location. The lk-owner of these two locks is different, i.e. the address of the respective ctx->inode_lock member. After the rename, at the time of unlock it sets the lk-owner on the same rename fop frame and unwinds. But it can so happen that, just before ec_inodelk is called, the owner is changed to that of the second parent inode's inode_lock. This can cause this problem: if the unlock goes out with the wrong owner, the brick cannot match it to the granted lock (the "Matching lock not found for unlock" error above), the lock is never released, and subsequent inodelks on that directory stay blocked, which would explain the hang.
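To check whether this particular race was hit, one can look for the two log messages quoted earlier in this bug. The sketch below assumes default GlusterFS log locations; adjust paths for your mounts and bricks.

```bash
#!/bin/bash
# Sketch: search for the signature of the failed RENAME unlock described above.

# On the client that issued the mv (c3 in the reproducer): EC reports the
# failed entry/inode unlock after the rename.
grep "entry/inode unlocking failed (RENAME)" /var/log/glusterfs/*.log

# On the brick nodes: the locks translator rejects the unlock because the
# lk-owner no longer matches the owner the lock was granted with.
grep "Matching lock not found for unlock" /var/log/glusterfs/bricks/*.log
```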
I am a bit confused about the steps; I will take your help, Ashish, tomorrow at 10 AM.
(In reply to Pranith Kumar K from comment #5)
> Created attachment 1228478 [details]
> possible fix
>
> I found one race which could cause this bug:
> i.e. lk-owner changes just before inodelk unlock.
> Rename takes two locks 1) On parent directory of the source file 2) On
> parent directory of the destination file/location. Lk-owner of these two
> locks is different i.e. the address of the ctx->inode_lock member. After
> rename, at the time of unlock it sets the lk-owner on the same rename fop
> frame and unwinds. But it can so happen that just before ec_inodelk is
> called the owner can be changed to that of the second parent inode's
> inode_lock. This can cause this problem.
>
> I am a bit confused about the steps I will take your help Ashish tomorrow at
> 10AM

Ashish found that this patch leads to a crash. I just checked, and the fix is a bit more involved than anticipated.

Verified the below on 3.8.4-9 and did not find any hang, hence moving to verified.

TC#1: There were two bricks down and I started creating files in the fashion below:
- c1: under dir1, touch of f{1..10000}
- c2: under /, touch x{1..1000}
- c3: after the c2 action completed, started touch x{1001..10000}

I then brought up all the bricks which were down (2 in each of the two subvols). Then, while c1 and c3 were still running, on c2 I did a move of all the x* files into dir1.

Behavior: ls and mv didn't hang.

TC#2: no bricks down
1. Mount this volume on 3 mount points: c1, c2, and c3.
2. On c1: mkdir /c1/dir ; cd dir
3. On c2: touch 4000 files on the mount point, i.e. "/".
4. After step 3, start touching the next 4000 files on c2 on the mount point, i.e. "/".
5. On c1: start touching 10000 files on /dir/.
6. On c3: start moving the 4000 files created in step 3 from the mount point to /dir/.
7. On c3, from a different console, start ls in a loop.

Behavior: ls and mv didn't hang.

Test cases added to Polarion:
RHG3-11786 - BZ#1400093 - TC#1: simultaneous issue of lookup and write (ls and move) should not hang on disperse volume (All bricks up)
RHG3-11787 - BZ#1400093 - TC#2: simultaneous issue of lookup and write (ls and move) should not hang on disperse volume (few bricks down)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
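For completeness, below is a hedged sketch of how bricks are commonly taken down and brought back for a test like TC#1 above. The report does not state the exact commands QA used; the volume name "vol", the placeholder PID, and the choice of bricks are illustrative assumptions.

```bash
#!/bin/bash
# Sketch: take bricks down and bring them back during the TC#1 workload.

# Note the PIDs of the bricks to take down, then kill them to simulate
# brick failures (repeat for each brick being taken down).
gluster volume status vol
BRICK_PID=12345            # placeholder; fill in from the status output
kill -9 "$BRICK_PID"

# ... run the file-creation workload from c1/c2/c3 while the bricks are down ...

# Restart the killed brick processes without disturbing the running ones,
# then continue the mv/ls workload and watch for hangs.
gluster volume start vol force
```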