Bug 1264310
| Field | Value |
|---|---|
| Summary | DHT: Rebalance hang while migrating the files of disperse volume |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | RajeshReddy <rmekala> |
| Component | disperse |
| Assignee | Ashish Pandey <aspandey> |
| Status | CLOSED ERRATA |
| QA Contact | Prasad Desala <tdesala> |
| Severity | unspecified |
| Priority | unspecified |
| Version | rhgs-3.1 |
| CC | amukherj, mzywusko, nbalacha, nchilaka, pkarampu, rcyriac, rhinduja, rhs-bugs, smohan |
| Target Milestone | --- |
| Target Release | RHGS 3.2.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | glusterfs-3.8.4-1 |
| Doc Type | Bug Fix |
| Clones | 1304988 (view as bug list) |
| Last Closed | 2017-03-23 05:23:56 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Bug Blocks | 1304988, 1322299, 1351522 |
Description (RajeshReddy, 2015-09-18 07:15:40 UTC)
Logs are available at /home/repo/sosreports/bug.1264310.

From the statedump on the bricks, it appears that two clients (rename and rebalance) are trying to acquire an inodelk on the same disperse subvolume. One of them is granted and the other is blocked, which in turn blocks the rebalance process. Here is an extract of the statedump:

```
[xlator.features.locks.e-locks.inode]
path=/
mandatory=0
inodelk-count=2
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=e-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 29918, owner=0c6a3764407f0000, client=0x7f6150001150, connection-id=dhcp42-202.lab.eng.blr.redhat.com-24842-2015/09/21-17:03:29:980708-e-client-0-0-0, granted at 2015-09-21 17:26:16
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=ec320160687f0000, client=0x7f6150081400, connection-id=dhcp42-202.lab.eng.blr.redhat.com-30069-2015/09/21-17:26:24:439084-e-client-0-0-0, blocked at 2015-09-21 17:26:29
```

A very simple test case to reproduce the issue:

1) Create a disperse volume.
2) FUSE mount it.
3) Create 100 files (`touch ec_mnt/file{1..100}`) and a few other folders.
4) Run this script, which renames the files in a continuous loop:

```
#!/bin/bash
echo 'Renaming files'
while :
do
    for i in {1..100}; do mv file$i newfile$i; done
    for i in {1..100}; do mv newfile$i file$i; done
done
```

5) Add a few more bricks.
6) Start rebalance on the volume. It will remain hung.
7) Stop the script; rebalance resumes.

After discussion with Pranith, these are some observations:

1) EC takes a blocking inodelk during rename. While renaming a particular file (EC is holding the blocking inodelk on the parent directory), if a rename of another file under the same directory arrives, EC does not release the lock; it goes ahead and renames the "new" file under the already-held lock.
2) Hence rebalance is not actually hung; it is blocked on a lock that EC is holding in order to rename multiple files (without unlocking in between).

3) As soon as the rename loop is stopped, the lock is released and rebalance continues.

Upstream mainline: http://review.gluster.org/13460
Upstream 3.8: http://review.gluster.org/15061

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Verified this BZ using glusterfs version 3.8.4-5.el7rhgs.x86_64. Below are the steps that were followed to verify this BZ:

1) Created an EC 2x(4+2) volume and started it.
2) FUSE mounted the volume.
3) Created 100K files on the mount and untarred a Linux kernel package.
4) Ran a script to rename the 100K files; at the same time, added 6 bricks and triggered rebalance.

Did not see any hang in the rebalance process. Rebalance and rename completed successfully without any issues. Hence, moving this BZ to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
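For anyone triaging a similar hang, the blocked lock shows up directly in a brick statedump, which is generated with `gluster volume statedump <volname>` and written under /var/run/gluster/ by default. Below is a minimal sketch of scanning a dump for blocked inodelks; the sample dump content is taken from the extract in this report, while the file path and counting logic are illustrative additions, not part of the original report:

```shell
# Sample brick statedump content, copied (abridged) from the extract in this
# bug report. In a real setup the dump is produced by
# `gluster volume statedump <volname>` and lands under /var/run/gluster/;
# the path below is illustrative only.
cat > /tmp/brick.statedump <<'EOF'
[xlator.features.locks.e-locks.inode]
path=/
mandatory=0
inodelk-count=2
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=e-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 29918
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613
EOF

# Count granted vs blocked inodelks. A BLOCKED entry that persists across
# repeated dumps, with the ACTIVE owner unchanged, is the signature of the
# behaviour described above: rebalance waiting on a lock the rename client
# never releases.
active=$(grep -c '(ACTIVE)' /tmp/brick.statedump)
blocked=$(grep -c '(BLOCKED)' /tmp/brick.statedump)
echo "active=$active blocked=$blocked"
# prints: active=1 blocked=1
```

Taking two statedumps a minute apart and comparing the owner fields of the BLOCKED entries is usually enough to distinguish a genuinely hung process from one that is merely starved, as was the case here.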