Bug 1306194
| Summary: | NFS+attach tier: IOs hang while attach tier is issued |
| --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | tier |
| Assignee: | Mohammed Rafi KC <rkavunga> |
| Status: | CLOSED ERRATA |
| QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | urgent |
| Docs Contact: | |
| Priority: | urgent |
| Version: | rhgs-3.1 |
| CC: | asrivast, byarlaga, nbalacha, rcyriac, rgowdapp, rhinduja, rhs-bugs, rkavunga, sankarshan, skoduri, smohan, storage-qa-internal |
| Target Milestone: | --- |
| Keywords: | ZStream |
| Target Release: | RHGS 3.1.3 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | tier-fuse-nfs-samba |
| Fixed In Version: | glusterfs-3.7.9-4 |
| Doc Type: | Bug Fix |
| Doc Text: | When attach tier occurred in parallel with I/O, it was possible for the cached subvolume to change. This meant that if an I/O lock had been set on the cached volume just before the cached subvolume changed, unlock operations were sent to the wrong brick, and the lock on the original brick was never released. The location of the last lock is now recorded so that this issue no longer occurs even if the cached subvolume does change during these simultaneous operations. |
| Story Points: | --- |
| Clone Of: | |
| : | 1311002 (view as bug list) |
| Environment: | |
| Last Closed: | 2016-06-23 05:07:24 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1299184, 1306930, 1311002, 1333645, 1347524 |
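
The Doc Text above summarizes the fix at a high level: record where the lock was taken so the unlock is sent to the same brick even if the cached subvolume changes mid-flight. The C sketch below only illustrates that idea under assumed, hypothetical names; it is not the actual GlusterFS tier/dht code.

```c
#include <stdio.h>

/* Hypothetical stand-in for a brick/subvolume. */
typedef struct subvol {
    const char *name;
} subvol_t;

/* Hypothetical per-inode context: alongside the cached subvolume, remember
 * the subvolume the lock was actually taken on. */
typedef struct inode_ctx {
    subvol_t *cached_subvol;  /* can change while attach-tier migrates data  */
    subvol_t *lock_subvol;    /* recorded at lock time, used for the unlock  */
} inode_ctx_t;

static void take_lock(inode_ctx_t *ctx)
{
    /* The lock is wound to the cached subvolume of the moment... */
    ctx->lock_subvol = ctx->cached_subvol;
    printf("lock   -> %s\n", ctx->lock_subvol->name);
}

static void release_lock(inode_ctx_t *ctx)
{
    /* ...and the unlock reuses the recorded target instead of re-reading
     * ctx->cached_subvol, which may point at a different brick by now. */
    printf("unlock -> %s\n", ctx->lock_subvol->name);
    ctx->lock_subvol = NULL;
}

int main(void)
{
    subvol_t cold = { "cold-subvol" };
    subvol_t hot  = { "hot-subvol"  };
    inode_ctx_t ctx = { .cached_subvol = &cold, .lock_subvol = NULL };

    take_lock(&ctx);            /* lock lands on cold-subvol                 */
    ctx.cached_subvol = &hot;   /* attach-tier changes the cached subvolume  */
    release_lock(&ctx);         /* still unlocks cold-subvol, no stale lock  */
    return 0;
}
```

Without the recorded lock target, the unlock in this sketch would be wound to hot-subvol and the lock on cold-subvol would never be released, which is the hang described in the report below.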
Description
Nag Pavan Chilakam
2016-02-10 09:45:43 UTC
sosreports of both clients and servers are available at the location below:

    [nchilaka@rhsqe-repo nchilaka]$ chmod -R 0777 bug.1306194
    [nchilaka@rhsqe-repo nchilaka]$ pwd
    /home/repo/sosreports/nchilaka

There is a blocking lock held on one of the bricks which is not released, and all of the other clients are waiting on this lock. We could not investigate the owner of the lock because, by the time we looked, the ping timer had expired and the lock had been released; after that the I/Os resumed. We need to find out which client acquired the lock and why it was not releasing it.

When we tried to reproduce the issue, we saw "Stale File Handle" errors after attach-tier. While doing RCA using gdb, we found that ESTALE is returned via svc_client (which is enabled by USS). So we disabled USS and re-tried the test. Now the mount points hang. On the server side, the volume got unexported:

    [skoduri@skoduri ~]$ showmount -e 10.70.35.225
    Export list for 10.70.35.225:
    [skoduri@skoduri ~]$

Tracing back from the logs and the code:

    [2016-02-11 13:26:02.540565] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
    [2016-02-11 13:28:02.600425] E [MSGID: 112070] [nfs3.c:896:nfs3_getattr] 0-nfs-nfsv3: Volume is disabled: finalvol
    [2016-02-11 13:28:02.600546] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

This message is logged when the volume is not in the nfs->initedxl[] list. That list is updated as part of "nfs_startup_subvolume()", which is invoked during notify of "GF_EVENT_CHILD_UP". So the suspicion is that the nfs xlator has not received this event, which resulted in the volume being left unexported. Attaching the nfs log for further debugging.

During NFS graph initialization, we do a lookup on the root. It looks like this lookup is blocked on a lock held by another NFS process. We need to figure out why the NFS server that acquired the lock failed to unlock it.

Rafi reported that stale lock or unlock failures are seen even while the first lookup on root is happening. Here is the most likely RCA. I am assuming "tier-dht" has two dht subvols, "hot-dht" and "cold-dht", and that the stale lock is found on one of the bricks corresponding to hot-dht. (A minimal sketch of the resulting unlock mismatch follows the steps below.)

1. Lookup on / on tier-dht.
2. Lookup is wound to the hashed subvol, cold-dht, and is successful.
3. tier-dht figures out that / is a directory and does a lookup on both hot-dht and cold-dht.
4. On hot-dht, some subvols, say c1 and c2, are down. But the lookup is still successful because some other subvols (say c3 and c4) are up.
5. Lookup on / is successful on cold-dht.
6. tier-dht decides it needs to heal the layout of "/". From here on, events on cold-dht are skipped as they are irrelevant to this RCA.
7. tier-dht winds inodelk on hot-dht. hot-dht winds it to the first subvol in the layout list (say c1 in this case). Note that subvols with 0 ranges are stored at the beginning of the list, and all the subvols on which lookup failed (say because of ENOTCONN) end up with 0 ranges. The relative order of subvols with 0 ranges is undefined and depends on whose lookup failed first.
8. c1 comes up.
9. hot-dht acquires the lock on c1.
10. tier-dht tries to refresh its layout of /. It winds lookup on hot-dht and cold-dht again.
11. hot-dht sees that the layout's generation number is lagging behind the current generation number (as c1 came up after the lookup on / completed). It issues a fresh lookup and reconstructs the layout for /. Since c2 is still down, it is pushed to the beginning of the layout's subvol list.
12. tier-dht is done with healing. It issues unlock on hot-dht.
13. hot-dht winds the unlock call to the first subvol in the layout of /, which is now c2.
14. The unlock fails with ENOTCONN and a stale lock is left on c1.

Steps 7 and 8 can be swapped for more clarity; the RCA is still valid.
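To make steps 7 to 14 concrete, here is a minimal, self-contained C sketch of the unlock mismatch. The structures and the reordering rule are simplified stand-ins, not the real dht layout code: down subvols (zero ranges) sit at the head of the layout list, the lock is wound to whichever subvol is first at lock time, the layout is then refreshed and reordered, and the unlock is wound to the new head, which is down.

```c
#include <stdio.h>

#define NSUBVOLS 4

/* Hypothetical, simplified model of a dht layout list: subvols whose lookup
 * failed (brick down) get range 0 and sit at the head of the list. */
typedef struct {
    const char *name;
    int         up;      /* 1 if the brick is reachable                   */
    int         range;   /* 0 if lookup failed when this layout was built */
} subvol_t;

typedef struct {
    subvol_t list[NSUBVOLS];
} layout_t;

/* Rebuild the layout: down subvols get range 0 and move to the front. */
static void layout_refresh(layout_t *l)
{
    layout_t fresh;
    int pos = 0;

    for (int i = 0; i < NSUBVOLS; i++)          /* zero-range subvols first */
        if (!l->list[i].up) {
            fresh.list[pos] = l->list[i];
            fresh.list[pos].range = 0;
            pos++;
        }
    for (int i = 0; i < NSUBVOLS; i++)          /* then the healthy ones */
        if (l->list[i].up) {
            fresh.list[pos] = l->list[i];
            fresh.list[pos].range = 1;
            pos++;
        }
    *l = fresh;
}

/* Lock and unlock are both wound to the first subvol of the current layout. */
static subvol_t *first_subvol(layout_t *l)
{
    return &l->list[0];
}

int main(void)
{
    /* Step 4: c1 and c2 were down when / was first looked up on hot-dht. */
    layout_t hot = { .list = {
        { "c1", 0, 0 }, { "c2", 0, 0 }, { "c3", 1, 1 }, { "c4", 1, 1 } } };

    /* Steps 7-9: c1 comes up and the inodelk is wound to it. */
    hot.list[0].up = 1;
    printf("inodelk wound to %s\n", first_subvol(&hot)->name);

    /* Steps 10-11: layout refresh; c2 (still down) moves to the head. */
    layout_refresh(&hot);

    /* Steps 12-14: the unlock is wound to the *new* first subvol, c2. */
    subvol_t *target = first_subvol(&hot);
    printf("unlock wound to %s -> %s\n", target->name,
           target->up ? "ok" : "ENOTCONN, stale lock left on c1");
    return 0;
}
```

With the fix summarized in the Doc Text, the subvol the inodelk was actually wound to would be recorded and reused for the unlock, so reordering of the layout list would no longer matter.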
Yes, I have included this as part of bug 1303045.

Workaround testing: I tested the workaround by restarting the volume using force. The I/Os resumed, which means the workaround is fine, but there is a small problem that has been discussed, for which bz#1309186 ("file creates fail with 'failed to open <filename>: Too many levels of symbolic links' for file create/write when restarting NFS using vol start force") has been raised.

upstream patch: http://review.gluster.org/#/c/13492/
upstream master patch: http://review.gluster.org/#/c/13492/
upstream 3.7 patch: http://review.gluster.org/#/c/14236/
downstream patch: https://code.engineering.redhat.com/gerrit/73806

The IO hang during attach tier on an NFS mount has not been seen so far during the regression tests. Moving the bug to verified.

Looks perfect to me.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240