Bug 1333268
Summary: | SMB:while running I/O on cifs mount and doing graph switch causes cifs mount to hang. | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Poornima G <pgurusid> |
Component: | gluster-smb | Assignee: | Poornima G <pgurusid> |
Status: | CLOSED EOL | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.7.0 | CC: | asrivast, bugs, joe, kkeithle, nlevinki, pgurusid, rhinduja, rjoseph, sbhaloth, vdas |
Target Milestone: | --- | Keywords: | Reopened, ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.7.12 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | 1333266 | Environment: | |
Last Closed: | 2017-03-08 10:51:11 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1328411, 1332156, 1333266 | ||
Bug Blocks: |
Description
Poornima G
2016-05-05 07:19:16 UTC
REVIEW: http://review.gluster.org/14223 (gfapi: Fix a deadlock caused by graph switch while aio in progress) posted (#1) for review on release-3.7 by Poornima G (pgurusid)

COMMIT: http://review.gluster.org/14223 committed in release-3.7 by Atin Mukherjee (amukherj)
------
commit 3639c84d9a3b7e3e490c0b87964d33422ba922a9
Author: Poornima G <pgurusid>
Date: Fri Apr 29 12:24:24 2016 -0400

gfapi: Fix a deadlock caused by graph switch while aio in progress

RCA:
Currently, async behaviour is achieved by submitting a syncop operation to the synctask threads. Consider a scenario where a graph switch is triggered. The next write fop checks for the next available graph, sets fs->migration_in_progress, and triggers the migration of fds and other state, which can cause a syncop_lookup operation. While this fop (on a synctask thread) is waiting for syncop_lookup to return, let's say another 17 async write calls are submitted. All of these writes block waiting for fs->migration_in_progress to be unset, so all 16 synctask threads are blocked waiting for fs->migration_in_progress to be unset. Now the syncop_lookup returns, but there is no synctask thread left to process the lookup_cbk. Unless this syncop_lookup returns, fs->migration_in_progress cannot be unset by the first fop, which causes a deadlock.

To fix this deadlock, change all the async APIs to use STACK_WIND instead of synctask to achieve async behaviour. glfs_preadv_async is already implemented using STACK_WIND; now change all the other async APIs to do the same.

This patch as such will not reduce the performance of async IO. The only thing that can be affected is that, in the case of write, the buf passed by the application is copied onto an iobuf in the same thread, whereas before it was copied in a synctask thread.
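
The starvation described in the RCA above can be illustrated with a minimal counting model. This is a sketch, not gfapi code: `struct pool_model` and both functions are hypothetical names, and the model assumes that the fop driving the migration has given up its worker while it waits for the lookup, so only the blocked writes occupy threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a fixed-size worker pool (not a gfapi structure). */
struct pool_model {
    size_t workers;        /* pool size; the report mentions 16 */
    size_t blocked_writes; /* writes parked on migration_in_progress */
};

/* Each blocked write pins one worker until the flag is cleared.
 * Assumption: the migrating fop itself holds no worker while waiting. */
size_t busy_workers(const struct pool_model *m)
{
    assert(m->workers > 0);
    return m->blocked_writes < m->workers ? m->blocked_writes : m->workers;
}

/* The lookup callback needs one free worker. If none is free, the flag
 * is never cleared and the blocked writes never resume. */
bool would_deadlock(const struct pool_model *m)
{
    return busy_workers(m) == m->workers;
}
```

With 16 workers and 17 blocked writes, `would_deadlock` reports true; with 15 blocked writes one worker stays free for the callback and it reports false.
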
Since the syncop + graph switch logic (a lock held across fops) is not a good candidate for synctask, change the async APIs to use STACK_WIND.

Backport of http://review.gluster.org/#/c/14148/

Change-Id: Idf665cae0a8e27697fbfc5ec8d93a6d6bae3a4f1
BUG: 1333268
Signed-off-by: Poornima G <pgurusid>
Reviewed-on: http://review.gluster.org/14223
Smoke: Gluster Build System <jenkins.com>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.com>
Reviewed-by: Raghavendra Talur <rtalur>
Reviewed-by: Rajesh Joseph <rjoseph>
Reviewed-by: Atin Mukherjee <amukherj>

This patch causes a problem in several places where glfs_preadv_async_cbk is specifically called with a NULL iovec.

0 glfs-fops.c glfs_preadv_async_cbk 854 glfs_io_async_cbk (op_ret, op_errno, frame, cookie, iovec, count);
1 glfs-fops.c glfs_pwritev_async_cbk 1168 glfs_io_async_cbk (op_ret, op_errno, frame, cookie, NULL, 0);
2 glfs-fops.c glfs_fsync_async_cbk 1367 glfs_io_async_cbk (op_ret, op_errno, frame, cookie, NULL, 0);
3 glfs-fops.c glfs_ftruncate_async_cbk 1573 glfs_io_async_cbk (op_ret, op_errno, frame, cookie, NULL, 0);
4 glfs-fops.c glfs_discard_async_cbk 2429 glfs_io_async_cbk (op_ret, op_errno, frame, cookie, NULL, 0);
5 glfs-fops.c glfs_zerofill_async_cbk 2514 glfs_io_async_cbk (op_ret, op_errno, frame, cookie, NULL, 0);

Meant to continue to say it causes a problem because glfs_preadv_async_cbk does:

GF_VALIDATE_OR_GOTO ("gfapi", iovec, inval);

Since iovec is NULL, this fails.

Thanks for reporting this. glfs_io_async_cbk() should not check GF_VALIDATE_OR_GOTO ("gfapi", iovec, inval);, as fsync and other fops will send iovec as NULL. But you mentioned that read is returning iovec as NULL, which shouldn't happen? I will send a patch to fix this.

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.12, please open a new bug report.
glusterfs-3.7.12 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-June/049918.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.
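
The NULL iovec problem raised in the comments above comes down to a shared completion path validating an argument that only reads supply. Below is a minimal sketch of one way to scope such a check, assuming a hypothetical `validate_completion` helper and `io_kind` enum. It is not the actual gfapi fix, and it does not use GF_VALIDATE_OR_GOTO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical tag for the operation that completed (not a gfapi type). */
enum io_kind { IO_READ, IO_OTHER };

/* Returns 0 if the completion arguments are acceptable, -EINVAL otherwise.
 * Only a read that returned data must carry an iovec. Writes, fsync,
 * ftruncate, discard and zerofill legitimately pass NULL and 0. */
int validate_completion(enum io_kind kind, int op_ret,
                        const struct iovec *iov, int count)
{
    assert(count >= 0); /* a negative count would be a caller bug */

    if (kind == IO_READ && op_ret > 0) {
        if (iov == NULL || count == 0)
            return -EINVAL;
    }
    return 0;
}
```

In this sketch an fsync-style completion with a NULL iovec passes, while a read that reports bytes but supplies no iovec is rejected, which matches the expectation stated in the reply that a read should not return a NULL iovec.
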