1733166 – potential deadlock while processing callbacks in gfapi

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	libgfapi
Sub Component:
Version:	mainline
Hardware:	All
OS:	All
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Soumya Koduri
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1733520 1736341 1736342 1736345

Reported:	2019-07-25 10:01 UTC by Soumya Koduri
Modified:	2019-08-02 14:13 UTC
CC List:	3 users
Fixed In Version:
Clone Of:
Clones:	1733520 1736341 1736342 1736345
Environment:
Last Closed:	2019-08-02 14:13:43 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gluster.org Gerrit	23107	0	None	Open	gfapi: Fix deadlock while processing upcall	2019-08-02 10:07:37 UTC
Gluster.org Gerrit	23108	0	None	Merged	gfapi: Fix deadlock while processing upcall	2019-08-02 14:13:42 UTC

Description Soumya Koduri 2019-07-25 10:01:57 UTC

Description of problem:

While running parallel I/O involving many files on an nfs-ganesha mount, we hit the deadlock below in the nfs-ganesha process.


epoll thread:

... -> glfs_cbk_upcall_data -> upcall_syncop_args_init -> glfs_h_poll_cache_invalidation -> glfs_h_find_handle -> priv_glfs_active_subvol -> glfs_lock (waiting on lock)

I/O thread:

... -> glfs_h_stat -> glfs_resolve_inode -> __glfs_resolve_inode (at this point we acquired glfs_lock) -> ... -> glfs_refresh_inode_safe -> syncop_lookup

To summarize:
The I/O thread, which has acquired glfs_lock, is waiting for the epoll threads to receive the response, whereas the epoll threads are waiting for the I/O thread to release the lock.

A similar issue was identified earlier (bug 1693575).

There could be other issues at different layers depending on how client xlators choose to process these callbacks.

The correct way to avoid or fix these issues is to redesign the upcall model so that it uses separate sockets for callback communication instead of the same epoll threads. A GitHub issue has been raised for that: https://github.com/gluster/glusterfs/issues/697

Since that may take a while, this BZ is raised to provide a workaround fix in the gfapi layer for now.
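
For illustration only, the circular wait can be modelled with plain pthreads. This is a minimal sketch, not gfapi code and not necessarily what the posted patch does: fs_lock stands in for glfs_lock, reply_ready for the response the I/O thread is waiting on, and io_thread, dispatcher_thread, upcall_worker and run_offload_demo are hypothetical names. It assumes one possible workaround, namely that the dispatcher (epoll-like) thread hands the upcall to a separate thread instead of taking the lock itself, so it stays free to deliver the response.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: fs_lock models glfs_lock; reply_ready models the
 * response that only the dispatcher (epoll-like) thread can deliver. */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t state_cond = PTHREAD_COND_INITIALIZER;
static bool io_has_lock = false;
static bool reply_ready = false;
static bool upcall_done = false;

/* I/O thread: takes fs_lock, then blocks until the response arrives. */
static void *io_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&fs_lock);
    pthread_mutex_lock(&state_mtx);
    io_has_lock = true;
    pthread_cond_broadcast(&state_cond);
    while (!reply_ready)
        pthread_cond_wait(&state_cond, &state_mtx);
    pthread_mutex_unlock(&state_mtx);
    pthread_mutex_unlock(&fs_lock);
    return NULL;
}

/* Upcall handler: needs fs_lock, so it waits behind the I/O thread. */
static void *upcall_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&fs_lock);
    upcall_done = true;
    pthread_mutex_unlock(&fs_lock);
    return NULL;
}

/* Dispatcher: sees an upcall first, then the response for the I/O thread. */
static void *dispatcher_thread(void *arg) {
    pthread_t *worker = arg;

    /* Force the problematic ordering: wait until fs_lock is held. */
    pthread_mutex_lock(&state_mtx);
    while (!io_has_lock)
        pthread_cond_wait(&state_cond, &state_mtx);
    pthread_mutex_unlock(&state_mtx);

    /* Event 1, the upcall. Calling upcall_worker(NULL) inline here would
     * block on fs_lock forever, because event 2 would never be delivered.
     * Handing it to another thread keeps the dispatcher free. */
    int rc = pthread_create(worker, NULL, upcall_worker, NULL);

    /* Event 2, the response: wake the I/O thread so it can drop fs_lock. */
    pthread_mutex_lock(&state_mtx);
    reply_ready = true;
    pthread_cond_broadcast(&state_cond);
    pthread_mutex_unlock(&state_mtx);

    return rc == 0 ? (void *)worker : NULL;
}

/* Returns 0 if both the response and the upcall were processed. */
int run_offload_demo(void) {
    pthread_t io, disp, worker;
    void *worker_started = NULL;

    if (pthread_create(&io, NULL, io_thread, NULL) != 0)
        return 1;
    if (pthread_create(&disp, NULL, dispatcher_thread, &worker) != 0)
        return 1; /* sketch only: the I/O thread is left waiting here */

    pthread_join(disp, &worker_started);
    pthread_join(io, NULL);
    if (worker_started == NULL)
        return 1;
    pthread_join(worker, NULL);
    return upcall_done ? 0 : 1;
}
```

Calling run_offload_demo() from a main() built with -pthread should return 0. Replacing the pthread_create call in dispatcher_thread with a direct call to upcall_worker(NULL) reproduces the hang described above, since the dispatcher then blocks on fs_lock before it can deliver the response.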

Comment 1 Worker Ant 2019-07-25 10:09:58 UTC

REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-6 by soumya k

Comment 2 Worker Ant 2019-07-25 10:16:57 UTC

REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on master by soumya k

Comment 3 Worker Ant 2019-08-02 14:13:43 UTC

REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) merged (#5) on master by Amar Tumballi
