Description of problem:
While running parallel I/O involving many files on an nfs-ganesha mount, we hit the following deadlock in the nfs-ganesha process.
epoll thread:
...glfs_cbk_upcall_data -> upcall_syncop_args_init -> glfs_h_poll_cache_invalidation -> glfs_h_find_handle -> priv_glfs_active_subvol -> glfs_lock (waiting on the lock)

I/O thread:
...glfs_h_stat -> glfs_resolve_inode -> __glfs_resolve_inode (glfs_lock acquired at this point) -> ... -> glfs_refresh_inode_safe -> syncop_lookup (waiting for the reply)
The I/O threads that acquired glfs_lock are waiting for the epoll threads to receive the response, whereas the epoll threads are waiting for the I/O threads to release the lock.
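To make the cycle concrete, here is a minimal pthread sketch of the interdependency. This is illustrative code only, not the gfapi sources; the simplified glfs_lock mutex, the reply condvar, and the thread functions are stand-ins for the call paths above. Running it hangs by design.

/*
 * Minimal pthread sketch of the deadlock (illustrative only; names are
 * stand-ins, not the real gfapi code).  Running it hangs by design.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t glfs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reply_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reply_cv  = PTHREAD_COND_INITIALIZER;
static int reply_ready;

/* Models __glfs_resolve_inode -> syncop_lookup: take glfs_lock, then
 * wait for a reply that only the epoll thread can deliver. */
static void *io_thread(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&glfs_lock);
        printf("io: holding glfs_lock, waiting for lookup reply\n");

        pthread_mutex_lock(&reply_mtx);
        while (!reply_ready)
                pthread_cond_wait(&reply_cv, &reply_mtx);
        pthread_mutex_unlock(&reply_mtx);

        pthread_mutex_unlock(&glfs_lock);   /* never reached */
        return NULL;
}

/* Models glfs_cbk_upcall_data -> priv_glfs_active_subvol: the upcall is
 * processed inline and needs glfs_lock before the reply can be delivered. */
static void *epoll_thread(void *arg)
{
        (void)arg;
        sleep(1);                           /* let the I/O thread take the lock first */
        printf("epoll: upcall received, waiting on glfs_lock\n");
        pthread_mutex_lock(&glfs_lock);     /* blocks forever */
        pthread_mutex_unlock(&glfs_lock);

        /* Reply delivery never happens, so the I/O thread never wakes up. */
        pthread_mutex_lock(&reply_mtx);
        reply_ready = 1;
        pthread_cond_signal(&reply_cv);
        pthread_mutex_unlock(&reply_mtx);
        return NULL;
}

int main(void)
{
        pthread_t io, ep;

        pthread_create(&io, NULL, io_thread, NULL);
        pthread_create(&ep, NULL, epoll_thread, NULL);
        pthread_join(io, NULL);             /* hangs: each thread waits on the other */
        pthread_join(ep, NULL);
        return 0;
}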
A similar issue was identified earlier (bug 1693575).
There could be other issues at different layers depending on how client xlators choose to process these callbacks.
The correct way to avoid or fix these issues is to redesign the upcall model to use separate sockets for callback communication instead of reusing the same epoll threads. A GitHub issue has been raised for that: https://github.com/gluster/glusterfs/issues/697
Since that may take a while, this BZ is being raised to provide a workaround fix in the gfapi layer for now.
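As an illustration of the general idea only (all function and variable names below are hypothetical; this is not the actual patch, which is tracked in the reviews further down), one way to keep the epoll thread off glfs_lock is to have the upcall callback merely queue the event for a dedicated worker thread:

/*
 * Hypothetical sketch of the hand-off pattern (made-up names, not the
 * actual patch): the epoll callback only queues the upcall and wakes a
 * worker thread, so the epoll thread never needs glfs_lock and stays
 * free to deliver I/O replies.
 */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct upcall_item {
        void               *data;   /* copy of the upcall payload */
        struct upcall_item *next;
};

static pthread_mutex_t     q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t      q_cond = PTHREAD_COND_INITIALIZER;
static struct upcall_item *q_head, *q_tail;

/* Called from the epoll thread: O(1) work, no glfs_lock involved. */
static void upcall_enqueue(void *data)
{
        struct upcall_item *item = calloc(1, sizeof(*item));

        if (!item)
                return;             /* drop the upcall on allocation failure */
        item->data = data;

        pthread_mutex_lock(&q_lock);
        if (q_tail)
                q_tail->next = item;
        else
                q_head = item;
        q_tail = item;
        pthread_cond_signal(&q_cond);
        pthread_mutex_unlock(&q_lock);
}

/* Dedicated worker: the only place that performs the glfs_lock-protected
 * part of upcall processing (handle lookup, cache invalidation, ...). */
static void *upcall_worker(void *arg)
{
        (void)arg;
        for (;;) {
                struct upcall_item *item;

                pthread_mutex_lock(&q_lock);
                while (!q_head)
                        pthread_cond_wait(&q_cond, &q_lock);
                item = q_head;
                q_head = item->next;
                if (!q_head)
                        q_tail = NULL;
                pthread_mutex_unlock(&q_lock);

                /* process_upcall(item->data);  -- may take glfs_lock safely,
                 * since the epoll thread is not blocked on this work. */
                free(item);
        }
        return NULL;
}

int main(void)
{
        pthread_t worker;

        pthread_create(&worker, NULL, upcall_worker, NULL);
        upcall_enqueue("example upcall payload");   /* as the epoll callback would */
        sleep(1);                                   /* give the worker time to drain it */
        return 0;
}

Whatever mechanism the actual patch uses, the essential property is the same: the epoll callback does constant-time work and never contends for glfs_lock, so the upcall processing that does need glfs_lock happens outside the epoll thread.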
Version-Release number of selected component (if applicable):
Steps to Reproduce:
REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-6 by soumya k
REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on master by soumya k
REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) merged (#5) on master by Amar Tumballi