Bug 1262680 - IO hung on v4 ganesha mount
Summary: IO hung on v4 ganesha mount
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.2
Assignee: Ravishankar N
QA Contact: Neha
URL:
Whiteboard:
Depends On:
Blocks: 1260783 1261765 1286582 1338634 1338668 1338669
 
Reported: 2015-09-14 05:45 UTC by Bhaskarakiran
Modified: 2016-09-17 12:09 UTC (History)

Fixed In Version: glusterfs-3.7.5-0.3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-01 05:35:32 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0193 | 0 | normal | SHIPPED_LIVE | Red Hat Gluster Storage 3.1 update 2 | 2016-03-01 10:20:36 UTC

Description Bhaskarakiran 2015-09-14 05:45:06 UTC
Description of problem:
=======================

Tried this on both an EC (disperse) volume and a dist-rep volume. IO hangs after some time on the NFS-Ganesha v4 mount. The workload was mkdir (500 in parallel), dd (1000 in parallel) and a Linux kernel untar.

Version-Release number of selected component (if applicable):
=============================================================
3.7.1-14

How reproducible:
=================
100%


Steps to Reproduce:
1. Create an EC volume (8+4) or a dist-rep volume (6x2).
2. Configure nfs-ganesha and mount the volume over NFSv4 on the client.
3. Run the IO workload described above.

Actual results:
===============
IO hangs


Expected results:
=================
No hangs


Additional info:
===============
Sosreports and the core file will be copied to rhsqe-repo.

Comment 2 Soumya Koduri 2015-09-14 06:52:09 UTC
(gdb) thread apply all bt

Thread 41 (Thread 0x7f4a02bb0700 (LWP 19695)):
#0  0x00007f4a046b8ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a792 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 40 (Thread 0x7f49ff389700 (LWP 19696)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000511107 in delayed_thread ()
#2  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 39 (Thread 0x7f49feb88700 (LWP 19697)):
#0  0x00007f4a046bbec1 in sigwait () from /lib64/libpthread.so.0
#1  0x0000000000441015 in sigmgr_thread ()
#2  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f49fe387700 (LWP 19698)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 37 (Thread 0x7f49fdb86700 (LWP 19699)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 36 (Thread 0x7f49fd385700 (LWP 19700)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 35 (Thread 0x7f49fcb84700 (LWP 19701)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 34 (Thread 0x7f49effff700 (LWP 19702)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 33 (Thread 0x7f49ef7fe700 (LWP 19703)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 32 (Thread 0x7f49eeffd700 (LWP 19704)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 31 (Thread 0x7f49ee7fc700 (LWP 19705)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7f49edffb700 (LWP 19706)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49ec175c63 in syncop_setattr () from /lib64/libglusterfs.so.0
#2  0x00007f49ec3e5e27 in glfs_h_setattrs () from /lib64/libgfapi.so.0
#3  0x00007f49fc176349 in setattrs () from /usr/lib64/ganesha/libfsalgluster.so
#4  0x00000000004d8da9 in cache_inode_setattr ()
#5  0x000000000047f9f1 in nfs4_op_setattr ()
#6  0x000000000045eab5 in nfs4_Compound ()
#7  0x0000000000453a01 in nfs_rpc_execute ()
#8  0x00000000004545ad in worker_run ()
#9  0x000000000050afeb in fridgethr_start_routine ()
#10 0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7f49ed7fa700 (LWP 19707)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f49ecff9700 (LWP 19708)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f49dffff700 (LWP 19709)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f49df7fe700 (LWP 19710)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f49deffd700 (LWP 19711)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7f49de7fc700 (LWP 19712)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f49ddffb700 (LWP 19713)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()
#3  0x000000000047f9f1 in nfs4_op_setattr ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f49dd7fa700 (LWP 19714)):
#0  0x00007f4a041da783 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000054eb19 in svc_rqst_thrd_run_epoll ()
#2  0x000000000054ece9 in svc_rqst_thrd_run ()
#3  0x0000000000440aad in rpc_dispatcher_thread ()
#4  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f49dcff9700 (LWP 19715)):
#0  0x00007f4a046bb99d in nanosleep () from /lib64/libpthread.so.0
#1  0x000000000043a8d9 in thread_delay_ms ()
#2  0x0000000000440688 in nfs_rpc_getreq_ng ()
#3  0x000000000054e904 in svc_rqst_handle_event ()
#4  0x000000000054ebda in svc_rqst_thrd_run_epoll ()
#5  0x000000000054ece9 in svc_rqst_thrd_run ()
#6  0x0000000000440aad in rpc_dispatcher_thread ()
#7  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f49b7fff700 (LWP 19716)):
#0  0x00007f4a046bb99d in nanosleep () from /lib64/libpthread.so.0
#1  0x000000000043a8d9 in thread_delay_ms ()
#2  0x0000000000440688 in nfs_rpc_getreq_ng ()
#3  0x000000000054e904 in svc_rqst_handle_event ()
#4  0x000000000054ebda in svc_rqst_thrd_run_epoll ()
#5  0x000000000054ece9 in svc_rqst_thrd_run ()
#6  0x0000000000440aad in rpc_dispatcher_thread ()
#7  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f49b77fe700 (LWP 19717)):
#0  0x00007f4a041da783 in epoll_wait () from /lib64/libc.so.6
#1  0x000000000054eb19 in svc_rqst_thrd_run_epoll ()
#2  0x000000000054ece9 in svc_rqst_thrd_run ()
#3  0x0000000000440aad in rpc_dispatcher_thread ()
#4  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f49b6ffd700 (LWP 19718)):
#0  0x00007f4a046bb99d in nanosleep () from /lib64/libpthread.so.0
#1  0x000000000043a8d9 in thread_delay_ms ()
#2  0x0000000000440688 in nfs_rpc_getreq_ng ()
#3  0x000000000054e904 in svc_rqst_handle_event ()
#4  0x000000000054ebda in svc_rqst_thrd_run_epoll ()
#5  0x000000000054ece9 in svc_rqst_thrd_run ()
#6  0x0000000000440aad in rpc_dispatcher_thread ()
#7  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f49b67fc700 (LWP 19719)):
#0  0x00007f4a041cfb7d in poll () from /lib64/libc.so.6
#1  0x00007f4a05a9d780 in socket_do_iteration () from /lib64/libdbus-1.so.3
#2  0x00007f4a05a9c5ff in _dbus_transport_do_iteration () from /lib64/libdbus-1.so.3
#3  0x00007f4a05a85d7c in _dbus_connection_do_iteration_unlocked () from /lib64/libdbus-1.so.3
#4  0x00007f4a05a88112 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#5  0x000000000052cd64 in gsh_dbus_thread ()
#6  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f49b5ffb700 (LWP 19720)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000447b56 in admin_thread ()
#2  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f49b57fa700 (LWP 19721)):
#0  0x00007f4a046b8ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a792 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f49b4ff9700 (LWP 21706)):
#0  0x00007f4a046b8ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a792 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f49a2dfe700 (LWP 22331)):
#0  0x00007f4a046b8ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49ec170168 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007f49ec170ee0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f49a25fd700 (LWP 22332)):
#0  0x00007f4a046b8ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49ec170168 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007f49ec170ee0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f49a1771700 (LWP 22333)):
#0  0x00007f4a046bb99d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f49ec14b9e4 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f49a0d6d700 (LWP 22334)):
#0  0x00007f4a046b5f27 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f49ec18da08 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x00007f49ec3d63c4 in glfs_poller () from /lib64/libgfapi.so.0
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f4995ad8700 (LWP 22335)):
#0  0x00007f4a041da783 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f49ec18d500 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f498cdf3700 (LWP 22363)):
#0  0x00007f4a041da783 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f49ec18d500 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f49821fd280 (LWP 22378)):
#0  0x00007f4a041a148d in nanosleep () from /lib64/libc.so.6
#1  0x00007f4a041a1324 in sleep () from /lib64/libc.so.6
#2  0x00007f49fc177c51 in GLUSTERFSAL_UP_Thread () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f4981ff8700 (LWP 25638)):
#0  0x00007f4a046b8ab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a792 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f4980bdf700 (LWP 28856)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a7b2 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f496bfff700 (LWP 28857)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a7b2 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f496b7fe700 (LWP 28858)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a7b2 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f496affd700 (LWP 28859)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000050a7b2 in fridgethr_freeze ()
#2  0x000000000050b01c in fridgethr_start_routine ()
#3  0x00007f4a046b4df5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f4a041da1ad in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f4a060d50c0 (LWP 19692)):
#0  0x00007f4a046b5f27 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000442d48 in nfs_start ()
#2  0x000000000041e43b in main ()
(gdb) q
A debugging session is active.

	Inferior 1 [process 19692] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/ganesha.nfsd, process 19692
[root@dhcp37-137 ~]# 


Thread 30 is waiting for the setattr response from the brick process.
All the other worker threads that are processing requests are waiting for this thread to release attr->lock. Hence the hang.

We need to find out why no response was sent for the setattr.
I have restarted the nfs-ganesha process and am trying to reproduce the issue so that I can look into the brick process while the issue is being hit.
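
The grouping described above (one thread parked in syncop_setattr, the rest queued on the rwlock) can also be pulled out of a long `thread apply all bt` dump mechanically. The following is only a minimal sketch, assuming the dump has been saved as plain text; the helper names `parse_threads` and `group_by_wait_site` are hypothetical, and the sample is trimmed from the trace in this comment.

```python
import re
from collections import defaultdict

# "Thread 38 (Thread 0x7f49fe387700 (LWP 19698)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
# "#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()"
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def parse_threads(text):
    """Return {gdb thread number: [function names, innermost first]}."""
    threads = {}
    current = None
    for line in text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current].append(m.group(1))
    return threads


def group_by_wait_site(threads, depth=2):
    """Group thread numbers by their innermost `depth` frames."""
    groups = defaultdict(list)
    for tid, frames in threads.items():
        groups[tuple(frames[:depth])].append(tid)
    return dict(groups)


# Sample trimmed from the backtrace above (only the first three frames kept).
SAMPLE = """\
Thread 38 (Thread 0x7f49fe387700 (LWP 19698)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()

Thread 37 (Thread 0x7f49fdb86700 (LWP 19699)):
#0  0x00007f4a046b806e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x00000000004dc841 in cache_inode_lock_trust_attrs ()
#2  0x00000000004d8aef in cache_inode_setattr ()

Thread 30 (Thread 0x7f49edffb700 (LWP 19706)):
#0  0x00007f4a046b8705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f49ec175c63 in syncop_setattr () from /lib64/libglusterfs.so.0
#2  0x00007f49ec3e5e27 in glfs_h_setattrs () from /lib64/libgfapi.so.0
"""

if __name__ == "__main__":
    for site, tids in group_by_wait_site(parse_threads(SAMPLE)).items():
        print(len(tids), "thread(s) at", " <- ".join(site), tids)
```

A group with a single thread sitting below many lock waiters is a reasonable first place to look, though the grouping by itself does not prove which thread holds the lock.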

Comment 3 Soumya Koduri 2015-09-15 13:44:19 UTC
Bhaskar could reproduce the issue while running heavy I/O on NFS mounts of both disperse and replicated volumes through a single NFS-Ganesha server VIP.


From gdb on the ganesha process:
As mentioned in the previous comment, one thread is waiting for the setattr response and the other worker threads are waiting for that thread to release attr->lock.

(gdb) bt
#0  0x00007fc701728705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fc6ff387c63 in syncop_setattr (subvol=subvol@entry=0x7fc093061820, loc=loc@entry=0x7fc6ced048a0, 
    iatt=iatt@entry=0x7fc6ced048e0, valid=48, preop=preop@entry=0x0, postop=postop@entry=0x0, 
    xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1786
#2  0x00007fc6ff5f7e27 in pub_glfs_h_setattrs (fs=0x7fc67c003170, object=<optimized out>, stat=0x7fc6ced049d8, 
    valid=<optimized out>) at glfs-handleops.c:442
#3  0x00007fc6ffa1332b in setattrs () from /usr/lib64/ganesha/libfsalgluster.so
#4  0x00000000004d8da9 in cache_inode_setattr ()
#5  0x000000000047f9f1 in nfs4_op_setattr ()
#6  0x000000000045eab5 in nfs4_Compound ()
#7  0x0000000000453a01 in nfs_rpc_execute ()
#8  0x00000000004545ad in worker_run ()
#9  0x000000000050afeb in fridgethr_start_routine ()
#10 0x00007fc701724df5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fc70124a1ad in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00007fc6ff387c63 in syncop_setattr (subvol=subvol@entry=0x7fc093061820, loc=loc@entry=0x7fc6ced048a0, 
    iatt=iatt@entry=0x7fc6ced048e0, valid=48, preop=preop@entry=0x0, postop=postop@entry=0x0, 
    xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1786
1786	        SYNCOP (subvol, (&args), syncop_setattr_cbk, subvol->fops->setattr,
(gdb) p loc
$5 = (loc_t *) 0x7fc6ced048a0
(gdb) p *$
$6 = {path = 0x7fc6b0dcd600 "/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus", name = 0x0, 
  inode = 0x7fc31e48e6e4, parent = 0x0, 
  gfid = "\353\262\024\264\300\371O\300\270ݰ\332\005\234", <incomplete sequence \365>, 
  pargfid = '\000' <repeats 15 times>}
(gdb) 
(gdb) p loc
$7 = (loc_t *) 0x7fc6ced048a0
(gdb) p *loc
$8 = {path = 0x7fc6b0dcd600 "/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus", name = 0x0, 
  inode = 0x7fc31e48e6e4, parent = 0x0, 
  gfid = "\353\262\024\264\300\371O\300\270ݰ\332\005\234", <incomplete sequence \365>, 
  pargfid = '\000' <repeats 15 times>}
(gdb) call uuid_utoa("\353\262\024\264\300\371O\300\270ݰ\332\005\234")
$9 = 0x7fc6b0002ff0 "ebb214b4-c0f9-4fc0-b8dd-b0da059c0000"
(gdb) 
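
The UUID returned by the uuid_utoa call above ends in 0000, while the brick statedump quoted later in this comment reports the gfid ending in 3cf5. A plausible explanation (an assumption, not something confirmed in this report) is that the string literal passed to uuid_utoa held only the 14 bytes gdb displayed inside the quotes. The sketch below checks that those 14 bytes are a prefix of the statedump gfid and that zero-padding them gives the string gdb printed. The names are hypothetical, and the two bytes 0xdd 0xb0 are assumed to be what gdb rendered as a single non-ASCII character.

```python
import uuid

# Bytes of loc->gfid exactly as shown inside the quotes in the gdb output.
# Octal escapes are kept as printed; \xdd\xb0 stands for the non-ASCII
# character in the output (assumption: it is those two raw bytes).
VISIBLE_GFID_BYTES = b"\353\262\024\264\300\371O\300\270\xdd\xb0\332\005\234"

# gfid reported for the same path in the brick statedump.
STATEDUMP_GFID = "ebb214b4-c0f9-4fc0-b8dd-b0da059c3cf5"


def gfid_prefix_matches(raw, gfid_text):
    """True if the raw bytes are a prefix of the 16-byte gfid."""
    return uuid.UUID(gfid_text).bytes.startswith(raw)


def zero_padded_uuid(raw):
    """Format raw bytes as a UUID string, padding with NUL bytes to 16."""
    return str(uuid.UUID(bytes=raw.ljust(16, b"\x00")))


if __name__ == "__main__":
    print(len(VISIBLE_GFID_BYTES), "visible bytes:", VISIBLE_GFID_BYTES.hex())
    print("prefix of statedump gfid:",
          gfid_prefix_matches(VISIBLE_GFID_BYTES, STATEDUMP_GFID))
    print("zero-padded:", zero_padded_uuid(VISIBLE_GFID_BYTES))
```

If that assumption holds, the 0000 suffix comes from how the value was passed to uuid_utoa, and both outputs refer to the same inode.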

Thanks to Kritika, Du and KP.

In the brick's statedump for the dist-rep volume (rhs-brick3-rep5.9073.dump.1442306886),

[conn.1.bound_xl./rhs/brick3/rep5.active.84]
gfid=ebb214b4-c0f9-4fc0-b8dd-b0da059c3cf5
nlookup=2
fd-count=0
ref=3
ia_type=2


[xlator.features.locks.dist-rep-locks.inode]
path=/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus
mandatory=0
inodelk-count=2
lock-dump.domain.domain=dist-rep-replicate-2
lock-dump.domain.domain=dist-rep-replicate-2:metadata
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 54535, owner=8c33147cc67f0000, client=0x7fac680ffa50, connection-id=interstellar.lab.eng.blr.redhat.com-54535-2015/09/15-07:16:55:407586-dist-rep-client-4-0-0, granted at 2015-09-15 07:44:15
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 54535, owner=8c8b227cc67f0000, client=0x7fac680ffa50, connection-id=interstellar.lab.eng.blr.redhat.com-54535-2015/09/15-07:16:55:407586-dist-rep-client-4-0-0, blocked at 2015-09-15 07:44:15
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=dist-rep-marker


There is one active lock and one blocked lock on the inode of this directory.

In the client log (ganesha-gfapi.log),


[2015-09-15 07:43:38.050381] E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] 0-dist-rep-replicate-2: path=/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus gfid=ebb214b4-c0f9-4fc0-b8dd-b0da059c3cf5: unlock failed on subvolume dist-rep-client-4 with lock owner 8c33147cc67f0000 [Stale file handle]


The unlock of the first lock failed with a Stale file handle (ESTALE) error. Because the first lock was never released, the second lock has stayed in the BLOCKED state, which in turn blocks the application thread and results in the hang.

For now, moving the bug component to AFR until we find out why the unlock got the ESTALE error.
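
The cross-check made above, matching the lock owner in the failed unlock message against the ACTIVE inodelk in the statedump, can be scripted when there are many inodes to go through. This is only a sketch under assumptions: the helper names are hypothetical, the regular expressions are written against the exact line shapes quoted in this comment, and the sample lines are abbreviated (client and connection-id fields dropped).

```python
import re

INODELK_RE = re.compile(r"^inodelk\.inodelk\[(\d+)\]\((ACTIVE|BLOCKED)\)=(.*)$")
OWNER_RE = re.compile(r"owner=([0-9a-f]+)")
UNLOCK_FAIL_RE = re.compile(
    r"unlock failed on subvolume (\S+) with lock owner ([0-9a-f]+)"
)


def parse_inodelks(statedump_text):
    """Return (index, state, owner) for every inodelk line in the text."""
    locks = []
    for line in statedump_text.splitlines():
        m = INODELK_RE.match(line.strip())
        if not m:
            continue
        owner = OWNER_RE.search(m.group(3))
        locks.append(
            (int(m.group(1)), m.group(2), owner.group(1) if owner else None)
        )
    return locks


def failed_unlock_owners(log_text):
    """Lock owners for which the client log reports a failed unlock."""
    return {m.group(2) for m in UNLOCK_FAIL_RE.finditer(log_text)}


def suspect_locks(statedump_text, log_text):
    """ACTIVE locks whose owner also shows up in a failed unlock message."""
    failed = failed_unlock_owners(log_text)
    return [
        lock for lock in parse_inodelks(statedump_text)
        if lock[1] == "ACTIVE" and lock[2] in failed
    ]


# Abbreviated copies of the statedump and log lines quoted above.
STATEDUMP = """\
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 54535, owner=8c33147cc67f0000, granted at 2015-09-15 07:44:15
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 54535, owner=8c8b227cc67f0000, blocked at 2015-09-15 07:44:15
"""
LOG = (
    "E [MSGID: 108010] [afr-lk-common.c:665:afr_unlock_inodelk_cbk] "
    "0-dist-rep-replicate-2: unlock failed on subvolume dist-rep-client-4 "
    "with lock owner 8c33147cc67f0000 [Stale file handle]"
)

if __name__ == "__main__":
    print(suspect_locks(STATEDUMP, LOG))
```

A match is only a hint that the ACTIVE lock was left behind by the failed unlock; it says nothing about why the unlock returned ESTALE.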

Comment 4 Soumya Koduri 2015-09-16 05:44:17 UTC
I see the hang from a FUSE mount as well, for the same reason:

[root@rhel-6 ~]# ls -ld /mnt/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus
drwx------. 2 root root 209 Sep 15 13:14 /mnt/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus
[root@rhel-6 ~]# chmod 755 /mnt/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus



In the statedump,

[conn.1.bound_xl./rhs/brick3/rep5.active.84]
gfid=ebb214b4-c0f9-4fc0-b8dd-b0da059c3cf5
nlookup=5
fd-count=0
ref=5
ia_type=2

[xlator.features.locks.dist-rep-locks.inode]
path=/linux/linux-4.1.1/drivers/gpu/drm/nouveau/nvkm/subdev/bus
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dist-rep-replicate-2
lock-dump.domain.domain=dist-rep-replicate-2:metadata
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 54535, owner=8c33147cc67f0000, client=0x7fac680ffa50, connection-id=interstellar.lab.eng.blr.redhat.com-54535-2015/09/15-07:16:55:407586-dist-rep-client-4-0-0, granted at 2015-09-15 07:44:15
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 54535, owner=8c8b227cc67f0000, client=0x7fac680ffa50, connection-id=interstellar.lab.eng.blr.redhat.com-54535-2015/09/15-07:16:55:407586-dist-rep-client-4-0-0, blocked at 2015-09-15 07:44:15
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=9223372036854775806, len=0, pid = 16020, owner=6ca073641f7f0000, client=0x7fac681de9c0, connection-id=rhel-6.5-16002-2015/09/16-05:37:08:166370-dist-rep-client-4-0-0, blocked at 2015-09-16 05:36:04
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=dist-rep-marker

Comment 9 Neha 2015-11-24 06:32:55 UTC
I did not see the IO hang after trying multiple times, so moving this to verified.

However, 'Remote IO error' and 'Stale file handle' errors are still seen. I will file a new BZ for those.

Comment 10 Soumya Koduri 2015-12-15 07:44:27 UTC
It looks like the fix mentioned above was not merged downstream. We have hit this issue again on Saurabh's setup.

Ravi, Kindly check and do the needful. Thanks!

Comment 11 Ravishankar N 2015-12-15 08:50:18 UTC
Soumya, you are right. I'm surprised that the patch in the release-3.7 branch hasn't made it downstream as part of the 3.7.5 rebase, despite what the tag seems to say:
------------
0:ravi@tuxpad glusterfs$ git describe 1a1b00fcd0ec199d19652d8fceb9465cc4edf189
v3.7.5-21-g1a1b00f
------------

I should have double-checked the code. I'll revive the downstream patch in comment #7 and merge it.

Comment 12 Ravishankar N 2015-12-15 09:07:18 UTC
I have merged https://code.engineering.redhat.com/gerrit/#/c/63798/ into the RHGS 3.1.2 branch.

Comment 13 Soumya Koduri 2015-12-15 09:52:42 UTC
Thanks Ravi.

I am not sure what state this bug should be in. Vivek?

Saurabh,
The fix should be available in the next downstream build. You may want to re-run the test to check whether it fixes the issue you ran into.

Comment 16 errata-xmlrpc 2016-03-01 05:35:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html

