Bug 1325975
Summary: | nfs-ganesha crashes with segfault error while doing refresh config on volume. | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Shashank Raj <sraj> | |
Component: | nfs-ganesha | Assignee: | Soumya Koduri <skoduri> | |
Status: | CLOSED ERRATA | QA Contact: | Shashank Raj <sraj> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.1 | CC: | asrivast, jthottan, kkeithle, ndevos, nlevinki, rhinduja, sashinde, skoduri, smohan | |
Target Milestone: | --- | Keywords: | ZStream | |
Target Release: | RHGS 3.1.3 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.7.9-3 | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1326627 | Environment: | |
Last Closed: | 2016-06-23 05:16:57 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1330892 | |||
Bug Blocks: | 1311817, 1326627 |
Description
Shashank Raj
2016-04-11 13:59:13 UTC
I could reproduce the issue -

Breakpoint 1, inode_table_destroy (inode_table=0x7f523802db80) at inode.c:1729
1729            inode_t *tmp = NULL, *trav = NULL;
(gdb) b __inode_retire
Breakpoint 2 at 0x7f525a57bba1: file inode.c, line 430.
(gdb) p inode_table->active
$1 = {next = 0x7f5240001444, prev = 0x7f5238164ec4}
(gdb) p inode_table->active_size
$2 = 3
(gdb) p &inode_table->active
$3 = (struct list_head *) 0x7f523802dbe0
(gdb) p/x &inode_table->active->next-104
$4 = 0x7f523802d8a0
(gdb) p/x &inode_table->active->next->next-104
$5 = 0x7f5240001104
(gdb) p/x &inode_table->active->next->next->next-104
$6 = 0xb96ed4
(gdb) p/x &inode_table->active->next->next->next->next-104
$7 = 0x7f5238164b84
(gdb) p/x &inode_table->active->next->next->next->next->next-104
$8 = 0x7f523802d8a0
(gdb) p/x &inode_table->active->next->next->next->next
$9 = 0x7f5238164ec4
(gdb) p *(inode_t *)$4
$10 = {table = 0x7f5238019000, gfid = "\276\272\376\312", '\000' <repeats 11 times>, lock = 0, nlookup = 0, fd_count = 0, ref = 0, ia_type = IA_INVAL, fd_list = {next = 0x0, prev = 0x7f522002a3a0}, dentry_list = {next = 0x0, prev = 0x100000001}, hash = {next = 0x3802d92400000001, prev = 0x3802d92c00007f52}, list = {next = 0x7f52, prev = 0x0}, _ctx = 0x100000000}
(gdb) p &inode_table->active
$11 = (struct list_head *) 0x7f523802dbe0
(gdb) p &inode_table->active->next->list-104
There is no member named list.
(gdb) p &inode_table->active->next
$12 = (struct list_head **) 0x7f523802dbe0
(gdb) p inode_table->active->next
$13 = (struct list_head *) 0x7f5240001444
(gdb) p inode_table->active->next-104
$14 = (struct list_head *) 0x7f5240000dc4
(gdb) p *(inode_t *)inode_table->active->next-104
Structure has no component named operator-.
(gdb) p *(inode_t *)(inode_table->active->next-104)
$15 = {table = 0x0, gfid = '\000' <repeats 15 times>, lock = 0, nlookup = 0, fd_count = 0, ref = 0, ia_type = IA_INVAL, fd_list = {next = 0x0, prev = 0x0}, dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0}
(gdb) p inode_table
$16 = (inode_table_t *) 0x7f523802db80
(gdb) p inode_table->active
$17 = {next = 0x7f5240001444, prev = 0x7f5238164ec4}
(gdb) p inode_table->active_size
$18 = 3
(gdb) p inode_table->active->next
$19 = (struct list_head *) 0x7f5240001444
(gdb) p inode_table->active->next->next
$20 = (struct list_head *) 0xb97214
(gdb) p inode_table->active->next->next->next
$21 = (struct list_head *) 0x7f5238164ec4
(gdb) p/x *(inode_t *) $19-104
Structure has no component named operator-.
(gdb) p/x *(inode_t *)($19-104)
$22 = {table = 0x0, gfid = {0x0 <repeats 16 times>}, lock = 0x0, nlookup = 0x0, fd_count = 0x0, ref = 0x0, ia_type = 0x0, fd_list = {next = 0x0, prev = 0x0}, dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, list = {next = 0x0, prev = 0x0}, _ctx = 0x0}
(gdb) b __inode_retire
Note: breakpoint 2 also set at pc 0x7f525a57bba1.
Breakpoint 3 at 0x7f525a57bba1: file inode.c, line 430.
(gdb) p &$17
$23 = (struct list_head *) 0x7f523802dbe0
(gdb) p inode_table->active->next->next->next->next
$24 = (struct list_head *) 0x7f523802dbe0
(gdb) p/x *(inode_t *)(0x7f5240001444-104)
$25 = {table = 0x7f523802db80, gfid = {0xad, 0x7e, 0xb1, 0x60, 0x3, 0x1b, 0x4f, 0x8e, 0x99, 0xf, 0x38, 0x2e, 0xe4, 0xb8, 0xf2, 0x93}, lock = 0x1, nlookup = 0x0, fd_count = 0x1, ref = 0x1, ia_type = 0x1, fd_list = {next = 0x7f524000177c, prev = 0x7f524000177c}, dentry_list = {next = 0x7f52400019ac, prev = 0x7f52400019ac}, hash = {next = 0x7f5238120700, prev = 0x7f5238120700}, list = {next = 0xb97214, prev = 0x7f523802dbe0}, _ctx = 0x7f52400014b0}
(gdb) p/x *(inode_t *)(0xb97214-104)
$26 = {table = 0x7f523802db80, gfid = {0x6d, 0xa, 0x7a, 0x59, 0x76, 0xf4, 0x40, 0x6f, 0xaf, 0x35, 0x3b, 0x8e, 0xaf, 0x78, 0xec, 0xd6}, lock = 0x1, nlookup = 0x0, fd_count = 0x0, ref = 0x1, ia_type = 0x2, fd_list = {next = 0xb971e4, prev = 0xb971e4}, dentry_list = {next = 0xb96f5c, prev = 0xb96f5c}, hash = {next = 0x7f523811ab30, prev = 0x7f523811ab30}, list = {next = 0x7f5238164ec4, prev = 0x7f5240001444}, _ctx = 0xbc9b40}
(gdb) p/x *(inode_t *)(0x7f5238164ec4-104)
$27 = {table = 0x7f523802db80, gfid = {0x0 <repeats 15 times>, 0x1}, lock = 0x1, nlookup = 0x0, fd_count = 0x0, ref = 0x1, ia_type = 0x2, fd_list = {next = 0x7f5238164e94, prev = 0x7f5238164e94}, dentry_list = {next = 0x7f5238164ea4, prev = 0x7f5238164ea4}, hash = {next = 0x7f523802dde0, prev = 0x7f523802dde0}, list = {next = 0x7f523802dbe0, prev = 0xb97214}, _ctx = 0x7f5238164f30}
(gdb) p/x (0x7f5238164ec4-104)
$28 = 0x7f5238164e5c
(gdb) p/x (0x7f5238164ec4-104)
$29 = 0x7f5238164e5c
(gdb) p/x (0xb97214-104)
$30 = 0xb971ac
(gdb) p/x (0x7f5240001444-104)
$31 = 0x7f52400013dc
(gdb) n
1731            if (inode_table == NULL)
(gdb)
1756            pthread_mutex_lock (&inode_table->lock);
(gdb)
1768            while (!list_empty (&inode_table->lru)) {
(gdb)
1776            list_for_each_entry_safe (trav, tmp, &inode_table->active,
(gdb)
1782                    if (trav != inode_table->root)
(gdb) p trav
$32 = (inode_t *) 0x7f52400013dc
(gdb) p tmp
$33 = (inode_t *) 0xb971ac
(gdb) p *trav
$34 = {table = 0x7f523802db80, gfid = "\255~\261`\003\033O\216\231\017\070.\344\270", <incomplete sequence \362\223>, lock = 1, nlookup = 0, fd_count = 1, ref = 1, ia_type = IA_IFREG, fd_list = {next = 0x7f524000177c, prev = 0x7f524000177c}, dentry_list = {next = 0x7f52400019ac, prev = 0x7f52400019ac}, hash = {next = 0x7f5238120700, prev = 0x7f5238120700}, list = {next = 0xb97214, prev = 0x7f523802dbe0}, _ctx = 0x7f52400014b0}
(gdb) p *tmp
$35 = {table = 0x7f523802db80, gfid = "m\nzYv\364@o\257\065;\216\257x\354", <incomplete sequence \326>, lock = 1, nlookup = 0, fd_count = 0, ref = 1, ia_type = IA_IFDIR, fd_list = {next = 0xb971e4, prev = 0xb971e4}, dentry_list = {next = 0xb96f5c, prev = 0xb96f5c}, hash = {next = 0x7f523811ab30, prev = 0x7f523811ab30}, list = {next = 0x7f5238164ec4, prev = 0x7f5240001444}, _ctx = 0xbc9b40}
(gdb) c
Continuing.
[Thread 0x7f52345fb700 (LWP 5032) exited]

Breakpoint 2, __inode_retire (inode=0x7f52400013dc) at inode.c:430
430             dentry_t *dentry = NULL;
(gdb) c
Continuing.

Breakpoint 2, __inode_retire (inode=0xb971ac) at inode.c:430
430             dentry_t *dentry = NULL;
(gdb) bt
#0  __inode_retire (inode=0xb971ac) at inode.c:430
#1  0x00007f525a57bd55 in __inode_unref (inode=0xb971ac) at inode.c:473
#2  0x00007f525a57b001 in __dentry_unset (dentry=0x7f52400019ac) at inode.c:141
#3  0x00007f525a57bc78 in __inode_retire (inode=0x7f52400013dc) at inode.c:445
#4  0x00007f525a57c367 in __inode_ref_reduce_by_n (inode=0x7f52400013dc, nref=0) at inode.c:686
#5  0x00007f525a57e7d2 in inode_table_destroy (inode_table=0x7f523802db80) at inode.c:1789
#6  0x00007f525a57e63c in inode_table_destroy_all (ctx=0x7f5220003c00) at inode.c:1720
#7  0x00007f525e4e2e42 in pub_glfs_fini (fs=0x7f5220003a80) at glfs.c:1158
#8  0x00007f525e6c6281 in export_release (exp_hdl=0x7f5220003960) at /home/guest/Documents/workspace/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:86
#9  0x000000000050c317 in free_export_resources (export=0x7f5220000ec8) at /home/guest/Documents/workspace/nfs-ganesha/src/support/exports.c:1497
#10 0x000000000051d5a8 in free_export (export=0x7f5220000ec8) at /home/guest/Documents/workspace/nfs-ganesha/src/support/export_mgr.c:250
#11 0x000000000051e93d in put_gsh_export (export=0x7f5220000ec8) at /home/guest/Documents/workspace/nfs-ganesha/src/support/export_mgr.c:631
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) l
425
426
427     static void
428     __inode_retire (inode_t *inode)
429     {
430             dentry_t *dentry = NULL;
431             dentry_t *t = NULL;
432
433             if (!inode) {
434                     gf_msg_callingfn (THIS->name, GF_LOG_WARNING, 0,
(gdb)
435                                       LG_MSG_INODE_NOT_FOUND, "inode not found");
436                     return;
437             }
438
439             list_move_tail (&inode->list, &inode->table->purge);
440             inode->table->purge_size++;
441
442             __inode_unhash (inode);
443
444             list_for_each_entry_safe (dentry, t, &inode->dentry_list, inode_list) {
(gdb)

As can be seen above, the 'tmp' inode entry (0xb971ac) is moved from the active list to the purge list while the current entry (0x7f52400013dc) is being retired, that is, before the next iteration reaches it.

Since the inode entries can get moved from one list to another between iterations, it's best not to fetch them early. So the fix can be to use 'list_for_each_entry', which is safe here as there will be no other thread accessing these inodes.

Fix posted upstream - http://review.gluster.org/13987

Even while executing the gluster operations automation suite, hit the below segfault issue, and ganesha crashes on the mounted node. The below bt is observed:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6de71fb700 (LWP 20592)]
__inode_ref_reduce_by_n (inode=inode@entry=0x7f6d71e9b5a8, nref=nref@entry=0) at inode.c:686
686             inode->table->active_size--;
(gdb) bt
#0  __inode_ref_reduce_by_n (inode=inode@entry=0x7f6d71e9b5a8, nref=nref@entry=0) at inode.c:686
#1  0x00007f6de4865a2a in inode_table_destroy (inode_table=0x7f6d71e9b580) at inode.c:1794
#2  0x00007f6de4865b21 in inode_table_destroy_all (ctx=ctx@entry=0x7f6de0007f90) at inode.c:1725
#3  0x00007f6de4af107f in pub_glfs_fini (fs=0x7f6de0007e30) at glfs.c:1133
#4  0x00007f6de4f18051 in export_release () from /usr/lib64/ganesha/libfsalgluster.so
#5  0x00007f6dfa327acb in free_export_resources ()
#6  0x00007f6dfa3369e3 in free_export ()
#7  0x00007f6dfa3386a4 in gsh_export_removeexport ()
#8  0x00007f6dfa3455e9 in dbus_message_entrypoint ()
#9  0x00007f6df9be7c86 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#10 0x00007f6df9bd9e49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#11 0x00007f6df9bda0e2 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#12 0x00007f6dfa346640 in gsh_dbus_thread ()
#13 0x00007f6df8803dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f6df7ed21cd in clone () from /lib64/libc.so.6

Verified this bug with the latest glusterfs-3.7.9-3 build; after performing the rootsquash regression automated cases, did not observe this issue.

Automation test case with ID 342834, which originally reproduced this issue, works fine now, and no refresh config failure or crash is seen in this case. Verified on both v3 and v4 ganesha mounts.

Based on the above observation, moving this bug to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240