Bug 1339208

Summary: Ganesha gets killed with segfault error while rebalance is in progress.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Shashank Raj <sraj>
Component: gluster-nfs Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA QA Contact: Shashank Raj <sraj>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: jthottan, kkeithle, ndevos, rcyriac, rhinduja, rhs-bugs, sashinde, skoduri, storage-qa-internal
Target Milestone: --- Keywords: Regression, ZStream
Target Release: RHGS 3.1.3   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.7.9-7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-23 05:24:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1311817    
Attachments:
Description                            Flags
ganesha-gfapi.log from mounted node    none
ganesha.log                            none

Description Shashank Raj 2016-05-24 11:49:55 UTC
Created attachment 1160991 [details]
ganesha-gfapi.log from mounted node

Description of problem:

Ganesha gets killed with segfault error while rebalance is in progress.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-5
nfs-ganesha-2.3.1-7

How reproducible:

Always

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Create a volume and enable ganesha on it.
3. Mount the volume using vers=3 or 4 and create nested directories on the mount point.

From the distaf logs:

for i in {1..25}; do mkdir /mnt1464089502.83/a$i;  for j in {1..50}; do mkdir /mnt1464089502.83/a$i/b$j; for k in {1..50}; do touch /mnt1464089502.83/a$i/b$j/c$k; done done done

4. Add bricks to the volume.

gluster volume add-brick newvolume replica 2   dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick4/newvolume_brick12 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick4/newvolume_brick13

5. Start the rebalance process:

gluster v rebalance newvolume start force

6. Observe that while rebalance is in progress, the ganesha process on the mounted node gets killed with a segfault error:

[73850.224747] ganesha.nfsd[6003]: segfault at 7fda62dfd8a4 ip 00007fc8c38e1210 sp 00007fc83ac7cf68 error 6 in libpthread-2.17.so[7fc8c38d5000+16000]


Below is the backtrace (bt) generated from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9c60430280 (LWP 18923)]
0x00007f9c8fcd0210 in pthread_spin_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install libacl-2.2.51-12.el7.x86_64 openssl-libs-1.0.1e-51.el7_2.5.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  0x00007f9c8fcd0210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f9c7c60b63d in __gf_free (free_ptr=0x7f9c50000950) at mem-pool.c:316
#2  0x00007f9c7c89db0b in glfs_h_poll_cache_invalidation (
    fs=fs@entry=0x7f9c78007e30, up_arg=up_arg@entry=0x7f9c6042f0a0, 
    upcall_data=upcall_data@entry=0x7f9c5e239e00) at glfs-handleops.c:1972
#3  0x00007f9c7c89de00 in pub_glfs_h_poll_upcall (fs=0x7f9c78007e30, 
    up_arg=up_arg@entry=0x7f9c6042f0a0) at glfs-handleops.c:2066
#4  0x00007f9c7ccb5ed3 in GLUSTERFSAL_UP_Thread (Arg=0x7f9c78007d00)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:153
#5  0x00007f9c8fccbdc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f9c8f399ced in clone () from /lib64/libc.so.6

Actual results:

Ganesha gets killed with segfault error while rebalance is in progress.

Expected results:

The ganesha process should not get killed.

Additional info:

Attached the ganesha and ganesha-gfapi logs from the mounted node.

Comment 2 Shashank Raj 2016-05-24 11:50:39 UTC
Created attachment 1160992 [details]
ganesha.log

Comment 3 Soumya Koduri 2016-05-24 11:54:52 UTC
The crash occurs because 'glfs_h_poll_cache_invalidation' uses 'calloc' to allocate up_inode_arg. Memory allocated this way does not have the memory accounting variables set/defined that GF_FREE (up_inode_arg) relies on, so the GF_FREE call made in case of any errors crashes. The fix is to use 'GF_CALLOC' when allocating memory for the up_inode_arg variable.
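
To illustrate the allocator mismatch described above, here is a minimal sketch of an accounting allocator. It is not the glusterfs implementation: toy_header, toy_calloc, toy_free, toy_in_use and toy_demo_mismatch are hypothetical names, and the header layout is an assumption made only for this example. The point it shows is that a free routine which reads bookkeeping stored in front of the payload can only be handed pointers that came from the matching allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical accounting header; NOT the real glusterfs layout. */
#define TOY_MAGIC 0xCAFEF00Du

struct toy_header {
    uint32_t magic; /* marks blocks handed out by toy_calloc */
    size_t   size;  /* payload size, used for accounting */
};

static size_t toy_bytes_in_use = 0;

/* Report how many payload bytes are currently accounted for. */
size_t toy_in_use(void) { return toy_bytes_in_use; }

/* Allocate zeroed memory with an accounting header in front of it. */
void *toy_calloc(size_t nmemb, size_t size)
{
    /* Reject requests whose total size would overflow. */
    if (size != 0 && nmemb > (SIZE_MAX - sizeof(struct toy_header)) / size)
        return NULL;

    size_t payload = nmemb * size;
    struct toy_header *hdr = calloc(1, sizeof(*hdr) + payload);
    if (hdr == NULL)
        return NULL;

    hdr->magic = TOY_MAGIC;
    hdr->size = payload;
    toy_bytes_in_use += payload;
    return hdr + 1; /* caller sees only the payload */
}

/* Free a toy_calloc block. Returns 0 on success, -1 if the bytes in
 * front of ptr do not look like a header written by toy_calloc. */
int toy_free(void *ptr)
{
    if (ptr == NULL)
        return 0;

    struct toy_header *hdr = (struct toy_header *)ptr - 1;
    if (hdr->magic != TOY_MAGIC)
        return -1; /* not ours: the accounting data is missing */

    assert(toy_bytes_in_use >= hdr->size);
    toy_bytes_in_use -= hdr->size;
    hdr->magic = 0;
    free(hdr);
    return 0;
}

/* Simulate passing a pointer that has no accounting header. The whole
 * region is owned by this function, so reading the bytes in front of
 * the payload is well defined here; with a real plain calloc pointer
 * that read would be undefined behavior. */
int toy_demo_mismatch(void)
{
    unsigned char *raw = calloc(1, sizeof(struct toy_header) + 64);
    if (raw == NULL)
        return -2;

    void *plain = raw + sizeof(struct toy_header);
    int rc = toy_free(plain); /* header bytes are zero, so rejected */
    free(raw);
    return rc;
}
```

In this sketch the mismatch is detected and reported through a return code. In the backtrace above no such graceful path exists: __gf_free goes on to call pthread_spin_lock, which is where the segfault is raised. Calling toy_calloc followed by toy_free from a small main() exercises the matching pair.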

Comment 4 Soumya Koduri 2016-05-24 11:57:22 UTC
Since this is a change in glusterfs code-path, adjusting components accordingly.

Comment 5 Shashank Raj 2016-05-24 11:58:08 UTC
Since no ganesha kills are acceptable in any scenario, raising a blocker flag for 3.1.3.

Comment 6 Soumya Koduri 2016-05-24 12:26:59 UTC
Fix has been posted upstream for review -
    http://review.gluster.org/14521

Comment 13 Shashank Raj 2016-06-02 06:57:37 UTC
Verified this bug with the latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 build, and it is working as expected.

The automated rebalance cases that earlier caused nfs-ganesha to crash now work fine, and no ganesha crash or segfault error is observed.

Based on the above observation, marking this bug as Verified.

Comment 15 errata-xmlrpc 2016-06-23 05:24:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240