1339208 – Ganesha gets killed with segfault error while rebalance is in progress.

Summary: Ganesha gets killed with segfault error while rebalance is in progress.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	gluster-nfs
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.1.3
Assignee:	Soumya Koduri
QA Contact:	Shashank Raj
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1311817

Reported:	2016-05-24 11:49 UTC by Shashank Raj
Modified:	2016-11-08 03:52 UTC
CC List:	9 users
Fixed In Version:	glusterfs-3.7.9-7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-06-23 05:24:20 UTC
Embargoed:
Dependent Products:

Attachments
ganesha-gfapi.log from mounted node (2.38 MB, text/plain) 2016-05-24 11:49 UTC, Shashank Raj
ganesha.log (168.05 KB, text/plain) 2016-05-24 11:50 UTC, Shashank Raj

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1240	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.1 Update 3	2016-06-23 08:51:28 UTC

Description Shashank Raj 2016-05-24 11:49:55 UTC

Created attachment 1160991
ganesha-gfapi.log from mounted node

Description of problem:

Ganesha gets killed with segfault error while rebalance is in progress.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-5
nfs-ganesha-2.3.1-7

How reproducible:

Always

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Create a volume and enable ganesha on it.
3. Mount the volume using vers=3 or 4 and create nested directories on the mount point.

from distaf logs:

for i in {1..25}; do mkdir /mnt1464089502.83/a$i;  for j in {1..50}; do mkdir /mnt1464089502.83/a$i/b$j; for k in {1..50}; do touch /mnt1464089502.83/a$i/b$j/c$k; done done done

4. Add bricks to the volume.

gluster volume add-brick newvolume replica 2   dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick4/newvolume_brick12 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick4/newvolume_brick13

5. Start the rebalance process:

gluster v rebalance newvolume start force

6. Observe that while rebalance is in progress, the ganesha process on the mounted node gets killed with a segfault error:

[73850.224747] ganesha.nfsd[6003]: segfault at 7fda62dfd8a4 ip 00007fc8c38e1210 sp 00007fc83ac7cf68 error 6 in libpthread-2.17.so[7fc8c38d5000+16000]


Below is the backtrace generated from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9c60430280 (LWP 18923)]
0x00007f9c8fcd0210 in pthread_spin_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install libacl-2.2.51-12.el7.x86_64 openssl-libs-1.0.1e-51.el7_2.5.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  0x00007f9c8fcd0210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f9c7c60b63d in __gf_free (free_ptr=0x7f9c50000950) at mem-pool.c:316
#2  0x00007f9c7c89db0b in glfs_h_poll_cache_invalidation (
    fs=fs@entry=0x7f9c78007e30, up_arg=up_arg@entry=0x7f9c6042f0a0, 
    upcall_data=upcall_data@entry=0x7f9c5e239e00) at glfs-handleops.c:1972
#3  0x00007f9c7c89de00 in pub_glfs_h_poll_upcall (fs=0x7f9c78007e30, 
    up_arg=up_arg@entry=0x7f9c6042f0a0) at glfs-handleops.c:2066
#4  0x00007f9c7ccb5ed3 in GLUSTERFSAL_UP_Thread (Arg=0x7f9c78007d00)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:153
#5  0x00007f9c8fccbdc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f9c8f399ced in clone () from /lib64/libc.so.6

Actual results:

Ganesha gets killed with segfault error while rebalance is in progress.

Expected results:

The ganesha process should not get killed.

Additional info:

The ganesha and ganesha-gfapi logs from the mounted node are attached.

Comment 2 Shashank Raj 2016-05-24 11:50:39 UTC

Created attachment 1160992
ganesha.log

Comment 3 Soumya Koduri 2016-05-24 11:54:52 UTC

The crash occurs because 'glfs_h_poll_cache_invalidation' allocates up_inode_arg with plain 'calloc'. Memory allocated this way does not carry the memory-accounting metadata that GF_FREE relies on, so the GF_FREE (up_inode_arg) call on the error path operates on invalid accounting data and crashes. The fix is to allocate up_inode_arg with 'GF_CALLOC' instead.
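
To illustrate the class of bug, here is a minimal sketch of an accounting allocator. It is not the glusterfs mem-pool.c code: every toy_* name, the header layout, and the magic value are hypothetical and exist only for this example. The point it shows is that an accounting free routine steps back from the user pointer to find its own header, so it is only safe on memory that came from the matching accounting allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAGIC 0xCAFEBABEu   /* hypothetical marker, not a glusterfs value */

/* Hypothetical accounting record shared by all allocations of one kind. */
struct toy_acct {
    size_t   live_bytes;
    unsigned live_allocs;
};

/* Hypothetical header stored immediately before the pointer handed out.
 * The alignment keeps the payload suitably aligned for any object type. */
struct toy_header {
    _Alignas(max_align_t) uint32_t magic;
    size_t           size;
    struct toy_acct *acct;
};

/* Accounting allocator: reserves room for the header and fills it in. */
void *toy_calloc(struct toy_acct *acct, size_t nmemb, size_t size)
{
    struct toy_header *h;

    /* Reject requests whose total size would overflow. */
    if (size != 0 && nmemb > (SIZE_MAX - sizeof(*h)) / size)
        return NULL;

    h = calloc(1, sizeof(*h) + nmemb * size);
    if (h == NULL)
        return NULL;

    h->magic = TOY_MAGIC;
    h->size  = nmemb * size;
    h->acct  = acct;
    acct->live_bytes += h->size;
    acct->live_allocs++;
    return h + 1;               /* caller sees only the payload */
}

/* Accounting free: walks back to the header and dereferences h->acct.
 * Passing a pointer from plain calloc() here is undefined behaviour,
 * because the bytes before it are not a toy_header and h->acct is garbage. */
void toy_free(void *ptr)
{
    struct toy_header *h;

    if (ptr == NULL)
        return;

    h = (struct toy_header *)ptr - 1;
    assert(h->magic == TOY_MAGIC);
    h->acct->live_bytes -= h->size;
    h->acct->live_allocs--;
    free(h);
}
```

In this sketch, a pointer obtained from toy_calloc can be released with toy_free and the counters return to zero. A pointer obtained from plain calloc must be released with plain free; handing it to toy_free would follow a bogus accounting pointer, which is analogous to the invalid access seen under __gf_free in the backtrace above.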

Comment 4 Soumya Koduri 2016-05-24 11:57:22 UTC

Since this is a change in glusterfs code-path, adjusting components accordingly.

Comment 5 Shashank Raj 2016-05-24 11:58:08 UTC

Since no ganesha kills are acceptable in any scenario, raising the blocker flag for 3.1.3.

Comment 6 Soumya Koduri 2016-05-24 12:26:59 UTC

The fix has been posted upstream for review:
    http://review.gluster.org/14521

Comment 13 Shashank Raj 2016-06-02 06:57:37 UTC

Verified this bug with the latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 builds, and it is working as expected.

The automated rebalance cases that were previously causing nfs-ganesha to crash now pass, and no ganesha crash or segfault error is observed.

Based on the above observation, marking this bug as Verified.

Comment 15 errata-xmlrpc 2016-06-23 05:24:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
