Bug 1339208

Summary:

Ganesha gets killed with segfault error while rebalance is in progress.

Product:

[Red Hat Storage] Red Hat Gluster Storage

Reporter:

Shashank Raj <sraj>

Component:

gluster-nfs

Assignee:

Soumya Koduri <skoduri>

Status:

CLOSED ERRATA

QA Contact:

Shashank Raj <sraj>

Severity:

urgent

Docs Contact:

Priority:

unspecified

Version:

rhgs-3.1

CC:

jthottan, kkeithle, ndevos, rcyriac, rhinduja, rhs-bugs, sashinde, skoduri, storage-qa-internal

Target Milestone:

---

Keywords:

Regression, ZStream

Target Release:

RHGS 3.1.3

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

glusterfs-3.7.9-7

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-06-23 05:24:20 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1311817

Attachments:

Description	Flags
ganesha-gfapi.log from mounted node	none
ganesha.log	none

Description Shashank Raj 2016-05-24 11:49:55 UTC

Created attachment 1160991
ganesha-gfapi.log from mounted node

Description of problem:

Ganesha gets killed with segfault error while rebalance is in progress.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-5
nfs-ganesha-2.3.1-7

How reproducible:

Always

Steps to Reproduce:
1. Create a 4 node ganesha cluster.
2. Create a volume and enable ganesha on it.
3. Mount the volume using vers=3 or 4 and create nested directories on the mount point.

From the distaf logs:

for i in {1..25}; do mkdir /mnt1464089502.83/a$i;  for j in {1..50}; do mkdir /mnt1464089502.83/a$i/b$j; for k in {1..50}; do touch /mnt1464089502.83/a$i/b$j/c$k; done done done

4. Add bricks to the volume.

gluster volume add-brick newvolume replica 2   dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick4/newvolume_brick12 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick4/newvolume_brick13

5. Start the rebalance process:

gluster v rebalance newvolume start force

6. Observe that while rebalance is in progress, the ganesha process on the mounted node gets killed with a segfault:

[73850.224747] ganesha.nfsd[6003]: segfault at 7fda62dfd8a4 ip 00007fc8c38e1210 sp 00007fc83ac7cf68 error 6 in libpthread-2.17.so[7fc8c38d5000+16000]


Below is the backtrace generated from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9c60430280 (LWP 18923)]
0x00007f9c8fcd0210 in pthread_spin_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install libacl-2.2.51-12.el7.x86_64 openssl-libs-1.0.1e-51.el7_2.5.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt
#0  0x00007f9c8fcd0210 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f9c7c60b63d in __gf_free (free_ptr=0x7f9c50000950) at mem-pool.c:316
#2  0x00007f9c7c89db0b in glfs_h_poll_cache_invalidation (
    fs=fs@entry=0x7f9c78007e30, up_arg=up_arg@entry=0x7f9c6042f0a0, 
    upcall_data=upcall_data@entry=0x7f9c5e239e00) at glfs-handleops.c:1972
#3  0x00007f9c7c89de00 in pub_glfs_h_poll_upcall (fs=0x7f9c78007e30, 
    up_arg=up_arg@entry=0x7f9c6042f0a0) at glfs-handleops.c:2066
#4  0x00007f9c7ccb5ed3 in GLUSTERFSAL_UP_Thread (Arg=0x7f9c78007d00)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/FSAL/FSAL_GLUSTER/fsal_up.c:153
#5  0x00007f9c8fccbdc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f9c8f399ced in clone () from /lib64/libc.so.6

Actual results:

Ganesha gets killed with segfault error while rebalance is in progress.

Expected results:

The ganesha process should not get killed.

Additional info:

Attached the ganesha and ganesha-gfapi logs from the mounted node.

Comment 2 Shashank Raj 2016-05-24 11:50:39 UTC

Created attachment 1160992
ganesha.log

Comment 3 Soumya Koduri 2016-05-24 11:54:52 UTC

The crash occurs because 'glfs_h_poll_cache_invalidation' uses 'calloc' to allocate up_inode_arg. Memory obtained from plain 'calloc' has none of the memory accounting variables set that GF_FREE (up_inode_arg) relies on, so the GF_FREE call made on error paths crashes. The fix is to allocate up_inode_arg with 'GF_CALLOC'.
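
To illustrate why the allocator and the free routine must be paired, here is a minimal sketch of a header-based accounting allocator. It is not the GlusterFS implementation: acct_rec, acct_header, acct_calloc, acct_free and ACCT_MAGIC are hypothetical names chosen for this example. It only assumes the general idea described above, that an accounting free routine expects bookkeeping data stored just before the pointer it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ACCT_MAGIC 0xCAFEBABEu /* hypothetical marker value */

/* Hypothetical accounting record; real code would guard it with a lock. */
struct acct_rec {
    size_t live_bytes;
    size_t live_allocs;
};

/* Header placed immediately before the pointer handed to the caller.
 * The max_align_t member keeps the user pointer suitably aligned. */
union acct_header {
    struct {
        uint32_t magic;
        size_t size;
        struct acct_rec *rec;
    } h;
    max_align_t align;
};

/* Accounting allocator: reserves room for the header, fills it in,
 * and returns the zeroed memory that follows it. */
void *acct_calloc(struct acct_rec *rec, size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > (SIZE_MAX - sizeof(union acct_header)) / size)
        return NULL; /* request would overflow */

    size_t total = nmemb * size;
    union acct_header *hdr = calloc(1, sizeof(*hdr) + total);
    if (hdr == NULL)
        return NULL;

    hdr->h.magic = ACCT_MAGIC;
    hdr->h.size = total;
    hdr->h.rec = rec;
    rec->live_bytes += total;
    rec->live_allocs += 1;
    return hdr + 1;
}

/* Accounting free: steps back to the header and updates the record.
 * A pointer that came from plain calloc() has no header in front of it,
 * so this function would read unrelated bytes and follow a garbage
 * 'rec' pointer, which is undefined behavior. */
void acct_free(void *ptr)
{
    if (ptr == NULL)
        return;

    union acct_header *hdr = (union acct_header *)ptr - 1;
    assert(hdr->h.magic == ACCT_MAGIC);
    hdr->h.rec->live_bytes -= hdr->h.size;
    hdr->h.rec->live_allocs -= 1;
    free(hdr);
}
```

In this sketch, memory returned by acct_calloc can be released safely with acct_free, while memory from plain calloc must be released with plain free. The backtrace in the description, where __gf_free faults inside pthread_spin_lock, is consistent with a free routine following bookkeeping data that was never set up.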

Comment 4 Soumya Koduri 2016-05-24 11:57:22 UTC

Since this is a change in glusterfs code-path, adjusting components accordingly.

Comment 5 Shashank Raj 2016-05-24 11:58:08 UTC

Since a ganesha crash is not acceptable in any scenario, raising the blocker flag for 3.1.3.

Comment 6 Soumya Koduri 2016-05-24 12:26:59 UTC

Fix has been posted upstream for review -
    http://review.gluster.org/14521

Comment 13 Shashank Raj 2016-06-02 06:57:37 UTC

Verified this bug with the latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 builds, and it is working as expected.

The automated rebalance cases that earlier caused nfs-ganesha to crash now work fine, and no ganesha crash or segfault error is observed.

Based on the above observation, marking this bug as Verified.

Comment 15 errata-xmlrpc 2016-06-23 05:24:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240