Bug 1221941 - glusterfsd: bricks crash while executing ls on nfs-ganesha vers=3
Summary: glusterfsd: bricks crash while executing ls on nfs-ganesha vers=3
Alias: None
Product: GlusterFS
Classification: Community
Component: upcall
Version: 3.7.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
Depends On: 1227204
Blocks: glusterfs-3.7.2
Reported: 2015-05-15 10:02 UTC by Saurabh
Modified: 2016-01-19 06:14 UTC (History)

Fixed In Version: glusterfs-3.7.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1227204
Last Closed: 2015-06-20 09:48:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:

Attachments
coredump of the brick (732.05 KB, application/x-xz)
2015-05-15 10:08 UTC, Saurabh
sosreport of node2 (13.34 MB, application/x-xz)
2015-05-15 10:10 UTC, Saurabh
sosreport of node3 (13.21 MB, application/x-xz)
2015-05-15 10:13 UTC, Saurabh

Description Saurabh 2015-05-15 10:02:22 UTC
Description of problem:
Several brick processes of the same volume dumped core while ls was running on the mount point. The volume was mounted through nfs-ganesha with vers=3.

Version-Release number of selected component (if applicable):

How reproducible:
seen only once

Steps to Reproduce:
1. create a 6x2 volume, start it
2. bring up nfs-ganesha after completing the prerequisites
3. disable_acl, then do whatever is required to bring ganesha up again
4. mount the volume with vers=3
5. execute ls on the mount-point

Actual results:

step 5 result,
[root@rhsauto010 ~]# time ls /mnt/nfs-test
dir  dir1  fstest_f017b1f6b87412d79e9052d0a289ce23  rhsauto010.test

real    144m12.193s
user    0m0.003s
sys     0m0.023s

(gdb) bt
#0  0x00007fcb200605bd in __gf_free (free_ptr=0x7fcabc0036a0) at mem-pool.c:312
#1  0x00007fcb0fbe1dc7 in upcall_reaper_thread (data=0x7fcb100127a0) at upcall-internal.c:426
#2  0x0000003890c079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038908e88fd in clone () from /lib64/libc.so.6

Expected results:
ls should not take this long, and glusterfsd dumping core is unexpected; this problem needs to be fixed.

Additional info:

Comment 1 Saurabh 2015-05-15 10:03:35 UTC
[root@nfs3 ~]# gluster volume status
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick   49156     0          Y       3549 
Brick    49155     0          Y       3329 
Brick    49155     0          Y       3081 
Brick    49155     0          Y       3346 
Brick   49157     0          Y       3566 
Brick    49156     0          Y       3346 
Brick    49156     0          Y       3098 
Brick    49156     0          Y       3363 
Brick   49158     0          Y       3583 
Brick    49157     0          Y       3363 
Brick    49157     0          Y       3115 
Brick    49157     0          Y       3380 
Self-heal Daemon on localhost               N/A       N/A        Y       28389
Self-heal Daemon on            N/A       N/A        Y       22717
Self-heal Daemon on             N/A       N/A        Y       4784 
Self-heal Daemon on             N/A       N/A        Y       25893
Task Status of Volume gluster_shared_storage
There are no active volume tasks
Status of volume: vol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
Brick         49153     0          Y       22219
Brick          49152     0          Y       4321 
Brick          N/A       N/A        N       25654
Brick          49152     0          Y       27914
Brick         49154     0          Y       18842
Brick          49153     0          Y       4343 
Brick          N/A       N/A        N       25856
Brick          N/A       N/A        N       27934
Brick         49155     0          Y       22237
Brick          49154     0          Y       4361 
Brick          N/A       N/A        N       25874
Brick          N/A       N/A        N       27952
Self-heal Daemon on localhost               N/A       N/A        Y       28389
Self-heal Daemon on             N/A       N/A        Y       4784 
Self-heal Daemon on            N/A       N/A        Y       22717
Self-heal Daemon on             N/A       N/A        Y       25893
Task Status of Volume vol2
There are no active volume tasks

cat /etc/ganesha/exports/export.vol2.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file, copy it over to all the NFS-Ganesha nodes
# and run ganesha-ha.sh --refresh-config.
      Export_Id= 2 ;
      Path = "/vol2";
      FSAL {
           name = GLUSTER;
      Access_type = RW;
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
      Disable_ACL = True;

Comment 2 Saurabh 2015-05-15 10:08:11 UTC
Created attachment 1025731 [details]
coredump of the brick

Comment 3 Saurabh 2015-05-15 10:10:39 UTC
Created attachment 1025733 [details]
sosreport of node2

Comment 4 Saurabh 2015-05-15 10:13:06 UTC
Created attachment 1025735 [details]
sosreport of node3

Comment 5 Niels de Vos 2015-06-09 13:47:34 UTC
http://review.gluster.org/10909 has been merged in the master branch; backporting can be done now.

Comment 6 Soumya Koduri 2015-06-09 15:35:41 UTC
Thanks Niels. I shall backport the fix.

Comment 7 Anand Avati 2015-06-09 18:49:07 UTC
COMMIT: http://review.gluster.org/11141 committed in release-3.7 by Kaleb KEITHLEY (kkeithle@redhat.com) 
commit 922f9df5d7cdb7775dfa6fac4874105d5cc85c98
Author: Soumya Koduri <skoduri@redhat.com>
Date:   Thu Jun 4 11:25:35 2015 +0530

    Upcall/cache-invalidation: Ignore fops with frame->root->client not set
    Server-side internally generated fops like 'quota/marker' will
    not have any client associated with the frame. Hence we need a
    check for clients to be valid before processing for upcall cache
    invalidation. Also fixed an issue with initializing reaper-thread.
    Added a testcase to test the fix.
    Change-Id: If7419b98aca383f4b80711c10fef2e0b32498c57
    BUG: 1221941
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
    Reviewed-on: http://review.gluster.org/10909
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Reviewed-on: http://review.gluster.org/11141
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
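
The client check described in the commit message above can be illustrated with a minimal sketch. This is not the actual patch: the struct definitions are simplified, hypothetical stand-ins for the GlusterFS frame, stack and client types, and frame_has_client() and upcall_process_sketch() are names invented for this example. Only the idea of skipping fops whose frame->root->client is not set comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the GlusterFS types.
 * The real structures carry many more fields. */
typedef struct client {
    const char *client_uid;
} client_t;

typedef struct call_stack {
    client_t *client; /* NULL for server-side internally generated fops */
} call_stack_t;

typedef struct call_frame {
    call_stack_t *root;
} call_frame_t;

/* Returns true only when the fop was issued on behalf of a real client. */
bool
frame_has_client(const call_frame_t *frame)
{
    return frame != NULL && frame->root != NULL &&
           frame->root->client != NULL;
}

/* Hypothetical entry point for cache-invalidation processing.
 * Returns 1 if the fop was processed, 0 if it was ignored. */
int
upcall_process_sketch(const call_frame_t *frame, int *processed)
{
    assert(processed != NULL);

    if (!frame_has_client(frame)) {
        /* Internally generated fop (for example from quota/marker):
         * there is no client to track or notify, so skip it. */
        return 0;
    }

    (*processed)++; /* placeholder for the real upcall bookkeeping */
    return 1;
}
```

In this sketch a frame whose root has a non-NULL client is counted as processed, while a frame with a NULL client (or a NULL frame) is ignored before any client data is dereferenced.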

Comment 8 Niels de Vos 2015-06-20 09:48:20 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.2, please reopen this bug report.

glusterfs-3.7.2 has been announced on the Gluster Packaging mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://www.gluster.org/pipermail/packaging/2015-June/000006.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
