Bug 1410741

Summary: [Ganesha + EC] Segfault occurred and nfs-ganesha process was killed while compiling glusterfs code on the mount point.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Arthy Loganathan <aloganat>
Component: nfs-ganesha
Assignee: Daniel Gryniewicz <dang>
Status: CLOSED ERRATA
QA Contact: Arthy Loganathan <aloganat>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asoman, dang, ffilz, jthottan, mbenjamin, rcyriac, rhinduja, rhs-bugs, skoduri, storage-qa-internal, tdesala
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: nfs-ganesha-2.4.1-7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-23 06:28:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528

Description Arthy Loganathan 2017-01-06 10:25:45 UTC
Description of problem:
A segfault occurred and the nfs-ganesha process was killed while compiling the glusterfs source on the mount point.

Version-Release number of selected component (if applicable):
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64
nfs-ganesha-2.4.1-3.el7rhgs.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Create a ganesha cluster and create a 2*(4+2) EC volume.
2. Enable ganesha on the volume.
3. Compile the glusterfs source code on the mount point.

Actual results:
A segfault occurred and the nfs-ganesha process was killed.

Expected results:
Compilation should succeed.

Additional info:

/var/log/messages log snippet:
-------------------------------


Jan  5 18:54:50 dhcp46-111 kernel: ganesha.nfsd[3281]: segfault at 40 ip 00007fc00550a137 sp 00007fbfcdd88e40 error 4 in ganesha.nfsd[7fc0053fa000+162000]
Jan  5 18:54:51 dhcp46-111 systemd: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Jan  5 18:54:51 dhcp46-111 systemd: Unit nfs-ganesha.service entered failed state.
Jan  5 18:54:51 dhcp46-111 systemd: nfs-ganesha.service failed.
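The kernel line above packs several useful fields: the faulting address (`segfault at 40`, printed in hex), the instruction and stack pointers, the x86 page-fault error code (`error 4`), and the base and size of the mapping that contains the instruction pointer. The following is a minimal sketch that decodes such a line; the struct and function names are hypothetical (not part of ganesha or any kernel tool), and it assumes the x86-64 message format shown here, which other kernel versions may extend.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (printed after "error" by the kernel). */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access,       1: write access */
#define PF_USER  0x4u /* 0: kernel mode,       1: user mode */

/* Fields of a "segfault at ..." kernel log line (hypothetical helper type). */
struct segv_report {
    unsigned long fault_addr; /* address whose access faulted */
    unsigned long ip;         /* instruction pointer at the fault */
    unsigned long sp;         /* stack pointer at the fault */
    unsigned int error;       /* page-fault error code */
    unsigned long map_base;   /* start of the mapping containing ip */
    unsigned long map_size;   /* length of that mapping */
};

/* Parse the format shown above; all numbers are hex without a "0x" prefix.
 * Returns 0 on success, -1 if the line does not match. */
int parse_segv_line(const char *line, struct segv_report *r)
{
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return -1;
    /* %*[^[] skips the binary name up to the '[' that opens base+size. */
    int n = sscanf(p, "segfault at %lx ip %lx sp %lx error %x in %*[^[][%lx+%lx]",
                   &r->fault_addr, &r->ip, &r->sp, &r->error,
                   &r->map_base, &r->map_size);
    return n == 6 ? 0 : -1;
}

/* Offset of the faulting instruction inside the reported mapping. */
unsigned long ip_offset(const struct segv_report *r)
{
    assert(r->ip >= r->map_base && r->ip - r->map_base < r->map_size);
    return r->ip - r->map_base;
}

/* Print a one-line human-readable summary of the fault. */
void describe_segv(const struct segv_report *r)
{
    printf("fault at 0x%lx: %s, %s access, %s mode; ip offset 0x%lx\n",
           r->fault_addr,
           (r->error & PF_PROT) ? "protection violation" : "page not present",
           (r->error & PF_WRITE) ? "write" : "read",
           (r->error & PF_USER) ? "user" : "kernel",
           ip_offset(r));
}
```

Fed the first line of the log snippet, the sketch reports fault address 0x40, a user-mode read of a non-present page, and an instruction offset of 0x110137 into the ganesha.nfsd mapping. A read that close to address zero is consistent with a NULL-pointer dereference such as the one analysed in Comment 3.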


[root@dhcp46-111 ~]# service nfs-ganesha status  -l
Redirecting to /bin/systemctl status  -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Fri 2017-01-06 15:19:45 IST; 31min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 29712 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 12931 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 12929 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 12930 (code=killed, signal=SEGV)

Jan 06 11:49:40 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: Starting NFS-Ganesha file server...
Jan 06 11:49:40 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: Started NFS-Ganesha file server.
Jan 06 15:19:45 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Jan 06 15:19:45 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.service entered failed state.
Jan 06 15:19:45 dhcp46-111.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service failed.


[root@dhcp46-111 ~]# gluster vol info vol_ec
 
Volume Name: vol_ec
Type: Distributed-Disperse
Volume ID: 3e41487f-92b7-4d0f-aee8-80492ba34afb
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-111.lab.eng.blr.redhat.com:/bricks/brick6/br6
Brick2: dhcp46-115.lab.eng.blr.redhat.com:/bricks/brick6/br6
Brick3: dhcp46-139.lab.eng.blr.redhat.com:/bricks/brick6/br6
Brick4: dhcp46-124.lab.eng.blr.redhat.com:/bricks/brick6/br6
Brick5: dhcp46-131.lab.eng.blr.redhat.com:/bricks/brick6/br6
Brick6: dhcp46-152.lab.eng.blr.redhat.com:/bricks/brick6/br6
Brick7: dhcp46-111.lab.eng.blr.redhat.com:/bricks/brick7/br7
Brick8: dhcp46-115.lab.eng.blr.redhat.com:/bricks/brick7/br7
Brick9: dhcp46-139.lab.eng.blr.redhat.com:/bricks/brick7/br7
Brick10: dhcp46-124.lab.eng.blr.redhat.com:/bricks/brick7/br7
Brick11: dhcp46-131.lab.eng.blr.redhat.com:/bricks/brick7/br7
Brick12: dhcp46-152.lab.eng.blr.redhat.com:/bricks/brick7/br7
Options Reconfigured:
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@dhcp46-111 ~]# gluster vol status vol_ec
Status of volume: vol_ec
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp46-111.lab.eng.blr.redhat.com:/br
icks/brick6/br6                             49153     0          Y       16684
Brick dhcp46-115.lab.eng.blr.redhat.com:/br
icks/brick6/br6                             49153     0          Y       24378
Brick dhcp46-139.lab.eng.blr.redhat.com:/br
icks/brick6/br6                             49153     0          Y       32613
Brick dhcp46-124.lab.eng.blr.redhat.com:/br
icks/brick6/br6                             49155     0          Y       14120
Brick dhcp46-131.lab.eng.blr.redhat.com:/br
icks/brick6/br6                             49156     0          Y       28710
Brick dhcp46-152.lab.eng.blr.redhat.com:/br
icks/brick6/br6                             49156     0          Y       23437
Brick dhcp46-111.lab.eng.blr.redhat.com:/br
icks/brick7/br7                             49154     0          Y       16703
Brick dhcp46-115.lab.eng.blr.redhat.com:/br
icks/brick7/br7                             49154     0          Y       24397
Brick dhcp46-139.lab.eng.blr.redhat.com:/br
icks/brick7/br7                             49154     0          Y       32632
Brick dhcp46-124.lab.eng.blr.redhat.com:/br
icks/brick7/br7                             49156     0          Y       14139
Brick dhcp46-131.lab.eng.blr.redhat.com:/br
icks/brick7/br7                             49157     0          Y       28729
Brick dhcp46-152.lab.eng.blr.redhat.com:/br
icks/brick7/br7                             49157     0          Y       23457
Self-heal Daemon on localhost               N/A       N/A        Y       22980
Self-heal Daemon on dhcp46-115.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       24494
Self-heal Daemon on dhcp46-139.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       25259
Self-heal Daemon on dhcp46-152.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       3277 
Self-heal Daemon on dhcp46-124.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       6490 
Self-heal Daemon on dhcp46-131.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       10045
 
Task Status of Volume vol_ec
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp46-111 ~]# 


However, the same test passed on a 2*2 distributed-replicate volume with an nfs-ganesha mount.

The core file and sosreports will be attached soon.

Comment 2 Arthy Loganathan 2017-01-06 10:45:22 UTC
bt snippet:
===========

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe8ce724700 (LWP 13235)]
mdcache_dirent_rename (parent=parent@entry=0x7fe968022ad0, oldname=oldname@entry=0x7fe93c02b9c0 "libglusterfs_la-dict.loT", newname=newname@entry=0x7fe93c018580 "libglusterfs_la-dict.lo")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1364
1364		mdcache_key_dup(&dirent2->ckey, &dirent->ckey);
(gdb) generate-core-file
warning: target file /proc/12930/cmdline contained unexpected null characters
Saved corefile core.12930


[Thread 0x7fe8b17fa700 (LWP 23215) exited]
[Thread 0x7fe97c279700 (LWP 23216) exited]
[Thread 0x7fe8b84f6700 (LWP 17607) exited]
[Thread 0x7fe8b81f5700 (LWP 18924) exited]
[Thread 0x7fe997ef7700 (LWP 19341) exited]
[Thread 0x7fe8b0ff9700 (LWP 19342) exited]
[Thread 0x7fe8b0ef8700 (LWP 19343) exited]
[Thread 0x7fe8b0df7700 (LWP 19344) exited]
[Thread 0x7fe8b0cf6700 (LWP 19345) exited]
[Thread 0x7fe8b0bf5700 (LWP 19346) exited]
[Thread 0x7fe9a052a0c0 (LWP 12930) exited]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

==========================
>gdb core.12930
[New LWP 12930]

warning: core file may not match specified executable file.
Reading symbols from /usr/lib64/ld-2.17.so...Reading symbols from /usr/lib/debug/usr/lib64/ld-2.17.so.debug...done.
done.
Core was generated by `/usr/bin/ganesha.nfsd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe99eb08a82 in ?? ()

======================================
>t a a bt

Thread 141 (LWP 13132):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558fbd0 in ?? ()
#3  0x00007fe9a558fba8 in ?? ()
#4  0x0000000000000c0d in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90178a9c0 in ?? ()
#7  0x00007fe9a558fc70 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 140 (LWP 13131):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558f710 in ?? ()
#3  0x00007fe9a558f6e8 in ?? ()
#4  0x0000000000000c12 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe901f8b9c0 in ?? ()
#8  0x00007fe9a558f7b0 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 139 (LWP 13130):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558f250 in ?? ()
#3  0x00007fe9a558f228 in ?? ()
#4  0x0000000000000c0e in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90278c9c0 in ?? ()
#7  0x00007fe9a558f2f0 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 138 (LWP 13129):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558ed90 in ?? ()
#3  0x00007fe9a558ed68 in ?? ()
#4  0x0000000000000c0d in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe902f8d9c0 in ?? ()
#7  0x00007fe9a558ee30 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 137 (LWP 13128):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558e8d0 in ?? ()
#3  0x00007fe9a558e8a8 in ?? ()
#4  0x0000000000000c0d in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90378e9c0 in ?? ()
#7  0x00007fe9a558e970 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 136 (LWP 13127):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558e410 in ?? ()
#3  0x00007fe9a558e3e8 in ?? ()
#4  0x0000000000000c11 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe903f8f9c0 in ?? ()
#8  0x00007fe9a558e4b0 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 135 (LWP 13126):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558df50 in ?? ()
#3  0x00007fe9a558df28 in ?? ()
#4  0x0000000000000c0e in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe9047909c0 in ?? ()
#7  0x00007fe9a558dff0 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 134 (LWP 13125):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558da90 in ?? ()
#3  0x00007fe9a558da68 in ?? ()
#4  0x0000000000000c10 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe904f919c0 in ?? ()
#8  0x00007fe9a558db30 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 133 (LWP 13124):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558d5d0 in ?? ()
#3  0x00007fe9a558d5a8 in ?? ()
#4  0x0000000000000c0c in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe9057929c0 in ?? ()
#7  0x00007fe9a558d670 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 132 (LWP 13123):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558d110 in ?? ()
#3  0x00007fe9a558d0e8 in ?? ()
#4  0x0000000000000c15 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe905f939c0 in ?? ()
#8  0x00007fe9a558d1b0 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 131 (LWP 13122):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558cc50 in ?? ()
#3  0x00007fe9a558cc28 in ?? ()
#4  0x0000000000000c0e in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe9067949c0 in ?? ()
#7  0x00007fe9a558ccf0 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 130 (LWP 13121):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558c790 in ?? ()
#3  0x00007fe9a558c768 in ?? ()
#4  0x0000000000000c0b in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe906f959c0 in ?? ()
#7  0x00007fe9a558c830 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 129 (LWP 13120):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558c2d0 in ?? ()
#3  0x00007fe9a558c2a8 in ?? ()
#4  0x0000000000000c0d in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe9077969c0 in ?? ()
#7  0x00007fe9a558c370 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 128 (LWP 13119):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558be10 in ?? ()
#3  0x00007fe9a558bde8 in ?? ()
#4  0x0000000000000c11 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe907f979c0 in ?? ()
#8  0x00007fe9a558beb0 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 127 (LWP 13118):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558b950 in ?? ()
#3  0x00007fe9a558b928 in ?? ()
#4  0x0000000000000c0f in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe9087989c0 in ?? ()
#7  0x00007fe9a558b9f0 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 126 (LWP 13117):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558b490 in ?? ()
#3  0x00007fe9a558b468 in ?? ()
#4  0x0000000000000c14 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe908f999c0 in ?? ()
#8  0x00007fe9a558b530 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 125 (LWP 13116):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558afd0 in ?? ()
#3  0x00007fe9a558afa8 in ?? ()
#4  0x0000000000000c0c in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90979a9c0 in ?? ()
#7  0x00007fe9a558b070 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 124 (LWP 13115):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558ab10 in ?? ()
#3  0x00007fe9a558aae8 in ?? ()
#4  0x0000000000000c12 in _itoa_word (upper_case=0, base=10, buflim=0xffffffffffffffff <Address 0xffffffffffffffff out of bounds>, value=18446744073709551100)
    at ../sysdeps/generic/_itoa.h:75
#5  print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#6  0x00007fe9a08b0b58 in ?? ()
#7  0x00007fe909f9b9c0 in ?? ()
#8  0x00007fe9a558abb0 in ?? ()
#9  0x00007fe9a5565aa0 in ?? ()
#10 0x00007fe9a0624459 in ?? ()
#11 0x0000000000000000 in ?? ()

Thread 123 (LWP 13114):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558a650 in ?? ()
#3  0x00007fe9a558a628 in ?? ()
#4  0x0000000000000c0c in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90a79c9c0 in ?? ()
#7  0x00007fe9a558a6f0 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 122 (LWP 13113):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a558a190 in ?? ()
#3  0x00007fe9a558a168 in ?? ()
#4  0x0000000000000c0e in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90af9d9c0 in ?? ()
#7  0x00007fe9a558a230 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 121 (LWP 13112):
#0  0x00007fe99eb08a82 in ?? ()
#1  0x0000000000000002 in ?? ()
#2  0x00007fe9a5589cd0 in ?? ()
#3  0x00007fe9a5589ca8 in ?? ()
#4  0x0000000000000c0c in print_statistics (rtld_total_timep=0x7fe9a08be440) at rtld.c:2721
#5  0x00007fe9a08b0b58 in ?? ()
#6  0x00007fe90b79e9c0 in ?? ()
#7  0x00007fe9a5589d70 in ?? ()
#8  0x00007fe9a5565aa0 in ?? ()
#9  0x00007fe9a0624459 in ?? ()
#10 0x0000000000000000 in ?? ()

Comment 3 Soumya Koduri 2017-01-06 11:26:44 UTC
(gdb) thread 243
[Switching to thread 243 (Thread 0x7fe8ce724700 (LWP 13235))]
#0  mdcache_key_dup (src=0x28, tgt=0x7fe93c040808) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:455
455		tgt->kv.len = src->kv.len;
(gdb) bt
#0  mdcache_key_dup (src=0x28, tgt=0x7fe93c040808) at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_int.h:455
#1  mdcache_dirent_rename (parent=parent@entry=0x7fe968022ad0, oldname=oldname@entry=0x7fe93c02b9c0 "libglusterfs_la-dict.loT", newname=newname@entry=0x7fe93c018580 "libglusterfs_la-dict.lo")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1364
#2  0x00007fe9a064828a in mdcache_rename (obj_hdl=0x7fe8b4084f88, olddir_hdl=0x7fe968022b08, old_name=0x7fe93c02b9c0 "libglusterfs_la-dict.loT", newdir_hdl=0x7fe968022b08, new_name=0x7fe93c018580 "libglusterfs_la-dict.lo")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:844
#3  0x00007fe9a0581099 in fsal_rename (dir_src=dir_src@entry=0x7fe968022b08, oldname=0x7fe93c02b9c0 "libglusterfs_la-dict.loT", dir_dest=dir_dest@entry=0x7fe968022b08, newname=0x7fe93c018580 "libglusterfs_la-dict.lo")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1666
#4  0x00007fe9a05bc6c2 in nfs4_op_rename (op=<optimized out>, data=<optimized out>, resp=0x7fe93c03a0b0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_rename.c:115
#5  0x00007fe9a05a7fcd in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7fe93c03e830) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#6  0x00007fe9a059917c in nfs_rpc_execute (reqdata=reqdata@entry=0x7fe8a810ebc0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#7  0x00007fe9a059a7da in worker_run (ctx=0x7fe9a55ad480) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#8  0x00007fe9a0624459 in fridgethr_start_routine (arg=0x7fe9a55ad480) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#9  0x00007fe99eb04dc5 in start_thread (arg=0x7fe8ce724700) at pthread_create.c:308
#10 0x00007fe99e1d373d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) f 1
#1  mdcache_dirent_rename (parent=parent@entry=0x7fe968022ad0, oldname=oldname@entry=0x7fe93c02b9c0 "libglusterfs_la-dict.loT", newname=newname@entry=0x7fe93c018580 "libglusterfs_la-dict.lo")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1364
1364		mdcache_key_dup(&dirent2->ckey, &dirent->ckey);
(gdb) l
1359	
1360		/* try to rename--no longer in-place */
1361		dirent2 = gsh_calloc(1, sizeof(mdcache_dir_entry_t) + newnamesize);
1362		memcpy(dirent2->name, newname, newnamesize);
1363		dirent2->flags = DIR_ENTRY_FLAG_NONE;
1364		mdcache_key_dup(&dirent2->ckey, &dirent->ckey);
1365	
1366		/* Delete the entry for oldname */
1367		avl_dirent_set_deleted(parent, dirent);
1368	
(gdb) p &dirent2->ckey
$1 = (mdcache_key_t *) 0x7fe93c040808
(gdb) p &dirent->ckey
$2 = (mdcache_key_t *) 0x28

(gdb) p/x parent->mde_flags
$8 = 0x4
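The listing above shows the rename path allocating a fresh entry with the new name stored inline after the struct (`gsh_calloc(1, sizeof(mdcache_dir_entry_t) + newnamesize)`) and then duplicating the old entry's key. The sketch below reproduces that pattern in plain C using hypothetical, simplified types (`toy_key`, `toy_dirent`) that are not ganesha's real definitions, and it adds the NULL check that, per the analysis that follows, was missing. If the old entry pointer is NULL, `&dirent->ckey` in practice evaluates to the field's offset, a small nonzero address, which matches the 0x28 printed by gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for mdcache_key_t / mdcache_dir_entry_t. */
struct toy_key {
    uint64_t hk;  /* hash of the key */
    size_t len;   /* number of bytes at addr */
    void *addr;   /* heap-allocated key bytes */
};

struct toy_dirent {
    uint64_t hash;        /* placeholder for tree-node bookkeeping */
    struct toy_key ckey;  /* key of the cached object */
    uint32_t flags;
    char name[];          /* flexible array member: NUL-terminated name */
};

/* Deep-copy a key (conceptually what a key-dup helper does). 0 on success. */
static int toy_key_dup(struct toy_key *tgt, const struct toy_key *src)
{
    tgt->hk = src->hk;
    tgt->len = src->len;
    tgt->addr = malloc(src->len);
    if (tgt->addr == NULL)
        return -1;
    memcpy(tgt->addr, src->addr, src->len);
    return 0;
}

/* Allocate a renamed copy of 'old' with the new name stored inline.
 * Returns NULL if 'old' is NULL or allocation fails, instead of
 * reading the key through a NULL entry. */
struct toy_dirent *toy_dirent_renamed(const struct toy_dirent *old,
                                      const char *newname)
{
    assert(newname != NULL);
    if (old == NULL)
        return NULL;
    size_t newnamesize = strlen(newname) + 1;
    struct toy_dirent *d = calloc(1, sizeof(*d) + newnamesize);
    if (d == NULL)
        return NULL;
    memcpy(d->name, newname, newnamesize);
    d->flags = 0;
    if (toy_key_dup(&d->ckey, &old->ckey) != 0) {
        free(d);
        return NULL;
    }
    return d;
}
```

In the real code the old entry must also be marked deleted in the parent's cache (the `avl_dirent_set_deleted()` call on line 1367); the sketch leaves that step out.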


In mdcache_dirent_rename() -> mdcache_dirent_find() (source lines 1190-1196):

        dirent = mdcache_avl_qp_lookup_s(dir, name, 1);
        if (!dirent) {
                if (mdc_dircache_trusted(dir))
                        return fsalstat(ERR_FSAL_NOENT, 0);

                return fsalstat(ERR_FSAL_NO_ERROR, 0);
        }

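Here, when the name is not found in the cache and the directory cache is not trusted, mdcache_dirent_find() returns success even though dirent is NULL, and the caller may dereference it at a later point. That is what happened in this test case: &dirent->ckey evaluated to 0x28, which is consistent with dirent being NULL.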
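The contract in the excerpt is easy to misuse: a successful return does not guarantee a non-NULL entry. The following is a minimal sketch of the same contract and of a caller that handles it, using hypothetical types and names (`toy_dir`, `toy_dirent_find`, `toy_rename_cached`) rather than ganesha's API. It is not the upstream fix from the gerrithub reviews, whose contents are not described in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the two outcomes in the excerpt. */
enum toy_status { TOY_OK = 0, TOY_NOENT = 2 };

struct toy_entry { const char *name; int key; };

struct toy_dir {
    bool trusted;               /* analogous to mdc_dircache_trusted() */
    struct toy_entry *entries;  /* cached entries */
    size_t n;
};

/* Same contract as the excerpt: if the name is not cached, a trusted
 * cache reports NOENT, while an untrusted one reports success with
 * *out == NULL (the cache simply cannot tell). */
enum toy_status toy_dirent_find(const struct toy_dir *dir, const char *name,
                                struct toy_entry **out)
{
    assert(dir != NULL && out != NULL);
    *out = NULL;
    for (size_t i = 0; i < dir->n; i++) {
        if (strcmp(dir->entries[i].name, name) == 0) {
            *out = &dir->entries[i];
            return TOY_OK;
        }
    }
    return dir->trusted ? TOY_NOENT : TOY_OK;
}

/* Update the cached name if an entry exists. Returns the entry's key,
 * or -1 when nothing is cached, instead of dereferencing NULL. */
int toy_rename_cached(struct toy_dir *dir, const char *oldname,
                      const char *newname)
{
    struct toy_entry *e;
    enum toy_status st = toy_dirent_find(dir, oldname, &e);
    if (st != TOY_OK || e == NULL)
        return -1;  /* nothing cached to update */
    e->name = newname;
    return e->key;
}
```

One reasonable design choice, shown here, is to treat "success but not cached" as nothing to update in the cache and leave the authoritative rename to the underlying filesystem.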

Comment 4 Arthy Loganathan 2017-01-06 11:35:19 UTC
The sosreports and core are at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1410741/

Comment 5 Daniel Gryniewicz 2017-01-09 14:19:18 UTC
Attempted fix: https://review.gerrithub.io/342151

Comment 6 Prasad Desala 2017-01-10 10:04:50 UTC
Observed the same issue on glusterfs version 3.8.4-11.el7rhgs.x86_64.

Steps:
1) Created a ganesha cluster and a distributed-replicate volume.
2) Enabled nfs-ganesha on the volume with mdcache settings.
3) Mounted the volume on multiple clients.
4) As the root user, ran the following command from the mount point to clone pjd-fstest:
	git://git.code.sf.net/p/ntfs-3g/pjd-fstest
5) Started continuous lookups from a few clients and ran the rename test suite.

While the rename test suite was in progress, one of the nodes hit a SEGV and the nfs-ganesha process was killed.

Jan 10 15:15:10 dhcp46-42 systemd: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
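Both reproductions exercise the same path: a file is created under a temporary name and renamed to its final name (the backtrace in Comment 3 shows `libglusterfs_la-dict.loT` -> `libglusterfs_la-dict.lo`) while lookups hit the same directory. The following is a minimal, single-process C sketch of such a workload; the file names `stress.loT`/`stress.lo` and the function names are hypothetical, and it is not guaranteed to trigger the crash. To approximate the reports above, run it concurrently on several clients against the same NFS mount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create dir/tmpname, write two bytes, then rename it over dir/finalname,
 * mirroring the temp-file-then-rename pattern seen during the build. */
static int write_then_rename(const char *dir, const char *tmpname,
                             const char *finalname)
{
    char tmp[4096], fin[4096];
    snprintf(tmp, sizeof tmp, "%s/%s", dir, tmpname);
    snprintf(fin, sizeof fin, "%s/%s", dir, finalname);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "x\n", 2) != 2) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return rename(tmp, fin);
}

/* Run 'iterations' rename cycles, issuing lookups (stat) on both names
 * between renames. Returns the number of successful renames. */
int rename_lookup_stress(const char *dir, int iterations)
{
    char tmp[4096], fin[4096];
    struct stat st;
    int ok = 0;

    assert(iterations >= 0);
    snprintf(tmp, sizeof tmp, "%s/%s", dir, "stress.loT");
    snprintf(fin, sizeof fin, "%s/%s", dir, "stress.lo");
    for (int i = 0; i < iterations; i++) {
        if (write_then_rename(dir, "stress.loT", "stress.lo") == 0)
            ok++;
        (void)stat(tmp, &st); /* expected to fail with ENOENT after rename */
        (void)stat(fin, &st); /* expected to succeed */
    }
    return ok;
}
```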

Comment 9 Arthy Loganathan 2017-01-20 11:51:04 UTC
The nfs-ganesha service crashed and glusterfs compilation failed on the nfs-ganesha mount with the latest ganesha packages:

nfs-ganesha-gluster-2.4.1-6.el7rhgs.x86_64
nfs-ganesha-2.4.1-6.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-12.el7rhgs.x86_64


Crash:
======

(gdb) bt
#0  mdcache_dirent_rename (parent=parent@entry=0x7f8b0c0244b0, oldname=oldname@entry=0x7f8a44050100 "nlm4-xdr.tmp", newname=newname@entry=0x7f8a44001f60 "nlm4-xdr.c")
    at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:1366
#1  0x00007f8b773dc7ce in mdcache_rename (obj_hdl=0x7f8b0805b148, olddir_hdl=0x7f8b0c0244e8, old_name=0x7f8a44050100 "nlm4-xdr.tmp", newdir_hdl=0x7f8b0c0244e8, 
    new_name=0x7f8a44001f60 "nlm4-xdr.c") at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:848
#2  0x00007f8b77315059 in fsal_rename (dir_src=dir_src@entry=0x7f8b0c0244e8, oldname=0x7f8a44050100 "nlm4-xdr.tmp", dir_dest=dir_dest@entry=0x7f8b0c0244e8, 
    newname=0x7f8a44001f60 "nlm4-xdr.c") at /usr/src/debug/nfs-ganesha-2.4.1/src/FSAL/fsal_helper.c:1666
#3  0x00007f8b77350682 in nfs4_op_rename (op=<optimized out>, data=<optimized out>, resp=0x7f8a440178c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_op_rename.c:115
#4  0x00007f8b7733bf8d in nfs4_Compound (arg=<optimized out>, req=<optimized out>, res=0x7f8a440344e0) at /usr/src/debug/nfs-ganesha-2.4.1/src/Protocols/NFS/nfs4_Compound.c:734
#5  0x00007f8b7732d13c in nfs_rpc_execute (reqdata=reqdata@entry=0x7f8a4c3644c0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1281
#6  0x00007f8b7732e79a in worker_run (ctx=0x7f8b7b5a67f0) at /usr/src/debug/nfs-ganesha-2.4.1/src/MainNFSD/nfs_worker_thread.c:1548
#7  0x00007f8b773b8409 in fridgethr_start_routine (arg=0x7f8b7b5a67f0) at /usr/src/debug/nfs-ganesha-2.4.1/src/support/fridgethr.c:550
#8  0x00007f8b75898dc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f8b74f6773d in clone () from /lib64/libc.so.6

Comment 10 Daniel Gryniewicz 2017-01-20 14:02:54 UTC
A better proposed fix, replacing the earlier one:

https://review.gerrithub.io/343969

Comment 11 Arthy Loganathan 2017-02-06 10:12:15 UTC
glusterfs compilation passed on the nfs-ganesha mount with the latest ganesha packages.

===========================TESTS RUNNING===========================
Changing to the specified mountpoint
/mnt/test_ec/run26376
executing glusterfs_build

real	10m1.133s
user	0m20.792s
sys	0m7.967s
running autogen.sh:14:29:33

real	2m39.542s
user	0m34.883s
sys	0m4.737s
running configure:14:32:13

real	7m12.215s
user	0m17.790s
sys	0m29.909s
running make:14:39:25

real	10m12.244s
user	6m17.635s
sys	1m16.319s
all successful:14:49:37
glusterfs directory removed
1
Total 1 tests were successful
Switching over to the previous working directory
Removing /mnt/test_ec//run26376/


Verified the fix in the following build:
nfs-ganesha-gluster-2.4.1-7.el7rhgs.x86_64
nfs-ganesha-2.4.1-7.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-13.el7rhgs.x86_64

Comment 13 errata-xmlrpc 2017-03-23 06:28:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2017-0493.html