Bug 1656316

Summary: Gluster volume statedump inode Segmentation Fault (core dumped)
Product: [Community] GlusterFS
Component: cli
Version: 4.1
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Reporter: emanuel.ocone
Assignee: bugs <bugs>
CC: amukherj, bugs, emanuel.ocone, rhs-bugs, sankarshan, storage-qa-internal, vbellur
Keywords: ZStream
Type: Bug
Last Closed: 2018-12-19 11:18:11 UTC

Attachments: Core Dump statedump inode

Description emanuel.ocone 2018-12-05 08:57:31 UTC
Created attachment 1511592
Core Dump statedump inode

Description of problem:
I have a three-node Gluster cluster with three volumes per node.
My nodes are VPSs on KVM (Proxmox), each with 8 GB of RAM and 4 vCPUs.
The first volume is 21 GB; the other two are 1.5 GB each.

The problem started with the following errors in my log files:

```
[2018-12-05 07:59:40.844949] W [glusterd-locks.c:622:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/4.1.1/xlator/mgmt/glusterd.so(+0x4214f) [0x7fe6b26ab14f] -->/usr/lib64/glusterfs/4.1.1/xlator/mgmt/glusterd.so(+0x32263) [0x7fe6b269b263] -->/usr/lib64/glusterfs/4.1.1/xlator/mgmt/glusterd.so(+0xe7d3d) [0x7fe6b2750d3d] ) 0-management: Lock for tmp_webfiles_1 held by 4c7fa287-b631-47b5-84b7-d553bfdf3a5d
[2018-12-05 07:59:40.845016] E [MSGID: 106118] [glusterd-op-sm.c:4173:glusterd_op_ac_lock] 0-management: Unable to acquire lock for tmp_webfiles_1
[2018-12-05 07:59:40.845070] E [MSGID: 106376] [glusterd-op-sm.c:8305:glusterd_op_sm] 0-management: handler returned: -1
[2018-12-05 07:59:40.957017] W [glusterd-locks.c:856:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/4.1.1/xlator/mgmt/glusterd.so(+0x4214f) [0x7fe6b26ab14f] -->/usr/lib64/glusterfs/4.1.1/xlator/mgmt/glusterd.so(+0x31f50) [0x7fe6b269af50] -->/usr/lib64/glusterfs/4.1.1/xlator/mgmt/glusterd.so(+0xe835c) [0x7fe6b275135c] ) 0-management: Lock owner mismatch. Lock for vol tmp_webfiles_1 held by 4c7fa287-b631-47b5-84b7-d553bfdf3a5d
[2018-12-05 07:59:40.845185] E [MSGID: 106115] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking failed on node1. Please check log file for details.
[2018-12-05 07:59:40.957072] E [MSGID: 106117] [glusterd-op-sm.c:4236:glusterd_op_ac_unlock] 0-management: Unable to release lock for tmp_webfiles_1
[2018-12-05 07:59:40.957156] E [MSGID: 106376] [glusterd-op-sm.c:8305:glusterd_op_sm] 0-management: handler returned: 1
[2018-12-05 07:59:41.063427] E [MSGID: 106115] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking failed on node2. Please check log file for details.
[2018-12-05 07:59:41.063561] E [MSGID: 106150] [glusterd-syncop.c:1957:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2018-12-05 07:59:41.064893] E [MSGID: 106115] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on node1. Please check log file for details.
[2018-12-05 07:59:41.066582] E [MSGID: 106115] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on node2. Please check log file for details.
[2018-12-05 07:59:41.066704] E [MSGID: 106151] [glusterd-syncop.c:1640:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
```
When I try to get the lock information with the following command, I get this output:
```
gluster volume statedump inode
Segmentation fault (core dumped)
```
The core dump is attached.

Version-Release number of selected component (if applicable):
Here are the Gluster packages I have installed:
```
centos-release-gluster41-1.0-1.el7.centos.x86_64
glusterfs-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-api-4.1.1-1.el7.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
```

How reproducible:
gluster volume statedump inode

Actual results:
Segmentation fault (core dumped)

Expected results:
The volume statedump completes successfully.


Comment 2 Atin Mukherjee 2018-12-06 02:24:29 UTC
Can you please provide the complete backtrace (thread apply all bt) of the core from gdb session?

Comment 3 emanuel.ocone 2018-12-06 07:31:54 UTC
(In reply to Atin Mukherjee from comment #2)
> Can you please provide the complete backtrace (thread apply all bt) of the
> core from gdb session?

Hi Atin,
How can I get the complete backtrace?

Comment 4 emanuel.ocone 2018-12-10 11:34:30 UTC
Hi Atin,
Can you please explain how to get the complete backtrace?
We're in a production environment and need to resolve this problem.

Regards

Comment 5 Atin Mukherjee 2018-12-10 12:45:34 UTC
```
cat /proc/sys/kernel/core_pattern
```

This will tell you the location of the core file.

Then do the following to get to a gdb session (you may have to install the gdb and glusterfs-debuginfo packages first):

```
gdb glusterd <core file with path>
(gdb) t a a bt
```

Comment 6 emanuel.ocone 2018-12-10 13:40:42 UTC
The core is already uploaded as an attachment (it was created in the working directory, named with the PID of the process that died).
Here's the command's output:

```
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 28964]
[New LWP 28962]
[New LWP 28961]
[New LWP 28965]
[New LWP 28960]
Core was generated by `gluster volume statedump inode'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f6a89786f01 in ?? ()
```

Comment 7 emanuel.ocone 2018-12-13 11:39:50 UTC
Any news?

Comment 8 emanuel.ocone 2018-12-14 08:15:38 UTC
Hi,
I've rerun the gdb command; I hadn't noticed that the executable in the command was glusterd instead of gluster.
Here's the new output:

```
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/gluster...Reading symbols from /usr/lib/debug/usr/sbin/gluster.debug...done.
done.
[New LWP 4052]
[New LWP 4049]
[New LWP 4053]
[New LWP 4048]
[New LWP 4050]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `gluster volume statedump gv0'.
Program terminated with signal 11, Segmentation fault.
#0  __strstr_sse2 (haystack_start=0x0, needle_start=0x55c2d1b1063b "nfs") at ../string/strstr.c:63
63	  while (*haystack && *needle)
```

Comment 10 Atin Mukherjee 2018-12-17 09:51:22 UTC
Unfortunately I don't have a CentOS machine at the moment to debug this crash, which is why I was asking you to install the glusterfs-debuginfo package and then do the following:

```
gdb gluster core.28960
(gdb) thread apply all bt
```

The frame you're pointing at is basically just a strstr call; we need to trace the Gluster functions in the stack to see what's wrong here. From the core we can definitely tell that this is a crash in the CLI.
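
For reference, frame #0 shows strstr being called with haystack_start=0x0, and glibc's strstr dereferences its haystack without a NULL check, so a NULL first argument segfaults immediately. A minimal C sketch of that failure mode (option_str is an illustrative stand-in, not Gluster's actual variable):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Stand-in for the CLI's parsed statedump option string; in the
     * crashing case it ends up NULL. */
    const char *option_str = NULL;

    /* glibc's strstr walks the haystack without checking for NULL,
     * so this call segfaults, matching frame #0 of the backtrace. */
    if (strstr(option_str, "nfs") != NULL)
        printf("nfs option requested\n");

    return 0;
}
```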

Comment 11 emanuel.ocone 2018-12-17 13:57:02 UTC
I've already installed glusterfs-debuginfo:

```
rpm -qa | grep glusterfs
glusterfs-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-debuginfo-4.1.1-1.el7.x86_64
glusterfs-api-4.1.1-1.el7.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
```


I've run thread apply all bt, and this is the complete output:

```
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/gluster...Reading symbols from /usr/lib/debug/usr/sbin/gluster.debug...done.
done.
[New LWP 28964]
[New LWP 28962]
[New LWP 28961]
[New LWP 28965]
[New LWP 28960]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `gluster volume statedump inode'.
Program terminated with signal 11, Segmentation fault.
#0  __strstr_sse2 (haystack_start=0x0, needle_start=0x557e3786063b "nfs") at ../string/strstr.c:63
63	  while (*haystack && *needle)
(gdb) thread apply all bt

Thread 5 (Thread 0x7f6a8bfab780 (LWP 28960)):
#0  0x00007f6a89f18f47 in pthread_join (threadid=140095378020096, thread_return=thread_return@entry=0x0) at pthread_join.c:90
#1  0x00007f6a8bb186c8 in event_dispatch_epoll (event_pool=0x557e391903e0) at event-epoll.c:750
#2  0x0000557e3780fa53 in main (argc=<optimized out>, argv=0x7fff400439b8) at cli.c:790

Thread 4 (Thread 0x7f6a7f3dd700 (LWP 28965)):
#0  0x00007f6a897e0483 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f6a8bb17f62 in event_dispatch_epoll_worker (data=0x557e391d6370) at event-epoll.c:653
#2  0x00007f6a89f17dd5 in start_thread (arg=0x7f6a7f3dd700) at pthread_create.c:307
#3  0x00007f6a897dfead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7f6a894bb700 (LWP 28961)):
#0  0x00007f6a897a6e2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f6a897a6cc4 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f6a8badf75d in pool_sweeper (arg=<optimized out>) at mem-pool.c:470
#3  0x00007f6a89f17dd5 in start_thread (arg=0x7f6a894bb700) at pthread_create.c:307
#4  0x00007f6a897dfead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7f6a803df700 (LWP 28962)):
#0  0x00007f6a89f1ee3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f6a8bac2bf6 in gf_timer_proc (data=0x557e391d29b0) at timer.c:202
#2  0x00007f6a89f17dd5 in start_thread (arg=0x7f6a803df700) at pthread_create.c:307
#3  0x00007f6a897dfead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7f6a7fbde700 (LWP 28964)):
#0  __strstr_sse2 (haystack_start=0x0, needle_start=0x557e3786063b "nfs") at ../string/strstr.c:63
#1  0x0000557e37845eb4 in cli_cmd_volume_statedump_options_parse (words=words@entry=0x7fff400439c0, wordcount=<optimized out>, options=options@entry=0x7f6a7fbdde20) at cli-cmd-parser.c:3604
#2  0x0000557e37814c97 in cli_cmd_volume_statedump_cbk (state=<optimized out>, word=0x557e391d50e0, words=0x7fff400439c0, wordcount=<optimized out>) at cli-cmd-volume.c:2984
#3  0x0000557e378119a3 in cli_cmd_process (state=0x7fff400437c0, argc=3, argv=0x7fff400439c0) at cli-cmd.c:135
#4  0x0000557e37811320 in cli_batch (d=<optimized out>) at input.c:29
#5  0x00007f6a89f17dd5 in start_thread (arg=0x7f6a7fbde700) at pthread_create.c:307
#6  0x00007f6a897dfead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
```

Comment 12 Atin Mukherjee 2018-12-19 02:38:59 UTC
This is fixed through https://review.gluster.org/#/c/glusterfs/+/20867/, which was one of several fixes we made as part of the Coverity defect cleanup.

The fix is already available in the release-5 branch but not in release-4, which you're using. Would you mind upgrading to the latest version of Gluster?
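
Going by the backtrace, the crash is the CLI searching an option string that can be NULL, so the shape of the defensive fix is a NULL guard before the strstr call. A sketch of that pattern (illustrative only, not the actual patch from the review above; the names are hypothetical):

```c
#include <string.h>

/* Illustrative guard, not the actual patch: check the parsed option
 * string before searching it, so a missing statedump option can no
 * longer crash the CLI. */
static int wants_nfs_statedump(const char *option_str)
{
        return option_str != NULL && strstr(option_str, "nfs") != NULL;
}
```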

Comment 13 emanuel.ocone 2018-12-19 07:29:01 UTC
We have installed Gluster via the yum repo:

```
[centos-gluster41]
name=CentOS-$releasever - Gluster 4.1 (Long Term Maintanance)
baseurl=http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-4.1/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage
```

Is this patch already available via this repo?
The latest version available is 4.1.6-1.

In case I need to do a manual upgrade, where can I find a procedure?
We're in a production environment, and I wouldn't want to break anything.

Comment 14 Atin Mukherjee 2018-12-19 09:07:00 UTC
No, the fix is only available from glusterfs-5 release onwards.

If you're looking for an upgrade document, please refer to https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/. Even though it describes upgrading to 4.1, the same steps apply to upgrading to glusterfs-5.x as well.

Comment 15 emanuel.ocone 2018-12-19 09:27:34 UTC
OK,
we will upgrade the server software, but only after January 10, 2019;
we're currently in a release/upgrade freeze.

I think we can close the ticket now.

Regards