Bug 1226253 - gluster volume heal info crashes
Summary: gluster volume heal info crashes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-05-29 10:03 UTC by Pranith Kumar K
Modified: 2016-06-16 13:06 UTC
CC List: 1 user

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-16 13:06:19 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2015-05-29 10:03:32 UTC
Description of problem:

Mail thread from Alessandro who reported the problem on gluster-users:

On 05/29/2015 03:16 PM, Alessandro De Salvo wrote:
> Hi Pranith,
> I’m definitely sure the log is correct, but you are also correct when you say there is no sign of crash (even checking with grep!).
> However I see core dumps (e.g. core.19430) in /var/log/gluster, created every time I issue the heal info command.
> From gdb I see this:
Thanks for providing the information, Alessandro. We will fix this issue. I am wondering how we can unblock you in the interim. There is a plan to release 3.7.1 in 2-3 days, I think; I can try to get this fix into that release. Let me know if you can wait that long. Another possibility is to compile just the glfsheal binary with the fix, which "gluster volume heal <volname> info" invokes internally. Let me know.

Pranith.
>
>
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/glfsheal...Reading symbols from /usr/lib/debug/usr/sbin/glfsheal.debug...done.
> done.
> [New LWP 19430]
> [New LWP 19431]
> [New LWP 19434]
> [New LWP 19436]
> [New LWP 19433]
> [New LWP 19437]
> [New LWP 19432]
> [New LWP 19435]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/glfsheal adsnet-vm-01'.
> Program terminated with signal 11, Segmentation fault.
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> 499             table = inode->table;
> (gdb) bt
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
> #2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
> #3  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000d450) at xlator.c:453
> #4  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000e800) at xlator.c:453
> #5  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000fbb0) at xlator.c:453
> #6  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20010f80) at xlator.c:453
> #7  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20012330) at xlator.c:453
> #8  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a200136e0) at xlator.c:453
> #9  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20014b30) at xlator.c:453
> #10 0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20015fc0) at xlator.c:453
> #11 0x00007f7a39a54eea in xlator_tree_fini (xl=<optimized out>) at xlator.c:545
> #12 0x00007f7a39a90b25 in glusterfs_graph_deactivate (graph=<optimized out>) at graph.c:340
> #13 0x00007f7a38d50e3c in pub_glfs_fini (fs=fs@entry=0x7f7a3a6b6010) at glfs.c:1155
> #14 0x00007f7a39f18ed4 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:821
>
>
> Thanks,
>
> Alessandro
>
>> On 29 May 2015, at 11:12, Pranith Kumar Karampuri <pkarampu> wrote:
>>
>>
>>
>> On 05/29/2015 02:37 PM, Alessandro De Salvo wrote:
>>> Hi Pranith,
>>> many thanks for the help!
>>> The volume info of the problematic volume is the following:
>>>
>>> # gluster volume info adsnet-vm-01
>>>  
>>> Volume Name: adsnet-vm-01
>>> Type: Replicate
>>> Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
>>> Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
>>> Options Reconfigured:
>>> nfs.disable: true
>>> features.barrier: disable
>>> features.file-snapshot: on
>>> server.allow-insecure: on
>> Are you sure the attached log is correct? I do not see any backtrace in the log file to indicate there is a crash :-(. Could you run "grep -i crash /var/log/glusterfs/*" to see if some other file contains the crash? If that also turns up nothing, would it be possible for you to provide the backtrace of the core by opening it with gdb?
>>
>> Pranith
>>>
>>> The log is in attachment.
>>> I just wanted to add that the heal info command works fine on other volumes hosted by the same machines, so it’s just this volume which is causing problems.
>>> Thanks,
>>>
>>> Alessandro
>>>
>>>
>>>
>>>
>>>> On 29 May 2015, at 10:50, Pranith Kumar Karampuri <pkarampu> wrote:
>>>>
>>>>
>>>>
>>>> On 05/29/2015 02:18 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On 05/29/2015 02:13 PM, Alessandro De Salvo wrote:
>>>>>> Hi,
>>>>>> I'm facing a strange issue with split brain reporting.
>>>>>> I have upgraded to 3.7.0, after stopping all gluster processes as described in the wiki, on all servers hosting the volumes. The upgrade and the restart were fine, and the volumes are accessible.
>>>>>> However, I had two files in split brain that I did not heal before upgrading, so I tried a full heal with 3.7.0. The heal was launched correctly, but when I now perform a heal info there is no output, while the heal statistics say there are actually 2 files in split brain. In the logs I see something like this:
>>>>>>
>>>>>> glustershd.log:
>>>>>> [2015-05-29 08:28:43.008373] I [afr-self-heal-entry.c:558:afr_selfheal_entry_do] 0-adsnet-gluster-01-replicate-0: performing entry selfheal on 7fd1262d-949b-402e-96c2-ae487c8d4e27
>>>>>> [2015-05-29 08:28:43.012690] W [client-rpc-fops.c:241:client3_3_mknod_cbk] 0-adsnet-gluster-01-client-1: remote operation failed: Invalid argument. Path: (null)
>>>>> Hey could you let us know "gluster volume info" output? Please let us know the backtrace printed by /var/log/glusterfs/glfsheal-<volname>.log as well.
>>>> Please attach /var/log/glusterfs/glfsheal-<volname>.log file to this thread so that I can take a look.
>>>>
>>>> Pranith
>>>>>
>>>>> Pranith
>>>>>>
>>>>>>
>>>>>> So, it seems like the files to be healed are not correctly identified, or at least their path is null.
>>>>>> Also, every time I issue a "gluster volume heal <volname> info" a core dump is generated in the log area.
>>>>>> All servers are using the latest CentOS 7.
>>>>>> Any idea why this might be happening and how to solve it?
>>>>>> Thanks,
>>>>>>
>>>>>>    Alessandro
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>
>>
>







Comment 1 Anand Avati 2015-05-29 10:06:17 UTC
REVIEW: http://review.gluster.org/11001 (heal: Do not call glfs_fini in final builds) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2015-05-29 10:29:37 UTC
REVIEW: http://review.gluster.org/11001 (heal: Do not call glfs_fini in final builds) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)
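
For context on the approach named in the review title above ("heal: Do not call glfs_fini in final builds"), a minimal C sketch of that idea follows. It is not the actual patch: the helper glfsh_cleanup() and the DEBUG build flag are assumptions made for this sketch, while glfs_fini() and glfs_t are the real libgfapi API corresponding to the pub_glfs_fini() call in frame #13 of the backtrace. The point is that the graph teardown which triggers the crash only runs in debug builds, where surfacing fini-time bugs is useful; release builds skip it and simply let process exit reclaim resources.

#include <glusterfs/api/glfs.h>  /* installed libgfapi header path; may differ when building in-tree */

/* Hypothetical helper for this sketch; in glfsheal the equivalent logic
 * would sit near the end of main() in glfs-heal.c (frame #14 above). */
static int
glfsh_cleanup (glfs_t *fs)
{
        int ret = 0;

        if (!fs)
                return ret;

#ifdef DEBUG
        /* Debug builds: run the full graph teardown so fini-time bugs,
         * like the inode_unref() crash in the backtrace, stay visible. */
        ret = glfs_fini (fs);
#else
        /* Release builds: skip glfs_fini(); glfsheal exits immediately
         * afterwards, so the OS reclaims everything and the crashing
         * teardown path is never entered. */
#endif
        return ret;
}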

Comment 3 Niels de Vos 2016-06-16 13:06:19 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

