Description of problem:
======================
quotad crashed. The exact steps that led to the crash are not known.

The setup has 8 nodes and 17 volumes. The crash was seen on a tiered volume with quota enabled.

[root@transformers ~]# gluster v info dpvol

Volume Name: dpvol
Type: Tier
Volume ID: 0bf13965-09df-4f06-a15d-fb16e69c9cf5
Status: Started
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: tettnang:/rhs/brick2/dpvol-ht4
Brick2: tettnang:/rhs/brick2/dpvol-ht3
Brick3: tettnang:/rhs/brick2/dpvol-ht2
Brick4: tettnang:/rhs/brick2/dpvol-ht1
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: vertigo:/rhs/brick1/dpvol1
Brick6: ninja:/rhs/brick1/dpvol2
Brick7: rhs-client18:/rhs/brick1/dpvol3
Brick8: rhs-client19:/rhs/brick1/dpvol4
Brick9: transformers:/rhs/brick1/dpvol5
Brick10: interstellar:/rhs/brick1/dpvol6
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on
features.uss: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
client.event-threads: 4
server.event-threads: 4
[root@transformers ~]#

Version-Release number of selected component (if applicable):
=============================================================
3.7.5-13

[root@transformers ~]# gluster --version
glusterfs 3.7.5 built on Dec 22 2015 19:44:25
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@transformers ~]#

This is the process information at the time the core was generated:

[root@tettnang ~]# ps -ef |grep 2136
root 2136 1 0 Dec24 ? 00:05:05 /usr/sbin/glusterfs --volfile-server /var/run/glusterd.socket --volfile-server-transport unix --volfile-id dpvol -l /var/log/glusterfs/quota-mount-dpvol.log -p /var/run/gluster/dpvol.pid --client-pid -5 /var/run/gluster/dpvol/

How reproducible:
=================
Seen twice

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
# gdb /usr/sbin/glusterfs /core.24517

Looks like infinite recursion in the memory allocation failure path:

#0 _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1155
#1 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#2 __glusterfs_this_location () at globals.c:141
#3 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#4 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#5 __glusterfs_this_location () at globals.c:141
#6 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#7 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#8 __glusterfs_this_location () at globals.c:141
#9 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#10 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#11 __glusterfs_this_location () at globals.c:141
#12 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#13 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#14 __glusterfs_this_location () at globals.c:141
#15 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#16 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#17 __glusterfs_this_location () at globals.c:141
#18 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#19 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
..............
#9639 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9640 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9641 __glusterfs_this_location () at globals.c:141
#9642 0x00007f55a5faa265 in _gf_msg_nomem (domain=domain@entry=0x7f55a6022697 "", file=file@entry=0x7f55a6022074 "mem-pool.h", function=function@entry=0x7f55a6028b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9643 0x00007f55a5fe1130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9644 __glusterfs_this_location () at globals.c:141
#9645 0x00007f55a5fdd312 in __gf_free (free_ptr=free_ptr@entry=0x7f558c068890) at mem-pool.c:293
#9646 0x00007f559cba409c in notify_kernel_loop (data=<optimized out>) at fuse-bridge.c:3875
#9647 0x00007f55a4e12dc5 in start_thread (arg=0x7f558a2dc700) at pthread_create.c:308
#9648 0x00007f55a47591cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

-----------------------------------------------------------------------------

# gdb /usr/sbin/glusterfs /core.2136

#9626 __glusterfs_this_location () at globals.c:141
#9627 0x00007f68014e9265 in _gf_msg_nomem (domain=domain@entry=0x7f6801561697 "", file=file@entry=0x7f6801561074 "mem-pool.h", function=function@entry=0x7f6801567b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9628 0x00007f6801520130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9629 __glusterfs_this_location () at globals.c:141
#9630 0x00007f68014e9265 in _gf_msg_nomem (domain=domain@entry=0x7f6801561697 "", file=file@entry=0x7f6801561074 "mem-pool.h", function=function@entry=0x7f6801567b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9631 0x00007f6801520130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9632 __glusterfs_this_location () at globals.c:141
#9633 0x00007f68014e9265 in _gf_msg_nomem (domain=domain@entry=0x7f6801561697 "", file=file@entry=0x7f6801561074 "mem-pool.h", function=function@entry=0x7f6801567b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9634 0x00007f6801520130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9635 __glusterfs_this_location () at globals.c:141
#9636 0x00007f68014e9265 in _gf_msg_nomem (domain=domain@entry=0x7f6801561697 "", file=file@entry=0x7f6801561074 "mem-pool.h", function=function@entry=0x7f6801567b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9637 0x00007f6801520130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9638 __glusterfs_this_location () at globals.c:141
#9639 0x00007f68014e9265 in _gf_msg_nomem (domain=domain@entry=0x7f6801561697 "", file=file@entry=0x7f6801561074 "mem-pool.h", function=function@entry=0x7f6801567b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9640 0x00007f6801520130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9641 __glusterfs_this_location () at globals.c:141
#9642 0x00007f68014e9265 in _gf_msg_nomem (domain=domain@entry=0x7f6801561697 "", file=file@entry=0x7f6801561074 "mem-pool.h", function=function@entry=0x7f6801567b10 <__FUNCTION__.7762> "__gf_default_calloc", line=line@entry=120, level=level@entry=GF_LOG_ALERT, size=size@entry=8) at logging.c:1167
#9643 0x00007f6801520130 in __gf_default_calloc (cnt=1, size=8) at mem-pool.h:120
#9644 __glusterfs_this_location () at globals.c:141
#9645 0x00007f680152d0cb in synctask_switchto (task=0x7f67e8006950) at syncop.c:659
#9646 0x00007f680152dc40 in syncenv_processor (thdata=0x7f68027995e0) at syncop.c:703
#9647 0x00007f6800351dc5 in start_thread (arg=0x7f67f68d6700) at pthread_create.c:308
#9648 0x00007f67ffc981cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
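The repeating three-frame cycle in the backtraces above (__glusterfs_this_location -> __gf_default_calloc -> _gf_msg_nomem -> __glusterfs_this_location) can be modelled in a few lines of C. The following is only a sketch of the pattern, not GlusterFS code and not the change made in the upstream patch: every name in it (ctx_location, checked_calloc, report_nomem, simulate_enomem, nomem_reports) is hypothetical, and the per-thread re-entrancy flag is just one possible way to break such a cycle. It assumes a C11 compiler for _Thread_local.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* All names here are hypothetical stand-ins, not GlusterFS symbols. */
const char *global_ctx = "global";            /* process-wide fallback context */
static _Thread_local const char **thread_ctx; /* lazily allocated per-thread slot */
static _Thread_local int in_lookup;           /* set while a lookup is in progress */
int simulate_enomem;                          /* test hook: force allocation failure */
int nomem_reports;                            /* number of failures reported */

const char **ctx_location(void);

/* Error reporter: it wants the current context for the log prefix, so it
 * calls back into ctx_location(). This is the edge that closes the cycle. */
void report_nomem(size_t size)
{
    const char *ctx = *ctx_location();
    nomem_reports++;
    fprintf(stderr, "[%s] out of memory allocating %zu bytes\n", ctx, size);
}

/* Allocator wrapper that reports every failure. */
void *checked_calloc(size_t cnt, size_t size)
{
    void *p = simulate_enomem ? NULL : calloc(cnt, size);
    if (!p)
        report_nomem(cnt * size);
    return p;
}

/* Return this thread's context slot, allocating it on first use.
 * Without the in_lookup check, a failed allocation here would recurse
 * through report_nomem() until the stack is exhausted. */
const char **ctx_location(void)
{
    if (thread_ctx)
        return thread_ctx;
    if (in_lookup)
        return &global_ctx; /* re-entered from the error path: fall back */

    in_lookup = 1;
    const char **slot = checked_calloc(1, sizeof(*slot));
    in_lookup = 0;

    if (!slot)
        return &global_ctx; /* allocation failed: use the fallback */
    *slot = global_ctx;
    thread_ctx = slot;
    return thread_ctx;
}
```

With simulate_enomem set, a call to ctx_location() reports the failure once and returns the fallback slot instead of recursing. Removing the in_lookup check reproduces the shape of the backtraces: each failed allocation re-enters the lookup, which allocates again.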
I have seen the same crash on 3.7.5-14.
Patch submitted upstream: http://review.gluster.org/13254
Ignore patch #13254 from comment #7. Submitted a new patch upstream with a better approach: http://review.gluster.org/#/c/13255/
Upstream patch: http://review.gluster.org/#/c/13255/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/65983/
Verified this on a 16-node setup and did not hit the crash in 3 days of testing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html