1239319 – quota: quotad crash "#4 0x00007f5404f73681 in _gf_msg_backtrace (stacksize=<value optimized out>, callstr=0x7f53ef601100 "", strsize=5) at logging.c:1137"

Bug 1239319 - quota: quotad crash "#4 0x00007f5404f73681 in _gf_msg_backtrace (stacksize=<value optimized out>, callstr=0x7f53ef601100 "", strsize=5) at logging.c:1137"

Summary: quota: quotad crash "#4 0x00007f5404f73681 in _gf_msg_backtrace (stacksize=<...

Keywords:
Status:	CLOSED DUPLICATE of bug 1239317
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	quota
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Vijaikumar Mallikarjuna
QA Contact:	storage-qa-internal@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2015-07-05 16:24 UTC by Saurabh
Modified:	2016-09-17 12:42 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-07-06 06:35:07 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
sosreport of nfs12 (962.23 KB, application/x-xz) 2015-07-05 17:02 UTC, Saurabh	no flags	Details
coredump of quotad on nfs12 (962.23 KB, application/x-xz) 2015-07-05 17:57 UTC, Saurabh	no flags	Details
View All

Description Saurabh 2015-07-05 16:24:29 UTC

Description of problem:
I was having iozone over nfs-ganesha process, with volume being mounted as vers=4. Meantime a failover of nfs-ganesha process was triggered successfully and I/O resumed post failover but later I find that quotad has crashed and coredumped 


Version-Release number of selected component (if applicable):
glusterfs-3.7.1-7.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
seen for first time

Steps to Reproduce:
1. create a volume of 6x2 type.
2. nfs-ganesha configuration
3. mount the volume with vers=4
4. trigger a failover by killing a nfs-ganesha process on the node from where mount is done in step 3
5. wait for I/O resume and check the status of things.

Actual results:
A bz 1239317 is already with the same testcase,
this separate bz as coredump are different and crashes seen on different nodes of rhgs cluster

#0  0x00007f540392406a in vfprintf () from /lib64/libc.so.6
#1  0x00007f5403949609 in vsprintf () from /lib64/libc.so.6
#2  0x00007f540392f2b8 in sprintf () from /lib64/libc.so.6
#3  0x00007f54039dec07 in backtrace_symbols () from /lib64/libc.so.6
#4  0x00007f5404f73681 in _gf_msg_backtrace (stacksize=<value optimized out>, callstr=0x7f53ef601100 "", strsize=5) at logging.c:1137
#5  0x00007f5404f75250 in _gf_msg (domain=0x7f5404fec1f1 "", file=0x7f5404ff1a8f "mem-pool.c", function=0x7f5404ff1da0 "gf_mem_set_acct_info", 
    line=<value optimized out>, level=GF_LOG_ERROR, errnum=<value optimized out>, trace=1, msgid=101150, 
    fmt=0x7f5404ff1c70 "Assertion failed: xl->mem_acct != NULL") at logging.c:2047
#6  0x00007f5404faa57e in gf_mem_set_acct_info (xl=0x7f53f0015610, alloc_ptr=0x7f53ef602278, size=78, type=39, 
    typestr=0x7f5404ff1b25 "gf_common_mt_asprintf") at mem-pool.c:63
#7  0x00007f5404faa646 in __gf_malloc (size=78, type=39, typestr=0x7f5404ff1b25 "gf_common_mt_asprintf") at mem-pool.c:147
#8  0x00007f5404faa754 in gf_vasprintf (string_ptr=0x7f53ef602488, format=0x7f5404fec2cb "[%s] %s [%s:%d:%s] %s %d-%s: ", arg=<value optimized out>)
    at mem-pool.c:221
#9  0x00007f5404faa818 in gf_asprintf (string_ptr=<value optimized out>, format=<value optimized out>) at mem-pool.c:239
#10 0x00007f5404f76149 in gf_log_glusterlog (ctx=0x7f5405979010, domain=0x7f5404fec1f1 "", file=0x7f5404ff1a8f "mem-pool.c", 
    function=0x7f5404ff1da0 "gf_mem_set_acct_info", line=63, level=GF_LOG_ERROR, errnum=0, msgid=101150, appmsgstr=0x7f53ef6026c8, 
    callstr=0x7f53ef6026d0 "(-->", tv=..., graph_id=0, fmt=gf_logformat_traditional) at logging.c:1377
#11 0x00007f5404f75227 in _gf_msg_internal (domain=<value optimized out>, file=<value optimized out>, function=<value optimized out>, 
    line=<value optimized out>, level=<value optimized out>, errnum=<value optimized out>, trace=1, msgid=101150, 
    fmt=0x7f5404ff1c70 "Assertion failed: xl->mem_acct != NULL") at logging.c:1994
#12 _gf_msg (domain=<value optimized out>, file=<value optimized out>, function=<value optimized out>, line=<value optimized out>, 
    level=<value optimized out>, errnum=<value optimized out>, trace=1, msgid=101150, fmt=0x7f5404ff1c70 "Assertion failed: xl->mem_acct != NULL")
    at logging.c:2077
#13 0x00007f5404faa57e in gf_mem_set_acct_info (xl=0x7f53f0015610, alloc_ptr=0x7f53ef603848, size=78, type=39, 
    typestr=0x7f5404ff1b25 "gf_common_mt_asprintf") at mem-pool.c:63
#14 0x00007f5404faa646 in __gf_malloc (size=78, type=39, typestr=0x7f5404ff1b25 "gf_common_mt_asprintf") at mem-pool.c:147
#15 0x00007f5404faa754 in gf_vasprintf (string_ptr=0x7f53ef603a58, format=0x7f5404fec2cb "[%s] %s [%s:%d:%s] %s %d-%s: ", arg=<value optimized out>)
    at mem-pool.c:221
#16 0x00007f5404faa818 in gf_asprintf (string_ptr=<value optimized out>, format=<value optimized out>) at mem-pool.c:239
#17 0x00007f5404f76149 in gf_log_glusterlog (ctx=0x7f5405979010, domain=0x7f5404fec1f1 "", file=0x7f5404ff1a8f "mem-pool.c", 
    function=0x7f5404ff1da0 "gf_mem_set_acct_info", line=63, level=GF_LOG_ERROR, errnum=0, msgid=101150, appmsgstr=0x7f53ef603c98, 
    callstr=0x7f53ef603ca0 "(-->", tv=..., graph_id=0, fmt=gf_logformat_traditional) at logging.c:1377
#18 0x00007f5404f75227 in _gf_msg_internal (domain=<value optimized out>, file=<value optimized out>, function=<value optimized out>, 
    line=<value optimized out>, level=<value optimized out>, errnum=<value optimized out>, trace=1, msgid=101150, 
---Type <return> to continue, or q <return> to quit---


[root@nfs11 ~]# gluster volume status vol2
Status of volume: vol2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.46.8:/rhs/brick1/d1r12          49156     0          Y       2665 
Brick 10.70.46.27:/rhs/brick1/d1r22         49156     0          Y       20958
Brick 10.70.46.25:/rhs/brick1/d2r12         49156     0          Y       17883
Brick 10.70.46.29:/rhs/brick1/d2r22         49155     0          Y       20935
Brick 10.70.46.8:/rhs/brick1/d3r12          49157     0          Y       2684 
Brick 10.70.46.27:/rhs/brick1/d3r22         49157     0          Y       20977
Brick 10.70.46.25:/rhs/brick1/d4r12         49157     0          Y       17902
Brick 10.70.46.29:/rhs/brick1/d4r22         49156     0          Y       20954
Brick 10.70.46.8:/rhs/brick1/d5r12          49158     0          Y       2703 
Brick 10.70.46.27:/rhs/brick1/d5r22         49158     0          Y       20996
Brick 10.70.46.25:/rhs/brick1/d6r12         49158     0          Y       17921
Brick 10.70.46.29:/rhs/brick1/d6r22         49157     0          Y       20973
Self-heal Daemon on localhost               N/A       N/A        Y       2341 
Quota Daemon on localhost                   N/A       N/A        Y       2352 
Self-heal Daemon on 10.70.46.27             N/A       N/A        Y       10010
Quota Daemon on 10.70.46.27                 N/A       N/A        N       N/A  
Self-heal Daemon on 10.70.46.22             N/A       N/A        Y       7660 
Quota Daemon on 10.70.46.22                 N/A       N/A        Y       7668 
Self-heal Daemon on 10.70.46.25             N/A       N/A        Y       7479 
Quota Daemon on 10.70.46.25                 N/A       N/A        Y       7484 
Self-heal Daemon on 10.70.46.29             N/A       N/A        Y       9905 
Quota Daemon on 10.70.46.29                 N/A       N/A        N       N/A  
 
Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

[root@nfs11 ~]# gluster volume info vol2
 
Volume Name: vol2
Type: Distributed-Replicate
Volume ID: 30ab7484-1480-46d5-8f83-4ab27199640d
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.46.8:/rhs/brick1/d1r12
Brick2: 10.70.46.27:/rhs/brick1/d1r22
Brick3: 10.70.46.25:/rhs/brick1/d2r12
Brick4: 10.70.46.29:/rhs/brick1/d2r22
Brick5: 10.70.46.8:/rhs/brick1/d3r12
Brick6: 10.70.46.27:/rhs/brick1/d3r22
Brick7: 10.70.46.25:/rhs/brick1/d4r12
Brick8: 10.70.46.29:/rhs/brick1/d4r22
Brick9: 10.70.46.8:/rhs/brick1/d5r12
Brick10: 10.70.46.27:/rhs/brick1/d5r22
Brick11: 10.70.46.25:/rhs/brick1/d6r12
Brick12: 10.70.46.29:/rhs/brick1/d6r22
Options Reconfigured:
performance.readdir-ahead: on
features.cache-invalidation: on
ganesha.enable: on
nfs.disable: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
nfs-ganesha: enable


----------------------
pcs status at the time of filing the bz

Online: [ nfs11 nfs12 nfs13 nfs14 nfs16 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ nfs11 nfs12 nfs13 nfs14 nfs16 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ nfs11 nfs12 nfs13 nfs14 nfs16 ]
 nfs11-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs11 
 nfs11-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs11 
 nfs12-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs12 
 nfs12-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs12 
 nfs13-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs12 
 nfs13-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs12 
 nfs14-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs12 
 nfs14-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs12 
 nfs16-cluster_ip-1	(ocf::heartbeat:IPaddr):	Started nfs12 
 nfs16-trigger_ip-1	(ocf::heartbeat:Dummy):	Started nfs12 
 nfs16-dead_ip-1	(ocf::heartbeat:Dummy):	Started nfs16 
 nfs14-dead_ip-1	(ocf::heartbeat:Dummy):	Started nfs14 
 nfs13-dead_ip-1	(ocf::heartbeat:Dummy):	Started nfs13 


Expected results:
quotad crash not expected.

Additional info:

Comment 2 Saurabh 2015-07-05 17:02:42 UTC

Created attachment 1046306 [details]
sosreport of nfs12

Comment 3 Saurabh 2015-07-05 17:57:43 UTC

Created attachment 1046309 [details]
coredump of quotad on nfs12

Comment 4 Vijaikumar Mallikarjuna 2015-07-06 06:35:07 UTC

After looking at the core-dump stack, issue looks same as bug# 1239317

*** This bug has been marked as a duplicate of bug 1239317 ***

Note You need to log in before you can comment on or make changes to this bug.