Red Hat Bugzilla – Bug 1241761
nfs-ganesha: coredump "pthread_spin_lock () from /lib64/libpthread.so.0"
Last modified: 2016-01-19 01:14:55 EST
Created attachment 1050509 [details]
coredump of quotad on nfs11

Description of problem:
Ran iozone and did a failover; on one node a coredump was found. iozone did finish, though, as ganesha failed over to the other node.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-8.el6rhs.x86_64
nfs-ganesha-2.2.0-4.el6rhs.x86_64

How reproducible:
Seen for the first time.

Steps to Reproduce:
1. Create a 6x2 volume and start it.
2. Configure nfs-ganesha and mount the volume with vers=4.
3. Run iozone, fail ganesha over to another node, and check iozone completion status and ganesha health.

Actual results:
coredump

bt
(gdb)
#0  0x0000003e96a0c400 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f8d74920d85 in __gf_free () from /usr/lib64/libglusterfs.so.0
#2  0x00007f8d748eaa06 in log_buf_destroy () from /usr/lib64/libglusterfs.so.0
#3  0x00007f8d748ec69a in _gf_msg () from /usr/lib64/libglusterfs.so.0
#4  0x00007f8d71d4aa6f in ?? () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/replicate.so
#5  0x00007f8d71f83a85 in client3_3_getxattr_cbk () from /usr/lib64/glusterfs/3.7.1/xlator/protocol/client.so
#6  0x00007f8d746b9445 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#7  0x00007f8d746ba8f2 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#8  0x00007f8d746b5ad8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#9  0x00007f8d6a271255 in ?? () from /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so
#10 0x00007f8d6a272e4d in ?? () from /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so
#11 0x00007f8d7494e980 in ?? () from /usr/lib64/libglusterfs.so.0
#12 0x0000003e96a07a51 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003e966e896d in clone () from /lib64/libc.so.6

Expected results:
No coredumps expected.

Additional info:
Created attachment 1050511 [details] sosreport of nfs11
A similar crash seems to have already been reported in bug 1238626. Investigating whether it's the same issue.
Thanks Nithya. Bug 1212660 talks about an issue of referencing a variable whose xlator object got freed during a graph switch. From the logs I see that a graph switch has happened:

[2015-07-10 12:43:28.411512] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-vol4: switched to graph 6e667331-312d-3235-3138-322d32303135 (0)
[2015-07-10 12:43:28.411895] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-4: selecting local read_child vol4-client-8

But after that there are also logs of connection failures with bricks:

[2015-07-10 12:47:10.807101] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-2: disconnected from vol4-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:10.807496] W [socket.c:642:__socket_rwv] 0-vol4-client-2: readv on 10.70.46.25:49157 failed (No data available)
[2015-07-10 12:47:10.807551] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-2: disconnected from vol4-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.781752] W [socket.c:642:__socket_rwv] 0-vol4-client-11: readv on 10.70.46.29:49158 failed (No data available)
[2015-07-10 12:47:11.781862] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-11: disconnected from vol4-client-11. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.782174] W [socket.c:642:__socket_rwv] 0-vol4-client-11: readv on 10.70.46.29:49158 failed (No data available)
[2015-07-10 12:47:11.782239] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-11: disconnected from vol4-client-11. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.912728] W [socket.c:642:__socket_rwv] 0-vol4-client-3: readv on 10.70.46.29:49156 failed (No data available)
[2015-07-10 12:47:11.912809] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-3: disconnected from vol4-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.912855] E [MSGID: 108006] [afr-common.c:3999:afr_notify] 0-vol4-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.

AFAIR, Saurabh has reported another issue of bricks crashing while running ACL tests. I shall try to reproduce this issue by following the exact steps executed when this crash was seen. The tests were run as a non-root user.
I looked at this a bit, to investigate a possible connection to bug 1212660. While it looks similar in some ways, I don't think it's related in the same way to a recent graph switch.

First, a bit of background. Any block that we allocate is preceded by a 0x40-byte header and followed by a short trailer. Both of these contain magic numbers so that we can check their validity.

What I actually see in the dump is a complete header..data..trailer sequence only 0x20 bytes before the pointer we're trying to free. In other words, any valid header we might have had in that space has been overwritten by a subsequent allocation. The disturbing part is that we're at a point in __gf_free where we would already have checked both magic numbers, and they must have been OK for us to get this far. We're right after a LOCK call, so the implication is that the memory was freed and reallocated while we were asleep. There's no *obvious* race condition that would cause two threads to be freeing the same log buffers, but such a race would be consistent with what's in the dump.
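For reference, a minimal sketch of the allocation layout described above. The field names match the 'p *header' output later in this bug; the exact types, padding and macro names are assumptions, not the verbatim libglusterfs definition.

/* Minimal sketch of a memory-accounted allocation, as described above.
 * Field names match the 'p *header' output in comment #12 below; the
 * types/padding are assumptions, not the verbatim mem-pool.h definition. */
#include <stdint.h>
#include <stddef.h>

#define GF_MEM_HEADER_MAGIC  0xCAFEBABE   /* shows up as 0xbabe 0xcafe in the hexdumps below */
#define GF_MEM_TRAILER_MAGIC 0xBAADF00D   /* shows up as 0xf00d 0xbaad in the hexdumps below */

struct mem_header {            /* sizeof() == 0x40 on this build (see comment #9 below) */
        uint32_t         type;       /* index into mem_acct->rec[]                */
        size_t           size;       /* user-visible size of the block            */
        struct mem_acct *mem_acct;   /* accounting table of the allocating xlator */
        uint32_t         magic;      /* GF_MEM_HEADER_MAGIC if the header is good */
        int              padding[8];
};

/* On-heap layout of every accounted allocation:
 *
 *   [ mem_header (0x40 bytes) ][ user data (header->size bytes) ][ 4-byte trailer magic ]
 *
 * __gf_free walks back 0x40 bytes from the user pointer, validates both magic
 * numbers, and then takes mem_acct->rec[header->type].lock -- the spin lock
 * the crashing thread is stopped on in frame #0 of the backtrace above. */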
We might be looking at the wrong time in the logs.

From ganesha-gfapi.log, the following messages were logged:

[2015-07-09 12:52:15.579331] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-dht: size=2028 max=7 total=8
[2015-07-09 12:52:15.579356] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-read-ahead: size=188 max=0 total=0
[2015-07-09 12:52:15.579417] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-readdir-ahead: size=52 max=0 total=0
[2015-07-09 12:52:15.579447] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=68 max=0 total=0
[2015-07-09 12:52:15.579468] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=252 max=1 total=2
[2015-07-09 12:52:15.579498] I [io-stats.c:2951:fini] 0-vol3: io-stats translator unloaded
[2015-07-09 12:52:15.583531] I [MSGID: 101191] [event-epoll.c:663:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 2
[2015-07-09 12:52:15.583683] I [MSGID: 101191] [event-epoll.c:663:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 1
[2015-07-09 12:52:54.980552] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol3-replicate-0: selecting local read_child vol3-client-0
[2015-07-09 12:52:54.982823] E [mem-pool.c:305:__gf_free] (--> 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic
[2015-07-09 12:52:54.982956] E [mem-pool.c:314:__gf_free] (--> 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2015-07-09 13:36:02.801096] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-09 13:36:03.532066] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 6e667331-312d-3837-3432-2d323031352d (0) coming up

And in /var/log/messages (which is set to IST time):

Jul  9 18:23:41 nfs11 abrtd: Generating core_backtrace
Jul  9 18:23:42 nfs11 abrtd: New problem directory /var/spool/abrt/ccpp-2015-07-09-18:22:55-25756, processing

Assuming those were the logs generated when the core was produced: since this was a non-debug build, the process did not crash at GF_ASSERT (GF_MEM_HEADER_MAGIC == header->magic). Also, as Nithya has pointed out, from the core:

(gdb) p *first
$14 = {msg = 0x7f8d5b028a60 "selecting local read_child vol3-client-0", msg_id = 108031, errnum = 0,
  oldest = {tv_sec = 1436446374, tv_usec = 980552}, latest = {tv_sec = 1436446374, tv_usec = 980552},
  domain = 0x7f8d09f56190 "vol3-replicate-0", file = 0x7f8d58008860 "afr-common.c",
  function = 0x7f8d58008460 "afr_local_discovery_cbk", line = 1745, level = GF_LOG_INFO, refcount = 0,
  graph_id = 0, msg_list = {next = 0x7f8d2c51470c, prev = 0x7f8d2c51470c}}
(gdb)

This was the log message flushed to disk just before the assertion was hit. Continuing on the core:

(gdb) p first->refcount
$15 = 0

But as per the code, first->refcount should have been >= 1:

        /* If the list is full, flush the lru msg to disk and also
         * release it after unlock, and ...
         * */
                if (first->refcount >= 1)
                        TEST_LOG("Buffer overflow of a buffer whose size limit "
                                 "is %d. About to flush least recently used log"
                                 " message to disk", size);

                list_del_init (&first->msg_list);
                ctx->log.lru_cur_size--;
                flush_lru = _gf_true;
        }

Not sure where else refcount is set to 0. Leaving that aside:

int
log_buf_init (log_buf_t *buf, const char *domain, const char *file,
              const char *function, int32_t line, gf_loglevel_t level,
              int errnum, uint64_t msgid, char **appmsgstr, int graph_id)
{
        int ret = -1;
        .............
        .............
        /*
         * The current translator will be put in the block header for any
         * memory block we allocate here. Unfortunately, these objects might
         * outlive the current translator, and if we then try to dereference
         * that pointer we go BOOM. Since this is really a global structure,
         * use the global translator.
         */
        old_THIS = THIS;
        THIS = &global_xlator;

        if (!buf || !domain || !file || !function || !appmsgstr || !*appmsgstr)
                goto out;

        buf->msg = gf_strdup (*appmsgstr);
        ..........
        ..........

Log buf messages are created using the 'global translator'.

(gdb) p buf_new
$28 = (log_buf_t *) 0x7f8d2c51447c
(gdb) p first
$29 = (log_buf_t *) 0x7f8d2c5146ac
(gdb) p first->msg
$30 = 0x7f8d5b028a60 "selecting local read_child vol3-client-0"
(gdb) p buf_new->msg
$31 = 0x7f8d580adaf0 "selecting local read_child vol3-client-8"
(gdb) p *(uint32_t *)((char *)$31 + header->size)
No symbol "header" in current context.
(gdb) p sizeof (struct mem_header)
$32 = 64
(gdb) p $31 - $32
$33 = 0x7f8d580adab0 "\300\212\323q\215\177"
(gdb) p header = (struct mem_header*)$33
No symbol "header" in current context.
(gdb) p (struct mem_header*)$33
$34 = (struct mem_header *) 0x7f8d580adab0
(gdb) p $34->size
$35 = 0
(gdb) p/x $34->magic
$36 = 0x0
(gdb)
(gdb) p $29 - $32
$37 = (log_buf_t *) 0x7f8d2c512aac
(gdb) p $30 - $32
$38 = 0x7f8d5b028a20 ""
(gdb) p (struct mem_header*)$38
$39 = (struct mem_header *) 0x7f8d5b028a20
(gdb) p $39->size
$40 = 0
(gdb) p $39->magic
$41 = 4294967295
(gdb) p/x $39->magic
$42 = 0xffffffff
(gdb)

Even 'buf_new->msg' seems to have gone bad.
Coming to global_xlator->ctx:

(gdb) p global_xlator->ctx
$43 = (glusterfs_ctx_t *) 0x7f8d2c027840
(gdb) p global_xlator->ctx->cleanup_started
$44 = 0 '\000'
(gdb) p global_xlator->ctx->logbuf_pool
$45 = (void *) 0x0
(gdb) p global_xlator->ctx->fin
$46 = -80 '\260'
(gdb) p global_xlator->ctx->master
$47 = (void *) 0x0
(gdb)
(gdb) p *$43
$48 = {cmd_args = {volfile_server = 0x7f8d2c000748 "8\a", curr_server = 0x7f8d2c000748,
    volfile_servers = {next = 0x7f8d2c027830, prev = 0x7f8d2c027830},
    volfile = 0x74726f7078652f73 <Address 0x74726f7078652f73 out of bounds>,
    log_server = 0x6f632e336c6f762e <Address 0x6f632e336c6f762e out of bounds>, log_level = 943351406,
    log_file = 0x746164696c617620 <Address 0x746164696c617620 out of bounds>,
    log_ident = 0x6f727265206e6f69 <Address 0x6f727265206e6f69 out of bounds>, logger = 1763734386,
    log_format = 1818370158, log_buf_size = 543908719, log_flush_timeout = 1279349574,
    max_connect_attempts = 1852785418,
    print_exports = 0x652f2820656c6946 <Address 0x652f2820656c6946 out of bounds>,
    print_netgroups = 0x73656e61672f6374 <Address 0x73656e61672f6374 out of bounds>,
    volfile_server_port = 1697603944,
    volfile_server_transport = 0x726f7078652f7374 <Address 0x726f7078652f7374 out of bounds>,
    log_server_port = 1870016116, pid_file = 0x203a29383a666e6f <Address 0x203a29383a666e6f out of bounds>,
    sock_file = 0x702073726f727245 <Address 0x702073726f727245 out of bounds>, no_daemon_mode = 1701015410,
    run_id = 0x206b636f6c622067 <Address 0x206b636f6c622067 out of bounds>, debug_mode = 1095976488,
    read_only = 1124739404, acl = 1768320623, selinux = 1766203495, enable_ino32 = 673211756,
    worm = 1668572463, mac_compat = 1851877167, fopen_keep_cache = 1634235237, gid_timeout = 1886938415,
    gid_timeout_set = 111 'o',
    .........
    ..........
    ........

global_xlator->ctx looks like it has either been freed or is about to be initialized. Also, from the gfapi logs below, logged before the assertion:

[mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=68 max=0 total=0
[2015-07-09 12:52:15.579468] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=252 max=1 total=2
[2015-07-09 12:52:15.579498] I [io-stats.c:2951:fini] 0-vol3: io-stats translator unloaded

We suspect 'glfs_fini' may be playing some role here. The ctx which global_xlator is referring to could have got freed.
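To make that suspicion concrete, here is a small, self-contained illustration of how a process-global pointer that tracks "the latest ctx" ends up dangling once that ctx is torn down. This is hypothetical code; the demo_* names are made up and this is not gfapi's actual implementation.

/* Hypothetical illustration of a global pointer tracking "the latest ctx"
 * going stale after that ctx is destroyed. All demo_* names are made up;
 * this is not gfapi code. */
#include <stdio.h>
#include <stdlib.h>

struct demo_ctx {
        int   mem_acct_enable;
        void *logbuf_pool;
};

static struct demo_ctx *global_ctx;            /* stands in for global_xlator->ctx */

static struct demo_ctx *demo_new(void)         /* stands in for glfs_new()         */
{
        struct demo_ctx *ctx = calloc(1, sizeof(*ctx));
        ctx->mem_acct_enable = 1;
        global_ctx = ctx;                      /* the global tracks the newest ctx */
        return ctx;
}

static void demo_fini(struct demo_ctx *ctx)    /* stands in for glfs_fini()        */
{
        free(ctx);                             /* note: global_ctx is never reset  */
}

int main(void)
{
        struct demo_ctx *c1 = demo_new();
        demo_fini(c1);

        /* global_ctx still holds the address of the freed context. Any later
         * code path that consults it (say, a log-buffer flush reading
         * mem_acct_enable or logbuf_pool) would be reading freed memory --
         * which is the kind of garbage seen in the 'p *$43' output above. */
        puts("ctx freed, but the global pointer was never cleared");
        return 0;
}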
Adding a few details missing in comment #9...

Poornima and I investigated the core and found the following:

(gdb) f 1
#1  0x00007f8d74920d85 in __gf_free (free_ptr=0x7f8d5b028a60) at mem-pool.c:316
316             LOCK (&mem_acct->rec[header->type].lock);
(gdb) p header
$1 = (struct mem_header *) 0x7f8d5b028a20
(gdb) x/256xh (char *)header - 128
0x7f8d5b0289a0: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b0289b0: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b0289c0: 0x652f 0x6374 0x732f 0x6c73 0x672f 0x756c 0x7473 0x7265
0x7f8d5b0289d0: 0x7366 0x632e 0x0061 0xf00d 0xbaad 0x0000 0x0000 0x0000
0x7f8d5b0289e0: 0x0000 0x0000 0x0000 0x0000 0x0075 0x0000 0x0000 0x0000
0x7f8d5b0289f0: 0x009a 0x0000 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x7f8d5b028a00: 0x7500 0x5a4d 0x7f8d 0x0000 0xbabe 0xcafe 0x0000 0x0000
0x7f8d5b028a10: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
<start of header>
0x7f8d5b028a20: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028a30: 0x0003 0x0003 0x0002 0x0000 0xffff 0xffff <----- supposedly the magic number of the header of buf->msg
                0x0000 0x0000
0x7f8d5b028a40: 0x0000 0x0000 0x0000 0x0000 0xf00d 0xbaad <----- trailer of a different allocation
                0x0000 0x0000
0x7f8d5b028a50: 0x0000 0x0000 0x0000 0x0000 0x0045 0x0000 0x0000 0x0000
<start of the actual data>
0x7f8d5b028a60: 0x6573 0x656c 0x7463 0x6e69 0x2067 0x6f6c 0x6163 0x206c
0x7f8d5b028a70: 0x6572 0x6461 0x635f 0x6968 0x646c 0x7620 0x6c6f 0x2d33
0x7f8d5b028a80: 0x6c63 0x6569 0x746e 0x302d 0x0000 0x0000 0x0000 0x0000
                ^^ missing trailer.
0x7f8d5b028a90: 0x0000 0x0000 0x0000 0x0000 0x0035 0x0000 0x0000 0x0000
0x7f8d5b028aa0: 0x5900 0x5801 0x7f8d 0x0000 0xb86d 0x075b 0x0000 0x0000
0x7f8d5b028ab0: 0x0001 0x0000 0x0000 0x0000 0x2bc0 0x09f5 0x7f8d 0x0000
0x7f8d5b028ac0: 0x0000 0x0000 0x0000 0x0000 0x0045 0x0000 0x0000 0x0000
0x7f8d5b028ad0: 0x6573 0x656c 0x7463 0x6e69 0x2067 0x6f6c 0x6163 0x206c
0x7f8d5b028ae0: 0x6572 0x6461 0x635f 0x6968 0x646c 0x7620 0x6c6f 0x2d33
0x7f8d5b028af0: 0x6c63 0x6569 0x746e 0x382d 0x2f00 0x696c 0x6362 0x6d2e
0x7f8d5b028b00: 0x006f 0x0000 0x0000 0x0000 0x00b5 0x0000 0x0000 0x0000
0x7f8d5b028b10: 0x004a 0x0000 0x0000 0x0000 0x0058 0x0000 0x0000 0x0000
0x7f8d5b028b20: 0x8770 0x5afb 0x7f8d 0x0000 0xbabe 0xcafe 0x0000 0x0000
0x7f8d5b028b30: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b40: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b50: 0x6c47 0x7375 0x6574 0x4672 0x2053 0x6143 0x6c6c 0x6162
0x7f8d5b028b60: 0x6b63 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b70: 0xcc42 0x0324 0x0001 0x0000 0x5ee0 0x721a 0x7f8d 0x0000
0x7f8d5b028b80: 0x0006 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b90: 0xe680 0x5afc 0x7f8d 0x0000 0xe680 0x5afc 0x7f8d 0x0000
(gdb) p free_ptr
$2 = (void *) 0x7f8d5b028a60
(gdb) p (char *)free_ptr
$3 = 0x7f8d5b028a60 "selecting local read_child vol3-client-0"
(gdb) p/x header->magic
$4 = 0xffffffff
(gdb) f 2
#2  0x00007f8d748eaa06 in log_buf_destroy (buf=0x7f8d2c5146ac) at logging.c:353
353             GF_FREE (buf->msg);
(gdb) p *buf
$5 = {msg = 0x7f8d5b028a60 "selecting local read_child vol3-client-0", msg_id = 108031, errnum = 0,
  oldest = {tv_sec = 1436446374, tv_usec = 980552}, latest = {tv_sec = 1436446374, tv_usec = 980552},
  domain = 0x7f8d09f56190 "vol3-replicate-0", file = 0x7f8d58008860 "afr-common.c",
  function = 0x7f8d58008460 "afr_local_discovery_cbk", line = 1745, level = GF_LOG_INFO, refcount = 0,
  graph_id = 0, msg_list = {next = 0x7f8d2c51470c, prev = 0x7f8d2c51470c}}

If you notice, the actual data (the portion between header and trailer) in the GF_CALLOC'd objects is perfectly intact and points to valid addresses.

(gdb) p/x header->magic
$4 = 0xffffffff

The magic number associated with buf->msg is supposed to be 0xcafebabe, as opposed to the 0xffffffff that it is.

(gdb) p *header
$6 = {type = 0, size = 0, mem_acct = 0x200030003, magic = 4294967295, padding = {0, 0, 0, -1163005939, 0, 0, 0, 69}}

This means it is only the header and the trailer that seem to have gone bad, leading us to suspect that the bufs were actually allocated using CALLOC() as opposed to GF_CALLOC(). This can only happen if memory accounting on THIS at the time of allocation was off, and yet at the time of freeing it is found to be on, with the object getting freed via GF_FREE(), leading to the crash.

Poornima points out that this can happen with the way global_xlator->ctx is managed in gfapi. I will leave it to her to explain the idiosyncrasies of gfapi's ctx management.

Setting/initialising log_buf_size to 0 (which means log bufs are never allocated in the first place, and which is equivalent to disabling log suppression) should, as a workaround, make this crash in the logging code go away.
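To see the suspected mismatch in isolation, here is a simplified, self-contained sketch. The demo_* names are made up and this is not the mem-pool.c implementation: the point is only that if the accounting flag is off when a buffer is allocated but on when it is freed, the free path walks back into bytes that were never a header.

/* Simplified, self-contained sketch of the alloc/free asymmetry described
 * above. demo_* names are made up; this is not the mem-pool.c code. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_HEADER_MAGIC 0xCAFEBABEu
#define DEMO_HEADER_SIZE  0x40

/* alloc side: consults the accounting flag of the *allocating* context */
static void *demo_calloc(int mem_acct_enable, size_t size)
{
        if (!mem_acct_enable)
                return calloc(1, size);              /* plain block, no header */

        char *block = calloc(1, DEMO_HEADER_SIZE + size + sizeof(uint32_t));
        *(uint32_t *)block = DEMO_HEADER_MAGIC;      /* pretend header          */
        return block + DEMO_HEADER_SIZE;             /* caller sees only data   */
}

/* free side: consults the accounting flag of the *freeing* context */
static void demo_free(int mem_acct_enable, void *ptr)
{
        if (!mem_acct_enable) {
                free(ptr);
                return;
        }
        char *header = (char *)ptr - DEMO_HEADER_SIZE;
        assert(*(uint32_t *)header == DEMO_HEADER_MAGIC);
        free(header);
}

int main(void)
{
        /* matched: allocated and freed with accounting on -- fine */
        void *ok = demo_calloc(1, 64);
        demo_free(1, ok);

        /* mismatched: allocated with accounting OFF but freed with it ON.
         * demo_free() would step back 0x40 bytes into memory it does not own
         * and find no valid header -- the same shape as this crash:
         *
         *     void *bad = demo_calloc(0, 64);
         *     demo_free(1, bad);
         */
        return 0;
}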
Thanks to everyone involved. Adding more theory to Krutika's comment above:

Before doing allocations in log_buf_init, we set THIS to 'global_xlator', which means 'global_xlator->ctx->mem_acct_enable' is checked to decide whether to do memory accounting while allocating the memory for 'msg'. During log_buf_destroy->GF_FREE, on the other hand, 'THIS->ctx->mem_acct_enable' (i.e. the current xlator's ctx) is checked to determine whether the buf being freed was memory-accounted. If those two ctxs have different values for mem_acct_enable, we shall most likely hit this issue.

To summarize, in the gfapi case (or, for that matter, any process with multiple ctxs involved):

a) It is highly likely that global_xlator->ctx and THIS->ctx are different. This is the primary thing to be addressed here: both log_buf_init and log_buf_destroy should use global_xlator->ctx (see the sketch below).

b) However, even with the above fix, since global_xlator doesn't have a single dedicated ctx but points to the latest ctx allocated in 'glfs_new', we could still see race conditions wherein global_xlator->ctx->mem_acct_enable is disabled while allocating a buffer but has got enabled by the time GF_FREE is called on that buffer. To fix this, we may need to disable memory accounting (as done for snapshots). But even this doesn't ensure that there are no other references to a global_xlator->ctx which may have got destroyed. The appropriate fix could be as discussed on #gluster-devel: http://article.gmane.org/gmane.comp.file-systems.gluster.devel/10922/match=issue+libgfapi

Will send in the fix soon to address (a). However, since the present issue is hit only when logging is enabled and log.lru_queue is full, and as there is a workaround to avoid it as mentioned by Krutika, this issue will be documented as a known issue for this release.
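A minimal sketch of the direction in (a), assuming the existing libglusterfs definitions of THIS, global_xlator, GF_FREE and log_buf_t. This is illustrative only, not the actual patch that was merged.

/* Illustrative sketch of fix direction (a): free the log-buf strings under
 * the same xlator (and therefore the same ctx->mem_acct_enable) that
 * log_buf_init allocated them under.  Assumes it is built inside the
 * libglusterfs source tree; not the actual merged patch. */
#include "globals.h"      /* THIS, global_xlator */
#include "logging.h"      /* log_buf_t           */
#include "mem-pool.h"     /* GF_FREE             */

static void
log_buf_destroy_sketch (log_buf_t *buf)
{
        xlator_t *old_THIS = THIS;

        /* buf->msg was gf_strdup'd in log_buf_init while THIS was
         * &global_xlator (see the excerpt in comment #9), so free it under
         * the same THIS, so that GF_FREE consults the same ctx -- and hence
         * the same mem_acct_enable value -- that the allocation did.  Any
         * other fields strdup'd the same way would be freed here as well. */
        THIS = &global_xlator;

        GF_FREE (buf->msg);        /* the GF_FREE at logging.c:353 in frame #2 */

        THIS = old_THIS;
}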
Doc text is edited. Please sign off so it can be included in Known Issues.
Doc text looks good to me
Executed different tests to check whether the crash can still be hit post-fix; no crash was seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html