Red Hat Bugzilla – Bug 1241761
nfs-ganesha: coredump "pthread_spin_lock () from /lib64/libpthread.so.0"
Last modified: 2016-01-19 01:14:55 EST
Created attachment 1050509 [details]
coredump of quotad on nfs11

Description of problem:
Ran iozone and did a failover; on one node a coredump was found. iozone did finish, though, as ganesha failed over to the other node.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-8.el6rhs.x86_64
nfs-ganesha-2.2.0-4.el6rhs.x86_64

How reproducible:
Seen for the first time.

Steps to Reproduce:
1. Create a 6x2 volume and start it.
2. Configure nfs-ganesha and mount the volume with vers=4.
3. Run iozone, fail ganesha over to another node, and check iozone completion status and ganesha health.

Actual results:
coredump

bt
(gdb)
#0  0x0000003e96a0c400 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f8d74920d85 in __gf_free () from /usr/lib64/libglusterfs.so.0
#2  0x00007f8d748eaa06 in log_buf_destroy () from /usr/lib64/libglusterfs.so.0
#3  0x00007f8d748ec69a in _gf_msg () from /usr/lib64/libglusterfs.so.0
#4  0x00007f8d71d4aa6f in ?? () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/replicate.so
#5  0x00007f8d71f83a85 in client3_3_getxattr_cbk () from /usr/lib64/glusterfs/3.7.1/xlator/protocol/client.so
#6  0x00007f8d746b9445 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#7  0x00007f8d746ba8f2 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#8  0x00007f8d746b5ad8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#9  0x00007f8d6a271255 in ?? () from /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so
#10 0x00007f8d6a272e4d in ?? () from /usr/lib64/glusterfs/3.7.1/rpc-transport/socket.so
#11 0x00007f8d7494e980 in ?? () from /usr/lib64/libglusterfs.so.0
#12 0x0000003e96a07a51 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003e966e896d in clone () from /lib64/libc.so.6

Expected results:
No coredumps expected.

Additional info:
Created attachment 1050511 [details] sosreport of nfs11
A similar crash seems to have already been reported in bug 1238626. Investigating whether it's the same issue.
Thanks Nithya. Bug 1212660 talks about an issue of referencing a variable whose xlator object got freed during a graph switch. From the logs I see that a graph switch has happened:

[2015-07-10 12:43:28.411512] I [MSGID: 104041] [glfs-resolve.c:843:__glfs_active_subvol] 0-vol4: switched to graph 6e667331-312d-3235-3138-322d32303135 (0)
[2015-07-10 12:43:28.411895] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-4: selecting local read_child vol4-client-8

But after that there are also logs of connection failures with bricks:

[2015-07-10 12:47:10.807101] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-2: disconnected from vol4-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:10.807496] W [socket.c:642:__socket_rwv] 0-vol4-client-2: readv on 10.70.46.25:49157 failed (No data available)
[2015-07-10 12:47:10.807551] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-2: disconnected from vol4-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.781752] W [socket.c:642:__socket_rwv] 0-vol4-client-11: readv on 10.70.46.29:49158 failed (No data available)
[2015-07-10 12:47:11.781862] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-11: disconnected from vol4-client-11. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.782174] W [socket.c:642:__socket_rwv] 0-vol4-client-11: readv on 10.70.46.29:49158 failed (No data available)
[2015-07-10 12:47:11.782239] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-11: disconnected from vol4-client-11. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.912728] W [socket.c:642:__socket_rwv] 0-vol4-client-3: readv on 10.70.46.29:49156 failed (No data available)
[2015-07-10 12:47:11.912809] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-vol4-client-3: disconnected from vol4-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2015-07-10 12:47:11.912855] E [MSGID: 108006] [afr-common.c:3999:afr_notify] 0-vol4-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.

AFAIR, Saurabh has reported another issue of bricks crashing while running ACL tests. I shall try to reproduce this issue by following the exact steps executed when this crash was seen. The tests were run as a non-root user.
I looked at this a bit, to investigate a possible connection to bug 1212660. While it looks similar in some ways, I don't think it's related in the same way to a recent graph switch.

First, a bit of background. Any block that we allocate is preceded by a 0x40-byte header and followed by a short trailer. Both of these contain magic numbers so that we can check their validity.

What I actually see in the dump is a complete header..data..trailer sequence only 0x20 bytes before the pointer we're trying to free. In other words, any valid header we might have had in that space has been overwritten by a subsequent allocation. The disturbing part is that we're at a point in __gf_free where we would already have checked both magic numbers, and they must have been OK for us to get this far. We're right after a LOCK call, so the implication is that the memory was freed and reallocated while we were asleep. There's no *obvious* race condition that would cause two threads to be freeing the same log buffers, but such a race would be consistent with what's in the dump.
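For reference, a minimal sketch of the allocation layout described above. The field names match the 'p *header' output later in this bug; the exact types, padding and macro names are assumptions, not the verbatim libglusterfs definition.

/* Minimal sketch of a memory-accounted allocation, as described above.
 * Field names match the 'p *header' output in comment #12 below; the
 * types/padding are assumptions, not the verbatim mem-pool.h definition. */
#include <stdint.h>
#include <stddef.h>

#define GF_MEM_HEADER_MAGIC  0xCAFEBABE   /* shows up as 0xbabe 0xcafe in the hexdumps below */
#define GF_MEM_TRAILER_MAGIC 0xBAADF00D   /* shows up as 0xf00d 0xbaad in the hexdumps below */

struct mem_header {            /* sizeof() == 0x40 on this build (see comment #9 below) */
        uint32_t         type;       /* index into mem_acct->rec[]                */
        size_t           size;       /* user-visible size of the block            */
        struct mem_acct *mem_acct;   /* accounting table of the allocating xlator */
        uint32_t         magic;      /* GF_MEM_HEADER_MAGIC if the header is good */
        int              padding[8];
};

/* On-heap layout of every accounted allocation:
 *
 *   [ mem_header (0x40 bytes) ][ user data (header->size bytes) ][ 4-byte trailer magic ]
 *
 * __gf_free walks back 0x40 bytes from the user pointer, validates both magic
 * numbers, and then takes mem_acct->rec[header->type].lock -- the spin lock
 * the crashing thread is stopped on in frame #0 of the backtrace above. */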
We might be looking at the wrong time in the logs.

From ganesha-gfapi.log, the following messages were logged:

[2015-07-09 12:52:15.579331] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-dht: size=2028 max=7 total=8
[2015-07-09 12:52:15.579356] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-read-ahead: size=188 max=0 total=0
[2015-07-09 12:52:15.579417] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-readdir-ahead: size=52 max=0 total=0
[2015-07-09 12:52:15.579447] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=68 max=0 total=0
[2015-07-09 12:52:15.579468] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=252 max=1 total=2
[2015-07-09 12:52:15.579498] I [io-stats.c:2951:fini] 0-vol3: io-stats translator unloaded
[2015-07-09 12:52:15.583531] I [MSGID: 101191] [event-epoll.c:663:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 2
[2015-07-09 12:52:15.583683] I [MSGID: 101191] [event-epoll.c:663:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 1
[2015-07-09 12:52:54.980552] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol3-replicate-0: selecting local read_child vol3-client-0
[2015-07-09 12:52:54.982823] E [mem-pool.c:305:__gf_free] (--> 0-: Assertion failed: GF_MEM_HEADER_MAGIC == header->magic
[2015-07-09 12:52:54.982956] E [mem-pool.c:314:__gf_free] (--> 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2015-07-09 13:36:02.801096] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-07-09 13:36:03.532066] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 6e667331-312d-3837-3432-2d323031352d (0) coming up

And in /var/log/messages (which is set to IST time):

Jul  9 18:23:41 nfs11 abrtd: Generating core_backtrace
Jul  9 18:23:42 nfs11 abrtd: New problem directory /var/spool/abrt/ccpp-2015-07-09-18:22:55-25756, processing

Assuming those were the logs generated when the core was produced: since this was a non-debug build, the process did not crash at GF_ASSERT (GF_MEM_HEADER_MAGIC == header->magic). Also, as Nithya has pointed out, from the core:

(gdb) p *first
$14 = {msg = 0x7f8d5b028a60 "selecting local read_child vol3-client-0", msg_id = 108031, errnum = 0,
  oldest = {tv_sec = 1436446374, tv_usec = 980552}, latest = {tv_sec = 1436446374, tv_usec = 980552},
  domain = 0x7f8d09f56190 "vol3-replicate-0", file = 0x7f8d58008860 "afr-common.c",
  function = 0x7f8d58008460 "afr_local_discovery_cbk", line = 1745, level = GF_LOG_INFO, refcount = 0,
  graph_id = 0, msg_list = {next = 0x7f8d2c51470c, prev = 0x7f8d2c51470c}}
(gdb)

This was the log message flushed to disk just before the assertion was hit. Continuing on the core:

(gdb) p first->refcount
$15 = 0

But as per the code, first->refcount should have been >= 1:

        /* If the list is full, flush the lru msg to disk and also
         * release it after unlock, and ...
         * */
                if (first->refcount >= 1)
                        TEST_LOG("Buffer overflow of a buffer whose size limit "
                                 "is %d. About to flush least recently used log"
                                 " message to disk", size);

                list_del_init (&first->msg_list);
                ctx->log.lru_cur_size--;
                flush_lru = _gf_true;
        }

Not sure where else refcount is set to 0. Leaving that aside:

int
log_buf_init (log_buf_t *buf, const char *domain, const char *file,
              const char *function, int32_t line, gf_loglevel_t level,
              int errnum, uint64_t msgid, char **appmsgstr, int graph_id)
{
        int ret = -1;
        .............
        .............
        /*
         * The current translator will be put in the block header for any
         * memory block we allocate here. Unfortunately, these objects might
         * outlive the current translator, and if we then try to dereference
         * that pointer we go BOOM. Since this is really a global structure,
         * use the global translator.
         */
        old_THIS = THIS;
        THIS = &global_xlator;

        if (!buf || !domain || !file || !function || !appmsgstr || !*appmsgstr)
                goto out;

        buf->msg = gf_strdup (*appmsgstr);
        ..........
        ..........

Log buf messages are created using the 'global translator'.

(gdb) p buf_new
$28 = (log_buf_t *) 0x7f8d2c51447c
(gdb) p first
$29 = (log_buf_t *) 0x7f8d2c5146ac
(gdb) p first->msg
$30 = 0x7f8d5b028a60 "selecting local read_child vol3-client-0"
(gdb) p buf_new->msg
$31 = 0x7f8d580adaf0 "selecting local read_child vol3-client-8"
(gdb) p *(uint32_t *)((char *)$31 + header->size)
No symbol "header" in current context.
(gdb) p sizeof (struct mem_header)
$32 = 64
(gdb) p $31 - $32
$33 = 0x7f8d580adab0 "\300\212\323q\215\177"
(gdb) p header = (struct mem_header*)$33
No symbol "header" in current context.
(gdb) p (struct mem_header*)$33
$34 = (struct mem_header *) 0x7f8d580adab0
(gdb) p $34->size
$35 = 0
(gdb) p/x $34->magic
$36 = 0x0
(gdb)
(gdb) p $29 - $32
$37 = (log_buf_t *) 0x7f8d2c512aac
(gdb) p $30 - $32
$38 = 0x7f8d5b028a20 ""
(gdb) p (struct mem_header*)$38
$39 = (struct mem_header *) 0x7f8d5b028a20
(gdb) p $39->size
$40 = 0
(gdb) p $39->magic
$41 = 4294967295
(gdb) p/x $39->magic
$42 = 0xffffffff
(gdb)

Even 'buf_new->msg' seems to have gone bad.
Coming to global_xlator->ctx:

(gdb) p global_xlator->ctx
$43 = (glusterfs_ctx_t *) 0x7f8d2c027840
(gdb) p global_xlator->ctx->cleanup_started
$44 = 0 '\000'
(gdb) p global_xlator->ctx->logbuf_pool
$45 = (void *) 0x0
(gdb) p global_xlator->ctx->fin
$46 = -80 '\260'
(gdb) p global_xlator->ctx->master
$47 = (void *) 0x0
(gdb)
(gdb) p *$43
$48 = {cmd_args = {volfile_server = 0x7f8d2c000748 "8\a", curr_server = 0x7f8d2c000748,
    volfile_servers = {next = 0x7f8d2c027830, prev = 0x7f8d2c027830},
    volfile = 0x74726f7078652f73 <Address 0x74726f7078652f73 out of bounds>,
    log_server = 0x6f632e336c6f762e <Address 0x6f632e336c6f762e out of bounds>, log_level = 943351406,
    log_file = 0x746164696c617620 <Address 0x746164696c617620 out of bounds>,
    log_ident = 0x6f727265206e6f69 <Address 0x6f727265206e6f69 out of bounds>, logger = 1763734386,
    log_format = 1818370158, log_buf_size = 543908719, log_flush_timeout = 1279349574,
    max_connect_attempts = 1852785418,
    print_exports = 0x652f2820656c6946 <Address 0x652f2820656c6946 out of bounds>,
    print_netgroups = 0x73656e61672f6374 <Address 0x73656e61672f6374 out of bounds>,
    volfile_server_port = 1697603944,
    volfile_server_transport = 0x726f7078652f7374 <Address 0x726f7078652f7374 out of bounds>,
    log_server_port = 1870016116, pid_file = 0x203a29383a666e6f <Address 0x203a29383a666e6f out of bounds>,
    sock_file = 0x702073726f727245 <Address 0x702073726f727245 out of bounds>, no_daemon_mode = 1701015410,
    run_id = 0x206b636f6c622067 <Address 0x206b636f6c622067 out of bounds>, debug_mode = 1095976488,
    read_only = 1124739404, acl = 1768320623, selinux = 1766203495, enable_ino32 = 673211756,
    worm = 1668572463, mac_compat = 1851877167, fopen_keep_cache = 1634235237, gid_timeout = 1886938415,
    gid_timeout_set = 111 'o',
    .........
    ..........
    ........

global_xlator->ctx looks like it has either been freed or is about to be initialized. Also, from the gfapi logs below, logged before the assertion:

[mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=68 max=0 total=0
[2015-07-09 12:52:15.579468] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-vol3-io-cache: size=252 max=1 total=2
[2015-07-09 12:52:15.579498] I [io-stats.c:2951:fini] 0-vol3: io-stats translator unloaded

We suspect 'glfs_fini' may be playing some role here. The ctx which global_xlator is referring to could have got freed.
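To make that suspicion concrete, here is a small, self-contained illustration of how a process-global pointer that tracks "the latest ctx" ends up dangling once that ctx is torn down. This is hypothetical code; the demo_* names are made up and this is not gfapi's actual implementation.

/* Hypothetical illustration of a global pointer tracking "the latest ctx"
 * going stale after that ctx is destroyed. All demo_* names are made up;
 * this is not gfapi code. */
#include <stdio.h>
#include <stdlib.h>

struct demo_ctx {
        int   mem_acct_enable;
        void *logbuf_pool;
};

static struct demo_ctx *global_ctx;            /* stands in for global_xlator->ctx */

static struct demo_ctx *demo_new(void)         /* stands in for glfs_new()         */
{
        struct demo_ctx *ctx = calloc(1, sizeof(*ctx));
        ctx->mem_acct_enable = 1;
        global_ctx = ctx;                      /* the global tracks the newest ctx */
        return ctx;
}

static void demo_fini(struct demo_ctx *ctx)    /* stands in for glfs_fini()        */
{
        free(ctx);                             /* note: global_ctx is never reset  */
}

int main(void)
{
        struct demo_ctx *c1 = demo_new();
        demo_fini(c1);

        /* global_ctx still holds the address of the freed context. Any later
         * code path that consults it (say, a log-buffer flush reading
         * mem_acct_enable or logbuf_pool) would be reading freed memory --
         * which is the kind of garbage seen in the 'p *$43' output above. */
        puts("ctx freed, but the global pointer was never cleared");
        return 0;
}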
Adding a few details missing in comment #9...

Poornima and I investigated the core and found the following:

(gdb) f 1
#1  0x00007f8d74920d85 in __gf_free (free_ptr=0x7f8d5b028a60) at mem-pool.c:316
316             LOCK (&mem_acct->rec[header->type].lock);
(gdb) p header
$1 = (struct mem_header *) 0x7f8d5b028a20
(gdb) x/256xh (char *)header - 128
0x7f8d5b0289a0: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b0289b0: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b0289c0: 0x652f 0x6374 0x732f 0x6c73 0x672f 0x756c 0x7473 0x7265
0x7f8d5b0289d0: 0x7366 0x632e 0x0061 0xf00d 0xbaad 0x0000 0x0000 0x0000
0x7f8d5b0289e0: 0x0000 0x0000 0x0000 0x0000 0x0075 0x0000 0x0000 0x0000
0x7f8d5b0289f0: 0x009a 0x0000 0x0000 0x0000 0x0018 0x0000 0x0000 0x0000
0x7f8d5b028a00: 0x7500 0x5a4d 0x7f8d 0x0000 0xbabe 0xcafe 0x0000 0x0000
0x7f8d5b028a10: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
<start of header>
0x7f8d5b028a20: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028a30: 0x0003 0x0003 0x0002 0x0000 0xffff 0xffff <----- supposedly the magic number of the header of buf->msg
                0x0000 0x0000
0x7f8d5b028a40: 0x0000 0x0000 0x0000 0x0000 0xf00d 0xbaad <----- trailer of a different allocation
                0x0000 0x0000
0x7f8d5b028a50: 0x0000 0x0000 0x0000 0x0000 0x0045 0x0000 0x0000 0x0000
<start of the actual data>
0x7f8d5b028a60: 0x6573 0x656c 0x7463 0x6e69 0x2067 0x6f6c 0x6163 0x206c
0x7f8d5b028a70: 0x6572 0x6461 0x635f 0x6968 0x646c 0x7620 0x6c6f 0x2d33
0x7f8d5b028a80: 0x6c63 0x6569 0x746e 0x302d 0x0000 0x0000 0x0000 0x0000
                ^^ missing trailer.
0x7f8d5b028a90: 0x0000 0x0000 0x0000 0x0000 0x0035 0x0000 0x0000 0x0000
0x7f8d5b028aa0: 0x5900 0x5801 0x7f8d 0x0000 0xb86d 0x075b 0x0000 0x0000
0x7f8d5b028ab0: 0x0001 0x0000 0x0000 0x0000 0x2bc0 0x09f5 0x7f8d 0x0000
0x7f8d5b028ac0: 0x0000 0x0000 0x0000 0x0000 0x0045 0x0000 0x0000 0x0000
0x7f8d5b028ad0: 0x6573 0x656c 0x7463 0x6e69 0x2067 0x6f6c 0x6163 0x206c
0x7f8d5b028ae0: 0x6572 0x6461 0x635f 0x6968 0x646c 0x7620 0x6c6f 0x2d33
0x7f8d5b028af0: 0x6c63 0x6569 0x746e 0x382d 0x2f00 0x696c 0x6362 0x6d2e
0x7f8d5b028b00: 0x006f 0x0000 0x0000 0x0000 0x00b5 0x0000 0x0000 0x0000
0x7f8d5b028b10: 0x004a 0x0000 0x0000 0x0000 0x0058 0x0000 0x0000 0x0000
0x7f8d5b028b20: 0x8770 0x5afb 0x7f8d 0x0000 0xbabe 0xcafe 0x0000 0x0000
0x7f8d5b028b30: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b40: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b50: 0x6c47 0x7375 0x6574 0x4672 0x2053 0x6143 0x6c6c 0x6162
0x7f8d5b028b60: 0x6b63 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b70: 0xcc42 0x0324 0x0001 0x0000 0x5ee0 0x721a 0x7f8d 0x0000
0x7f8d5b028b80: 0x0006 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
0x7f8d5b028b90: 0xe680 0x5afc 0x7f8d 0x0000 0xe680 0x5afc 0x7f8d 0x0000
(gdb) p free_ptr
$2 = (void *) 0x7f8d5b028a60
(gdb) p (char *)free_ptr
$3 = 0x7f8d5b028a60 "selecting local read_child vol3-client-0"
(gdb) p/x header->magic
$4 = 0xffffffff
(gdb) f 2
#2  0x00007f8d748eaa06 in log_buf_destroy (buf=0x7f8d2c5146ac) at logging.c:353
353             GF_FREE (buf->msg);
(gdb) p *buf
$5 = {msg = 0x7f8d5b028a60 "selecting local read_child vol3-client-0", msg_id = 108031, errnum = 0,
  oldest = {tv_sec = 1436446374, tv_usec = 980552}, latest = {tv_sec = 1436446374, tv_usec = 980552},
  domain = 0x7f8d09f56190 "vol3-replicate-0", file = 0x7f8d58008860 "afr-common.c",
  function = 0x7f8d58008460 "afr_local_discovery_cbk", line = 1745, level = GF_LOG_INFO, refcount = 0,
  graph_id = 0, msg_list = {next = 0x7f8d2c51470c, prev = 0x7f8d2c51470c}}

If you notice, the actual data (the portion between header and trailer) in the GF_CALLOC'd objects is perfectly intact and points to valid addresses.

(gdb) p/x header->magic
$4 = 0xffffffff

The magic number associated with buf->msg is supposed to be 0xcafebabe, as opposed to the 0xffffffff that it is.

(gdb) p *header
$6 = {type = 0, size = 0, mem_acct = 0x200030003, magic = 4294967295, padding = {0, 0, 0, -1163005939, 0, 0, 0, 69}}

This means it is only the header and the trailer that seem to have gone bad, leading us to suspect that the bufs were actually allocated using CALLOC() as opposed to GF_CALLOC(). This can only happen if memory accounting on THIS at the time of allocation was off, and yet at the time of freeing it is found to be on, with the object getting freed via GF_FREE(), leading to the crash.

Poornima points out that this can happen with the way global_xlator->ctx is managed in gfapi. I will leave it to her to explain the idiosyncrasies of gfapi's ctx management.

Setting/initialising log_buf_size to 0 (which means log bufs are never allocated in the first place, and which is equivalent to disabling log suppression) should, as a workaround, make this crash in the logging code go away.
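To see the suspected mismatch in isolation, here is a simplified, self-contained sketch. The demo_* names are made up and this is not the mem-pool.c implementation: the point is only that if the accounting flag is off when a buffer is allocated but on when it is freed, the free path walks back into bytes that were never a header.

/* Simplified, self-contained sketch of the alloc/free asymmetry described
 * above. demo_* names are made up; this is not the mem-pool.c code. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_HEADER_MAGIC 0xCAFEBABEu
#define DEMO_HEADER_SIZE  0x40

/* alloc side: consults the accounting flag of the *allocating* context */
static void *demo_calloc(int mem_acct_enable, size_t size)
{
        if (!mem_acct_enable)
                return calloc(1, size);              /* plain block, no header */

        char *block = calloc(1, DEMO_HEADER_SIZE + size + sizeof(uint32_t));
        *(uint32_t *)block = DEMO_HEADER_MAGIC;      /* pretend header          */
        return block + DEMO_HEADER_SIZE;             /* caller sees only data   */
}

/* free side: consults the accounting flag of the *freeing* context */
static void demo_free(int mem_acct_enable, void *ptr)
{
        if (!mem_acct_enable) {
                free(ptr);
                return;
        }
        char *header = (char *)ptr - DEMO_HEADER_SIZE;
        assert(*(uint32_t *)header == DEMO_HEADER_MAGIC);
        free(header);
}

int main(void)
{
        /* matched: allocated and freed with accounting on -- fine */
        void *ok = demo_calloc(1, 64);
        demo_free(1, ok);

        /* mismatched: allocated with accounting OFF but freed with it ON.
         * demo_free() would step back 0x40 bytes into memory it does not own
         * and find no valid header -- the same shape as this crash:
         *
         *     void *bad = demo_calloc(0, 64);
         *     demo_free(1, bad);
         */
        return 0;
}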
Thanks to everyone involved. Adding more theory to Krutika's comment above:

Before doing allocations in log_buf_init, we set THIS to 'global_xlator', which means 'global_xlator->ctx->mem_acct_enable' is checked to decide whether to do memory accounting while allocating the memory for 'msg'. During log_buf_destroy->GF_FREE, on the other hand, 'THIS->ctx->mem_acct_enable' (i.e. the current xlator's ctx) is checked to determine whether the buf being freed was memory-accounted. If those two ctxs have different values for mem_acct_enable, we shall most likely hit this issue.

To summarize, in the gfapi case (or, for that matter, any process with multiple ctxs involved):

a) It is highly likely that global_xlator->ctx and THIS->ctx are different. This is the primary thing to be addressed here: both log_buf_init and log_buf_destroy should use global_xlator->ctx (see the sketch below).

b) However, even with the above fix, since global_xlator doesn't have a single dedicated ctx but points to the latest ctx allocated in 'glfs_new', we could still see race conditions wherein global_xlator->ctx->mem_acct_enable is disabled while allocating a buffer but has got enabled by the time GF_FREE is called on that buffer. To fix this, we may need to disable memory accounting (as done for snapshots). But even this doesn't ensure that there are no other references to a global_xlator->ctx which may have got destroyed. The appropriate fix could be as discussed on #gluster-devel: http://article.gmane.org/gmane.comp.file-systems.gluster.devel/10922/match=issue+libgfapi

Will send in the fix soon to address (a). However, since the present issue is hit only when logging is enabled and log.lru_queue is full, and as there is a workaround to avoid it as mentioned by Krutika, this issue will be documented as a known issue for this release.
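A minimal sketch of the direction in (a), assuming the existing libglusterfs definitions of THIS, global_xlator, GF_FREE and log_buf_t. This is illustrative only, not the actual patch that was merged.

/* Illustrative sketch of fix direction (a): free the log-buf strings under
 * the same xlator (and therefore the same ctx->mem_acct_enable) that
 * log_buf_init allocated them under.  Assumes it is built inside the
 * libglusterfs source tree; not the actual merged patch. */
#include "globals.h"      /* THIS, global_xlator */
#include "logging.h"      /* log_buf_t           */
#include "mem-pool.h"     /* GF_FREE             */

static void
log_buf_destroy_sketch (log_buf_t *buf)
{
        xlator_t *old_THIS = THIS;

        /* buf->msg was gf_strdup'd in log_buf_init while THIS was
         * &global_xlator (see the excerpt in comment #9), so free it under
         * the same THIS, so that GF_FREE consults the same ctx -- and hence
         * the same mem_acct_enable value -- that the allocation did.  Any
         * other fields strdup'd the same way would be freed here as well. */
        THIS = &global_xlator;

        GF_FREE (buf->msg);        /* the GF_FREE at logging.c:353 in frame #2 */

        THIS = old_THIS;
}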
Doc text is edited. Please sign off so it can be included in Known Issues.
Doc text looks good to me
Executed different tests to check whether the crash can still be hit post-fix; no crash was seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html