+++ This bug was initially created as a clone of Bug #1159269 +++

Description of problem:

Sometimes a segmentation fault is generated while dumping internal state. Analysis of the core dump seems to indicate that the bug is caused by an unaligned structure.

In gf_proc_dump_call_frame() a copy of the frame is made inside a locked region:

    ret = TRY_LOCK(&call_frame->lock);
    if (ret)
        goto out;

    memcpy(&my_frame, call_frame, sizeof(my_frame));
    UNLOCK(&call_frame->lock);

call_frame->lock does not protect most of the updates to fields inside the call_frame_t structure, especially the wind_from, wind_to, unwind_from and unwind_to pointers, which are modified in the STACK_WIND and STACK_UNWIND macros. This wouldn't be a problem if all these updates were atomic. However, it seems that the memory pool framework can return unaligned pointers (at least on 64-bit architectures):

    (gdb) print call_frame
    $19 = (call_frame_t *) 0x7f4609a141c4

This means that all pointers inside the structure can be unaligned:

    (gdb) print &call_frame->unwind_from
    $20 = (const char **) 0x7f4609a14244

At the microprocessor level, this means that a modification of the unwind_from field needs 2 memory access cycles, making the update non-atomic and prone to partial reads by other threads.
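As a rough illustration of the alignment problem described above, the addresses from the gdb session can be checked against an 8-byte pointer size. This is a minimal sketch and not GlusterFS code: is_aligned() and straddles() are hypothetical helper names, and a 64-bit build with 8-byte pointers is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GlusterFS): returns 1 when addr is a
 * multiple of align, 0 otherwise. align must be a power of two. */
static int
is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: returns 1 when an object of 'size' bytes that starts
 * at addr spans more than one naturally aligned block of 'align' bytes. */
static int
straddles(uint64_t addr, uint64_t size, uint64_t align)
{
    assert(size != 0);
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr / align) != ((addr + size - 1) / align);
}
```

With the values above, is_aligned(0x7f4609a14244ULL, 8) is 0 and straddles(0x7f4609a14244ULL, 8, 8) is 1: the 8-byte unwind_from pointer starts 4 bytes into one aligned 8-byte word and ends in the next one.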
In fact, this seems to be what happened:

    (gdb) print *call_frame
    $21 = {root = 0x7f460984a280, parent = 0x7f460984a8e8, next = 0x7f4609a13454,
      prev = 0x7f4609a15540, local = 0x0, this = 0xae2470,
      ret = 0x7f45fec75311 <ec_lookup_cbk>, ref_count = 0, lock = 1, cookie = 0x9,
      complete = _gf_true, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0},
      end = {tv_sec = 0, tv_usec = 0},
      wind_from = 0x7f45fecdc082 <__FUNCTION__.13893> "ec_wind_lookup",
      wind_to = 0x7f45fecdbd20 "ec->xl_list[idx]->fops->lookup",
      unwind_from = 0x7f45fef26c80 <__FUNCTION__.19453> "client3_3_lookup_cbk",
      unwind_to = 0x7f45fecdbd3f "ec_lookup_cbk"}

    (gdb) print my_frame
    $22 = {root = 0x7f460984a280, parent = 0x7f460984a8e8, next = 0x7f4609a13454,
      prev = 0x7f4609a15540, local = 0xb6a0b4, this = 0xae2470,
      ret = 0x7f45fec75311 <ec_lookup_cbk>, ref_count = 0, lock = 0, cookie = 0x9,
      complete = _gf_false, op = GF_FOP_NULL, begin = {tv_sec = 0, tv_usec = 0},
      end = {tv_sec = 0, tv_usec = 0},
      wind_from = 0x7f45fecdc082 <__FUNCTION__.13893> "ec_wind_lookup",
      wind_to = 0x7f45fecdbd20 "ec->xl_list[idx]->fops->lookup",
      unwind_from = 0x7f4500000000 <error: Cannot access memory at address 0x7f4500000000>,
      unwind_to = 0x7f45fecdbd3f "ec_lookup_cbk"}

The copy in my_frame contains only half of the unwind_from pointer, because the pointer was being updated by another thread while the copy was made. If we check the current contents of call_frame, we can see that the update of the pointer had completed before the crash, but the copy in my_frame remains incorrect:

    (gdb) print call_frame->unwind_from
    $23 = 0x7f45fef26c80 <__FUNCTION__.19453> "client3_3_lookup_cbk"
    (gdb) print my_frame.unwind_from
    $24 = 0x7f4500000000 <error: Cannot access memory at address 0x7f4500000000>

Version-Release number of selected component (if applicable): master

How reproducible:

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:

--- Additional comment from Anand Avati on 2014-10-31 12:33:40 CET ---

REVIEW: http://review.gluster.org/9031 (mem-pool: Fix memory block alignments) posted (#1) for review on master by Xavier Hernandez (xhernandez)
REVIEW: http://review.gluster.org/9032 (mem-pool: Fix memory block alignments) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)
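The patches themselves are not quoted in this report, so the following is only a generic illustration of how block alignment is commonly enforced in pool allocators, not the content of the reviews above. It assumes the usual technique of rounding a per-object size up to a power-of-two alignment so that consecutive objects start on aligned addresses; align_up() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from the GlusterFS patches): rounds size up
 * to the next multiple of align. align must be a power of two. */
static size_t
align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For example, a 28-byte (0x1c) block rounded with align_up(0x1c, 8) occupies 32 bytes, so a pointer-sized field in the following block keeps its 8-byte alignment.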
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.
This bug is being closed as GlusterFS-3.6 is nearing its End-Of-Life and only important security bugs will be fixed. This bug has been fixed in more recent GlusterFS releases. If you still face this bug with the newer GlusterFS versions, please open a new bug.