Bug 1465861
| Summary: | Removal of io threads from graph causes segfault in quota enable volume | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Sanoj Unnikrishnan <sunnikri> | 
| Component: | quota | Assignee: | bugs <bugs> | 
| Status: | CLOSED WONTFIX | QA Contact: | Rahul Hinduja <rhinduja> | 
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | mainline | CC: | bugs, hgowtham | 
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-11-21 03:15:58 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
 
        
Description Sanoj Unnikrishnan 2017-06-28 11:43:34 UTC
While trying to reproduce this, I found that on each run some random structures were getting corrupted, which led to a segfault.
To confirm that the stack was indeed growing through all of the allocated space and beyond, I set a guard page at the end of the allocated stack space (so that we hit a segfault before overrunning the space).
Below are the code changes.
@@ -443,6 +443,8 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
         struct synctask *newtask = NULL;
         xlator_t        *this    = THIS;
         int             destroymode = 0;
+        int                     r=0;
+        char                    *v;
 
         VALIDATE_OR_GOTO (env, err);
         VALIDATE_OR_GOTO (fn, err);
@@ -498,9 +500,15 @@ synctask_create (struct syncenv *env, size_t stacksize, synctask_fn_t fn,
                                             gf_common_mt_syncstack);
                 newtask->ctx.uc_stack.ss_size = env->stacksize;
         } else {
-                newtask->stack = GF_CALLOC (1, stacksize,
+               newtask->stack = GF_CALLOC (1, stacksize,
                                             gf_common_mt_syncstack);
                 newtask->ctx.uc_stack.ss_size = stacksize;
+                if (stacksize == 16*1024) {
+                        v = (unsigned long)((char *)(newtask->stack) + 4095) & (~4095);
+                        r = mprotect(v, 4096, PROT_NONE);
+                       gf_msg ("syncop", GF_LOG_ERROR, errno,
+                                LG_MSG_GETCONTEXT_FAILED, "SKU: using 16k stack starting at %p, mprotect returned %d, guard page: %p", newtask->stack, r, v);
+               }
         }
 
(gdb) where
#0  0x00007f8a92c51204 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#1  0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from /lib64/ld-linux-x86-64.so.2
#3  0x0000000000000000 in ?? ()
(gdb) info reg
rdi            0x7f8a92946188	140233141412232
rbp            0x7f8a800b4000	0x7f8a800b4000
rsp            0x7f8a800b4000	0x7f8a800b4000
r8             0x7f8a92e4ba60	140233146677856
(gdb) layout asm
  >│0x7f8a92c51204 <_dl_lookup_symbol_x+4>          push   %r15                   <== push on stack at the guarded page caused segfault
From the brick log we have,
[syncop.c:515:synctask_create] 0-syncop: SKU: using 16k stack starting at 0x7f8a800b28f0, mprotect returned 0, guard page: 0x7f8a800b3000 [No data available]
The stack grows downward from 0x7f8a800b68f0 to 0x7f8a800b28f0, and the page 0x7f8a800b3000 - 0x7f8a800b4000 is guarded, which is where the segfault hit, as seen in gdb.
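The address arithmetic above can be checked with a small sketch. It assumes a 4096-byte page (the value hard-coded in the debug patch) and x86-64 push semantics; the helper names are hypothetical and are not part of GlusterFS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size, matching the 4096 hard-coded in the debug patch. */
#define GUARD_PAGE_SIZE ((uintptr_t)4096)

/* Round the stack base up to the next page boundary,
 * the same computation as (base + 4095) & ~4095 in the patch. */
static uintptr_t guard_page_start(uintptr_t stack_base)
{
    return (stack_base + GUARD_PAGE_SIZE - 1) & ~(GUARD_PAGE_SIZE - 1);
}

/* True if addr lies inside the guard page derived from stack_base. */
static bool in_guard_page(uintptr_t stack_base, uintptr_t addr)
{
    uintptr_t start = guard_page_start(stack_base);
    return addr >= start && addr < start + GUARD_PAGE_SIZE;
}

/* Address written by an x86-64 push: rsp is decremented by 8 before the store. */
static uintptr_t push_target(uintptr_t rsp)
{
    return rsp - 8;
}
```

With the logged stack base 0x7f8a800b28f0, `guard_page_start` gives 0x7f8a800b3000, and `push_target(0x7f8a800b4000)` (the rsp value from gdb) is 0x7f8a800b3ff8, which falls inside the guarded page.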
This confirms that the stack space is insufficient and is overflowing.
I am not sure why we don't hit this when io-threads is present, though. It may just be that with io-threads in the graph there is some allocated but unused memory which our stack freely grows into.
In that case it would simply be a silent, undetected reuse of some memory.
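For reference, the guard-page technique can also be sketched in isolation. This is not the debug patch above, which protects a page inside a GF_CALLOC'd buffer; it is a minimal variant that assumes the stack is obtained from mmap so that it is page-aligned, with one PROT_NONE page placed below it. The struct and function names are hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for a stack with a guard page below it. */
struct guarded_stack {
    void  *map;       /* start of the whole mapping; the lowest page is the guard */
    size_t map_len;   /* guard page + usable stack */
    void  *stack;     /* lowest usable stack address, just above the guard */
    size_t stack_len; /* usable size, rounded up to a page multiple */
};

/* Allocate a stack of at least stack_len bytes with a PROT_NONE page
 * below it. Returns 0 on success, -1 on failure. */
static int guarded_stack_alloc(struct guarded_stack *gs, size_t stack_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t pg  = (size_t)page;
    size_t len = (stack_len + pg - 1) / pg * pg; /* round up to whole pages */

    void *map = mmap(NULL, len + pg, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    /* A downward-growing stack that overflows now faults on this page
     * instead of silently overwriting adjacent memory. */
    if (mprotect(map, pg, PROT_NONE) != 0) {
        munmap(map, len + pg);
        return -1;
    }

    gs->map       = map;
    gs->map_len   = len + pg;
    gs->stack     = (char *)map + pg;
    gs->stack_len = len;
    return 0;
}

static void guarded_stack_free(struct guarded_stack *gs)
{
    munmap(gs->map, gs->map_len);
}
```

In a setup like this, `gs.stack` and `gs.stack_len` would be what gets assigned to `uc_stack.ss_sp` and `uc_stack.ss_size`; an overflow then shows up as an immediate segfault at the guard page rather than as corruption of unrelated structures.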
As quota is not being actively developed, we are closing this bug.