Bug 1755679

Summary: Segmentation fault occurs while truncate file
Product: [Community] GlusterFS Reporter: Nithya Balachandran <nbalacha>
Component: write-behindAssignee: bugs <bugs>
Status: CLOSED NEXTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 6CC: bugs, nbalacha, pasik, rgowdapp, vpolakis
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1753592 Environment:
Last Closed: 2019-10-04 05:23:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1753592, 1797882    
Bug Blocks: 1755678    

Description Nithya Balachandran 2019-09-26 02:33:53 UTC
+++ This bug was initially created as a clone of Bug #1753592 +++

+++ This bug was initially created as a clone of Bug #1749369 +++

Description of problem:
Segmentation fault occurs when bricks are nearly full 100% and in parallel truncate of a file is attempted (No space left on device).
Prerequisite is that performance xlators are activated (read-ahead, write-behind etc) like following

[root@ glusterfs]# gluster --glusterd-sock=/d/backends/1/glusterd/gd.sock volume info

Volume Name: patchy
Type: Replicate
Volume ID: 5b79a54d-8207-4d38-a0e2-001adbdd11d8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.1.1:/d/backends/1/brick
Brick2: 127.1.1.2:/d/backends/2/brick
Brick3: 127.1.1.3:/d/backends/3/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


back trace is like follwoing
#0  mem_put (ptr=0x7fa8e006ec40) at mem-pool.c:847
#1  0x00007fa8e94d3fea in FRAME_DESTROY (frame=0x7fa8d0015698) at ../../../../libglusterfs/src/stack.h:173
#2  STACK_DESTROY (stack=0x7fa8d000b238) at ../../../../libglusterfs/src/stack.h:193
#3  fuse_truncate_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=28, prebuf=<optimized out>, postbuf=0x7fa8e6637530, xdata=0x0) at fuse-bridge.c:813
#4  0x00007fa8e45b5f7d in io_stats_ftruncate_cbk (frame=0x7fa8d0009508, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at io-stats.c:2425
#5  0x00007fa8e47d6feb in mdc_ftruncate_cbk (frame=0x7fa8d0015e58, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at md-cache.c:1477
#6  0x00007fa8ec29e668 in default_ftruncate_cbk (frame=0x7fa8d000a558, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at defaults.c:1375
#7  0x00007fa8e4bfc539 in qr_ftruncate_cbk (frame=0x7fa8d000d288, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at quick-read.c:890
#8  0x00007fa8e4e06c48 in ioc_ftruncate_cbk (frame=frame@entry=0x7fa8d000f1e8, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=28, prebuf=prebuf@entry=0x7fa8d001cac0, postbuf=postbuf@entry=0x7fa8e6637530, xdata=xdata@entry=0x0) at io-cache.c:1277
#9  0x00007fa8e5022f2a in rda_ftruncate_cbk (frame=0x7fa8d000f088, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8d001cb58, xdata=0x0) at readdir-ahead.c:826
#10 0x00007fa8e522df58 in ra_truncate_cbk (frame=0x7fa8d0015588, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8d001cb58, xdata=0x0) at read-ahead.c:668
#11 0x00007fa8ec22bd43 in call_resume_unwind (stub=stub@entry=0x7fa8d001bf88) at call-stub.c:2348
#12 0x00007fa8ec23a989 in call_unwind_error_keep_stub (stub=0x7fa8d001bf88, op_ret=op_ret@entry=-1, op_errno=28) at call-stub.c:2541
#13 0x00007fa8e544482c in wb_do_winds (wb_inode=wb_inode@entry=0x7fa8e006ec40, tasks=tasks@entry=0x7fa8e6637750) at write-behind.c:1735
#14 0x00007fa8e54448db in wb_process_queue (wb_inode=wb_inode@entry=0x7fa8e006ec40) at write-behind.c:1778
#15 0x00007fa8e5444a5f in wb_fulfill_cbk (frame=frame@entry=0x7fa8e005e808, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=28, prebuf=prebuf@entry=0x7fa8e00731e0, postbuf=postbuf@entry=0x7fa8e0073278, xdata=xdata@entry=0x0) at write-behind.c:1105
#16 0x00007fa8e56c5f60 in dht_writev_cbk (frame=frame@entry=0x7fa8e006c0a8, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=prebuf@entry=0x7fa8e00731e0, postbuf=postbuf@entry=0x7fa8e0073278, xdata=0x0) at dht-inode-write.c:140
#17 0x00007fa8e592f31e in afr_writev_unwind (frame=frame@entry=0x7fa8e0000b18, this=this@entry=0x7fa8e000f820) at afr-inode-write.c:234
#18 0x00007fa8e592f4fd in afr_transaction_writev_unwind (frame=0x7fa8e005ed78, this=0x7fa8e000f820) at afr-inode-write.c:248
#19 0x00007fa8e5941be2 in afr_transaction_done (frame=0x7fa8e005ed78, this=0x7fa8e000f820) at afr-transaction.c:367
#20 0x00007fa8e59453d3 in afr_unlock_common_cbk (frame=frame@entry=0x7fa8e005ed78, this=this@entry=0x7fa8e000f820, op_ret=op_ret@entry=0, xdata=<optimized out>, op_errno=<optimized out>, cookie=0x2) at afr-lk-common.c:243
#21 0x00007fa8e59461e8 in afr_unlock_inodelk_cbk (frame=0x7fa8e005ed78, cookie=0x2, this=0x7fa8e000f820, op_ret=0, op_errno=<optimized out>, xdata=<optimized out>) at afr-lk-common.c:281
#22 0x00007fa8e5c18f3f in client4_0_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=<optimized out>) at client-rpc-fops_v2.c:1398
#23 0x00007fa8ebfe0980 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fa8e0053800, pollin=pollin@entry=0x7fa8e006e270) at rpc-clnt.c:755
#24 0x00007fa8ebfe0d53 in rpc_clnt_notify (trans=0x7fa8e0053b20, mydata=0x7fa8e0053830, event=<optimized out>, data=0x7fa8e006e270) at rpc-clnt.c:923
#25 0x00007fa8ebfdcd83 in rpc_transport_notify (this=this@entry=0x7fa8e0053b20, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fa8e006e270) at rpc-transport.c:525
#26 0x00007fa8e6aae2cb in socket_event_poll_in (notify_handled=true, this=0x7fa8e0053b20) at socket.c:2504
#27 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fa8e0053b20, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0) at socket.c:2905
#28 0x00007fa8ec2762d9 in event_dispatch_epoll_handler (event=0x7fa8e6637e90, event_pool=0x18420b0) at event-epoll.c:591
#29 event_dispatch_epoll_worker (data=0x1865220) at event-epoll.c:668
#30 0x00007fa8eb99b182 in start_thread (arg=0x7fa8e6638700) at pthread_create.c:312
#31 0x00007fa8eb6c830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Program received signal SIGSEGV, Segmentation fault.
mem_put (ptr=0x7fa8e006ec40) at mem-pool.c:869
869         GF_ATOMIC_DEC(hdr->pool->active);
(gdb)

so it appears that while stack unwind of the frames following an error response from brick (No space left on device) frame->local includes a memory location that is not allocated via
mem_get but via calloc.

Breakpoint 1, __gf_calloc (nmemb=nmemb@entry=1, size=size@entry=208, type=type@entry=163, typestr=typestr@entry=0x7fa8e544b892 "gf_wb_mt_wb_inode_t") at mem-pool.c:119
119         return (void *)ptr;
$2034 = (void *) 0x7fa8e006ec40
#0  __gf_calloc (nmemb=nmemb@entry=1, size=size@entry=208, type=type@entry=163, typestr=typestr@entry=0x7fa8e544b892 "gf_wb_mt_wb_inode_t") at mem-pool.c:119
#1  0x00007fa8e5441e7b in __wb_inode_create (this=this@entry=0x7fa8e0013470, inode=inode@entry=0x7fa8e00603a8) at write-behind.c:667
#2  0x00007fa8e5442063 in wb_inode_create (this=0x7fa8e0013470, inode=0x7fa8e00603a8) at write-behind.c:708
#3  0x00007fa8e5444fc2 in wb_writev (frame=frame@entry=0x7fa8e005ed78, this=<optimized out>, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at write-behind.c:1851
#4  0x00007fa8e5231c78 in ra_writev (frame=frame@entry=0x7fa8e005d2d8, this=0x7fa8e0014f50, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at read-ahead.c:650
#5  0x00007fa8e501f10a in rda_writev (frame=frame@entry=0x7fa8e0000b18, this=0x7fa8e0016940, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, off=off@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at readdir-ahead.c:696
#6  0x00007fa8e4e0b2e6 in ioc_writev (frame=frame@entry=0x7fa8e006c0a8, this=0x7fa8e00188d0, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at io-cache.c:1233
#7  0x00007fa8e4bfabce in qr_writev (frame=frame@entry=0x7fa8e005e808, this=0x7fa8e001a370, fd=fd@entry=0x7fa8d000cb88, iov=iov@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at quick-read.c:845
#8  0x00007fa8ec2c2240 in default_writev_resume (frame=0x7fa8d000f088, this=0x7fa8e001bde0, fd=0x7fa8d000cb88, vector=0x7fa8d000d0b0, count=1, off=0, flags=32769, iobref=0x7fa8d00160f0, xdata=0x0) at defaults.c:2274
#9  0x00007fa8ec23a66a in call_resume_wind (stub=<optimized out>) at call-stub.c:2040

and contains the inode entry.
magik number is there which troubles the mem_put and core dump occurs while trying to derefenrce the pool pointer.

The destroyed frame is always ra_truncate_cbk winded from ra_ftruncate
e.g.
(gdb) p *frame
$9 = {root = 0x7fa1a0000ec8, parent = 0x7fa1a000c098, frames = {next = 0x7fa1a000a338, prev = 0x7fa1a000a338}, local = 0x0, this = 0x7fa1ac013470, ret = 0x7fa1b2baedf0 <ra_truncate_cbk>, ref_count = 0, lock = {spinlock = 0, mutex = {
      __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
      __align = 0}}, cookie = 0x7fa1a000a328, complete = true, op = GF_FOP_FTRUNCATE, begin = {tv_sec = 164325, tv_nsec = 696128864}, end = {tv_sec = 164325, tv_nsec = 698002711},
  wind_from = 0x7fa1b2bb86ff <__FUNCTION__.14980> "ra_ftruncate", wind_to = 0x7fa1b2bb8470 "(this->children->xlator)->fops->ftruncate", unwind_from = 0x7fa1b9c48e50 <__FUNCTION__.18620> "call_resume_unwind",
  unwind_to = 0x7fa1b2bb7fd7 "ra_truncate_cbk"}
(gdb) q

and the inode ptr is copied to the frame local in the 

2237 wb_ftruncate(call_frame_t *frame, xlator_t *this, fd_t *fd, off_t offset,
2238              dict_t *xdata)
2239 {
2240     wb_inode_t *wb_inode = NULL;
2241     call_stub_t *stub = NULL;
2242     int32_t op_errno = 0;
2243
2244     wb_inode = wb_inode_create(this, fd->inode);
2245     if (!wb_inode) {
2246         op_errno = ENOMEM;
2247         goto unwind;
2248     }
2249
2250     frame->local = wb_inode;
2251
2252     stub = fop_ftruncate_stub(frame, wb_ftruncate_helper, fd, offset, xdata);

as seen in the gdb:

Breakpoint 1, wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=0x7f6060013470, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
    xdata=xdata@entry=0x0) at write-behind.c:2239
2239    {
(gdb) p frame
$1 = (call_frame_t *) 0x7f605c00fc18
(gdb) p frame->local
$2 = (void *) 0x0
(gdb) watch frame->local
Watchpoint 2: frame->local
(gdb) c
Continuing.
Watchpoint 2: frame->local

Old value = (void *) 0x0
New value = (void *) 0x7f606006c540
wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=<optimized out>, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
    xdata=xdata@entry=0x0) at write-behind.c:2252
2252        stub = fop_ftruncate_stub(frame, wb_ftruncate_helper, fd, offset, xdata);
(gdb) bt
#0  wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=<optimized out>, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
    xdata=xdata@entry=0x0) at write-behind.c:2252
#1  0x00007f6066c7dfc6 in ra_ftruncate (frame=frame@entry=0x7f605c00a308, this=0x7f6060014f50, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at read-ahead.c:903
#2  0x00007f6066a69627 in rda_ftruncate (frame=frame@entry=0x7f605c00c148, this=0x7f6060016940, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at readdir-ahead.c:835
#3  0x00007f6066856b78 in ioc_ftruncate (frame=frame@entry=0x7f605c00c9b8, this=0x7f60600188d0, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at io-cache.c:1327
#4  0x00007f60666461d1 in qr_ftruncate (frame=frame@entry=0x7f605c00fe98, this=0x7f606001a370, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at quick-read.c:904
#5  0x00007f606dd0986a in default_ftruncate_resume (frame=0x7f605c0016d8, this=0x7f606001bde0, fd=0x7f605c00cbd8, offset=1, xdata=0x0)
    at defaults.c:2023


How reproducible:
always

Steps to Reproduce:
- setup a test cluster 
Volume Name: patchy
Type: Replicate
Volume ID: 5b79a54d-8207-4d38-a0e2-001adbdd11d8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.1.1:/d/backends/1/brick
Brick2: 127.1.1.2:/d/backends/2/brick
Brick3: 127.1.1.3:/d/backends/3/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

- fill the volume with dummy files e.g.
for i in {1..10}; do dd if=/home/epolakis/prom.img of=/mnt/glusterfs/0/smallfile_IB__N$i bs=4096 count=10000 ; done

- use a test code to truncate a file like:
#include <unistd.h>                                                                
#include <sys/types.h>                                                             
#include <sys/stat.h>                                                              
#include <fcntl.h>                                                                 
#include <stdio.h>                                                                 
#include <stdlib.h>                                                                
#include <memory.h>                                                                
                                                                                   
#define STR_LEN 1000                                                            
                                                                                   
main() {                                                                           
  char *m_string;                                                               
  int  fd;                                                            
  int  ret;                                                                        
  char fn[]="/mnt/glusterfs/0/smallfile_N1";                                   
  struct stat st;                                                                  
                                                                                   
  if ((m_string = (char*) malloc(STR_LEN)) == NULL)                          
    perror("malloc() error");                                                      
  else if ((fd = open(fn, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR)) < 0)
    perror("open error");                                                       
  else {                                                                           
    memset(m_string, '0', STR_LEN);                                          
    if ((ret = write(fd, m_string, STR_LEN)) == -1)             
      perror("write() error");                                                     
    else {                                                                         
      if (ftruncate(fd, 1) != 0)                                      
        perror("ftruncate() error");                                               
      else {                                                                       
        fstat(fd, &st);                                               
        printf("the file has %ld bytes\n", (long) st.st_size);                     
      }                                                                            
    }                                                                              
    close(fd);                                                        
  }                                                                                
  free(m_string);                                                               
}       

Actual results:


Expected results:


Additional info:
If my understanding is correct the inode that is calloc'ed should not released till the forget.

--- Additional comment from evangelos on 2019-09-10 10:18:30 UTC ---

Would something like below enough ?

diff --git a/libglusterfs/src/mem-pool.c b/libglusterfs/src/mem-pool.c
index cff9632..fad6a2c 100644
--- a/libglusterfs/src/mem-pool.c
+++ b/libglusterfs/src/mem-pool.c
@@ -863,6 +863,13 @@ mem_put(void *ptr)
         /* Not one of ours; don't touch it. */
         return;
     }
+
+    if(!hdr->pool_list){
+        gf_msg_callingfn("mem-pool", GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,
+                        "invalid argument hdr->pool_list NULL");
+        return;
+    }
+
     pool_list = hdr->pool_list;
     pt_pool = &pool_list->pools[hdr->power_of_two - POOL_SMALLEST];

this (as far as i understood but please correct me) this allocation is for the inode that shouldn't be released unless kernel send the forget (rm/unlint etc).

--- Additional comment from Worker Ant on 2019-09-19 11:34:42 UTC ---

REVIEW: https://review.gluster.org/23445 (Segmentation fault occurs during truncate) posted (#2) for review on master by Vangelis Polakis

--- Additional comment from Nithya Balachandran on 2019-09-24 05:37:47 UTC ---

Hi,

What version of gluster are you using?

--- Additional comment from evangelos on 2019-09-24 07:54:23 UTC ---

I have seen in this issue in 5.6

--- Additional comment from Nithya Balachandran on 2019-09-25 04:30:33 UTC ---

Thanks. We will take a look and get back to you.

--- Additional comment from Nithya Balachandran on 2019-09-25 13:50:34 UTC ---

Hi,


Thank you for an excellent test. Would you mind if I added it to the regression framework?

I'm afraid the patch provided will not fix the issue here. In most xlators, the local structure is allocated from a mempool which is why the mem_put will usually work. 

However, the wb xlator does not have a defined local struct of its own. Instead, for the truncate and ftruncate fops, it saved the wb_inode (which is allocated using GF_CALLOC) in frame->local. In the normal code path, this is cleared in the wb_truncate_cbk and wb_ftruncate_cbk functions and frame->local is set to NULL before unwinding the fop. 


The test code, however, does a write followed by a conflicting truncate. As the write fails with ENOSPC, __wb_pick_winds determines that there is a conflict, updates the request op_ret and op_errno for the ftruncate request. The wb_ftruncate_cbk is not called and the req is instead unwound in wb_do_winds -> call_unwind_error_keep_stub code path. This does not clear frame->local which causes the mem_put to crash.

--- Additional comment from Nithya Balachandran on 2019-09-25 14:27:30 UTC ---

(In reply to evangelos from comment #3)
> I have seen in this issue in 5.6

The issue also exists in the master branch.

--- Additional comment from Worker Ant on 2019-09-25 14:29:24 UTC ---

REVIEW: https://review.gluster.org/23485 (perf/write-behind: Clear frame->local on conflict error) posted (#1) for review on master by N Balachandran

--- Additional comment from Worker Ant on 2019-09-26 02:28:52 UTC ---

REVIEW: https://review.gluster.org/23485 (perf/write-behind: Clear frame->local on conflict error) merged (#1) on master by N Balachandran

Comment 1 Worker Ant 2019-09-26 03:00:51 UTC
REVIEW: https://review.gluster.org/23487 (perf/write-behind: Clear frame->local on conflict error) posted (#1) for review on release-6 by N Balachandran

Comment 2 Worker Ant 2019-10-04 05:23:12 UTC
REVIEW: https://review.gluster.org/23487 (perf/write-behind: Clear frame->local on conflict error) merged (#2) on release-6 by hari gowtham