Bug 1749369 - Segmentation fault occurs while truncate file
Summary: Segmentation fault occurs while truncate file
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: write-behind
Version: 5
Hardware: All
OS: All
unspecified
urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-09-05 13:21 UTC by evangelos
Modified: 2020-03-12 12:15 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1753592 (view as bug list)
Environment:
Last Closed: 2020-03-12 12:15:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)

Description evangelos 2019-09-05 13:21:22 UTC
Description of problem:
Segmentation fault occurs when bricks are nearly full 100% and in parallel truncate of a file is attempted (No space left on device).
Prerequisite is that performance xlators are activated (read-ahead, write-behind etc) like following

[root@ glusterfs]# gluster --glusterd-sock=/d/backends/1/glusterd/gd.sock volume info

Volume Name: patchy
Type: Replicate
Volume ID: 5b79a54d-8207-4d38-a0e2-001adbdd11d8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.1.1:/d/backends/1/brick
Brick2: 127.1.1.2:/d/backends/2/brick
Brick3: 127.1.1.3:/d/backends/3/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


back trace is like follwoing
#0  mem_put (ptr=0x7fa8e006ec40) at mem-pool.c:847
#1  0x00007fa8e94d3fea in FRAME_DESTROY (frame=0x7fa8d0015698) at ../../../../libglusterfs/src/stack.h:173
#2  STACK_DESTROY (stack=0x7fa8d000b238) at ../../../../libglusterfs/src/stack.h:193
#3  fuse_truncate_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=28, prebuf=<optimized out>, postbuf=0x7fa8e6637530, xdata=0x0) at fuse-bridge.c:813
#4  0x00007fa8e45b5f7d in io_stats_ftruncate_cbk (frame=0x7fa8d0009508, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at io-stats.c:2425
#5  0x00007fa8e47d6feb in mdc_ftruncate_cbk (frame=0x7fa8d0015e58, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at md-cache.c:1477
#6  0x00007fa8ec29e668 in default_ftruncate_cbk (frame=0x7fa8d000a558, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at defaults.c:1375
#7  0x00007fa8e4bfc539 in qr_ftruncate_cbk (frame=0x7fa8d000d288, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8e6637530, xdata=0x0) at quick-read.c:890
#8  0x00007fa8e4e06c48 in ioc_ftruncate_cbk (frame=frame@entry=0x7fa8d000f1e8, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=28, prebuf=prebuf@entry=0x7fa8d001cac0, postbuf=postbuf@entry=0x7fa8e6637530, xdata=xdata@entry=0x0) at io-cache.c:1277
#9  0x00007fa8e5022f2a in rda_ftruncate_cbk (frame=0x7fa8d000f088, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8d001cb58, xdata=0x0) at readdir-ahead.c:826
#10 0x00007fa8e522df58 in ra_truncate_cbk (frame=0x7fa8d0015588, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=0x7fa8d001cac0, postbuf=0x7fa8d001cb58, xdata=0x0) at read-ahead.c:668
#11 0x00007fa8ec22bd43 in call_resume_unwind (stub=stub@entry=0x7fa8d001bf88) at call-stub.c:2348
#12 0x00007fa8ec23a989 in call_unwind_error_keep_stub (stub=0x7fa8d001bf88, op_ret=op_ret@entry=-1, op_errno=28) at call-stub.c:2541
#13 0x00007fa8e544482c in wb_do_winds (wb_inode=wb_inode@entry=0x7fa8e006ec40, tasks=tasks@entry=0x7fa8e6637750) at write-behind.c:1735
#14 0x00007fa8e54448db in wb_process_queue (wb_inode=wb_inode@entry=0x7fa8e006ec40) at write-behind.c:1778
#15 0x00007fa8e5444a5f in wb_fulfill_cbk (frame=frame@entry=0x7fa8e005e808, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=28, prebuf=prebuf@entry=0x7fa8e00731e0, postbuf=postbuf@entry=0x7fa8e0073278, xdata=xdata@entry=0x0) at write-behind.c:1105
#16 0x00007fa8e56c5f60 in dht_writev_cbk (frame=frame@entry=0x7fa8e006c0a8, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=28, prebuf=prebuf@entry=0x7fa8e00731e0, postbuf=postbuf@entry=0x7fa8e0073278, xdata=0x0) at dht-inode-write.c:140
#17 0x00007fa8e592f31e in afr_writev_unwind (frame=frame@entry=0x7fa8e0000b18, this=this@entry=0x7fa8e000f820) at afr-inode-write.c:234
#18 0x00007fa8e592f4fd in afr_transaction_writev_unwind (frame=0x7fa8e005ed78, this=0x7fa8e000f820) at afr-inode-write.c:248
#19 0x00007fa8e5941be2 in afr_transaction_done (frame=0x7fa8e005ed78, this=0x7fa8e000f820) at afr-transaction.c:367
#20 0x00007fa8e59453d3 in afr_unlock_common_cbk (frame=frame@entry=0x7fa8e005ed78, this=this@entry=0x7fa8e000f820, op_ret=op_ret@entry=0, xdata=<optimized out>, op_errno=<optimized out>, cookie=0x2) at afr-lk-common.c:243
#21 0x00007fa8e59461e8 in afr_unlock_inodelk_cbk (frame=0x7fa8e005ed78, cookie=0x2, this=0x7fa8e000f820, op_ret=0, op_errno=<optimized out>, xdata=<optimized out>) at afr-lk-common.c:281
#22 0x00007fa8e5c18f3f in client4_0_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=<optimized out>) at client-rpc-fops_v2.c:1398
#23 0x00007fa8ebfe0980 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fa8e0053800, pollin=pollin@entry=0x7fa8e006e270) at rpc-clnt.c:755
#24 0x00007fa8ebfe0d53 in rpc_clnt_notify (trans=0x7fa8e0053b20, mydata=0x7fa8e0053830, event=<optimized out>, data=0x7fa8e006e270) at rpc-clnt.c:923
#25 0x00007fa8ebfdcd83 in rpc_transport_notify (this=this@entry=0x7fa8e0053b20, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fa8e006e270) at rpc-transport.c:525
#26 0x00007fa8e6aae2cb in socket_event_poll_in (notify_handled=true, this=0x7fa8e0053b20) at socket.c:2504
#27 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fa8e0053b20, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0) at socket.c:2905
#28 0x00007fa8ec2762d9 in event_dispatch_epoll_handler (event=0x7fa8e6637e90, event_pool=0x18420b0) at event-epoll.c:591
#29 event_dispatch_epoll_worker (data=0x1865220) at event-epoll.c:668
#30 0x00007fa8eb99b182 in start_thread (arg=0x7fa8e6638700) at pthread_create.c:312
#31 0x00007fa8eb6c830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Program received signal SIGSEGV, Segmentation fault.
mem_put (ptr=0x7fa8e006ec40) at mem-pool.c:869
869         GF_ATOMIC_DEC(hdr->pool->active);
(gdb)

so it appears that while stack unwind of the frames following an error response from brick (No space left on device) frame->local includes a memory location that is not allocated via
mem_get but via calloc.

Breakpoint 1, __gf_calloc (nmemb=nmemb@entry=1, size=size@entry=208, type=type@entry=163, typestr=typestr@entry=0x7fa8e544b892 "gf_wb_mt_wb_inode_t") at mem-pool.c:119
119         return (void *)ptr;
$2034 = (void *) 0x7fa8e006ec40
#0  __gf_calloc (nmemb=nmemb@entry=1, size=size@entry=208, type=type@entry=163, typestr=typestr@entry=0x7fa8e544b892 "gf_wb_mt_wb_inode_t") at mem-pool.c:119
#1  0x00007fa8e5441e7b in __wb_inode_create (this=this@entry=0x7fa8e0013470, inode=inode@entry=0x7fa8e00603a8) at write-behind.c:667
#2  0x00007fa8e5442063 in wb_inode_create (this=0x7fa8e0013470, inode=0x7fa8e00603a8) at write-behind.c:708
#3  0x00007fa8e5444fc2 in wb_writev (frame=frame@entry=0x7fa8e005ed78, this=<optimized out>, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at write-behind.c:1851
#4  0x00007fa8e5231c78 in ra_writev (frame=frame@entry=0x7fa8e005d2d8, this=0x7fa8e0014f50, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at read-ahead.c:650
#5  0x00007fa8e501f10a in rda_writev (frame=frame@entry=0x7fa8e0000b18, this=0x7fa8e0016940, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, off=off@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at readdir-ahead.c:696
#6  0x00007fa8e4e0b2e6 in ioc_writev (frame=frame@entry=0x7fa8e006c0a8, this=0x7fa8e00188d0, fd=fd@entry=0x7fa8d000cb88, vector=vector@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at io-cache.c:1233
#7  0x00007fa8e4bfabce in qr_writev (frame=frame@entry=0x7fa8e005e808, this=0x7fa8e001a370, fd=fd@entry=0x7fa8d000cb88, iov=iov@entry=0x7fa8d000d0b0, count=count@entry=1, offset=offset@entry=0, flags=flags@entry=32769, iobref=iobref@entry=0x7fa8d00160f0, xdata=xdata@entry=0x0) at quick-read.c:845
#8  0x00007fa8ec2c2240 in default_writev_resume (frame=0x7fa8d000f088, this=0x7fa8e001bde0, fd=0x7fa8d000cb88, vector=0x7fa8d000d0b0, count=1, off=0, flags=32769, iobref=0x7fa8d00160f0, xdata=0x0) at defaults.c:2274
#9  0x00007fa8ec23a66a in call_resume_wind (stub=<optimized out>) at call-stub.c:2040

and contains the inode entry.
magik number is there which troubles the mem_put and core dump occurs while trying to derefenrce the pool pointer.

The destroyed frame is always ra_truncate_cbk winded from ra_ftruncate
e.g.
(gdb) p *frame
$9 = {root = 0x7fa1a0000ec8, parent = 0x7fa1a000c098, frames = {next = 0x7fa1a000a338, prev = 0x7fa1a000a338}, local = 0x0, this = 0x7fa1ac013470, ret = 0x7fa1b2baedf0 <ra_truncate_cbk>, ref_count = 0, lock = {spinlock = 0, mutex = {
      __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
      __align = 0}}, cookie = 0x7fa1a000a328, complete = true, op = GF_FOP_FTRUNCATE, begin = {tv_sec = 164325, tv_nsec = 696128864}, end = {tv_sec = 164325, tv_nsec = 698002711},
  wind_from = 0x7fa1b2bb86ff <__FUNCTION__.14980> "ra_ftruncate", wind_to = 0x7fa1b2bb8470 "(this->children->xlator)->fops->ftruncate", unwind_from = 0x7fa1b9c48e50 <__FUNCTION__.18620> "call_resume_unwind",
  unwind_to = 0x7fa1b2bb7fd7 "ra_truncate_cbk"}
(gdb) q

and the inode ptr is copied to the frame local in the 

2237 wb_ftruncate(call_frame_t *frame, xlator_t *this, fd_t *fd, off_t offset,
2238              dict_t *xdata)
2239 {
2240     wb_inode_t *wb_inode = NULL;
2241     call_stub_t *stub = NULL;
2242     int32_t op_errno = 0;
2243
2244     wb_inode = wb_inode_create(this, fd->inode);
2245     if (!wb_inode) {
2246         op_errno = ENOMEM;
2247         goto unwind;
2248     }
2249
2250     frame->local = wb_inode;
2251
2252     stub = fop_ftruncate_stub(frame, wb_ftruncate_helper, fd, offset, xdata);

as seen in the gdb:

Breakpoint 1, wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=0x7f6060013470, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
    xdata=xdata@entry=0x0) at write-behind.c:2239
2239    {
(gdb) p frame
$1 = (call_frame_t *) 0x7f605c00fc18
(gdb) p frame->local
$2 = (void *) 0x0
(gdb) watch frame->local
Watchpoint 2: frame->local
(gdb) c
Continuing.
Watchpoint 2: frame->local

Old value = (void *) 0x0
New value = (void *) 0x7f606006c540
wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=<optimized out>, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
    xdata=xdata@entry=0x0) at write-behind.c:2252
2252        stub = fop_ftruncate_stub(frame, wb_ftruncate_helper, fd, offset, xdata);
(gdb) bt
#0  wb_ftruncate (frame=frame@entry=0x7f605c00fc18, this=<optimized out>, fd=fd@entry=0x7f605c00cbd8, offset=offset@entry=1,
    xdata=xdata@entry=0x0) at write-behind.c:2252
#1  0x00007f6066c7dfc6 in ra_ftruncate (frame=frame@entry=0x7f605c00a308, this=0x7f6060014f50, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at read-ahead.c:903
#2  0x00007f6066a69627 in rda_ftruncate (frame=frame@entry=0x7f605c00c148, this=0x7f6060016940, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at readdir-ahead.c:835
#3  0x00007f6066856b78 in ioc_ftruncate (frame=frame@entry=0x7f605c00c9b8, this=0x7f60600188d0, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at io-cache.c:1327
#4  0x00007f60666461d1 in qr_ftruncate (frame=frame@entry=0x7f605c00fe98, this=0x7f606001a370, fd=fd@entry=0x7f605c00cbd8,
    offset=offset@entry=1, xdata=xdata@entry=0x0) at quick-read.c:904
#5  0x00007f606dd0986a in default_ftruncate_resume (frame=0x7f605c0016d8, this=0x7f606001bde0, fd=0x7f605c00cbd8, offset=1, xdata=0x0)
    at defaults.c:2023


How reproducible:
always

Steps to Reproduce:
- setup a test cluster 
Volume Name: patchy
Type: Replicate
Volume ID: 5b79a54d-8207-4d38-a0e2-001adbdd11d8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 127.1.1.1:/d/backends/1/brick
Brick2: 127.1.1.2:/d/backends/2/brick
Brick3: 127.1.1.3:/d/backends/3/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

- fill the volume with dummy files e.g.
for i in {1..10}; do dd if=/home/epolakis/prom.img of=/mnt/glusterfs/0/smallfile_IB__N$i bs=4096 count=10000 ; done

- use a test code to truncate a file like:
#include <unistd.h>                                                                
#include <sys/types.h>                                                             
#include <sys/stat.h>                                                              
#include <fcntl.h>                                                                 
#include <stdio.h>                                                                 
#include <stdlib.h>                                                                
#include <memory.h>                                                                
                                                                                   
#define STR_LEN 1000                                                            
                                                                                   
main() {                                                                           
  char *m_string;                                                               
  int  fd;                                                            
  int  ret;                                                                        
  char fn[]="/mnt/glusterfs/0/smallfile_N1";                                   
  struct stat st;                                                                  
                                                                                   
  if ((m_string = (char*) malloc(STR_LEN)) == NULL)                          
    perror("malloc() error");                                                      
  else if ((fd = open(fn, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR)) < 0)
    perror("open error");                                                       
  else {                                                                           
    memset(m_string, '0', STR_LEN);                                          
    if ((ret = write(fd, m_string, STR_LEN)) == -1)             
      perror("write() error");                                                     
    else {                                                                         
      if (ftruncate(fd, 1) != 0)                                      
        perror("ftruncate() error");                                               
      else {                                                                       
        fstat(fd, &st);                                               
        printf("the file has %ld bytes\n", (long) st.st_size);                     
      }                                                                            
    }                                                                              
    close(fd);                                                        
  }                                                                                
  free(m_string);                                                               
}       

Actual results:


Expected results:


Additional info:
If my understanding is correct the inode that is calloc'ed should not released till the forget.

Comment 1 evangelos 2019-09-10 10:18:30 UTC
Would something like below enough ?

diff --git a/libglusterfs/src/mem-pool.c b/libglusterfs/src/mem-pool.c
index cff9632..fad6a2c 100644
--- a/libglusterfs/src/mem-pool.c
+++ b/libglusterfs/src/mem-pool.c
@@ -863,6 +863,13 @@ mem_put(void *ptr)
         /* Not one of ours; don't touch it. */
         return;
     }
+
+    if(!hdr->pool_list){
+        gf_msg_callingfn("mem-pool", GF_LOG_ERROR, EINVAL, LG_MSG_INVALID_ARG,
+                        "invalid argument hdr->pool_list NULL");
+        return;
+    }
+
     pool_list = hdr->pool_list;
     pt_pool = &pool_list->pools[hdr->power_of_two - POOL_SMALLEST];

this (as far as i understood but please correct me) this allocation is for the inode that shouldn't be released unless kernel send the forget (rm/unlint etc).

Comment 2 Worker Ant 2019-09-19 11:34:39 UTC
REVISION POSTED: https://review.gluster.org/23445 (Segmentation fault occurs during truncate) posted (#2) for review on master by Vangelis Polakis

Comment 3 Worker Ant 2020-03-12 12:15:04 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/850, and will be tracked there from now on. Visit GitHub issues URL for further details


Note You need to log in before you can comment on or make changes to this bug.