1390843 – write-behind: flush stuck by former failed write

Bug 1390843 - write-behind: flush stuck by former failed write

Summary: write-behind: flush stuck by former failed write

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	write-behind
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Raghavendra G
QA Contact:	Nag Pavan Chilakam
Docs Contact:
URL:
Whiteboard:
Depends On:	1372211
Blocks:	1351528 1390837 1390838 1390840
TreeView+	depends on / blocked

Reported:	2016-11-02 04:42 UTC by Raghavendra G
Modified:	2017-03-23 06:15 UTC (History)
CC List:	8 users (show)
Fixed In Version:	glusterfs-3.8.4-6
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1372211
Environment:
Last Closed:	2017-03-23 06:15:58 UTC
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:0486	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:18:45 UTC

Description Raghavendra G 2016-11-02 04:42:59 UTC

+++ This bug was initially created as a clone of Bug #1372211 +++

Description of problem:
run ltp test, when there is some error in WRITE op, the running process is stuck, the stack is:
[root@dynode1 ~]# cat /proc/16449/stack 
[<ffffffffa0436241>] wait_answer_interruptible+0x91/0xe0 [fuse]
[<ffffffffa0436653>] __fuse_request_send+0x253/0x2c0 [fuse]
[<ffffffffa04366d2>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa043f77f>] fuse_flush+0xff/0x150 [fuse]
[<ffffffff811dc8a4>] filp_close+0x34/0x80
[<ffffffff811fce68>] put_files_struct+0x88/0xe0
[<ffffffff811fcf67>] exit_files+0x47/0x50
[<ffffffff810815d6>] do_exit+0x2b6/0xa60
[<ffffffff81081dff>] do_group_exit+0x3f/0xa0
[<ffffffff81081e74>] SyS_exit_group+0x14/0x20
[<ffffffff81646889>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

glusterdump logs:
[xlator.performance.write-behind.wb_inode]
path=/ltp-F9eG0ZSOME/rw-buffered-16436
inode=0x7fdbe8039b9c
window_conf=1048576
window_current=249856
transit-size=0
dontsync=0

[.WRITE]
request-ptr=0x7fdbe8020200
refcount=1
wound=no
generation-number=4
req->op_ret=-1
req->op_errno=116
sync-attempts=3
sync-in-progress=no
size=131072
offset=1220608
lied=-1
append=0
fulfilled=0
go=0

[.WRITE]
request-ptr=0x7fdbe8068c30
refcount=1
wound=no
generation-number=5
req->op_ret=-1
req->op_errno=116
sync-attempts=2
sync-in-progress=no
size=118784
offset=1351680
lied=-1
append=0
fulfilled=0
go=0

[.FLUSH]
request-ptr=0x7fdbe8021cd0
refcount=1
wound=no
generation-number=6
req->op_ret=0
req->op_errno=0
sync-attempts=0

gdb detail about above 3 requests:
(gdb) print *((wb_request_t *)0x7fdbe8021cd0)
$2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next = 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0, 
    prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev = 0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev = 0x7fdbe8021d10}, wip = {
    next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub = 0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret = 0, op_errno = 0, 
  refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner = {len = 8, data = "W\322T\f\271\367y$", '\000' <repeats 1015 times>}, 
  iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering = {size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0, 
    go = 0}}
(gdb) print *((wb_request_t *)0x7fdbe8020200)
$3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next = 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50, 
    prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev = 0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev = 0x7fdbe8020240}, wip = {
    next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub = 0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0, op_ret = -1, 
  op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, 
  iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3, ordering = {size = 131072, off = 1220608, append = 0, tempted = -1, 
    lied = -1, fulfilled = 0, go = 0}}
(gdb) print *((wb_request_t *)0x7fdbe8068c30)
$4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next = 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628, 
    prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev = 0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev = 0x7fdbe8068c70}, wip = {
    next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub = 0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0, op_ret = -1, 
  op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, 
  iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2, ordering = {size = 118784, off = 1351680, append = 0, tempted = -1, 
    lied = -1, fulfilled = 0, go = 0}}

--- Additional comment from Worker Ant on 2016-09-06 02:29:21 EDT ---

REVIEW: http://review.gluster.org/15380 (performance/write-behind: fix flush stuck by former failed writes) posted (#2) for review on master by Oleksandr Natalenko (oleksandr)

--- Additional comment from Worker Ant on 2016-11-01 06:06:10 EDT ---

REVIEW: http://review.gluster.org/15380 (performance/write-behind: fix flush stuck by former failed writes) posted (#3) for review on master by Ryan Ding (ryan.ding)

--- Additional comment from Worker Ant on 2016-11-02 00:36:39 EDT ---

COMMIT: http://review.gluster.org/15380 committed in master by Raghavendra G (rgowdapp) 
------
commit 9340b3c7a6c8556d6f1d4046de0dbd1946a64963
Author: Ryan Ding <ryan.ding>
Date:   Thu Sep 1 15:40:35 2016 +0800

    performance/write-behind: fix flush stuck by former failed writes
    
    the issue is happened in this case:
    assume a file is opened with fd1 and fd2.
    1. some WRITE opto fd1 got error, they were add back to 'todo' queue
       because of those error.
    2. fd2 closed, a FLUSH op is send to write-behind.
    3. FLUSH can not be unwind because it's not a legal waiter for those
       failed write(as func __wb_request_waiting_on() say). and those failed
       WRITE also can not be ended if fd1 is not closed. fd2 stuck in close
       syscall.
    
    to resolve this issue, we can change the way we determine 2 requests is
    'conflict': flush/fsync is not conflict with those write that is not
    belonged to them. so __wb_pick_winds() can wind the FLUSH op.
    
    below is some information when the stuck issue happen:
    glusterdump logs:
    [xlator.performance.write-behind.wb_inode]
    path=/ltp-F9eG0ZSOME/rw-buffered-16436
    inode=0x7fdbe8039b9c
    window_conf=1048576
    window_current=249856
    transit-size=0
    dontsync=0
    
    [.WRITE]
    request-ptr=0x7fdbe8020200
    refcount=1
    wound=no
    generation-number=4
    req->op_ret=-1
    req->op_errno=116
    sync-attempts=3
    sync-in-progress=no
    size=131072
    offset=1220608
    lied=-1
    append=0
    fulfilled=0
    go=0
    
    [.WRITE]
    request-ptr=0x7fdbe8068c30
    refcount=1
    wound=no
    generation-number=5
    req->op_ret=-1
    req->op_errno=116
    sync-attempts=2
    sync-in-progress=no
    size=118784
    offset=1351680
    lied=-1
    append=0
    fulfilled=0
    go=0
    
    [.FLUSH]
    request-ptr=0x7fdbe8021cd0
    refcount=1
    wound=no
    generation-number=6
    req->op_ret=0
    req->op_errno=0
    sync-attempts=0
    
    gdb detail about above 3 requests:
    (gdb) print *((wb_request_t *)0x7fdbe8021cd0)
    $2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next
    = 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0,
        prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev =
    0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev =
    0x7fdbe8021d10}, wip = {
        next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub =
    0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret =
    0, op_errno = 0,
      refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner
    = {len = 8, data = "W\322T\f\271\367y$", '\000' <repeats 1015 times>},
      iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering =
    {size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0,
        go = 0}}
    (gdb) print *((wb_request_t *)0x7fdbe8020200)
    $3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next
    = 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50,
        prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev =
    0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev =
    0x7fdbe8020240}, wip = {
        next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub =
    0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0,
    op_ret = -1,
      op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop =
    GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>},
      iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3,
    ordering = {size = 131072, off = 1220608, append = 0, tempted = -1,
        lied = -1, fulfilled = 0, go = 0}}
    (gdb) print *((wb_request_t *)0x7fdbe8068c30)
    $4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next
    = 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628,
        prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev =
    0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev =
    0x7fdbe8068c70}, wip = {
        next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub =
    0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0,
    op_ret = -1,
      op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop =
    GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>},
      iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2,
    ordering = {size = 118784, off = 1351680, append = 0, tempted = -1,
        lied = -1, fulfilled = 0, go = 0}}
    
    you can see they are all on 'todo' queue, and FLUSH op fd is not the
    same WRITE op fd.
    
    Change-Id: Id687f9cd3b9f281e1a97c83f1ce981ede272b8ab
    BUG: 1372211
    Signed-off-by: Ryan Ding <ryan.ding>
    Reviewed-on: http://review.gluster.org/15380
    Tested-by: Raghavendra G <rgowdapp>
    Reviewed-by: Raghavendra G <rgowdapp>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>

Comment 2 Atin Mukherjee 2016-11-07 04:39:43 UTC

upstream mainline : http://review.gluster.org/15380 (merged)

Comment 5 Atin Mukherjee 2016-11-16 13:22:42 UTC

upstream mainline :http://review.gluster.org/15380
upstream 3.8 : http://review.gluster.org/#/c/15762
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/90346

Comment 11 errata-xmlrpc 2017-03-23 06:15:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

Note You need to log in before you can comment on or make changes to this bug.