Bug 1399476 - I/O hung while doing in-service update from 3.1.3 to 3.2 [NEEDINFO]
Summary: I/O hung while doing in-service update from 3.1.3 to 3.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: write-behind
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Raghavendra G
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1351528
 
Reported: 2016-11-29 06:41 UTC by Byreddy
Modified: 2017-03-23 05:52 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.8.4-7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 05:52:00 UTC
amukherj: needinfo? (asrivast)




Links
System ID: Red Hat Product Errata RHSA-2017:0486
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update
Last Updated: 2017-03-23 09:18:45 UTC

Description Byreddy 2016-11-29 06:41:50 UTC
Description of problem:
=======================
I/O hung while doing an in-service update from RHGS 3.1.3 to 3.2.
At the time of the hang the server was already on RHGS 3.2 and the client was still on 3.1.3 bits.


Some dev debug details on the live setup:
=========================================

[root@dhcp gluster]# cat /proc/6370/stack
[<ffffffff811a490b>] pipe_wait+0x5b/0x80
[<ffffffff811a4caa>] pipe_write+0x37a/0x6b0
[<ffffffff8119996a>] do_sync_write+0xfa/0x140
[<ffffffff81199c68>] vfs_write+0xb8/0x1a0
[<ffffffff8119a7a1>] sys_write+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@dhcp gluster]# cat /proc/6369/stack
[<ffffffffa02592ad>] __fuse_request_send+0xed/0x2b0 [fuse]
[<ffffffffa0259482>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa0260176>] fuse_flush+0x106/0x140 [fuse]
[<ffffffff8119683c>] filp_close+0x3c/0x90
[<ffffffff81196935>] sys_close+0xa5/0x100
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@dhcp gluster]# lsof /mnt
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF                 NODE NAME
bash    6240 root  cwd    DIR   0,20     4096                    1 /mnt
tar     6369 root  cwd    DIR   0,20     4096                    1 /mnt
xz      6370 root  cwd    DIR   0,20     4096                    1 /mnt
xz      6370 root    0r   REG   0,20 91976832 11426561208144685685 /mnt/linux-4.8.11.tar.xz (deleted)
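For reference, a minimal shell sketch of how the same diagnostics can be gathered in one go -- list the processes holding the mount open and dump each one's kernel stack to spot tasks stuck in fuse_request_send/pipe_wait. The helper name, output path, and default mount point are assumptions for illustration only:

#!/bin/bash
# collect-hang-info.sh -- hypothetical helper, not part of glusterfs
MNT=${1:-/mnt}                 # gluster FUSE mount point (assumed, as in this report)
OUT=/tmp/hang-debug.$$         # placeholder output file
{
  echo "== lsof ${MNT} =="
  lsof "${MNT}"                      # which processes still reference the mount
  for pid in $(lsof -t "${MNT}"); do # -t prints only the PIDs
    echo "== PID ${pid} ($(ps -p ${pid} -o comm=)) kernel stack =="
    cat "/proc/${pid}/stack"         # where each task is blocked in the kernel
  done
} > "${OUT}" 2>&1
echo "diagnostics written to ${OUT}"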



Version-Release number of selected component (if applicable):
=============================================================
Server: glusterfs-3.8.4-5.el6rhs.x86_64
Client: glusterfs-3.7.9-12.el6.x86_64


How reproducible:
=================
One time


Steps to Reproduce:
====================
1. Do an in-service update from 3.1.3 to 3.2 while client I/O is running (a rough per-node sketch of the update sequence follows below).
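For context, a rough per-node sketch of the in-service (rolling) update sequence assumed here. This is illustrative only; the exact procedure is in the RHGS update documentation, and <volname> is a placeholder:

# On each server node, one at a time, while client I/O keeps running:
service glusterd stop                  # stop the management daemon (el6, SysV init)
pkill glusterfsd; pkill glusterfs      # stop brick and auxiliary gluster processes
yum update glusterfs-server            # pull in the 3.2 (glusterfs-3.8.4-x) packages
service glusterd start                 # bricks come back up on the new version
gluster volume heal <volname> info     # wait for pending heals to drain before the next node
# Clients are updated only after all servers are done; in this bug the hang was
# seen while the client was still on 3.1.3 and the servers were already on 3.2.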

Actual results:
===============
I/O hung.

Expected results:
=================
I/O should not hang.


Additional info:

Comment 5 Byreddy 2016-12-01 05:45:13 UTC
Some more details:

[root@ ~]# lsof /mnt/
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF                 NODE NAME
bash    32183 root  cwd    DIR   0,20     4096                    1 /mnt
bash    32369 root  cwd    DIR   0,20     4096 10728560248349618169 /mnt/tmp
tar     32403 root  cwd    DIR   0,20     4096 10728560248349618169 /mnt/tmp
xz      32404 root  cwd    DIR   0,20     4096 10728560248349618169 /mnt/tmp
xz      32404 root    0r   REG   0,20 91976832 10482997327777766662 /mnt/tmp/linux-4.8.11.tar.xz

[root@ ~]# cat /proc/32404/stack 
[<ffffffff811a490b>] pipe_wait+0x5b/0x80
[<ffffffff811a4caa>] pipe_write+0x37a/0x6b0
[<ffffffff8119996a>] do_sync_write+0xfa/0x140
[<ffffffff81199c68>] vfs_write+0xb8/0x1a0
[<ffffffff8119a7a1>] sys_write+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@ ~]# 
[root@ ~]# cat /proc/32403/stack 
[<ffffffffa0259181>] wait_answer_interruptible+0x81/0xc0 [fuse]
[<ffffffffa025939b>] __fuse_request_send+0x1db/0x2b0 [fuse]
[<ffffffffa0259482>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa0260176>] fuse_flush+0x106/0x140 [fuse]
[<ffffffff8119683c>] filp_close+0x3c/0x90
[<ffffffff81196935>] sys_close+0xa5/0x100
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@ ~]#

Comment 6 Atin Mukherjee 2016-12-01 10:05:52 UTC
Given this has been hit one more time, it needs to be fixed considering the severity, as it impacts the upgrade path. Providing dev_ack.

Comment 11 Karthik U S 2016-12-29 14:23:40 UTC
Based on my observations and testing, this is a bug in 3.1.3: a fix that is missing in 3.1.3 is already present in 3.2.0. [1] is the link to the upstream fix and [2] to the downstream one; the patch explains the scenario very well. I applied [1] on 3.1.3 and tried to reproduce the issue with single and multiple clients while upgrading the servers to the glusterfs-3.8.4-7.el6rhs.x86_64 build, and did not hit the issue in either case.
[3] is the link to the custom build I used for these reproduction attempts, which includes [1]. Untarring the Linux kernel took ~30 minutes in both cases. Could you please try to reproduce the issue with [3] and confirm whether it is still hit (a sketch of the workload follows the links below)?

[1] http://review.gluster.org/#/c/15579/
[2] https://code.engineering.redhat.com/gerrit/#/c/91956/
[3] https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12279228
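
A minimal sketch of the client-side workload for re-checking this with [3], using the same tar/xz pattern seen in the original lsof output; the server and volume names are placeholders:

# On the 3.1.3 client, start the kernel-untar workload, then roll the servers
# to the [3] build one node at a time as in the steps to reproduce:
mount -t glusterfs <server>:/<volname> /mnt
cd /mnt
tar -xJf linux-4.8.11.tar.xz      # tar/xz pipe -- the spot where the hang showed up
# If the hang reappears, the tar/xz PIDs will be stuck in fuse_flush/pipe_wait
# in /proc/<pid>/stack, as in the description and comment 5.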

Comment 15 Byreddy 2017-01-04 05:49:08 UTC
Verified this issue multiple times, updating from 3.1.3 bits to 3.2.0 (glusterfs-3.8.4-10) and
from glusterfs-3.8.4-7 to glusterfs-3.8.4-10.

In both cases the update worked well and I did not see the reported issue.

Moving to verified state.
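
For completeness, the kind of post-update checks this verification relies on (a hedged sketch; <volname> is a placeholder):

# On each server after the rolling update:
rpm -q glusterfs glusterfs-server        # confirm the glusterfs-3.8.4-10 packages are installed
gluster --version                        # running binary matches the updated package
gluster volume status <volname>          # all bricks online
# On the client: the untar workload completes instead of wedging in fuse_flush.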

Comment 17 errata-xmlrpc 2017-03-23 05:52:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

