Bug 1399476
| Summary: | I/O hung while doing in-service update from 3.1.3 to 3.2 | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama> |
| Component: | write-behind | Assignee: | Raghavendra G <rgowdapp> |
| Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.2 | CC: | amukherj, asrivast, ksubrahm, rcyriac, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.2.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | glusterfs-3.8.4-7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-03-23 05:52:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1351528 | ||
|
Description
Byreddy
2016-11-29 06:41:50 UTC
Some more details:

```
[root@ ~]# lsof /mnt/
COMMAND   PID USER   FD   TYPE DEVICE  SIZE/OFF                 NODE NAME
bash    32183 root  cwd    DIR   0,20      4096                    1 /mnt
bash    32369 root  cwd    DIR   0,20      4096 10728560248349618169 /mnt/tmp
tar     32403 root  cwd    DIR   0,20      4096 10728560248349618169 /mnt/tmp
xz      32404 root  cwd    DIR   0,20      4096 10728560248349618169 /mnt/tmp
xz      32404 root   0r    REG   0,20  91976832 10482997327777766662 /mnt/tmp/linux-4.8.11.tar.xz

[root@ ~]# cat /proc/32404/stack
[<ffffffff811a490b>] pipe_wait+0x5b/0x80
[<ffffffff811a4caa>] pipe_write+0x37a/0x6b0
[<ffffffff8119996a>] do_sync_write+0xfa/0x140
[<ffffffff81199c68>] vfs_write+0xb8/0x1a0
[<ffffffff8119a7a1>] sys_write+0x51/0xb0
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[root@ ~]# cat /proc/32403/stack
[<ffffffffa0259181>] wait_answer_interruptible+0x81/0xc0 [fuse]
[<ffffffffa025939b>] __fuse_request_send+0x1db/0x2b0 [fuse]
[<ffffffffa0259482>] fuse_request_send+0x12/0x20 [fuse]
[<ffffffffa0260176>] fuse_flush+0x106/0x140 [fuse]
[<ffffffff8119683c>] filp_close+0x3c/0x90
[<ffffffff81196935>] sys_close+0xa5/0x100
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
```

Given this has been hit one more time, it needs to be fixed: the severity is high because it impacts the upgrade path. Providing dev_ack.

Based on my observations and testing, this is a bug in 3.1.3. A fix that landed in 3.2.0 is missing from 3.1.3: [1] is the link to the upstream fix and [2] to the downstream one; the patch explains the scenario very well. I applied [1] on 3.1.3, tried to reproduce the issue with single and multiple clients, and upgraded the servers to the glusterfs-3.8.4-7.el6rhs.x86_64 build. I did not hit the issue in either case. [3] is the link to the custom build I used to try to reproduce the issue, which includes [1]. Untarring the Linux kernel took ~30 minutes in both cases.
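The second stack trace shows tar blocked in `fuse_flush`: on a FUSE mount, `close()` sends a FLUSH request to the userspace filesystem daemon and waits for its reply, so if the daemon (here the glusterfs client process, e.g. mid graph switch during the in-service update) never answers, the caller sits in `wait_answer_interruptible` forever. A minimal toy model of that blocking reply-wait (hypothetical class and method names, not GlusterFS or libfuse code):

```python
import threading

class FakeFuseChannel:
    """Toy model: close() sends FLUSH and blocks until the daemon replies."""

    def __init__(self):
        self._reply = threading.Event()

    def flush(self, timeout=None):
        # Kernel side: models __fuse_request_send() parking the caller
        # until the userspace daemon answers the FLUSH request.
        return self._reply.wait(timeout)

    def daemon_reply(self):
        # Userspace side: models the filesystem daemon answering FLUSH.
        self._reply.set()

ch = FakeFuseChannel()
# No reply from the daemon: the waiter times out (a real close() has no
# timeout and simply hangs, as in the stack trace above).
assert ch.flush(timeout=0.1) is False
# Once the daemon replies, the blocked close() returns promptly.
ch.daemon_reply()
assert ch.flush(timeout=0.1) is True
```

This only illustrates the waiting mechanism seen in the kernel stack; the actual root cause addressed by the patch lives in the write-behind translator on the Gluster side.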
Could you please try to reproduce the issue with [3] and confirm whether we hit this again or not.

[1] http://review.gluster.org/#/c/15579/
[2] https://code.engineering.redhat.com/gerrit/#/c/91956/
[3] https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12279228

Verified this issue multiple times, updating from 3.1.3 bits to 3.2.0 (glusterfs-3.8.4-10) and from glusterfs-3.8.4-7 to glusterfs-3.8.4-10. In both cases the update worked well and I did not see the reported issue. Moving to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days