Bug 1286108
Summary: | Write on fuse mount failed with "write error: Transport endpoint is not connected" after a successful remove brick operation | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Susant Kumar Palai <spalai> |
Component: | distribute | Assignee: | Raghavendra G <rgowdapp> |
Status: | CLOSED WONTFIX | QA Contact: | Anoop <annair> |
Severity: | unspecified | Docs Contact: | |
Priority: | high | ||
Version: | rhgs-3.1 | CC: | amukherj, nbalacha, rgowdapp, rhs-bugs, spalai, spandura, storage-qa-internal, vbellur |
Target Milestone: | --- | Keywords: | Triaged, ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | dht-rca-unknown, dht-must-fix | ||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 991402 | Environment: | |
Last Closed: | | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 991402 | ||
Bug Blocks: | 1286180 |
Comment 2
Nithya Balachandran
2016-06-24 08:18:11 UTC
I think that when the commit operation is done, glusterd might have restarted all the bricks because the volfile changed. That might have caused the ENOTCONN.

@Atin, if my volume is a plain distribute of three bricks (b1, b2, b3) and I remove b3 and commit the operation, would bricks b1 and b2 be restarted?

regards,
Raghavendra

(In reply to Raghavendra G from comment #3)
> I think that when the commit operation is done, glusterd might have restarted
> all the bricks because the volfile changed. That might have caused the ENOTCONN.
>
> @Atin, if my volume is a plain distribute of three bricks (b1, b2, b3) and I
> remove b3 and commit the operation, would bricks b1 and b2 be restarted?

No, the bricks aren't restarted in this case. I've checked this both in the code and with a quick test [1].

[1] https://paste.fedoraproject.org/386164/17642214/

The following is a hypothesis as to why this issue might have happened:
> [2013-08-02 10:42:18.062268] W [fuse-bridge.c:5103:fuse_migrate_fd] 0-glusterfs-fuse: syncop_fsync failed (Transport endpoint is not connected) on fd (0x1d9338c)(basefd:0x1d9338c basefd-inode.gfid:7a315be7-683f-4a4c-b6d6-85936bde21a1) (old-subvolume:vol_dis_rep-0-new-subvolume:vol_dis_rep-1)
As the log above shows, the fsync issued during the graph switch failed. One possible explanation is that the removed brick was killed before the clients had a chance to "react" to the remove-brick operation (for example, by flushing cached writes on the old graph, which is what fd migration does). Because fuse could not migrate the fd, application continuity was broken. A possible fix is to terminate the brick process only after _all_ clients have been given a chance to migrate their fds.
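The ordering constraint in the proposed fix can be illustrated with a small model. This is a hypothetical Python sketch, not GlusterFS code: `Brick`, `Client`, `migrate_fd`, and `try_terminate` are invented names, and fd migration is reduced to a single flush that fails with `ENOTCONN` when the removed brick is already gone, mirroring the `syncop_fsync` failure in the log above.

```python
import errno
from dataclasses import dataclass
from typing import List


@dataclass
class Brick:
    """Hypothetical stand-in for the brick being removed (e.g. b3)."""
    name: str
    alive: bool = True


@dataclass
class Client:
    """Hypothetical fuse client holding an fd that is still on the old graph."""
    name: str
    fd_migrated: bool = False

    def migrate_fd(self, removed: Brick) -> int:
        """Flush cached writes through the old graph, then mark the fd as migrated.

        Returns 0 on success, or -ENOTCONN if the removed brick has already been
        terminated (modeling the fuse_migrate_fd failure in the log).
        """
        if not removed.alive:
            return -errno.ENOTCONN
        self.fd_migrated = True
        return 0


def try_terminate(removed: Brick, clients: List[Client]) -> bool:
    """Terminate the removed brick only once every client has migrated its fds.

    Returns True if the brick was terminated, or False if some client has not
    reacted yet and the brick must stay up.
    """
    if all(c.fd_migrated for c in clients):
        removed.alive = False
        return True
    return False


if __name__ == "__main__":
    # Hypothesised bug: the brick is killed before the client reacts.
    b3 = Brick("b3")
    mount = Client("mount-1")
    b3.alive = False
    print("unsafe:", mount.migrate_fd(b3) == -errno.ENOTCONN)  # unsafe: True

    # Proposed ordering: termination is deferred until all clients have migrated.
    b3 = Brick("b3")
    mounts = [Client("mount-1"), Client("mount-2")]
    print("before migration:", try_terminate(b3, mounts))  # before migration: False
    for m in mounts:
        m.migrate_fd(b3)
    print("after migration:", try_terminate(b3, mounts))  # after migration: True
```

The model assumes a fixed, known set of clients. A real implementation would also have to decide what to do about clients that are slow or disconnected, which this sketch does not attempt to capture.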