Bug 1486320
Summary: | client graph does not reload if the server node participating in the mount is unreachable | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> |
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
Status: | CLOSED UPSTREAM | QA Contact: | Bala Konda Reddy M <bmekala> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.3 | CC: | kaushal, nchilaka, rhinduja, rhs-bugs, storage-qa-internal, vbellur |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-11-06 14:12:00 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Description
Rahul Hinduja
2017-08-29 13:09:44 UTC
RCA: This is a design issue from day one. Clients are notified of graph changes through their local glusterd. If the volume topology changes while that glusterd instance is down (in this case, the client's local glusterd), the client never receives the notification and so is unaware of the brick change; as a result, I/O does not reach the replaced brick until the client is remounted. Unfortunately, replace-brick exposes this problem because its "commit force" variant allows one of the nodes to be completely offline.

Kaushal - any thoughts about how this can be addressed in GD2?

This would be solved if clients (re)connect to other volfile servers when the initial connection is lost. Most of the changes would be needed in glusterfsd, with very minimal changes in GD2 (or GD1). Glusterfsd would need changes to do the following:

- First connect to the given volfile server, fetch the volfile, and start.
- Get a list of the other servers in the TSP (trusted storage pool), and keep this list updated regularly.
- When the volfile-server connection is lost, connect to one of the other servers (a sketch of this reconnect loop follows below).

GD2 would need one change: a new RPC for getting the list of servers in the pool. Alternatively, if glusterfsd were to implement a ReST client, it could simply call the existing peer-list API (see the second sketch below).

*** Bug 1639566 has been marked as a duplicate of this bug. ***
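Below is a minimal Go sketch of the reconnect flow described above. It is illustrative only: the real change would land in glusterfsd's C code, and `fetchVolfile`, `listPeers`, and `waitForDisconnect` are hypothetical stand-ins for the actual volfile-fetch RPC, the proposed peer-list RPC, and the transport layer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type volfileClient struct {
	mu      sync.Mutex
	servers []string // known volfile servers in the trusted storage pool
}

// refreshPeers asks the current server for pool membership (the new
// peer-list RPC proposed above) and caches the result.
func (c *volfileClient) refreshPeers(current string) {
	peers, err := listPeers(current)
	if err != nil {
		return // keep the stale list; better than nothing
	}
	c.mu.Lock()
	c.servers = peers
	c.mu.Unlock()
}

// pickOther returns any known server other than the one that just failed.
func (c *volfileClient) pickOther(current string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, s := range c.servers {
		if s != current {
			return s, nil
		}
	}
	return "", errors.New("no alternate volfile server known")
}

// run connects to the initial volfile server, fetches the volfile, and
// on disconnect fails over to another server in the pool. It loops
// until no alternate server is known (with the stubs below, forever).
func (c *volfileClient) run(initial string) error {
	current := initial
	for {
		if err := fetchVolfile(current); err == nil {
			c.refreshPeers(current)
			waitForDisconnect(current)
		}
		next, err := c.pickOther(current)
		if err != nil {
			return err
		}
		fmt.Printf("volfile server %s lost, failing over to %s\n", current, next)
		current = next
		time.Sleep(time.Second) // simple backoff before reconnecting
	}
}

// The three functions below are stand-ins for the real RPC plumbing;
// they exist only so the sketch runs.
func fetchVolfile(server string) error {
	fmt.Println("fetched volfile from", server)
	return nil
}

func listPeers(server string) ([]string, error) {
	return []string{"server1:24007", "server2:24007", "server3:24007"}, nil
}

func waitForDisconnect(server string) {
	time.Sleep(2 * time.Second) // pretend the connection dropped
}

func main() {
	c := &volfileClient{servers: []string{"server1:24007"}}
	if err := c.run("server1:24007"); err != nil {
		fmt.Println("giving up:", err)
	}
}
```

The point of the cached list is that failover remains possible even when the server that supplied it is the one that just went down.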
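And a sketch of the ReST alternative: glusterfsd acting as an HTTP client against GD2's peer-list API. The endpoint path (`/v1/peers`), the response shape, and the port are assumptions about GD2's ReST interface, not verified against it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// peer mirrors the fields we would care about from the assumed response.
type peer struct {
	ID        string   `json:"id"`
	Name      string   `json:"name"`
	Addresses []string `json:"addresses"`
}

// fetchPool asks one GD2 endpoint for the pool membership.
func fetchPool(endpoint string) ([]peer, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(endpoint + "/v1/peers") // assumed GD2 peer-list route
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var peers []peer
	if err := json.NewDecoder(resp.Body).Decode(&peers); err != nil {
		return nil, err
	}
	return peers, nil
}

func main() {
	// The host and port here are placeholders, not GD2 defaults.
	peers, err := fetchPool("http://server1:24007")
	if err != nil {
		fmt.Println("peer list fetch failed:", err)
		return
	}
	for _, p := range peers {
		fmt.Println(p.Name, p.Addresses)
	}
}
```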