Bug 1262964
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Cannot access volume when network down | | |
| Product: | [Community] GlusterFS | Reporter: | Huy VU <huy.vu> |
| Component: | replicate | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.6.3 | CC: | amukherj, bugs, huy.vu, pkarampu, ravishankar |
| Target Milestone: | --- | Keywords: | Reopened, Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-06-22 10:31:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Huy VU
2015-09-14 18:56:31 UTC
Please provide a description of the problem along with the logs; having a one-liner statement in the bug doesn't help developers get to know the exact issue. If the network is down, pretty much nothing can work. Closing this bug. If you wish to provide the requested information and/or can explain why you still think it's a bug in gluster when the network is down, then you may reopen this bug by changing the Status to New.

Created attachment 1073653 [details]
gluster logs from node1

Created attachment 1073654 [details]
gluster logs from node2. NIC was brought down manually at around 9am.
Description of problem:

Version-Release number of selected component (if applicable):
glusterfs-api-3.6.5-1.el6.x86_64
glusterfs-server-3.6.5-1.el6.x86_64
glusterfs-3.6.5-1.el6.x86_64
glusterfs-cli-3.6.5-1.el6.x86_64
glusterfs-fuse-3.6.5-1.el6.x86_64
glusterfs-libs-3.6.5-1.el6.x86_64

How reproducible:

Steps to Reproduce:
1. Create a 2-node replicated volume.
2. Verify that replication works both ways.
3. Bring down the NIC on node 2 using the command: ifconfig eth0 down
4. Access the gluster volume on either node.

Actual results:
Any command that accesses the gluster volume freezes for about 30 seconds; after that, the command proceeds as expected. Any change to a file on the volume on either node while the NIC of node 2 is down can cause the volume to become split-brained. The split-brain behaviour did not resolve itself after the NIC of node 2 was returned to service (even after 1 hour).

Expected results:
Access to the volume on either node should not be impeded when the NIC of node 2 is brought down. Changes to a file on either node that do not result in conflicts should not cause split-brain after the NIC returns to service.

Additional info:

(In reply to Huy VU from comment #0)
Sorry about the lack of info. I didn't think I had pressed the Save Changes button. Please see the additional info below.

Information has been provided. Please review.
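The reproduction steps above, as a rough shell sketch; the node names, brick paths, mount points, and NIC name here are illustrative assumptions, not taken verbatim from the report:

```
# Create and start a 1x2 replicated volume (names and paths are assumptions).
gluster volume create gv0 replica 2 node1:/data/brick1/gv0 node2:/data/brick1/gv0
gluster volume start gv0

# Mount the volume on each node and verify two-way replication.
mount -t glusterfs node1:/gv0 /mnt/gv0    # run on node 1
mount -t glusterfs node2:/gv0 /mnt/gv0    # run on node 2

# On node 2: take the NIC down, then access the volume from node 1.
ifconfig eth0 down                        # on node 2
ls /mnt/gv0                               # on node 1: hangs, then completes
```

Two-brick replica volumes are particularly prone to split-brain, which is why later GlusterFS releases recommend an arbiter brick or a third replica for this topology.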
Manually launching heal on node 1:

```
[root@huysnpmvm10 glusterd]# gluster volume heal gv0
Launching heal operation to perform index self heal on volume gv0 has been successful
Use heal info commands to check status
[root@huysnpmvm10 glusterd]# gluster volume heal gv0 info
Brick huysnpmvm10:/data/brick1/gv0/
/ - Is in split-brain
/testfile.txt
Number of entries: 2
Brick huysnpmvm11:/data/brick1/gv0/
/ - Is in split-brain
Number of entries: 1
[root@huysnpmvm10 glusterd]# gluster volume heal gv0
Launching heal operation to perform index self heal on volume gv0 has been successful
Use heal info commands to check status
[root@huysnpmvm10 glusterd]# gluster volume heal gv0 info
Brick huysnpmvm10:/data/brick1/gv0/
/ - Is in split-brain
/testfile.txt
Number of entries: 2
Brick huysnpmvm11:/data/brick1/gv0/
/ - Is in split-brain
Number of entries: 1
```

Logs from glfsheal-gv0.log:

```
[2015-09-15 13:38:51.181240] I [dht-shared.c:337:dht_init_regex] 0-gv0-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-09-15 13:38:51.185147] I [glfs-master.c:93:notify] 0-gfapi: New graph 68757973-6e70-6d76-6d31-302d34353433 (0) coming up
[2015-09-15 13:38:51.185284] I [client.c:2280:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
[2015-09-15 13:38:51.185725] I [client.c:2280:notify] 0-gv0-client-1: parent translators are ready, attempting connect on transport
[2015-09-15 13:38:51.186722] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
[2015-09-15 13:38:51.187540] I [client-handshake.c:1413:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-15 13:38:51.187632] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
[2015-09-15 13:38:51.188093] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume '/data/brick1/gv0'.
[2015-09-15 13:38:51.188150] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-15 13:38:51.188294] I [MSGID: 108005] [afr-common.c:3686:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0' came back up; going online.
[2015-09-15 13:38:51.188598] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
[2015-09-15 13:38:51.188871] I [client-handshake.c:1413:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-15 13:38:51.189454] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/data/brick1/gv0'.
[2015-09-15 13:38:51.189692] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-15 13:38:51.198011] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
[2015-09-15 13:38:53.210926] I [afr-self-heal-entry.c:561:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001
[2015-09-15 13:38:53.215174] E [afr-self-heal-entry.c:246:afr_selfheal_detect_gfid_and_type_mismatch] 0-gv0-replicate-0: Gfid mismatch detected for <00000000-0000-0000-0000-000000000001/testfile.txt>, e2a9f622-2038-40fd-b133-7093c9953db5 on gv0-client-1 and 241e0aea-c6b3-43f1-bd3b-6c59bfd40bcf on gv0-client-0. Skipping conservative merge on the file.
[2015-09-15 13:38:53.218606] W [afr-common.c:1803:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
[2015-09-15 13:38:53.218811] I [afr-common.c:1491:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting local read_child gv0-client-0
[2015-09-15 13:38:53.219146] W [afr-common.c:1803:afr_discover_done] 0-gv0-replicate-0: no read subvols for /
[2015-09-15 13:38:53.219551] I [glfs-resolve.c:836:__glfs_active_subvol] 0-gv0: switched to graph 68757973-6e70-6d76-6d31-302d34353433 (0)
```

(In reply to Huy VU from comment #5)
> Steps to Reproduce:
> 1. Create a 2-node replicated volume.
> 2. Verify that replication works both ways.
> 3. Bring down the NIC on node 2 using the command: ifconfig eth0 down
> 4. Access the gluster volume on either node.

Volume info:

Node 1:

```
[root@huysnpmvm10 glusterd]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: 2d189bdb-d657-4d7c-9556-6e7676b35ea3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.35.29.222:/data/brick1/gv0
Brick2: 10.35.29.223:/data/brick1/gv0
```

Node 2:

```
[root@huysnpmvm11 glusterd]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: 2d189bdb-d657-4d7c-9556-6e7676b35ea3
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.35.29.222:/data/brick1/gv0
Brick2: 10.35.29.223:/data/brick1/gv0
```

> Actual results:
> Any command line command that accesses the gluster volume freezes for about 30 seconds.
> After about 30 seconds, the command proceeds as expected.
> Any change to a file on the volume on any node while the NIC of node 2 is down can cause the volume to be split brained.
> Split brain behaviour did not resolve itself after the NIC of node 2 was returned to service (even after 1 hour of doing so).
>
> Expected results:
> Access of the volume on either node should not be impeded when the NIC of node 2 was brought down.
> Changes to a file on either node that do not result in conflicts should not cause split brain after the NIC returns to service.
>
> Additional info:

Ravi, could you provide your inputs here?
Thanks, Atin

1. Regarding "Changes to a file on either node that do not result in conflicts should not cause split brain after the NIC returns to service": the steps that are described are the very steps that result in the split-brain of a file, because the mount on each node can see itself and not the other. Once a file gets into a split-brained state, there is no way of automagically getting out of it. You need manual intervention to resolve split-brains.

For gluster 3.6 or lower, use https://github.com/gluster/glusterdocs/blob/master/Troubleshooting/split-brain.md to resolve split-brains from the back-end bricks.

For 3.7 upwards, you can use the gluster CLI commands from the server, or a combination of get/setfattr commands from the mount, to resolve split-brain. Usage is documented at https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md

2. Regarding the 30-second hang, I think that is expected behaviour (ping-timeout?). Feel free to re-assign to the appropriate component if it is a bug.

(In reply to Ravishankar N from comment #11)
Thanks Ravi for the explanation. Closing this bug as this is expected behaviour.

Ravi, thank you for your explanation.

(In reply to Ravishankar N from comment #11)
> The steps that are described are the very steps that result in the split-brain of a file because the mount on each node can see itself and not the other.

So that I understand this clearly: a single change to a file while one node in the cluster is down will result in split-brain when the node recovers. Is this true?

> 2. Regarding the 30 seconds hang, I think that is expected behaviour (ping-timeout?).

Again, what we are saying then is that if a node in the cluster goes down, all clients will see a 30-second hang? Is this true?

Hi VU,
Writing to the same file from the clients on both nodes, where each node can only see itself and not the other, can result in split-brains. Is that not what you did? If not, I may have misunderstood the steps.

(In reply to Ravishankar N from comment #14)
Ravi, I am sorry for not making the steps clearer.

I used vi to add a few lines to the file on one node while the NIC card of the other node was forced down. Then I brought the NIC card up. That was enough to cause split brain.

I am also interested in knowing why there was a 30-second hang on both nodes when the NIC card was brought down.

NOTE: I tested directly on the two nodes, i.e. the vi command was run directly on node 1. I don't think this should have any bearing on the behaviour.

(In reply to Huy VU from comment #15)
Ah, when you edit using vi, it creates a new swap file (with a different gfid) and renames it to the original file. But when node 2 comes up, it should be healed from node 1. Instead, it is trying to do a conservative merge, which means some kind of modification was done from the mount on node 2 while its eth0 was down. But you say that isn't the case. Let me see the logs and figure it out.

> I am also interested in knowing why there was a 30-second hang on both nodes when the NIC card was brought down.

When you brought the interface down, I'm guessing the mount on node 1 is not notified immediately (unlike the case where the brick process is killed, where the mount immediately gets a disconnect event for that brick), so it waits for the network.ping-timeout value (42 seconds by default).

So I see that the file has been edited from node 2 as well. `grep -rne "renaming /testfile.txt" mnt-glusterd.log` shows entries from the mount logs of both nodes, leading to gfid split-brain. Is that right?

(In reply to Huy VU from comment #15)
glusterd's ping-timeout value is 30 secs. So in case of any node or network going faulty, the other parties may not get a disconnect notification back, as TCP keepalive never guarantees that. It is the application's responsibility to have heartbeats to detect such failures. In this case the timeout happened after 30 secs, and during that interval you observed the hang, which is expected. Hope this clarifies the point here.

Ravi, since I do not see this behaviour as an issue, I am moving the component back to replicate.

(In reply to Ravishankar N from comment #17)
> `grep -rne "renaming /testfile.txt" mnt-glusterd.log` shows entries from the mount log of both nodes, leading to gfid split-brain. Is that right?

Hello Ravi,
I did a number of tests at different times. Some tests had me editing the file on node 1; some on node 2; some on both. When you grep for 'renaming', please do so on both sets of logs (node 1's and node 2's) and compare the timestamps of the logs.
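For reference, the timeout being discussed is a per-volume tunable; a minimal sketch, assuming the volume name from this report (the chosen value is illustrative, and very low values risk spurious disconnects under transient network load):

```
# Default network.ping-timeout is 42 seconds; a client can hang up to this
# long before declaring a silently dead brick gone.
gluster volume set gv0 network.ping-timeout 10

# Reconfigured options appear under "Options Reconfigured" in:
gluster volume info gv0
```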
Atin,
If this behaviour is as intended then so be it; you can close this bug. However, I would think it's an area for improvement. The current behaviour can be described as synchronous replication; if there were a control flag letting us choose asynchronous replication, that would improve performance tremendously.

(In reply to Huy VU from comment #19)
> I did a number of tests at different times. Some tests had me editing the file on node 1; some on node 2; some on both.

Hi VU,
I know that the timestamps are different. If you do modification operations from different nodes while each node cannot see the other, it will result in split-brain. Are you consistently able to reproduce the issue with the steps you described? (I am not.) If yes, then please upload logs from that fresh test set-up (it makes debugging easier). Here is what I tried:

1. Create a 1x2 volume, mount it on both nodes, and create a file from either mount.
2. Bring eth0 down on node 2.
3. Edit the file from node 1's mount.
4. Bring node 2 back up.
5. Launch heal.

No split-brain observed. Note that if you do any modifications on node 2 (even to another file) between steps 2 and 4, the parent directories end up in entry split-brain and a conservative merge is attempted, which fails to heal the file edited in step 3 due to gfid mismatch. This is what I think happened in your case.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
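For completeness, the 3.7+ CLI resolution workflow Ravi linked to looks roughly like the following sketch; the volume, brick, and file names are carried over from this report, and note that gfid split-brain (as opposed to data or metadata split-brain) still required back-end resolution in the older releases discussed here:

```
# List files the self-heal daemon considers split-brained.
gluster volume heal gv0 info split-brain

# Pick one brick's copy as the source for a given file; other policies
# such as bigger-file also exist.
gluster volume heal gv0 split-brain source-brick 10.35.29.222:/data/brick1/gv0 /testfile.txt
```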
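Huy's debugging suggestion in comment #19 — grep both nodes' mount logs for the rename of testfile.txt and compare timestamps — can be sketched as follows; the log files and lines below are fabricated stand-ins, not taken from the attachments:

```shell
# Fabricated mount-log snippets standing in for node1's and node2's logs.
workdir=$(mktemp -d)
printf '%s\n' "[2015-09-15 12:58:10.000000] I 0-glusterfs-fuse: renaming /testfile.txt" \
    > "$workdir/node1-mnt-glusterd.log"
printf '%s\n' "[2015-09-15 13:02:33.000000] I 0-glusterfs-fuse: renaming /testfile.txt" \
    > "$workdir/node2-mnt-glusterd.log"

# A rename recorded by BOTH mounts while the nodes were partitioned means
# each brick assigned the vi swap file a different gfid: gfid split-brain.
grep -H "renaming /testfile.txt" "$workdir"/node*-mnt-glusterd.log
```

Comparing the bracketed timestamps then tells you whether the two edits happened during the same partition window.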