Bug 1229226
| Summary: | Gluster split-brain not logged and data integrity not enforced | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Dustin Black <dblack> |
| Component: | replicate | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.7.0 | CC: | amukherj, bugs, jdarcy, ssampat |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | AFR | ||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-08 11:01:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1223758, 1224709 | | |
|
Description
Dustin Black
2015-06-08 10:04:28 UTC
FWIW, I tried a bunch of iptables tricks and couldn't find a way to reproduce this on a single node. It does seem specific to a two-node (or at least two-glusterd) configuration.

OK, I lied. Previously, I had been cutting off access to each brick sequentially, with client unmounts and remounts in between. This time, I mounted twice simultaneously, and cut off each client's connection to one brick. Something like this (your port numbers may vary):
> iptables -t mangle -I OUTPUT -p tcp --sport 1020 --dport 49152 -j DROP
> iptables -t mangle -I OUTPUT -p tcp --sport 1002 --dport 49153 -j DROP
With this, I got into a state where *one* client could still read the file from the still-connected brick without error. Interestingly, it was not symmetric; the other client did report EIO, as it should. Xattrs do show pending operations for each other, and "heal info" shows split-brain from both sides.
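For reference, the xattr and heal-info checks mentioned above can be done with something like the following. The brick path is a placeholder (the volume name `rep01` is taken from the reproducer script later in this report); adjust both to your layout.

```shell
# On each server, dump the AFR changelog xattrs for the file on the brick.
# Non-zero trusted.afr.* counters on both replicas mean each side is
# blaming the other, i.e. split-brain:
getfattr -d -m . -e hex /rhgs/bricks/rep01/file002

# From any server, list files that AFR considers to be in split-brain:
gluster volume heal rep01 info split-brain
```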
As I wrote this, the state changed yet again. Now both clients correctly return EIO. This strongly suggests that some state is being cached improperly on the clients, but not infinitely. The plot thickens.
I can consistently reproduce this state now. Just as consistently, it persists until I utter this familiar incantation:
# echo 3 > /proc/sys/vm/drop_caches
As far as I can tell, we don't even *get* the read until we do this. Therefore we can't fail it. Instead, the kernel returns the version that we had written previously. We could prevent that by checking for split-brain on open, but we don't seem to do that. Perhaps this is related to the fact that NFS might not do an open before a read, so the emphasis has been on checking in the read path - which we don't get to in this case. Just a theory. In any case, maybe there are some clues that someone more familiar with AFR can pursue.
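The theory above can be illustrated with a toy model. This is emphatically not GlusterFS code; all names are invented. It just shows the mechanism: if the split-brain check lives only in the read path, and a kernel-style page cache above the client satisfies reads without ever calling down into that path, stale data is returned with no EIO until the cache is dropped.

```python
import errno

class Brick:
    """One replica's on-disk copy, modeled as a dict of name -> data."""
    def __init__(self, data):
        self.data = data

class ReplicaClient:
    """Two-way replication; detects split-brain only in the read path,
    mirroring the behavior described in this bug."""
    def __init__(self, b0, b1):
        self.bricks = (b0, b1)

    def read(self, name):
        copies = {b.data[name] for b in self.bricks}
        if len(copies) > 1:                  # replicas disagree: split-brain
            raise OSError(errno.EIO, "Input/output error")
        return copies.pop()

class PageCache:
    """Kernel-style cache above the client: a cache hit is returned
    without ever reaching ReplicaClient.read()."""
    def __init__(self, client):
        self.client = client
        self.cache = {}

    def read(self, name):
        if name in self.cache:               # hit: split-brain check skipped
            return self.cache[name]
        data = self.client.read(name)
        self.cache[name] = data
        return data

    def drop_caches(self):                   # echo 3 > /proc/sys/vm/drop_caches
        self.cache.clear()

b0, b1 = Brick({"file002": "old"}), Brick({"file002": "old"})
client = PageCache(ReplicaClient(b0, b1))
assert client.read("file002") == "old"       # populates the cache

b1.data["file002"] = "new"                   # network split: bricks diverge
assert client.read("file002") == "old"       # stale data served, no EIO

client.drop_caches()
try:
    client.read("file002")
except OSError as e:
    assert e.errno == errno.EIO              # read finally fails correctly
```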
*** Bug 1220347 has been marked as a duplicate of this bug. ***

(In reply to Jeff Darcy from comment #3)
> I can consistently reproduce this state now. Just as consistently, it
> persists until I utter this familiar incantation:
>
> # echo 3 > /proc/sys/vm/drop_caches
>
> As far as I can tell, we don't even *get* the read until we do this.
> Therefore we can't fail it. Instead, the kernel returns the version that we
> had written previously. We could prevent that by checking for split-brain
> on open, but we don't seem to do that. Perhaps this is related to the fact
> that NFS might not do an open before a read, so the emphasis has been on
> checking in the read path - which we don't get to in this case. Just a
> theory. In any case, maybe there are some clues that someone more familiar
> with AFR can pursue.

So interestingly, I tried dropping the caches a few different ways previously (at different points in the reproducer process), and it didn't help. I'm going to try again and see if maybe I missed something before...

For my original reproducer, if I insert the cache drop where I logically think it should go in step 5:
5. Correct the network split and stat the file from the client:
#!/bin/bash
# Print each command before running it.
exe() { echo "\$ $*" ; "$@" ; }
if [ "$HOSTNAME" = "n1" ]; then
    echo "Correcting network split with iptables..."
    exe iptables -F OUTPUT
    echo "Dropping caches due to BZ 1229226..."
    echo 3 > /proc/sys/vm/drop_caches
    echo "Statting file002 to induce heal..."
    exe stat /rhgs/client/rep01/file002
else
    echo "Wrong host!"
fi
It does _not_ correct the problem.
It also doesn't help if I put the cache drop in step 2 just after modifying the file.
(In reply to Dustin Black from comment #6)
> It does _not_ correct the problem.

Nevermind; ignore me. Too little sleep... Dropping the caches before reading the file after the split is resolved does work. The 'ls' command still completes without error, but a 'cat' results in the expected EIO.

This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.