Bug 765265 (GLUSTER-3533)
Summary: | Go read-only if quorum not met | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Jeff Darcy <jdarcy>
Component: | replicate | Assignee: | Jeff Darcy <jdarcy>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Raghavendra Bhat <rabhat>
Severity: | low | Docs Contact: |
Priority: | medium | |
Version: | mainline | CC: | amarts, gluster-bugs, vijay
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2013-07-24 17:27:08 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | glusterfs-3.3.0qa45 | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 817967 | |
Description (Jeff Darcy, 2011-09-09 15:45:13 UTC)
FWIW, I've pushed my local patch for this to Gerrit. It's not fully baked, but it probably does a better job than mere words can of explaining where I think we need to go with this.

http://review.gluster.com/#change,473

Some questions:

1) Given that one of the use cases for AFR is to handle N-1 failures, would it be better to make this behavior optional? Or have the quorum number configurable with a default value of 1?

2) How do we expect to handle split-brains that may arise out of the situation where a modify FOP is allowed and child down(s) is/are sensed later, before the modify FOP reaches the server? The chance of this happening is not very low, given that we require a ping-timeout interval to determine that a server is unreachable unless an RPC disconnection is sensed.

(In reply to comment #2)
> Some questions:
>
> 1) Given that one of the use cases for AFR is to handle N-1 failures, would it
> be better to make this behavior optional? Or have the quorum number
> configurable with a default value of 1?
>
> 2) How do we expect to handle split-brains that may arise out of the situation
> where a modify FOP is allowed and child down(s) is/are sensed later before the
> modify FOP reaches the server? The chance of this happening is not very low
> given that we require ping-timeout interval to determine a server being
> unreachable unless a rpc disconnection is sensed.

(1) Yes, it absolutely should be optional. Joe actually suggested there should be three options: no quorum enforcement, quorum enforcement for writes, and quorum enforcement for everything.

(2) If a modify FOP is allowed (according to quorum rules) and subsequently fails, that ends up being the same case as if quorum had never been enforced. This does mean there's still a slight chance of split-brain, but it should be much reduced: the window is approximately the 30 seconds it takes to detect a partition, versus the hours (even days) that the partition might persist.
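The decision discussed above (optional enforcement, either a majority rule or a fixed count, with writes refused via EROFS when quorum is lost) can be sketched roughly as follows. This is an illustrative sketch, not GlusterFS source: the function names and the simplified majority rule (the real "auto" mode also breaks ties in favour of the replica set containing the first brick) are assumptions for clarity.

```python
import errno

def quorum_met(up_children, total_children, quorum_type="auto", quorum_count=None):
    """Illustrative quorum check for an AFR-style replicate translator.

    quorum_type: "none"  -> no enforcement (writes always allowed)
                 "fixed" -> at least quorum_count children must be up
                 "auto"  -> a strict majority of children must be up
                            (simplified; the real rule also considers
                            whether the first brick is up on a tie)
    """
    if quorum_type == "none":
        return True
    if quorum_type == "fixed":
        return up_children >= (quorum_count or 1)
    return up_children * 2 > total_children  # strict majority

def modify_fop(up_children, total_children):
    # When quorum is lost, modifying FOPs fail with EROFS so the
    # volume behaves as read-only, as the bug title requests.
    if not quorum_met(up_children, total_children):
        raise OSError(errno.EROFS, "Read-only file system")
    return "write allowed"
```

With 2 of 3 replicas up the write proceeds; with only 1 of 3 it is refused, matching the "go read-only" behaviour rather than risking a long-lived split-brain window.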
CHANGE: http://review.gluster.com/473 (Change-Id: I2f123ef93989862aa796903a45682981d5d7fc3c) merged in master by Vijay Bellur (vijay)

Checked with glusterfs-3.3.0qa45; quorum enforcement works properly, with an EROFS error being propagated for any modify operations.

```
root@hyperspace:/mnt/client# gluster volume set mirror quorum-type auto
Set volume successful
root@hyperspace:/mnt/client# gluster volume set mirror quorum-count 2
Set volume successful
root@hyperspace:/mnt/client# cd
root@hyperspace:~# cd -
/mnt/client
root@hyperspace:/mnt/client#
root@hyperspace:/mnt/client# ls
root@hyperspace:/mnt/client# dd if=/dev/urandom of=k bs=10k count=22
dd: opening `k': Read-only file system
root@hyperspace:/mnt/client# ls
root@hyperspace:/mnt/client# ls
root@hyperspace:/mnt/client# dd if=k of=/tmp/kkk bs=10k count=22
dd: opening `k': No such file or directory
root@hyperspace:/mnt/client# touch new
touch: cannot touch `new': Read-only file system
root@hyperspace:/mnt/client# gluster volume info mirror

Volume Name: mirror
Type: Replicate
Volume ID: 3382aaa7-37d0-4fab-bd3c-dc9a7a350acf
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: hyperspace:/mnt/sda7/export3
Brick2: hyperspace:/mnt/sda8/export3
Brick3: hyperspace:/mnt/sda7/last35
Options Reconfigured:
cluster.quorum-type: auto
cluster.quorum-count: 2
features.lock-heal: on
features.quota: on
features.limit-usage: /:22GB
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
geo-replication.indexing: on
performance.stat-prefetch: on
root@hyperspace:/mnt/client#
```
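Since a quorum-enforcing mount can transiently go read-only and then recover once bricks reconnect, applications can treat EROFS as a retryable condition rather than a fatal error. A hedged sketch of that pattern (not GlusterFS code; the path, retry count, and delay are illustrative choices):

```python
import errno
import time

def write_with_quorum_retry(path, data, retries=3, delay=1.0):
    """Attempt a write; if the mount has gone read-only because
    replica quorum was lost (EROFS), back off and retry before
    giving up. Any other error is surfaced immediately."""
    for _ in range(retries):
        try:
            with open(path, "wb") as f:
                f.write(data)
            return True
        except OSError as e:
            if e.errno != errno.EROFS:
                raise              # unrelated failure: surface it
            time.sleep(delay)      # quorum may return once bricks reconnect
    return False
```

On a healthy mount this behaves like a plain write; on a quorum-degraded mount it gives the cluster a few seconds to recover before reporting failure to the caller.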