Bug 1404982
| Field | Value |
| --- | --- |
| Summary | VM pauses due to storage I/O error, when one of the data brick is down with arbiter volume/replica volume |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | RamaKasturi <knarra> |
| Component | arbiter |
| Assignee | Ravishankar N <ravishankar> |
| Status | CLOSED ERRATA |
| QA Contact | RamaKasturi <knarra> |
| Severity | unspecified |
| Docs Contact | |
| Priority | unspecified |
| Version | rhgs-3.2 |
| CC | amukherj, pkarampu, rcyriac, rhinduja, rhs-bugs, sasundar, storage-qa-internal |
| Target Milestone | --- |
| Target Release | RHGS 3.2.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | glusterfs-3.8.4-10 |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| | 1406224 (view as bug list) |
| Environment | |
| Last Closed | 2017-03-23 05:57:02 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1277939, 1351528, 1398331, 1406224, 1408171 |
Description
RamaKasturi
2016-12-15 10:04:48 UTC
sosreports can be found in the link below:
================================================
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/HC/1404982/

Thanks Kasturi for providing the setup for testing, and thanks Satheesaran for providing the virsh-based commands for re-creating the issue.

The issue is due to a race between inode_refresh_done() and __afr_set_in_flight_sb_status() that occurs when I/O is in progress and a brick is brought down or up. When the brick goes down or comes back up, an inode refresh is triggered in the write transaction, which sets the correct data/metadata readable and event_generation in inode_refresh_done(). But before the transaction can proceed to the write FOP, __afr_set_in_flight_sb_status() from another writev callback resets the event_generation. When the first write that follows the inode refresh fetches the event generation in afr_inode_get_readable(), it gets zero, because of which it fails the write with EIO (a simplified standalone model of this interleaving is sketched below, after the verification notes). While ignoring event_generation seems to fix the issue:

```diff
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index 60bae18..2f32e44 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -1089,7 +1089,7 @@ afr_txn_refresh_done (call_frame_t *frame, xlator_t *this, int err)
                                        &event_generation,
                                        local->transaction.type);
 
-        if (ret == -EIO || !event_generation) {
+        if (ret == -EIO){
                 /* No readable subvolume even after refresh ==> splitbrain.*/
                 if (!priv->fav_child_policy) {
                         err = -EIO;
```

I need to convince myself that ignoring event gen in afr_txn_refresh_done() for reads (there is no problem in ignoring it for writes) does not have any repercussions.

Upstream mainline patch http://review.gluster.org/16205 posted for review.

I have seen the same issue with a replica 3 volume as well, and have updated the bug summary accordingly.

Will verify this bug once bug https://bugzilla.redhat.com/show_bug.cgi?id=1400057 is fixed. Without that fix, some entries still remain in heal info and do not go away.

Verified and works fine with build glusterfs-3.8.4-11.el7rhgs.x86_64.

With arbiter volume:
=========================
1) Deployed HC stack on arbiter volumes.
2) Created a VM and attached a disk from the arbiter data volume.
3) Mounted the disk at /mnt/testdata.
4) Started writing I/O.
5) Once the I/O started, brought down the first data brick in the volume. I/O did not stop and the VM did not go to the paused state.
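To make the interleaving described in the analysis above easier to follow, here is a minimal, self-contained C model of the race. The names (refresh_done, set_in_flight_sb_status, txn_refresh_done, event_generation) are simplified stand-ins for the AFR internals mentioned in the analysis, not the actual glusterfs code; the point is only to show why the pre-fix check `ret == -EIO || !event_generation` can spuriously report EIO after another in-flight write has reset the generation counter.

```c
/*
 * Minimal standalone model of the race described above.
 * Hypothetical names; NOT the actual glusterfs/AFR code.
 * The problematic interleaving is written out sequentially.
 */
#include <errno.h>
#include <stdio.h>

static int event_generation;   /* models the inode's event generation    */
static int readable = 1;       /* models the data/metadata readable map  */

/* Inode refresh after a brick goes down/comes up: sets a fresh,
 * non-zero event generation (models inode_refresh_done()). */
static void refresh_done(void) {
        event_generation = 42;
        readable = 1;
}

/* Callback of another in-flight write that saw the brick flap:
 * models __afr_set_in_flight_sb_status() resetting the generation. */
static void set_in_flight_sb_status(void) {
        event_generation = 0;
}

/* Pre-fix check, modelled on `if (ret == -EIO || !event_generation)`. */
static int txn_refresh_done(int ret) {
        if (ret == -EIO || !event_generation)
                return -EIO;    /* treated as split-brain -> write fails */
        return 0;
}

/* Post-fix check from the patch above: event_generation is ignored. */
static int txn_refresh_done_fixed(int ret) {
        if (ret == -EIO)
                return -EIO;
        return 0;
}

int main(void) {
        refresh_done();              /* refresh sets event_generation = 42 */
        set_in_flight_sb_status();   /* racing writev cbk resets it to 0   */

        /* readable is fine (ret == 0), yet the pre-fix check still fails: */
        int err = txn_refresh_done(readable ? 0 : -EIO);
        printf("pre-fix check:  %s\n", err == -EIO ? "EIO (spurious)" : "ok");

        err = txn_refresh_done_fixed(readable ? 0 : -EIO);
        printf("post-fix check: %s\n", err == -EIO ? "EIO" : "ok");
        return 0;
}
```

Compiled and run, the model reports EIO for the pre-fix check and success for the post-fix check, mirroring the behaviour described in the analysis.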
Volume info for the arbiter volume:
===================================
```
[root@rhsqa-grafton4 ~]# gluster volume info data
Volume Name: data
Type: Replicate
Volume ID: b37f7c59-c9e3-4b04-97fe-39b4d462d5c1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.82:/rhgs/brick2/data
Brick2: 10.70.36.83:/rhgs/brick2/data
Brick3: 10.70.36.84:/rhgs/brick2/data (arbiter)
Options Reconfigured:
auth.ssl-allow: 10.70.36.84,10.70.36.82,10.70.36.83
server.ssl: on
client.ssl: on
cluster.granular-entry-heal: on
user.cifs: off
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 4MB
storage.owner-gid: 36
storage.owner-uid: 36
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
```

With replicate volume:
=========================
1) Deployed HC stack on replicate volumes.
2) Created a VM and attached a disk from the data volume.
3) Mounted the disk at /mnt/testdata.
4) Started writing I/O (see the writer sketch at the end of this report).
5) Once the I/O started, brought down the first data brick in the volume. I/O did not stop and the VM did not go to the paused state.

Volume info for the replicate volume:
=====================================
```
[root@rhsqa-grafton1 ~]# gluster volume info data
Volume Name: data
Type: Replicate
Volume ID: 29d01e0f-bec3-4e68-bbef-4011d95fea4a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.36.79:/rhgs/brick2/data
Brick2: 10.70.36.80:/rhgs/brick2/data
Brick3: 10.70.36.81:/rhgs/brick2/data
Options Reconfigured:
auth.ssl-allow: 10.70.36.80,10.70.36.79,10.70.36.81
server.ssl: on
client.ssl: on
cluster.use-compound-fops: on
cluster.granular-entry-heal: on
user.cifs: off
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 4MB
storage.owner-gid: 36
storage.owner-uid: 36
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
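For step 4 of the verification runs above ("started writing I/O" on the disk mounted at /mnt/testdata), any sustained synchronous writer will do. The sketch below is one hypothetical way to generate that load from inside the VM; the file path, block size, and the O_DIRECT/O_SYNC flags are illustrative assumptions chosen so that writes actually hit the virtual disk rather than the guest page cache, not the harness used in the original test.

```c
/*
 * Hypothetical I/O writer for step 4 of the verification above
 * (illustrative only; not the harness used in the original test).
 * Writes synchronously to a file under /mnt/testdata and stops with
 * a message if a write fails, e.g. with EIO.
 */
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096         /* aligned block, suitable for O_DIRECT */

int main(void) {
        /* Assumed path on the disk mounted at /mnt/testdata. */
        const char *path = "/mnt/testdata/io-test.dat";

        int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        void *buf;
        if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) {
                fprintf(stderr, "posix_memalign failed\n");
                close(fd);
                return 1;
        }
        memset(buf, 'x', BLOCK_SIZE);

        /* Runs until a write fails, the disk fills, or it is interrupted. */
        for (unsigned long i = 0; ; i++) {
                ssize_t n = write(fd, buf, BLOCK_SIZE);
                if (n < 0) {
                        /* EIO here is the failure that used to pause the VM. */
                        fprintf(stderr, "write #%lu failed: %s\n",
                                i, strerror(errno));
                        break;
                }
                if (i % 1000 == 0)
                        printf("wrote %lu blocks\n", i);
        }

        free(buf);
        close(fd);
        return 0;
}
```

Before the fix, such a write loop would fail with EIO soon after the first data brick was brought down and the VM would go to the paused state; with glusterfs-3.8.4-11, the loop keeps running, matching the verification results above.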