Bug 1367285 - EINVAL errors for write, when there are write stalls before and a lookup post a rebalance of the file
Summary: EINVAL errors for write, when there are write stalls before and a lookup post...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard: dht-multiple-migration, dht-qe-3.2, d...
Depends On: 1059687 1286150 1367266 1463907
Blocks: 1035040
 
Reported: 2016-08-16 06:35 UTC by Raghavendra G
Modified: 2017-08-29 05:20 UTC (History)
20 users

Fixed In Version:
Clone Of: 1367266
Environment:
Last Closed: 2017-08-29 05:19:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra G 2016-08-16 06:35:44 UTC
+++ This bug was initially created as a clone of Bug #1367266 +++

+++ This bug was initially created as a clone of Bug #1286150 +++

+++ This bug was initially created as a clone of Bug #1059687 +++

+++ This bug was initially created as a clone of Bug #1054782 +++

2) File missing
---------------

The root cause of this issue is the following scenario (triggered by the way we were reproducing the bug, but it can happen regardless):
- An application starts writing to a file
- Remove-brick (rebalance) is triggered on the subvolume where the file resides
- During the actual file migration by rebalance, no write I/Os are happening (this occurs with RHOS because the source of the file being copied is the web, hence there are write stalls; this is observable with some fop logging on writes)
- After the migration (which takes about 3-5 seconds on the RHOS setup), a lookup on the same file is triggered (we were triggering it with an ls -l on the file to check whether it was growing in size, hence the note about it being triggered while reproducing the bug). The file's cached subvolume then changes to the new subvolume, but the fd we hold still belongs to the older subvolume
- A subsequent write by the application hits an fd/subvolume mismatch because of the above step, resulting in EINVAL from the layers below (seen in the client mount logs)

In (2) there is no data corruption: an error is sent back to the application, and Glance in this case decides to remove the image from its store because create_image failed.

This has also been simulated with a simple bash open-file case.
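For illustration only, here is a minimal client-side sketch (in Python) of the scenario above. It assumes a GlusterFS volume FUSE-mounted at /mnt/glustervol and an operator triggering the remove-brick rebalance on the server during the sleep window; the path, sizes, and timings are assumptions, not taken from this report.

#!/usr/bin/env python3
# Hypothetical reproducer sketch for the fd/subvolume mismatch described above.
# Assumes a GlusterFS volume is mounted at MOUNT and that remove-brick/rebalance
# is started on the file's subvolume while this script sleeps.
import errno
import os
import time

MOUNT = "/mnt/glustervol"                 # assumed FUSE mount point
PATH = os.path.join(MOUNT, "image.bin")   # hypothetical file name

fd = os.open(PATH, os.O_CREAT | os.O_WRONLY, 0o644)  # fd refers to the old subvolume
os.write(fd, b"x" * 4096)                 # initial write before migration starts

# Write stall: no I/O while rebalance migrates the file to a new subvolume.
# (Run `gluster volume remove-brick ... start` on the server during this window.)
time.sleep(10)

os.stat(PATH)                             # lookup: DHT updates the cached subvolume

try:
    os.write(fd, b"y" * 4096)             # fd still points at the older subvolume
except OSError as e:
    if e.errno == errno.EINVAL:
        print("reproduced: write failed with EINVAL after migration + lookup")
    else:
        raise
finally:
    os.close(fd)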

(2) will be forked into a separate bug for further analysis on possible fixes.

For (1), the patch is now posted upstream and awaiting acceptance: https://code.engineering.redhat.com/gerrit/#/c/19107/1

--- Additional comment from RHEL Product and Program Management on 2014-01-30 16:15:27 MVT ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Shalaka on 2014-01-31 11:09:12 MVT ---

Please add doc text for this known issue.

--- Additional comment from Shalaka on 2014-02-19 15:13:26 MVT ---

Edited the doc text.

--- Additional comment from Shyamsundar on 2014-02-20 10:09:40 MVT ---

Doc text looks good.

--- Additional comment from John Skeoch on 2014-02-27 05:12:49 MVT ---

User srangana's account has been closed

--- Additional comment from John Skeoch on 2014-02-27 05:14:17 MVT ---

User srangana's account has been closed

--- Additional comment from John Skeoch on 2014-03-31 06:35:20 MVT ---

User vraman's account has been closed

--- Additional comment from Susant Kumar Palai on 2015-11-27 17:04:23 MVT ---

Cloning this to 3.1. To be fixed in a future release.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-11-27 07:04:59 EST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-08-09 07:19:07 EDT ---

Since this bug has been approved for the RHGS 3.2.0 release of Red Hat Gluster Storage 3, through release flag 'rhgs-3.2.0+', and through the Internal Whiteboard entry of '3.2.0', the Target Release is being automatically set to 'RHGS 3.2.0'

Comment 1 Niels de Vos 2016-09-12 05:38:04 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 2 Nithya Balachandran 2017-08-29 05:19:06 UTC
This will be fixed in 3.12. Closing this BZ with resolution Deferred.

Comment 3 Nithya Balachandran 2017-08-29 05:20:36 UTC
This will be fixed by: https://review.gluster.org/#/c/17995/

