Description of problem:
When more bricks than the redundancy count go down, a persistent write returns an IO error. That result is acceptable, but afterwards the whole file can no longer be read or written, even after all bricks come back up.

Version-Release number of selected component (if applicable): 3.6.1

How reproducible:

Steps to Reproduce:
1. Create a distributed-disperse volume:

Volume Name: test
Type: Distributed-Disperse
Volume ID: 17149c08-fba6-4061-892f-f815aecff1c9
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: node-1:/sda
Brick2: node-1:/sdb
Brick3: node-1:/sdc
Brick4: node-2:/sda
Brick5: node-2:/sdb
Brick6: node-2:/sdc

2. Run `dd if=/dev/zero of=/mountpoint/test.bak bs=1M`; the file test.bak lands on Brick4, Brick5 and Brick6.
3. During the persistent write, kill Brick4; writing continues normally. Then kill Brick5; the mountpoint returns an IO error.

Actual results:
The whole file can no longer be read or written, even after all bricks come back up.

Expected results:
After the bricks come back up, we can read the data that was written before the IO error.

Additional info:
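To make the "2 + 1" redundancy limit concrete, here is a minimal illustrative sketch (a simple XOR parity scheme, not GlusterFS's actual Reed-Solomon erasure code): each subvolume stores 2 data fragments plus 1 parity fragment, so losing any one fragment is recoverable, but losing two, as in step 3 above, exceeds the redundancy.

```python
def encode(data: bytes) -> list:
    """Split data into 2 fragments plus an XOR parity fragment (2 + 1)."""
    half = (len(data) + 1) // 2
    d1 = data[:half]
    d2 = data[half:].ljust(half, b"\0")  # pad so fragments are equal length
    parity = bytes(a ^ b for a, b in zip(d1, d2))
    return [d1, d2, parity]

def decode(fragments: list, length: int) -> bytes:
    """Reconstruct the data from any 2 of the 3 fragments (None = lost brick)."""
    d1, d2, parity = fragments
    if d1 is None:
        d1 = bytes(a ^ b for a, b in zip(d2, parity))  # d1 = d2 XOR parity
    if d2 is None:
        d2 = bytes(a ^ b for a, b in zip(d1, parity))  # d2 = d1 XOR parity
    return (d1 + d2)[:length]

data = b"persistent write"
frags = encode(data)
frags[0] = None                          # one brick down: still recoverable
assert decode(frags, len(data)) == data  # two bricks down would be unrecoverable
```

With two fragments lost there is no equation left to solve, which is why the IO error itself is expected; the bug is only that the file stays unreadable after the bricks return.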
Judging from the disperse source code, it does not have a good mechanism to keep consistency when a block write or update fails, unlike XFS's journal, which provides a transaction for every write or update. What is your view on this? Do you plan to do some work on this in the future? @Xavier Hernandez Thanks.
Since this bug was raised, a similar test has been run several times and it works fine, so I believe this issue has been fixed. If you face this issue again, please open a new bug. Closing this bug for now.