Bug 1170942 - More than redundancy bricks down, leads to the persistent write return IO error, then the whole file can not be read/write any longer, even all bricks going up
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: disperse
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Xavi Hernandez
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-12-05 07:32 UTC by jiademing.dd
Modified: 2017-08-24 07:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-24 07:19:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description jiademing.dd 2014-12-05 07:32:08 UTC
Description of problem:
   When more bricks than the redundancy count go down, an ongoing write returns an I/O error. We can accept that result, but afterwards the whole file can no longer be read or written, even after all bricks come back up.

Version-Release number of selected component (if applicable):
3.6.1

How reproducible:


Steps to Reproduce:
1. I created a distributed-disperse volume named test:
Volume Name: test
Type: Distributed-Disperse
Volume ID: 17149c08-fba6-4061-892f-f815aecff1c9
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: node-1:/sda
Brick2: node-1:/sdb
Brick3: node-1:/sdc
Brick4: node-2:/sda
Brick5: node-2:/sdb
Brick6: node-2:/sdc
2. I ran dd if=/dev/zero of=/mountpoint/test.bak bs=1M. I know that test.bak is stored on Brick4, Brick5 and Brick6.
3. While the write was in progress, I killed Brick4, and the write continued normally. After that, I killed Brick5, and the mount point returned an I/O error.
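
The I/O error in step 3 follows from the brick arithmetic of this volume: each 2 + 1 disperse subvolume has redundancy 1, so it can keep serving I/O with one brick down but not with two. Below is a minimal Python sketch of that rule, assuming the usual erasure-coding condition that at least as many bricks as the data-brick count must be up. The function name and the state labels are hypothetical and are not part of GlusterFS.

```python
def subvolume_state(data_bricks: int, redundancy: int, bricks_down: int) -> str:
    """Classify one disperse subvolume by how many of its bricks are down.

    Hypothetical helper for illustration only; it models the counting rule,
    not the actual GlusterFS implementation.
    """
    total = data_bricks + redundancy
    if not 0 <= bricks_down <= total:
        raise ValueError("bricks_down must be between 0 and the brick count")

    bricks_up = total - bricks_down
    if bricks_down == 0:
        return "healthy"
    # Enough fragments remain to reconstruct data, so I/O can continue.
    if bricks_up >= data_bricks:
        return "degraded"
    # Fewer bricks are up than data fragments are required: I/O fails.
    return "unavailable"


# The subvolume holding test.bak in this report is 2 + 1 (Brick4..Brick6).
for down in range(3):
    print(down, "brick(s) down:", subvolume_state(2, 1, down))
```

With Brick4 killed the subvolume is only degraded, which matches the write continuing normally. With Brick5 also killed it drops below the two bricks it needs, which matches the I/O error. The sketch does not explain why the file stays unreadable after the bricks return, which is the actual subject of this bug.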

Actual results:
The whole file can no longer be read or written, even after all bricks come back up.


Expected results:
After the bricks come back up, we can read the data that was written before the I/O error.

Additional info:

Comment 1 jiademing.dd 2014-12-05 07:59:25 UTC
Judging from the disperse source code, it lacks a good mechanism to keep data consistent when a block write or update fails, such as the XFS journal, which provides a transaction for every write or update.

What do you think? Do you plan to do some work on this in the future? @Xavier Hernandez

                                                                           Thanks.

Comment 2 Ashish Pandey 2017-08-24 07:19:21 UTC
Since this bug was raised, a similar test has been run several times and it works fine.

I think this issue has been fixed. If you face this issue again, please open a new bug. Closing this bug for now.

