Description of problem:
When more bricks than the redundancy count go down, a persistent write returns an IO error. That result is acceptable, but afterwards the whole file can no longer be read or written, even after all bricks come back up.

Version-Release number of selected component (if applicable): 3.6.1

How reproducible:

Steps to Reproduce:
1. Create a distributed-disperse volume:

Volume Name: test
Type: Distributed-Disperse
Volume ID: 17149c08-fba6-4061-892f-f815aecff1c9
Status: Started
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: node-1:/sda
Brick2: node-1:/sdb
Brick3: node-1:/sdc
Brick4: node-2:/sda
Brick5: node-2:/sdb
Brick6: node-2:/sdc

2. Run `dd if=/dev/zero of=/mountpoint/test.bak bs=1M`; the file test.bak lands on Brick4, Brick5 and Brick6.
3. During the persistent write, kill Brick4; writing continues normally. Then kill Brick5; the mountpoint returns an IO error.

Actual results:
The whole file can no longer be read or written, even after all bricks come back up.

Expected results:
After the bricks come back up, we can read the data that was written before the IO error.

Additional info:
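To make the "2 + 1" redundancy limit concrete, here is a minimal illustrative sketch (a simple XOR parity scheme, not GlusterFS's actual Reed-Solomon erasure code): each subvolume stores 2 data fragments plus 1 parity fragment, so losing any one fragment is recoverable, but losing two, as in step 3 above, exceeds the redundancy.

```python
def encode(data: bytes) -> list:
    """Split data into 2 fragments plus an XOR parity fragment (2 + 1)."""
    half = (len(data) + 1) // 2
    d1 = data[:half]
    d2 = data[half:].ljust(half, b"\0")  # pad so fragments are equal length
    parity = bytes(a ^ b for a, b in zip(d1, d2))
    return [d1, d2, parity]

def decode(fragments: list, length: int) -> bytes:
    """Reconstruct the data from any 2 of the 3 fragments (None = lost brick)."""
    d1, d2, parity = fragments
    if d1 is None:
        d1 = bytes(a ^ b for a, b in zip(d2, parity))  # d1 = d2 XOR parity
    if d2 is None:
        d2 = bytes(a ^ b for a, b in zip(d1, parity))  # d2 = d1 XOR parity
    return (d1 + d2)[:length]

data = b"persistent write"
frags = encode(data)
frags[0] = None                          # one brick down: still recoverable
assert decode(frags, len(data)) == data  # two bricks down would be unrecoverable
```

With two fragments lost there is no equation left to solve, which is why the IO error itself is expected; the bug is only that the file stays unreadable after the bricks return.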
Judging from the disperse source code, it does not have a good mechanism to keep consistency when a block write or update fails, unlike XFS's journal, which provides a transaction for every write or update. What is your view on this? Do you plan to do some work on this in the future? @Xavier Hernandez Thanks.
Since this bug was raised, a similar test has been run several times and it works fine, so I believe this issue has been fixed. If you face this issue again, please open a new bug. Closing this bug for now.