Red Hat Bugzilla – Bug 1279319
[GlusterD]: Volume start fails post add-brick on a volume which is not started
Last modified: 2017-03-25 10:24:29 EDT
Description of problem:
Volume start fails with "Commit failed" when a brick is added to a volume in the stopped state.
Steps to Reproduce:
1. Have a one-node cluster
2. Create a volume of type Distribute (1*1) // ** DON'T START THE VOLUME **
3. Add a new brick
4. Start the volume now // it will fail
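The steps above can be reproduced roughly as follows with the gluster CLI. The volume name, hostname, and brick paths (`testvol`, `server1`, `/bricks/...`) are placeholders, not taken from the report:

```
# 1. One-node cluster: no peer probe needed.

# 2. Create a plain distribute volume with a single brick; do NOT start it.
gluster volume create testvol server1:/bricks/brick1

# 3. Add a second brick while the volume is still in the Created state.
gluster volume add-brick testvol server1:/bricks/brick2

# 4. Attempt to start the volume; on an affected build this fails:
gluster volume start testvol
# volume start: testvol: failed: Commit failed on localhost.
```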
Volume start is failing with "Commit failed"
Volume start should succeed without any issue.
Root cause analysis (RCA):
The add-brick code path introduced a regression when brick(s) are added to a volume which is not started. Although add-brick reports success, it does not generate the volfiles. As a result, when the brick processes are spawned during volume start, __server_getspec in glusterd fails because the brick volfile does not exist.
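One way to confirm the missing volfile on an affected node is to inspect the glusterd working directory, assuming the default location /var/lib/glusterd (the volume and brick names below are placeholders):

```
# glusterd stores per-brick volfiles under its working directory.
# After the faulty add-brick, list the volume's directory:
ls /var/lib/glusterd/vols/testvol/

# On an affected build, the volfile for the newly added brick
# (named like testvol.server1.bricks-brick2.vol) is absent, so
# __server_getspec has nothing to serve when the brick process
# requests its volfile at volume start.
```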
Additional info to reproduce the issue:
Update the node from a 3.1.1 build to 3.1.2, then follow the remaining steps specified in the Description section.
The add-brick implementation was changed to the v3 framework in upstream glusterfs-3.7.6. If the cluster is running a version equal to or less than GLUSTERFS_3_7_5, we fall back to the older implementation. This bug is in the fallback code, which completes the commit phase without creating the volfiles for the newly added brick.
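Whether a cluster takes the fallback path is governed by the cluster-wide op-version, which can be inspected from the CLI. As a sketch (the numeric value 30705 for GLUSTERFS_3_7_5 follows the usual glusterd op-version encoding and should be double-checked against the running build):

```
# Show the cluster-wide operating version; values at or below the
# op-version corresponding to GLUSTERFS_3_7_5 (30705) select the
# older add-brick implementation containing this bug.
gluster volume get all cluster.op-version
```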
The fix is already part of rhgs-3.1.2 as per comment 6; moving it to ON_QA.
Verified this bug using the build glusterfs-3.8.4-5.el7rhgs.x86_64.
The fix works as expected; the reported issue no longer exists.
Moving to VERIFIED state.