This bug is somewhat similar to Bug 1624738 (the only difference being that heketi-cli blockvolume list does not display the ghost block device IDs) and to https://bugzilla.redhat.com/show_bug.cgi?id=1634745

Description of problem:
++++++++++++++++++++++++++
TC being run = CNS-1285 - Target side failures - Brick failure on block hosting volume

# oc get pods
NAME                                          READY     STATUS    RESTARTS   AGE
cirrosblock1-1-9r7lv                          1/1       Running   0          1h
glusterblock-storage-provisioner-dc-1-cvbx9   1/1       Running   3          4d
glusterfs-storage-g4slk                       1/1       Running   2          4d
glusterfs-storage-jc66v                       1/1       Running   0          3h
glusterfs-storage-rz6zt                       1/1       Running   2          4d
glusterfs-storage-z22n9                       1/1       Running   2          4d
heketi-storage-1-6fwjq                        1/1       Running   3          4d

Steps Performed
-----------------
1. Created 10 BVs (block PVCs) on a 100GB BHV (vol_2b5bc5e6bd4036c82e9e93846d92e13f); all succeeded. Free size left on the BHV = 70GB.
2. Started a loop to create 10 more block PVCs (names: new-block{1..10}); a hypothetical reconstruction of the pvc-create.sh wrapper appears below, just before "Steps to Reproduce":

   for i in {1..10}; do ./pvc-create.sh new-block$i 3; date; sleep 2; done

   Start time = Wed Oct 3 19:12:42 IST 2018
   End time   = Wed Oct 3 19:13:02 IST 2018
3. Immediately after starting step 2, killed 2 bricks of the BHV.
4. As expected, the PVCs stayed in Pending state, since 2 bricks were down.

Issue seen in current build:
=============================
5. Checked the heketi logs: "ghost" BV IDs were still being created, and this ultimately brought the free space of the BHV vol_2b5bc5e6bd4036c82e9e93846d92e13f down from 70GB to 1GB.
6. Once the first BHV had only 1GB free due to the creation of these numerous ghost IDs (as seen in the heketi logs), a new BHV (vol_cf75072b5ef16c3b8e85f5fd3b4cab58) was created.
7. The pending PVC requests were then fulfilled from the new BHV and all went to Bound state, even though the first BHV still had 2 bricks down.
8. The free space at the gluster backend does not match the free space heketi reports for the first BHV.
9. For BHV vol_2b5bc5e6bd4036c82e9e93846d92e13f, 2 pods show 32GB used but one pod (glusterfs-storage-g4slk) shows 23GB. This mismatch between the 3 bricks of the same volume is also a bug.

Observations for similar test steps in earlier versions of heketi
===================================================================
Once 2 bricks of a BHV were killed, the PVC requests stayed in Pending state until the BHV bricks were brought back up. No in-flight or ghost BV IDs were seen in the heketi db dump. Also, a second BHV was never created, since the free space of the first BHV (whose 2 bricks were down) was still intact.

Version-Release number of selected component (if applicable):
+++++++++++++++++++++++++++++++++++++++++++++++++
OC version = v3.11.15

Heketi version from the heketi pod
++++++++
sh-4.2# rpm -qa|grep heketi
heketi-client-7.0.0-13.el7rhgs.x86_64
heketi-7.0.0-13.el7rhgs.x86_64

Heketi client version from the master node
+++++
# rpm -qa|grep heketi
heketi-client-7.0.0-13.el7rhgs.x86_64

Gluster version
++++++
sh-4.2# rpm -qa|grep gluster
glusterfs-libs-3.12.2-18.1.el7rhgs.x86_64
glusterfs-3.12.2-18.1.el7rhgs.x86_64
glusterfs-api-3.12.2-18.1.el7rhgs.x86_64
python2-gluster-3.12.2-18.1.el7rhgs.x86_64
glusterfs-fuse-3.12.2-18.1.el7rhgs.x86_64
glusterfs-server-3.12.2-18.1.el7rhgs.x86_64
gluster-block-0.2.1-27.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-18.1.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.1.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.1.el7rhgs.x86_64

sh-4.2# rpm -qa|grep tcmu-runner
tcmu-runner-1.2.0-25.el7rhgs.x86_64

How reproducible:
++++++++++
Tried only once so far.
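For anyone re-running the steps: the pvc-create.sh wrapper used above was not attached to this report, so the following is only a hypothetical reconstruction of what such a wrapper typically looks like. The storage class name "glusterfs-storage-block" is a placeholder, and the argument order (<pvc-name> <size-in-GB>) is inferred from the loop in step 2.

# pvc-create.sh <pvc-name> <size-in-GB>
# NOTE: hypothetical reconstruction, not the actual script from this run;
# "glusterfs-storage-block" is a placeholder storage class name.
cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  storageClassName: glusterfs-storage-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${2}Gi
EOF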
Steps to Reproduce:
+++++++++++++++++
1. Create a few block PVCs and check the free size of the BHV.
2. Start a loop to create around 10 PVCs.
3. Immediately kill 2 bricks of the BHV and check the PVC status.
4. Check the heketi logs for the spurious BVs still being created, and check that their entries are present in the heketi db dump:
   # heketi-cli server operations info
5. Check the used space of the BHV from heketi - it shows all the space used up.
6. Check that a new BHV is created and the pending PVCs are carved from this new BHV.
(A command sketch for steps 3-6 follows the "Expected results" section below.)

Actual results:
+++++++++++
Since the 2 bricks of the BHV are down, the BHV gets filled up with "ghost" BV IDs, and ultimately a new BHV is created to service the pending PVCs. Thus, the space in the first BHV is assumed to be fully used up, even though the gluster backend still reflects enough free space.

Expected results:
++++++++++++
As seen in older runs of these test cases, the PVCs should stay in Pending state and no spurious BVs should appear in the heketi db dump, so the space of the original BHV is kept intact. Once the bricks are brought back online, the pending PVCs should be carved out of the same first BHV.
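A sketch of how the state above can be cross-checked. The volume ID is the one from this report; the pod names are placeholders, and /var/lib/heketi/mounts is assumed as the usual brick mount root, which may differ per setup.

# Pending/stale operations and block volumes as heketi records them
# (run from the heketi pod, or anywhere heketi-cli is configured)
heketi-cli server operations info
heketi-cli blockvolume list

# Free size of the first BHV as heketi sees it
heketi-cli volume info vol_2b5bc5e6bd4036c82e9e93846d92e13f

# Brick status and PIDs on the gluster backend; killing the listed
# brick PIDs is one way to take bricks down for step 3
oc rsh <glusterfs-pod> gluster volume status vol_2b5bc5e6bd4036c82e9e93846d92e13f

# Actual brick usage as seen from each glusterfs pod, to compare
# against heketi's view of the BHV free space
oc rsh <glusterfs-pod> df -h /var/lib/heketi/mounts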
Proposing this bug as a blocker. QE feels this is a release blocker, as it falls under the heketi stability category. We've had several unhappy customers for the same reason, and providing a clean-up script isn't really helping.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0286
I see that a test case has already been added for this, so setting the qe_test_coverage flag to '+'.