Bug 1634745 - Used space in BHV exceeds the size of the total block devices when tcmu-runner is DOWN on 1 node, during pvc creation
Summary: Used space in BHV exceeds the size of the total block devices when tcmu-runne...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 3.11.1
Assignee: John Mulligan
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On: 1641668
Blocks: OCS-3.11.1-devel-triage-done 1644154
 
Reported: 2018-10-01 14:05 UTC by Neha Berry
Modified: 2019-02-07 10:22 UTC (History)
CC List: 11 users

Fixed In Version: heketi-8.0.0-4.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 03:41:00 UTC
Embargoed:


Attachments (Terms of Use)
heketilogs_jan10 (5.29 MB, text/plain)
2019-01-10 08:54 UTC, krishnaram Karthick
no flags Details
dbdump_jan10 (127.32 KB, text/plain)
2019-01-10 09:01 UTC, krishnaram Karthick
no flags Details
dmdump attached as part of comment 33 (50.31 KB, application/zip)
2019-01-21 09:53 UTC, krishnaram Karthick
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:0286 0 None None None 2019-02-07 03:41:26 UTC

Description Neha Berry 2018-10-01 14:05:57 UTC
Used space in BHV exceeds the size of the total block devices when tcmu-runner is DOWN on 1 node , during pvc creation

This bug is somewhat similar to Bug 1624738 (the only exception: heketi-cli blockvolume list doesn't display the ghost block device IDs)

Description of problem:
++++++++++++++++++++++++++
We were testing the fix of BZ#1476730 in 3.10.0-862.14.4.el7.x86_64. 

While running a loop to create only 4 block pvcs of size 3 GB each (a BHV with one 3 GB BV already existed), we killed the tcmu-runner service on one of the 3 glusterfs pods.
As expected, the pvcs stayed in pending state. The heketi logs showed that heketi kept trying to create BVs, ultimately consuming all the space of the 100 GB BHV.
The tcmu-runner service was restored successfully, but we saw the following mismatches:

1. heketi-cli volume list displays 1 BHV, but gluster v list displays 2 BHVs.

2. Even though only one 3 GB BV exists on the 100 GB BHV, the free space in the BHV is displayed as only 2 GB.

3. The heketi-cli db dump lists IDs of many BVs which are not present in either the heketi-cli output or the gluster backend.

4. df -kh on the glusterfs pods still shows 94G free on the same BHV, whereas heketi shows only 2 GB free.

5. heketi-cli blockvolume list displays 1 BV, but gluster-block list displays 2 BVs.

6. Even now that the tcmu-runner service is UP and all three glusterfs pods are UP again, the 4 pending pvc creations never completed.

7. heketi logs still keep showing error messages for BV creates, and the db dump shows innumerable ghost IDs.

Detailed outputs are provided in the next comment.
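A quick way to surface such mismatches is to compare heketi's bookkeeping with the gluster backend directly. A minimal sketch, assuming the commands are run from the heketi pod / a gluster pod as appropriate; the BHV name is a placeholder:

```
# Compare heketi's view with what gluster actually has.
heketi-cli volume list              # block hosting volumes known to heketi
gluster volume list                 # volumes actually present on the gluster backend
heketi-cli blockvolume list         # block volumes known to heketi
gluster-block list vol_<BHV-id>     # block volumes actually present on a given BHV
heketi-cli db dump                  # raw db contents, including any ghost/pending entries
```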




Version-Release number of selected component (if applicable):
+++++++++++++++++++++++++++++++++++++++++++++++++

OC version = v3.11.15

Heketi version from heketi pod =
++++++++
sh-4.2# rpm -qa|grep heketi
heketi-client-7.0.0-13.el7rhgs.x86_64
heketi-7.0.0-13.el7rhgs.x86_64



Heketi client version from master node 
+++++
# rpm -qa|grep heketi
heketi-client-7.0.0-13.el7rhgs.x86_64
 

Gluster version
++++++

sh-4.2# rpm -qa|grep gluster
glusterfs-libs-3.12.2-18.1.el7rhgs.x86_64
glusterfs-3.12.2-18.1.el7rhgs.x86_64
glusterfs-api-3.12.2-18.1.el7rhgs.x86_64
python2-gluster-3.12.2-18.1.el7rhgs.x86_64
glusterfs-fuse-3.12.2-18.1.el7rhgs.x86_64
glusterfs-server-3.12.2-18.1.el7rhgs.x86_64
gluster-block-0.2.1-27.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-18.1.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.1.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.1.el7rhgs.x86_64


sh-4.2# rpm -qa|grep tcmu-runner
tcmu-runner-1.2.0-25.el7rhgs.x86_64
sh-4.2# 




How reproducible:
++++++++++
Tried only once as of now.

Steps to Reproduce:
+++++++++++++++++
1. Create a block pvc, which in turn creates a heketi BHV and a heketi BV.
2. Start a for loop to create 4 more pvcs - block2 to block5 (a minimal sketch of such a loop is shown after this list).
3. While the block2 pvc creation is not yet complete, kill the tcmu-runner process in one gluster pod - Mon Oct  1 11:54:43 UTC 2018.
4. The pvc creations go into pending state. Check the heketi logs.
5. Bring up the services in the pod - Mon Oct  1 12:03:53 UTC 2018.
6. Check for mismatches and other issues. It is seen that ghost device IDs are listed in the heketi-cli db dump and the db appears to have become inconsistent.
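For reference, a minimal reproduction sketch. The PVC names, storage class name (glusterfs-storage-block) and gluster pod name are illustrative, not taken from this report, and the gluster pods are assumed to run systemd so the service can be stopped with systemctl:

```
# Create 4 more 3 GiB block PVCs in a loop (storage class name is an assumption).
for i in 2 3 4 5; do
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block${i}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: glusterfs-storage-block
EOF
done

# While the creations are still pending, stop tcmu-runner in one gluster pod
# (pod name is illustrative).
oc exec glusterfs-storage-abc12 -- systemctl stop tcmu-runner

# Later, restore the services; as noted later in the bug, gluster-blockd may
# also need to be started.
oc exec glusterfs-storage-abc12 -- systemctl start tcmu-runner gluster-blockd
```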





Actual results:
+++++++++++

When we killed tcmu-runner on one pod while pvc creation was in progress (equivalent to having only 2 pods available for BV creation), the process resulted in a lot of mismatches, and the contents of heketi and the gluster backend became inconsistent.

Expected results:
++++++++++++

While tcmu-runner was down, the pvc creations should stay in pending state, but once it was restored, the 4 pvcs should have been created. No mismatch should be seen between the gluster and heketi command outputs for either BHVs or BVs.

Comment 25 krishnaram Karthick 2019-01-10 08:48:33 UTC
The issue reported in this bug is still seen; there is no change in behavior after moving to the heketi container image which has the fix.

steps:
1) on a block hosting volume with a capacity of 100 GB, created pvcs in a loop
2) on one of the gluster pods (glusterfs-storage-vlrs7), stopped the tcmu-runner service
3) after all pvcs went into pending state, the tcmu-runner service was restored

heketi version:
oc rsh heketi-storage-1-fttfj 
sh-4.2# rpm -qa | grep 'heketi'
heketi-client-8.0.0-7.el7rhgs.x86_64
heketi-8.0.0-7.el7rhgs.x86_64


At this point, there is no free space left in the block hosting volume, although all 5 block devices are only 1 GB each.

heketi-cli volume info 40f08344b8f2c469f62577d11960e390
Name: vol_40f08344b8f2c469f62577d11960e390
Size: 100
Volume Id: 40f08344b8f2c469f62577d11960e390
Cluster Id: 9f295b5ab4a965e22f6d24cf73970f01
Mount: 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390
Mount Options: backup-volfile-servers=10.70.47.196,10.70.46.72
Block: true
Free Size: 0
Reserved Size: 2
Block Hosting Restriction: (none)
Block Volumes: [4412ec956dcac182a6fa1ab92bec9cb4 4d73f4cb2b18e04712d52d30549366e2 a466f48423d82c61ccae9f8a5e00987d aae914c61ed80b3a13ffa748a8434ccd c7723101c4ebad953ec7ae0cd064697a]
Durability Type: replicate
Distributed+Replica: 3

New blockvolume creation fails. 

#heketi-cli blockvolume create --size=1
Error: Failed to allocate new block volume: No space

df -kh output confirms that free space is in fact available:
10.70.47.92:vol_40f08344b8f2c469f62577d11960e390  100G  6.1G   94G   7% /mnt

Moving the bug back to assigned. This issue is fairly straightforward to reproduce, so I'm not sharing the setup details. heketi logs will be attached.

Comment 26 krishnaram Karthick 2019-01-10 08:54:05 UTC
Created attachment 1519674 [details]
heketilogs_jan10

Comment 27 krishnaram Karthick 2019-01-10 09:01:13 UTC
Created attachment 1519675 [details]
dbdump_jan10

Comment 28 Michael Adam 2019-01-10 10:22:11 UTC
@Karthick,

since the auto-cleanup is mentioned as the fix for this, the situation is not expected to be fixed immediately after it has occurred, but instead:

1) after the timeout interval of the auto-cleanup loop, which defaults to one hour - so it could take up to an hour; or
2) after manually running `heketi-cli server operations cleanup`.

Please do the following:

1) capture the db (by doing `heketi-cli db dump`)
2) then run `heketi-cli server operations cleanup`
3) check whether the problem is gone
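For reference, a minimal sketch of that sequence; the dump file name is illustrative, and the volume ID is the BHV from comment 25:

```
# 1) Capture the current db state for later comparison (file name is illustrative).
heketi-cli db dump > /tmp/heketi-db-before-cleanup.json

# 2) Trigger the pending-operation cleanup manually instead of waiting for the
#    hourly auto-cleanup loop.
heketi-cli server operations cleanup

# 3) Verify that the BHV's Free Size is consistent again.
heketi-cli volume info 40f08344b8f2c469f62577d11960e390
```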

Thanks - Michael

Comment 29 Michael Adam 2019-01-11 16:24:23 UTC
(In reply to krishnaram Karthick from comment #25)
> The issue reported in the bug in still seen and there is no change in
> behavior after moving to the heketi container image which has the fix.
> 
> steps:
> 1) on block hosting volume with a capacity of 100GB, created pvcs in a loop
> 2) on one of the gluster pods (glusterfs-storage-vlrs7), stopped tcmu-runner
> service
> 3) after all pvc went into pending state, tcmu-runner service was recovered
> 
> heketi version:
> oc rsh heketi-storage-1-fttfj 
> sh-4.2# rpm -qa | grep 'heketi'
> heketi-client-8.0.0-7.el7rhgs.x86_64
> heketi-8.0.0-7.el7rhgs.x86_64
> 
> 
> At this point, there is no free size in the block hosting volume although
> all the 5 block devices are of size 1 gb. 

I had another look at the attached DB (just realized that you had a DB attached...)
I actually took it, imported it locally, and ran heketi and a few tools and tests.

It does not seem to align at all with your statement above:

There are 81 block volumes, distributed across 2 BHVs.
Furthermore there are two pending BV create ops, one
with a BHV create. 

The size counts of the two existing BHVs are perfectly fine:

```
$ heketi-cli volume info 40f08344b8f2c469f62577d11960e390                                                                                                      
Name: vol_40f08344b8f2c469f62577d11960e390
Size: 100                                                                                                                                                                                                                                     
Volume Id: 40f08344b8f2c469f62577d11960e390
Cluster Id: 9f295b5ab4a965e22f6d24cf73970f01
Mount: 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390
Mount Options: backup-volfile-servers=10.70.47.196,10.70.46.72
Block: true
Free Size: 52
Reserved Size: 2
Block Hosting Restriction: (none)
Block Volumes: [005250ab116fa72c5f592628f05677e0 016c4e4c1cfc6bfb0ad53dd83d49c95e 0180ea974272467f4a6bf0e9ade75cb1 06a7abdfc54bec432adf409ee07d5313 083fee8254ad5e0f6e2277b77addcf5f 0d72aaa9a76cd0f80cf1d05425d36985 1e2b3c191c5376f1b49056d23
e164c0c 1e8edad9a2067c87bcca57e3efb2c263 21018efda8c64aa39836e253fea1178f 2a5cb67910af9cee0fb83f84a5c453d6 2b6809aec55b2282258242c33bdef83b 2f37509ea5eda3e41345cd764e795a40 31fdc02be250a8079f2f64723d836fa7 335306dfdb3af39edf3afa80840521a4
37a14ce1512f4044b152f5fb6f5a746e 4412ec956dcac182a6fa1ab92bec9cb4 45f218daf5ed1e1ee2a4bba222a6c08c 47165cd1fd28983cfca968f984d6894c 4d73f4cb2b18e04712d52d30549366e2 53b8616186c72fb51522fdceeff83485 57f64c520f70838578ec2e986188ef0a 5b3492350e5f16d824893142cad5e4e4 63bd1118bf9d36af763f26558a51a3e2 6ec88940bfef9f8bdc6cf90dbc04b0cf 8eeaab313771d363bcad7fa984be6217 9518e0f1df331269b42c60bd3f29673e 96e6ec12a562c944792a0b84391ad861 a466f48423d82c61ccae9f8a5e00987d a4f31951ecda299f
d75b9900be6063bb aae914c61ed80b3a13ffa748a8434ccd b43a291c5431c636d2b71f0650ff3fba b4d67653ef5842ddfdf21e30971e3485 b683056c344c27be9a40ab96f50ce15d bed791367f2f8bca01c3e2defa746a0a c3aa0eb0be9f6f69a38aa1441174a3f9 c7723101c4ebad953ec7ae0cd064697a c9b73c67b700844d5349638709ea4d9d c9e275973e46c8ffa7420a2b774ba7fd cbce9bb51ed4a1c3122122847b20be26 d6b31aaf3724006d63d9ca4fba7b84a9 dbb5c5071de071df26c650a0757ece31 e203fe042b12836da36954dd97c1c2cb e545041bbcb079cf797c2036a923a414 eab9bddd9b6941ee1a6ff22e4241960e faec1345516ffd6fcb9eb6c2a9e0c5ed fe7b46575778911a780d6a719117f1e5]                                                                                                                                          
Durability Type: replicate                                                                                                                                                                                                                    
Distributed+Replica: 3

$ heketi-cli volume info 40f08344b8f2c469f62577d11960e390
Name: vol_40f08344b8f2c469f62577d11960e390
Size: 100
Volume Id: 40f08344b8f2c469f62577d11960e390
Cluster Id: 9f295b5ab4a965e22f6d24cf73970f01
Mount: 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390
Mount Options: backup-volfile-servers=10.70.47.196,10.70.46.72
Block: true
Free Size: 52
Reserved Size: 2
Block Hosting Restriction: (none)
Block Volumes: [005250ab116fa72c5f592628f05677e0 016c4e4c1cfc6bfb0ad53dd83d49c95e 0180ea974272467f4a6bf0e9ade75cb1 06a7abdfc54bec432adf409ee07d5313 083fee8254ad5e0f6e2277b77addcf5f 0d72aaa9a76cd0f80cf1d05425d36985 1e2b3c191c5376f1b49056d23
e164c0c 1e8edad9a2067c87bcca57e3efb2c263 21018efda8c64aa39836e253fea1178f 2a5cb67910af9cee0fb83f84a5c453d6 2b6809aec55b2282258242c33bdef83b 2f37509ea5eda3e41345cd764e795a40 31fdc02be250a8079f2f64723d836fa7 335306dfdb3af39edf3afa80840521a4
37a14ce1512f4044b152f5fb6f5a746e 4412ec956dcac182a6fa1ab92bec9cb4 45f218daf5ed1e1ee2a4bba222a6c08c 47165cd1fd28983cfca968f984d6894c 4d73f4cb2b18e04712d52d30549366e2 53b8616186c72fb51522fdceeff83485 57f64c520f70838578ec2e986188ef0a 5b349235
0e5f16d824893142cad5e4e4 63bd1118bf9d36af763f26558a51a3e2 6ec88940bfef9f8bdc6cf90dbc04b0cf 8eeaab313771d363bcad7fa984be6217 9518e0f1df331269b42c60bd3f29673e 96e6ec12a562c944792a0b84391ad861 a466f48423d82c61ccae9f8a5e00987d a4f31951ecda299f
d75b9900be6063bb aae914c61ed80b3a13ffa748a8434ccd b43a291c5431c636d2b71f0650ff3fba b4d67653ef5842ddfdf21e30971e3485 b683056c344c27be9a40ab96f50ce15d bed791367f2f8bca01c3e2defa746a0a c3aa0eb0be9f6f69a38aa1441174a3f9 c7723101c4ebad953ec7ae0c
d064697a c9b73c67b700844d5349638709ea4d9d c9e275973e46c8ffa7420a2b774ba7fd cbce9bb51ed4a1c3122122847b20be26 d6b31aaf3724006d63d9ca4fba7b84a9 dbb5c5071de071df26c650a0757ece31 e203fe042b12836da36954dd97c1c2cb e545041bbcb079cf797c2036a923a414
 eab9bddd9b6941ee1a6ff22e4241960e faec1345516ffd6fcb9eb6c2a9e0c5ed fe7b46575778911a780d6a719117f1e5]
Durability Type: replicate
Distributed+Replica: 3

```

If I run heketi here, then after a while the two pending ops are gone.

If the server (heketi) was not restarted, it would take somewhat longer
until the cleanup is triggered. Hence the recommendation to run the
cleanup manually, to see whether this was in principle ok.


I guess this is what happened:
==============================

When you brought down the tcmu-runner service, a few BV create requests
went into pending state. They all nominally subtract space from the BHV
already, but not so much that the whole space of the BHV should be consumed
(since, due to throttling, we should have at most 8 BV create requests
pending at a time).

When you re-enabled the tcmu-runner service, the BV creates that were
still pending and had not failed continued due to heketi's internal
retry loop, and would have succeeded. (The BV-with-BHV create
request seems to have gone into failed state, though.) Kube's retry
would probably also keep filing new BV create requests.

I don't have a full explanation why you would have seen that 0 free space.
A dump of the DB at the time when the zero free size was seen might
have been helpful. The dumps provided seem to be from some later time
when the system had already (mostly) recovered.

The GOOD news is that the system has recovered!

I would leave it up to JohnM to comment here too, maybe he
has an explanation of the 0 free size.



karthick:

In case you run this again, could you please do the following:

1) capture the DB right before you re-enable the tcmu-runner
2) capture the db shortly after you re-enable the tcmu-runner
3) run the `heketi-cli server operations cleanup` after enabling the tcmu-runner

Cheers - Michael


> heketi-cli volume info 40f08344b8f2c469f62577d11960e390
> Name: vol_40f08344b8f2c469f62577d11960e390
> Size: 100
> Volume Id: 40f08344b8f2c469f62577d11960e390
> Cluster Id: 9f295b5ab4a965e22f6d24cf73970f01
> Mount: 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390
> Mount Options: backup-volfile-servers=10.70.47.196,10.70.46.72
> Block: true
> Free Size: 0
> Reserved Size: 2
> Block Hosting Restriction: (none)
> Block Volumes: [4412ec956dcac182a6fa1ab92bec9cb4
> 4d73f4cb2b18e04712d52d30549366e2 a466f48423d82c61ccae9f8a5e00987d
> aae914c61ed80b3a13ffa748a8434ccd c7723101c4ebad953ec7ae0cd064697a]
> Durability Type: replicate
> Distributed+Replica: 3
> 
> New blockvolume creation fails. 
> 
> #heketi-cli blockvolume create --size=1
> Error: Failed to allocate new block volume: No space
> 
> df -kh output indeed confirms that the free space is available 
> 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390  100G  6.1G   94G   7% /mnt
> 
> Moving the bug back to assigned. This issue is fairly staright forward to
> reproduce, so I'm not sharing the setup details. heketi logs shall be
> attached.

Comment 30 krishnaram Karthick 2019-01-16 12:34:46 UTC
@Michael, 

I had a look at the system after a couple of hours and noticed that the auto-cleanup had triggered and the cleanup had taken place.

1) All pvcs in pending state had completed and the volumes were created
2) No inconsistency in the db

However, an additional BHV was created. This is probably because the first BHV had run out of (accounted) space, so a new BHV was created for the subsequent block devices.

I'm leaving the needinfo flag on me to get the details you've requested.

Comment 31 John Mulligan 2019-01-16 14:03:48 UTC
We discussed the reason for an "extra" BHV earlier today and I think I have an understanding of why that happened. Note that timing is a factor here.

I'm visualizing the block hosting volume with 3 pre-existing block volumes as such:
[ooo___]

Successfully allocating another block volume gives us:
[oooo__]

Now assume that tcmu-runner is stopped and gluster-block commands begin to fail. Because gluster-block is "broken" in such a way that heketi can't roll back the creation, heketi can't confirm that the space was released; the operation therefore remains pending, and the space allocated for these failed block volumes remains in the db. Two more block requests come in that fail:
[ooooxx]

Now we have a full BHV and the next request for a block volume will not fit in the existing BHV and heketi will allocate a new one:
[ooooxx]
[x_____] (bhv pending)

As long as gluster-block commands fail, heketi will either reject the request (because there's already a pending BHV create) or create and then fail (and leave pending) block + BHV creates:
[ooooxx]
[x_____] x N (bhv pending)

Now let's assume that tcmu-runner is returned to a working state, but cleanup has not run yet and a new block request has been sent. Because the 1st BHV is full and all others are pending/failed, heketi will create a new BHV + block volume, but this time it will succeed:

[ooooxx]
[x_____] x N (bhv pending)
[o_____]

Then let's assume cleanup runs and completes all cleanups successfully. We'd have a layout that now looks like:
[oooo__]
[o_____]

That is, two BHVs both with free space on them.

Comment 33 krishnaram Karthick 2019-01-21 09:52:38 UTC
(In reply to Michael Adam from comment #29)
> (In reply to krishnaram Karthick from comment #25)
> > The issue reported in the bug in still seen and there is no change in
> > behavior after moving to the heketi container image which has the fix.
> > 
> > steps:
> > 1) on block hosting volume with a capacity of 100GB, created pvcs in a loop
> > 2) on one of the gluster pods (glusterfs-storage-vlrs7), stopped tcmu-runner
> > service
> > 3) after all pvc went into pending state, tcmu-runner service was recovered
> > 
> > heketi version:
> > oc rsh heketi-storage-1-fttfj 
> > sh-4.2# rpm -qa | grep 'heketi'
> > heketi-client-8.0.0-7.el7rhgs.x86_64
> > heketi-8.0.0-7.el7rhgs.x86_64
> > 
> > 
> > At this point, there is no free size in the block hosting volume although
> > all the 5 block devices are of size 1 gb. 
> 
> I had another look at the attached DB (just realized that you had a DB
> attached...)
> I actually took it, imported it locally, and ran heketi and a few tools and
> tests.
> 
> It does not seem to align at all with your statement above:
> 
> There are 81 block volumes, distributed across 2 BHVs.
> Furthermore there are two pending BV create ops, one
> with a BHV create. 
> 
> The size counts of the two existing BHVs are perfectly fine:
> 
> ```
> $ heketi-cli volume info 40f08344b8f2c469f62577d11960e390                   
> 
> Name: vol_40f08344b8f2c469f62577d11960e390
> Size: 100                                                                   
> 
> Volume Id: 40f08344b8f2c469f62577d11960e390
> Cluster Id: 9f295b5ab4a965e22f6d24cf73970f01
> Mount: 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390
> Mount Options: backup-volfile-servers=10.70.47.196,10.70.46.72
> Block: true
> Free Size: 52
> Reserved Size: 2
> Block Hosting Restriction: (none)
> Block Volumes: [005250ab116fa72c5f592628f05677e0
> 016c4e4c1cfc6bfb0ad53dd83d49c95e 0180ea974272467f4a6bf0e9ade75cb1
> 06a7abdfc54bec432adf409ee07d5313 083fee8254ad5e0f6e2277b77addcf5f
> 0d72aaa9a76cd0f80cf1d05425d36985 1e2b3c191c5376f1b49056d23
> e164c0c 1e8edad9a2067c87bcca57e3efb2c263 21018efda8c64aa39836e253fea1178f
> 2a5cb67910af9cee0fb83f84a5c453d6 2b6809aec55b2282258242c33bdef83b
> 2f37509ea5eda3e41345cd764e795a40 31fdc02be250a8079f2f64723d836fa7
> 335306dfdb3af39edf3afa80840521a4
> 37a14ce1512f4044b152f5fb6f5a746e 4412ec956dcac182a6fa1ab92bec9cb4
> 45f218daf5ed1e1ee2a4bba222a6c08c 47165cd1fd28983cfca968f984d6894c
> 4d73f4cb2b18e04712d52d30549366e2 53b8616186c72fb51522fdceeff83485
> 57f64c520f70838578ec2e986188ef0a 5b3492350e5f16d824893142cad5e4e4
> 63bd1118bf9d36af763f26558a51a3e2 6ec88940bfef9f8bdc6cf90dbc04b0cf
> 8eeaab313771d363bcad7fa984be6217 9518e0f1df331269b42c60bd3f29673e
> 96e6ec12a562c944792a0b84391ad861 a466f48423d82c61ccae9f8a5e00987d
> a4f31951ecda299f
> d75b9900be6063bb aae914c61ed80b3a13ffa748a8434ccd
> b43a291c5431c636d2b71f0650ff3fba b4d67653ef5842ddfdf21e30971e3485
> b683056c344c27be9a40ab96f50ce15d bed791367f2f8bca01c3e2defa746a0a
> c3aa0eb0be9f6f69a38aa1441174a3f9 c7723101c4ebad953ec7ae0cd064697a
> c9b73c67b700844d5349638709ea4d9d c9e275973e46c8ffa7420a2b774ba7fd
> cbce9bb51ed4a1c3122122847b20be26 d6b31aaf3724006d63d9ca4fba7b84a9
> dbb5c5071de071df26c650a0757ece31 e203fe042b12836da36954dd97c1c2cb
> e545041bbcb079cf797c2036a923a414 eab9bddd9b6941ee1a6ff22e4241960e
> faec1345516ffd6fcb9eb6c2a9e0c5ed fe7b46575778911a780d6a719117f1e5]          
> 
> Durability Type: replicate                                                  
> 
> Distributed+Replica: 3
> 
> $ heketi-cli volume info 40f08344b8f2c469f62577d11960e390
> Name: vol_40f08344b8f2c469f62577d11960e390
> Size: 100
> Volume Id: 40f08344b8f2c469f62577d11960e390
> Cluster Id: 9f295b5ab4a965e22f6d24cf73970f01
> Mount: 10.70.47.92:vol_40f08344b8f2c469f62577d11960e390
> Mount Options: backup-volfile-servers=10.70.47.196,10.70.46.72
> Block: true
> Free Size: 52
> Reserved Size: 2
> Block Hosting Restriction: (none)
> Block Volumes: [005250ab116fa72c5f592628f05677e0
> 016c4e4c1cfc6bfb0ad53dd83d49c95e 0180ea974272467f4a6bf0e9ade75cb1
> 06a7abdfc54bec432adf409ee07d5313 083fee8254ad5e0f6e2277b77addcf5f
> 0d72aaa9a76cd0f80cf1d05425d36985 1e2b3c191c5376f1b49056d23
> e164c0c 1e8edad9a2067c87bcca57e3efb2c263 21018efda8c64aa39836e253fea1178f
> 2a5cb67910af9cee0fb83f84a5c453d6 2b6809aec55b2282258242c33bdef83b
> 2f37509ea5eda3e41345cd764e795a40 31fdc02be250a8079f2f64723d836fa7
> 335306dfdb3af39edf3afa80840521a4
> 37a14ce1512f4044b152f5fb6f5a746e 4412ec956dcac182a6fa1ab92bec9cb4
> 45f218daf5ed1e1ee2a4bba222a6c08c 47165cd1fd28983cfca968f984d6894c
> 4d73f4cb2b18e04712d52d30549366e2 53b8616186c72fb51522fdceeff83485
> 57f64c520f70838578ec2e986188ef0a 5b349235
> 0e5f16d824893142cad5e4e4 63bd1118bf9d36af763f26558a51a3e2
> 6ec88940bfef9f8bdc6cf90dbc04b0cf 8eeaab313771d363bcad7fa984be6217
> 9518e0f1df331269b42c60bd3f29673e 96e6ec12a562c944792a0b84391ad861
> a466f48423d82c61ccae9f8a5e00987d a4f31951ecda299f
> d75b9900be6063bb aae914c61ed80b3a13ffa748a8434ccd
> b43a291c5431c636d2b71f0650ff3fba b4d67653ef5842ddfdf21e30971e3485
> b683056c344c27be9a40ab96f50ce15d bed791367f2f8bca01c3e2defa746a0a
> c3aa0eb0be9f6f69a38aa1441174a3f9 c7723101c4ebad953ec7ae0c
> d064697a c9b73c67b700844d5349638709ea4d9d c9e275973e46c8ffa7420a2b774ba7fd
> cbce9bb51ed4a1c3122122847b20be26 d6b31aaf3724006d63d9ca4fba7b84a9
> dbb5c5071de071df26c650a0757ece31 e203fe042b12836da36954dd97c1c2cb
> e545041bbcb079cf797c2036a923a414
>  eab9bddd9b6941ee1a6ff22e4241960e faec1345516ffd6fcb9eb6c2a9e0c5ed
> fe7b46575778911a780d6a719117f1e5]
> Durability Type: replicate
> Distributed+Replica: 3
> 
> ```
> 
> If I run heketi here, then after a while the two pending ops are gone.
> 
> If the server (heketi) was not restarted, then it would take some longer
> until the cleanup would be triggered. Hence the recommendation to run
> cleanup manually to see if this was in principle ok.
> 
> 
> I guess this is what happened:
> ==============================
> 
> When you brought down the tcmu-runner service, a few BV create requests
> went into pending state. They all nominally already subtract space from the
> BHVs. But not so that the whole space of the BHV should be consumed.
> (Since due to throttling, we should at most have 8 BV create requests
> pending at a time.)
> 
> When you re-enabled the tcmu-runner service, the pending BV creates
> that were pending and not failed, were continuing due to their heketi
> internal retry loop, and would have succeeded. (The BV with BHV create 
> request seems to have gone into failed state though.) Kube's retry 
> would probly also keep filing new BV create request.
> 
> I don't have a full explanation why you would have seen that 0 free space.
> A dump of the DB at the time when the zero free size was seen might
> have been helpful. The dumps provided seem to be from some later time
> when the system had already (mostly) recovered.
> 
> The GOOD news is that the system has recovered!
> 
> I would leave it up to JohnM to comment here too, maybe he
> has an explanation of the 0 free size.
> 
> 
> 
> karthick:
> 
> In case you run this again, could you please do the following:
> 
> 1) capture the DB right before you re-enable the tcmu-runner

Available as file 'right_before_tcmu-runner_re-enabled' in the attached zip file

> 2) capture the db shortly after you re-enable the tcmu-runner

Available as file 'shortly_after_tcmu-runner_re-enabled' in the attached zip file

> 3) run the `heketi-cli server operations cleanup` after enabling the
> tcmu-runner

After recovering from all the failures and running this command, we have 2 BHVs with the correct used-space accounting. However, the additional BHV that was created is unsolicited. The dbdump is available as file 'dbdump_after_server_operations_cleanup'

Also, please note that gluster-blockd had failed after tcmu-runner was brought down, and I had to start gluster-blockd too. I've attached a dbdump taken after gluster-blockd was enabled as well; it is available as file 'dbdump_aftergluster-blockdstart'

Thanks,
Karthick

> 
> Cheers - Michael
> 
>

Comment 34 krishnaram Karthick 2019-01-21 09:53:44 UTC
Created attachment 1522066 [details]
dmdump attached as part of comment 33

Comment 38 Michael Adam 2019-01-22 13:58:32 UTC
To summarize:

* The used space issue is fixed.
* With the way gluster-block currently deals with the tcmu-runner being down, this is as good as it gets. It's working as designed.
  (You will get the 2nd BHV for the reasons explained, but not worse than that.)

I agree it may be surprising and unexpected, but it is not the topic of this BZ; to avoid it, we'd need to change gluster-block in a way that is not 100% clear to me yet.
If you want to raise it, I suggest we create a BZ on gluster-block.

Karthick, can we please move this back to ON_QA?

Comment 39 krishnaram Karthick 2019-01-23 07:17:22 UTC
(In reply to Michael Adam from comment #38)
> To summarize:
> 
> * The used space issue is fixed.
> * With the way gluster-block currently deals with the tcmu-runner being
> down, this is as good as it gets. It's working as designed.
>   (You will get the 2nd BHV for the reasons explained, but not worse than
> that.)

Agree. 

> 
> I agree it may be surprising, and unexpected, but it is not the topic of
> this BZ, and to avoid this, we'd need to change gluster-block in a way not
> 100% clear to me yet.
> I suggest if you want to raise it, we need to create a BZ on gluster-block.
> 
> Karthick, can we please move this back to ON_QA?

Yes, please move it back to ON_QA. I'll raise a separate bug for the additional block hosting volume issue.

Comment 40 Michael Adam 2019-01-23 09:40:17 UTC
Thanks Karthick.
Doing as discussed.

Comment 43 errata-xmlrpc 2019-02-07 03:41:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0286

