Bug 1662141 - On scaling gluster pods to '2' and deleting pvcs, heketi intermittently lists fewer volumes than the total volumes present with Error: Id not found
Summary: On scaling gluster pods to '2' and deleting pvcs, heketi intermittently lists...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: ocs-3.11
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: OCS 3.11.1
Assignee: John Mulligan
QA Contact: vinutha
URL:
Whiteboard:
Depends On:
Blocks: 1644160
TreeView+ depends on / blocked
 
Reported: 2018-12-26 11:48 UTC by vinutha
Modified: 2019-02-08 08:55 UTC (History)
11 users (show)

Fixed In Version: heketi-8.0.0-8.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 03:41:01 UTC
Embargoed:


Attachments (Terms of Use)
Logs for comment 14 (668.39 KB, text/plain)
2019-01-21 17:24 UTC, krishnaram Karthick
no flags Details
Logs for comment 18 (216.59 KB, application/zip)
2019-01-22 16:13 UTC, krishnaram Karthick
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2019:0286 0 None None None 2019-02-07 03:41:04 UTC

Description vinutha 2018-12-26 11:48:15 UTC
Description of problem:
Scaled down gluster pods to '2' and deleted some file pvcs and some block pvcs. The PVCs are deleted as expected. The associated pv is in 'Failed' and 'Released' state respectively. The heketi volume list command sometimes lists the entire volume list and sometimes lists a few volumes and terminates with Error: Id not found

Version-Release number of selected component (if applicable):
sh-4.2# rpm -qa| grep heketi
heketi-8.0.0-5.el7rhgs.x86_64
heketi-client-8.0.0-5.el7rhgs.x86_64

sh-4.2# rpm -qa| grep gluster
glusterfs-server-3.12.2-32.el7rhgs.x86_64
gluster-block-0.2.1-30.el7rhgs.x86_64
glusterfs-api-3.12.2-32.el7rhgs.x86_64
glusterfs-cli-3.12.2-32.el7rhgs.x86_64
python2-gluster-3.12.2-32.el7rhgs.x86_64
glusterfs-fuse-3.12.2-32.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-32.el7rhgs.x86_64
glusterfs-libs-3.12.2-32.el7rhgs.x86_64
glusterfs-3.12.2-32.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-32.el7rhgs.x86_64

# rpm -qa| grep openshift
openshift-ansible-docs-3.11.43-1.git.0.fa69a02.el7.noarch
atomic-openshift-excluder-3.11.43-1.git.0.647ac05.el7.noarch
atomic-openshift-hyperkube-3.11.43-1.git.0.647ac05.el7.x86_64
atomic-openshift-node-3.11.43-1.git.0.647ac05.el7.x86_64
openshift-ansible-playbooks-3.11.43-1.git.0.fa69a02.el7.noarch
openshift-ansible-3.11.43-1.git.0.fa69a02.el7.noarch
atomic-openshift-clients-3.11.43-1.git.0.647ac05.el7.x86_64
atomic-openshift-3.11.43-1.git.0.647ac05.el7.x86_64
openshift-ansible-roles-3.11.43-1.git.0.fa69a02.el7.noarch
atomic-openshift-docker-excluder-3.11.43-1.git.0.647ac05.el7.noarch



How reproducible:
Tried once on this setup 

Steps to Reproduce:
1. Created a OCP + OCS 3.11.51-2 setup with latest OCS builds having 60 file and 40 block volumes. All are Replica 3 volumes.

2. Scaled down gluster pods to '2' by removing the label 

3. Deleted 20 block pvc in loop. The PVC is deleted, the associated PV is in 'Released' state.

4. After step 3 is completed, deleted 10 file pvc in loop. The PVC is deleted, the corresponding PV is in 'Failed' state. 

5. The expectation is that all volumes should list from heketi. but sometimes heketi lists the entire volume list and sometimes it lists fewer volumes and terminates with Error: Id not found.  

# heketi-cli  volume list 
Id:044d8ab5e095468ee8cfdd104ab8174e    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl46_068d19d1-0777-11e9-8bdf-005056a53ec9
Id:094841502f1b419f50a1152e3a339b12    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl25_97dce1c4-0776-11e9-8bdf-005056a53ec9
Id:0a50d40ead3539c76f7261712707ec65    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_ngnix-fl2_66fecdc2-079b-11e9-8bdf-005056a53ec9
Id:1035dd900650e8f7c116553c986318bd    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl16_6b2e0f80-0776-11e9-8bdf-005056a53ec9
Id:1573d1ffe3aa0b3a4d3217c429c89429    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl11_5091e212-0776-11e9-8bdf-005056a53ec9
Id:191533290fc0d605592a5178addcaeae    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl21_845367ba-0776-11e9-8bdf-005056a53ec9
Id:1ef9306efebf1e2c9a549d38adfe54bf    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_v1_32043217-0775-11e9-8bdf-005056a53ec9
Id:27ead2edebb997e9c685a7dab691407a    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl20_7f507b44-0776-11e9-8bdf-005056a53ec9
Id:280abdc49de924166415f1bc2f583636    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_ngnix-fl4_6bf15ebe-079b-11e9-8bdf-005056a53ec9
Id:2acfe8fbab7f0c3d45172f9108db3cf8    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl8_3df04a93-0776-11e9-8bdf-005056a53ec9
Id:2bcc6f22bb6693d9c6c68c5520834e0f    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl15_6626fc46-0776-11e9-8bdf-005056a53ec9
Id:2c3a17ada265edea087c4f05dcbb212c    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl45_01686865-0777-11e9-8bdf-005056a53ec9
Id:318e065437cf152a6eae3626a94a9b60    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl38_dc7ff0c2-0776-11e9-8bdf-005056a53ec9
Id:3262ca4b0da18e9558632a9bf2f26979    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl29_aa8b6da0-0776-11e9-8bdf-005056a53ec9
Id:34688c9967ce14a07dfc2a532d392fdc    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl9_4357344c-0776-11e9-8bdf-005056a53ec9
Id:36e49e6a07fc4dc0d078390025c83874    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl35_c9a7b514-0776-11e9-8bdf-005056a53ec9
Id:384d547b0358cb9a26a2581cd996a495    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl4_2940ff44-0776-11e9-8bdf-005056a53ec9
Id:3a9393d4e515ce77733de5ba18f022bf    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl42_f2ec7cec-0776-11e9-8bdf-005056a53ec9
Id:4746d225a1be8a23416abb2a8792d23a    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl41_edd0a228-0776-11e9-8bdf-005056a53ec9
Id:4822507f5c56551361e1236b07d0ce6a    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_pgsql_postgresql-001_f0b31ff0-079b-11e9-8bdf-005056a53ec9
Id:496ebb72d96deada6f32f60de00214c3    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl34_c35035ec-0776-11e9-8bdf-005056a53ec9
Id:52f0e3a8f33213b9741d86d7ced9a786    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl33_bd97f61f-0776-11e9-8bdf-005056a53ec9
Id:6115ae3c6def2a1d7388ddf37324d67b    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl13_5c19988f-0776-11e9-8bdf-005056a53ec9
Id:615063414284307516f7f3742d293534    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl26_9bd5e846-0776-11e9-8bdf-005056a53ec9
Id:615fa04aedffcec94f1651dabc9fed3f    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl14_61bc7b72-0776-11e9-8bdf-005056a53ec9
Id:625a1c7c13175aeee72f8086873ad3fa    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl44_fc7eb1f4-0776-11e9-8bdf-005056a53ec9
Id:669b63b2516b5a160d49703e4ce3dc27    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl23_8e6b912b-0776-11e9-8bdf-005056a53ec9
Id:673ad932d59794910310330167ad7db0    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_ngnix-fl_61bb80da-079b-11e9-8bdf-005056a53ec9
Id:6a75f50f3b24deffb0751c0ef8fc7631    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl37_d75fa467-0776-11e9-8bdf-005056a53ec9
Id:6b518a57ab38086d367e6cf37358436a    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:vol_6b518a57ab38086d367e6cf37358436a [block]
Id:71de56d5a0c05bb564310829c2c972bd    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl5_2e51484f-0776-11e9-8bdf-005056a53ec9
Id:789e4c4446d2095a30a3d1afd8a5f012    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl2_1e1cc25e-0776-11e9-8bdf-005056a53ec9
Id:78abd2cbd071d7a196ea6d2d291d3647    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl24_937879b5-0776-11e9-8bdf-005056a53ec9
Id:7929ca796e7999caf9d0e223917f4bac    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl28_a57905d4-0776-11e9-8bdf-005056a53ec9
Id:7e19a9da3892f074741c63deddef8543    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl49_160f3b8a-0777-11e9-8bdf-005056a53ec9
Id:803e932693af6442ed89a309a47af07b    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl39_e41dd96f-0776-11e9-8bdf-005056a53ec9
Id:80a8b2d9e3f22f8ac07c4455fba9c9dc    Cluster:3efdba58f26dbd5038fbecbc6b092690    Name:dept-qe_glusterfs_vl3_2446d99c-0776-11e9-8bdf-005056a53ec9
Error: Id not found

============================================================================
When heketi does not list all volumes :
# heketi-cli  volume list | wc -l
Error: Id not found
37

# heketi-cli  volume list| wc -l
Error: Id not found
14


When heketi lists all the volumes :
# heketi-cli  volume list | wc -l
60


Actual results:
heketi sometimes lists all the volumes but sometimes terminates with Error: Id not found.

Expected results:
heketi should list all volumes at all times. 

Additional info:
Heketi logs, sosreports will be attached.

Comment 10 John Mulligan 2019-01-09 16:20:10 UTC
https://github.com/heketi/heketi/pull/1486

Comment 14 krishnaram Karthick 2019-01-21 17:23:00 UTC
The issue reported in the bug is still seen. with a bulk block volume deletion and killing of gluster-blockd, we end up in the state reported in the bug.

Moving the bug back to assigned. heketi logs shall be attached. 

cli output:
-----------
[root@dhcp46-173 ~]# heketi-cli blockvolume list
Id:084c2c9c44cb39962a6ef6ff7596faf2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_084c2c9c44cb39962a6ef6ff7596faf2
Id:17fed15e5dc077efd7c2749ccda68d78    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_17fed15e5dc077efd7c2749ccda68d78
Id:22942a9cabdbbb36bd35223c56238947    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_22942a9cabdbbb36bd35223c56238947
Id:353e9ac661e741537b910c4d00635eec    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_353e9ac661e741537b910c4d00635eec
Id:39b6ce5a67311cea9ead0d036327e87f    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_39b6ce5a67311cea9ead0d036327e87f
Id:3d82b36fe4855cff97b56e3438822157    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_3d82b36fe4855cff97b56e3438822157
Id:4c90d293fe8b33339291af8ec8e801d5    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_4c90d293fe8b33339291af8ec8e801d5
Id:4d73f4cb2b18e04712d52d30549366e2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_4d73f4cb2b18e04712d52d30549366e2
Error: Id not found
[root@dhcp46-173 ~]# heketi-cli blockvolume list
Id:17fed15e5dc077efd7c2749ccda68d78    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_17fed15e5dc077efd7c2749ccda68d78
Id:1a8e8a234e9272d922495be94f5bba96    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_1a8e8a234e9272d922495be94f5bba96
Id:22942a9cabdbbb36bd35223c56238947    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_22942a9cabdbbb36bd35223c56238947
Id:353e9ac661e741537b910c4d00635eec    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_353e9ac661e741537b910c4d00635eec
Id:3d82b36fe4855cff97b56e3438822157    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_3d82b36fe4855cff97b56e3438822157
Id:49b58be113b68a92adf395016d84c047    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_49b58be113b68a92adf395016d84c047
Id:4c90d293fe8b33339291af8ec8e801d5    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_4c90d293fe8b33339291af8ec8e801d5
Id:4d73f4cb2b18e04712d52d30549366e2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_4d73f4cb2b18e04712d52d30549366e2
Id:6844996c4836d0dcfdff72482d2c4ed0    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_6844996c4836d0dcfdff72482d2c4ed0
Id:76b9c6631e4fdfeb0ccbe733dfe001f2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_76b9c6631e4fdfeb0ccbe733dfe001f2
Id:9762bcaf0540d91ebcb1fb4372acc53d    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_9762bcaf0540d91ebcb1fb4372acc53d
Id:97c3cb3248d400aee107ec77e136d540    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_97c3cb3248d400aee107ec77e136d540
Id:a31948dece49a5e27eef4cffcca5a88f    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_a31948dece49a5e27eef4cffcca5a88f
Id:a4be311e815441f4ed504095c1abacc2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_a4be311e815441f4ed504095c1abacc2
Id:ba01f86e4d0bf575e946e3a9b8dee9fb    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_ba01f86e4d0bf575e946e3a9b8dee9fb
Id:c1de75cd423bb77b93574d6fca7f4ab2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_c1de75cd423bb77b93574d6fca7f4ab2
Id:d40846b160b52e833994ef09991df5a9    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_d40846b160b52e833994ef09991df5a9
Id:d654891466bea163c1e67866cf73e999    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_d654891466bea163c1e67866cf73e999
Id:de3988bacbfaeb19de392dcf8f30226b    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_de3988bacbfaeb19de392dcf8f30226b
Id:e00e90321c6f72b27193431a6c8a9131    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_e00e90321c6f72b27193431a6c8a9131
Id:e0338fb2f56612398025a485ce85da6c    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_e0338fb2f56612398025a485ce85da6c
Id:ee01cf500af8029d73f5b6153ca1740f    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_ee01cf500af8029d73f5b6153ca1740f
Id:ef726b18a97bc7ce6e4786b136258204    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_ef726b18a97bc7ce6e4786b136258204
Id:fcde6c0d951824668f22115bfe656885    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_fcde6c0d951824668f22115bfe656885

heketi-cli blockvolume list
Id:084c2c9c44cb39962a6ef6ff7596faf2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_084c2c9c44cb39962a6ef6ff7596faf2
Id:17fed15e5dc077efd7c2749ccda68d78    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_17fed15e5dc077efd7c2749ccda68d78
Id:1a8e8a234e9272d922495be94f5bba96    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_1a8e8a234e9272d922495be94f5bba96
Id:22942a9cabdbbb36bd35223c56238947    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_22942a9cabdbbb36bd35223c56238947
Id:353e9ac661e741537b910c4d00635eec    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_353e9ac661e741537b910c4d00635eec
Id:39b6ce5a67311cea9ead0d036327e87f    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_39b6ce5a67311cea9ead0d036327e87f
Id:3d82b36fe4855cff97b56e3438822157    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_3d82b36fe4855cff97b56e3438822157
Id:49b58be113b68a92adf395016d84c047    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_49b58be113b68a92adf395016d84c047
Id:4c90d293fe8b33339291af8ec8e801d5    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_4c90d293fe8b33339291af8ec8e801d5
Id:4d73f4cb2b18e04712d52d30549366e2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_4d73f4cb2b18e04712d52d30549366e2
Id:6844996c4836d0dcfdff72482d2c4ed0    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_6844996c4836d0dcfdff72482d2c4ed0
Id:76b9c6631e4fdfeb0ccbe733dfe001f2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_76b9c6631e4fdfeb0ccbe733dfe001f2
Id:9762bcaf0540d91ebcb1fb4372acc53d    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_9762bcaf0540d91ebcb1fb4372acc53d
Id:97c3cb3248d400aee107ec77e136d540    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_97c3cb3248d400aee107ec77e136d540
Id:9df81844fa722e5466f25c509052e0aa    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_9df81844fa722e5466f25c509052e0aa
Id:a31948dece49a5e27eef4cffcca5a88f    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_a31948dece49a5e27eef4cffcca5a88f
Id:a4be311e815441f4ed504095c1abacc2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_a4be311e815441f4ed504095c1abacc2
Id:ba01f86e4d0bf575e946e3a9b8dee9fb    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_ba01f86e4d0bf575e946e3a9b8dee9fb
Id:c1de75cd423bb77b93574d6fca7f4ab2    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_c1de75cd423bb77b93574d6fca7f4ab2
Id:d40846b160b52e833994ef09991df5a9    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_d40846b160b52e833994ef09991df5a9
Id:d654891466bea163c1e67866cf73e999    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_d654891466bea163c1e67866cf73e999
Id:de3988bacbfaeb19de392dcf8f30226b    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_de3988bacbfaeb19de392dcf8f30226b
Id:dfe9625e343a1569046c56386ef9e056    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_dfe9625e343a1569046c56386ef9e056
Id:e0338fb2f56612398025a485ce85da6c    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_e0338fb2f56612398025a485ce85da6c
Id:ee01cf500af8029d73f5b6153ca1740f    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_ee01cf500af8029d73f5b6153ca1740f
Id:ef726b18a97bc7ce6e4786b136258204    Cluster:9f295b5ab4a965e22f6d24cf73970f01    Name:blockvol_ef726b18a97bc7ce6e4786b136258204
Error: Id not found

Comment 15 krishnaram Karthick 2019-01-21 17:24:32 UTC
Created attachment 1522225 [details]
Logs for comment 14

Comment 16 John Mulligan 2019-01-22 13:05:01 UTC
Could you please provide a db dump from the heketi instance producing this error? Thanks.

Comment 17 Michael Adam 2019-01-22 13:49:43 UTC
Is this supposed to be "FailedQA" ?

Comment 19 krishnaram Karthick 2019-01-22 16:13:03 UTC
Created attachment 1522451 [details]
Logs for comment 18

Comment 20 John Mulligan 2019-01-22 18:35:45 UTC
Looking over the provided logs and db dump I think I understand what's happening in this case. It looks a lot like the symptom we saw earlier but is a bit different.
Previously, heketi was producing the "Id not found" errors even on static, non-changing, db contents because it was failing to properly account for the different combinations of items in a the pending operations entries.
It appears that in this case the error is more akin to a race condition where the state of the system is being actively changed while the "blockvolume list" commands are being run. A snippet from the logs:

[negroni] Started DELETE /blockvolumes/d544e383c1b28be580a73fe736f25e9a
[negroni] Started GET /blockvolumes
[negroni] Started GET /queue/80f1ca3173dbc8e0b67feb6be34a128a
[negroni] Completed 500 Internal Server Error in 92.518µs
[negroni] Completed 200 OK in 34.860633ms
[negroni] Started GET /blockvolumes/090135bcbcd5e26ed788c41394dcf885
[negroni] Completed 200 OK in 1.602692ms
[negroni] Started GET /blockvolumes/0c25795f72d359b4325b47f96c906b3c
[negroni] Completed 200 OK in 220.389µs
... {snip} ...
[negroni] Started GET /blockvolumes/afe734276e763c7040c4b2ad1bce25aa
[negroni] Completed 200 OK in 728.303µs
[negroni] Started GET /blockvolumes/b8ebd044fd71a4d7b7d892dec61b502b
[negroni] Completed 200 OK in 2.39265ms
[negroni] Started GET /blockvolumes/bb52e453714674e2e617be504eff16c6
[heketi] INFO 2019/01/22 16:05:41 executing removal of block volume d544e383c1b28be580a73fe736f25e9a in op:aa51b4303e2675489381dfd475bc8df1
[negroni] Completed 200 OK in 537.099µs
[negroni] Started GET /blockvolumes/bcaad44ab21d23475a62b556b51ed153
[negroni] Completed 200 OK in 199.69µs
[negroni] Started GET /blockvolumes/bd6b709c034cb27a834cdc30d946c536
[negroni] Completed 200 OK in 1.679767ms
[negroni] Started GET /blockvolumes/bdddb7a3d6f9fbda1b3ae439e9b4883a
[negroni] Completed 200 OK in 294.397µs
[negroni] Started GET /blockvolumes/bec2c63501a767ba918e2f8d2d7095a3
[negroni] Completed 200 OK in 637.171µs
[negroni] Started GET /blockvolumes/c3dca0b61ed3cad42ea2cb05c417c421
[negroni] Completed 200 OK in 656.152µs
[negroni] Started GET /blockvolumes/c43fb112b19ad6f075b190ab71d88db8
[negroni] Completed 200 OK in 666.531µs
[negroni] Started GET /blockvolumes/c54a77a615a0869c6ed75c37cf4462dd
[negroni] Completed 200 OK in 329.015µs
[negroni] Started GET /blockvolumes/c5db2d6c3bbe9af49f7444489d961f56
[negroni] Completed 200 OK in 271.309µs
[negroni] Started GET /blockvolumes/c769f40b15b0d799de1a21a83fe6d825
[negroni] Completed 200 OK in 555.506µs
[negroni] Started GET /blockvolumes/c93b1541ec6a079f4819b82319f5f20f
[negroni] Completed 200 OK in 244.558µs
[negroni] Started GET /blockvolumes/ccfc8e53a5a0d26c361c24bd6d7b48ad
[negroni] Completed 200 OK in 746.983µs
[negroni] Started GET /blockvolumes/d28aabbcd6aaafff033505940f2f214f
[negroni] Completed 200 OK in 550.168µs
[negroni] Started GET /blockvolumes/d2ce0ac04e640fc3e8c9efeaba19a528
[negroni] Completed 200 OK in 216.753µs
[negroni] Started GET /blockvolumes/d544e383c1b28be580a73fe736f25e9a
[negroni] Completed 404 Not Found in 203.82µs
[negroni] Started DELETE /blockvolumes/d544e383c1b28be580a73fe736f25e9a
[kubeexec] ERROR 2019/01/22 16:05:41 heketi/pkg/remoteexec/kube/exec.go:85:kube.ExecCommands: Failed to run command [gluster-block delete vol_40f08344b8f2c469f62577d11960e390/blockvol_d544e383c1b28be580a73fe736f25e9a --json] on [pod:glusterfs-storage-8kn8b c:glusterfs ns:app-storage (from host:dhcp47-92.lab.eng.blr.redhat.com selector:glusterfs-node)]: Err[command terminated with exit code 255]: Stdout []: Stderr [Connection failed. Please check if gluster-block daemon is operational.
]           
[cmdexec] ERROR 2019/01/22 16:05:41 heketi/executors/cmdexec/block_volume.go:120:cmdexec.(*CmdExecutor).BlockVolumeDestroy: Unable to parse output from block volume delete: blockvol_d544e383c1b28be580a73fe736f25e9a
[heketi] ERROR 2019/01/22 16:05:41 heketi/apps/glusterfs/block_volume_entry.go:315:glusterfs.(*BlockVolumeEntry).destroyFromHost: Unable to delete volume: command terminated with exit code 255



The "Started GET" lines are all part of the apis called by "blockvolume list" but at the same time there are ongoing DELETEs coming in.
Specifically we can see the GET of /blockvolumes/d544e383c1b28be580a73fe736f25e9a fails with a 404 error at the same time it is trying to be deleted. The reason for the 404 is that heketi marks the item pending before trying to delete it, and the GET apis treat pending items equivalent to missing and thus return a 404. However, this behavior is instrinsic to the way the list commands for volumes and blockvolumes work (and have worked in heketi)

Because it requests a list of volume ids (GET /blockvolumes) and then calls a (GET /blockvolumes/<ID>) on each sequentially there's always a potential for the content of the db to change between these api calls (as they are done in separate transactions).

This particular behavior can be easily reproduced by doing something like the following:
victim=$(./heketi-cli blockvolume list | python -c 'import sys; import random; l=sys.stdin.readlines(); print(random.choice(l).split()[0]).split(":")[-1]' )  ;  ./heketi-cli blockvolume delete $victim & ./heketi-cli  blockvolume list

This produces the "Id not found" error for me about 4 out of 5 times.

We could choose to try and hide all "Id not found" type errors from listings but doing that could hide other real bugs.
We could restructure how we do listings to force all data to come from one single api request/response (and thus one transaction) but that would be a major change that I would feel uncomfortable doing at this time.

Comment 21 Raghavendra Talur 2019-01-22 20:31:52 UTC
I have looked at the logs after reading John's comments and I agree with the analysis.

Volumes which are pending deletion are not eligible for volume info and are not listed in volume listing.

The API itself does not have any problem because it lists only those volumes which are not pending deletion. Problem seems to be only with how the CLI is dealing with the list and errors that are encountered with volumeInfo failure. @Michael and @John, would it be better if we adapted CLI to ignore such errors in listing? Say at line  https://github.com/heketi/heketi/blob/master/client/cli/go/cmds/volume.go#L491

Comment 24 Michael Adam 2019-01-24 13:30:52 UTC
Thanks. doing.

Comment 28 errata-xmlrpc 2019-02-07 03:41:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0286


Note You need to log in before you can comment on or make changes to this bug.