Fedora Account System
Red Hat Associate
Red Hat Customer
Description of problem: Scaled down gluster pods to '2' and deleted some file pvcs and some block pvcs. The PVCs are deleted as expected. The associated pv is in 'Failed' and 'Released' state respectively. The heketi volume list command sometimes lists the entire volume list and sometimes lists a few volumes and terminates with Error: Id not found Version-Release number of selected component (if applicable): sh-4.2# rpm -qa| grep heketi heketi-8.0.0-5.el7rhgs.x86_64 heketi-client-8.0.0-5.el7rhgs.x86_64 sh-4.2# rpm -qa| grep gluster glusterfs-server-3.12.2-32.el7rhgs.x86_64 gluster-block-0.2.1-30.el7rhgs.x86_64 glusterfs-api-3.12.2-32.el7rhgs.x86_64 glusterfs-cli-3.12.2-32.el7rhgs.x86_64 python2-gluster-3.12.2-32.el7rhgs.x86_64 glusterfs-fuse-3.12.2-32.el7rhgs.x86_64 glusterfs-geo-replication-3.12.2-32.el7rhgs.x86_64 glusterfs-libs-3.12.2-32.el7rhgs.x86_64 glusterfs-3.12.2-32.el7rhgs.x86_64 glusterfs-client-xlators-3.12.2-32.el7rhgs.x86_64 # rpm -qa| grep openshift openshift-ansible-docs-3.11.43-1.git.0.fa69a02.el7.noarch atomic-openshift-excluder-3.11.43-1.git.0.647ac05.el7.noarch atomic-openshift-hyperkube-3.11.43-1.git.0.647ac05.el7.x86_64 atomic-openshift-node-3.11.43-1.git.0.647ac05.el7.x86_64 openshift-ansible-playbooks-3.11.43-1.git.0.fa69a02.el7.noarch openshift-ansible-3.11.43-1.git.0.fa69a02.el7.noarch atomic-openshift-clients-3.11.43-1.git.0.647ac05.el7.x86_64 atomic-openshift-3.11.43-1.git.0.647ac05.el7.x86_64 openshift-ansible-roles-3.11.43-1.git.0.fa69a02.el7.noarch atomic-openshift-docker-excluder-3.11.43-1.git.0.647ac05.el7.noarch How reproducible: Tried once on this setup Steps to Reproduce: 1. Created a OCP + OCS 3.11.51-2 setup with latest OCS builds having 60 file and 40 block volumes. All are Replica 3 volumes. 2. Scaled down gluster pods to '2' by removing the label 3. Deleted 20 block pvc in loop. The PVC is deleted, the associated PV is in 'Released' state. 4. After step 3 is completed, deleted 10 file pvc in loop. The PVC is deleted, the corresponding PV is in 'Failed' state. 5. The expectation is that all volumes should list from heketi. but sometimes heketi lists the entire volume list and sometimes it lists fewer volumes and terminates with Error: Id not found. # heketi-cli volume list Id:044d8ab5e095468ee8cfdd104ab8174e Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl46_068d19d1-0777-11e9-8bdf-005056a53ec9 Id:094841502f1b419f50a1152e3a339b12 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl25_97dce1c4-0776-11e9-8bdf-005056a53ec9 Id:0a50d40ead3539c76f7261712707ec65 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_ngnix-fl2_66fecdc2-079b-11e9-8bdf-005056a53ec9 Id:1035dd900650e8f7c116553c986318bd Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl16_6b2e0f80-0776-11e9-8bdf-005056a53ec9 Id:1573d1ffe3aa0b3a4d3217c429c89429 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl11_5091e212-0776-11e9-8bdf-005056a53ec9 Id:191533290fc0d605592a5178addcaeae Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl21_845367ba-0776-11e9-8bdf-005056a53ec9 Id:1ef9306efebf1e2c9a549d38adfe54bf Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_v1_32043217-0775-11e9-8bdf-005056a53ec9 Id:27ead2edebb997e9c685a7dab691407a Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl20_7f507b44-0776-11e9-8bdf-005056a53ec9 Id:280abdc49de924166415f1bc2f583636 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_ngnix-fl4_6bf15ebe-079b-11e9-8bdf-005056a53ec9 Id:2acfe8fbab7f0c3d45172f9108db3cf8 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl8_3df04a93-0776-11e9-8bdf-005056a53ec9 Id:2bcc6f22bb6693d9c6c68c5520834e0f Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl15_6626fc46-0776-11e9-8bdf-005056a53ec9 Id:2c3a17ada265edea087c4f05dcbb212c Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl45_01686865-0777-11e9-8bdf-005056a53ec9 Id:318e065437cf152a6eae3626a94a9b60 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl38_dc7ff0c2-0776-11e9-8bdf-005056a53ec9 Id:3262ca4b0da18e9558632a9bf2f26979 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl29_aa8b6da0-0776-11e9-8bdf-005056a53ec9 Id:34688c9967ce14a07dfc2a532d392fdc Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl9_4357344c-0776-11e9-8bdf-005056a53ec9 Id:36e49e6a07fc4dc0d078390025c83874 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl35_c9a7b514-0776-11e9-8bdf-005056a53ec9 Id:384d547b0358cb9a26a2581cd996a495 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl4_2940ff44-0776-11e9-8bdf-005056a53ec9 Id:3a9393d4e515ce77733de5ba18f022bf Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl42_f2ec7cec-0776-11e9-8bdf-005056a53ec9 Id:4746d225a1be8a23416abb2a8792d23a Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl41_edd0a228-0776-11e9-8bdf-005056a53ec9 Id:4822507f5c56551361e1236b07d0ce6a Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_pgsql_postgresql-001_f0b31ff0-079b-11e9-8bdf-005056a53ec9 Id:496ebb72d96deada6f32f60de00214c3 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl34_c35035ec-0776-11e9-8bdf-005056a53ec9 Id:52f0e3a8f33213b9741d86d7ced9a786 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl33_bd97f61f-0776-11e9-8bdf-005056a53ec9 Id:6115ae3c6def2a1d7388ddf37324d67b Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl13_5c19988f-0776-11e9-8bdf-005056a53ec9 Id:615063414284307516f7f3742d293534 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl26_9bd5e846-0776-11e9-8bdf-005056a53ec9 Id:615fa04aedffcec94f1651dabc9fed3f Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl14_61bc7b72-0776-11e9-8bdf-005056a53ec9 Id:625a1c7c13175aeee72f8086873ad3fa Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl44_fc7eb1f4-0776-11e9-8bdf-005056a53ec9 Id:669b63b2516b5a160d49703e4ce3dc27 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl23_8e6b912b-0776-11e9-8bdf-005056a53ec9 Id:673ad932d59794910310330167ad7db0 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_ngnix-fl_61bb80da-079b-11e9-8bdf-005056a53ec9 Id:6a75f50f3b24deffb0751c0ef8fc7631 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl37_d75fa467-0776-11e9-8bdf-005056a53ec9 Id:6b518a57ab38086d367e6cf37358436a Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:vol_6b518a57ab38086d367e6cf37358436a [block] Id:71de56d5a0c05bb564310829c2c972bd Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl5_2e51484f-0776-11e9-8bdf-005056a53ec9 Id:789e4c4446d2095a30a3d1afd8a5f012 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl2_1e1cc25e-0776-11e9-8bdf-005056a53ec9 Id:78abd2cbd071d7a196ea6d2d291d3647 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl24_937879b5-0776-11e9-8bdf-005056a53ec9 Id:7929ca796e7999caf9d0e223917f4bac Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl28_a57905d4-0776-11e9-8bdf-005056a53ec9 Id:7e19a9da3892f074741c63deddef8543 Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl49_160f3b8a-0777-11e9-8bdf-005056a53ec9 Id:803e932693af6442ed89a309a47af07b Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl39_e41dd96f-0776-11e9-8bdf-005056a53ec9 Id:80a8b2d9e3f22f8ac07c4455fba9c9dc Cluster:3efdba58f26dbd5038fbecbc6b092690 Name:dept-qe_glusterfs_vl3_2446d99c-0776-11e9-8bdf-005056a53ec9 Error: Id not found ============================================================================ When heketi does not list all volumes : # heketi-cli volume list | wc -l Error: Id not found 37 # heketi-cli volume list| wc -l Error: Id not found 14 When heketi lists all the volumes : # heketi-cli volume list | wc -l 60 Actual results: heketi sometimes lists all the volumes but sometimes terminates with Error: Id not found. Expected results: heketi should list all volumes at all times. Additional info: Heketi logs, sosreports will be attached.
https://github.com/heketi/heketi/pull/1486
The issue reported in the bug is still seen. with a bulk block volume deletion and killing of gluster-blockd, we end up in the state reported in the bug. Moving the bug back to assigned. heketi logs shall be attached. cli output: ----------- [root@dhcp46-173 ~]# heketi-cli blockvolume list Id:084c2c9c44cb39962a6ef6ff7596faf2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_084c2c9c44cb39962a6ef6ff7596faf2 Id:17fed15e5dc077efd7c2749ccda68d78 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_17fed15e5dc077efd7c2749ccda68d78 Id:22942a9cabdbbb36bd35223c56238947 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_22942a9cabdbbb36bd35223c56238947 Id:353e9ac661e741537b910c4d00635eec Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_353e9ac661e741537b910c4d00635eec Id:39b6ce5a67311cea9ead0d036327e87f Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_39b6ce5a67311cea9ead0d036327e87f Id:3d82b36fe4855cff97b56e3438822157 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_3d82b36fe4855cff97b56e3438822157 Id:4c90d293fe8b33339291af8ec8e801d5 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_4c90d293fe8b33339291af8ec8e801d5 Id:4d73f4cb2b18e04712d52d30549366e2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_4d73f4cb2b18e04712d52d30549366e2 Error: Id not found [root@dhcp46-173 ~]# heketi-cli blockvolume list Id:17fed15e5dc077efd7c2749ccda68d78 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_17fed15e5dc077efd7c2749ccda68d78 Id:1a8e8a234e9272d922495be94f5bba96 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_1a8e8a234e9272d922495be94f5bba96 Id:22942a9cabdbbb36bd35223c56238947 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_22942a9cabdbbb36bd35223c56238947 Id:353e9ac661e741537b910c4d00635eec Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_353e9ac661e741537b910c4d00635eec Id:3d82b36fe4855cff97b56e3438822157 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_3d82b36fe4855cff97b56e3438822157 Id:49b58be113b68a92adf395016d84c047 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_49b58be113b68a92adf395016d84c047 Id:4c90d293fe8b33339291af8ec8e801d5 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_4c90d293fe8b33339291af8ec8e801d5 Id:4d73f4cb2b18e04712d52d30549366e2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_4d73f4cb2b18e04712d52d30549366e2 Id:6844996c4836d0dcfdff72482d2c4ed0 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_6844996c4836d0dcfdff72482d2c4ed0 Id:76b9c6631e4fdfeb0ccbe733dfe001f2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_76b9c6631e4fdfeb0ccbe733dfe001f2 Id:9762bcaf0540d91ebcb1fb4372acc53d Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_9762bcaf0540d91ebcb1fb4372acc53d Id:97c3cb3248d400aee107ec77e136d540 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_97c3cb3248d400aee107ec77e136d540 Id:a31948dece49a5e27eef4cffcca5a88f Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_a31948dece49a5e27eef4cffcca5a88f Id:a4be311e815441f4ed504095c1abacc2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_a4be311e815441f4ed504095c1abacc2 Id:ba01f86e4d0bf575e946e3a9b8dee9fb Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_ba01f86e4d0bf575e946e3a9b8dee9fb Id:c1de75cd423bb77b93574d6fca7f4ab2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_c1de75cd423bb77b93574d6fca7f4ab2 Id:d40846b160b52e833994ef09991df5a9 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_d40846b160b52e833994ef09991df5a9 Id:d654891466bea163c1e67866cf73e999 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_d654891466bea163c1e67866cf73e999 Id:de3988bacbfaeb19de392dcf8f30226b Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_de3988bacbfaeb19de392dcf8f30226b Id:e00e90321c6f72b27193431a6c8a9131 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_e00e90321c6f72b27193431a6c8a9131 Id:e0338fb2f56612398025a485ce85da6c Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_e0338fb2f56612398025a485ce85da6c Id:ee01cf500af8029d73f5b6153ca1740f Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_ee01cf500af8029d73f5b6153ca1740f Id:ef726b18a97bc7ce6e4786b136258204 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_ef726b18a97bc7ce6e4786b136258204 Id:fcde6c0d951824668f22115bfe656885 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_fcde6c0d951824668f22115bfe656885 heketi-cli blockvolume list Id:084c2c9c44cb39962a6ef6ff7596faf2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_084c2c9c44cb39962a6ef6ff7596faf2 Id:17fed15e5dc077efd7c2749ccda68d78 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_17fed15e5dc077efd7c2749ccda68d78 Id:1a8e8a234e9272d922495be94f5bba96 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_1a8e8a234e9272d922495be94f5bba96 Id:22942a9cabdbbb36bd35223c56238947 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_22942a9cabdbbb36bd35223c56238947 Id:353e9ac661e741537b910c4d00635eec Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_353e9ac661e741537b910c4d00635eec Id:39b6ce5a67311cea9ead0d036327e87f Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_39b6ce5a67311cea9ead0d036327e87f Id:3d82b36fe4855cff97b56e3438822157 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_3d82b36fe4855cff97b56e3438822157 Id:49b58be113b68a92adf395016d84c047 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_49b58be113b68a92adf395016d84c047 Id:4c90d293fe8b33339291af8ec8e801d5 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_4c90d293fe8b33339291af8ec8e801d5 Id:4d73f4cb2b18e04712d52d30549366e2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_4d73f4cb2b18e04712d52d30549366e2 Id:6844996c4836d0dcfdff72482d2c4ed0 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_6844996c4836d0dcfdff72482d2c4ed0 Id:76b9c6631e4fdfeb0ccbe733dfe001f2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_76b9c6631e4fdfeb0ccbe733dfe001f2 Id:9762bcaf0540d91ebcb1fb4372acc53d Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_9762bcaf0540d91ebcb1fb4372acc53d Id:97c3cb3248d400aee107ec77e136d540 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_97c3cb3248d400aee107ec77e136d540 Id:9df81844fa722e5466f25c509052e0aa Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_9df81844fa722e5466f25c509052e0aa Id:a31948dece49a5e27eef4cffcca5a88f Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_a31948dece49a5e27eef4cffcca5a88f Id:a4be311e815441f4ed504095c1abacc2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_a4be311e815441f4ed504095c1abacc2 Id:ba01f86e4d0bf575e946e3a9b8dee9fb Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_ba01f86e4d0bf575e946e3a9b8dee9fb Id:c1de75cd423bb77b93574d6fca7f4ab2 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_c1de75cd423bb77b93574d6fca7f4ab2 Id:d40846b160b52e833994ef09991df5a9 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_d40846b160b52e833994ef09991df5a9 Id:d654891466bea163c1e67866cf73e999 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_d654891466bea163c1e67866cf73e999 Id:de3988bacbfaeb19de392dcf8f30226b Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_de3988bacbfaeb19de392dcf8f30226b Id:dfe9625e343a1569046c56386ef9e056 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_dfe9625e343a1569046c56386ef9e056 Id:e0338fb2f56612398025a485ce85da6c Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_e0338fb2f56612398025a485ce85da6c Id:ee01cf500af8029d73f5b6153ca1740f Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_ee01cf500af8029d73f5b6153ca1740f Id:ef726b18a97bc7ce6e4786b136258204 Cluster:9f295b5ab4a965e22f6d24cf73970f01 Name:blockvol_ef726b18a97bc7ce6e4786b136258204 Error: Id not found
Created attachment 1522225 [details] Logs for comment 14
Could you please provide a db dump from the heketi instance producing this error? Thanks.
Is this supposed to be "FailedQA" ?
Created attachment 1522451 [details] Logs for comment 18
Looking over the provided logs and db dump I think I understand what's happening in this case. It looks a lot like the symptom we saw earlier but is a bit different. Previously, heketi was producing the "Id not found" errors even on static, non-changing, db contents because it was failing to properly account for the different combinations of items in a the pending operations entries. It appears that in this case the error is more akin to a race condition where the state of the system is being actively changed while the "blockvolume list" commands are being run. A snippet from the logs: [negroni] Started DELETE /blockvolumes/d544e383c1b28be580a73fe736f25e9a [negroni] Started GET /blockvolumes [negroni] Started GET /queue/80f1ca3173dbc8e0b67feb6be34a128a [negroni] Completed 500 Internal Server Error in 92.518µs [negroni] Completed 200 OK in 34.860633ms [negroni] Started GET /blockvolumes/090135bcbcd5e26ed788c41394dcf885 [negroni] Completed 200 OK in 1.602692ms [negroni] Started GET /blockvolumes/0c25795f72d359b4325b47f96c906b3c [negroni] Completed 200 OK in 220.389µs ... {snip} ... [negroni] Started GET /blockvolumes/afe734276e763c7040c4b2ad1bce25aa [negroni] Completed 200 OK in 728.303µs [negroni] Started GET /blockvolumes/b8ebd044fd71a4d7b7d892dec61b502b [negroni] Completed 200 OK in 2.39265ms [negroni] Started GET /blockvolumes/bb52e453714674e2e617be504eff16c6 [heketi] INFO 2019/01/22 16:05:41 executing removal of block volume d544e383c1b28be580a73fe736f25e9a in op:aa51b4303e2675489381dfd475bc8df1 [negroni] Completed 200 OK in 537.099µs [negroni] Started GET /blockvolumes/bcaad44ab21d23475a62b556b51ed153 [negroni] Completed 200 OK in 199.69µs [negroni] Started GET /blockvolumes/bd6b709c034cb27a834cdc30d946c536 [negroni] Completed 200 OK in 1.679767ms [negroni] Started GET /blockvolumes/bdddb7a3d6f9fbda1b3ae439e9b4883a [negroni] Completed 200 OK in 294.397µs [negroni] Started GET /blockvolumes/bec2c63501a767ba918e2f8d2d7095a3 [negroni] Completed 200 OK in 637.171µs [negroni] Started GET /blockvolumes/c3dca0b61ed3cad42ea2cb05c417c421 [negroni] Completed 200 OK in 656.152µs [negroni] Started GET /blockvolumes/c43fb112b19ad6f075b190ab71d88db8 [negroni] Completed 200 OK in 666.531µs [negroni] Started GET /blockvolumes/c54a77a615a0869c6ed75c37cf4462dd [negroni] Completed 200 OK in 329.015µs [negroni] Started GET /blockvolumes/c5db2d6c3bbe9af49f7444489d961f56 [negroni] Completed 200 OK in 271.309µs [negroni] Started GET /blockvolumes/c769f40b15b0d799de1a21a83fe6d825 [negroni] Completed 200 OK in 555.506µs [negroni] Started GET /blockvolumes/c93b1541ec6a079f4819b82319f5f20f [negroni] Completed 200 OK in 244.558µs [negroni] Started GET /blockvolumes/ccfc8e53a5a0d26c361c24bd6d7b48ad [negroni] Completed 200 OK in 746.983µs [negroni] Started GET /blockvolumes/d28aabbcd6aaafff033505940f2f214f [negroni] Completed 200 OK in 550.168µs [negroni] Started GET /blockvolumes/d2ce0ac04e640fc3e8c9efeaba19a528 [negroni] Completed 200 OK in 216.753µs [negroni] Started GET /blockvolumes/d544e383c1b28be580a73fe736f25e9a [negroni] Completed 404 Not Found in 203.82µs [negroni] Started DELETE /blockvolumes/d544e383c1b28be580a73fe736f25e9a [kubeexec] ERROR 2019/01/22 16:05:41 heketi/pkg/remoteexec/kube/exec.go:85:kube.ExecCommands: Failed to run command [gluster-block delete vol_40f08344b8f2c469f62577d11960e390/blockvol_d544e383c1b28be580a73fe736f25e9a --json] on [pod:glusterfs-storage-8kn8b c:glusterfs ns:app-storage (from host:dhcp47-92.lab.eng.blr.redhat.com selector:glusterfs-node)]: Err[command terminated with exit code 255]: Stdout []: Stderr [Connection failed. Please check if gluster-block daemon is operational. ] [cmdexec] ERROR 2019/01/22 16:05:41 heketi/executors/cmdexec/block_volume.go:120:cmdexec.(*CmdExecutor).BlockVolumeDestroy: Unable to parse output from block volume delete: blockvol_d544e383c1b28be580a73fe736f25e9a [heketi] ERROR 2019/01/22 16:05:41 heketi/apps/glusterfs/block_volume_entry.go:315:glusterfs.(*BlockVolumeEntry).destroyFromHost: Unable to delete volume: command terminated with exit code 255 The "Started GET" lines are all part of the apis called by "blockvolume list" but at the same time there are ongoing DELETEs coming in. Specifically we can see the GET of /blockvolumes/d544e383c1b28be580a73fe736f25e9a fails with a 404 error at the same time it is trying to be deleted. The reason for the 404 is that heketi marks the item pending before trying to delete it, and the GET apis treat pending items equivalent to missing and thus return a 404. However, this behavior is instrinsic to the way the list commands for volumes and blockvolumes work (and have worked in heketi) Because it requests a list of volume ids (GET /blockvolumes) and then calls a (GET /blockvolumes/<ID>) on each sequentially there's always a potential for the content of the db to change between these api calls (as they are done in separate transactions). This particular behavior can be easily reproduced by doing something like the following: victim=$(./heketi-cli blockvolume list | python -c 'import sys; import random; l=sys.stdin.readlines(); print(random.choice(l).split()[0]).split(":")[-1]' ) ; ./heketi-cli blockvolume delete $victim & ./heketi-cli blockvolume list This produces the "Id not found" error for me about 4 out of 5 times. We could choose to try and hide all "Id not found" type errors from listings but doing that could hide other real bugs. We could restructure how we do listings to force all data to come from one single api request/response (and thus one transaction) but that would be a major change that I would feel uncomfortable doing at this time.
I have looked at the logs after reading John's comments and I agree with the analysis. Volumes which are pending deletion are not eligible for volume info and are not listed in volume listing. The API itself does not have any problem because it lists only those volumes which are not pending deletion. Problem seems to be only with how the CLI is dealing with the list and errors that are encountered with volumeInfo failure. @Michael and @John, would it be better if we adapted CLI to ignore such errors in listing? Say at line https://github.com/heketi/heketi/blob/master/client/cli/go/cmds/volume.go#L491
Thanks. doing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0286