Bug 1652922 - Attempts at block pvc delete throw 'Delete Block Volume Build Failed' but eventually all deletes succeed
Summary: Attempts at block pvc delete throw 'Delete Block Volume Build Failed' but eve...
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: John Mulligan
QA Contact: Neha Berry
Depends On:
Reported: 2018-11-23 15:10 UTC by Neha Berry
Modified: 2019-03-11 04:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-02-11 22:43:37 UTC
Target Upstream Version:


Description Neha Berry 2018-11-23 15:10:05 UTC
All attempts at block PVC delete throw 'Delete Block Volume Build Failed', but the delete eventually succeeds

Description of problem:

In the OCS 3.11.1 build, while deleting block PVCs, we see error messages in the heketi logs and sometimes also in the gluster-blockd logs, but the delete eventually succeeds.

We want to confirm why there are so many "Error" log messages for even a single PVC deletion, even when the deletion succeeds.

Attempted 4-5 block deletions; the following was noticed:

1. Even though block deletion is quite fast, for a single PVC delete request we see multiple error messages like the following in heketi:
[negroni] Started DELETE /blockvolumes/4032fca686696705df44e42128747a98
[heketi] ERROR 2018/11/22 17:58:36 /src/github.com/heketi/heketi/apps/glusterfs/operations.go:941: Pending block volume 4032fca686696705df44e42128747a98 can not be deleted


2. We also saw the following error message on the gluster-blockd side, once after a create and once after a delete.

"glusterBlockVolumeInit(e56279db502865659c524b763c9957bf) failed [at block_svc_routines.c+4974 :<block_list_cli_1_svc_st>]"

[2018-11-22 09:57:05.757325] INFO: create request, volume=vol_e56279db502865659c524b763c9957bf blockname=blk_glusterfs_block50_f0ac8d19-ee3c-11e8-9bfd-0a580a820008 blockhosts=,, filename=1f403680-a878-4d08-b636-1ae851e9279e authmode=1 passwd=bbebb862-30f9-4798-84fb-9f860f8d7740 size=3221225472 [at block_svc_routines.c+4270 :<block_create_common>]
[2018-11-22 09:57:07.859370] INFO: command exit code, 0 [at block_svc_routines.c+4446 :<block_create_common>]
[2018-11-22 09:58:36.383064] ERROR: glusterBlockVolumeInit(9afa8e97bae9f92bdb19376c320fbcb3) failed [at block_svc_routines.c+4974 :<block_list_cli_1_svc_st>]
[2018-11-22 09:59:01.714822] ERROR: glusterBlockVolumeInit(e56279db502865659c524b763c9957bf) failed [at block_svc_routines.c+4974 :<block_list_cli_1_svc_st>]

[2018-11-22 10:14:49.681918] INFO: command exit code, 0 [at block_svc_routines.c+4446 :<block_create_common>]
[2018-11-22 10:16:06.034960] INFO: delete request, blockname=blk_glusterfs_file1_6f8f6a2d-ee3f-11e8-a6eb-0a580a810007 filename=f40d4303-4dfc-4838-b153-50ae1f86c94c [at block_svc_routines.c+4689 :<block_delete_1_svc_st>]
[2018-11-22 10:16:09.802075] INFO: command exit code, 0 [at block_svc_routines.c+4720 :<block_delete_1_svc_st>]
[2018-11-22 10:17:03.289160] ERROR: glusterBlockVolumeInit(e56279db502865659c524b763c9957bf) failed [at block_svc_routines.c+4974 :<block_list_cli_1_svc_st>]


3. Even though the block volume delete succeeds, we still see more "Started DELETE" requests for the same volume after the successful delete. These then error out with "Completed 404 Not Found in 91.097µs".
What could be the reason for retrying the delete even after it has succeeded? Similar behavior was seen

[kubeexec] DEBUG 2018/11/22 17:58:41 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:246: Host: dhcp47-20.lab.eng.blr.redhat.com Pod: glusterfs-storage-fkmqh Command: bash -c "set -o pipefail && gluster-block delete vol_e56279db502865659c524b763c9957bf/blk_glusterfs_block52_6ff02eac-ee7f-11e8-a6eb-0a580a810007 --json |tee /dev/stderr"
Result: { "SUCCESSFUL ON": [ "", "", "" ], "RESULT": "SUCCESS" }
[negroni] Started DELETE /blockvolumes/4032fca686696705df44e42128747a98
[negroni] Started GET /queue/a06268cce654c493d44efcab2d471ee4
[negroni] Completed 200 OK in 172.054µs
[heketi] INFO 2018/11/22 17:58:41 Delete Block Volume succeeded
[asynchttp] INFO 2018/11/22 17:58:41 asynchttp.go:292: Completed job a06268cce654c493d44efcab2d471ee4 in 5.047771833s
[heketi] ERROR 2018/11/22 17:58:41 /src/github.com/heketi/heketi/apps/glusterfs/operations_manage.go:89: Delete Block Volume Build Failed: Id not found
[negroni] Completed 500 Internal Server Error in 163.907849ms
[negroni] Started DELETE /blockvolumes/4032fca686696705df44e42128747a98
[negroni] Completed 404 Not Found in 202.273µs
[negroni] Started GET /queue/a06268cce654c493d44efcab2d471ee4
[negroni] Completed 204 No Content in 170.067µs
[negroni] Started DELETE /blockvolumes/4032fca686696705df44e42128747a98
[negroni] Completed 404 Not Found in 200.545µs
[negroni] Started DELETE /blockvolumes/4032fca686696705df44e42128747a98
[negroni] Completed 404 Not Found in 91.097µs
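The trailing 404s above suggest the caller keeps re-issuing DELETE after the volume is already gone. A client that treats 404 on delete as terminal success would stop this noise. A minimal hypothetical sketch of such status handling (the function name and policy are assumptions, not heketi or provisioner code):

```go
package main

import "fmt"

// classifyDeleteStatus decides how a caller should react to the HTTP status
// of a DELETE /blockvolumes/<id> call. The key point is that 404 means the
// volume is already deleted, so retrying only generates log noise.
func classifyDeleteStatus(status int) (done bool, retry bool) {
	switch status {
	case 200, 202, 204:
		return true, false // delete accepted or finished
	case 404:
		return true, false // already gone: treat as success, do not retry
	case 500:
		return false, true // e.g. rejected while another delete is pending
	default:
		return false, false // unexpected status: surface to the caller
	}
}

func main() {
	for _, s := range []int{202, 404, 500} {
		done, retry := classifyDeleteStatus(s)
		fmt.Printf("status %d -> done=%v retry=%v\n", s, done, retry)
	}
}
```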

Version-Release number of selected component (if applicable):

  # oc rsh glusterfs-storage-dnqkj rpm -qa|grep gluster

# oc rsh heketi-storage-1-6xdhf rpm -qa|grep heketi

# oc version = 3.11.16

How reproducible:

Steps to Reproduce:
1. Create block PVCs and check the heketi and gluster-block logs. Check for messages similar to "glusterBlockVolumeInit failed".
2. Delete block PVCs and check the heketi and gluster-block logs. Check for innumerable "Delete Block Volume Build Failed: The target exists, contains other items, or is in use." messages.

Comment 3 John Mulligan 2018-11-28 17:57:25 UTC
From Heketi's perspective it appears to be generally doing the correct thing. What happens is that clients repeatedly request a delete of the same volume before that volume is deleted by the first request. Every request to delete a volume that is already being deleted gets the "Build Failed: The target exists, contains other items, or is in use." message.

This makes me wonder if there could be an issue where the block provisioner "spams" heketi with many requests to delete the same volume.

Can we either get the block provisioner logs from this same time frame, or reproduce again and this time capture both the block provisioner logs and the heketi logs?
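If the provisioner is indeed firing repeated deletes for the same volume before the first finishes, a simple in-flight set on the client side would de-duplicate them. A hypothetical sketch (the type and method names are assumptions, not actual provisioner code):

```go
package main

import (
	"fmt"
	"sync"
)

// deleteDeduper tracks volume ids that already have a delete in flight, so a
// retry loop does not send duplicate DELETE requests for the same volume.
type deleteDeduper struct {
	mu       sync.Mutex
	inFlight map[string]bool
}

// tryClaim returns true if the caller should issue the delete, or false if a
// delete for this volume id is already in flight.
func (d *deleteDeduper) tryClaim(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.inFlight[id] {
		return false
	}
	d.inFlight[id] = true
	return true
}

// release clears the in-flight mark once the delete completes (or fails
// terminally), allowing a later retry if the volume still exists.
func (d *deleteDeduper) release(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.inFlight, id)
}

func main() {
	d := &deleteDeduper{inFlight: map[string]bool{}}
	fmt.Println(d.tryClaim("vol-a")) // first request proceeds
	fmt.Println(d.tryClaim("vol-a")) // duplicate is suppressed
	d.release("vol-a")
	fmt.Println(d.tryClaim("vol-a")) // allowed again after completion
}
```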
