Bug 1654703 - Block hosting volume deletion via heketi-cli failed with error "target is busy" but deleted from gluster backend
Summary: Block hosting volume deletion via heketi-cli failed with error "target is busy" but deleted from gluster backend
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhgs-server-container
Version: ocs-3.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 3.11.z Batch Update 2
Assignee: Saravanakumar
QA Contact: Nitin Goyal
URL:
Whiteboard:
Duplicates: 1665462
Depends On: 1669020
Blocks: 1644171
 
Reported: 2018-11-29 13:12 UTC by Manisha Saini
Modified: 2019-04-24 18:16 UTC
CC: 27 users

Fixed In Version: rhgs-server-container-3.11.2-3
Doc Type: Bug Fix
Doc Text:
Previously, a block hosting volume stop request failed to detach the brick instance from the running parent brick process. As a result, the subsequent deletion request for the block hosting volume failed with a "resource busy" error, and heketi retained a stale block hosting volume entry in its database. With this fix, the block hosting volume stop request successfully detaches the brick instance from its running parent brick process.
Clone Of:
Clones: 1668190 1669020
Environment:
Last Closed: 2019-03-27 06:05:20 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2019:0667 (last updated 2019-03-27 06:05:24 UTC)

Description Manisha Saini 2018-11-29 13:12:22 UTC
Description of problem:

While testing with block hosting volumes, after deleting the volumes through the heketi CLI, a few volumes failed to delete with the message "target is busy". However, logging in to the gluster pod afterwards shows that the volume was successfully deleted from the backend. This leads to a mismatch between the heketi volume entries and the gluster backend.
On the gluster pod, although the volume is deleted, the bricks are still mounted.
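
For reference, a rough way to confirm the leftover brick mounts from inside the affected gluster pod; <glusterfs-pod> below is a placeholder for the pod on the node where the failing brick lives, and the brick path is the one reported in the umount error further down:

# bricks of the already deleted volume still show up in the mount table
oc rsh <glusterfs-pod> grep /var/lib/heketi/mounts /proc/mounts
# list the processes still holding the brick mount that umount complained about
oc rsh <glusterfs-pod> fuser -mv /var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_f225251573ef845f5591e578a1f10652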



Version-Release number of selected component (if applicable):

openshift_storage_glusterfs_heketi_image='brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-volmanager-rhel7:3.11.1-1'
   
openshift_storage_glusterfs_block_image='brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-gluster-block-prov-rhel7:3.11.1-1'
    
openshift_storage_glusterfs_image='brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/ocs/rhgs-server-rhel7:3.11.1-1'

# oc rsh heketi-storage-1-chhcs 
rsh-4.2# rpm -qa | grep heketi
heketi-client-8.0.0-1.el7rhgs.x86_64
heketi-8.0.0-1.el7rhgs.x86_64

# oc version
oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp47-138.lab.eng.blr.redhat.com:8443
openshift v3.11.43
kubernetes v1.11.0+d4cacc0

How reproducible:
Hit 2-3 times

Steps to Reproduce:

1. Create 50 block device PVCs (5 GB each) hosted on 3 block hosting volumes.

2. Create 50 cirros app pods using these block PVCs.

3. Delete the app pods and DCs.

4. Delete all the PVCs.

5. Check heketi-cli blockvolume list (it shows 0 entries since all PVCs are deleted).

[root@dhcp47-138 scripts]# heketi-cli blockvolume list 
[root@dhcp47-138 scripts]# 

6. Start deleting the block hosting volumes.

========
# heketi-cli volume list 
Id:06f28604994dd5e87fbd85871968db4b    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:vol_06f28604994dd5e87fbd85871968db4b [block]
Id:4a25a5dfdd217c589222e638a21fc3e9    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:vol_4a25a5dfdd217c589222e638a21fc3e9 [block]
Id:8f6e417522ea27ce3cbe194bac337499    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:vol_8f6e417522ea27ce3cbe194bac337499 [block]
Id:cd94e0c50b94551ced09244c198d58b1    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:heketidbstorage

=======
[root@dhcp47-138 scripts]# heketi-cli volume delete 06f28604994dd5e87fbd85871968db4b
Error: umount: /var/lib/heketi/mounts/vg_dca41af8b5c15e419f66928440c4d9d6/brick_f225251573ef845f5591e578a1f10652: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))



[root@dhcp47-138 scripts]# heketi-cli volume delete 4a25a5dfdd217c589222e638a21fc3e9
Error: umount: /var/lib/heketi/mounts/vg_55d43508331a4ba298eee11d6f3c39a1/brick_a40130d2d380979ecce853a5e40308f2: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))


[root@dhcp47-138 scripts]# heketi-cli volume delete 8f6e417522ea27ce3cbe194bac337499
Volume 8f6e417522ea27ce3cbe194bac337499 deleted

==============

7. Check heketi-cli volume list against the volumes present on the gluster backend (see the comparison sketch after the output below).

======
# heketi-cli volume list
Id:06f28604994dd5e87fbd85871968db4b    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:vol_06f28604994dd5e87fbd85871968db4b [block]
Id:4a25a5dfdd217c589222e638a21fc3e9    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:vol_4a25a5dfdd217c589222e638a21fc3e9 [block]
Id:cd94e0c50b94551ced09244c198d58b1    Cluster:3b834980ae06d3950765eaf0c7bc20a1    Name:heketidbstorage

# gluster v list
heketidbstorage

======
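
A quick way to spot the stale heketi entries is to diff the two listings. A minimal sketch, run from the master where heketi-cli is configured; <glusterfs-pod> is a placeholder:

# volume names as heketi knows them
heketi-cli volume list | awk -F'Name:' '{print $2}' | awk '{print $1}' | sort > /tmp/heketi_vols
# volume names actually present on the gluster backend
oc rsh <glusterfs-pod> gluster volume list | sort > /tmp/gluster_vols
# entries heketi still tracks but that are already gone from gluster
comm -23 /tmp/heketi_vols /tmp/gluster_vols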


Actual results:

heketi-cli fails to delete the volume, but the volume is deleted from the gluster backend, causing a mismatch in volume entries.

Expected results:
There should not be any mismatch in volume entries between the gluster backend and the heketi DB.


Additional info:

Comment 12 Atin Mukherjee 2019-01-03 05:38:45 UTC
We need to reproduce this, collect the lsof output, and share it back; otherwise we do not have sufficient information to debug this problem. Also make sure the gluster RPM version is reported in the comment.
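
A rough outline of that data collection, assuming <glusterfs-pod> is the gluster pod on the node where the umount failed and <vg>/<brick> is the brick path from the error message:

# gluster RPM versions inside the pod
oc rsh <glusterfs-pod> sh -c 'rpm -qa | grep gluster'
# processes holding files open on the busy brick mount
oc rsh <glusterfs-pod> lsof /var/lib/heketi/mounts/<vg>/<brick>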

Comment 30 RamaKasturi 2019-01-22 07:38:53 UTC
I am hitting the same issue on my setup too: deleting block hosting volumes via heketi-cli throws the error below, but the volumes get deleted from the gluster backend.

[root@dhcp46-220 ~]# heketi-cli volume delete 037a62a428cc34f540102d91333b8f34
Error: umount: /var/lib/heketi/mounts/vg_c921126c92d975b25d5903a5afdc4214/brick_0a37b52d11601de61a4d11c3a4281c99: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Id:037a62a428cc34f540102d91333b8f34    Cluster:ee2fbd7973e8396a23811d54dd8ed985    Name:vol_037a62a428cc34f540102d91333b8f34 [block]
Id:3dc699b513a2ce7b3a75f401fa0390bc    Cluster:ee2fbd7973e8396a23811d54dd8ed985    Name:heketidbstorage
Id:aa77fe2a513ff7008142b103b6360af9    Cluster:ee2fbd7973e8396a23811d54dd8ed985    Name:vol_aa77fe2a513ff7008142b103b6360af9 [block]


sh-4.2# gluster volume list
heketidbstorage
vol_aa77fe2a513ff7008142b103b6360af9

Comment 32 RamaKasturi 2019-01-22 12:19:36 UTC
Hello Mohit,

   Is there a workaround available for this issue?

Thanks
kasturi

Comment 35 Mohit Agrawal 2019-01-24 07:17:53 UTC
Hi Kasturi,

  I think it is difficult to provide a workaround for this issue.

Thanks,
Mohit Agrawal

Comment 42 Atin Mukherjee 2019-02-04 14:32:00 UTC
I have modified the text slightly.

Comment 49 Elvir Kuric 2019-02-08 10:27:48 UTC
*** Bug 1665462 has been marked as a duplicate of this bug. ***

Comment 66 Anjana KD 2019-03-19 05:56:04 UTC
 Hello Atin,

Could you please provide the bug fix doc text (in CCFR format) and change the doc type too?

Comment 70 Atin Mukherjee 2019-03-25 14:45:57 UTC
I've updated the doc text to highlight that this bug has been fixed, so it can be captured in the release notes.

Comment 72 Anjana KD 2019-03-25 16:50:47 UTC
I have made minor changes. Kindly review and update the flag.

Comment 74 errata-xmlrpc 2019-03-27 06:05:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0667

