Bug 1461647

Summary: volume delete in heketi fails
Product: Red Hat Gluster Storage [Red Hat Storage]
Component: heketi
Version: cns-3.6
Target Release: CNS 3.6
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: krishnaram Karthick <kramdoss>
Assignee: Humble Chirammal <hchiramm>
QA Contact: krishnaram Karthick <kramdoss>
CC: hchiramm, kramdoss, madam, mliyazud, pprakash, rhs-bugs, rtalur, storage-qa-internal, vinug
Keywords: Reopened
Type: Bug
Last Closed: 2017-10-11 07:07:22 UTC
Bug Depends On: 1451602
Bug Blocks: 1445448

Description krishnaram Karthick 2017-06-15 04:52:05 UTC
Description of problem:

Clean-up of bricks seems to have failed when a PVC delete request was made. As a result, there is a mismatch between the heketi-cli volume list and the gluster vol list output.

heketi-cli volume list
Id:1b1807803494b5758aa4d50c12bbeffd    Cluster:c480c9218e06639fae360206cb3dd35c    Name:vol_1b1807803494b5758aa4d50c12bbeffd
Id:d2aec01c6684e54b6af8609f1942afb8    Cluster:c480c9218e06639fae360206cb3dd35c    Name:vol_d2aec01c6684e54b6af8609f1942afb8
Id:d75d08e2bee4282e01d6ae741a85c310    Cluster:c480c9218e06639fae360206cb3dd35c    Name:vol_d75d08e2bee4282e01d6ae741a85c310
Id:d774e393d622b3679c353544a3be312d    Cluster:c480c9218e06639fae360206cb3dd35c    Name:heketidbstorage

sh-4.2# gluster vol list
heketidbstorage
vol_d75d08e2bee4282e01d6ae741a85c310
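
One way to spot such a mismatch is to diff the names heketi reports against what gluster reports (a minimal sketch, not from the original report; the gluster pod name is a placeholder):

heketi-cli volume list | sed 's/.*Name://' | sort > /tmp/heketi_vols
oc rsh <glusterfs-pod> gluster volume list | sort > /tmp/gluster_vols
diff /tmp/heketi_vols /tmp/gluster_vols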

When an attempt was made to delete one of these volumes manually, it failed with the following error.

[root@dhcp46-68 yum.repos.d]# heketi-cli volume delete d2aec01c6684e54b6af8609f1942afb8
Error: database is in read-only mode

Please note that volume deletion was successful earlier on a freshly created volume (the LVs were deleted successfully).
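
Leftover brick artifacts from a failed clean-up can be checked for inside the gluster pod with something like the following (a rough sketch, assuming the standard heketi brick layout):

lvs --noheadings -o vg_name,lv_name | grep brick_      # brick LVs that should have been removed
grep /var/lib/heketi/mounts /proc/mounts               # brick mounts still present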

snippet of heketi log:

[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda] on glusterfs-06tt3: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
]
[sshexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/sshexec/brick.go:134: Unable to execute command on glusterfs-06tt3: umount: /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f] on glusterfs-2t1t4: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
]
[sshexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/sshexec/brick.go:134: Unable to execute command on glusterfs-2t1t4: umount: /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_5ae8497770f3cda6af559c109a3325c6/brick_c1e2139ffd0571e31fc0353068e2998e] on glusterfs-6ffjk: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_5ae8497770f3cda6af559c109a3325c6/brick_c1e2139ffd0571e31fc0353068e2998e: target is busy.
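
As the umount error itself suggests, the process holding the brick mount busy can be identified with fuser(1) or lsof(8); a minimal sketch run inside the affected gluster pod, using the brick path from the log above:

BRICK=/var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda
fuser -vm $BRICK             # processes using the mounted filesystem
lsof +D $BRICK 2>/dev/null   # alternative: open files under the brick path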


Version-Release number of selected component (if applicable):
rpm -qa | grep 'heketi'
heketi-client-5.0.0-1.el7rhgs.x86_64
rhgs3/rhgs-volmanager-rhel7:3.3.0-1

How reproducible:
1/1

Steps to Reproduce:
1. create 3 volumes (have at least one 2x3 vol type)
2. enable brick multiplexing
3. restart the volume
4. gluster pod restart
5. delete the pvc
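
The steps above roughly map to commands like the following (a hedged sketch; volume, pod, and PVC names are placeholders):

gluster volume set all cluster.brick-multiplex on   # step 2: enable brick multiplexing cluster-wide
gluster volume stop <volname>                       # step 3: restart the volume
gluster volume start <volname>
oc delete pod <glusterfs-pod>                       # step 4: gluster pod restart
oc delete pvc <pvc-name>                            # step 5: delete the PVC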

Actual results:
volume is not cleaned up

Expected results:
proper deletion and clean up of volume

Additional info:

Comment 2 Humble Chirammal 2017-06-15 05:29:45 UTC
Karthick, is it with brick multiplexing on? If yes, this is a known issue.
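
A quick way to confirm whether brick multiplexing is enabled (a hedged sketch; exact output varies by build):

gluster volume get all cluster.brick-multiplex
ps -ef | grep glusterfsd | grep -v grep   # with multiplexing on, far fewer brick processes run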

Comment 3 Humble Chirammal 2017-06-15 05:31:28 UTC
(In reply to Humble Chirammal from comment #2)
> Karthick, is it with brick multiplexing on? If yes, this is a known issue.

Can you please list the version of the gluster container here? Also, please refer to:

https://bugzilla.redhat.com/show_bug.cgi?id=1444749

Comment 4 Humble Chirammal 2017-06-15 05:53:08 UTC
krk, can you also trace the references as in https://bugzilla.redhat.com/show_bug.cgi?id=1444749#c18 and #c19?

Comment 5 krishnaram Karthick 2017-06-15 05:55:26 UTC
gluster version:

glusterfs-client-xlators-3.8.4-27.el7rhgs.x86_64
glusterfs-cli-3.8.4-27.el7rhgs.x86_64
glusterfs-server-3.8.4-27.el7rhgs.x86_64
glusterfs-libs-3.8.4-27.el7rhgs.x86_64
glusterfs-3.8.4-27.el7rhgs.x86_64
glusterfs-api-3.8.4-27.el7rhgs.x86_64
glusterfs-fuse-3.8.4-27.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-27.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

logs are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1461647/

Comment 6 Humble Chirammal 2017-06-15 06:38:58 UTC
I logged into this setup and I can see that the brick path is still referenced, which causes the umount to fail.

sh-4.2# ls -l /proc/464/fd/|grep d096f7c35
lr-x------. 1 root root 64 Jun 15 06:34 32 -> /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda/brick
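
The same check can be generalised to scan every gluster process for open descriptors under a brick path (a rough sketch; PID 464 above was one specific glusterfsd process):

BRICK=brick_d096f7c35eb07128c1530f057df1ccda
for pid in $(pidof glusterfsd glusterfs); do
    ls -l /proc/$pid/fd 2>/dev/null | grep "$BRICK" && echo "pid $pid still references $BRICK"
done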

Comment 7 Humble Chirammal 2017-06-15 06:43:56 UTC
(In reply to Humble Chirammal from comment #6)
> I logged into this setup and I can see that the brick path is still
> referenced, which causes the umount to fail.
> 
> sh-4.2# ls -l /proc/464/fd/|grep d096f7c35
> lr-x------. 1 root root 64 Jun 15 06:34 32 -> /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda/brick

https://bugzilla.redhat.com/show_bug.cgi?id=1444749#c19 has 4 references, and one of them is the same reference as above.

Comment 8 Raghavendra Talur 2017-06-15 07:25:19 UTC
From the container:
 rpm -qa | grep gluster
glusterfs-client-xlators-3.8.4-27.el7rhgs.x86_64
glusterfs-cli-3.8.4-27.el7rhgs.x86_64
glusterfs-server-3.8.4-27.el7rhgs.x86_64
glusterfs-libs-3.8.4-27.el7rhgs.x86_64
glusterfs-3.8.4-27.el7rhgs.x86_64
glusterfs-api-3.8.4-27.el7rhgs.x86_64
glusterfs-fuse-3.8.4-27.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-27.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64


Mohit has confirmed that the fix for this bug was available in build 27, but due to another bug that fix does not work. The other bug is fixed in build 28, which will unblock QE from testing this bug.

I have made this depend on the other bug.

Comment 10 Humble Chirammal 2017-07-25 08:09:48 UTC
Changing to ON_QA based on the IRC discussion with Karthick.

Comment 11 krishnaram Karthick 2017-07-25 08:13:49 UTC
This issue is not seen with the RHGS image rhgs3/rhgs-server-rhel7:3.3.0-7; volume delete works as expected.

Moving the bug to verified.

Comment 21 errata-xmlrpc 2017-10-11 07:07:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879