Description of problem:

Clean-up of bricks seems to have failed when the PVC delete request was made. As a result, there is a mismatch between heketi-cli and gluster volume list:

heketi-cli volume list
Id:1b1807803494b5758aa4d50c12bbeffd Cluster:c480c9218e06639fae360206cb3dd35c Name:vol_1b1807803494b5758aa4d50c12bbeffd
Id:d2aec01c6684e54b6af8609f1942afb8 Cluster:c480c9218e06639fae360206cb3dd35c Name:vol_d2aec01c6684e54b6af8609f1942afb8
Id:d75d08e2bee4282e01d6ae741a85c310 Cluster:c480c9218e06639fae360206cb3dd35c Name:vol_d75d08e2bee4282e01d6ae741a85c310
Id:d774e393d622b3679c353544a3be312d Cluster:c480c9218e06639fae360206cb3dd35c Name:heketidbstorage

sh-4.2# gluster vol list
heketidbstorage
vol_d75d08e2bee4282e01d6ae741a85c310

When an attempt was made to delete one of the volumes manually, it failed with the following error:

[root@dhcp46-68 yum.repos.d]# heketi-cli volume delete d2aec01c6684e54b6af8609f1942afb8
Error: database is in read-only mode

Please note that volume deletion was successful earlier on a freshly created volume (the LVs were deleted successfully).

Snippet of the heketi log:

[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda] on glusterfs-06tt3: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
]
[sshexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/sshexec/brick.go:134: Unable to execute command on glusterfs-06tt3: umount: /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f] on glusterfs-2t1t4: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
]
[sshexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/sshexec/brick.go:134: Unable to execute command on glusterfs-2t1t4: umount: /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_5ae8497770f3cda6af559c109a3325c6/brick_c1e2139ffd0571e31fc0353068e2998e] on glusterfs-6ffjk: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_5ae8497770f3cda6af559c109a3325c6/brick_c1e2139ffd0571e31fc0353068e2998e: target is busy.

Version-Release number of selected component (if applicable):

rpm -qa | grep 'heketi'
heketi-client-5.0.0-1.el7rhgs.x86_64
rhgs3/rhgs-volmanager-rhel7:3.3.0-1

How reproducible:
1/1

Steps to Reproduce:
1. Create 3 volumes (have at least one 2x3 volume type)
2. Enable brick multiplexing
3. Restart the volume
4. Restart the gluster pod
5. Delete the PVC (a minimal sketch of these steps follows under Additional info)

Actual results:
The volume is not cleaned up.

Expected results:
Proper deletion and clean-up of the volume.

Additional info:
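A minimal shell sketch of the reproduction steps above, assuming the cluster runs on OpenShift; the volume, pod, and PVC names are placeholders, and cluster.brick-multiplex is the gluster option used to enable brick multiplexing:

# Step 2: enable brick multiplexing cluster-wide
gluster volume set all cluster.brick-multiplex on

# Step 3: restart one of the volumes so its bricks are multiplexed
# into a single brick process (--mode=script skips the confirmation prompt)
gluster --mode=script volume stop vol_example
gluster volume start vol_example

# Step 4: restart the gluster pod (pod name is a placeholder)
oc delete pod glusterfs-example

# Step 5: delete the PVC backed by one of the heketi volumes (claim name is a placeholder)
oc delete pvc claim-example

# Check for the mismatch: heketi still lists volumes that gluster no longer has
heketi-cli volume list
gluster volume list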
Karthick, is this with brick multiplexing on? If yes, this is a known issue.
(In reply to Humble Chirammal from comment #2)
> Karthick, is this with brick multiplexing on? If yes, this is a known issue.

Can you please list the version of the gluster container here? Also, please refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1444749
krk, can you also trace the references as in https://bugzilla.redhat.com/show_bug.cgi?id=1444749#c18 and comment #19?
gluster version:

glusterfs-client-xlators-3.8.4-27.el7rhgs.x86_64
glusterfs-cli-3.8.4-27.el7rhgs.x86_64
glusterfs-server-3.8.4-27.el7rhgs.x86_64
glusterfs-libs-3.8.4-27.el7rhgs.x86_64
glusterfs-3.8.4-27.el7rhgs.x86_64
glusterfs-api-3.8.4-27.el7rhgs.x86_64
glusterfs-fuse-3.8.4-27.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-27.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

Logs are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1461647/
I logged into this setup and I can see that the brick path is still referenced, which causes the umount to fail.

sh-4.2# ls -l /proc/464/fd/ | grep d096f7c35
lr-x------. 1 root root 64 Jun 15 06:34 32 -> /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda/brick
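For reference, a sketch of how that check can be generalized (the brick path is taken from the log above; the PID 464 and paths will differ per setup). It finds every gluster process still holding a descriptor under the brick mount, which is exactly what makes the umount report "target is busy":

BRICK=/var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda

# Option 1: fuser/lsof, as the umount error message itself suggests
fuser -vm "$BRICK"
lsof +D "$BRICK"

# Option 2: walk /proc directly, generalizing the single-PID check above
for pid in $(pgrep -f gluster); do
    ls -l /proc/$pid/fd 2>/dev/null | grep -q "$BRICK" \
        && echo "PID $pid still references $BRICK"
done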
(In reply to Humble Chirammal from comment #6)
> I logged into this setup and I can see that the brick path is still
> referenced, which causes the umount to fail.
>
> sh-4.2# ls -l /proc/464/fd/ | grep d096f7c35
> lr-x------. 1 root root 64 Jun 15 06:34 32 -> /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda/brick

https://bugzilla.redhat.com/show_bug.cgi?id=1444749#c19 has 4 references, and one of them is the same reference as above.
From the container:

rpm -qa | grep gluster
glusterfs-client-xlators-3.8.4-27.el7rhgs.x86_64
glusterfs-cli-3.8.4-27.el7rhgs.x86_64
glusterfs-server-3.8.4-27.el7rhgs.x86_64
glusterfs-libs-3.8.4-27.el7rhgs.x86_64
glusterfs-3.8.4-27.el7rhgs.x86_64
glusterfs-api-3.8.4-27.el7rhgs.x86_64
glusterfs-fuse-3.8.4-27.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-27.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

Mohit has confirmed that the fix for this bug was available in build 27, but due to another bug the fix won't work. The other bug is fixed in build 28, which will unblock QE from testing this bug. I have made this bug depend on the other one.
Changing to ON_QA based on the IRC discussion with Karthick.
This issue is not seen with the RHGS image rhgs3/rhgs-server-rhel7:3.3.0-7; volume delete works as expected. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2879