Description of problem:

Clean-up of bricks seems to have failed when the PVC delete request was made. As a result, there is a mismatch between heketi-cli and gluster volume list:

heketi-cli volume list
Id:1b1807803494b5758aa4d50c12bbeffd Cluster:c480c9218e06639fae360206cb3dd35c Name:vol_1b1807803494b5758aa4d50c12bbeffd
Id:d2aec01c6684e54b6af8609f1942afb8 Cluster:c480c9218e06639fae360206cb3dd35c Name:vol_d2aec01c6684e54b6af8609f1942afb8
Id:d75d08e2bee4282e01d6ae741a85c310 Cluster:c480c9218e06639fae360206cb3dd35c Name:vol_d75d08e2bee4282e01d6ae741a85c310
Id:d774e393d622b3679c353544a3be312d Cluster:c480c9218e06639fae360206cb3dd35c Name:heketidbstorage

sh-4.2# gluster vol list
heketidbstorage
vol_d75d08e2bee4282e01d6ae741a85c310

When an attempt was made to delete one of the volumes manually, it failed with the following error:

[root@dhcp46-68 yum.repos.d]# heketi-cli volume delete d2aec01c6684e54b6af8609f1942afb8
Error: database is in read-only mode

Please note that volume deletion was successful earlier on a freshly created volume (the LVs were deleted successfully).

Snippet of the heketi log:

[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda] on glusterfs-06tt3: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
]
[sshexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/sshexec/brick.go:134: Unable to execute command on glusterfs-06tt3: umount: /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f] on glusterfs-2t1t4: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
]
[sshexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/sshexec/brick.go:134: Unable to execute command on glusterfs-2t1t4: umount: /var/lib/heketi/mounts/vg_4d7456ad6d536fae35432796265a24db/brick_4a98800162558b62b859d6736238a99f: target is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
[kubeexec] ERROR 2017/06/15 04:26:10 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:247: Failed to run command [umount /var/lib/heketi/mounts/vg_5ae8497770f3cda6af559c109a3325c6/brick_c1e2139ffd0571e31fc0353068e2998e] on glusterfs-6ffjk: Err[command terminated with exit code 32]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_5ae8497770f3cda6af559c109a3325c6/brick_c1e2139ffd0571e31fc0353068e2998e: target is busy.

Version-Release number of selected component (if applicable):

rpm -qa | grep 'heketi'
heketi-client-5.0.0-1.el7rhgs.x86_64
rhgs3/rhgs-volmanager-rhel7:3.3.0-1

How reproducible:
1/1

Steps to Reproduce:
1. Create 3 volumes (have at least one 2x3 volume type)
2. Enable brick multiplexing
3. Restart the volume
4. Restart the gluster pod
5. Delete the PVC (a minimal sketch of these steps follows under Additional info)

Actual results:
The volume is not cleaned up.

Expected results:
Proper deletion and clean-up of the volume.

Additional info:
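A minimal shell sketch of the reproduction steps above, assuming the cluster runs on OpenShift; the volume, pod, and PVC names are placeholders, and cluster.brick-multiplex is the gluster option used to enable brick multiplexing:

# Step 2: enable brick multiplexing cluster-wide
gluster volume set all cluster.brick-multiplex on

# Step 3: restart one of the volumes so its bricks are multiplexed
# into a single brick process (--mode=script skips the confirmation prompt)
gluster --mode=script volume stop vol_example
gluster volume start vol_example

# Step 4: restart the gluster pod (pod name is a placeholder)
oc delete pod glusterfs-example

# Step 5: delete the PVC backed by one of the heketi volumes (claim name is a placeholder)
oc delete pvc claim-example

# Check for the mismatch: heketi still lists volumes that gluster no longer has
heketi-cli volume list
gluster volume list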
Karthick, is this with brick multiplexing on? If yes, this is a known issue.
(In reply to Humble Chirammal from comment #2)
> Karthick, is this with brick multiplexing on? If yes, this is a known issue.

Can you please list the version of the gluster container here? Also, please refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1444749
krk, can you also trace the references as in https://bugzilla.redhat.com/show_bug.cgi?id=1444749#c18 and comment #19?
gluster version:

glusterfs-client-xlators-3.8.4-27.el7rhgs.x86_64
glusterfs-cli-3.8.4-27.el7rhgs.x86_64
glusterfs-server-3.8.4-27.el7rhgs.x86_64
glusterfs-libs-3.8.4-27.el7rhgs.x86_64
glusterfs-3.8.4-27.el7rhgs.x86_64
glusterfs-api-3.8.4-27.el7rhgs.x86_64
glusterfs-fuse-3.8.4-27.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-27.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

Logs are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1461647/
I logged into this setup and I can see that the brick path is still referenced, which causes the umount to fail.

sh-4.2# ls -l /proc/464/fd/ | grep d096f7c35
lr-x------. 1 root root 64 Jun 15 06:34 32 -> /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda/brick
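For reference, a sketch of how that check can be generalized (the brick path is taken from the log above; the PID 464 and paths will differ per setup). It finds every gluster process still holding a descriptor under the brick mount, which is exactly what makes the umount report "target is busy":

BRICK=/var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda

# Option 1: fuser/lsof, as the umount error message itself suggests
fuser -vm "$BRICK"
lsof +D "$BRICK"

# Option 2: walk /proc directly, generalizing the single-PID check above
for pid in $(pgrep -f gluster); do
    ls -l /proc/$pid/fd 2>/dev/null | grep -q "$BRICK" \
        && echo "PID $pid still references $BRICK"
done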
(In reply to Humble Chirammal from comment #6)
> I logged into this setup and I can see that the brick path is still
> referenced, which causes the umount to fail.
>
> sh-4.2# ls -l /proc/464/fd/ | grep d096f7c35
> lr-x------. 1 root root 64 Jun 15 06:34 32 -> /var/lib/heketi/mounts/vg_143748d92cb97267ad492ea6468fd709/brick_d096f7c35eb07128c1530f057df1ccda/brick

https://bugzilla.redhat.com/show_bug.cgi?id=1444749#c19 has 4 references, and one of them is the same reference as above.
From the container:

rpm -qa | grep gluster
glusterfs-client-xlators-3.8.4-27.el7rhgs.x86_64
glusterfs-cli-3.8.4-27.el7rhgs.x86_64
glusterfs-server-3.8.4-27.el7rhgs.x86_64
glusterfs-libs-3.8.4-27.el7rhgs.x86_64
glusterfs-3.8.4-27.el7rhgs.x86_64
glusterfs-api-3.8.4-27.el7rhgs.x86_64
glusterfs-fuse-3.8.4-27.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-27.el7rhgs.x86_64
gluster-block-0.2-3.el7rhgs.x86_64

Mohit has confirmed that the fix for this bug was available in build 27, but due to another bug the fix won't work. The other bug is fixed in build 28, which will unblock QE from testing this bug. I have made this bug depend on the other one.
Changing to ON_QA based on the IRC discussion with Karthick.
This issue is not seen with the RHGS image rhgs3/rhgs-server-rhel7:3.3.0-7; volume delete works as expected. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2879