Bug 1464150 - [GSS] Unable to delete snapshot because it's in use
Summary: [GSS] Unable to delete snapshot because it's in use
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: snapshot
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Sunny Kumar
QA Contact: Vinayak Papnoi
URL:
Whiteboard:
Duplicates: 1490699 1616151
Depends On: 1482023 1506098
Blocks: 1408949 RHGS-3.4-GSS-proposed-tracker 1503135
Reported: 2017-06-22 14:12 UTC by Simon Reber
Modified: 2018-12-13 06:44 UTC (History)
CC List: 14 users

Fixed In Version: glusterfs-3.12.2-2
Doc Type: Enhancement
Doc Text:
GlusterFS used to mount deactivated snapshot(s) under /run/gluster/snaps by default. Furthermore, the snapshot status command should show relevant information for the deactivated snapshot(s). Because of that mount, some process could access it and cause the unmount to fail during snapshot deletion. With this feature, GlusterFS no longer mounts deactivated snapshot(s), and the snapshot status command displays the text 'N/A (Deactivated Snapshot)' in the Volume Group field for them.
Clone Of:
Environment:
Last Closed: 2018-09-04 06:32:36 UTC


Attachments
sosreport from one of the gluster node (19.06 MB, application/x-xz)
2017-06-22 14:12 UTC, Simon Reber


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:34:40 UTC
Github gluster glusterfs issues 276 None None None 2017-07-22 06:25:44 UTC
Red Hat Bugzilla 1467903 None None None 2019-03-22 04:06:45 UTC
Red Hat Knowledge Base (Solution) 3074421 None None None 2017-06-22 14:15:53 UTC

Internal Links: 1467903

Description Simon Reber 2017-06-22 14:12:19 UTC
Created attachment 1290740
sosreport from one of the gluster node

Description of problem:

The customer is running Red Hat Gluster Server and makes heavy use of snapshot(s). They create and delete snapshot(s) with a self-maintained script to meet their policies.

From time to time, they notice that snapshot(s) are not properly deleted on all nodes. This means they have to do a manual clean-up, removing the volumes and everything else under /var/lib/gluster on their own.

While investigating, we found that one part of the problem is that the snapshot(s) are automatically mounted under /run/gluster/snaps/<snapshot>. This causes issues, because processes (such as monitoring) access these shares and therefore keep an open file handle.

Since removing a snapshot also means unmounting /run/gluster/snaps/<snapshot>, it becomes clear that this operation will fail if some process is accessing the share at that time.
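
To see which processes are holding a snapshot mount open before a delete is attempted, one option is to scan the per-process file descriptor links under /proc. The following is a minimal sketch under stated assumptions, not part of glusterd: the function name find_open_handles and the proc_root parameter are hypothetical, it only works on Linux, it only sees processes the caller has permission to inspect, and it checks open file descriptors only (not current working directories or memory mappings, which can also keep a mount busy).

```python
import os


def find_open_handles(mount_path, proc_root="/proc"):
    """Return sorted (pid, target) pairs for open files under mount_path.

    Hypothetical helper: walks <proc_root>/<pid>/fd and resolves each
    symlink to the file it points at.
    """
    prefix = mount_path.rstrip("/") + "/"
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            # Match the mount point itself or anything below it.
            if target == prefix[:-1] or target.startswith(prefix):
                hits.append((int(entry), target))
    return sorted(hits)
```

Calling find_open_handles("/run/gluster/snaps/<snapshot>") as root on an affected node would list the PIDs to investigate (for example a monitoring agent) before retrying the snapshot delete.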

The messages found in the gluster logs are as follows:

[2017-06-18 09:00:01.762895] I [MSGID: 106091] [glusterd-snapshot.c:6263:glusterd_snapshot_remove_commit] 0-management: Successfully marked snap vol_fast_registry_GMT-2017.06.15-10.03.01 for decommission.
[2017-06-18 09:00:01.763358] W [MSGID: 106073] [glusterd-snapshot.c:2780:glusterd_do_lvm_snapshot_remove] 0-management: Getting the root of the brick for volume 489a7157c2d54d269e6644856202e779 (snap vol_fast_registry_GMT-2017.06.15-10.03.01) failed. Removing lv (/dev/vg_fast_registry/489a7157c2d54d269e6644856202e779_0).
[2017-06-18 09:00:01.783014] E [MSGID: 106044] [glusterd-snapshot.c:2834:glusterd_do_lvm_snapshot_remove] 0-management: removing snapshot of the brick (glusternode03a:/run/gluster/snaps/489a7157c2d54d269e6644856202e779/brick1/registry) of device /dev/vg_fast_registry/489a7157c2d54d269e6644856202e779_0 failed
[2017-06-18 09:00:01.783044] E [MSGID: 106044] [glusterd-snapshot.c:2962:glusterd_lvm_snapshot_remove] 0-management: Failed to remove the snapshot /run/gluster/snaps/489a7157c2d54d269e6644856202e779/brick1/registry (/dev/vg_fast_registry/489a7157c2d54d269e6644856202e779_0)
[2017-06-18 09:00:01.783112] W [MSGID: 106033] [glusterd-snapshot.c:3008:glusterd_lvm_snapshot_remove] 0-management: Failed to rmdir: /run/gluster/snaps/489a7157c2d54d269e6644856202e779/, err: Directory not empty. More than one glusterd running on this node. [Directory not empty]
[2017-06-18 09:00:01.783124] W [MSGID: 106044] [glusterd-snapshot.c:3079:glusterd_snap_volume_remove] 0-management: Failed to remove lvm snapshot volume 489a7157c2d54d269e6644856202e779
[2017-06-18 09:00:01.783133] W [MSGID: 106044] [glusterd-snapshot.c:3154:glusterd_snap_remove] 0-management: Failed to remove volinfo 489a7157c2d54d269e6644856202e779 for snap vol_fast_registry_GMT-2017.06.15-10.03.01
[2017-06-18 09:00:01.783151] E [MSGID: 106044] [glusterd-snapshot.c:6298:glusterd_snapshot_remove_commit] 0-management: Failed to remove snap vol_fast_registry_GMT-2017.06.15-10.03.01
[2017-06-18 09:00:01.783159] E [MSGID: 106044] [glusterd-snapshot.c:8308:glusterd_snapshot] 0-management: Failed to delete snapshot
[2017-06-18 09:00:01.783169] W [MSGID: 106123] [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2017-06-18 09:00:01.783175] E [MSGID: 106123] [glusterd-mgmt.c:1414:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node


And on the other node:


[2017-06-18 09:00:02.965240] E [MSGID: 106095] [glusterd-snapshot-utils.c:3365:glusterd_umount] 0-management: umounting /run/gluster/snaps/482f7bb2668440e5908b1cf7e32247e8/brick2 failed (Bad file descriptor) [Bad file descriptor]
[2017-06-18 09:00:05.972554] E [MSGID: 106038] [glusterd-snapshot.c:2818:glusterd_do_lvm_snapshot_remove] 0-management: umount failed for path /run/gluster/snaps/482f7bb2668440e5908b1cf7e32247e8/brick2 (brick: /run/gluster/snaps/482f7bb2668440e5908b1cf7e32247e8/brick2/registry): Bad file descriptor.
[2017-06-18 09:00:05.972602] E [MSGID: 106044] [glusterd-snapshot.c:2962:glusterd_lvm_snapshot_remove] 0-management: Failed to remove the snapshot /run/gluster/snaps/482f7bb2668440e5908b1cf7e32247e8/brick2/registry (/dev/vg_fast_registry/482f7bb2668440e5908b1cf7e32247e8_0)
[2017-06-18 09:00:10.780390] W [MSGID: 106033] [glusterd-snapshot.c:3008:glusterd_lvm_snapshot_remove] 0-management: Failed to rmdir: /run/gluster/snaps/482f7bb2668440e5908b1cf7e32247e8/, err: Directory not empty. More than one glusterd running on this node. [Directory not empty]
[2017-06-18 09:00:10.780424] W [MSGID: 106044] [glusterd-snapshot.c:3079:glusterd_snap_volume_remove] 0-management: Failed to remove lvm snapshot volume 482f7bb2668440e5908b1cf7e32247e8
[2017-06-18 09:00:10.780436] W [MSGID: 106044] [glusterd-snapshot.c:3154:glusterd_snap_remove] 0-management: Failed to remove volinfo 482f7bb2668440e5908b1cf7e32247e8 for snap vol_fast_registry_GMT-2017.06.17-10.03.01
[...]
The message "E [MSGID: 106095] [glusterd-snapshot-utils.c:3365:glusterd_umount] 0-management: umounting /run/gluster/snaps/482f7bb2668440e5908b1cf7e32247e8/brick2 failed (Bad file descriptor) [Bad file descriptor]" repeated 2 times between [2017-06-18 09:00:02.965240] and [2017-06-18 09:00:04.972387]
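
When triaging excerpts like the ones above, it can help to split each line into its fields so that errors can be filtered by level, MSGID, or function. The following is a rough sketch: the regular expression is inferred only from the log lines quoted in this report, not from a glusterfs format specification, and the names LOG_RE and parse_log_line are hypothetical. Lines that do not fit the pattern (such as the "repeated 2 times" summary line) yield None.

```python
import re

# Pattern inferred from the excerpts in this report (an assumption):
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["msgid"] = int(record["msgid"])
    record["line"] = int(record["line"])
    return record


def errors_only(lines):
    """Keep only the parsed records logged at level E."""
    parsed = (parse_log_line(line) for line in lines)
    return [rec for rec in parsed if rec is not None and rec["level"] == "E"]
```

Feeding the excerpt through errors_only and grouping by the func field makes it easier to see that the failure starts in glusterd_umount on the second node.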


The question is: is there a way to prevent this from happening? Not mounting the snapshots at all would certainly be a good option, but so far I have been unable to find a configuration option that allows that.

Version-Release number of selected component (if applicable):

 - glusterfs-3.7.9-12.el7rhgs.x86_64

How reproducible:

randomly

Steps to Reproduce:
1. Access /run/gluster/snaps/<snapshot> when a snapshot is scheduled to be deleted
2. There might be other ways, but so far I was not able to reproduce it

Actual results:

The snapshot is not properly removed from all Red Hat Gluster Server nodes.

Expected results:

The snapshot is correctly removed, even if that means that snapshots are not automatically mounted.

Additional info:

Comment 3 Mohammed Rafi KC 2017-06-23 06:44:33 UTC
If I understand the problem correctly, some external application had an fd open in the snapshot mount path when a snapshot delete was scheduled. Because of the open fd, glusterd couldn't unmount the path, hence the snapshot delete failed. Please correct me if I'm wrong.

We need to have the snapshot brick mounted when we activate the snapshot. During snapshot create, we perform all the prerequisites for creating a volume (a snapshot can be considered a read-only gluster volume), such as creating the bricks and setting any required xattrs, just as for a normal volume. But this can be revisited.

But I assume the problem here is that an external application (in this case, a monitoring tool) accessed the gluster mount point, which prevented the unmount operation.

@rbhat,

Do you think we need to defer brick mounting until the snapshot is activated?

Comment 11 Mohammed Rafi KC 2017-09-21 07:03:52 UTC
Upstream patch : https://review.gluster.org/18047

Comment 12 Simon Reber 2017-10-06 07:32:07 UTC
*** Bug 1490699 has been marked as a duplicate of this bug. ***

Comment 13 Sunny Kumar 2017-10-23 10:35:50 UTC
Upstream patch : https://review.gluster.org/#/c/18047/

Comment 16 Vinayak Papnoi 2018-03-01 09:58:52 UTC
Build : glusterfs-3.12.2-4.el7rhgs.x86_64

Newly created snapshots aren't mounted unless they are activated.
Deletion of snapshots is successful even after accessing /var/run/gluster/snaps during the deletion.

Hence, moving bug to verified.
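
A check like the one described above can be approximated by listing what is currently mounted under the snapshot directory. The following is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, filesystem type, options, ...) and assuming that /var/run resolves to /run on the node, so mount points appear under /run/gluster/snaps. The function name snapshot_mounts is hypothetical and not part of any gluster tooling.

```python
def snapshot_mounts(mounts_text, prefix="/run/gluster/snaps"):
    """Return (device, mount_point) pairs mounted at or below prefix.

    mounts_text is the content of /proc/mounts (or text in the same format).
    """
    prefix = prefix.rstrip("/")
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        device, mount_point = fields[0], fields[1]
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            found.append((device, mount_point))
    return found
```

On a node, this could be used as snapshot_mounts(open("/proc/mounts").read()). With the fixed build, a newly created snapshot that has not been activated would be expected to produce no entries.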

Comment 17 Prashant Dhange 2018-08-27 03:40:33 UTC
*** Bug 1616151 has been marked as a duplicate of this bug. ***

Comment 19 errata-xmlrpc 2018-09-04 06:32:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

