Bug 889630 - gluster volume delete is not atomic
Summary: gluster volume delete is not atomic
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 902215
 
Reported: 2012-12-22 00:36 UTC by David Bronaugh
Modified: 2013-07-24 17:45 UTC
CC List: 3 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Cause: glusterd does not fully clean up the volume store during 'volume delete' when non-empty directories are present.
Consequence:
a. Volume delete fails on the node where the volume store cleanup fails.
b. Subsequent attempts to restart glusterd fail.
Fix: Rename the volume directory to a new location, <working-dir>/trash/<volume-id>.deleted, and then clean up its contents. The volume is considered deleted once rename() succeeds, irrespective of whether the cleanup succeeds.
Result: With this change, the problem is fixed provided the move of the volume directory to "trash" succeeds.
Clone Of:
Clones: 902215
Environment:
Last Closed: 2013-07-24 17:45:12 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description David Bronaugh 2012-12-22 00:36:30 UTC
Description of problem:
One can run "gluster volume delete <vol>" on a client (nothing wrong with that, I presume). Under some circumstances, it appears this can fail:

root@medusa:/var/lib/glusterd# gluster volume delete skateboard0
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
Operation failed on skateboard-ib

However, it doesn't completely fail. It deletes part, but not all, of the associated metadata for the volume, leaving behind just enough that the server refuses to start:

[2012-12-21 11:43:28.594814] E [glusterd-store.c:1320:glusterd_store_handle_retrieve] 0-glusterd: Unable to retrieve store handle for /var/lib/glusterd/vols/skateboard0/info, error: No such file or directory
[2012-12-21 11:43:28.594879] E [glusterd-store.c:2184:glusterd_store_retrieve_volumes] 0-: Unable to restore volume: skateboard0
[2012-12-21 11:43:28.594914] E [xlator.c:385:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2012-12-21 11:43:28.594935] E [graph.c:294:glusterfs_graph_init] 0-management: initializing translator failed
[2012-12-21 11:43:28.594951] E [graph.c:483:glusterfs_graph_activate] 0-graph: init failed
[2012-12-21 11:43:28.595233] W [glusterfsd.c:831:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x32e) [0x7f8e1ed55b2e] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0x191) [0x7f8e1ed58ad1] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x192) [0x7f8e1ed58932]))) 0-: received signum (0), shutting down

This is because, among other things, the directory /var/lib/glusterd/vols/skateboard0/ was not empty (it contained a pid file in the 'run' directory). 

Please ensure that operations such as volume delete are fully atomic: i.e., they either fully succeed or fully fail, not some unhappy in-between state that leaves the system unavailable.
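
To make the failure mode concrete, here is a minimal standalone C sketch (not glusterd code; the paths and file names are hypothetical stand-ins for /var/lib/glusterd/vols/<vol>/run/<pid file>) of how a cleanup that misses a leftover pid file ends up in exactly this partial state: rmdir() refuses to remove non-empty directories, so the volume directory survives with just enough of its metadata missing to break the next glusterd start.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        /* Hypothetical stand-ins for the real glusterd volume store paths. */
        const char *voldir  = "/tmp/example-voldir";
        const char *rundir  = "/tmp/example-voldir/run";
        const char *pidfile = "/tmp/example-voldir/run/example.pid";

        mkdir(voldir, 0755);
        mkdir(rundir, 0755);
        close(open(pidfile, O_CREAT | O_WRONLY, 0644)); /* leftover pid file */

        /* A cleanup that does not account for the pid file fails part-way
         * through: rmdir() returns ENOTEMPTY and the directories remain. */
        if (rmdir(rundir) == -1 && errno == ENOTEMPTY)
                fprintf(stderr, "run/ not empty: %s\n", strerror(errno));

        if (rmdir(voldir) == -1)
                fprintf(stderr, "volume dir left behind: %s\n", strerror(errno));

        return 0;
}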

Version-Release number of selected component (if applicable): 3.3.1

Comment 1 Amar Tumballi 2012-12-24 06:12:12 UTC
David, thanks for the report; we too think this has to be fixed soon. In the meantime, can you check whether you see the same behavior with the 3.4.0qa releases (qa6 is the latest at the moment)?

KP, any idea if this is already fixed in master branch?

Comment 2 Krutika Dhananjay 2013-01-29 07:27:11 UTC
Hi David,

Could you please find out if it was "swift.pid" that you found in /var/lib/glusterd/vols/skateboard0/run/?

Thanks.

Comment 3 Vijay Bellur 2013-03-11 21:07:34 UTC
CHANGE: http://review.gluster.org/4639 (glusterd: Mark vol as deleted by renaming voldir before cleaning up the store) merged in master by Anand Avati (avati@redhat.com)
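
For reference, here is a minimal sketch of the approach described in the change above and in the Doc Text, using hypothetical names and paths rather than the actual glusterd implementation: the volume directory is first rename()d into a trash location (rename() within the same filesystem either fully succeeds or fully fails), the volume is treated as deleted from that point on, and the remaining recursive cleanup is best-effort.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: recursively remove 'path'; failure is non-fatal.
 * A real implementation would walk the tree with nftw()/unlinkat();
 * shelling out keeps this sketch short. */
static void best_effort_cleanup(const char *path)
{
        char cmd[4096];

        snprintf(cmd, sizeof(cmd), "rm -rf -- '%s'", path);
        if (system(cmd) != 0)
                fprintf(stderr, "cleanup of %s failed (ignored)\n", path);
}

/* Returns 0 if the volume is considered deleted, -1 otherwise. */
static int delete_volume_store(const char *workdir, const char *volid)
{
        char voldir[4096], trash[4096], dest[4096];

        snprintf(voldir, sizeof(voldir), "%s/vols/%s", workdir, volid);
        snprintf(trash, sizeof(trash), "%s/trash", workdir);
        snprintf(dest, sizeof(dest), "%s/%s.deleted", trash, volid);

        /* Make sure the trash directory exists (EEXIST is fine). */
        if (mkdir(trash, 0755) == -1 && errno != EEXIST) {
                fprintf(stderr, "mkdir(%s): %s\n", trash, strerror(errno));
                return -1;
        }

        /* The rename is the commit point: after it the volume store is no
         * longer under vols/, so a glusterd restart will not try to restore
         * a half-deleted volume even if the cleanup below fails. */
        if (rename(voldir, dest) == -1) {
                fprintf(stderr, "rename(%s, %s): %s\n", voldir, dest,
                        strerror(errno));
                return -1;
        }

        best_effort_cleanup(dest);
        return 0;
}

int main(void)
{
        /* Hypothetical paths, for illustration only. */
        return delete_volume_store("/var/lib/glusterd", "skateboard0") ? 1 : 0;
}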

