Red Hat Bugzilla – Bug 1463191
gfapi: discard glfs object when volume is deleted
Last modified: 2017-09-04 07:20:35 EDT
Description of problem:
Currently, once we hold a glfs object for a given volume, deleting the volume and recreating it with the same name still lets us access the new volume through the old glfs object, which is wrong.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Write a gfapi program. After the call to glfs_init(), create a file in the volume, then set a breakpoint at that point.
2. Delete the volume and recreate it with the same name.
3. Continue the program; in the following lines, create another file in the volume using the same old glfs object.
4. Surprisingly, the create succeeds.
My use case was calling glfs_get_volumeid(): it returned the old volume ID. It should instead fail with an error saying the glfs object is no longer valid or, at worst, return the new volume ID.
Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1461808#c9 for more context and sample programs.
Actual results:
With the old glfs object we can still access the new volume.
Expected results:
Return an invalid-object error.
This problem does not look unique to gfapi; FUSE will most likely have the same issue. Once a connection to a brick is dropped, a reconnect should most likely verify that the volume-id is the same as before.
I think there are several approaches to solving this:
1. glusterd initiated through GF_CBK_EVENT_NOTIFY or similar (on volume delete)
2. handling this in protocol/client:RPC_CLNT_CONNECT and client_handshake()
3. in the master xlators like fuse and gfapi
Whatever is picked, any subsequent usage of the deleted volume should result in ESTALE errors.
My current preference goes to doing this in protocol/client. What do others think?
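To make the protocol/client option concrete, here is a minimal sketch in Python of the check being proposed: remember the volume-id seen at the first handshake, compare it on every reconnect, and fail later operations with ESTALE on a mismatch. This is a toy model, not GlusterFS code; `ClientConnection`, `StaleVolumeError`, `handshake()` and `check()` are hypothetical names chosen for illustration.

```python
import errno
import uuid


class StaleVolumeError(OSError):
    """Raised when the volume behind a connection was deleted and re-created."""


class ClientConnection:
    """Toy model of a client-side brick connection (hypothetical, not GlusterFS code)."""

    def __init__(self):
        self.volume_id = None  # volume-id recorded at the first handshake
        self.stale = False     # set once a different volume-id is reported

    def handshake(self, server_volume_id):
        """Called on every (re)connect with the volume-id the server reports."""
        if self.volume_id is None:
            self.volume_id = server_volume_id
        elif self.volume_id != server_volume_id:
            # Same volume name, different volume-id: the volume was re-created.
            self.stale = True

    def check(self):
        """Call before each file operation; raises ESTALE once the connection is stale."""
        if self.stale:
            raise StaleVolumeError(errno.ESTALE, "volume was deleted and re-created")


if __name__ == "__main__":
    conn = ClientConnection()
    conn.handshake(uuid.uuid4())   # initial connect
    conn.check()                   # fine
    conn.handshake(uuid.uuid4())   # reconnect to a re-created volume
    try:
        conn.check()
    except StaleVolumeError as exc:
        print("operation refused, errno:", exc.errno)
```

Once marked stale, the connection stays stale, so every later use of the deleted volume is rejected rather than only the first one.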
Reproduced the bug using the Python bindings. I will be working on a fix.
Created attachment 1314099
Sample patch file for review.
I put the fix in the gfapi code. The actual change is in glfs-mgmt.c, which calls glfs_get_volume_info(); its callback checks whether the volume ID is still the same and sets errno if it is not.
This fix could also go in xlator/protocol/client, but then we would have to register a new handshake protocol entry for fetching the volume ID.
clnt_handshake_procs would need an entry GF_HNDSK_GET_VOLUME_INFO with the respective call and callback. Since adding an entry to the protocol is sensitive, I made the changes in the gfapi code only.
REVIEW: https://review.gluster.org/18064 (gfapi: Fix for bug 1463191) posted (#1) for review on master by Anonymous Coward
REVIEW: https://review.gluster.org/18064 (Discard glfs object if volume is recreated) posted (#2) for review on master by Anonymous Coward
REVIEW: https://review.gluster.org/18064 (Discard glfs object if volume is recreated) posted (#3) for review on master by Anonymous Coward
REVIEW: https://review.gluster.org/18064 (gfapi: mark glfs object as bad if volume is re-created) posted (#4) for review on master by Anonymous Coward
Steps to reproduce the issue:
Here is a gfapi Python program; the volume name is "test2" and the hostname is "gluster2".
Host 1: on the client machine
>>> from gluster import gfapi
>>> volume = gfapi.Volume('gluster2','test2')
Host 2: on the server machine (gluster2)
Delete the test2 volume on the server side and re-create it:
gluster2 # gluster volume stop test2
gluster2 # gluster volume delete test2
gluster2 # rm -rf /storage/brick3/test2vol
gluster2 # mkdir /storage/brick3/test2vol
gluster2 # gluster volume create test2 gluster2:/storage/brick3/test2vol
gluster2 # gluster volume start test2
Host 1: back on the client machine, continue the Python program. The volume object still exists, and you can create a directory through this old volume object, which should have been discarded.
This shows that if a volume is re-created with the same name on the server side, the client program can still access it through the old volume object. With the fix for this bug, such access fails with ENXIO.
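For a client program, the ENXIO error is the signal to drop the old volume object. The sketch below shows one way an application might react: run an operation and, if it fails with ENXIO, build a new volume object and retry once. `run_with_remount`, `remount` and `operation` are hypothetical names; `remount` stands for whatever the application uses to create and mount a new volume object. Whether an automatic retry is appropriate is an application decision, since the re-created volume holds none of the old data.

```python
import errno


def run_with_remount(volume, remount, operation):
    """Run operation(volume); on ENXIO, replace the volume object and retry once.

    Returns (volume_in_use, result) so the caller can keep the new object.
    Hypothetical helper for illustration, not part of any gfapi binding.
    """
    try:
        return volume, operation(volume)
    except OSError as exc:
        if exc.errno != errno.ENXIO:
            raise  # unrelated failure: let the caller handle it
    # The old object was marked bad; discard it and start from a new one.
    fresh = remount()
    return fresh, operation(fresh)
```

The caller should replace its reference with the returned volume object so that the discarded one is never used again.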
REVIEW: https://review.gluster.org/18064 (gfapi: mark glfs object as bad if volume is re-created) posted (#5) for review on master by Venkata Ramarao Edara (email@example.com)