Description of problem:
Once we hold a glfs object for a given volume, deleting the volume and recreating it with the same name still lets us access the new volume through the old glfs object, which is wrong.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
1. Write a gfapi program. After the call to glfs_init(), create a file in the volume and set a breakpoint there.
2. Delete the volume and recreate it with the same name.
3. Continue the program; in the following lines, create another file in the volume using the same old glfs object.
4. Surprisingly, the create succeeds.

My use case was slightly different: calling glfs_get_volumeid() returned the old volume ID. It should have failed with an error saying the glfs object is no longer valid or, at worst, returned the new volume ID.

Refer to https://bugzilla.redhat.com/show_bug.cgi?id=1461808#c9 for more context and sample programs.

Actual results:
With the old glfs object we can still access the new volume.

Expected results:
An error indicating that the object is invalid.
This problem does not look unique to gfapi; FUSE will most likely have the same issue. Once a connection to a brick is dropped, a reconnect should most likely verify that the volume-id is the same as before.

I think there are several approaches to solving this:
1. initiated by glusterd through GF_CBK_EVENT_NOTIFY or similar (on volume delete)
2. handled in protocol/client, in RPC_CLNT_CONNECT and client_handshake()
3. handled in the master xlators such as fuse and gfapi

Whatever is picked, any subsequent usage of the deleted volume should result in ESTALE errors. My current preference is to do this in protocol/client. What do others think?
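To make the reconnect check concrete, here is a minimal Python sketch of the idea. It is only a toy model, not GlusterFS code: the BrickConnection class and its handshake() method are hypothetical names invented for illustration. It remembers the volume-id seen on the first handshake and fails with ESTALE if a later handshake reports a different one.

```python
import errno
import uuid


class BrickConnection:
    """Toy model of a client-side brick connection (hypothetical, not GlusterFS code)."""

    def __init__(self):
        # Volume-id recorded on the first successful handshake.
        self.known_volume_id = None

    def handshake(self, server_volume_id):
        """Accept the first volume-id; reject any different one afterwards."""
        if self.known_volume_id is None:
            self.known_volume_id = server_volume_id
            return
        if server_volume_id != self.known_volume_id:
            # Same volume name, different identity: the volume was re-created.
            raise OSError(errno.ESTALE, "volume was deleted and re-created")


def demo():
    """Return the errno raised when reconnecting to a re-created volume."""
    conn = BrickConnection()
    original = uuid.uuid4()
    conn.handshake(original)  # initial connect
    conn.handshake(original)  # reconnect to the same volume succeeds
    try:
        conn.handshake(uuid.uuid4())  # reconnect after delete and re-create
    except OSError as exc:
        return exc.errno
    return None
```

In this model the comparison lives in a single place, so every consumer of the connection sees the same ESTALE. That is roughly the argument for placing the check in protocol/client rather than in each master xlator.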
Reproduced the bug using the Python bindings. I will be working on a fix.
Created attachment 1314099
Sample patch file for review.

I put the fix in the gfapi code. The actual change is in glfs-mgmt.c, which calls glfs_get_volume_info(); its callback checks whether the volume ID is still the same and sets errno. The fix could also go in xlator/protocol/client, but then we would have to register a new handshake procedure for fetching the volume ID: clnt_handshake_procs would need a GF_HNDSK_GET_VOLUME_INFO entry with the respective call and callback. Since adding an entry to the protocol is sensitive, I made changes in the gfapi code only.
REVIEW: https://review.gluster.org/18064 (gfapi: Fix for bug 1463191) posted (#1) for review on master by Anonymous Coward
https://review.gluster.org/#/c/18064/
REVIEW: https://review.gluster.org/18064 (Discard glfs object if volume is recreated) posted (#2) for review on master by Anonymous Coward
REVIEW: https://review.gluster.org/18064 (Discard glfs object if volume is recreated) posted (#3) for review on master by Anonymous Coward
REVIEW: https://review.gluster.org/18064 (gfapi: mark glfs object as bad if volume is re-created) posted (#4) for review on master by Anonymous Coward
Steps to reproduce the issue, using a gfapi Python program. The volume name is "test2" and the hostname is "gluster2".

Host 1 (client machine):
>>> from gluster import gfapi
>>> volume = gfapi.Volume('gluster2','test2')
>>> volume.mount()
>>> volume.mkdir('/dir1')
>>> volume.mkdir('/dir2')

Host 2 (server machine, that is gluster2): delete the test2 volume on the server side and re-create it.
gluster2 # gluster volume stop test2
gluster2 # gluster volume delete test2
gluster2 # rm -rf /storage/brick3/test2vol
gluster2 # mkdir /storage/brick3/test2vol
gluster2 # gluster volume create test2 gluster2:/storage/brick3/test2vol
gluster2 # gluster volume start test2

Host 1: come back to the client machine and continue the Python program; we still have the volume object here. You will be able to create a directory through the old volume object, which should have been discarded.
>>> volume.listdir('/')
[]
>>> volume.mkdir('/dir3')
>>> volume.listdir('/')
['dir3']

This shows that if a volume is re-created with the same name on the server side, a client program can still access it through the old volume object. The fix for this bug raises an ENXIO error.
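For illustration, a client could react to that ENXIO roughly as sketched below. This is an assumption-laden sketch, not part of the patch: FakeVolume is a hypothetical stand-in used so the example runs without a Gluster server (it is not the real gfapi.Volume), and mkdir_with_remount() is an invented helper name. Whether to remount automatically or just report the error is the application's decision.

```python
import errno


class FakeVolume:
    """Hypothetical stand-in for a mounted volume handle (not the real gfapi.Volume)."""

    def __init__(self, stale=False):
        self.stale = stale  # True models a handle whose volume was re-created
        self.dirs = []

    def mkdir(self, path):
        if self.stale:
            # Models the behaviour the proposed fix is described as having.
            raise OSError(errno.ENXIO, "volume was re-created")
        self.dirs.append(path)


def mkdir_with_remount(volume, path, remount):
    """Try mkdir; on ENXIO discard the old handle and retry once on a fresh one."""
    try:
        volume.mkdir(path)
        return volume
    except OSError as exc:
        if exc.errno != errno.ENXIO:
            raise  # unrelated failure: do not hide it
    fresh = remount()  # caller-supplied factory returning a newly mounted handle
    fresh.mkdir(path)
    return fresh
```

A caller would pass a factory that creates and mounts a new volume object, then keep using the returned handle in place of the old one.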
REVIEW: https://review.gluster.org/18064 (gfapi: mark glfs object as bad if volume is re-created) posted (#5) for review on master by Venkata Ramarao Edara (redara)
What's the next step here?
I had a chat with Prasanna about this bug. He said it's not a very critical issue for now, so I have put it on the back burner. I will take it up once I am finished with https://bugzilla.redhat.com/show_bug.cgi?id=1763030.
(In reply to Vishal Pandey from comment #13)
> I had a chat with Prasanna over this bug. He said its not a very critical
> issue for now. I have kept this bug on the backseat for now. Will take up
> this bug once I am finished with
> https://bugzilla.redhat.com/show_bug.cgi?id=1763030.

Why would you take it if it's low severity and priority? Can an intern look at it?
Yes, an intern can definitely take a look at it. The only issue is whether any gfapi folks are available to help out with any problems he/she might face.
(In reply to INVALID USER from comment #15)
> Yes, an intern can definitely take a look at it. But the only issue is if
> there is any gfapi folk avilable to help out in case of any issues he/she
> might face.

No one took it, closing.