Currently, libglusterfsclient waits until a CHILD_UP event is received before sending a lookup on '/'. This lookup is necessary so that distribute can construct the layout for '/'. However, distribute sends CHILD_UP as soon as its first child comes up; it does not wait until all of its children are up. If the lookup is sent at that point, only a partial layout of '/' is constructed, covering just those children of distribute that were up at the time of the lookup. As a result, operations fail for files whose hash values fall in a range that is missing from the layout. The window during which operations fail extends from the first lookup on '/' to the first revalidate on '/'; on revalidate, the layout is reconstructed properly. Since there is currently no mechanism to detect whether all the children of distribute are up, glusterfs_init waits for 100 milliseconds before sending the first lookup, as a temporary workaround.
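The failure mode described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual distribute code: the 32-bit hash space, the even split of ranges, and the names `full_ranges`, `build_layout` and `locate` are all hypothetical and chosen for illustration.

```python
# Toy model of a partial layout. All names and the even split of a
# 32-bit hash space are assumptions made for this illustration only.
HASH_SPACE = 2 ** 32


def full_ranges(children):
    """Give each child one fixed, contiguous slice of the hash space."""
    step = HASH_SPACE // len(children)
    ranges = {}
    for i, name in enumerate(children):
        start = i * step
        # The last child absorbs any remainder so the space is fully covered.
        stop = HASH_SPACE - 1 if i == len(children) - 1 else start + step - 1
        ranges[name] = (start, stop)
    return ranges


def build_layout(ranges, up):
    """Model a lookup on '/': only children that are up contribute a range."""
    return {name: r for name, r in ranges.items() if name in up}


def locate(layout, hash_value):
    """Return the child owning hash_value, or None if it falls in a hole."""
    for name, (start, stop) in layout.items():
        if start <= hash_value <= stop:
            return name
    return None


children = ["c0", "c1", "c2", "c3"]
ranges = full_ranges(children)

# First lookup is sent right after the first CHILD_UP: only c0 is up.
partial = build_layout(ranges, {"c0"})

# A later revalidate, once every child is up, rebuilds the whole layout.
complete = build_layout(ranges, set(children))
```

In this model, `locate(partial, h)` returns `None` for any hash `h` outside the range of `c0`, which corresponds to the failing operations in the window before the first revalidate; the same hash resolves to a child once `complete` is in place.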
PATCH: http://patches.gluster.com/patch/1295 in master (libglusterfsclient: Wait for time ample enough for all the children of distribute to initialize before sending lookup on '/'.)
PATCH: http://patches.gluster.com/patch/1296 in release-2.0 (libglusterfsclient: Wait for time ample enough for all the children of distribute to initialize before sending lookup on '/'.)