Description of problem:
iozone -a fails on a pNFS mount with a "read: Bad file descriptor" error.

# iozone -a
        Auto Mode
        Command line used: iozone -a
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                   random  random    bkwd  record  stride
              KB  reclen   write rewrite    read   reread    read   write    read rewrite    read  fwrite frewrite   fread freread
              64       4  576282 1392258
Error reading block 0 66700000
read: Bad file descriptor

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Almost always (90%)

Steps to Reproduce:
1. Create a volume
2. Export it via nfs-ganesha
3. Mount it using pNFS (NFS version 4.1)
4. Run iozone -a on the mount point

Actual results:
iozone fails with a "Bad file descriptor" error.

Expected results:
iozone should complete successfully.

Additional info:
Only iozone -a (automated mode) fails.
No crashes are seen in the bricks or in nfs-ganesha.
When the iozone tests are run individually (tests i = 0 to 8), every test passes.
This looks like a timing issue in performing the cache invalidation on the MDS (metadata server).
There are no notable errors in the ganesha, gfapi, or brick logs. The packet trace suggests that the request is not even sent to the servers. I am attaching the strace output and the nfs-client log.
Created attachment 1068901: strace_output
Created attachment 1068902: nfs-client log
On further debugging, I confirmed that this is a known timing issue. A pNFS cluster consists of an MDS (metadata server) and one or more DSes (data servers). A file is opened through the MDS, and all of its I/O goes to the corresponding DSes. After a write completes, the UPCALL infrastructure must therefore invalidate the cached context on the MDS; otherwise the MDS remains unaware of the modification. In the case of "iozone -a", the client sends the open call for the next read operation right after the write finishes, and this open reaches the MDS before the upcall notification does. The MDS then replies with stale information from its cache (in the getattr that is part of the compound open call, the file size is reported as 0 instead of 64k), and so iozone -a is interrupted.
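The ordering problem described above can be illustrated with a small toy model. This is only a sketch and not nfs-ganesha code: the class names, the `run` helper, and the dictionary-based attribute cache are all hypothetical, and the model assumes the MDS answers the getattr from its cache whenever an entry exists.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical stand-in for a DS: holds the real file contents."""
    files: dict = field(default_factory=dict)

    def write(self, name, data):
        self.files[name] = self.files.get(name, b"") + data

    def size(self, name):
        return len(self.files.get(name, b""))


@dataclass
class MetadataServer:
    """Hypothetical stand-in for the MDS: caches file sizes."""
    ds: DataServer
    attr_cache: dict = field(default_factory=dict)

    def open(self, name):
        # The open compound carries a getattr; answer from the cache if possible.
        if name not in self.attr_cache:
            self.attr_cache[name] = self.ds.size(name)
        return self.attr_cache[name]

    def upcall_invalidate(self, name):
        # Upcall notification: drop the cached attributes for this file.
        self.attr_cache.pop(name, None)


def run(upcall_before_open):
    """Return the size the client sees on the open that precedes its read."""
    ds = DataServer()
    mds = MetadataServer(ds)
    mds.open("f")                      # first open, MDS caches size 0
    ds.write("f", b"x" * 64 * 1024)    # write bypasses the MDS and goes to the DS
    if upcall_before_open:
        mds.upcall_invalidate("f")
        return mds.open("f")
    size = mds.open("f")               # open wins the race, cache is stale
    mds.upcall_invalidate("f")
    return size


if __name__ == "__main__":
    print("upcall first:", run(True))
    print("open first:  ", run(False))
```

In this model the client sees the full 64k size only when the invalidation is processed before the open; when the open arrives first, it gets the stale size of 0, which matches the behaviour seen in the getattr reply.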
If this is still a problem, please open an issue in the GitHub tracker at https://github.com/nfs-ganesha/nfs-ganesha/issues