File size initially wrong after a replica volume image failure and repair.

During testing, we noticed a strange situation where the file's data was correct, but the size reported was incorrect. Later, without any writes, only an 'ls' or 'dir' command, the file's size became correct. This can be reproduced by:

1) Start with a GlusterFS replica volume running duplex.
2) Generate a test file of a specific size 'N'. Give plenty of time for the file I/O to complete and be flushed.
3) Shut down one of the GlusterFS nodes so that the replica volume becomes simplex.
4) Verify that the file is still there, intact, and has size 'N', both with an 'ls' or 'dir' command and by reading the entire file.
5) Regenerate the same test file with a specific size not equal to 'N', say 'N * 2'. Again, give plenty of time for the file I/O to complete and be flushed.
6) Bring the failed GlusterFS node back up so that the GlusterFS replica volume becomes duplex again. Do not start the forced healing with "find ... stat ..." yet.
7) Re-verify the file by reading its contents fully before doing an 'ls' or 'dir' command on it. The file's data will be correct, but it will appear to have the size 'N', not 'N * 2'.
8) Do an 'ls' or 'dir' and see that the size is shown as 'N * 2'.
9) Re-verify the same file and see that both the file data and size are now correct, 'N * 2'.

=====

This was first tested using GlusterFS 3.2.3, but was retested with 3.2.5; the newer version made no difference. The only thing that seems to avoid this problem is setting 'performance.stat-prefetch' to 'off' (it defaults to 'on'). It seems like there might be a problem here with a file's cached metadata that could result in some strange and transient problems. Is this a known issue that is documented somewhere that I missed? I would rather be using the GlusterFS volume default options, but for now I only trust 'performance.stat-prefetch' set to 'off', despite the expected performance loss.

~ Jeff Byers ~
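For readers trying to reproduce this, a rough command-sequence sketch of the steps above. This is an assumption-laden outline, not the reporter's actual scripts: VOLNAME, the brick paths, node names, the mount point /mnt/gv0, and the file size N are all placeholders, and it requires a live two-node replica-2 cluster (the `gluster volume set` workaround command at the end is the one reported in this bug):

```shell
# Assumed setup (run once, placeholders throughout):
#   gluster volume create VOLNAME replica 2 node1:/bricks/b1 node2:/bricks/b2
#   gluster volume start VOLNAME
#   mount -t glusterfs node1:/VOLNAME /mnt/gv0

N_MB=64                                           # (2) test file of size N
dd if=/dev/urandom of=/mnt/gv0/testfile bs=1M count=$N_MB
sync && sleep 30                                  # let I/O complete and flush

# (3) on node2: stop glusterd/brick so the volume runs simplex
# (4) on the client:
ls -l /mnt/gv0/testfile                           # size should be N
md5sum /mnt/gv0/testfile                          # full read of the data

# (5) rewrite at N * 2 while still simplex
dd if=/dev/urandom of=/mnt/gv0/testfile bs=1M count=$((2 * N_MB))
sync && sleep 30

# (6) bring node2 back up; do NOT run "find ... | xargs stat" healing yet
# (7) read the file fully BEFORE any ls/dir:
#     data is correct, but EOF is hit at size N
# (8)-(9) then:
ls -l /mnt/gv0/testfile                           # now shows N * 2; re-read passes

# Workaround reported in this bug:
#   gluster volume set VOLNAME performance.stat-prefetch off
```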
hi Jeff,
Thanks for logging the bug. I have doubts about steps 6 and 7 in the procedure you provided. How much time did you wait after step 6 completed? In step 7, what command are you using to read the file, and how are you verifying that its size is N and not N * 2? Do you have a script that tests this?
Pranith
When the full test is running, the client is writing and reading files continuously: writing a bunch, then verifying the bunch. When reproducing manually, it is human time, in seconds or minutes. The commands that generate and verify the file are home-grown. The generator takes a size and a file name, writes a header, and fills in the rest with a data pattern. The verifier takes the name, determines the expected size from the file header, reads and verifies the data pattern in 64 KB chunks, and verifies that the file is the correct size by seeing EOF exactly where it should be. Note that the verifier program does not use 'stat' information for the file size; the file contents themselves determine the size. There are test scripts and programs, but they are tied into a larger system and might be difficult to extract from it. The file generator and verifier may be free-standing enough, though.

~ Jeff Byers ~
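The generator/verifier described above can be sketched along the following lines. This is a guess at the mechanism, not the reporter's actual tool: the header format (an 8-byte size field), the fill byte, and the function names are all invented for illustration. The key property it preserves is that the verifier derives the expected size from the file's own header and detects the bug as a premature EOF, never consulting stat():

```python
import struct

HEADER_FMT = ">Q"                    # hypothetical 8-byte big-endian size header
HEADER_SIZE = struct.calcsize(HEADER_FMT)
CHUNK = 64 * 1024                    # verify in 64 KB chunks, as in the original tool
PATTERN = b"\xa5"                    # hypothetical fill byte

def generate(path, size):
    """Write a header recording 'size', then fill the rest with a data pattern."""
    with open(path, "wb") as f:
        f.write(struct.pack(HEADER_FMT, size))
        remaining = size - HEADER_SIZE
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(PATTERN * n)
            remaining -= n

def verify(path):
    """Read the expected size from the header, check the data pattern,
    and confirm EOF lands exactly where the header says (no stat() used)."""
    with open(path, "rb") as f:
        (size,) = struct.unpack(HEADER_FMT, f.read(HEADER_SIZE))
        remaining = size - HEADER_SIZE
        while remaining > 0:
            chunk = f.read(min(CHUNK, remaining))
            if not chunk:
                return False         # premature EOF: the symptom in this bug
            if chunk != PATTERN * len(chunk):
                return False         # data corruption
            remaining -= len(chunk)
        return f.read(1) == b""      # file must end exactly at 'size'
```

On the buggy volume, running verify() on a file regenerated at N * 2 would return False (premature EOF at N) until an 'ls'/'dir' refreshed the metadata.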
Hi Jeff,
From what I understand about your verifier, it gets the size the generator wrote to the header of the file and then verifies that data of the given size is present and correct. Is this correct?
But this still doesn't clarify Pranith's doubt about how the size is N and not N * 2. If the verifier says the data is all right, then the size it reads from the header must be N * 2. Where is the size shown as N?
Also, from where are you verifying the files: the mountpoint or the bricks?
If it is possible, can you also provide the file generator & verifier?
Kaushal
(In reply to comment #3)
> From what I understand about your verifier, it gets the size the generator
> wrote to the header of the file and then verifies that data of the given
> size is present and correct. Is this correct?

Jeff: Yes.

> But this still doesn't clarify Pranith's doubt about how the size is N and
> not N * 2. If the verifier says the data is all right, then the size it
> reads from the header must be N * 2. Where is the size shown as N?

Jeff: I wasn't clear enough. The data was correct up to N, but EOF was reached in the verifier's 64 KB read loop before N * 2 worth of reads were done. But if one does a "dir", or waits a while, and the verifier is rerun, all of the data is then there. In our test scripts, we went so far as to put in a retry loop, and halfway through the verify retries, did the "dir", which allowed it to then pass. This was done until we determined that 'performance.stat-prefetch=off' would resolve the problem completely.

> Also, from where are you verifying the files: the mount-point or the
> bricks?

Jeff: All of the I/O is being done from the mount-point of the volumes. Our configuration is a 2-way GlusterFS replica volume, with local GlusterFS clients and Samba arbitrated by CTDB, with the host being Windows using CIFS.

> If it is possible, can you also provide the file generator & verifier?
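The retry-with-"dir" workaround described above might look something like the following sketch. The function name, retry counts, and delay are invented; the only load-bearing idea, taken from the comment, is that halfway through the retries a directory listing is issued to force a fresh lookup (and hence refreshed metadata) on the GlusterFS mount:

```python
import os
import time

def verify_with_retry(path, verify, retries=5, delay=1.0):
    """Retry a verifier; halfway through the retries, list the parent
    directory (the 'dir' nudge from the original test scripts), which on a
    GlusterFS mount forces a fresh lookup of the file's metadata."""
    for attempt in range(retries):
        if verify(path):
            return True
        if attempt == retries // 2:
            os.listdir(os.path.dirname(path) or ".")  # the "dir" nudge
        time.sleep(delay)
    return False
```

With 'performance.stat-prefetch' left at its default of 'on', the first verify attempts fail with a premature EOF and only pass after the listing; with it set to 'off', the first attempt passes.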
We are looking at that, and are going to try to simplify the environment enough that we could send something over to you. ~ Jeff Byers ~
Jeff, can you check if the issue still exists with 3.3.0 release?
Can you point me to the change list, or where in the source the fix was implemented so that, when I have time, I can add logging to make sure that the code path with the fix was executed? Thanks.
Amar: Can you answer comment 6, please?
> We are looking at that, and are going to try to > simplify the environment enough that we could send > something over to you. > > ~ Jeff Byers ~ Jeff, Could you let us know if you have a simpler test case that can re-create the issue? Pranith
(In reply to comment #8)
> Jeff,
> Could you let us know if you have a simpler test case that can
> re-create the issue?
>
> Pranith

We believe that we provided some simplified test scripts and instructions on how to use them to you guys about a year ago. The test engineer remembers getting authorization to do this and putting something together. Unfortunately, it has been too long, and we cannot find any local copies of what we had sent over to you. Do you remember communicating with, and/or receiving anything from, our David McBride? We do have our original programs and scripts, of course, but the work to make them externally useful would need to be repeated. What is the likelihood of any use being made of any test scripts and programs we would provide now?
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will get automatically closed.