Description of problem: When the host filesystem containing the glusterd workdir files (e.g. /var/lib/glusterd/) is full, this can lead to situations where files such as glusterd.info or peers/* etc are replaced with empty zero-length files. See Bug 1207534 and [1] for another example of this. I experienced this with glusterfs-server 3.4.2-1ubuntu1 running on Ubuntu 14.04. Digging through the related code, I suspect this may be caused by the use of feof() in following error libglusterfs/src/store.c gf_store_save_value() code path which is intended to handle exactly these kinds of write errors: ret = fprintf (fp, "%s=%s\n", key, value); if (ret < 0) { ... } ret = fflush (fp); if (feof (fp)) { ... } ret = 0; This code appears to be present in both glusterfs 3.4.2 and mainline git. Based on my understanding, an error condition such as ENOSPC in fflush() will NOT cause feof() to return nonzero. Steps to Reproduce: Using the following example test program: int main (int argc, char **argv) { int ret; const char *path = argv[1]; int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600); FILE *fp = fdopen(fd, "a+"); ret = fprintf(fp, "foo=bar\n"); fprintf(stderr, "fprintf = %d: %s\n", ret, strerror(errno)); ret = fflush(fp); fprintf(stderr, "fflush = %d: %s\n", ret, strerror(errno)); ret = feof(fp); fprintf(stderr, "feof = %d\n", ret); fclose(fp); close(fd); return ret; } Running this on a "full" ext4 filesystem: $ touch mnt/test/test2 && ls -l mnt/test/test2 -rw-r--r-- 1 xxx staff 0 Jun 23 15:52 mnt/test/test2 $ echo foo > mnt/test/test2 -bash: echo: write error: No space left on device $ rm mnt/test/test2 Gives the following: $ ./test mnt/test/test2 fprintf = 8: Success fflush = -1: No space left on device feof = 0 Actual results: I believe that this behavior of fflush()/feof() would cause gf_store_save_value() to ignore the write error, causing the store file to be replaced by an empty file as per the bug symptoms. Expected results: gf_store_save_value() should handle the -1 return code from fflush() by returning an error, causing the tmpfile to be unlinked and the existing file to be left intact. [1] https://www.mail-archive.com/users@ovirt.org/msg25215.html
Additional clarification that the test case indeed creates an empty file on disk: $ cat mnt/test/test2 $ ls -l mnt/test/test2 -rw-r--r-- 1 xxx staff 0 Jun 23 15:53 mnt/test/test2
I'm moving this to the 'mainline' version for now. 3.4.x is EOL and will not get any updates anymore.
This is presumeably a dup of #1253148, based on the committed fix. Haven't tested it.
*** This bug has been marked as a duplicate of bug 1253148 ***