Created attachment 573597
fuse mount log

Description of problem:
When one of the bricks in a replicate volume goes offline during a write operation and then comes back online, the brick that went offline loses the file's previous ownership and sets the owner of the file to 'root'. Since we honor the lowest UID, we self-heal the metadata from the brick that has the wrong ownership ('root') to the brick that has the correct ownership.

Version-Release number of selected component (if applicable):
3.2.6

How reproducible:
often

Program used (testprogram):-
-------------
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        int c = 0;
        FILE *fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        fd = fopen(argv[1], "w");
        if (fd == NULL) {
                perror("fopen");
                return 1;
        }

        /* Write one counter value per second and flush each time, so the
           file is still open and being written when the brick goes down. */
        while (1) {
                fprintf(fd, "%i\n", ++c);
                fflush(fd);
                printf("%i\n", c);
                sleep(1);
        }
}

Steps to Reproduce:
1. Create a replicate volume. Start the volume.
2. Create a fuse mount from the client.
3. From the mount point, execute: "testprogram ./helloworld.txt"
4. Bring down a brick.
5. Bring the brick back up.

Actual results:

When Brick was offline:-
------------------------
Server1 :-
---------
[03/29/12 - 17:19:34 root@APP-SERVER1 ~]# ls -l /export1/dstore1/test/
total 8
-rw-rw-r-- 1 220 qa 96 Mar 29 17:19 foo.txt

Server2
--------
[03/29/12 - 17:16:43 root@APP-SERVER2 ~]# ls -l /export1/dstore1/test/
total 64
-rw-rw-r-- 1 220 qa 45 Mar 29 17:18 foo.txt

Client
-------
[qa@APP-CLIENT1 test]$ ls -l foo.txt
-rw-rw-r-- 1 qa qa 292 Mar 29 17:20 foo.txt

When Brick came online:-
-----------------------
[03/29/12 - 17:19:44 root@APP-SERVER1 ~]# gluster volume start dstore force
Starting volume dstore has been successful

Server1:-
-----------
[03/29/12 - 17:20:07 root@APP-SERVER1 ~]# ls -l /export1/dstore1/test/
total 8
-rw-rw-r-- 1 root root 273 Mar 29 17:20 foo.txt

Server2:-
---------
[03/29/12 - 17:19:50 root@APP-SERVER2 ~]# ls -l /export1/dstore1/test/
total 4
-rw-rw-r-- 1 root root 292 Mar 29 17:20 foo.txt

Client:-
----------
[qa@APP-CLIENT1 test]$ ls -l foo.txt
-rw-rw-r-- 1 root root 292 Mar 29 17:20 foo.txt

Expected results:
The ownership of the file should not be changed to 'root'.

Additional info:
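The "lowest UID" rule mentioned in the description is what lets the wrong owner win during self-heal: 'root' is UID 0, which is lower than any regular user's UID. The following is only a minimal sketch of that rule. The function name lowest_uid_brick is hypothetical and is not taken from the replicate translator, whose real source-selection logic is more involved.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical illustration, not GlusterFS code: given the owner UID of the
 * same file on each brick, return the index of the brick with the lowest
 * UID. The caller must pass count >= 1. */
size_t
lowest_uid_brick (const uid_t *uids, size_t count)
{
        size_t source = 0;

        for (size_t i = 1; i < count; i++) {
                if (uids[i] < uids[source])
                        source = i;
        }

        return source;
}
```

With the owners seen in this report (0 on the brick that was restarted, 220 on the other), the brick owned by 'root' is chosen, so the correct ownership is overwritten.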
This bug appears because protocol_client_reopen sends reopens with the same flags/uid/gid at the time of open from the application. As the gdb session below shows, the flags saved in fdctx still have O_CREAT set when the reopen is sent.

➜ ~pranithk ls -l /gfs/r2_?
/gfs/r2_0:
total 28
-rwxr-xr-x. 1 root root 7240 Jun 11 18:47 a.out
-rw-r--r--. 1 pranithk pranithk 400 Jun 11 18:50 h.txt
-rw-r--r--. 1 root root 249 Jun 11 18:47 infinitewrite.c

(gdb)
679             ret = inode_path (inode, NULL, &path);
(gdb) p fdctx->flags & O_CREAT
$4 = 64
(gdb) c
Continuing.

Breakpoint 2, client3_1_reopen_cbk (req=0x7f831b296710, iov=0x7f831b296750, count=1, myframe=0x7f83222cf978) at client-handshake.c:395
395             int32_t ret = -1;
(gdb) c
Continuing.

After it hits the breakpoint:

➜ ~pranithk ls -l /gfs/r2_?
/gfs/r2_0:
total 28
-rwxr-xr-x. 1 root root 7240 Jun 11 18:47 a.out
-rw-r--r--. 1 root root 472 Jun 11 18:50 h.txt <<---- ownership changed
-rw-r--r--. 1 root root 249 Jun 11 18:47 infinitewrite.c

In 3.2.x, posix_open does a chown if fdctx->flags has O_CREAT. This bug does not appear on 3.3 because the chown part of posix_open no longer exists.

Assigning it to protocol/client to take appropriate action.
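One way protocol/client could avoid triggering the chown is to drop creation-time flags before resending the open for an fd that already exists. The sketch below only illustrates that idea. The helper name reopen_flags is hypothetical, and the exact set of flags masked here is an assumption, not the change that was actually made.

```c
#include <fcntl.h>

/* Hypothetical helper, not part of GlusterFS: compute the flags to send when
 * an already-open fd is reopened after a brick reconnects. */
int
reopen_flags (int open_flags)
{
        /* The file already exists at reopen time, so flags that only make
         * sense at creation (O_CREAT, O_EXCL) or that would discard data
         * written so far (O_TRUNC) are removed. */
        return open_flags & ~(O_CREAT | O_EXCL | O_TRUNC);
}
```

For a file opened for writing with creation and truncation, as fopen(path, "w") typically does, reopen_flags (O_WRONLY | O_CREAT | O_TRUNC) leaves only O_WRONLY, so a check such as fdctx->flags & O_CREAT on the brick side would no longer be true on reopen.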
Created attachment 592682
Test cases for re-open of files

This is a Go program. To run it, use:
go run <go-prog> hname <username-other-than-root>
Fix available in 3.3.