Bug 807976 - losing file ownership in replicated volume when one of the bricks comes online
Summary: losing file ownership in replicated volume when one of the bricks comes online
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Junaid
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-29 09:25 UTC by Shwetha Panduranga
Modified: 2013-08-06 22:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-31 07:02:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
fuse mount log (90.74 KB, text/x-log)
2012-03-29 09:25 UTC, Shwetha Panduranga
Test cases for re-open of files (5.04 KB, text/x-go)
2012-06-18 14:59 UTC, Pranith Kumar K

Description Shwetha Panduranga 2012-03-29 09:25:14 UTC
Created attachment 573597 [details]
fuse mount log

Description of problem:
When one of the bricks in a replicate volume goes offline during a write operation and then comes back online, the brick that went offline forgets the previous file ownership and sets the ownership of the file to 'root'.

Since we honor the lowest UID, self-heal then copies the metadata from the brick that has the wrong ownership ('root') to the brick that has the correct ownership.
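
To illustrate why the wrong ownership wins, here is a minimal sketch of a "lowest UID" source selection. It is not GlusterFS code: `struct replica_meta` and `pick_lowest_uid_source` are hypothetical names, and the only assumption taken from the description above is that the replica with the numerically lowest UID is used as the heal source. Because root is UID 0, a brick that wrongly reports root ownership is always selected over a brick with the correct owner (UID 220 in this report).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-brick view of one file's ownership (not a GlusterFS type). */
struct replica_meta {
    const char *brick;  /* label for the brick, illustration only */
    uid_t       uid;    /* owner UID as stored on that brick */
    gid_t       gid;    /* owner GID as stored on that brick */
};

/*
 * Return the index of the replica with the lowest UID, or -1 if n == 0.
 * On a tie, the first replica in the array is chosen.
 */
int pick_lowest_uid_source(const struct replica_meta *r, size_t n)
{
    size_t best = 0;
    size_t i;

    if (n == 0)
        return -1;
    assert(r != NULL);

    for (i = 1; i < n; i++) {
        if (r[i].uid < r[best].uid)
            best = i;
    }
    return (int)best;
}
```

With one brick reporting UID 0 and the other UID 220, the function returns the index of the UID 0 brick, so under this assumption the heal would overwrite the correct owner.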

Version-Release number of selected component (if applicable):
3.2.6

How reproducible:
often

Program used (testprogram):-
-------------
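#include <stdio.h>
#include <unistd.h>

/* Write an incrementing counter to the file named by argv[1], once per second. */
int main(int argc, char *argv[]){
    int c=0;
    FILE *fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    fd=fopen(argv[1],"w");
    if (fd == NULL) {
        perror("fopen");
        return 1;
    }
    while(1) {
        fprintf(fd, "%i\n", ++c);
        fflush(fd);            /* flush so every write reaches the mount */
        printf("%i\n",c);
        sleep(1);
    }
}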
Steps to Reproduce:
1. Create a replicate volume and start it.
2. Create a fuse mount from the client.
3. From the mount point, execute: "testprogram ./helloworld.txt"
4. Bring down a brick.
5. Bring the brick back up.

Actual results:

When Brick was offline:-
------------------------
Server1 :-
---------
[03/29/12 - 17:19:34 root@APP-SERVER1 ~]# ls -l /export1/dstore1/test/
total 8
-rw-rw-r-- 1 220 qa 96 Mar 29 17:19 foo.txt

Server2 
--------
[03/29/12 - 17:16:43 root@APP-SERVER2 ~]# ls -l /export1/dstore1/test/
total 64
-rw-rw-r-- 1 220 qa 45 Mar 29 17:18 foo.txt

Client 
-------
[qa@APP-CLIENT1 test]$ ls -l foo.txt 
-rw-rw-r-- 1 qa qa 292 Mar 29 17:20 foo.txt


When Brick came online:-
-----------------------
[03/29/12 - 17:19:44 root@APP-SERVER1 ~]# gluster volume start dstore force
Starting volume dstore has been successful

Server1:-
-----------
[03/29/12 - 17:20:07 root@APP-SERVER1 ~]# ls -l /export1/dstore1/test/
total 8
-rw-rw-r-- 1 root root 273 Mar 29 17:20 foo.txt

Server2:-
---------
[03/29/12 - 17:19:50 root@APP-SERVER2 ~]# ls -l /export1/dstore1/test/
total 4
-rw-rw-r-- 1 root root 292 Mar 29 17:20 foo.txt

Client:-
----------
[qa@APP-CLIENT1 test]$ ls -l foo.txt 
-rw-rw-r-- 1 root root 292 Mar 29 17:20 foo.txt

Expected results:
The ownership of the file shouldn't be changed to 'root'
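
The expected result can be checked programmatically instead of by reading `ls -l` output. The following is a minimal sketch using `stat(2)`. The helper names `file_owned_by` and `file_owned_by_caller` are hypothetical and not part of the attached test cases.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if path is owned by expected_uid, 0 if not, -1 if stat() fails. */
int file_owned_by(const char *path, uid_t expected_uid)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    return st.st_uid == expected_uid ? 1 : 0;
}

/* Convenience wrapper: compare against the effective UID of this process. */
int file_owned_by_caller(const char *path)
{
    return file_owned_by(path, geteuid());
}
```

Run as the non-root user that started the test program, `file_owned_by_caller("foo.txt")` on the mount point would be expected to keep returning 1 after the brick comes back. A return value of 0 would correspond to the failure reported here.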

Additional info:

Comment 1 Pranith Kumar K 2012-06-11 13:28:18 UTC
This bug appears because protocol_client_reopen sends the reopen with the same flags/uid/gid that were used at the time of the original open from the application.

➜  ~pranithk  ls -l /gfs/r2_?
/gfs/r2_0:
total 28
-rwxr-xr-x. 1 root     root     7240 Jun 11 18:47 a.out
-rw-r--r--. 1 pranithk pranithk  400 Jun 11 18:50 h.txt
-rw-r--r--. 1 root     root      249 Jun 11 18:47 infinitewrite.c

(gdb) 
679	        ret = inode_path (inode, NULL, &path);
(gdb) p fdctx->flags & O_CREAT
$4 = 64
(gdb) c
Continuing.

Breakpoint 2, client3_1_reopen_cbk (req=0x7f831b296710, iov=0x7f831b296750, count=1, myframe=0x7f83222cf978) at client-handshake.c:395
395	        int32_t        ret                   = -1;
(gdb) c
Continuing.

After it hits the breakpoint:
➜  ~pranithk  ls -l /gfs/r2_?
/gfs/r2_0:
total 28
-rwxr-xr-x. 1 root root 7240 Jun 11 18:47 a.out
-rw-r--r--. 1 root root  472 Jun 11 18:50 h.txt <<---- ownership changed
-rw-r--r--. 1 root root  249 Jun 11 18:47 infinitewrite.c

In 3.2.x, posix_open does a chown if fdctx->flags has O_CREAT set.

This bug does not appear on 3.3 because the chown part in posix_open does not exist anymore.
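
As an illustration of the mechanism described in this comment, one way a client could avoid re-triggering create-time side effects is to mask the creation flags before resending an open for a file that already exists. This is only a sketch under that assumption, not the actual change made in GlusterFS. `reopen_flags` is a hypothetical helper.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Hypothetical helper: derive flags for reopening an already-created file.
 * O_CREAT, O_EXCL and O_TRUNC only make sense on the first open, so they are
 * stripped. The access mode and all other flags are left unchanged.
 */
int reopen_flags(int open_flags)
{
    int f = open_flags & ~(O_CREAT | O_EXCL | O_TRUNC);

    /* The access mode (O_RDONLY / O_WRONLY / O_RDWR) must be preserved. */
    assert((f & O_ACCMODE) == (open_flags & O_ACCMODE));
    return f;
}
```

For a file opened with `O_WRONLY | O_CREAT | O_TRUNC`, as `fopen(path, "w")` typically does, the reopen would then carry only `O_WRONLY`, so a server-side check such as `flags & O_CREAT` would no longer be true on reopen.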

Assigning it to protocol/client to take appropriate action.

Comment 2 Pranith Kumar K 2012-06-18 14:59:07 UTC
Created attachment 592682 [details]
Test cases for re-open of files

This is a Go program. To run it, use: go run <go-prog> hname <username-other-than-root>

Comment 3 Vijay Bellur 2012-10-31 07:02:30 UTC
Fix available in 3.3.

