Bug 765261 (GLUSTER-3529)

Summary: Clients mounting a replicated volume via NFS get Input/output errors on some files
Product: [Community] GlusterFS
Reporter: Jonathan Barber <jonathan.barber>
Component: access-control
Assignee: shishir gowda <sgowda>
Status: CLOSED WONTFIX
Severity: low
Priority: medium
Version: 3.1.2
CC: gluster-bugs, nsathyan
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Mount Type: nfs

Description Jonathan Barber 2011-09-08 15:55:29 UTC
I have two nodes running RHEL 5u4 x86_64 and glusterfs 3.1.2 from the RPMs here:
http://download.gluster.com/pub/gluster/glusterfs/3.1/3.1.2/RHEL/

They serve a replicated GlusterFS volume created with:
gluster volume create TEST replica 2 transport tcp server1:/exp server2:/exp

This volume is exported via NFS to two clients (RHEL 5u2 i386). Each client mounts one of the servers via autofs with the mount options:
-rw,soft,intr,rsize=8192,wsize=8192

i.e. the setup is like this:
client1 <---NFS---> server1 <---GlusterFS---> server2 <---NFS---> client2

When I create files on client1 under the NFS mount with the command:
for i in {1..1000}; do
  sudo -u user1 dd if=/dev/urandom of=/repository/info150/test$RANDOM.vox bs=1024 count=36
done

where user1 is:
uid=501(user1) gid=10000(group1) groups=10000(group1),11000(group2)

and then read the files on client2 with the command:
cat /repository/info150/test*.vox

as user2:
uid=500(user2) gid=10000(group1) groups=10000(group1),11000(group2)

I get errors on some files (not all of them):
cat: /repository/info150/test6129.vox: Input/output error
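To see which files are affected and who owns them, a small helper along these lines can be run as user2 on client2. It is only a sketch, not part of GlusterFS: the function name `check_readable` is hypothetical, it assumes a POSIX shell with GNU coreutils `stat -c`, and it relies on the `test*.vox` naming used by the loop above.

```bash
# check_readable DIR (hypothetical helper, not shipped with GlusterFS).
# Run as the affected user. Tries to read every DIR/test*.vox file and, for
# each one that fails, prints its path, owner:group, octal mode and the error.
check_readable() {
  local dir=$1 f err fail=0
  for f in "$dir"/test*.vox; do
    [ -e "$f" ] || continue    # no match: the glob is left as a literal string
    # Capture cat's stderr, discard the file contents.
    if ! err=$(cat -- "$f" 2>&1 >/dev/null); then
      # GNU stat: %U = owner name, %G = group name, %a = octal permissions
      printf '%s\t%s\t%s\n' "$f" "$(stat -c '%U:%G %a' -- "$f")" "$err"
      fail=$((fail + 1))
    fi
  done
  echo "unreadable: $fail"
  [ "$fail" -eq 0 ]            # non-zero exit status if any file failed
}
```

After defining the function in user2's shell, `check_readable /repository/info150` prints one line per failing file followed by a count, which makes it easy to compare the ownership and mode of failing and working files.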

Looking at a packet capture of the NFS exchange in Wireshark, the error returned by the NFS server is NFS3ERR_PERM.

Setting the brick log level to TRACE with:
gluster volume set TEST diagnostics.brick-log-level TRACE

generates the following output in /var/log/glusterfs/bricks/exp.log:
[2011-09-08 11:31:44.618975] T [rpcsvc.c:958:rpcsvc_handle_rpc_call] rpcsvc: Client port: 1023
[2011-09-08 11:31:44.619074] T [rpcsvc-auth.c:276:rpcsvc_auth_request_init] rpc-service: Auth handler: AUTH_GLUSTERFS
[2011-09-08 11:31:44.619090] T [rpcsvc.c:887:rpcsvc_request_create] rpc-service: recieved rpc-message (XID: 0x14a0, Ver: 2, Program: 1298437, ProgVers: 310, Proc: 11) from rpc-transport (tcp.TEST-server)
[2011-09-08 11:31:44.619106] T [auth-glusterfs.c:185:auth_glusterfs_authenticate] rpc-service: Auth Info: pid: 0, uid: 500, gid: 10000, owner: 12343
[2011-09-08 11:31:44.619121] T [rpcsvc.c:723:rpcsvc_program_actor] rpc-service: Actor found: GlusterFS-3.1.0 - OPEN
[2011-09-08 11:31:44.619170] T [server-resolve.c:127:resolve_loc_touchup] TEST-server: return value inode_path 21
[2011-09-08 11:31:44.619240] T [access-control.c:210:ac_test_access] access-control: Testing owner access
[2011-09-08 11:31:44.619276] T [access-control.c:220:ac_test_access] access-control: Testing group access
[2011-09-08 11:31:44.619296] T [access-control.c:231:ac_test_access] access-control: Testing other access
[2011-09-08 11:31:44.619315] T [access-control.c:239:ac_test_access] access-control: No access allowed
[2011-09-08 11:31:44.619349] D [server3_1-fops.c:1283:server_open_cbk] TEST-server: 5280: OPEN /info150/test6129.vox (49238) ==> -1 (Operation not permitted)
[2011-09-08 11:31:44.619398] T [rpcsvc.c:1516:rpcsvc_submit_generic] rpc-service: Tx message: 16
[2011-09-08 11:31:44.619417] T [rpcsvc.c:1151:rpcsvc_record_build_header] rpc-service: Reply fraglen 40, payload: 16, rpc hdr: 24
[2011-09-08 11:31:44.619463] T [rpcsvc.c:1555:rpcsvc_submit_generic] rpc-service: submitted reply for rpc-message (XID: 0x5280x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 11) to rpc-transport (tcp.TEST-server)
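The `access-control.c` trace lines above step through owner, group and other access before denying the OPEN. For reference, the classic POSIX mode-bit rule for read access can be sketched as follows. This is a generic illustration under stated assumptions (mode bits only, no ACLs, no root override), not the code of the GlusterFS access-control translator, whose exact logic and choice of caller groups may differ; the function name `may_read` and its argument order are hypothetical.

```bash
# may_read UID "GIDS" FILE_UID FILE_GID MODE  (hypothetical helper)
# Generic POSIX read-permission check on mode bits only: exactly one
# permission class applies, chosen in the order owner -> group -> other.
may_read() {
  local uid=$1 gids=$2 fuid=$3 fgid=$4 mode=$((0$5)) g  # MODE parsed as octal
  if [ "$uid" -eq "$fuid" ]; then
    [ $((mode & 0400)) -ne 0 ]; return    # owner class: user read bit
  fi
  for g in $gids; do                      # caller's primary + supplementary groups
    if [ "$g" -eq "$fgid" ]; then
      [ $((mode & 040)) -ne 0 ]; return   # group class: group read bit
    fi
  done
  [ $((mode & 04)) -ne 0 ]                # other class: other read bit
}
```

For example, `may_read 500 "10000" 501 11000 640` fails because none of the listed groups matches the file's group and mode 640 grants nothing to others. Note also that with mode 044 the owner is refused even though group and other may read, since only one class is consulted.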

After the file has been read as root on client2, user2 can read it without errors.

Upgrading to the 3.2.3-1 RPMs fixes the problem; I am recording it here in case anyone else runs into it.

Comment 1 shishir gowda 2011-09-13 01:01:24 UTC
With the POSIX ACL support introduced in the 3.2.2 release, the issue appears to be fixed. The bug can be documented in the release notes.