Bug 765261 (GLUSTER-3529) - Clients mounting replicated FS via NFS return input / output error on some files
Summary: Clients mounting replicated FS via NFS return input / output error on some files
Keywords:
Status: CLOSED WONTFIX
Alias: GLUSTER-3529
Product: GlusterFS
Classification: Community
Component: access-control
Version: 3.1.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-08 15:55 UTC by Jonathan Barber
Modified: 2013-12-09 01:26 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: nfs
Documentation: DP
CRM:
Verified Versions:
Embargoed:

Description Jonathan Barber 2011-09-08 15:55:29 UTC
I have two nodes running RHEL 5u4 x86_64 and glusterfs 3.1.2 from the RPMs here:
http://download.gluster.com/pub/gluster/glusterfs/3.1/3.1.2/RHEL/

They have a replicated GlusterFS created with:
gluster volume create TEST replica 2 transport tcp server1:/exp server2:/exp

This volume is exported via NFS to two clients (RHEL 5u2 i386). Each client mounts one of the servers via autofs with the mount options:
-rw,soft,intr,rsize=8192,wsize=8192

i.e. the setup is like this:
client1 <---NFS---> server1 <---GlusterFS---> server2 <---NFS---> client2

When I create files on client1 under the NFS mount with the command:
for i in {1..1000}; do
  sudo -u user1 dd if=/dev/urandom of=/repository/info150/test$RANDOM.vox bs=1024 count=36
done

where user1 is:
uid=501(user1) gid=10000(group1) groups=10000(group1),11000(group2)

and then read the files on client2 with the command:
cat /repository/info150/test*.vox

as user2:
uid=500(user2) gid=10000(group1) groups=10000(group1),11000(group2)

I get errors with some files (not all files):
cat: /repository/info150/test6129.vox: Input/output error
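Because only some files fail, it can help to list exactly which ones are affected rather than scanning the output of cat. The following is a minimal sketch, not part of the original report; the function name list_unreadable is hypothetical, and it assumes the test*.vox naming used above.

```shell
# List files matching test*.vox in a directory that the current user
# cannot read. list_unreadable is a hypothetical helper name.
# Usage: list_unreadable DIR
list_unreadable() {
  for f in "$1"/test*.vox; do
    # Skip the literal pattern when nothing matches the glob.
    [ -e "$f" ] || continue
    # Discard the file contents; only the exit status of cat matters.
    if ! cat "$f" > /dev/null 2>&1; then
      echo "unreadable: $f"
    fi
  done
}
```

Run as user2 on client2, for example: list_unreadable /repository/info150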

Looking at a TCP dump of the NFS exchange (using wireshark), the error returned by the NFS server is NFS3ERR_PERM.

Setting the brick log level to TRACE with:
gluster volume set TEST diagnostics.brick-log-level TRACE

generates the following output in /var/log/glusterfs/bricks/exp.log:
[2011-09-08 11:31:44.618975] T [rpcsvc.c:958:rpcsvc_handle_rpc_call] rpcsvc: Client port: 1023
[2011-09-08 11:31:44.619074] T [rpcsvc-auth.c:276:rpcsvc_auth_request_init] rpc-service: Auth handler: AUTH_GLUSTERFS
[2011-09-08 11:31:44.619090] T [rpcsvc.c:887:rpcsvc_request_create] rpc-service: recieved rpc-message (XID: 0x14a0, Ver: 2, Program: 1298437, ProgVers: 310, Proc: 11) from rpc-transport (tcp.TEST-server)
[2011-09-08 11:31:44.619106] T [auth-glusterfs.c:185:auth_glusterfs_authenticate] rpc-service: Auth Info: pid: 0, uid: 500, gid: 10000, owner: 12343
[2011-09-08 11:31:44.619121] T [rpcsvc.c:723:rpcsvc_program_actor] rpc-service: Actor found: GlusterFS-3.1.0 - OPEN
[2011-09-08 11:31:44.619170] T [server-resolve.c:127:resolve_loc_touchup] TEST-server: return value inode_path 21
[2011-09-08 11:31:44.619240] T [access-control.c:210:ac_test_access] access-control: Testing owner access
[2011-09-08 11:31:44.619276] T [access-control.c:220:ac_test_access] access-control: Testing group access
[2011-09-08 11:31:44.619296] T [access-control.c:231:ac_test_access] access-control: Testing other access
[2011-09-08 11:31:44.619315] T [access-control.c:239:ac_test_access] access-control: No access allowed
[2011-09-08 11:31:44.619349] D [server3_1-fops.c:1283:server_open_cbk] TEST-server: 5280: OPEN /info150/test6129.vox (49238) ==> -1 (Operation not permitted)
[2011-09-08 11:31:44.619398] T [rpcsvc.c:1516:rpcsvc_submit_generic] rpc-service: Tx message: 16
[2011-09-08 11:31:44.619417] T [rpcsvc.c:1151:rpcsvc_record_build_header] rpc-service: Reply fraglen 40, payload: 16, rpc hdr: 24
[2011-09-08 11:31:44.619463] T [rpcsvc.c:1555:rpcsvc_submit_generic] rpc-service: submitted reply for rpc-message (XID: 0x5280x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 11) to rpc-transport (tcp.TEST-server)
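
At TRACE level the brick log is very verbose, so the denied OPEN calls are easy to miss. As a rough sketch, the helper below prints the path and error text of every failed OPEN. The function name denied_opens is hypothetical, and the pattern assumes log lines shaped exactly like the server_open_cbk line above; other GlusterFS versions may format them differently, and paths containing spaces are not handled.

```shell
# Print "<path> <error text>" for every failed OPEN in a brick log.
# denied_opens is a hypothetical helper name.
# Usage: denied_opens LOGFILE
denied_opens() {
  # Capture the path after "OPEN " and the text inside the final parentheses.
  sed -n 's/.*server_open_cbk.*: OPEN \([^ ]*\) .*==> -1 (\(.*\))$/\1 \2/p' "$1"
}
```

For the log above, denied_opens /var/log/glusterfs/bricks/exp.log would report /info150/test6129.vox together with "Operation not permitted".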

After reading the file as root on client2, user2 can read the file without generating errors.

Upgrading to the 3.2.3-1 RPMs fixes the problem, but I wanted to document it in case anyone else runs into it.

Comment 1 shishir gowda 2011-09-13 01:01:24 UTC
With the POSIX ACL support introduced in the 3.2.2 release, the issue seems to be fixed. The bug can be documented in the release notes.