Created attachment 700807 [details]
Vol file

Description of problem:
After add-brick and rebalance fix-layout, many files are inaccessible from clients (Invalid argument).

Client log:

[2013-02-21 12:48:53.121691] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-mirror-replicate-2: [...]/140.ACQ: Performing conservative merge
[2013-02-21 12:48:54.100924] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-mirror-client-5: remote operation failed: Permission denied. Path: [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA (ddef2b1a-b3cc-424c-a663-995bb77cd4c4)
[2013-02-21 12:48:54.101005] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-mirror-client-4: remote operation failed: Permission denied. Path: [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA (ddef2b1a-b3cc-424c-a663-995bb77cd4c4)
[2013-02-21 13:20:31.971211] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 1360169: OPEN() [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA => -1 (Invalid argument)

Additional info (xattrs for the directory on all bricks, and for the file where it exists):

ogawa:/raid/mirror2.../140.ACQ/: no link, no file
trusted.afr.mirror-client-4=0x000000000000000000000000
trusted.afr.mirror-client-5=0x000000000000000000000000
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

ogawa:/raid/mirror.../140.ACQ/: no link, no file
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

mansfield:/raid/mirror.../140.ACQ/: file exists
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x00000001000000000000000055555554
file (.../140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA):
trusted.afr.mirror-client-0=0x000000000000000000000000
trusted.afr.mirror-client-1=0x000000000000000000000000
trusted.gfid=0xddef2b1ab3cc424ca663995bb77cd4c4

rabi:/raid/mirror.../140.ACQ/: no file, no link
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

rabi:/raid/mirror2.../140.ACQ/: no file, no link
trusted.afr.mirror-client-4=0x000000000000000000000000
trusted.afr.mirror-client-5=0x000000000000000000000000
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

lauterbur:/raid/mirror/.../140.ACQ/: file exists
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x00000001000000000000000055555554
file (.../140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA):
trusted.afr.mirror-client-0=0x000000000000000000000000
trusted.afr.mirror-client-1=0x000000000000000000000000
trusted.gfid=0xddef2b1ab3cc424ca663995bb77cd4c4
Reading the affected file (1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA) on the FUSE mount of any of the brick servers succeeds... and afterward the file is accessible from the clients as normal. I am looking for a solution that doesn't mean I need to run:

find /gluster-fuse/mnt -type f -exec cat {} > /dev/null \;
FUSE client stat of parent dir:

  File: ‘140.ACQ’
  Size: 8198  Blocks: 32  IO Block: 131072  directory
Device: 27h/39d  Inode: 9870553420299421744  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Access: 2013-01-23 14:08:49.667947000 -0500
Modify: 2013-01-23 14:08:49.667947000 -0500
Change: 2013-02-21 15:15:16.890864712 -0500
 Birth: -

Stat of dir on failing (Permission denied) bricks:

ogawa:/raid/mirror2 (mirror-client-5):
  File: `.'
  Size: 6  Blocks: 8  IO Block: 4096  directory
Device: fd00h/64768d  Inode: 9674807  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Context: system_u:object_r:file_t:s0
Access: 2013-01-23 14:08:49.667947000 -0500
Modify: 2013-02-21 13:26:11.021860626 -0500
Change: 2013-02-21 15:16:36.835845611 -0500
 Birth: -

rabi:/raid/mirror2 (mirror-client-4):
  File: `.'
  Size: 6  Blocks: 8  IO Block: 4096  directory
Device: fd01h/64769d  Inode: 9674807  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Context: system_u:object_r:file_t:s0
Access: 2013-01-23 14:08:49.667947000 -0500
Modify: 2013-01-23 14:08:49.667947000 -0500
Change: 2013-02-21 15:15:20.557943450 -0500
 Birth: -

Stat of directory on remaining bricks:

ogawa:/raid/mirror (mirror-client-2):
  File: `.'
  Size: 4096  Blocks: 8  IO Block: 4096  directory
Device: fd00h/64768d  Inode: 55051086  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:29.211653885 -0500
 Birth: -

rabi:/raid/mirror (mirror-client-3):
  File: `.'
  Size: 4096  Blocks: 16  IO Block: 4096  directory
Device: fd00h/64768d  Inode: 143917118  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Context: system_u:object_r:file_t:s0
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:31.227207773 -0500
 Birth: -

mansfield:/raid/mirror (mirror-client-1):
  File: `.'
  Size: 4096  Blocks: 8  IO Block: 4096  directory
Device: fd00h/64768d  Inode: 237376498  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:27.470359001 -0500
 Birth: -

lauterbur:/raid/mirror (mirror-client-0):
  File: `.'
  Size: 4096  Blocks: 8  IO Block: 4096  directory
Device: fd00h/64768d  Inode: 55051086  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)  Gid: (0/root)
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:29.211653885 -0500
 Birth: -
ogawa:/raid/mirror2 and rabi:/raid/mirror2 are the two new bricks... the ones with the "Permission denied" errors in mknod. They are XFS, whereas the older bricks are ext4. SELinux is disabled and there are no ACLs in place on the bricks.
After reading the file on a brick server's FUSE mount, I find that link files are created on the new bricks pointing to the correct replica... and everything works (from all clients). Why would a DHT miss from one FUSE client (on a brick server) "do the right thing", while a DHT miss from another client (any machine not serving a brick) gets a Permission denied error?
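For anyone checking whether the link files were actually created on the new bricks: in GlusterFS, DHT linkto files are zero-length regular files with only the sticky bit set (mode ---------T) and a trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the data. A sketch for listing them on a brick (the brick path here is this reporter's; substitute your own):

```shell
# List DHT linkto files on a brick and show which subvolume each points to.
# Linkto files are exactly mode 1000 (sticky bit only) and zero bytes long.
find /raid/mirror2 -type f -perm 1000 -size 0 \
    -exec getfattr --absolute-names -n trusted.glusterfs.dht.linkto {} \;
```

Running this before and after the root-read workaround should show the missing linkto files appearing on the new bricks.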
The successful heal operation on the brick server looks like this in the FUSE client log:

[2013-02-22 05:20:52.932596] I [afr-common.c:1340:afr_launch_self_heal] 0-mirror-replicate-1: background meta-data self-heal triggered. path: [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA, reason: lookup detected pending operations
[2013-02-22 05:20:52.941263] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background meta-data self-heal completed on [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA

Compare to the failing operation on the non-brick-server FUSE mount above.
On the servers hosting the new bricks, the self-heal produces:

[2013-02-21 16:50:44.833730] I [dht-common.c:954:dht_lookup_everywhere_cbk] 0-mirror-dht: deleting stale linkfile [...]/140.ACQ/1.3.12.2.1107.5.2.13.20522.4.0.15670664232227.IMA on mirror-replicate-1
I think I am seeing the same bug, which affects a significant proportion of paths during and after fix-layout operations. Access to affected files and directories by unprivileged users results in either "Invalid argument" or "Input/output error". Users with write access to the parent directory do not experience any problems. In some cases "ls -l" fails with "Input/output error"; in other cases "ls -l" works, but attempting to open the file for reading results in "Invalid argument".

Here is the most recent example I have found, where I am attempting to open a file owned by user qw901080 as user sms05dab; the parent directory is also owned by user qw901080:

[sms05dab@perseus netcdf-4.0]$ /packages/netcdf/netcdf-4.0/bin/ncdump -h /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc
/packages/netcdf/netcdf-4.0/bin/ncdump: /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc: Invalid argument

[sms05dab@perseus netcdf-4.0]$ ls -l /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc
-rw-r--r-- 1 qw901080 nemo 618364484 Jul 15 2011 /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc

[sms05dab@perseus netcdf-4.0]$ ls -ld /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989
drwxr-xr-x 2 qw901080 nemo 98322 Feb 20 22:19 /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989

The "Invalid argument" error occurred when trying to open the file via NFS and via the native GlusterFS client, both on a compute server and on a GlusterFS storage server. The ncdump utility only required read access to the file, which should have been possible given that the file was world readable.
Changing the permissions of the parent directory to 775 gave user sms05dab (a member of the nemo group) write access to the file and the parent directory, and this allowed the file to be opened for reading by the ncdump utility.
https://bugzilla.redhat.com/show_bug.cgi?id=884597

Extrapolating from the above: I suspect that the UID:GID on the frame doesn't have permission to unlink/create the stale/nonexistent linkfile. When I access the problematic files as superuser, the links are established correctly.
For others experiencing this problem, you can correct it (create/update the link files) by opening all the files as root:

find /mnt/fuse -type f -exec sh -c "< {}" \;

This takes quite a while if you have many files in your volume (as I do).
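A caveat on that one-liner: GNU find substitutes {} even inside the quoted string, so the filename is pasted directly into shell source and the command breaks on names containing quotes, dollar signs, or backticks. A safer sketch of the same workaround passes filenames as positional parameters instead, and batches them to cut down on per-file process overhead (the /mnt/fuse path is just a placeholder):

```shell
# Open every file once, as root, so DHT creates the missing linkto files.
# Filenames are passed as arguments ("$f"), never interpolated into code,
# so names with quotes, spaces, or $ are handled correctly.
find /mnt/fuse -type f -print0 |
    xargs -0 -n 64 sh -c 'for f; do : < "$f"; done' sh
```

The `: < "$f"` idiom opens and immediately closes each file without reading its contents, which is enough to trigger the DHT lookup-everywhere path.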
Hi Shawn,

That is correct, bug 884597 should fix your issue. The reason you are seeing failures is:

1. File foo is owned by some uid x and gid y.
2. After an add-brick followed by a fix-layout/rebalance, the hashed subvolume of file foo changes.
3. Subsequent access to these files ends up creating linkto files on the hashed subvolume.
4. The creation happens with the uid/gid of the application that is accessing the files.
5. If the parent directory of file foo does not give that uid/gid permission to create (mknod) the linkto file, you see such failures.

So access as root succeeds, since the mknods pass through. We have fixed this in bug 884597: the mknod now happens as root:root, and the ownership is then changed to match the original file. One workaround is to relax the permissions on the volume itself. Please let me know if upgrading to the release that has the fix solves your issue; I will mark this bug as a duplicate once you confirm the behaviour.
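The failure mode described above can be reproduced with plain filesystem permissions, independent of GlusterFS. A directory that grants a user only r-x (as a 755 directory owned by someone else does) rejects any file creation by that user, which is exactly what happens to the linkto mknod; root bypasses the check, which is why root access heals the files. A minimal sketch, using a self-owned directory chmodded to 555 to stand in for another user's 755 directory:

```shell
# Simulate the linkto-creation failure: a parent directory without write
# permission for the caller rejects file creation with "Permission denied".
# (Run as an unprivileged user; root bypasses the permission check, just as
# root access to the volume bypasses the linkto mknod failure.)
mkdir -p /tmp/linkto-demo/parent
chmod 555 /tmp/linkto-demo/parent          # r-x only, like another user's 755 dir
touch /tmp/linkto-demo/parent/linkto-file  # denied for non-root callers
```

This is why only unprivileged users whose uid/gid lacks write permission on the parent directory hit the bug.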