Bug 913699 - Conservative merge fails on client3_1_mknod_cbk
Summary: Conservative merge fails on client3_1_mknod_cbk
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Nagaprasad Sathyanarayana
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-02-21 20:04 UTC by Shawn Nock
Modified: 2016-02-18 00:19 UTC (History)
4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-03 16:20:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Vol file (2.75 KB, application/octet-stream)
2013-02-21 20:04 UTC, Shawn Nock

Description Shawn Nock 2013-02-21 20:04:26 UTC
Created attachment 700807 [details]
Vol file

Description of problem:

After add-brick and rebalance fix-layout many files are inaccessible from clients (invalid argument). 

Client log:

[2013-02-21 12:48:53.121691] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-mirror-replicate-2: [...]/140.ACQ: Performing conservative merge

[2013-02-21 12:48:54.100924] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-mirror-client-5: remote operation failed: Permission denied. Path: [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA (ddef2b1a-b3cc-424c-a663-995bb77cd4c4)

[2013-02-21 12:48:54.101005] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-mirror-client-4: remote operation failed: Permission denied. Path: [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA (ddef2b1a-b3cc-424c-a663-995bb77cd4c4)

[2013-02-21 13:20:31.971211] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 1360169: OPEN() [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA => -1 (Invalid argument)

Additional info (xattrs for the directory on all bricks, and for the file where it exists):

ogawa:/raid/mirror2.../140.ACQ/: no link, no file
trusted.afr.mirror-client-4=0x000000000000000000000000
trusted.afr.mirror-client-5=0x000000000000000000000000
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

ogawa:/raid/mirror.../140.ACQ/: no link, no file
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

mansfield:/raid/mirror.../140.ACQ/: File exists
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x00000001000000000000000055555554

  file (.../140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA):
  trusted.afr.mirror-client-0=0x000000000000000000000000
  trusted.afr.mirror-client-1=0x000000000000000000000000
  trusted.gfid=0xddef2b1ab3cc424ca663995bb77cd4c4

rabi:/raid/mirror.../140.ACQ/: no file, no link
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

rabi:/raid/mirror2.../140.ACQ/: no file, no link
trusted.afr.mirror-client-4=0x000000000000000000000000
trusted.afr.mirror-client-5=0x000000000000000000000000
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

lauterbur:/raid/mirror/.../140.ACQ/: File exists
trusted.gfid=0x94cd9acd1a0e4bdc88fb400a1176c830
trusted.glusterfs.dht=0x00000001000000000000000055555554

  file (.../140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA):
  trusted.afr.mirror-client-0=0x000000000000000000000000
  trusted.afr.mirror-client-1=0x000000000000000000000000
  trusted.gfid=0xddef2b1ab3cc424ca663995bb77cd4c4
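
(Aside: the trusted.glusterfs.dht values above can be decoded with a short script. This is a sketch that assumes the common on-disk encoding of four big-endian 32-bit words, the last two being the start and end of the brick's hash range.)

```python
# Sketch: decode the trusted.glusterfs.dht xattrs quoted above.
# Assumption: four big-endian 32-bit words, where the last two words
# are the start/end of the brick's assigned hash range.
import struct

def dht_range(hexval):
    words = struct.unpack(">4I", bytes.fromhex(hexval))
    return words[2], words[3]  # (start, stop) of the hash range

layouts = {
    "mansfield/lauterbur mirror": "00000001000000000000000055555554",
    "new mirror2 bricks":         "000000010000000055555555aaaaaaa9",
    "ogawa/rabi mirror":          "0000000100000000aaaaaaaaffffffff",
}
for brick, val in layouts.items():
    start, stop = dht_range(val)
    print(f"{brick}: 0x{start:08x}-0x{stop:08x}")
```

The three ranges tile the full 32-bit hash space with no gaps or overlaps, so the directory layout itself looks sane after the fix-layout.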

Comment 1 Shawn Nock 2013-02-21 20:08:22 UTC
Reading the affected file (1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA) on the fuse mount of any of the brick-servers succeeds... and afterward the file is accessible on the clients as normal.

I am looking for a solution that doesn't require running find /gluster-fuse/mnt -type f -exec cat {} > /dev/null \; over the whole volume.

Comment 2 Shawn Nock 2013-02-21 20:24:08 UTC
Fuse Client stat of parent dir:
  File: ‘140.ACQ’
  Size: 8198            Blocks: 32         IO Block: 131072 directory
Device: 27h/39d Inode: 9870553420299421744  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-01-23 14:08:49.667947000 -0500
Modify: 2013-01-23 14:08:49.667947000 -0500
Change: 2013-02-21 15:15:16.890864712 -0500
 Birth: -

Stat of dir on failing (permission denied) bricks:

ogawa:/raid/mirror2 (mirror-client-5)
  File: `.'
  Size: 6               Blocks: 8          IO Block: 4096   directory
Device: fd00h/64768d    Inode: 9674807     Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:file_t:s0
Access: 2013-01-23 14:08:49.667947000 -0500
Modify: 2013-02-21 13:26:11.021860626 -0500
Change: 2013-02-21 15:16:36.835845611 -0500
 Birth: -

rabi:/raid/mirror2 (mirror-client-4)
  File: `.'
  Size: 6               Blocks: 8          IO Block: 4096   directory
Device: fd01h/64769d    Inode: 9674807     Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:file_t:s0
Access: 2013-01-23 14:08:49.667947000 -0500
Modify: 2013-01-23 14:08:49.667947000 -0500
Change: 2013-02-21 15:15:20.557943450 -0500
 Birth: -

Stat of directory on remaining bricks:

ogawa:/raid/mirror (mirror-client-2)
  File: `.'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd00h/64768d    Inode: 55051086    Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:29.211653885 -0500
 Birth: -


rabi:/raid/mirror (mirror-client-3)
  File: `.'
  Size: 4096            Blocks: 16         IO Block: 4096   directory
Device: fd00h/64768d    Inode: 143917118   Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:file_t:s0
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:31.227207773 -0500
 Birth: -

mansfield:/raid/mirror (mirror-client-1)
File: `.'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd00h/64768d    Inode: 237376498   Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:27.470359001 -0500
 Birth: -

lauterbur:/raid/mirror (mirror-client-0)
 File: `.'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd00h/64768d    Inode: 55051086    Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-02-17 10:25:19.000000000 -0500
Modify: 2012-02-17 10:25:19.000000000 -0500
Change: 2013-02-19 20:53:29.211653885 -0500
 Birth: -

Comment 3 Shawn Nock 2013-02-21 20:26:45 UTC
ogawa:/raid/mirror2 and rabi:/raid/mirror2 are the two new bricks... the ones with the "permission denied" errors in mknod. They are XFS, whereas the older bricks are ext4.

SELinux is disabled and there are no ACLs in place on the bricks.

Comment 4 Shawn Nock 2013-02-21 21:25:04 UTC
After reading the file on a brick-server's fuse mount, I find that link files are created on the new bricks pointing to the correct replica... and everything works (from all clients).

Why would a DHT miss from one fuse client (a brick-server) "do the right thing", while a DHT miss from any other client (one not serving a brick) gets a permission denied error?

Comment 5 Shawn Nock 2013-02-22 14:42:12 UTC
The successful heal operation on the brick-server looks like this in the fuse-client log:

[2013-02-22 05:20:52.932596] I [afr-common.c:1340:afr_launch_self_heal] 0-mirror-replicate-1: background  meta-data self-heal triggered. path: [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA, reason: lookup detected pending operations
[2013-02-22 05:20:52.941263] I [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk] 0-mirror-replicate-1: background  meta-data self-heal completed on [...]/140.ACQ/1.3.12.2.1107.5.2.32.35052.2011011711023368433700039.IMA

Compare to failing operation on non-brick-server fuse mount above.

Comment 6 Shawn Nock 2013-02-22 14:58:24 UTC
On the servers hosting the 'new' bricks, the self-heal produces:

[2013-02-21 16:50:44.833730] I [dht-common.c:954:dht_lookup_everywhere_cbk] 0-mirror-dht: deleting stale linkfile [...]/140.ACQ/1.3.12.2.1107.5.2.13.20522.4.0.15670664232227.IMA on mirror-replicate-1

Comment 7 Dan Bretherton 2013-02-27 18:34:22 UTC
I think I am seeing the same bug, which affects a significant proportion of paths during and after fix-layout operations.  Access to affected files and directories by unprivileged users results in either "Invalid argument" or "Input/Output error".  Users with write access to the parent directory do not experience any problems.  In some cases "ls -l" fails with "Input/Output error", and in other cases "ls -l" works but attempting to open the file for reading results in "Invalid argument".  Here is the most recent example I have found, where I am attempting to open a file owned by user qw901080 as user sms05dab, and the parent directory is owned by user qw901080.

[sms05dab@perseus netcdf-4.0]$ /packages/netcdf/netcdf-4.0/bin/ncdump -h /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc
/packages/netcdf/netcdf-4.0/bin/ncdump: /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc: Invalid argument
[sms05dab@perseus netcdf-4.0]$ ls -l  /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc
-rw-r--r-- 1 qw901080 nemo 618364484 Jul 15  2011 /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989/ORCA025-R07_y1989m02_gridT.nc
[sms05dab@perseus netcdf-4.0]$ ls -ld  /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989
drwxr-xr-x 2 qw901080 nemo 98322 Feb 20 22:19 /glusterfs/nemo/users/mvc/WORK/ORCA025_ECMWF/ORCA025-R07-MEAN/Exp4/1989

The "Invalid argument" error occurred when trying to open the file via NFS and native GlusterFS client, both on a compute server and on a GlusterFS storage server.  The ncdump utility required read access to the file, which should have been possible given that the file was world readable.

Changing the permissions of the parent directory to 775 gave user sms05dab (a member of the nemo group) write access to the file and the parent directory, and this allowed the file to be opened for reading by the ncdump utility.

Comment 8 Shawn Nock 2013-03-05 17:00:36 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=884597

Extrapolating from the above: I suspect that the UID:GID from the frame doesn't have permission to unlink/create the stale/nonexistent linkfile.

When I access the problematic files as superuser, the links are established correctly.

Comment 9 Shawn Nock 2013-03-05 17:45:19 UTC
For others experiencing this problem, you can correct it (create/update the link files) by opening all the files as root:

find /mnt/fuse -type f -exec sh -c ': < "$1"' _ {} \;

This takes quite a while if you have many files in your volume (as I do).
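
The same warm-up can be sketched in Python (hypothetical mount path; opening each file once triggers the lookup that recreates the linkto file):

```python
# Sketch: open every regular file under a fuse mount once, so the
# client performs the DHT lookup that recreates missing linkto files.
import os

def warm(mount):
    """Open every regular file under `mount` once; return how many opened."""
    opened = 0
    for root, _dirs, files in os.walk(mount):
        for name in files:
            try:
                with open(os.path.join(root, name), "rb") as f:
                    f.read(1)  # a single read is enough to trigger the lookup
                opened += 1
            except OSError:
                pass  # count only files we could actually open
    return opened

if __name__ == "__main__":
    print(warm("/mnt/fuse"))  # hypothetical fuse mount point
```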

Comment 10 shishir gowda 2013-03-15 10:29:16 UTC
Hi Shawn,

That is correct; the fix for bug 884597 should address your issue. The reason you are seeing failures is:

1. File foo has owner x:uid, y:gid.
2. After an add-brick followed by a fix-layout/rebalance, the hashed subvolume of file foo changes.
3. Subsequent access to these files ends up creating linkto files on the hashed subvolume.
4. The creation happens with the uid/gid of the application that is accessing the files.
5. If the parent directory of file foo does not give that uid/gid permission to create (mknod) the linkto files, you see such failures.

So an access as root succeeds, since the mknods pass the permission check.
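
This failure mode (and why root succeeds) can be reproduced outside GlusterFS in a few lines (a sketch, not GlusterFS code; the file names are made up):

```python
# Sketch: mknod of a regular file fails with EACCES for an unprivileged
# caller when the parent directory is not writable by it, while root
# bypasses the DAC check entirely.
import os, stat, tempfile

parent = tempfile.mkdtemp()
os.chmod(parent, 0o555)              # r-x only: like another user's 0755 dir
target = os.path.join(parent, "linkto.IMA")
try:
    os.mknod(target, stat.S_IFREG | 0o600)
    created = True                   # root gets here
except PermissionError:
    created = False                  # what the non-root fuse clients hit
print("mknod succeeded:", created)
```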

We have fixed this in bug 884597: the mknod now happens as root:root, and we then change the ownership and permissions back to those of the original file.
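
A rough sketch of that approach (hypothetical helper, not the actual GlusterFS patch):

```python
# Sketch: create the linkto file with the daemon's own (root) credentials,
# then restore the original file's mode and ownership afterwards.
import os, stat, tempfile

def create_linkto(path, uid, gid, mode):
    os.mknod(path, stat.S_IFREG | 0o600)  # created by the daemon, not the caller
    os.chmod(path, mode)                  # restore the original mode
    try:
        os.chown(path, uid, gid)          # restore the original owner
    except PermissionError:
        pass                              # chown to another user needs root

d = tempfile.mkdtemp()
p = os.path.join(d, "linkto.IMA")
create_linkto(p, os.getuid(), os.getgid(), 0o644)
print(oct(stat.S_IMODE(os.stat(p).st_mode)))  # 0o644
```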

One workaround is to change the permissions of the volume itself.

Please let me know whether upgrading to a release that has the fix solves your issue. I will mark this bug as a duplicate once you confirm the behaviour.

