Created attachment 923023
Logs for daemon, shd, brick and client

Description of problem:
Running "gluster volume remove-brick vol replica n-1" caused a FUSE client that had the volume mounted during the removal to generate massive amounts of log data. After a remount, some files still cause warnings about SETXATTR(), GETXATTR() and ACCESS().

Version-Release number of selected component (if applicable):
3.4.2-1

How reproducible:
Not sure. I expect it reproduces easily, but I did not attempt it. We had a 3-brick replica volume and I wanted to make some disk changes under one of the bricks, so I removed one.

Steps to Reproduce:
1. Create a 3-brick replica volume and put some data on it
2. Mount the volume using the FUSE client on Linux
3. Remove a brick from the volume and watch the FUSE client logs

Actual results:

Expected results:

Additional info:

gluster volume info datashare_volume

Volume Name: datashare_volume
Type: Replicate
Volume ID: 56f56605-b6ca-41bd-bfa2-cebc0145c94a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: balthasar-gluster:/zvol/gluster_datashare/datashare_volume
Brick2: melchior-gluster:/zvol/gluster_datashare/datashare_volume
Options Reconfigured:
server.statedump-path: /tmp/

getfattr -m . -d -e hex Thumbs.db (On Melchior)
# file: Thumbs.db
system.posix_acl_access=0x0200000001000600ffffffff04000000ffffffff080007009508e86308000500a108e86308000700ae08e86310000700ffffffff20000000ffffffff
trusted.afr.datashare_volume-client-0=0x000000000000000000000000
trusted.afr.datashare_volume-client-1=0x000000000000000000000000
trusted.afr.datashare_volume-client-2=0x000000000000000000000000
trusted.gfid=0x48bf86616b904332913fc1a6d7838ed2

getfattr -m . -d -e hex Thumbs.db (On Balthasar)
# file: Thumbs.db
system.posix_acl_access=0x0200000001000600ffffffff04000000ffffffff080007009508e86308000500a108e86308000700ae08e86310000700ffffffff20000000ffffffff
trusted.afr.datashare_volume-client-0=0x000000000000000000000000
trusted.afr.datashare_volume-client-1=0x000000000000000000000000
trusted.afr.datashare_volume-client-2=0x000000000000000000000000
trusted.gfid=0x48bf86616b904332913fc1a6d7838ed2
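For anyone reading the trusted.afr.* values in the getfattr output, here is a minimal Python sketch for decoding them. It assumes the usual AFR changelog layout of three big-endian 32-bit counters (pending data, metadata and entry operations); the function name decode_afr_changelog is made up for illustration and is not part of any Gluster tool.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* hex value into its three pending counters.

    Assumption: the value is 12 bytes, laid out as three big-endian
    32-bit counters in the order data, metadata, entry.
    """
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # Value copied from the getfattr output in this report.
    print(decode_afr_changelog("0x000000000000000000000000"))
```

Under that assumption, the all-zero values on both bricks decode to no pending operations for any of the three listed clients.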
I just realized some of my logs go back further than I expected. The remove-brick would have taken place on 07/28/2014; anything earlier than that is likely related to past issues.
I'd also like to note that Windows "Thumbs.db" files seem to be almost exclusively the ones affected for "SETATTR" (which accounts for the majority of the errors). Only one or two other directories have complained on "ACCESS" (but this could be unrelated?). This info may not be significant, but I figure more is always better.
Hi Mark,
Thanks for raising the bug. Are the applications erroring out because of these errors, or are these just warnings appearing in the logs?
Pranith
Thanks Pranith, at first I thought it might simply be this:
https://bugzilla.redhat.com/show_bug.cgi?id=1104861

With the help of JoeJulian I removed any 3rd AFRs (assuming there were any; we had stepped down from 3-way to 2-way replication online), but it appears the issue is still around.

The specific problem I'm seeing is that my client log complains when trying to perform a SETXATTR. An example from today is:

On a temp file (probably Excel or something)
[2014-08-01 17:25:30.162001] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 3602757: SETXATTR() /datashare/accounting/Accounts Payable/AP REPORTS/2014 Reports/AP Totals/DBB2C7B6.tmp => -1 (Operation not permitted)
[2014-08-01 17:25:30.162124] W [fuse-bridge.c:993:fuse_setattr_cbk] 0-glusterfs-fuse: 3602758: SETATTR() /datashare/accounting/Accounts Payable/AP REPORTS/2014 Reports/AP Totals/DBB2C7B6.tmp => -1 (Operation not permitted)

And many Thumbs.db files do this
[2014-08-01 17:43:19.244029] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 3625342: SETXATTR() /datashare/engineering/Engineering-Operations/GoranB/Thumbs.db => -1 (Operation not permitted)
[2014-08-01 17:43:19.244226] W [fuse-bridge.c:993:fuse_setattr_cbk] 0-glusterfs-fuse: 3625343: SETATTR() /datashare/engineering/Engineering-Operations/GoranB/Thumbs.db => -1 (Operation not permitted)

I'm assuming Gluster is trying to set, add or remove an xattr on these files, but the logs don't indicate what or why. So far, no files seem to be damaged.

Here are the xattrs for one of these files:

getfattr -m . -d -e hex ./Thumbs.db
# file: Thumbs.db
system.posix_acl_access=0x0200000001000600ffffffff04000000ffffffff080007009908e86308000500a508e86308000700b208e86310000700ffffffff20000000ffffffff
trusted.afr.datashare_volume-client-0=0x000000000000000000000000
trusted.afr.datashare_volume-client-1=0x000000000000000000000000
trusted.gfid=0x69edffaed25d49a9baa970a04e310441

None of the server logs seem to be complaining right now (not the glusterd, shd or brick logs), just the client. I have not remounted the fuse mount since running the setfattr -x but I'd be surprised if that was the case. The heal status says all is good: nothing healed recently, no split-brain, nothing healing.
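To get a feel for which operations and files dominate the client log, a small Python sketch like the following could tally the warnings. The regular expression is only an assumption based on the four log lines quoted in this comment; other fuse-bridge messages may be shaped differently, and the names LINE_RE, SAMPLE and tally_warnings are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the warning lines quoted in this bug (an assumption,
# not an official description of the GlusterFS log format).
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+\S+:\s+(?P<req>\d+):\s+"
    r"(?P<op>[A-Z]+)\(\)\s+(?P<path>.+?)\s+=>\s+(?P<ret>-?\d+)\s+"
    r"\((?P<err>[^)]*)\)\s*$"
)

# The four lines quoted in this comment.
SAMPLE = [
    "[2014-08-01 17:25:30.162001] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 3602757: SETXATTR() /datashare/accounting/Accounts Payable/AP REPORTS/2014 Reports/AP Totals/DBB2C7B6.tmp => -1 (Operation not permitted)",
    "[2014-08-01 17:25:30.162124] W [fuse-bridge.c:993:fuse_setattr_cbk] 0-glusterfs-fuse: 3602758: SETATTR() /datashare/accounting/Accounts Payable/AP REPORTS/2014 Reports/AP Totals/DBB2C7B6.tmp => -1 (Operation not permitted)",
    "[2014-08-01 17:43:19.244029] W [fuse-bridge.c:1172:fuse_err_cbk] 0-glusterfs-fuse: 3625342: SETXATTR() /datashare/engineering/Engineering-Operations/GoranB/Thumbs.db => -1 (Operation not permitted)",
    "[2014-08-01 17:43:19.244226] W [fuse-bridge.c:993:fuse_setattr_cbk] 0-glusterfs-fuse: 3625343: SETATTR() /datashare/engineering/Engineering-Operations/GoranB/Thumbs.db => -1 (Operation not permitted)",
]


def tally_warnings(lines):
    """Count warning lines per (operation, file name) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match["level"] == "W":
            # Group by file name only, so every Thumbs.db lands in one bucket.
            name = match["path"].rsplit("/", 1)[-1]
            counts[(match["op"], name)] += 1
    return counts


if __name__ == "__main__":
    for (op, name), n in tally_warnings(SAMPLE).most_common():
        print(op, name, n)
```

To run it against a real log, pass an open file object for the client log instead of SAMPLE; lines that do not match the pattern are simply skipped.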
Hi Mark,
For the current error, can you remount the fuse mount with the acl option (-o acl) and see if you still get the error messages? I have seen these errors when "user.xyz" extended attributes are set over a fuse mount without the acl option.

Secondly, are you accessing the gluster volume through Samba over a fuse mount? If yes, have you tried using the vfs plugin that we have? That is the recommended way from 3.4 onwards. You can get more details here:
http://lalatendumohanty.wordpress.com/2014/02/11/using-glusterfs-with-samba-and-samba-vfs-plugin-for-glusterfs-on-fedora-20/
and packages here:
http://download.gluster.org/pub/gluster/glusterfs/samba/
Thanks for your response Raghavendra,

This is our fstab on the client (no, we are not using VFS presently). The client's job is basically to be a proxy SMB server for Windows clients while providing replicated storage over gluster.

# Mount Gluster Share
balthasar-gluster:/datashare_volume /mnt/datashare_gluster glusterfs defaults,acl,_netdev 0 0

This is one of our brick mounts (all are the same):

/dev/epoch/gluster_datashare-part1 /zvol/gluster_datashare ext4 acl,user_xattr,defaults

While I can appreciate that VFS is recommended (and I will look into it, set it up, test it, etc.), this problem is new and only occurred after removing a replica brick. At this point I understand that 3.4.x has issues with add/replace/remove brick while the volume is live (if I'm incorrect please clarify). However, we have remounted the volume more than once, it was always mounted with acl, and as far as I can tell (getfattr) there is nothing unusual about the xattrs of the files coming up in our warning messages (something that never happened before). So I'd have to guess that some damage has been done to metadata, but I'm not informed enough to find it.

At this point I'm looking at creating a new volume and migrating the data using rsync, but I'd also really like to get to the bottom of these warnings. Presently I have no knowledge of what fuse_setattr_cbk is trying to do that it either can't (or thinks it can't) do. Normally I would look elsewhere for a warning message like this (permission problems etc.), but in this case the warning has only occurred after the brick removal, and as far as I can tell there is no data damage or loss. However, if xattrs are failing to update and they control replication or healing, data loss is definitely a fear, which is what prompts us to consider moving to a new volume (yes, we have backups).

Please let me know if there is any further data I can provide.
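Since the question is whether anything looks unusual in the xattrs, here is a rough Python sketch for decoding the system.posix_acl_access values that getfattr printed earlier in this bug. It assumes the Linux POSIX ACL xattr layout, version 2: a little-endian 32-bit version header followed by 8-byte entries of 16-bit tag, 16-bit permission bits and 32-bit id. The names TAGS, SAMPLE_ACL and decode_posix_acl are made up for this example.

```python
import struct

# Tag values assumed from the Linux POSIX ACL xattr format.
TAGS = {
    0x01: "user_obj",
    0x02: "user",
    0x04: "group_obj",
    0x08: "group",
    0x10: "mask",
    0x20: "other",
}

# Value copied from the first getfattr output in this bug.
SAMPLE_ACL = (
    "0x0200000001000600ffffffff04000000ffffffff080007009508e863"
    "08000500a108e86308000700ae08e86310000700ffffffff20000000ffffffff"
)


def decode_posix_acl(hex_value):
    """Return a list of (tag, id or None, 'rwx' string) tuples."""
    text = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) < 4 or (len(raw) - 4) % 8 != 0:
        raise ValueError("unexpected ACL xattr length: %d" % len(raw))
    (version,) = struct.unpack_from("<I", raw, 0)
    if version != 2:
        raise ValueError("unsupported ACL xattr version: %d" % version)
    entries = []
    for offset in range(4, len(raw), 8):
        tag, perm, qualifier = struct.unpack_from("<HHI", raw, offset)
        perms = "".join(
            char if perm & bit else "-"
            for char, bit in (("r", 4), ("w", 2), ("x", 1))
        )
        # 0xffffffff marks entries that carry no uid/gid qualifier.
        ident = None if qualifier == 0xFFFFFFFF else qualifier
        entries.append((TAGS.get(tag, hex(tag)), ident, perms))
    return entries


if __name__ == "__main__":
    for entry in decode_posix_acl(SAMPLE_ACL):
        print(entry)
```

Decoding the values from different files side by side makes it easier to compare the named group ids and their permissions than reading the raw hex.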
This bug likely still exists in 3.4.2-1. However, I have migrated all of my servers/clients to 3.5.2. As far as I know it is not safe to add or remove bricks on a live volume in 3.4.2-1 if files are open.
GlusterFS 3.7.0 has been released (http://www.gluster.org/pipermail/gluster-users/2015-May/021901.html), and the Gluster project maintains N-2 supported releases. The last two releases before 3.7 are still maintained; at the moment these are 3.6 and 3.5.

This bug has been filed against the 3.4 release, and will not get fixed in a 3.4 version any more. Please verify whether newer versions are affected by the reported problem. If that is the case, update the bug with a note, and update the version if you can. In case updating the version is not possible, leave a comment in this bug report with the version you tested, and set the "Need additional information the selected bugs from" below the comment box to "bugs".

If there is no response by the end of the month, this bug will get automatically closed.
GlusterFS 3.4.x has reached end-of-life.

If this bug still exists in a later release please reopen this and change the version or open a new bug.