Description of problem:

Writes to a glusterfs folder are not synced to the other nodes unless the files are explicitly read from the glusterfs mount. For example, I can write 1000 files into a folder on the glusterfs mount; if I then ssh to another server and query the actual brick, the files are not there. If I `cat` each file from the glusterfs mount, the files appear in the brick mount. Aren't files supposed to be automatically copied to the bricks? What if, in the meantime, the server on which I performed the writes goes down - are the files lost?

Version-Release number of selected component (if applicable):
3.7.2-3.el7

How reproducible:
Very

Steps to Reproduce:
1. touch /ssd_data/test.file on 10.0.1.3 (/ssd_data is the glusterfs mount)
2. Try to open /ssd_storage/test.file on 10.0.1.2 - not found (/ssd_storage is the brick mount)

Actual results:
File not found

Expected results:
File is present

Additional info:

Number of volumes: 2
Volume names: ssd-gluster-data, sata-gluster-data
Type of volumes: replica 3

Output of gluster volume info:

Volume Name: sata-gluster-data
Type: Replicate
Volume ID: dbb498cc-fa8f-4513-8d6c-1aa9a159b7cf
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.1.3:/sata_storage/GLUSTER_DO_NOT_MODIFY
Brick2: 10.0.1.2:/sata_storage/GLUSTER_DO_NOT_MODIFY
Brick3: 10.0.1.1:/sata_storage/GLUSTER_DO_NOT_MODIFY
Options Reconfigured:
nfs.disable: off
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 60

Volume Name: ssd-gluster-data
Type: Replicate
Volume ID: 967b2328-c5d3-4ea1-8cd0-8114796f3b50
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.1.3:/ssd_storage/GLUSTER_DO_NOT_MODIFY
Brick2: 10.0.1.2:/ssd_storage/GLUSTER_DO_NOT_MODIFY
Brick3: 10.0.1.1:/ssd_storage/GLUSTER_DO_NOT_MODIFY
Options Reconfigured:
nfs.disable: off
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 60

Output of gluster volume status:

Status of volume: sata-gluster-data
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.3:/sata_storage/GLUSTER_DO_NOT_MODIFY   49153     0          Y       16358
Brick 10.0.1.2:/sata_storage/GLUSTER_DO_NOT_MODIFY   49155     0          Y       8691
Brick 10.0.1.1:/sata_storage/GLUSTER_DO_NOT_MODIFY   49154     0          Y       24495
NFS Server on localhost                              2049      0          Y       8705
Self-heal Daemon on localhost                        N/A       N/A        Y       8713
NFS Server on 10.0.1.1                               2049      0          Y       24509
Self-heal Daemon on 10.0.1.1                         N/A       N/A        Y       24517
NFS Server on 10.0.1.3                               2049      0          Y       30660
Self-heal Daemon on 10.0.1.3                         N/A       N/A        Y       30699

Task Status of Volume sata-gluster-data
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: ssd-gluster-data
Gluster process                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.3:/ssd_storage/GLUSTER_DO_NOT_MODIFY    49152     0          Y       16366
Brick 10.0.1.2:/ssd_storage/GLUSTER_DO_NOT_MODIFY    49152     0          Y       8698
Brick 10.0.1.1:/ssd_storage/GLUSTER_DO_NOT_MODIFY    49153     0          Y       24502
NFS Server on localhost                              2049      0          Y       8705
Self-heal Daemon on localhost                        N/A       N/A        Y       8713
NFS Server on 10.0.1.1                               2049      0          Y       24509
Self-heal Daemon on 10.0.1.1                         N/A       N/A        Y       24517
NFS Server on 10.0.1.3                               2049      0          Y       30660
Self-heal Daemon on 10.0.1.3                         N/A       N/A        Y       30699

Task Status of Volume ssd-gluster-data
------------------------------------------------------------------------------
There are no active volume tasks

fstab:
10.0.1.3:/ssd-gluster-data  /ssd_data  glusterfs  defaults,_netdev,backup-volfile-servers=10.0.1.1:10.0.1.2  0 0
10.0.1.3:/sata-gluster-data /sata_data glusterfs  defaults,_netdev,backup-volfile-servers=10.0.1.1:10.0.1.2  0 0

Client Information:
OS Type: Linux
Mount type: fuse.glusterfs
OS Version: CentOS 7, x86_64
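For reference, the sequence I'm doing boils down to roughly the following (a sketch using the mount/brick paths above; the heal-info query is just an extra sanity check, not part of the original steps):

# On 10.0.1.3: create a file through the glusterfs mount
touch /ssd_data/test.file

# On 10.0.1.2: look for it directly on the brick (this is where it is missing)
ls -l /ssd_storage/GLUSTER_DO_NOT_MODIFY/test.file

# On any server: ask gluster whether it knows about entries pending heal
gluster volume heal ssd-gluster-data info

# Reading the file through the glusterfs mount makes it appear on the bricks
cat /ssd_data/test.file > /dev/null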
Forgot to add, my ssd_data.log is full of:

[2015-07-26 14:05:43.676576] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-ssd-gluster-data-client-2: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:43.841046] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-ssd-gluster-data-client-1: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:43.841409] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-ssd-gluster-data-client-2: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:44.251452] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-ssd-gluster-data-client-1: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:44.251818] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-ssd-gluster-data-client-2: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:44.643556] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-ssd-gluster-data-client-1: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:44.643940] E [MSGID: 114031] [client-rpc-fops.c:321:client3_3_mkdir_cbk] 0-ssd-gluster-data-client-2: remote operation failed: Permission denied. Path: (null) [Permission denied]
[2015-07-26 14:05:44.816162] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-ssd-gluster-data-client-1: remote operation failed: Permission denied. Path: (null) [Permission denied]

That's on servers 10.0.1.1 and 10.0.1.2. 10.0.1.3 has just:

[2015-07-29 15:19:11.311369] W [fuse-bridge.c:1263:fuse_err_cbk] 0-glusterfs-fuse: 352139993: REMOVEXATTR() /somefile.. => -1 (No data available)

SELinux is disabled on all 3 servers.
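Given the Permission denied errors, a rough sketch of checks that can be run on each server to compare the brick roots (paths taken from the volume info above; none of this output has been collected yet):

# run on each of 10.0.1.1, 10.0.1.2 and 10.0.1.3 and compare the results
stat -c '%U:%G %a %n' /ssd_storage /ssd_storage/GLUSTER_DO_NOT_MODIFY
stat -c '%U:%G %a %n' /sata_storage /sata_storage/GLUSTER_DO_NOT_MODIFY

# confirm SELinux really is off everywhere
getenforce

# check that the brick filesystems are mounted read-write
mount | grep -E 'ssd_storage|sata_storage'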
Ivo,

Could you share the brick logs from 10.0.1.1 and 10.0.1.2?

Thanks,
Krutika
Hello Krutika, see my first comment here - that's all that's relevant in those logs.
Ivo,

Sorry, I did not make myself clear. Could you share the logs at /var/log/glusterfs/bricks/*.log on 10.0.1.1 and 10.0.1.2?

-Krutika
Sorry - on 10.0.1.1 and 10.0.1.2 I have tons of:

[2015-08-04 13:10:28.801167] E [MSGID: 115057] [server-rpc-fops.c:535:server_mknod_cbk] 0-ssd-gluster-data-server: 193268: MKNOD /$FILE (04cc4296-7183-4efb-9bcc-7e6857eb2076/$FILE) ==> (Permission denied) [Permission denied]

When healing (cat from the glusterfs mount), the log is full of:

[2015-08-04 12:59:51.012863] I [dict.c:467:dict_get] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f4c15b2cef6] (--> /usr/lib64/libglusterfs.so.0(dict_get+0x90)[0x7f4c15b25ce0] (--> /usr/lib64/glusterfs/3.7.2/xlator/storage/posix.so(posix_setxattr+0x1b9)[0x7f4c0823c439] (--> /usr/lib64/libglusterfs.so.0(default_setxattr+0x75)[0x7f4c15b313a5] (--> /usr/lib64/glusterfs/3.7.2/xlator/features/changetimerecorder.so(ctr_setxattr+0x186)[0x7f4c037cc696] ))))) 0-dict: !this || key=dht-get-iatt-in-xattr
(In reply to Ivo Petrov from comment #5)
> Sorry - on 10.0.1.1 and 10.0.1.2 I have tons of
>
> [2015-08-04 13:10:28.801167] E [MSGID: 115057] [server-rpc-fops.c:535:server_mknod_cbk] 0-ssd-gluster-data-server: 193268: MKNOD /$FILE (04cc4296-7183-4efb-9bcc-7e6857eb2076/$FILE) ==> (Permission denied) [Permission denied]
>
> when healing (cat from glusterfs mount) log is full with:
>
> [2015-08-04 12:59:51.012863] I [dict.c:467:dict_get] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f4c15b2cef6] (--> /usr/lib64/libglusterfs.so.0(dict_get+0x90)[0x7f4c15b25ce0] (--> /usr/lib64/glusterfs/3.7.2/xlator/storage/posix.so(posix_setxattr+0x1b9)[0x7f4c0823c439] (--> /usr/lib64/libglusterfs.so.0(default_setxattr+0x75)[0x7f4c15b313a5] (--> /usr/lib64/glusterfs/3.7.2/xlator/features/changetimerecorder.so(ctr_setxattr+0x186)[0x7f4c037cc696] ))))) 0-dict: !this || key=dht-get-iatt-in-xattr

This looks like an error that could *possibly* be originating from the posix-acl translator. CC'ing Raghavendra Talur.

Raghavendra,
Would you be able to help us out here? Basically, entry creations on Ivo's 3-way replicated volume seem to be succeeding only on the first sub-volume of AFR and failing with EACCES (?) on the remaining two.
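If it is indeed posix-acl, comparing the ACLs and modes of the brick roots across the three servers might narrow it down. A rough sketch (paths taken from the volume info above, not something already run for this report):

# run on each of 10.0.1.1, 10.0.1.2 and 10.0.1.3 and compare the output
getfacl -p /ssd_storage/GLUSTER_DO_NOT_MODIFY
ls -ld /ssd_storage/GLUSTER_DO_NOT_MODIFY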
(In reply to Ivo Petrov from comment #5)
> when healing (cat from glusterfs mount) log is full with:
>
> [2015-08-04 12:59:51.012863] I [dict.c:467:dict_get] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f4c15b2cef6] (--> /usr/lib64/libglusterfs.so.0(dict_get+0x90)[0x7f4c15b25ce0] (--> /usr/lib64/glusterfs/3.7.2/xlator/storage/posix.so(posix_setxattr+0x1b9)[0x7f4c0823c439] (--> /usr/lib64/libglusterfs.so.0(default_setxattr+0x75)[0x7f4c15b313a5] (--> /usr/lib64/glusterfs/3.7.2/xlator/features/changetimerecorder.so(ctr_setxattr+0x186)[0x7f4c037cc696] ))))) 0-dict: !this || key=dht-get-iatt-in-xattr

This is fixed in version 3.7.3; you can look at this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1235923
(In reply to Ivo Petrov from comment #5)
> Sorry - on 10.0.1.1 and 10.0.1.2 I have tons of
>
> [2015-08-04 13:10:28.801167] E [MSGID: 115057] [server-rpc-fops.c:535:server_mknod_cbk] 0-ssd-gluster-data-server: 193268: MKNOD /$FILE (04cc4296-7183-4efb-9bcc-7e6857eb2076/$FILE) ==> (Permission denied) [Permission denied]

For this original problem, my first question is about the user/process that is doing the writes on the gluster mount. Is it root or someone else? If someone else, does that user exist on the other two nodes?

If none of the above helps us determine what's wrong, we will have to check the brick logs at debug level. You can do that by executing "gluster volume set <VOLNAME> brick-log-level DEBUG".
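A rough sketch of those checks ("someuser" below is just a placeholder for whichever user is doing the writes; the log-level setting is the one mentioned above and can be reverted with volume reset afterwards):

# does the writing user exist with the same UID/GID on all three nodes?
id someuser        # run on 10.0.1.1, 10.0.1.2 and 10.0.1.3 and compare

# raise the brick log level while reproducing the failing writes, then put it back
gluster volume set ssd-gluster-data brick-log-level DEBUG
# ... reproduce the failing writes, collect /var/log/glusterfs/bricks/*.log ...
gluster volume reset ssd-gluster-data brick-log-level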
Ivo Petrov,

Are you still experiencing this bug? Could you provide the information asked for by Raghavendra Talur in comment #8?

Pranith
Hello,

I'm sorry, I missed comment #8. I've tested this again and it seems to be working as expected now. The only difference is that I had to umount/mount the problematic mount on the node not from itself (local IP) but from another node. After that I checked consistency with a script I wrote:

- get all files from a "good node"
- get all files from the "problematic node"
- diff the two lists and `cat to /dev/null` all files that differ on the problematic node (the cat is done from the glusterfs mount, which forces them to be synced locally)

After all of the above, I can no longer reproduce the original issue. The mounts/setup are currently as in the original post and everything is working as expected.
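Not my exact script, but the approach was roughly the following sketch (which node is "good" and which is "problematic" is just illustrative here, and gluster's internal .glusterfs directory is skipped when listing the bricks). The cat loop runs on the problematic node, against its glusterfs mount:

GOOD=10.0.1.3          # node whose brick has the files (placeholder)
PROBLEMATIC=10.0.1.2   # node whose brick was missing them (placeholder)

# list regular files on each brick, skipping gluster's internal .glusterfs tree
ssh "$GOOD"        'cd /ssd_storage/GLUSTER_DO_NOT_MODIFY && find . -type f -not -path "./.glusterfs/*"' | sort > good.list
ssh "$PROBLEMATIC" 'cd /ssd_storage/GLUSTER_DO_NOT_MODIFY && find . -type f -not -path "./.glusterfs/*"' | sort > problematic.list

# files present on the good brick but missing on the problematic one
comm -23 good.list problematic.list > missing.list

# run on the problematic node: reading each file through the glusterfs mount
# forces it to be synced to the local brick
while IFS= read -r f; do
    cat "/ssd_data/${f#./}" > /dev/null
done < missing.list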
Thanks, Ivo Petrov. Closing the bug as per Comment #10.