Description of problem:

I have two VMs running CentOS 6 with Gluster, one brick on each, in a replicated volume. A file-based queue system we run on it shows some strange results. The files look fine on both servers: permissions and file size are the same. But when I cat or less a file, one server shows the content and the other shows it blank. If I restart glusterd or touch the file, the content returns.

I can reproduce the bug by running two PHP scripts.

With write.php I fopen a file and write content to it in a loop. Once the file is written, I read its contents back with file_get_contents and then move it to another directory. On its own this works pretty much all the time: files appear on both bricks and are OK.

However, if another script (read.php, also using file_get_contents) is trying to read files at the same time, I occasionally get permission denied errors. I presume it is trying to open a file that is currently being written or moved. That is when I see blank content in one of the files. If I restart glusterd, or even touch the blank file, the content comes back.

It looks like self-heal may not be fully working in my environment, or perhaps the initial read of the file was cached while the file was still being written.

I also found this fix, which may be related: http://review.gluster.com/#/c/3775/
So maybe it is fixed in a newer version, or maybe I need a different gluster volume setting?

When I enable:

diagnostics.client-log-level: DEBUG
diagnostics.brick-log-level: DEBUG

nothing obvious shows up in terms of errors. If you cannot reproduce this, I can provide cli.log etc. if needed.
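The two log levels mentioned above are ordinary volume options, so they can be switched with 'gluster volume set'. The following is a minimal sketch, assuming the volume name prxvol from this report; the function name set_gluster_log_level is made up for illustration and is not part of the gluster CLI.

```shell
#!/usr/bin/env bash
# set_gluster_log_level VOLUME LEVEL
# Hypothetical wrapper: sets the client and brick log level of VOLUME
# to LEVEL (for example DEBUG or INFO) using 'gluster volume set'.
set_gluster_log_level() {
    local volume="$1" level="$2"
    gluster volume set "$volume" diagnostics.client-log-level "$level" &&
    gluster volume set "$volume" diagnostics.brick-log-level "$level"
}
```

Usage would be 'set_gluster_log_level prxvol DEBUG' before a reproduction run and 'set_gluster_log_level prxvol INFO' afterwards to return to the levels shown in the volume info.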
Running 'gluster volume info' I get (IP addresses modified):

gluster volume info

Volume Name: prxvol
Type: Replicate
Volume ID: b5879503-3c6b-4556-b6ee-e600208150a7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 1.1.1.1:/data
Brick2: 2.2.2.2:/data
Options Reconfigured:
cluster.data-self-heal-algorithm: diff
diagnostics.client-log-level: INFO
diagnostics.brick-log-level: INFO
auth.allow: 1.1.1.1,2.2.2.2

Version-Release number of selected component (if applicable):
3.3.1

How reproducible:

Run write.php in a loop on both servers. It creates random files, tries to read each file's contents once written, then moves the file to another gluster directory, i.e.

php -q write.php

At the same time, run read.php in a loop on both servers. It just tries to read file contents from either directory (original and moved), i.e.

for i in {1..200}; do php -q read.php; done

read.php will show the files it may have had a problem reading. Viewing those files with less shows their content as blank.

Steps to Reproduce:
1. php -q write.php
2. for i in {1..200}; do php -q read.php; done
3. read.php will output an array of files where there was some issue reading the contents of the file.
When I 'less' the files found on the server (they will be different files on each server), I find the content is blank.

Actual results (file size/permissions/timestamp identical):

server 1:
$ ll /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
-rw-rw-r-- 1 daz daz 16 Jun 6 21:09 /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e

server 2:
$ ll /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
-rw-rw-r-- 1 daz daz 16 Jun 6 21:09 /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e

However, 'less' on both files shows:

server 1 (shows blank):
less /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
/mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e (END)

server 2 (shows 'this is a test'):
less /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
this is a test
/mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e (END)

Expected results (file size/permissions/timestamp identical):

server 1:
$ ll /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
-rw-rw-r-- 1 daz daz 16 Jun 6 21:09 /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e

server 2:
$ ll /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
-rw-rw-r-- 1 daz daz 16 Jun 6 21:09 /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e

less should show content in both files:

server 1 (shows 'this is a test'):
less /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
this is a test
/mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e (END)

server 2 (shows 'this is a test'):
less /mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e
this is a test
/mnt/glusterfs/prx_stage/data/testing/done/51b0fa738c28e (END)

Additional info:
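Because ll reports the same size on both servers, a size check alone will not find the affected files. One way to spot them from a client mount is to compare the size that stat reports with the number of non-NUL bytes actually read. This is a rough sketch and not something taken from the attached scripts: find_blank_reads is a hypothetical name, it assumes GNU stat, and it treats an all-NUL read the same as an empty read, since the report does not say which of the two a blank file returns.

```shell
#!/usr/bin/env bash
# find_blank_reads DIR
# Print every regular file in DIR whose stat size is non-zero but whose
# content reads back with no non-NUL bytes (empty or all zeros).
# A file that cannot be opened counts as zero bytes read and is printed too.
# Hypothetical helper; assumes GNU stat (-c %s).
find_blank_reads() {
    local dir="$1" f size bytes
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        size=$(stat -c %s -- "$f") || continue
        # Read through a pipe so the count comes from real reads,
        # not from the file's metadata.
        bytes=$(tr -d '\000' < "$f" | wc -c) || continue
        if (( size > 0 && bytes == 0 )); then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, 'find_blank_reads /mnt/glusterfs/prx_stage/data/testing/done' run on each server should list different files on each if the behaviour matches the results above.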
Created attachment 757963 [details] writes files to gluster dir
Created attachment 757964 [details] reads files
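For readers without access to the attachments, the access pattern they exercise can be approximated in shell. This is only a sketch of the pattern described in the report (write, read back, move, with a concurrent reader); it is not the attached PHP code, and the function names, file naming scheme, and directory arguments are all hypothetical.

```shell
#!/usr/bin/env bash
# Shell approximation of the write/read pattern; NOT the attached PHP scripts.

# write_and_move SRC DST COUNT
# Write COUNT small files into SRC, read each one back once written,
# then move it to DST.
write_and_move() {
    local src="$1" dst="$2" count="$3" i name
    mkdir -p "$src" "$dst"
    for ((i = 0; i < count; i++)); do
        name="$$_$i"                      # made-up naming: PID plus counter
        printf 'this is a test\n' > "$src/$name"
        cat "$src/$name" > /dev/null      # read back the finished file
        mv "$src/$name" "$dst/$name"
    done
}

# read_all DIR...
# Try to read every file in the given directories and print the ones
# that cannot be read or that come back empty.
read_all() {
    local dir f content
    for dir in "$@"; do
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            if ! content=$(cat "$f" 2>/dev/null) || [ -z "$content" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
}
```

To mimic the reproduction steps, one shell on each server would run write_and_move against two directories on the gluster mount while a second shell runs read_all on the same two directories in a loop. The done directory in this report is /mnt/glusterfs/prx_stage/data/testing/done; the source directory is not named in the report, so any path used for it is a placeholder.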
After advice from the gluster-users list, I installed Gluster 3.4 beta3, and the problem I was seeing has gone away. I guess this means it is a genuine bug in 3.3.1 that has been fixed in the latest release.
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will be closed automatically.