Version-Release number of selected component (if applicable): 3.8.9

Description of problem:
I created a single-brick setup and copied all the files into it (29 GB). Then I did an add-brick with replica 2 on a second node, and weird things started to happen. The replicated brick is smaller (23 GB) than the original (29 GB). I ran an rsync compare of the two folders and found that some files on the replica are empty (0 bytes) and have the wrong ownership.

Original brick:

root@haproxy-01 /glusterfs/www $ ls -l archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg
-rw-r--r-- 2 www www 7897 Feb 29  2008 archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg

Replicated brick:

root@haproxy-02 /glusterfs/www $ ls -l archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg
-rw-r--r-- 2 root root 0 Mar  1 15:44 archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg
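For reference, the comparison can be done with a dry-run rsync along these lines (a sketch; brick paths are taken from the transcripts above, and .glusterfs is excluded since it is GlusterFS-internal metadata). It lists every file whose content or ownership differs without changing anything:

# -a compares metadata (owner/group/perms) too, -n is a dry run,
# -c forces checksum comparison, -i itemizes each difference
rsync -anci --exclude=.glusterfs /glusterfs/www/ root@haproxy-02:/glusterfs/www/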
I upgraded to 3.10.0 and recreated the replicated brick. Now I have 25 GB out of 29 GB, and according to diff some files are missing on the replicated brick.
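To see what the volume itself thinks is still pending, the heal queue can be inspected (a sketch; the volume name "www" is inferred from the "0-www-" prefixes in the logs below):

# List entries still awaiting heal
gluster volume heal www info

# Kick off a full crawl if the heal appears stalled
gluster volume heal www full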
Last lines in /var/log/glusterfs/bricks/glusterfs-www.log on the original brick:

[2017-03-03 21:17:13.622882] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/46b0994d-d3c1-4970-8958-de42b70541d9: Not able to add to index [Too many links]
[2017-03-03 21:17:13.632007] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/1c39f3c4-ed13-45bd-a9b9-15354ebe448a: Not able to add to index [Too many links]
[2017-03-03 21:17:13.640919] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/0ca82d30-f7d6-4478-abe7-8d5b2bb99c03: Not able to add to index [Too many links]
[2017-03-03 21:17:37.074396] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/1975672d-7c29-4ec5-8f04-230c955e5ec6: Not able to add to index [Too many links]

Stat of the only base entry in /glusterfs/www/.glusterfs/indices/xattrop:

root@haproxy-01 /glusterfs/www/.glusterfs/indices/xattrop $ stat 00006344-01fb-43a4-89ff-ac7a3e7643ee
  File: '00006344-01fb-43a4-89ff-ac7a3e7643ee'
  Size: 0             Blocks: 0          IO Block: 4096   regular empty file
Device: 701h/1793d    Inode: 4274770     Links: 43288
Access: (0000/----------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-03-03 22:27:46.528950099 +0300
Modify: 2017-03-04 00:17:37.071081609 +0300
Change: 2017-03-04 09:46:38.587743069 +0300
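"Too many links" here is EMLINK: the xattrop index records each entry that needs healing as a hard link to that base file, so link() fails once the backing filesystem's per-inode hard-link limit is reached. Assuming GNU stat, the backlog can be tracked through the link count of the base entry:

# %h prints the hard-link count (path from the stat output above)
stat -c '%h' /glusterfs/www/.glusterfs/indices/xattrop/00006344-01fb-43a4-89ff-ac7a3e7643ee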
My bad: I am running the rsync compare right now, so "Links: 43288" is misleading. It was "Links: 1" this morning, when everything was idle.
It looks like self-heal suddenly resumed, after being stalled for around 12 hours, right when I started the rsync compare of the two bricks for missing files.
I was not being clear enough, sorry:

1. I recreated the brick 12 hours ago and it started to fill up.
2. It stopped filling up at around 25 GB.
3. Today (after 12 hours) I started an rsync compare to see which files are missing.
4. Self-heal suddenly resumed, and now the brick is filling up again (a way to watch its progress is sketched below).
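To confirm the heal is really progressing, rather than watching disk usage, the per-brick heal counters can be polled (same assumed volume name as above):

# Re-run every 60 seconds; the count should drop as files are healed
watch -n 60 gluster volume heal www statistics heal-count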
Self-healing suddenly stopped again. Here is what I have in /var/log/glusterfs/glustershd.log:

[2017-03-04 07:48:18.938492] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-www-client-2: changing port to 49153 (from 0)
[2017-03-04 07:48:18.947048] E [socket.c:2310:socket_connect_finish] 0-www-client-2: connection to 10.0.0.61:49153 failed (Connection refused)
[2017-03-04 07:48:22.978300] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-www-client-2: changing port to 49153 (from 0)
[2017-03-04 07:48:22.984621] E [socket.c:2310:socket_connect_finish] 0-www-client-2: connection to 10.0.0.61:49153 failed (Connection refused)
[2017-03-04 07:48:26.972125] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-www-client-2: changing port to 49153 (from 0)
[2017-03-04 07:48:26.976995] E [socket.c:2310:socket_connect_finish] 0-www-client-2: connection to 10.0.0.61:49153 failed (Connection refused)
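The repeated "Connection refused" suggests the brick process on 10.0.0.61 is not listening on its advertised port, which would also explain the stalled heal. A couple of quick first checks (assuming 10.0.0.61 hosts the recreated brick):

# On any node: shows each brick's port and whether its process is online
gluster volume status www

# On 10.0.0.61: is anything actually listening on that port?
ss -ltn | grep 49153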
This bug is being closed because the 3.8 release is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue on a more current release, please open a new bug against a version that still receives bugfixes.