Bug 1428561 - gluster creates empty files on replicated brick
Summary: gluster creates empty files on replicated brick
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.8
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-02 20:09 UTC by Arkadiy Night
Modified: 2017-11-07 10:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:36:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Arkadiy Night 2017-03-02 20:09:55 UTC
Version-Release number of selected component (if applicable):
3.8.9

Description of problem:

I created a single-brick volume and copied all the files into it (29 GB).
I then ran add-brick with replica 2 to add a brick on a second node, and strange things started to happen: the replicated brick is smaller (23 GB) than the original (29 GB).

I ran rsync to compare the directories and found that some files on the replica are empty (0 bytes) and have the wrong ownership (root:root instead of www:www).

original block:
root@haproxy-01 /glusterfs/www $  ls -l archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg
-rw-r--r-- 2 www www 7897 Feb 29  2008 archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg

replicated block:
root@haproxy-02 /glusterfs/www $  ls -l archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg
-rw-r--r-- 2 root root 0 Mar  1 15:44 archive/nightparty/wwwold/images/movies/temp/989.121-15x10.jpg
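
For anyone who wants to repeat this comparison without rsync, here is a minimal Python sketch. It assumes both brick directories are readable from one machine (for example, the replica exported or copied over), which is not how the bricks above are laid out. The function name `compare_bricks` and the problem labels are hypothetical. It only compares size and owner, not file contents.

```python
import os
import stat


def compare_bricks(original_root, replica_root):
    """Yield (relative_path, problem) for regular files that differ on the replica."""
    for dirpath, dirnames, filenames in os.walk(original_root):
        # Skip the brick's internal .glusterfs metadata directory at the top level.
        if dirpath == original_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            src = os.path.join(dirpath, name)
            src_st = os.lstat(src)
            if not stat.S_ISREG(src_st.st_mode):
                continue  # only regular files are compared
            rel = os.path.relpath(src, original_root)
            try:
                dst_st = os.lstat(os.path.join(replica_root, rel))
            except FileNotFoundError:
                yield rel, "missing on replica"
                continue
            if src_st.st_size > 0 and dst_st.st_size == 0:
                yield rel, "empty on replica"
            elif src_st.st_size != dst_st.st_size:
                yield rel, "size differs"
            if (src_st.st_uid, src_st.st_gid) != (dst_st.st_uid, dst_st.st_gid):
                yield rel, "owner differs"
```

Calling `sorted(compare_bricks("/path/to/original", "/path/to/replica"))` gives a list that can be diffed between runs to see whether the replica is still filling up.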

Comment 1 Arkadiy Night 2017-03-02 20:10:56 UTC
Correction: I meant "brick", not "block".

Comment 2 Arkadiy Night 2017-03-04 05:39:35 UTC
I upgraded to 3.10.0 and recreated the replicated brick.

Now the replica holds 25 GB out of 29 GB, and according to diff, some files are missing on the replicated brick.

Comment 3 Arkadiy Night 2017-03-04 06:47:35 UTC
Last lines in /var/log/glusterfs/bricks/glusterfs-www.log on the original brick:


[2017-03-03 21:17:13.622882] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/46b0994d-d3c1-4970-8958-de42b70541d9: Not able to add to index [Too many links]
[2017-03-03 21:17:13.632007] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/1c39f3c4-ed13-45bd-a9b9-15354ebe448a: Not able to add to index [Too many links]
[2017-03-03 21:17:13.640919] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/0ca82d30-f7d6-4478-abe7-8d5b2bb99c03: Not able to add to index [Too many links]
[2017-03-03 21:17:37.074396] E [MSGID: 138003] [index.c:610:index_link_to_base] 0-www-index: /glusterfs/www/.glusterfs/indices/xattrop/1975672d-7c29-4ec5-8f04-230c955e5ec6: Not able to add to index [Too many links]
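
To see how many of these errors a brick log contains and which index entries they refer to, a small Python sketch like the following could be used. The regular expressions are modelled only on the four lines quoted above, so treat the assumed line layout as a guess rather than a specification of the log format. The name `summarize_index_errors` is hypothetical.

```python
import re
from collections import Counter

# Layout assumed from the lines above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] translator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[MSGID: (?P<msgid>\d+)\] "
    r"\[(?P<src>[^\]]+)\] (?P<xlator>\S+): (?P<msg>.*)$"
)
ENTRY_RE = re.compile(r"/xattrop/([0-9a-f-]{36}):")
REASON_RE = re.compile(r"\[([^\]]+)\]\s*$")


def summarize_index_errors(lines):
    """Return (Counter of trailing error reasons, list of xattrop entry names)."""
    reasons = Counter()
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # not an error line in the assumed layout
        msg = m.group("msg")
        entry = ENTRY_RE.search(msg)
        if entry:
            entries.append(entry.group(1))
        reason = REASON_RE.search(msg)
        if reason:
            reasons[reason.group(1)] += 1
    return reasons, entries
```

Feeding it the lines of /var/log/glusterfs/bricks/glusterfs-www.log would show whether "Too many links" is the only failure reason and how many distinct entries were affected.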


Stat of the only base entry in /glusterfs/www/.glusterfs/indices/xattrop:

root@haproxy-01 /glusterfs/www/.glusterfs/indices/xattrop $  stat 00006344-01fb-43a4-89ff-ac7a3e7643ee
  File: «00006344-01fb-43a4-89ff-ac7a3e7643ee»
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: 701h/1793d	Inode: 4274770     Links: 43288
Access: (0000/----------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-03-03 22:27:46.528950099 +0300
Modify: 2017-03-04 00:17:37.071081609 +0300
Change: 2017-03-04 09:46:38.587743069 +0300
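
The "Links" value in the stat output is the file's hard-link count, and "Too many links" is the message for the EMLINK error, which a filesystem returns when a file cannot take another hard link. The following Python sketch illustrates both under those assumptions. The helper names `link_count` and `try_add_link` are hypothetical, and the actual limit depends on the filesystem backing the brick, which is not stated in this report.

```python
import errno
import os


def link_count(path):
    """Return the number of hard links to path (the 'Links' field of stat)."""
    return os.lstat(path).st_nlink


def try_add_link(base, new_name):
    """Hard-link new_name to base; return False if the link limit is reached."""
    try:
        os.link(base, new_name)
        return True
    except OSError as exc:
        if exc.errno == errno.EMLINK:  # "Too many links"
            return False
        raise
```

Polling `link_count()` on the base entry while a heal is running would show whether the count approaches the filesystem's limit around the time the errors are logged.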

Comment 4 Arkadiy Night 2017-03-04 06:48:44 UTC
My bad. I am running an rsync comparison right now, so "Links: 43288" is misleading. It was "Links: 1" this morning, when everything was idle.

Comment 5 Arkadiy Night 2017-03-04 07:07:31 UTC
It looks like self-heal resumed after being stalled for around 12 hours, right when I started an rsync comparison of both bricks to look for missing files.

Comment 6 Arkadiy Night 2017-03-04 07:09:07 UTC
Sorry, I was not clear enough. To summarize:

1. I recreated the brick 12 hours ago and it started to fill up.
2. It stopped filling up at around 25 GB.
3. Today (12 hours later) I ran an rsync comparison to see which files were missing.
4. Self-heal suddenly resumed, and now the brick is filling up again.

Comment 7 Arkadiy Night 2017-03-04 07:49:23 UTC
Self-healing suddenly stopped again.

Here is what I have in /var/log/glusterfs/glustershd.log:

[2017-03-04 07:48:18.938492] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-www-client-2: changing port to 49153 (from 0)
[2017-03-04 07:48:18.947048] E [socket.c:2310:socket_connect_finish] 0-www-client-2: connection to 10.0.0.61:49153 failed (Connection refused)
[2017-03-04 07:48:22.978300] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-www-client-2: changing port to 49153 (from 0)
[2017-03-04 07:48:22.984621] E [socket.c:2310:socket_connect_finish] 0-www-client-2: connection to 10.0.0.61:49153 failed (Connection refused)
[2017-03-04 07:48:26.972125] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-www-client-2: changing port to 49153 (from 0)
[2017-03-04 07:48:26.976995] E [socket.c:2310:socket_connect_finish] 0-www-client-2: connection to 10.0.0.61:49153 failed (Connection refused)
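
"Connection refused" usually means nothing is listening on the target port. A quick way to check this independently of the self-heal daemon is a plain TCP connect, sketched here in Python. The helper name `brick_port_open` is hypothetical. The host and port in the usage note are simply the values from the log above.

```python
import socket


def brick_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `brick_port_open("10.0.0.61", 49153)` returning False while healing is stalled would suggest that the brick process on that node is not running or is listening on a different port.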

Comment 8 Niels de Vos 2017-11-07 10:36:18 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

