Bug 764409 (GLUSTER-2677) - Service is down when programs are running
Summary: Service is down when programs are running
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: GLUSTER-2677
Deadline: 2011-04-08
Product: GlusterFS
Classification: Community
Component: io-cache
Version: 3.1.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-05 22:27 UTC by YAN LUO
Modified: 2012-10-11 09:54 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-11 09:54:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
fix to race-conditions in io-cache (14.73 KB, application/octet-stream)
2011-06-02 02:55 UTC, Raghavendra G

Description YAN LUO 2011-04-05 22:27:19 UTC
Service is down when programs are running

System: Ubuntu 10.04 (Linux version 2.6.32-21-server (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010)
Mount type: Gluster Native Client

Our service went down while we were running several programs. One machine was generating thousands of files (possibly more), connected to the main server (Samba), and saving the data in gluster-vol. That job was expected to run for at least a week; a couple of days ago the service went down. We also had other sessions open doing data backup (sftp) and some computing.

When I checked the gluster-vol, I got:
$df -ah
df: `/mnt/gluster-vol': Transport endpoint is not connected

Then I checked the mount point:
$ ls -la
ls: cannot access gluster-vol: Transport endpoint is not connected
d?????????  ? ?    ?       ?                ? gluster-vol

I also checked the peers from the main server; it could not see any peers, although when I ssh to the storage nodes they can see each other.
$ sudo gluster volume info all
No volumes present
$ sudo gluster peer status
No peers present

Then I checked the status
$ sudo /etc/init.d/glusterd status
 * glusterd service is not running.

After I restarted the services, unmounted gluster-vol, and mounted it again, it works. My question is: is there any restriction on the load GlusterFS 3.1.2 can handle? For example, how many files can be accessed at the same time from gluster-vol, and how many programs can we run at the same time?

Please see the attached log file from that time for reference.

Note: someone has reported "I have been testing 3.1.2 over the last few days. My overall impression is that it resolved several bugs from 3.1.1, but the latest version is still prone to crashing under moderate to heavy loads." (http://www.opensubscriber.com/message/gluster-users@gluster.org/15035860.html). Is this the same issue?



--------Related content of log file--------
[2011-03-31 20:57:05.462233] I [afr-common.c:716:afr_lookup_done] gluster-vol-replicate-4: background  meta-data data entry self-heal triggered. path: /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.465182] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-7: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.465748] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-8: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.465788] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-6: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.466198] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-0: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.466243] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-2: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.496131] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-4: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data
[2011-03-31 20:57:05.529025] I [afr-common.c:662:afr_lookup_done] gluster-vol-replicate-8: entries are missing in lookup of /home/junzhu/Projects/Mito_Xing_03_23_11/data/read1.txt.
[2011-03-31 20:57:05.529097] I [afr-common.c:716:afr_lookup_done] gluster-vol-replicate-8: background  meta-data data entry self-heal triggered. path: /home/junzhu/Projects/Mito_Xing_03_23_11/data/read1.txt
[2011-03-31 20:57:05.535098] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-8: background  meta-data data entry self-heal completed on /home/junzhu/Projects/Mito_Xing_03_23_11/data/read1.txt
pending frames:

frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(READ)
frame : type(1) op(READ)

patchset: v3.1.1-64-gf2a067c
signal received: 6
time of crash: 2011-03-31 22:10:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.2
/lib/libc.so.6(+0x33af0)[0x7ff413899af0]
/lib/libc.so.6(gsignal+0x35)[0x7ff413899a75]
/lib/libc.so.6(abort+0x180)[0x7ff41389d5c0]
/lib/libc.so.6(__assert_fail+0xf1)[0x7ff413892941]
/lib/libpthread.so.0(pthread_mutex_lock+0x7b)[0x7ff413bf243b]
/usr/lib64/glusterfs/3.1.2/xlator/performance/io-cache.so(ioc_create_cbk+0xb0)[0x7ff4101d9540]
/usr/lib64/glusterfs/3.1.2/xlator/performance/read-ahead.so(ra_create_cbk+0x1ba)[0x7ff4103e4daa]
/usr/lib64/glusterfs/3.1.2/xlator/performance/write-behind.so(wb_create_cbk+0x10b)[0x7ff4105eee0b]
/usr/lib64/glusterfs/3.1.2/xlator/cluster/distribute.so(dht_create_cbk+0x2b8)[0x7ff410816b18]
/usr/lib64/glusterfs/3.1.2/xlator/cluster/replicate.so(afr_create_unwind+0x12b)[0x7ff410a2fc9b]
/usr/lib64/glusterfs/3.1.2/xlator/cluster/replicate.so(afr_create_wind_cbk+0x128)[0x7ff410a32d78]
/usr/lib64/glusterfs/3.1.2/xlator/protocol/client.so(client3_1_create_cbk+0x92f)[0x7ff410c939cf]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7ff41422dc15]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc9)[0x7ff41422de69]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2d)[0x7ff41422902d]
/usr/lib64/glusterfs/3.1.2/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7ff411abd344]
/usr/lib64/glusterfs/3.1.2/rpc-transport/socket.so(socket_event_handler+0xb3)[0x7ff411abd413]
/usr/lib64/libglusterfs.so.0(+0x38592)[0x7ff41446d592]
/usr/sbin/glusterfs(main+0x247)[0x4055a7]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7ff413884c4d]
/usr/sbin/glusterfs[0x4032b9]
---------
[2011-04-01 08:43:10.105965] W [io-stats.c:1644:init] gluster-vol: dangling volume. check volfile 
[2011-04-01 08:43:10.106106] W [dict.c:1205:data_to_str] dict: @data=(nil)
[2011-04-01 08:43:10.106142] W [dict.c:1205:data_to_str] dict: @data=(nil)
Given volfile:
+------------------------------------------------------------------------------+
  1: volume gluster-vol-client-0
  2:     type protocol/client
  3:     option remote-host dscbc-storage1
  4:     option remote-subvolume /mnt/gluster1
  5:     option transport-type tcp
  6: end-volume
  7: 
  8: volume gluster-vol-client-1
  9:     type protocol/client
 10:     option remote-host dscbc-storage2
 11:     option remote-subvolume /mnt/gluster1
 12:     option transport-type tcp
 13: end-volume
 14: 
 15: volume gluster-vol-client-2
 16:     type protocol/client
 17:     option remote-host dscbc-storage3
 18:     option remote-subvolume /mnt/gluster1
 19:     option transport-type tcp
 20: end-volume
 21: 
 22: volume gluster-vol-client-3
 23:     type protocol/client
 24:     option remote-host dscbc-storage4
 25:     option remote-subvolume /mnt/gluster1
 26:     option transport-type tcp
 27: end-volume
 28: 
 29: volume gluster-vol-client-4
 30:     type protocol/client
 31:     option remote-host dscbc-storage1
 32:     option remote-subvolume /mnt/gluster2
 33:     option transport-type tcp
 34: end-volume
 35: 
 36: volume gluster-vol-client-5
 37:     type protocol/client
 38:     option remote-host dscbc-storage2
 39:     option remote-subvolume /mnt/gluster2
 40:     option transport-type tcp
 41: end-volume
 42: 
 43: volume gluster-vol-client-6
 44:     type protocol/client
 45:     option remote-host dscbc-storage3
 46:     option remote-subvolume /mnt/gluster2
 47:     option transport-type tcp
 48: end-volume
 49: 
 50: volume gluster-vol-client-7
 51:     type protocol/client
 52:     option remote-host dscbc-storage4
 53:     option remote-subvolume /mnt/gluster2
 54:     option transport-type tcp
 55: end-volume
 56: 
 57: volume gluster-vol-client-8
 58:     type protocol/client
 59:     option remote-host dscbc-storage1
 60:     option remote-subvolume /mnt/gluster3
 61:     option transport-type tcp
 62: end-volume
 63: 
 64: volume gluster-vol-client-9
 65:     type protocol/client
 66:     option remote-host dscbc-storage2
 67:     option remote-subvolume /mnt/gluster3
 68:     option transport-type tcp
 69: end-volume
 70: 
 71: volume gluster-vol-client-10
 72:     type protocol/client
 73:     option remote-host dscbc-storage3
 74:     option remote-subvolume /mnt/gluster3
 75:     option transport-type tcp
 76: end-volume
 77: 
 78: volume gluster-vol-client-11
 79:     type protocol/client
 80:     option remote-host dscbc-storage4
 81:     option remote-subvolume /mnt/gluster3
 82:     option transport-type tcp
 83: end-volume
 84: 
 85: volume gluster-vol-client-12
 86:     type protocol/client
 87:     option remote-host dscbc-storage1
 88:     option remote-subvolume /mnt/gluster4
 89:     option transport-type tcp
 90: end-volume
 91: 
 92: volume gluster-vol-client-13
 93:     type protocol/client
 94:     option remote-host dscbc-storage2
 95:     option remote-subvolume /mnt/gluster4
 96:     option transport-type tcp
 97: end-volume
 98: 
 99: volume gluster-vol-client-14
100:     type protocol/client
101:     option remote-host dscbc-storage1
102:     option remote-subvolume /mnt/gluster5
103:     option transport-type tcp
104: end-volume
105: 
106: volume gluster-vol-client-15
107:     type protocol/client
108:     option remote-host dscbc-storage2
109:     option remote-subvolume /mnt/gluster5
110:     option transport-type tcp
111: end-volume
112: 
113: volume gluster-vol-client-16
114:     type protocol/client
115:     option remote-host dscbc-storage1
116:     option remote-subvolume /mnt/gluster6
117:     option transport-type tcp
118: end-volume
119: 
120: volume gluster-vol-client-17
121:     type protocol/client
122:     option remote-host dscbc-storage2
123:     option remote-subvolume /mnt/gluster6
124:     option transport-type tcp
125: end-volume
126: 
127: volume gluster-vol-replicate-0
128:     type cluster/replicate
129:     subvolumes gluster-vol-client-0 gluster-vol-client-1
130: end-volume
131: 
132: volume gluster-vol-replicate-1
133:     type cluster/replicate
134:     subvolumes gluster-vol-client-2 gluster-vol-client-3
135: end-volume
136: 
137: volume gluster-vol-replicate-2
138:     type cluster/replicate
139:     subvolumes gluster-vol-client-4 gluster-vol-client-5
140: end-volume
141: 
142: volume gluster-vol-replicate-3
143:     type cluster/replicate
144:     subvolumes gluster-vol-client-6 gluster-vol-client-7
145: end-volume
146: 
147: volume gluster-vol-replicate-4
148:     type cluster/replicate
149:     subvolumes gluster-vol-client-8 gluster-vol-client-9
150: end-volume
151: 
152: volume gluster-vol-replicate-5
153:     type cluster/replicate
154:     subvolumes gluster-vol-client-10 gluster-vol-client-11
155: end-volume
156: 
157: volume gluster-vol-replicate-6
158:     type cluster/replicate
159:     subvolumes gluster-vol-client-12 gluster-vol-client-13
160: end-volume
161: 
162: volume gluster-vol-replicate-7
163:     type cluster/replicate
164:     subvolumes gluster-vol-client-14 gluster-vol-client-15
165: end-volume
166: 
167: volume gluster-vol-replicate-8
168:     type cluster/replicate
169:     subvolumes gluster-vol-client-16 gluster-vol-client-17
170: end-volume
171: 
172: volume gluster-vol-dht
173:     type cluster/distribute
174:     subvolumes gluster-vol-replicate-0 gluster-vol-replicate-1 gluster-vol-replicate-2 gluster-vol-replicate-3 gluster-vol-replicate-4 gluster-vol-replicate-5 gluster-vol-replicate-6 gluster-vol-replicate-7 gluster-vol-replicate-8
175: end-volume
176: 
177: volume gluster-vol-write-behind
178:     type performance/write-behind
179:     subvolumes gluster-vol-dht
180: end-volume
181: 
182: volume gluster-vol-read-ahead
183:     type performance/read-ahead
184:     subvolumes gluster-vol-write-behind
185: end-volume
186: 
187: volume gluster-vol-io-cache
188:     type performance/io-cache
189:     subvolumes gluster-vol-read-ahead
190: end-volume
191: 
192: volume gluster-vol-quick-read
193:     type performance/quick-read
194:     subvolumes gluster-vol-io-cache
195: end-volume
196: 
197: volume gluster-vol-stat-prefetch
198:     type performance/stat-prefetch
199:     subvolumes gluster-vol-quick-read
200: end-volume
201: 
202: volume gluster-vol
203:     type debug/io-stats
204:     subvolumes gluster-vol-stat-prefetch
205: end-volume

+------------------------------------------------------------------------------+
[2011-04-01 08:43:14.48158] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-13: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.48607] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-17: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.49124] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.49581] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-13: Connected to 165.112.107.217:24012, attached to remote volume '/mnt/gluster4'.
[2011-04-01 08:43:14.49694] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-6: Subvolume 'gluster-vol-client-13' came back up; going online.
[2011-04-01 08:43:14.49937] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-17: Connected to 165.112.107.217:24014, attached to remote volume '/mnt/gluster6'.
[2011-04-01 08:43:14.49981] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-8: Subvolume 'gluster-vol-client-17' came back up; going online.
[2011-04-01 08:43:14.50182] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-0: Connected to 165.112.107.216:24009, attached to remote volume '/mnt/gluster1'.
[2011-04-01 08:43:14.50221] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-0: Subvolume 'gluster-vol-client-0' came back up; going online.
[2011-04-01 08:43:14.50834] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-3: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.51365] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-8: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.51719] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-3: Connected to 165.112.107.214:24009, attached to remote volume '/mnt/gluster1'.
[2011-04-01 08:43:14.51791] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-1: Subvolume 'gluster-vol-client-3' came back up; going online.
[2011-04-01 08:43:14.51970] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-10: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.52103] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-8: Connected to 165.112.107.216:24011, attached to remote volume '/mnt/gluster3'.
[2011-04-01 08:43:14.52140] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-4: Subvolume 'gluster-vol-client-8' came back up; going online.
[2011-04-01 08:43:14.52264] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-14: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.52840] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-14: Connected to 165.112.107.216:24013, attached to remote volume '/mnt/gluster5'.
[2011-04-01 08:43:14.52948] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-7: Subvolume 'gluster-vol-client-14' came back up; going online.
[2011-04-01 08:43:14.53177] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-10: Connected to 165.112.107.213:24011, attached to remote volume '/mnt/gluster3'.
[2011-04-01 08:43:14.53218] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-5: Subvolume 'gluster-vol-client-10' came back up; going online.
[2011-04-01 08:43:14.53449] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-4: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.54084] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-4: Connected to 165.112.107.216:24010, attached to remote volume '/mnt/gluster2'.
[2011-04-01 08:43:14.54147] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-2: Subvolume 'gluster-vol-client-4' came back up; going online.
[2011-04-01 08:43:14.54307] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-12: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.54487] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-16: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.54973] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-12: Connected to 165.112.107.216:24012, attached to remote volume '/mnt/gluster4'.
[2011-04-01 08:43:14.55087] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-11: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.55223] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-16: Connected to 165.112.107.216:24014, attached to remote volume '/mnt/gluster6'.
[2011-04-01 08:43:14.55797] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-2: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.56001] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-11: Connected to 165.112.107.214:24011, attached to remote volume '/mnt/gluster3'.
[2011-04-01 08:43:14.56299] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-6: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.56455] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-2: Connected to 165.112.107.213:24009, attached to remote volume '/mnt/gluster1'.
[2011-04-01 08:43:14.56732] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-7: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.56909] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-6: Connected to 165.112.107.213:24010, attached to remote volume '/mnt/gluster2'.
[2011-04-01 08:43:14.56943] I [afr-common.c:2572:afr_notify] gluster-vol-replicate-3: Subvolume 'gluster-vol-client-6' came back up; going online.
[2011-04-01 08:43:14.69629] I [fuse-bridge.c:2821:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2011-04-01 08:43:14.69745] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-7: Connected to 165.112.107.214:24010, attached to remote volume '/mnt/gluster2'.
[2011-04-01 08:43:14.69961] I [client-handshake.c:1005:select_server_supported_programs] gluster-vol-client-15: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-04-01 08:43:14.70467] I [client-handshake.c:841:client_setvolume_cbk] gluster-vol-client-15: Connected to 165.112.107.217:24013, attached to remote volume '/mnt/gluster5'.
[2011-04-01 08:43:14.70627] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-0: added root inode
[2011-04-01 08:43:14.72159] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-3: added root inode
[2011-04-01 08:43:14.72267] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-2: added root inode
[2011-04-01 08:43:14.72350] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-4: added root inode
[2011-04-01 08:43:14.72440] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-1: added root inode
[2011-04-01 08:43:14.72551] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-5: added root inode
[2011-04-01 08:43:14.72654] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-7: added root inode
[2011-04-01 08:43:14.72712] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-6: added root inode
[2011-04-01 08:43:14.72981] I [afr-common.c:819:afr_fresh_lookup_cbk] gluster-vol-replicate-8: added root inode
[2011-04-01 08:43:14.148489] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-0:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:14.149294] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-2:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:14.149411] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-4:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:14.149529] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-0: invalid argument: inode
[2011-04-01 08:43:14.149563] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-0: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005
[2011-04-01 08:43:14.150346] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-2: invalid argument: inode
[2011-04-01 08:43:14.150409] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-2: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005
[2011-04-01 08:43:14.150474] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-4: invalid argument: inode
[2011-04-01 08:43:14.150503] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-4: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005
[2011-04-01 08:43:14.725449] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-0:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005/C104.1, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:14.725637] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-2:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005/C104.1, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:14.726038] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-4:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005/C104.1, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:14.726518] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-0: invalid argument: inode
[2011-04-01 08:43:14.726582] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-0: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005/C104.1
[2011-04-01 08:43:14.726641] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-2: invalid argument: inode
[2011-04-01 08:43:14.726670] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-2: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005/C104.1
[2011-04-01 08:43:14.727285] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-4: invalid argument: inode
[2011-04-01 08:43:14.727351] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-4: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005/C104.1
[2011-04-01 08:43:15.50797] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-2:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:15.51193] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-4:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:15.51713] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-0:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:15.52627] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-2: invalid argument: inode
[2011-04-01 08:43:15.52690] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-2: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005
[2011-04-01 08:43:15.52753] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-4: invalid argument: inode
[2011-04-01 08:43:15.52781] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-4: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005
[2011-04-01 08:43:15.53061] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-0: invalid argument: inode
[2011-04-01 08:43:15.53123] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-0: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005
[2011-04-01 08:43:15.490907] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-0:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005/C104.1, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:15.491743] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-2:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005/C104.1, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:15.492202] I [afr-dir-read.c:171:afr_examine_dir_readdir_cbk] gluster-vol-replicate-4:  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005/C104.1, reason: checksums of directory differ, forced merge option set
[2011-04-01 08:43:15.492399] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-0: invalid argument: inode
[2011-04-01 08:43:15.492434] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-0: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005/C104.1
[2011-04-01 08:43:15.492789] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-2: invalid argument: inode
[2011-04-01 08:43:15.492850] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-2: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005/C104.1
[2011-04-01 08:43:15.493051] E [afr-common.c:110:afr_set_split_brain] gluster-vol-replicate-4: invalid argument: inode
[2011-04-01 08:43:15.493105] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] gluster-vol-replicate-4: background  entry self-heal completed on /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Data/Intensities/L005/C104.1
[2011-04-01 08:43:15.553051] I [afr-common.c:716:afr_lookup_done] gluster-vol-replicate-0: background  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005
[2011-04-01 08:43:15.553465] I [afr-common.c:716:afr_lookup_done] gluster-vol-replicate-4: background  entry self-heal triggered. path: /gluster/data/hiseq/110328_SN183_0222_BC008HABXX/Thumbnail_Images/L005
[2011-04-01 08:43:15.553640] I [afr-common.c:716:afr_lookup_done] gluster-vol-replicate-2: background  entry self-heal triggered. path: /

Comment 1 Raghavendra G 2011-06-02 02:54:05 UTC
Hi,

Can you apply this patch, rebuild, and check whether it fixes your issue? You can apply the patch with:

# cd <glusterfs-src>
# patch -p1 < 0001-performance-io-cache-hold-lock-on-ioc_inode-whereve.patch
# (make && sudo make install) > /dev/null

Or, if you have a git repository, you can simply do:
# cd <glusterfs-src>
# git am 0001-performance-io-cache-hold-lock-on-ioc_inode-whereve.patch
# (make && sudo make install) > /dev/null

regards,        
Raghavendra.

Comment 2 Raghavendra G 2011-06-02 02:55:05 UTC
Created attachment 504

Comment 3 Amar Tumballi 2011-09-28 04:14:46 UTC
Hi Yan,

Can you try with the latest version of GlusterFS and let us know if the issue persists? A few race-condition fixes have made it into the latest 3.2.x and 3.1.x releases.

Regards,

Comment 4 Amar Tumballi 2012-10-11 09:54:18 UTC
No update for the last one year, hence closing the bug. Please reopen if seen again.

