Bug 762609 (GLUSTER-877)

Summary: Out of memory, self heal troubles, random errors and crashes with 3.0.3
Product: [Community] GlusterFS
Reporter: Dushyanth Harinath <dushyanth.h>
Component: replicate
Assignee: Pavan Vilas Sondur <pavan>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: 3.0.3
CC: amarts, gluster-bugs
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Attachments:
* dovecot-plain.conf
* 1-lsof-gluster6@directi.com.txt
* 2-lsof-gluster6@directi.com.txt
* lsof-files-open-now.txt
* postal binary
* dovecot-2.0.beta5-1.x86_64.rpm
* logs.tar.gz.bz2

Description Dushyanth Harinath 2010-04-30 16:13:40 UTC
Hey guys,

We are using gluster replicate on two nodes. The gluster client and server both run on the same nodes. Each server has four disks: two SATA and two SSD.

The environment and config are as mentioned in Bug 727 (http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=727).

Below is a brief summary of what happened. Eventually we had to stop gluster and use the disks directly, as the dovecot LMTP delivery processes were hanging during index rebuilds and similar operations. The dovecot LMTP delivery process uses flock.
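For reference, a minimal way to exercise the same flock path on a gluster mount from the shell, using the flock(1) utility. This is an illustrative sketch only; the path is a placeholder, not part of our production setup:

# Hold an exclusive lock on a scratch file inside the mount for 30 seconds ...
touch /gluster/home61/locktest
flock -x /gluster/home61/locktest -c 'sleep 30' &
# ... then try to take the same lock with a 5 second timeout; a timeout here
# means lock requests are not completing through the client.
flock -x -w 5 /gluster/home61/locktest -c 'true' || echo "flock blocked/timed out"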

We are preparing to reproduce this behavior locally, but I think there are some serious issues here particular to our type of usage. Let me know if any particular details are required.

The memory used by each gluster client on node1 was as below

# node1

top - 06:54:02 up 23 days, 19:10,  9 users,  load average: 0.82, 1.18, 1.30
Tasks: 295 total,   2 running, 293 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  1.1%sy,  0.0%ni, 97.7%id,  0.2%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:  12317460k total, 12203364k used,   114096k free,  1475964k buffers
Swap:  4192956k total,   277832k used,  3915124k free,  2664444k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3907 root      20   0 1108m 915m 1040 R 107.1  7.6   1979:13 /usr/sbin/glusterf
27474 root      20   0 3300m 3.0g 1292 S 19.5 25.6 492:38.71 /usr/sbin/glusterfs
27539 root      20   0  157m 1272 1052 S  0.0  0.0   0:01.98 /usr/sbin/glusterfs
27620 root      20   0 2641m 2.4g 1252 S  0.0 20.3  46:05.56 /usr/sbin/glusterfs
27685 root      20   0  223m 1724 1232 S  0.0  0.0   0:13.22 /usr/sbin/glusterfs

1. We had Linux memory overcommit turned off (vm.overcommit_memory = 2), and the io-cache size allocated per gluster client was 2.1 GB. We started seeing OOMs on other apps and on gluster because all the gluster clients used up large amounts of memory. (The overcommit sysctls involved are sketched after the log excerpt below.)

[2010-04-26 05:41:16] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 254391963: GETATTR 145184592 (fuse_loc_fill() failed)
[2010-04-26 05:41:36] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 254434221: GETATTR 145184592 (fuse_loc_fill() failed)
[2010-04-26 06:54:35] E [fuse-bridge.c:3107:fuse_thread_proc] fuse: Out of memory
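The overcommit settings referred to above and in point 2, expressed as standard sysctl commands. This is a sketch using the mainline kernel knob name (vm.overcommit_memory), not a transcript of what was run:

# Strict overcommit accounting, as it was configured during the incident:
sysctl -w vm.overcommit_memory=2
# Kernel default (heuristic overcommit), restored afterwards:
sysctl -w vm.overcommit_memory=0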

2. After this, we restored Linux memory overcommit to its default (vm.overcommit_memory = 0) on both nodes and stopped all applications accessing the gluster mountpoints. On mounting the gluster mountpoints on node1 (the node which had the OOM issues), we saw the errors below:

+------------------------------------------------------------------------------+
[2010-04-26 07:19:46] N [glusterfsd.c:1396:main] glusterfs: Successfully started
[2010-04-26 07:19:46] N [client-protocol.c:6246:client_setvolume_cbk] node1-ssd1-brick: Connected to 172.16.1.12:6996, attached to remote volume 'ssd1-brick'.
[2010-04-26 07:19:46] N [afr.c:2627:notify] ssd1-mirror-1: Subvolume 'node1-ssd1-brick' came back up; going online.
[2010-04-26 07:19:46] N [client-protocol.c:6246:client_setvolume_cbk] node1-ssd1-brick: Connected to 172.16.1.12:6996, attached to remote volume 'ssd1-brick'.
[2010-04-26 07:19:46] N [afr.c:2627:notify] ssd1-mirror-1: Subvolume 'node1-ssd1-brick' came back up; going online.
[2010-04-26 07:19:46] N [fuse-bridge.c:2942:fuse_init] glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2010-04-26 07:19:46] N [client-protocol.c:6246:client_setvolume_cbk] node2-ssd1-brick: Connected to 172.16.1.13:6996, attached to remote volume 'ssd1-brick'.
[2010-04-26 07:19:46] N [client-protocol.c:6246:client_setvolume_cbk] node2-ssd1-brick: Connected to 172.16.1.13:6996, attached to remote volume 'ssd1-brick'.
[2010-04-26 07:21:58] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 3: LOOKUP() / => -1 (Stale NFS file handle)
[2010-04-26 07:22:11] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 4: LOOKUP() / => -1 (Stale NFS file handle)
[2010-04-26 07:22:33] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 6: LOOKUP() / => -1 (Stale NFS file handle)
[2010-04-26 07:23:20] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 8: LOOKUP() / => -1 (Stale NFS file handle)
[2010-04-26 07:23:20] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 9: LOOKUP() / => -1 (Stale NFS file handle)
[2010-04-26 07:23:20] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 10: LOOKUP() / => -1 (Stale NFS file handle)
[2010-04-26 07:25:51] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 14: LOOKUP() / => -1 (Stale NFS file handle)

3. We had to reboot node1, start gluster, mount all the gluster clients, and start our services.

4. After a few hours, the node1 mountpoints were unmounted and remounted (not sure why):

# node1
[2010-04-26 11:25:27] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 10:55:26. frame-timeout = 1800
[2010-04-26 11:25:27] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 10:55:26. frame-timeout = 1800
[2010-04-26 11:25:27] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 10:55:26. frame-timeout = 1800
[2010-04-26 11:25:37] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 10:55:33. frame-timeout = 1800
[2010-04-26 11:55:29] E [client-protocol.c:313:call_bail] node2-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 11:25:27. frame-timeout = 1800
[2010-04-26 11:55:29] W [fuse-bridge.c:2845:fuse_setlk_cbk] glusterfs-fuse: 22694143: ERR => -1 (Transport endpoint is not connected)
[2010-04-26 11:55:29] E [client-protocol.c:313:call_bail] node2-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 11:25:27. frame-timeout = 1800
[2010-04-26 11:55:29] W [fuse-bridge.c:2845:fuse_setlk_cbk] glusterfs-fuse: 22694838: ERR => -1 (Transport endpoint is not connected)

# node2

[2010-04-26 11:35:30] E [client-protocol.c:313:call_bail] node2-ssd2-brick: bailing out frame LK(26) frame sent = 2010-04-26 11:05:29. frame-timeout = 1800
[2010-04-26 11:35:30] E [client-protocol.c:313:call_bail] node2-ssd2-brick: bailing out frame LK(26) frame sent = 2010-04-26 11:05:29. frame-timeout = 1800
[2010-04-26 11:35:50] E [client-protocol.c:313:call_bail] node2-ssd2-brick: bailing out frame LK(26) frame sent = 2010-04-26 11:05:43. frame-timeout = 1800
[2010-04-26 11:35:50] E [client-protocol.c:313:call_bail] node2-ssd2-brick: bailing out frame LK(26) frame sent = 2010-04-26 11:05:43. frame-timeout = 1800
[2010-04-26 12:04:24] E [saved-frames.c:165:saved_frames_unwind] node1-ssd2-brick: forced unwinding frame type(1) op(LK)
[2010-04-26 12:04:24] W [fuse-bridge.c:2845:fuse_setlk_cbk] glusterfs-fuse: 24591680: ERR => -1 (Transport endpoint is not connected)
[2010-04-26 12:04:24] E [saved-frames.c:165:saved_frames_unwind] node1-ssd2-brick: forced unwinding frame type(1) op(LK)
[2010-04-26 12:04:24] W [fuse-bridge.c:2845:fuse_setlk_cbk] glusterfs-fuse: 24590767: ERR => -1 (Transport endpoint is not connected)

5. Meanwhile, node2 had errors such as the following:

[2010-04-26 12:04:24] E [saved-frames.c:165:saved_frames_unwind] node1-ssd2-brick: forced unwinding frame type(1) op(LK)
[2010-04-26 12:04:24] W [fuse-bridge.c:2845:fuse_setlk_cbk] glusterfs-fuse: 27800369: ERR => -1 (Transport endpoint is not connected)
[2010-04-26 12:04:24] E [saved-frames.c:165:saved_frames_unwind] node1-ssd2-brick: forced unwinding frame type(1) op(FSTAT)
[2010-04-26 12:04:24] E [saved-frames.c:165:saved_frames_unwind] node1-ssd2-brick: forced unwinding frame type(1) op(LK)
[2010-04-26 12:04:24] N [client-protocol.c:6994:notify] node1-ssd2-brick: disconnected
[2010-04-26 12:04:24] E [socket.c:762:socket_connect_finish] node1-ssd2-brick: connection to 172.16.2.12:6997 failed (Connection refused)
[2010-04-26 12:04:24] W [client-protocol.c:6518:protocol_client_interpret] node2-ssd2-brick: no frame for callid=20493699 type=4 op=26
[2010-04-26 12:04:24] W [client-protocol.c:6518:protocol_client_interpret] node2-ssd2-brick: no frame for callid=20494639 type=4 op=26
[2010-04-26 12:04:24] W [client-protocol.c:6518:protocol_client_interpret] node2-ssd2-brick: no frame for callid=20810240 type=4 op=26
[2010-04-26 12:04:27] E [socket.c:762:socket_connect_finish] node1-ssd2-brick: connection to 172.16.2.12:6997 failed (Connection refused)
[2010-04-26 12:06:54] N [client-protocol.c:6246:client_setvolume_cbk] node1-ssd2-brick: Connected to 172.16.2.12:6997, attached to remote volume 'ssd2-brick'.
[2010-04-26 12:06:55] N [client-protocol.c:6246:client_setvolume_cbk] node1-ssd2-brick: Connected to 172.16.2.12:6997, attached to remote volume 'ssd2-brick'.
[2010-04-26 12:06:57] E [afr-self-heal-common.c:1214:sh_missing_entries_create] ssd2-mirror-1: no missing files - /n/ns/xxx.com/j/ja/jalgaon/189385/dbox/mailboxes/Drafts/dbox-Mails/dovecot.index.log. proceeding to metadata check
[2010-04-26 12:07:37] E [inode.c:687:__inode_link] fuse/inode: inode_link called with stbuf->st_ino = 1. inode=0/0 parent=0/1 name=.landfill
[2010-04-26 12:07:43] W [fuse-bridge.c:491:fuse_entry_cbk] glusterfs-fuse: LOOKUP(/m/ma/xxxx.net/a/ar/articles/190707/dbox/mailboxes/INBOX/dbox-Mails/dovecot.index.log.2) inode (ptr=0x7f39943d3130, ino=13438078, gen=5455527146544610103) found  conflict (ptr=0x7f3a4c1d87a0, ino=13438078, gen=5455527146544610103)

6. Soon after this, the node1 client process crashed:

[2010-04-26 13:34:50] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 13:04:40. frame-timeout = 1800
[2010-04-26 13:34:50] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 13:04:40. frame-timeout = 1800
[2010-04-26 13:34:50] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 13:04:40. frame-timeout = 1800
[2010-04-26 13:34:50] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 13:04:40. frame-timeout = 1800
[2010-04-26 13:34:50] E [client-protocol.c:313:call_bail] node1-ssd1-brick: bailing out frame LK(26) frame sent = 2010-04-26 13:04:40. frame-timeout = 1800
[2010-04-26 13:34:50] W [client-protocol.c:6518:protocol_client_interpret] node1-ssd1-brick: no frame for callid=5006701 type=4 op=26
[2010-04-26 13:34:50] W [client-protocol.c:6518:protocol_client_interpret] node1-ssd1-brick: no frame for callid=1232876 type=4 op=26
[2010-04-26 13:34:50] W [fuse-bridge.c:491:fuse_entry_cbk] glusterfs-fuse: LOOKUP(/n/ni/abc.com/a/ad/admin/203133/dbox/storage/m.4) inode (ptr=0x7f3c6113f7e0, ino=3048950, gen=5464416509071740469) found conflict (ptr=0x7f3c68f33e80, ino=3048950, gen=5464416509071740469)
pending frames:
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(STAT)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(1) op(FSTAT)
frame : type(1) op(FSTAT)
frame : type(1) op(LK)
frame : type(1) op(LK)
frame : type(1) op(LK)
....
patchset: v3.0.2-41-g029062c
signal received: 11
time of crash: 2010-04-26 13:34:50
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.3
/lib64/libc.so.6[0x31da2302d0]
/lib64/libc.so.6(index+0x50)[0x31da278fe0]
/lib64/libc.so.6(strrchr+0x2e)[0x31da279ade]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_up_down_flush_post_post_op+0x4d)[0x7f3c6e8aa20d]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_changelog_post_op_cbk+0x88)[0x7f3c6e8ac008]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_fxattrop_cbk+0x117)[0x7f3c6ead8497]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7f3c6ead324a]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(notify+0x212)[0x7f3c6eada6c2]
/usr/lib64/libglusterfs.so.0(xlator_notify+0x43)[0x3db90133e3]
/usr/lib64/glusterfs/3.0.3/transport/socket.so(socket_event_handler+0xd3)[0x7f3c6dbd2173]
/usr/lib64/libglusterfs.so.0[0x3db902d045]
/usr/sbin/glusterfs(main+0xa28)[0x404268]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x31da21d994]
/usr/sbin/glusterfs[0x402749]


and node2 came up with the errors below:

[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030253: GETATTR 139888234561760 (fuse_loc_fill() failed)
[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030254: GETATTR 57618304 (fuse_loc_fill() failed)
[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030255: GETATTR 139888278866464 (fuse_loc_fill() failed)
[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030256: GETATTR 139885275510352 (fuse_loc_fill() failed)
[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030257: GETATTR 139885275456496 (fuse_loc_fill() failed)
[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030258: GETATTR 139888400472512 (fuse_loc_fill() failed)
[2010-04-26 13:01:27] W [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 36030259: GETATTR 58986832 (fuse_loc_fill() failed)
[2010-04-26 13:04:40] W [glusterfsd.c:953:cleanup_and_exit] glusterfs: shutting down
[2010-04-26 13:04:40] N [fuse-bridge.c:3532:fini] fuse: Unmounting '/gluster//home61'.

7. During all those self-heal activities, dovecot LMTP delivery processes were getting stuck on the folder where the conflict was found. I am not sure how a conflict can occur, as node1's gluster ssd1/sata1 mountpoints are accessed only on node1 and node2's ssd2/sata2 mountpoints are accessed only on node2. The copy on the other node exists only for failover, as we are still not very sure about the behaviour in an active/active scenario.

8. We then tried upgrading to 3.0.4 and setting the favorite-child option in each of the gluster vol files, thinking that gluster would resolve the conflicts by replicating the latest copy of the conflicting files from the favorite child. We again saw the same errors, and dovecot's LMTP process hangs when it tries to rebuild indexes on the files that have conflicts. Eventually we had to stop using gluster, as this is a production system.
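For reference, a sketch of the kind of replicate volume definition this change amounts to. The volume and subvolume names are taken from the logs above; the exact lines are illustrative rather than a copy of our production volfile, and the fragment is written to a scratch file so nothing live is touched:

cat > /tmp/replicate-favorite-child.vol.example <<'EOF'
volume ssd1-mirror-1
  type cluster/replicate
  # Prefer this copy when replicate finds conflicting versions of a file:
  option favorite-child node1-ssd1-brick
  subvolumes node1-ssd1-brick node2-ssd1-brick
end-volume
EOF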

Comment 1 Pavan Vilas Sondur 2010-05-07 02:35:50 UTC
Thanks for the detailed information you have provided. You seem to be running into multiple problems:

* The crash you've reported has been seen before and the fix has been sent for review. It should be in the next 3.0.x release. See http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=803

* About the memory usage: can you run glusterfs without io-cache and check the memory usage with the same use case? (A sketch of the changes requested in this comment follows the list below.)

* I just need a little more information on dovecot getting stuck trying to access the folder during self-heal:
  - It looks like lock requests bailed out while one or more servers went down and came back up. Is it possible to tell us when the server went down, whether self-heal completed, and so on?
  - Does this happen readily? Does dovecot hang each time a 'down' server is brought back up and self-heal is in play?
  - Can you post the whole client/server logs, run at the 'TRACE' log level?
  - There is an option to the locks translator, 'option trace on'. Please turn it on and post the log files when the issue is hit.

* Also, can you let us know briefly how you have set up dovecot with glusterfs, so that we can try to reproduce the problems in-house.
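A sketch of the changes requested above, with option and flag names as documented for glusterfs 3.0.x; please verify them against the installed version, and treat the file paths and volume names as placeholders:

# 1) Client volfile: remove the performance/io-cache volume block and point its
#    parent volume's "subvolumes" line at io-cache's former child instead.
#
# 2) Mount the client at TRACE log level:
glusterfs --log-level=TRACE -f /etc/glusterfs/glusterfs-home61.vol /gluster/home61
#
# 3) Server volfile: enable tracing in the locks translator (written to a
#    scratch file here; fold the option into the real locks volume block):
cat > /tmp/locks-trace.vol.example <<'EOF'
volume home61-brick-locks
  type features/locks
  option trace on          # log every lock request, grant and release
  subvolumes home61-brick-posix
end-volume
EOF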

Comment 2 Dushyanth Harinath 2010-05-10 06:17:43 UTC
Running the additional tests you have requested; will keep you posted. Thanks.

Comment 3 Dushyanth Harinath 2010-05-10 11:56:50 UTC
I was able to reproduce this issue with dovecot. Here are the details, plus the complete log files in TRACE mode and with the trace option enabled in the locks translator.

1. We have two email accounts with their home directories set to folders under the glusterfs mountpoints. mail1 and mail2 are the two nodes, and mail2 is passive.

/etc/glusterfs/glusterfs-home51.vol on /gluster/home51 type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
/etc/glusterfs/glusterfs-home61.vol on /gluster/home61 type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
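For reference, how a client mount like the two above is created with 3.0.x; both forms below should be equivalent, and the flags shown are illustrative rather than the exact commands used here:

# Mount directly with the glusterfs client binary ...
glusterfs -f /etc/glusterfs/glusterfs-home61.vol /gluster/home61
# ... or via mount(8) and the mount.glusterfs helper:
mount -t glusterfs /etc/glusterfs/glusterfs-home61.vol /gluster/home61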

2. First we tested by sending a few mails manually to these accounts to make sure gluster works fine. I then ran a postal test to send continuous mail to these accounts. The postal command line was `postal -p LMTP_PORT -t 1 -c 1 MAIL1_IP email.txt`. The file email.txt contains the following two rows:

gluster5
gluster6

With both gluster servers running, I wasn't seeing any errors. After a while, we unmounted the gluster disks and stopped the gluster server on mail2. Note that mail is delivered to mail1 only; mail2 is passive.

[2010-05-10 16:38:56] W [glusterfsd.c:965:cleanup_and_exit] glusterfs: shutting down

3. We then changed the postal command line to the following, to attempt multiple mail deliveries at the same time: `postal -p LMTP_PORT -t 10 -c 1 MAIL1_IP email.txt`. This invokes 10 postal threads and delivers one email per connection. The test continues until it is killed.

4. We immediately started seeing the errors below in the dovecot logs:

May 10 16:50:19 mail dovecot: lmtp(30603, gluster6): Error: Corrupted transaction log file /gluster/home61/gluster6/dbox/mailboxes/INBOX/dbox-Mails/dovecot.index.log seq 8: Unexpected garbage at EOF (sync_offset=99760)
May 10 16:50:19 mail dovecot: lmtp(30603, gluster6): Warning: fscking index file /gluster/home61/gluster6/dbox/mailboxes/INBOX/dbox-Mails/dovecot.index
May 10 16:50:19 mail dovecot: lmtp(30596, gluster6): Warning: mdbox /gluster/home61/gluster6/dbox/storage: Inconsistency in map index (8,37032 != 8,37144)
May 10 16:50:19 mail dovecot: lmtp(30596, gluster6): Warning: mdbox /gluster/home61/gluster6/dbox/storage: rebuilding indexes
May 10 16:50:19 mail dovecot: lmtp(30598, gluster6): Error: Transaction log file /gluster/home61/gluster6/dbox/mailboxes/INBOX/dbox-Mails/dovecot.index.log: marked corrupted
May 10 16:50:19 mail dovecot: lmtp(24892, gluster6): Error: open() failed with file /gluster/home61/gluster6/dbox/mailboxes/INBOX/dbox-Mails/dovecot.index.log: Transport endpoint is not connected
May 10 16:50:19 mail dovecot: lmtp(24892, gluster6): 6qweBiDp50s8YQAATEXDjg: msgid=<4BE7EBEE.1FB.002.gluster5:directi.com.pw> from=<gluster5@SYNTAX_ERROR> size=9465: save failed to INBOX: Internal error occurred. Refer to server log for more information. [2010-05-10 16:50:14]
May 10 16:50:19 mail dovecot: lmtp(24892, gluster6): Error: BUG: Saving failed to unknown storage

5. At this time, the errors on mail1 were as below. The LMTP processes which were delivering mail to this home directory got stuck after the first error encountered above; refer to the attached lsof file.

[2010-05-10 16:40:02] T [socket.c:176:__socket_disconnect] mail2-home61-brick: shutdown() returned -1. set connection state to -1
[2010-05-10 16:40:02] T [client-protocol.c:6374:protocol_client_cleanup] mail2-home61-brick: cleaning up state in transport object 0x6151f0
[2010-05-10 16:40:02] T [socket.c:91:__socket_rwv] mail2-home61-brick: EOF from peer 172.16.2.206:6997
[2010-05-10 16:40:02] T [socket.c:580:__socket_proto_state_machine] mail2-home61-brick: read (Transport endpoint is not connected) in state 1 (172.16.2.206:6997)
[2010-05-10 16:40:02] T [client-protocol.c:6374:protocol_client_cleanup] mail2-home61-brick: cleaning up state in transport object 0x6151f0
[2010-05-10 16:40:02] T [fuse-bridge.c:702:fuse_attr_cbk] glusterfs-fuse: 399891: LOOKUP() / => 1
[2010-05-10 16:40:02] T [fuse-bridge.c:600:fuse_lookup] glusterfs-fuse: 399892: LOOKUP /gluster6(407699638)
[2010-05-10 16:40:02] T [fuse-bridge.c:468:fuse_entry_cbk] glusterfs-fuse: 399892: LOOKUP() /gluster6 => 407699638 (407699638)
[2010-05-10 16:40:02] T [fuse-bridge.c:600:fuse_lookup] glusterfs-fuse: 399893: LOOKUP /gluster6/dbox(407699640)
[2010-05-10 16:40:02] T [afr-self-heal-common.c:1557:afr_self_heal] home61-mirror-1: performing self heal on /gluster6/dbox (metadata=0 data=0 entry=1)

Why is it doing self-heal when the remote volume is still down? Glusterfsd was started on mail2 at the time below:

[2010-05-10 17:08:35] N [glusterfsd.c:1408:main] glusterfs: Successfully started

7. The self-heal also seems to be taking a lot of time.

# On mail1 - the number of mails delivered to these mailboxes

# grep ^Received /gluster/home51/gluster5/dbox/storage/m.* | wc -l ; grep ^Received /gluster/home61/gluster6/dbox/storage/m.* | wc -l
3831
3653

# On mail2 - the number of mails delivered to these mailboxes

# grep ^Received /gluster/home51/gluster5/dbox/storage/m.* | wc -l ; grep ^Received /gluster/home61/gluster6/dbox/storage/m.* | wc -l
3831
3277

When glusterfsd was started on mail2, the gluster6 mailbox on mail1 had about 2.6K mails. It's been nearly 2 hours since then, and it seems to be catching up very slowly.

The files open for the gluster6 account as of now (2 hours after the mail2 volumes came up) are also attached, in lsof-files-open-now.txt.
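One way to make the catch-up proactive instead of waiting for accesses is to force a lookup on every file so replicate self-heals it, the approach usually documented for pre-3.3 releases. A sketch, using this setup's mountpoint:

# Walk the whole mount and stat every entry to trigger per-file self-heal:
find /gluster/home61 -noleaf -print0 | xargs --null stat > /dev/null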

I am attaching the files below:

* logs.tar.gz contains the full logs in TRACE mode from both nodes
* lsof outputs
* Dovecot configuration file to replicate this issue in-house
* Dovecot RPM
* Postal binary - works under CentOS 5.x, 64-bit

I haven't got around to testing the out-of-memory issue with io-cache turned off yet. Will do and let you know.

Thanks
Dushyanth

Comment 4 Dushyanth Harinath 2010-05-10 11:57:44 UTC
Created attachment 193 [details]
md5sums

Comment 5 Dushyanth Harinath 2010-05-10 11:58:03 UTC
Created attachment 194 [details]
strace output for kudzu run that doesn't detect the modem

Comment 6 Dushyanth Harinath 2010-05-10 11:58:26 UTC
Created attachment 195 [details]
new patch for tmpwatch-2.2-1

Comment 7 Dushyanth Harinath 2010-05-10 11:58:51 UTC
Created attachment 196 [details]
patch that corrects that typo...

Comment 8 Dushyanth Harinath 2010-05-10 11:59:27 UTC
Created attachment 197 [details]
cleanup diff for /etc/rc.d/init.d/functions

Comment 9 Dushyanth Harinath 2010-05-10 11:59:53 UTC
Comment on attachment 197 [details]
cleanup diff for /etc/rc.d/init.d/functions

Patched postal for LMTP delivery

Comment 10 Dushyanth Harinath 2010-05-10 12:02:07 UTC
Created attachment 198 [details]
strace of kudzu-0.38-1 not detecting modem

Comment 11 Dushyanth Harinath 2010-05-10 12:11:34 UTC
Created attachment 199 [details]
add as patch5 to source rpm and rebuild

Comment 12 Dushyanth Harinath 2010-05-17 05:46:08 UTC
Hi,

Any updates on this issue? Have you been able to test and verify this bug?

Do let me know if you need any other information. I am able to successfully reproduce this issue every time, even without the io-cache, write-behind, read-ahead and stat-prefetch translators.

I currently have only the replicate translator enabled.
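A sketch of the minimal client volfile this implies: two protocol/client volumes feeding one cluster/replicate volume. The volume names and the mail2 address come from the logs above; the mail1 address is a placeholder, and the file is written to a scratch path rather than the live volfile:

cat > /tmp/minimal-replicate-client.vol.example <<'EOF'
volume mail1-home61-brick
  type protocol/client
  option transport-type tcp
  option remote-host 172.16.2.205     # placeholder for mail1's address
  option remote-port 6997
  option remote-subvolume home61-brick
end-volume

volume mail2-home61-brick
  type protocol/client
  option transport-type tcp
  option remote-host 172.16.2.206     # mail2, as seen in the logs
  option remote-port 6997
  option remote-subvolume home61-brick
end-volume

volume home61-mirror-1
  type cluster/replicate
  subvolumes mail1-home61-brick mail2-home61-brick
end-volume
EOF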

Thanks
Dushyanth

Comment 13 Pavan Vilas Sondur 2010-05-21 09:01:41 UTC
Hi Dushyanth,
Thank you for the detailed info. I am looking into the logs and isolating multiple bugs. We're using your dovecot configuration to reproduce in-house. Will update soon with the root causes of the problems seen.

Comment 14 Pavan Vilas Sondur 2010-05-23 12:02:15 UTC
>Haven't got around to test the out of mem issue with iocache turned off yet.
>Will do and let you know.

Apart from the other things you've reported, did you get a chance to run the tests without io-cache, and specifically, have you seen memory usage problems?

Comment 15 Dushyanth Harinath 2010-05-25 09:25:32 UTC
I haven't done an extensive test with io-cache turned off. Will do that this week or early next week.

Comment 16 Dushyanth Harinath 2010-06-16 10:50:24 UTC
* Testing without the io-cache and quick-read translators does not show any OOM issues.

Comment 17 Amar Tumballi 2010-10-05 06:01:15 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this particular bug with the 3.1.0 RC releases and let us know whether it is fixed.

Comment 18 Amar Tumballi 2010-10-05 09:49:41 UTC
With 3.1, the code where the crash happened has been removed. Marking it as fixed for 3.1.