Red Hat Bugzilla – Bug 449247
SIGSEGV in dovecot when downloading messages due to corrupt index
Last modified: 2008-06-27 11:54:02 EDT
Description of problem:
The dovecot message indexes become corrupt and cause the imap executable that is
part of the dovecot package to SEGV, as can be seen in the following syslog output:
May 31 15:57:53 waterloo dovecot: child 9438 (imap) killed with signal 11
May 31 15:57:53 waterloo kernel: imap: segfault at 8 ip 080a32fd sp
bfaa5a80 error 4 in imap[8048000+95000]
During this failure, I did not see this message, but there are several entries
in syslog like
May 29 23:40:47 waterloo dovecot: IMAP(crow): Corrupted index cache file
/home/crow/.imapIndexes/.imap/INBOX/dovecot.index.cache: invalid record size
Whatever is corrupt is only aggravated when I connect via my Samsung Blackjack
running Windows Mobile 6.0. Using Thunderbird on Windows Vista, Windows XP, or
Fedora 9 does not cause the problem.
To recover, I can remove the dovecot indexes and things will work for a few days.
My mail_location definition in /etc/dovecot.conf is
mail_location = mbox:~/.imap/:INBOX=/var/spool/mail/%u:INDEX=~/.imapIndexes
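The recovery step mentioned above (removing the indexes so dovecot rebuilds them) can be sketched as follows. The paths are derived from the INDEX=~/.imapIndexes setting and the cache filename in the syslog output; removing only the INBOX files is an assumption on my part, and wiping the whole ~/.imapIndexes tree would also work:

```shell
# Remove the per-mailbox index files so dovecot rebuilds them on the next
# login. The directory comes from INDEX=~/.imapIndexes in mail_location;
# dovecot recreates missing index files automatically.
rm -f ~/.imapIndexes/.imap/INBOX/dovecot.index \
      ~/.imapIndexes/.imap/INBOX/dovecot.index.cache \
      ~/.imapIndexes/.imap/INBOX/dovecot.index.log
```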
Version-Release number of selected component (if applicable):
When it happens is unpredictable, but it will always eventually fail. Typically
it takes 2-3 days. I cannot make the indexes become corrupt on demand, though.
I've been running this same dovecot configuration in Fedora 7 for quite a while
without issue. This just started happening when I upgraded to Fedora 9.
Steps to Reproduce:
1. Use various imap clients to connect to dovecot.
2. Continue this for 2-3 days.
3. Notice that synchronization using the Windows Mobile imap client stops
working due to a SEGV in the dovecot imap executable.
I used rawlog to see the IMAP protocol being used by the Windows Mobile client.
The client performs a NAMESPACE command and several LIST commands to fetch the
folder names, then for each folder selected for synchronization, it does a
A67 SELECT "INBOX"
A68 FETCH 1:166 (INTERNALDATE UID FLAGS RFC822.SIZE BODY.PEEK[HEADER.FIELDS
(DATE FROM SUBJECT MESSAGE-ID CONTENT-TYPE X-MS-TNEF-Correlator CONTENT-CLASS
IMPORTANCE PRIORITY X-PRIORITY)] BODYSTRUCTURE)
This succeeds for one mailbox and then fails for my INBOX. It downloaded 151 of
the 166 messages, per the rawlog output.
I'm a software developer in my day job, so I'd be happy to debug, but I wasn't
sure the best way to insert a debugger into the executable chain since the imap
executable is launched internally.
You should be able to attach the debugger to the imap process that is
processing your requests. For the first attempt you should let the system
generate a core dump that can be analyzed after the crash.
So you need to:
- enable core dump creation: edit /etc/init.d/dovecot, add a line with
"DAEMON_COREFILE_LIMIT=unlimited" near the beginning, then restart the service
- install the debuginfo with "yum --enablerepo=fedora-debuginfo install
dovecot-debuginfo"
After the crash you should be able to run "gdb /usr/libexec/dovecot/imap
/path/to/coredump/file". You should see the filename of the coredump in the logs.
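Attaching gdb directly to the running child, as suggested above, can be sketched like this. The pgrep pattern and the assumption that the child runs under your own Unix user are mine, not from the thread:

```shell
# Locate the imap child serving your session and build the gdb attach
# command. Assumptions: the child runs under your Unix user and its process
# name is exactly "imap" (-x); if no child is running, a placeholder pid
# is shown instead.
pid=$(pgrep -u "$(id -un)" -x imap | head -n 1)
cmd="gdb -p ${pid:-<imap-pid>} /usr/libexec/dovecot/imap"
echo "$cmd"
```

Once attached, "continue" lets the process run, and "bt full" after the SIGSEGV prints the backtrace.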
I've been unable to get a core dump of the imap process. I set the core_pattern
to make sure that the process would be able to write the core and validated that
the ulimit was set as triggered by the DAEMON_COREFILE_LIMIT setting, but still
no core file is produced. A 'kill -ABRT <dovecotPID>' did generate a core file
as expected for the dovecot process, so I think the configuration is correct,
but no such luck for the child imap process.
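One way to check whether the child really inherited the raised limit, rather than trusting the init-script setting, is to read the limits the kernel actually applied from /proc. Using /proc/&lt;pid&gt;/limits this way is my suggestion, not something from the thread:

```shell
# Read the core-file limit the kernel actually applied to a process, to
# confirm the DAEMON_COREFILE_LIMIT setting reached the imap child.
# Pass the child's pid; "self" inspects the current shell as a sanity check.
# (/proc/<pid>/limits needs kernel >= 2.6.24, which Fedora 9 has.)
check_core_limit() {
    grep 'Max core file size' "/proc/${1:-self}/limits"
}
check_core_limit self
```

A "Soft Limit" of 0 on the imap child would explain why no core file ever appears despite the init-script change.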
I did find the gdbhelper wrapper in the dovecot distribution and was able to get
a stack trace. I will attach the gdbhelper output file to this bug.
Created attachment 308874 [details]
Output from dovecot gdbhelper wrapper when used to wrap the imap executable.
Could you try to update dovecot to current stable version 1.0.14 with "yum
--enablerepo=updates-testing,updates-testing-debuginfo update dovecot"? Just to
be sure that this issue wasn't fixed there :-)
Moving to 1.0.14 caused the problem to go away, but when I went back to 1.0.13,
it didn't happen any more either, so I'm not sure I've proved anything.
Looking at the dovecot changelog for 1.0.14 (specifically the changeset
documented at http://hg.dovecot.org/dovecot-1.0/rev/538f8892a2f1), it is likely
that the problem is fixed.
I'll run with 1.0.14 for a while (it typically takes 3-7 days for the index
cache to get corrupted again) and see what happens. You can move this bug back
to the "waiting on me" state.
Thanks for your help.
Ok, let me know both the good and bad news :-)
So what are the results?
I'm still getting corrupted index cache files, but I have not received a
segfault due to it since installing 1.0.14-8.fc9.
Feel free to close this bug. Thanks!
OK, closing the bug. And you can raise the corrupted index cache file issue on
dovecot's mailing list to get into direct contact with the author.