Red Hat Bugzilla – Bug 449247
SIGSEGV in dovecot when downloading messages due to corrupt index
Last modified: 2008-06-27 11:54:02 EDT
Description of problem:
The dovecot message indexes become corrupt and cause the imap executable that is
part of the dovecot package to SEGV, as can be seen in the following syslog output:
May 31 15:57:53 waterloo dovecot: child 9438 (imap) killed with signal 11
May 31 15:57:53 waterloo kernel: imap: segfault at 8 ip 080a32fd sp
bfaa5a80 error 4 in imap[8048000+95000]
During this failure, I did not see this message, but there are several entries
in syslog like
May 29 23:40:47 waterloo dovecot: IMAP(crow): Corrupted index cache file
/home/crow/.imapIndexes/.imap/INBOX/dovecot.index.cache: invalid record size
Whatever is corrupt is only aggravated when I connect via my Samsung Blackjack
running Windows Mobile 6.0. Using Thunderbird on Windows Vista, Windows XP, or
Fedora 9 does not cause the problem.
To recover, I can remove the dovecot indexes and things will work for a few days.
My mail_location definition in /etc/dovecot.conf is
mail_location = mbox:~/.imap/:INBOX=/var/spool/mail/%u:INDEX=~/.imapIndexes
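The recovery step mentioned above (removing the indexes so dovecot rebuilds them) can be sketched as follows. The paths are derived from the INDEX=~/.imapIndexes setting and the cache filename in the syslog output; removing only the INBOX files is an assumption on my part, and wiping the whole ~/.imapIndexes tree would also work:

```shell
# Remove the per-mailbox index files so dovecot rebuilds them on the next
# login. The directory comes from INDEX=~/.imapIndexes in mail_location;
# dovecot recreates missing index files automatically.
rm -f ~/.imapIndexes/.imap/INBOX/dovecot.index \
      ~/.imapIndexes/.imap/INBOX/dovecot.index.cache \
      ~/.imapIndexes/.imap/INBOX/dovecot.index.log
```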
Version-Release number of selected component (if applicable):
When it happens is unpredictable, but it will always eventually fail. Typically
it takes 2-3 days. I cannot make the indexes become corrupt on demand, though.
I've been running this same dovecot configuration in Fedora 7 for quite a while
without issue. This just started happening when I upgraded to Fedora 9.
Steps to Reproduce:
1. Use various imap clients to connect to dovecot.
2. Continue this for 2-3 days.
3. Notice that synchronization using the Windows Mobile imap client stops
working due to a SEGV in the dovecot imap executable.
I used rawlog to see the IMAP protocol being used by the Windows Mobile client.
The client performs a NAMESPACE command and several LIST commands to fetch the
folder names, then for each folder selected for synchronization, it does a
A67 SELECT "INBOX"
A68 FETCH 1:166 (INTERNALDATE UID FLAGS RFC822.SIZE BODY.PEEK[HEADER.FIELDS
(DATE FROM SUBJECT MESSAGE-ID CONTENT-TYPE X-MS-TNEF-Correlator CONTENT-CLASS
IMPORTANCE PRIORITY X-PRIORITY)] BODYSTRUCTURE)
This succeeds for one mailbox and then fails for my INBOX. It downloaded 151 of
the 166 messages, per the rawlog output.
I'm a software developer in my day job, so I'd be happy to debug, but I wasn't
sure the best way to insert a debugger into the executable chain since the imap
executable is launched internally.
You should be able to attach the debugger to the imap process that is
processing your requests. For the first attempt you should let the system
generate a core dump that can be analyzed after the crash.
So you need to:
- enable core dump creation: edit /etc/init.d/dovecot, add a line with
"DAEMON_COREFILE_LIMIT=unlimited" near the beginning, then restart the service
- install the debuginfo with "yum --enablerepo=fedora-debuginfo install
dovecot-debuginfo"
After the crash you should be able to run "gdb /usr/libexec/dovecot/imap
/path/to/coredump/file". You should see the filename of the coredump in the logs.
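Attaching gdb directly to the running child, as suggested above, can be sketched like this. The pgrep pattern and the assumption that the child runs under your own Unix user are mine, not from the thread:

```shell
# Locate the imap child serving your session and build the gdb attach
# command. Assumptions: the child runs under your Unix user and its process
# name is exactly "imap" (-x); if no child is running, a placeholder pid
# is shown instead.
pid=$(pgrep -u "$(id -un)" -x imap | head -n 1)
cmd="gdb -p ${pid:-<imap-pid>} /usr/libexec/dovecot/imap"
echo "$cmd"
```

Once attached, "continue" lets the process run, and "bt full" after the SIGSEGV prints the backtrace.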
I've been unable to get a core dump of the imap process. I set the core_pattern
to make sure that the process would be able to write the core and validated that
the ulimit was set as triggered by the DAEMON_COREFILE_LIMIT setting, but still
no core file is produced. A 'kill -ABRT <dovecotPID>' did generate a core file
as expected for the dovecot process, so I think the configuration is correct,
but no such luck for the child imap process.
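One way to check whether the child really inherited the raised limit, rather than trusting the init-script setting, is to read the limits the kernel actually applied from /proc. Using /proc/&lt;pid&gt;/limits this way is my suggestion, not something from the thread:

```shell
# Read the core-file limit the kernel actually applied to a process, to
# confirm the DAEMON_COREFILE_LIMIT setting reached the imap child.
# Pass the child's pid; "self" inspects the current shell as a sanity check.
# (/proc/<pid>/limits needs kernel >= 2.6.24, which Fedora 9 has.)
check_core_limit() {
    grep 'Max core file size' "/proc/${1:-self}/limits"
}
check_core_limit self
```

A "Soft Limit" of 0 on the imap child would explain why no core file ever appears despite the init-script change.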
I did find the gdbhelper wrapper in the dovecot distribution and was able to get
a stack trace. I will attach the gdbhelper output file to this bug.
Created attachment 308874 [details]
Output from dovecot gdbhelper wrapper when used to wrap the imap executable.
Could you try to update dovecot to current stable version 1.0.14 with "yum
--enablerepo=updates-testing,updates-testing-debuginfo update dovecot"? Just to
be sure that this issue wasn't fixed there :-)
Moving to 1.0.14 caused the problem to go away, but when I went back to 1.0.13,
it didn't happen any more either, so I'm not sure I've proved anything.
Looking at the dovecot changelog for 1.0.14 (specifically the changeset
documented at http://hg.dovecot.org/dovecot-1.0/rev/538f8892a2f1), it is likely
that the problem is fixed.
I'll run with 1.0.14 for a while (it typically takes 3-7 days for the index
cache to get corrupted again) and see what happens. You can move this bug back
to the "waiting on me" state.
Thanks for your help.
Ok, let me know both the good and bad news :-)
So what are the results?
I'm still getting corrupted index cache files, but I have not received a
segfault due to it since installing 1.0.14-8.fc9.
Feel free to close this bug. Thanks!
OK, closing the bug. And you can raise the corrupted index cache file issue on
dovecot's mailing list to get into direct contact with the author.