Bug 235750 - dovecot LF bug
Summary: dovecot LF bug
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: dovecot
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dan Horák
QA Contact:
URL:
Whiteboard:
Duplicates: 243168
Depends On:
Blocks:
 
Reported: 2007-04-09 22:20 UTC by Steevithak
Modified: 2008-06-10 08:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-10 08:43:29 UTC
Target Upstream Version:
Embargoed:



Description Steevithak 2007-04-09 22:20:17 UTC
Description of problem: dovecot occasionally generates a spurious linefeed at
the top of an mbox file. When this occurs all subsequent attempts by dovecot to
read or write to the mbox file fail with an error similar to this:

  Apr  9 16:37:01 vogon pop3(joe): Error indexing mbox file /var/mail/joe: LF not found where expected

Once this occurs, the only apparent fix is to manually edit the mbox file to
remove the LF. On servers with only a few mailboxes this doesn't seem to crop up
much but on boxes with dozens or hundreds of heavily used mailboxes, the problem
seems to occur almost daily in one mailbox or another.  We run a couple of such
heavily loaded mail servers and managing this problem is becoming a significant
time sink for us.

Googling reveals many references to this problem in dovecot on several distros.
The 1.0 version of dovecot appears to have fixed this bug, but the fix has
apparently not yet been backported to 0.99 for RHEL.


Version-Release number of selected component (if applicable):

dovecot-0.99.11-4

How reproducible:

Not easily reproducible. It's unclear what set of circumstances trigger the
spurious line feed. However, it is possible to manually insert a linefeed to
reproduce the subsequent mbox read/write failures.

Steps to Reproduce:
1. Insert a linefeed at the top of an mbox file that contains mail
2. Try to send mail to the mbox or access the mbox via POP3
3.
  
Actual results:

dovecot logs an error and mbox access fails

Expected results:

dovecot should ignore or remove the spurious linefeed
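
The reproduction step and the manual workaround described above could be
scripted roughly as follows. This is only a sketch under stated assumptions: the
function names are made up for illustration, the whole mbox is read into memory,
and the fcntl lock is advisory and only helps if every other program touching
the file uses the same lock type (which may not hold for a given dovecot 0.99
setup). It is not how dovecot itself handles the file.

```python
import fcntl


def prepend_newline(path):
    """Reproduction step 1: put one LF before the existing contents (test copies only)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(b"\n" + data)


def has_leading_newline(path):
    """Return True if the first byte of the file is LF or CR."""
    with open(path, "rb") as f:
        return f.read(1) in (b"\n", b"\r")


def strip_leading_newlines(path):
    """Remove blank lines before the first "From " line; return the number of bytes removed."""
    with open(path, "r+b") as f:
        # Assumption: an exclusive fcntl lock is what the other writers honour.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            data = f.read()
            cleaned = data.lstrip(b"\r\n")
            removed = len(data) - len(cleaned)
            # Rewrite only if the remainder looks like the start of an mbox message.
            if removed and cleaned.startswith(b"From "):
                f.seek(0)
                f.write(cleaned)
                f.truncate()
                return removed
            return 0
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Running has_leading_newline() over the mail spool would find affected mailboxes
before users report them; strip_leading_newlines() should only be tried on a
copy first.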

Additional info:

Same problem affecting dovecot 0.99 on debian
 http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg97324.html

Report of problem on dovecot list (a reply suggests that the cause is email
containing multiple adjacent "From:" headers but this is incorrect)
 http://www.dovecot.org/list/dovecot/2005-June/007775.html

Report of problem occurring on CentOS 4.2
 http://lists.centos.org/pipermail/centos/2006-February/060076.html

Report of problem occurring on Fedora Core 3
 http://www.linux.org.za/Lists-Archives/glug-tech-0604/msg00163.html

Report of problem occurring on Fedora Core 4
 http://www.fedoraforum.org/forum/archive/index.php/t-122656.html

Comment 1 Tomas Janousek 2007-06-08 08:48:58 UTC
*** Bug 243168 has been marked as a duplicate of this bug. ***

Comment 2 Havard Sorli 2007-06-11 10:16:24 UTC
Same problem occurring on Red Hat Enterprise Linux 4.5
OS: redhat-release, release: 4ES,  CPU Arch: i686-redhat-linux



Comment 3 Tomas Janousek 2007-06-12 12:53:12 UTC
The 1.0 version is more or less rewritten, so backporting would basically mean
finding the problem, fixing it and then searching for a similar fix in the 1.0
branch :)

Since we don't know how to reproduce this, I have to honestly say that I don't
know what might be causing this and what the fix might be. It may be possible to
simply remove the spurious line feeds but that's quite ugly.

I looked at the code, looked at the changes in 0.99.14, looked at the history of
1.0 branch near the end of 0.99, and found nothing.

The only thing that crosses my mind is whether you are sure you don't have any
locking problem. Of course, it's probably a bug, I'd just like to be sure.

Comment 4 John Hinton 2007-06-12 14:36:54 UTC
I can't find it now (I ran across this during my hours and hours of Googling),
but there was a bug report regarding this issue listed with dovecot. It was
closed (or maybe not recorded?) since it was a .99 bug that was fixed in version
1.0. The comment about it being closed was supposedly by someone within the
dovecot project.

I've been having this problem for several years and only recently figured out
what the actual problem was. Up until now, I had been moving the mail from the
user, deleting the mbox for that user, recreating the mbox then moving the mail
back.

I am running 14 servers, and it happens randomly on any one of them. There is a
lot to be found regarding this issue, but little to be found on the cause up
until just recently.

It is for sure a bug, and from what I've heard it disappears if you upgrade to
dovecot 1.0. I have not yet had an issue with dovecot 1.0 to date, but the
randomness makes it hard to know. I can only assume, since the bug report was
closed at dovecot with the release of 1.0, that 1.0 will not have this problem.

Most of my users are using some version of Outlook. I've seen reference to this
with regards to Outlook. Every time it has happened to me, it has been an
Outlook user. Is this just the odds or something that is limited to only Outlook
clients?

I believe this is happening during the previous retrieval of email. Apparently
something within that/those email(s) is leaving behind that LF. The trouble with
this of course is we don't have what caused the problem left behind to reproduce
it. There is simply the LF at the top of the mailfile.

Comment 5 Havard Sorli 2007-06-12 15:57:08 UTC
Locking issues
-----------------
Possible solution: mail group (does not work for me):
http://www.dovecot.org/list/dovecot/2004-December/005703.html

http://wiki.dovecot.org/QuickConfiguration
# Grant access to these extra groups for mail processes. Typical use would be
# to give "mail" group write access to /var/mail to be able to create dotlocks.
mail_extra_groups = mail

I have changed this setting and restarted dovecot. The error is still there.

I have not tried this setting: pop3_lock_session=yes
Not sure if it's a .99 or 1.0 feature. (It's not in the default .99 config file.)

Source: http://wiki.dovecot.org/POP3Server#uidl
Session locking
By default Dovecot allows multiple POP3 connections to the same mailbox. This is
(was?) especially useful for dialup connections which die in the middle of the
download, because the half-dead connections won't keep the mailbox locked.

Setting pop3_lock_session=yes makes Dovecot lock the mailbox for the whole
session. This is also what the POP3 RFC specifies that should be done. If
another connection comes while the mailbox is locked, Dovecot waits until the
locking times out (2 minutes with Maildir, mbox_lock_timeout with mbox). In
future there will be a separate pop3_lock_timeout setting which allows timing
out sooner.


"Catch" the error
-------------------
I have a user who gets this error almost on a daily basis.
The client is the MS Exchange POP3 connector on Small Business Server.

I could set up a monitor on the mbox file if someone gives me a hint how to do it.

Comment 6 Tomas Janousek 2007-06-13 11:15:58 UTC
By the locking issues I meant that something might be writing to the mbox file
and not using the same locking mechanism as dovecot.
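
To illustrate what "using the same locking mechanism" means in the dotlock case
mentioned in comment 5, here is a minimal sketch of a dotlock: a writer creates
`<mbox>.lock` exclusively before touching the mbox and removes it afterwards.
The helper name, timeout and polling interval are assumptions for illustration.
This is not dovecot's implementation, and real implementations also deal with
stale locks and NFS. A program that writes to the mbox without taking the lock
the other side expects gets no protection from it.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def dotlock(mbox_path, timeout=10.0, poll=0.1):
    """Hold <mbox_path>.lock for the duration of the with-block."""
    lock_path = mbox_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        # Record the owner's PID in the lock file, then release the descriptor.
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        yield lock_path
    finally:
        os.unlink(lock_path)
```

Creating the lock file requires write access to the spool directory, which is
what the mail_extra_groups setting quoted in comment 5 is about.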

And regarding catching the error -- it'd be great if you can do that. I would
suggest making a backup of the mbox file every 5 minutes and catching the
POP3/IMAP connection using tcpdump or something similar (you would have to avoid
SSL though). That might lead to the exact sequence of commands needed to
reproduce the problem.
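
The periodic backup suggested above could be sketched like this. The function
names, the backup directory layout and the 300-second default are illustrative
assumptions, and the copy is taken without any lock, so a snapshot may catch the
file in the middle of a write. The first snapshot that starts with an LF narrows
down the time window to compare against the tcpdump capture.

```python
import os
import shutil
import time


def starts_with_newline(path):
    """Return True if the first byte of the file is an LF."""
    with open(path, "rb") as f:
        return f.read(1) == b"\n"


def snapshot(mbox_path, backup_dir, now=None):
    """Copy the mbox to a timestamped file; return (copy_path, has_leading_lf)."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = os.path.join(backup_dir, f"{os.path.basename(mbox_path)}.{stamp}")
    shutil.copy2(mbox_path, dest)
    return dest, starts_with_newline(dest)


def watch(mbox_path, backup_dir, interval=300):
    """Snapshot every `interval` seconds and stop at the first bad snapshot."""
    while True:
        dest, corrupted = snapshot(mbox_path, backup_dir)
        if corrupted:
            print(f"leading LF first seen in {dest}")
            return dest
        time.sleep(interval)
```

For example, watch("/var/mail/joe", "/root/mbox-snapshots") would keep copying
the mailbox from the log excerpt in the description until the problem shows up;
old snapshots need to be pruned by hand.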

Comment 7 Havard Sorli 2007-06-20 18:17:22 UTC
Is this ok ?  Do you need other options ?
#tcpdump -X port 110

Comment 8 Tomas Janousek 2007-06-20 18:23:06 UTC
I think `tcpdump -n -w dump.pcap -s 0 port 110' would be better, as it writes to
a wireshark/tcpick-readable file, and without the -s 0 option the packets would
be truncated.

Comment 9 Tomas Janousek 2007-07-16 12:35:10 UTC
(this seems to be a duplicate of bug 178683 in fact, so please look whether you
can catch the error or tell me to close this as a duplicate -- this is really
something that might not be worth fixing in 0.99.xx dovecot)

Comment 10 Steevithak 2007-07-16 14:59:44 UTC
Yes, this bug and bug 178683 do appear to be duplicates. I'm not sure what you
mean by saying the bug is not worth fixing, however. This is a significant
problem for us and certainly seems like it's worth fixing to me. Or do you mean
that you could avoid fixing 0.99 by upgrading dovecot to a newer version? I'd be
happy with that. All I care about is that the problem stops occurring, but I
don't care whether that happens by fixing 0.99 or replacing 0.99 with a newer
version.

Comment 11 John Hinton 2007-07-16 15:47:00 UTC
Yes it does appear to be duplicated. I pray the resolution is not duplicated.

I understand the problem. As I understand it, dovecot 1.x uses a different
config file vs. the .99 versions, so a direct upgrade may not be an option. The
.99 versions were a beta program. I guess I have to ask why Redhat chose to
deliver a beta program when cyrus is so well developed. I do understand that
dovecot outperforms cyrus which is positive.

However, this does not help us with the problem of the random one in several
hundred email accounts which gets corrupted every couple of weeks. It's an
embarrassment and not quality service to our clients.

Multiply that by 1000s or tens of thousands of email accounts and the issue
becomes a headache. Trying to catch the problem randomly across that many email
accounts is no small task.

If this issue is not going to be resolved, what would you suggest we do? Switch
to cyrus? Manually upgrade to dovecot 1.0?

Comment 12 Havard Sorli 2007-07-17 07:51:28 UTC
I get these log entries when there is a problem with the user (anonymized):

Jul 13 15:01:45 machine pop3-login: Login: usernamex [xx.xx.xx.xx]
Jul 13 15:01:46 machine dovecot: child 21936 (pop3) killed with signal 11



Comment 13 Tomas Janousek 2007-07-17 10:54:57 UTC
Hello Steven, John,
you're right, as I pointed out in bug 178683 cmt 19, we can't rebase to 1.0 in
RHEL4. I think the best workaround I can recommend is to take the dovecot 1.0.2
SRPM from Fedora and rpmbuild that. (the config format is different though)

I understand your disappointment when I said it might not be worth fixing, so
I'll make it clear. First, I'm really sorry to have had to say it. The
problem is that:
 - the upstream decided to fix these issues by more or less completely rewriting
the relevant parts of the code, so backporting is impossible
 - neither you nor I have an idea where the problem is
 - neither you nor I know any concrete steps to reproduce it
So, if the effort we (and really we, since I already looked at it and found that
I wouldn't be able to fix it without your collaboration) will have to invest in
fixing this in 0.99.xx is bigger than the effort you would invest in an upgrade
to 1.0, then it's quite obvious, what I meant by "not being worth".

To put it in a nutshell, the mbox support in 0.99.xx is broken and we may not be
able to fix it. I hope you understand the situation.

John, as to your question 'why did we choose to deliver a beta program':
dovecot was in Fedora since the very beginning and it supports, as opposed to
cyrus-imapd (as far as I know), mbox (not that well, as we found out, but...)
and Maildir, making the migration from uw-imapd easy and allowing one to use
clients like mutt/pine/mailx on the mailserver locally. There was no other
dovecot version than a 'beta' one and, well, we did deliver cyrus-imapd as well.

Comment 14 Tomas Janousek 2007-07-17 10:59:54 UTC
Havard,
could you get me a backtrace, using gdb, please?

(you'll have to set "mail_drop_priv_before_exec=yes" in the config file, put a
"ulimit -c unlimited" in the init script to allow dumping core and then use "gdb
/usr/libexec/dovecot/pop3 core.XXX" to examine the core dump. then, the "bt
full" command prints the backtrace. you'll also have to have the
dovecot-debuginfo package installed)

Comment 15 Steevithak 2007-07-17 16:21:27 UTC
Thanks for the info Tomas. Not to complain, but why did RHEL drop the stable and
working uw-imapd for something that doesn't work? We've been running large,
stable mail servers for years using Redhat products and never had problems like
this until you guys switched to dovecot. The whole point of EL is to be
relatively stable for enterprise use like large mail servers, isn't it? :)

As for a solution, is there any way Redhat could offer either the working 1.0
version of dovecot or a working version of uw-imapd as an OPTION to RHEL users
without actually rebasing the default version of dovecot? I seem to recall that
Red Hat has done this before with certain programs where you could choose
between a couple of different versions, one being the official supported one and
the other an optional version. I think this was done with both Apache 1.3/2.0
and PHP 4/5 in the past. Any chance we could do that here? 

I've considered compiling my own version of dovecot or imap from source or from a
Fedora SRPM but that greatly complicates maintenance because we can't rely on
the normal system update functions to notify us of newer versions.

And, one last question: what version of dovecot is included in RHEL 5? If v5
uses a working version of dovecot, it might make more sense for me to upgrade
our server OS rather than try to run RHEL 4.x with an unsupported version of
dovecot.


Comment 16 Tomas Janousek 2007-07-18 14:40:31 UTC
I think the newer apache/php is in the RH App. Stack, which is a separate
product. Seems there's no official way to get dovecot 1.0 for EL4. It would be
possible to set up a yum repo on my people.redhat.com page, though.

RHEL5 includes dovecot 1.0.rc15, which should not have this bug. I hope we'll
upgrade to 1.0.x series in RHEL5.2.

Comment 17 Steevithak 2007-07-18 16:49:20 UTC
Okay, I think we'll plan to migrate our mail servers to RHEL5 as it sounds like
that will be the best way to solve the problem for the long term. 

Comment 18 Dan Horák 2008-06-10 08:43:29 UTC
It has been almost one year since the last message, so I hope you have already
successfully upgraded to RHEL-5 or to a newer dovecot on RHEL-4. The situation
with dovecot in RHEL-4 will not change, so I am closing this bug.

