Description of problem:
dovecot occasionally generates a spurious linefeed at the top of an mbox file. When this occurs, all subsequent attempts by dovecot to read or write the mbox file fail with an error similar to this:

Apr 9 16:37:01 vogon pop3(joe): Error indexing mbox file /var/mail/joe: LF not found where expected

Once this occurs, the only apparent fix is to manually edit the mbox file to remove the LF. On servers with only a few mailboxes this doesn't seem to crop up much, but on boxes with dozens or hundreds of heavily used mailboxes the problem seems to occur almost daily in one mailbox or another. We run a couple of such heavily loaded mail servers, and managing this problem is becoming a significant time sink for us.

Googling reveals many references to this problem in dovecot on several distros. The 1.0 version of dovecot appears to have fixed this bug, but the fix has apparently not been backported to 0.99 for RHEL yet.

Version-Release number of selected component (if applicable):
dovecot-0.99.11-4

How reproducible:
Not easily reproducible. It's unclear what set of circumstances triggers the spurious linefeed. However, it is possible to manually insert a linefeed to reproduce the subsequent mbox read/write failures.

Steps to Reproduce:
1. Insert a linefeed at the top of an mbox file that contains mail
2. Try to send mail to the mbox or access the mbox via POP3

Actual results:
dovecot logs an error and mbox access fails

Expected results:
dovecot should ignore or remove the spurious linefeed

Additional info:
Same problem affecting dovecot 0.99 on Debian
http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg97324.html

Report of problem on dovecot list (a reply suggests that the cause is email containing multiple adjacent "From:" headers, but this is incorrect)
http://www.dovecot.org/list/dovecot/2005-June/007775.html

Report of problem occurring on CentOS 4.2
http://lists.centos.org/pipermail/centos/2006-February/060076.html

Report of problem occurring on Fedora Core 3
http://www.linux.org.za/Lists-Archives/glug-tech-0604/msg00163.html

Report of problem occurring on Fedora Core 4
http://www.fedoraforum.org/forum/archive/index.php/t-122656.html
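The manual fix described above (removing the LF from the top of the mbox) could be scripted roughly as follows. This is only a sketch in Python, not anything shipped with dovecot: the function names are hypothetical, the whole file is read into memory, and no locking is done, so it assumes dovecot and the local delivery agent are not touching the mailbox while it runs.

```python
def strip_leading_newlines(data: bytes) -> bytes:
    """Drop any CR/LF bytes that precede the first "From " line."""
    return data.lstrip(b"\r\n")


def repair_mbox(path: str) -> bool:
    """Rewrite the mbox at `path` in place without leading blank lines.

    Returns True if the file was changed, False if it was already clean.
    Assumption: nothing else is reading or writing the file right now.
    """
    with open(path, "r+b") as f:
        data = f.read()
        fixed = strip_leading_newlines(data)
        if fixed == data:
            return False
        # Overwrite in place so ownership and permissions are preserved.
        f.seek(0)
        f.write(fixed)
        f.truncate()
    return True
```

For example, `repair_mbox("/var/mail/joe")` would strip the linefeed from the mailbox in the error message above. Take a copy of the file first.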
*** Bug 243168 has been marked as a duplicate of this bug. ***
Same problem occurring on Red Hat Enterprise Linux 4.5. OS: redhat-release, release: 4ES, CPU Arch: i686-redhat-linux
The 1.0 version is more or less rewritten, so backporting would basically mean finding the problem, fixing it and then searching for a similar fix in the 1.0 branch :) Since we don't know how to reproduce this, I have to honestly say that I don't know what might be causing this and what the fix might be. It may be possible to simply remove the spurious line feeds but that's quite ugly. I looked at the code, looked at the changes in 0.99.14, looked at the history of 1.0 branch near the end of 0.99, and found nothing. The only thing that crosses my mind is whether you are sure you don't have any locking problem. Of course, it's probably a bug, I'd just like to be sure.
I can't find it now (I ran across it during my hours and hours of Googling), but there was a bug report about this issue filed with dovecot. It was closed (or maybe not recorded?) since it was a .99 bug that was fixed in version 1.0. The comment about it being closed was supposedly by someone within the dovecot project.

I've been having this problem for several years and only recently figured out what the actual problem was. Up until now, I had been moving the mail away from the user, deleting the mbox for that user, recreating the mbox, then moving the mail back. I am running 14 servers, and it randomly happens on any one of them. There is a lot to be found regarding this issue, but until just recently little to be found on the cause.

It is for sure a bug, and from what I've heard it disappears if you upgrade to dovecot 1.0. I have not had an issue with dovecot 1.0 to date, but the randomness makes it hard to know. I can only assume, since the bug report was closed at dovecot with the release of 1.0, that 1.0 will not have this problem.

Most of my users are using some version of Outlook. I've seen references to this problem in connection with Outlook. Every time it has happened to me, it has been an Outlook user. Is this just the odds, or something that is limited to Outlook clients?

I believe this is happening during the previous retrieval of email. Apparently something within that email (or those emails) is leaving behind the LF. The trouble with this, of course, is that whatever caused the problem is no longer there to reproduce it. There is simply the LF at the top of the mail file.
Locking issues
-----------------

Possible solution: mail group (does not work for me)

http://www.dovecot.org/list/dovecot/2004-December/005703.html
http://wiki.dovecot.org/QuickConfiguration

# Grant access to these extra groups for mail processes. Typical use would be
# to give "mail" group write access to /var/mail to be able to create dotlocks.
mail_extra_groups = mail

I have changed this setting and restarted dovecot. The error is still there.

I have not tried this setting:

pop3_lock_session=yes

Not sure if it's a .99 or 1.0 feature. (It's not in the default .99 config file.)

Source: http://wiki.dovecot.org/POP3Server#uidl

Session locking

By default Dovecot allows multiple POP3 connections to the same mailbox. This is (was?) especially useful for dialup connections which die in the middle of the download, because the half-dead connections won't keep the mailbox locked.

Setting pop3_lock_session=yes makes Dovecot lock the mailbox for the whole session. This is also what the POP3 RFC specifies that should be done. If another connection comes while the mailbox is locked, Dovecot waits until the locking times out (2 minutes with Maildir, mbox_lock_timeout with mbox). In future there will be a separate pop3_lock_timeout setting which allows timing out sooner.

"Catch" the error
-------------------

I have a user who gets this error almost daily. The client is the MS Exchange POP3 connector on Small Business Server. I could set up a monitor on the mbox file if someone gives me a hint on how to do it.
By locking issues I meant that something might be writing to the mbox file without using the same locking mechanism as dovecot. As for catching the error -- it'd be great if you could do that. I would suggest making a backup of the mbox file every 5 minutes and capturing the POP3/IMAP connection using tcpdump or something similar (you would have to avoid SSL though). That might lead to the exact sequence of commands needed to reproduce the problem.
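The periodic backup suggested above could be done with a small poller along these lines. This is a rough Python sketch under stated assumptions: the function names, the 300-second default and the snapshot naming scheme are all made up for illustration, and the copy is taken without any locking, so a snapshot may occasionally catch the mbox in the middle of a write.

```python
import shutil
import time
from pathlib import Path


def starts_with_linefeed(path) -> bool:
    """True if the first byte of the file is a CR or LF."""
    with open(path, "rb") as f:
        return f.read(1) in (b"\n", b"\r")


def snapshot(mbox, backup_dir, now=None) -> Path:
    """Copy `mbox` into `backup_dir` under a timestamped name."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = Path(backup_dir) / f"{Path(mbox).name}.{stamp}"
    shutil.copy2(mbox, dest)
    return dest


def watch(mbox, backup_dir, interval=300, max_rounds=None):
    """Snapshot `mbox` every `interval` seconds.

    Stops and returns the snapshot path as soon as a copy starts with a
    linefeed; returns None if `max_rounds` snapshots were all clean.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        copy = snapshot(mbox, backup_dir)
        if starts_with_linefeed(copy):
            print(f"{copy}: leading linefeed found")
            return copy
        rounds += 1
        time.sleep(interval)
    return None
```

Something like `watch("/var/mail/joe", "/root/mbox-snapshots")` would then leave behind the last clean copy and the first corrupted one, which narrows down the time window to look at in the packet capture.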
Is this OK? Do you need other options?

# tcpdump -X port 110
I think `tcpdump -n -w dump.pcap -s 0 port 110' would be better, as it logs to a wireshark/tcpick-readable file, and without the -s 0 option the packets would be truncated.
(this seems to be a duplicate of bug 178683 in fact, so please look whether you can catch the error or tell me to close this as a duplicate -- this is really something that might not be worth fixing in 0.99.xx dovecot)
Yes, this bug and bug 178683 do appear to be duplicates. I'm not sure what you mean by saying the bug might not be worth fixing, however. This is a significant problem for us and certainly seems worth fixing to me. Or do you mean that you could avoid fixing 0.99 by upgrading dovecot to a newer version? I'd be happy with that. All I care about is that the problem stops occurring; I don't care whether that happens by fixing 0.99 or by replacing 0.99 with a newer version.
Yes, it does appear to be a duplicate. I pray the resolution is not duplicated. I understand the problem. As I understand it, dovecot 1.x uses a different config file from the .99 versions, so a direct upgrade may not be an option. The .99 versions were a beta program. I guess I have to ask why Red Hat chose to deliver a beta program when cyrus is so well developed. I do understand that dovecot outperforms cyrus, which is positive. However, this does not help us with the problem of a random one in several hundred email accounts getting corrupted every couple of weeks. It's an embarrassment and not quality service to our clients. Multiply that by thousands or tens of thousands of email accounts and the issue becomes a headache. Trying to catch the problem randomly across that many email accounts is no small task. If this issue is not going to be resolved, what would you suggest we do? Switch to cyrus? Manually upgrade to dovecot 1.0?
I get these log entries when there is a problem with the user (anonymized):

Jul 13 15:01:45 machine pop3-login: Login: usernamex [xx.xx.xx.xx]
Jul 13 15:01:46 machine dovecot: child 21936 (pop3) killed with signal 11
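To see how often these crashes happen and for which users, the log could be scanned for lines like the two above. The following Python sketch is a heuristic only: the regular expressions are written against exactly the two sample lines in this comment, the function name is hypothetical, and on a busy server the most recent login is not necessarily the session that crashed.

```python
import re

# Patterns modelled on the two sample log lines in this comment.
LOGIN_RE = re.compile(r"pop3-login: Login: (?P<user>\S+) \[(?P<ip>[^\]]+)\]")
CRASH_RE = re.compile(
    r"dovecot: child (?P<pid>\d+) \(pop3\) killed with signal (?P<sig>\d+)"
)


def crashes_with_last_login(lines):
    """Return (pid, signal, user) for each crashed pop3 child.

    `user` is the most recent POP3 login seen before the crash line,
    or None if no login preceded it. This pairing is a guess, not proof.
    """
    last_login = None
    results = []
    for line in lines:
        m = LOGIN_RE.search(line)
        if m:
            last_login = m.group("user")
            continue
        m = CRASH_RE.search(line)
        if m:
            results.append((int(m.group("pid")), int(m.group("sig")), last_login))
    return results
```

It can be fed an open log file directly, since iterating over a file yields its lines.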
Hello Steven, John,

you're right; as I pointed out in bug 178683 cmt 19, we can't rebase to 1.0 in RHEL4. I think the best workaround I can recommend is to take the dovecot 1.0.2 SRPM from Fedora and rpmbuild that (the config format is different, though).

I understand your disappointment when I said it might not be worth fixing, so let me make it clear. First, I'm really sorry to have had to say it. The problem is that:

- upstream decided to fix these issues by more or less completely rewriting the relevant parts of the code, so backporting is impossible
- neither you nor I have an idea where the problem is
- neither you nor I know any concrete steps to reproduce it

So, if the effort we (and really we, since I already looked at it and found that I wouldn't be able to fix it without your collaboration) would have to invest in fixing this in 0.99.xx is bigger than the effort you would invest in an upgrade to 1.0, then it's quite obvious what I meant by "not being worth". To put it in a nutshell, the mbox support in 0.99.xx is broken and we may not be able to fix it. I hope you understand the situation.

John, as to your question 'why did we choose to deliver a beta program': dovecot has been in Fedora since the very beginning, and it supports, as opposed to cyrus-imapd (as far as I know), mbox (not that well, as we found out, but...) and Maildir, making the migration from uw-imapd easy and allowing one to use clients like mutt/pine/mailx on the mailserver locally. There was no dovecot version other than a 'beta' one and, well, we did deliver cyrus-imapd as well.
Havard, could you get me a backtrace, using gdb, please? (you'll have to set "mail_drop_priv_before_exec=yes" in the config file, put a "ulimit -c unlimited" in the init script to allow dumping core and then use "gdb /usr/libexec/dovecot/pop3 core.XXX" to examine the core dump. then, the "bt full" command prints the backtrace. you'll also have to have the dovecot-debuginfo package installed)
Thanks for the info, Tomas. Not to complain, but why did RHEL drop the stable and working uw-imapd for something that doesn't work? We've been running large, stable mail servers for years using Red Hat products and never had problems like this until you guys switched to dovecot. The whole point of EL is to be relatively stable for enterprise use like large mail servers, isn't it? :)

As for a solution, is there any way Red Hat could offer either the working 1.0 version of dovecot or a working version of uw-imapd as an OPTION to RHEL users without actually rebasing the default version of dovecot? I seem to recall that Red Hat has done this before with certain programs, where you could choose between a couple of different versions, one being the officially supported one and the other an optional version. I think this was done with both Apache 1.3/2.0 and PHP 4/5 in the past. Any chance we could do that here?

I've considered compiling my own version of dovecot or imap from source or from a Fedora SRPM, but that greatly complicates maintenance because we can't rely on the normal system update functions to notify us of newer versions.

And one last question: what version of dovecot is included in RHEL 5? If v5 uses a working version of dovecot, it might make more sense for me to upgrade our server OS rather than try to run RHEL 4.x with an unsupported version of dovecot.
I think the newer apache/php is in the RH App. Stack, which is a separate product. Seems there's no official way to get dovecot 1.0 for EL4. It would be possible to set up a yum repo on my people.redhat.com page, though. RHEL5 includes dovecot 1.0.rc15, which should not have this bug. I hope we'll upgrade to 1.0.x series in RHEL5.2.
Okay, I think we'll plan to migrate our mail servers to RHEL5 as it sounds like that will be the best way to solve the problem for the long term.
It has been almost a year since the last message, so I hope you have already successfully upgraded to RHEL-5 or to a newer dovecot on RHEL-4. The situation with dovecot in RHEL-4 will not change, so I am closing this bug.