Bug 162781

Summary: mbox corruption
Product: Red Hat Enterprise Linux 4
Reporter: daryl herzmann <akrherz>
Component: dovecot
Assignee: John Dennis <jdennis>
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Version: 4.0
CC: grios
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-07-22 19:06:42 UTC

Description daryl herzmann 2005-07-08 16:20:47 UTC
Description of problem:
Using either pine, Squirrelmail, or Eudora clients to my dovecot imap server, we
are experiencing frequent (5-10 per week) corruptions of the mbox files on the
server.  This corruption involves the loss of the first 7 characters from the
file and then these messages show up in syslog:

imap(akrherz): File isn't in mbox format: /home/akrherz/mail//INBOX.old

The fix is to open the file in vi and add "From xx" to the beginning of the file.
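
For anyone who wants to spot a damaged file before a client trips over it, the
check can be scripted. The following is a minimal Python sketch, not something
used in this report: the function name starts_with_from_line is made up for
illustration, and it only tests the one symptom described above (the first line
no longer begins with "From "). It does not repair anything.

```python
import os
import tempfile


def starts_with_from_line(path):
    """Return True if the file is empty or its first line starts with 'From '."""
    with open(path, "rb") as f:
        first_line = f.readline()
    # An empty mbox is valid; otherwise the first line must be a "From " separator.
    return first_line == b"" or first_line.startswith(b"From ")


if __name__ == "__main__":
    # Demo on throwaway files: one intact, one with the first 7 bytes chopped off.
    intact = b"From akrherz Fri Jul  8 16:20:47 2005\nSubject: test\n\nbody\n"
    with tempfile.TemporaryDirectory() as tmp:
        for name, data in (("intact", intact), ("chopped", intact[7:])):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(data)
            print(name, starts_with_from_line(path))
```

Pointing the function at each file under ~/mail/ from cron would at least tell
you which folders need the manual fix.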

Version-Release number of selected component (if applicable):
dovecot-0.99.11-2.EL4.1

How reproducible:
I can't seem to find something concrete.  Usually, deleting the first message in
the folder will trigger it, but not always.

Steps to Reproduce:
Try to read mail from a folder, leave the folder, and then watch the client dump
"Internal Errors"
  
Actual results:
corrupt mbox

Expected results:
happy mbox

Additional info:
With Squirrelmail, I have users complaining that emails are being lost from
their folders, though I can't prove this.  The imap server was happily
running uw-imap on RHEL3 before my recent upgrade and thus migration to
troubles.  The user folders come from an NFS RHEL3 server.  The INBOX folder is
local on the server.  We have experienced corruption of both.

Comment 1 John Dennis 2005-07-08 20:09:13 UTC
There have been a number of bug fixes in the upstream release of dovecot since
the version in RHEL4. I believe some of the bug fixes include fixes for
corruption. The latest dovecot official release is 0.99.14 and is scheduled to
be included in RHEL4 Update 2. I suggest you try updating your dovecot release
with this latest rpm and see if it solves your problems. You may download it
here in advance of the update:

ftp://people.redhat.com/jdennis/dovecot-0.99.14-1.EL4.i386.rpm

Please update this bugzilla with your results. Thank you.

Comment 2 daryl herzmann 2005-07-08 20:55:19 UTC
Hi John,

Wow!  Thanks for the fast response.  After I posted this bug, I found a 100%
repeatable mbox corruption with one of my folders.  Simply delete the first
message in the folder, expunge, and then corruption.

Having installed the advanced update, I am not able to repeat it.  So we are
back in business :)  I will update this bug if I find another repeatable
example, otherwise feel free to close it appropriately.

Thanks so much.  You really made my day!

daryl

Comment 3 daryl herzmann 2005-07-11 20:49:33 UTC
Hi,

Unfortunately, this bug is still biting us. :(  For one user, the same bug of
having the "From" part of the email chopped off appeared.  For me, it just
occurred with the last 3 lines of a previously deleted email showing up at
the beginning of the mbox.

I am not much help for you :(  Both of us were using pine today when this
happened, which I know is not supported anymore and perhaps is causing trouble?

I am now getting a new error in my /var/log/messages file on the server:

imap(akrherz): Error syncing mbox file /home/akrherz/mail//INBOX.old: LF not
found where expected

This cropped up with having a blank line as the first line in my mbox.  Oye,
something flaky is going on :(

sorry to be of little help debugging this,
  daryl

Comment 4 John Dennis 2005-07-11 21:27:18 UTC
I'm not a pine user so I'm not sure how pine can be configured to access a
user's mail, but I suspect it can be configured either to access the mbox
directly or via the IMAP protocol. The only corruption I'm aware of with dovecot
mboxes occurs when processes other than the MTA and dovecot manipulate the mbox
file. As indicated earlier, procmail is a good example of this, but I suspect
pine falls into the same category. Can you configure pine such that it uses IMAP
and not direct file manipulation? This would restrict mbox edits to just dovecot
and the MTA and not other processes.

Also what file locking is pine configured to use? Have you changed dovecot's
file locking parameters?

Comment 5 daryl herzmann 2005-07-11 21:38:21 UTC
Sorry for the confusion.  We are using pine to access our folders and INBOX over
IMAP.  

I assume the pine locking question is not necessary with only IMAP involved...

I am using the default dovecot.conf file, with only changes for:

default_mail_env = mbox:~/mail/:INBOX=/var/spool/mail/%u

for dovecot locking, I am using 'fcntl'.

My ~/mail/[] folders come from a RHEL3 NFS server.  The /var/spool/mail/[] is
local on the RHEL4 ext3 partition with "defaults" used for mounting.

thanks for the continued help.... 

Comment 6 Genghis Rios 2005-07-15 01:40:08 UTC
I have the same problem. I'm using the dovecot-0.99.14-1.EL4.i386.rpm package
and I have "mbox_locks = dotlock" in the dovecot.conf file, but the problem
persists. Some users show the following line in maillog:

file mbox-rewrite.c: line 429 (mbox_write_header): assertion failed: 
(hdr_parsed_size.physical_size == hdr_size)



 

Comment 7 John Dennis 2005-07-15 20:05:01 UTC
With reference to comment #6:

1) Is any other process accessing the mbox besides dovecot? (e.g. procmail or
some other MUA?)

2) Are your mbox files local to the machine dovecot is running on or are they
NFS mounted?

You should set your locking to fcntl (mbox_locks = fcntl). As an aside flock is
NOT NFS safe.
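
For readers unfamiliar with the distinction, here is a small Python sketch of
what an fcntl-style (POSIX record) lock looks like from a program's point of
view. It is only an illustration of the mechanism, not dovecot's
implementation, and the helper name append_with_fcntl_lock is made up. Python's
fcntl.lockf() wraps the fcntl() lock calls, while fcntl.flock() wraps flock().
These locks are advisory, so they only protect the file if every process that
writes it takes the same kind of lock.

```python
import fcntl
import os


def append_with_fcntl_lock(path, data):
    """Append bytes to path while holding an exclusive fcntl (POSIX) lock."""
    with open(path, "ab") as f:
        # Blocks until no other cooperating process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before releasing
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A writer that skips the lock, or uses a different mechanism such as dotlock
only, is not blocked by this at all, which is why the settings of the MTA, the
LDA and dovecot need to agree.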

Comment 8 John Dennis 2005-07-15 20:19:54 UTC
With reference to comment #5:

You are correct, if pine is using IMAP then pine locking is irrelevant.

I noticed that your mbox files are in your users' home directories. Are the
users who are reporting corruption also running procmail?

To all who are seeing this problem: Can you tell me which MTA you're using,
what local delivery mechanism it's configured to use, and what locking mechanism
the MTA's local delivery is using?

Comment 9 daryl herzmann 2005-07-15 20:32:27 UTC
Hi,

Thanks for the continued help.  I just tried to reproduce the corruption and was
able to again by just deleting the first message in the folder.  I have a folder
that this is somewhat frequent on :)

None of our users use procmail.

MTA: sendmail-8.13.1-2 (I am not sure about locking mechanism, it is stock RHEL4)

daryl

Comment 10 Genghis Rios 2005-07-16 01:02:31 UTC
(In reply to comment #7)
> With reference to comment #6:
> 1) Is any other process accessing the mbox besides dovecot? (e.g. procmail or
> some other MUA?)

Yes, my MTA (Postfix) calls procmail as the LDA, and procmail uses dotlock (it
creates *.lock files).

> 2) Are your mbox files local to the machine dovecot is running on or are they
> NFS mounted?

The mboxes are local, in mailbox format.

> You should set your locking to fcntl (mbox_locks = fcntl). As an aside flock 
> is NOT NFS safe.

The dovecot.conf file says:

#  dotlock: Create <mailbox>.lock file. This is the oldest and most NFS-safe
#           solution. If you want to use /var/mail/ like directory, the users
#           will need write access to that directory.

I don't use NFS, but procmail uses *.lock files, so it should be safer for
Dovecot to use dotlock too if procmail and Dovecot are to work together. I have
nearly 30,000 users, and only a very few show the problem, at random, generally
webmail clients using imap when they are deleting messages.

A last question: when users download their messages by POP with the
"Leave on the Server" option, the messages still appear as new messages in the
Webmail, which works with IMAP. Before, in RHEL 3 with the uw-imap server, the
messages appeared as opened or read in the Webmail. I don't know if this is a
bug or normal behavior with Dovecot. Is it possible to change it?

Thanks



Comment 11 John Dennis 2005-07-18 19:57:45 UTC
I have not reproduced this problem yet. Daryl, your reproducer of deleting the
first message sounded ideal, but so far I have yet to reproduce the problem by
the method you suggested. Would you please attach a copy of an mbox file which
will be corrupted if you delete the first message? If the mbox file has anything
sensitive in it that you do not want to appear in this bugzilla, you may email
the file to me privately.

For those seeing problems with squirrelmail, could you please attach a copy of
/etc/squirrelmail/config.php? In particular I'm curious whether you've set your
imap server type to something and, if so, what.

Comment 12 John Dennis 2005-07-18 21:01:37 UTC
I think this may be related to bug #163550 and the mbox files have bad
content-length values in them.

Comment 13 John Dennis 2005-07-22 19:06:42 UTC
I have spent a lot of time looking at this as I view corruption as a serious
problem deserving attention. I regret to say I have not reproduced the
corruption problems reported. Daryl kindly sent me his mbox file privately and I
could not reproduce the corruption using the method he suggested. I used both
evolution and squirrelmail. This was tried on both RHEL4 and FC4. I also wrote a
utility to verify the content-length fields in his mbox were all valid and they
were.
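
The utility mentioned above was not attached to this bug. For readers who want
to run a similar check, the following is a rough Python sketch of the idea, not
the original tool. It rests on assumptions that should be stated plainly: the
function name content_length_mismatches is made up, a message is taken to start
at a "From " line at the top of the file or after a blank line, the single
blank line before the next "From " line is treated as a separator rather than
body, and Content-Length is taken to count the body bytes after the blank line
that ends the headers. Mailboxes written by software with a different
convention may show false mismatches.

```python
import email
import re


def content_length_mismatches(data):
    """Return (index, declared, actual) for messages whose Content-Length is wrong.

    `data` is the raw bytes of an mbox file. Messages that carry no
    Content-Length header are skipped.
    """
    # A message starts at a "From " line at the top of the file or after a blank line.
    starts = [m.end() - 5 for m in re.finditer(rb"(?:\A|\n\n)From ", data)]
    ends = starts[1:] + [len(data)]
    mismatches = []
    for index, (start, end) in enumerate(zip(starts, ends)):
        message = data[start:end]
        if message.endswith(b"\n\n"):
            message = message[:-1]  # drop the blank separator line
        _from_line, _, rest = message.partition(b"\n")
        headers, _, body = rest.partition(b"\n\n")
        declared = email.message_from_bytes(headers + b"\n")["Content-Length"]
        if declared is None:
            continue
        declared = str(declared).strip()
        if not declared.isdigit() or int(declared) != len(body):
            mismatches.append((index, declared, len(body)))
    return mismatches
```

To use it, read the mbox in binary mode and pass the bytes in; an empty list
means every Content-Length that is present agrees with the body that follows.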

I've also researched this topic on the dovecot mailing list, procmail sites,
etc. Here are some references you might find useful:

http://dovecot.org/pipermail/dovecot/2005-April/007139.html
http://dovecot.org/pipermail/dovecot/2005-February/006186.html
http://www.ii.com/internet/robots/procmail/qs/

I cannot find any problems with corruption in the 0.99.14 release, and as such
I'm going to close this bug out because I don't think it's a dovecot problem.
I'm not suggesting folks aren't seeing corruption, but rather that this is most
likely a locking problem with other processes, which dovecot can't be held
responsible for.

Here is a collection of thoughts and observations you may want to take into
consideration:

* Don't use the mbox format. It never was a good design, it's inefficient, it
suffers from multiple incompatible formats, and it is susceptible to locking
corruption problems. If you can, consider switching to maildir.

Here are places you can look for help doing the conversion:

/usr/share/doc/dovecot-0.99.14/UW-to-Dovecot-Migration
http://www.gerg.ca/hacks/mb2md
http://home.uninet.ee/~ragnar/2md

* If you are using procmail, make sure BOTH procmail and dovecot are using
dotlocks AND make sure the .lock files are being created in the same directory
AND make sure any procmail rule that writes the mbox file has locking enabled
(trailing colon on filter) AND make sure a lock file name is not specified in
the filter option. See:

http://pm-doc.sourceforge.net/pm-tips-body.html#local_lockfile_usage

* Keep your dovecot indexes on a local disk
(see /usr/share/doc/dovecot-0.99.14/mail-storages.txt)

* I cannot explain why squirrelmail would be a culprit in mbox corruption. What
should be occurring in this scenario is that squirrelmail creates a new imap
client connection to the SAME dovecot server, which may already have one or more
other client connections open on the same mail storage. But the inter-client
locking in dovecot is known to be fine (to the best of my knowledge). The only
other likely process that might create mailbox locking corruption is an LDA, of
which procmail is the most likely suspect.
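
For a sense of what the mbox to maildir conversion suggested in the first item
involves, here is a minimal sketch using Python's standard mailbox module. It
is not one of the tools linked above and has not been tried against a dovecot
installation; the function name mbox_to_maildir is made up, and the sketch
ignores folder naming, ownership and permissions, which a real migration has to
handle. Stop mail delivery and imap access to the mailbox before converting.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    source.lock()  # keep cooperating writers out of the mbox while we read it
    try:
        count = 0
        for message in source:
            # MaildirMessage carries over the mbox Status flags (read, etc.).
            dest.add(mailbox.MaildirMessage(message))
            count += 1
    finally:
        source.unlock()
        source.close()
    return count
```

The source mbox is left untouched, so the result can be compared against it
before anything is deleted.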

With respect to the question in comment #10 concerning the "\Seen" flag in
conjunction with pop3: Dovecot does not modify any of the per-message flags when
the mail store is accessed via pop3, and this includes the \Seen flag. There is
no way to change this behavior. I also believe the behavior is logically
consistent: even though the same mail store can be accessed via either pop3 or
imap4, those flags are a property of imap only, and it would be inconsistent for
pop3 to modify something which does not belong to it.


Comment 14 John Dennis 2005-07-22 19:58:05 UTC
One more item of interest: you will probably want to verify that your dotlock
files are actually being created, especially if they are in the spool directory.
See bug #143707 comment #9 for an explanation.
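
One way to check this is to try creating the lock file by hand as the user in
question. The sketch below is an assumption-laden illustration in Python, not a
description of how dovecot or procmail create their locks: the name
probe_dotlock is made up, and it relies on O_CREAT | O_EXCL, which is atomic on
local filesystems but has historically been unreliable over older NFS versions.
Note that it briefly takes the lock, so run it when the mailbox is idle.

```python
import os


def probe_dotlock(mailbox_path):
    """Report whether a dotlock can be created next to mailbox_path.

    Returns "created" (lock taken and released again), "held" (a .lock file
    already exists) or "denied" (no permission to create files there).
    """
    lock_path = mailbox_path + ".lock"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so creation is atomic.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return "held"
    except PermissionError:
        return "denied"
    os.close(fd)
    os.unlink(lock_path)  # release immediately; this is only a probe
    return "created"
```

Running probe_dotlock("/var/spool/mail/akrherz") as that user, for example,
would return "denied" if the spool directory is not writable by the user, which
means dotlocks there cannot be created.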

Comment 15 daryl herzmann 2005-07-22 22:29:25 UTC
Hi,

Thanks *so* much for looking at this issue and the very detailed summary.  We
migrated to maildir/postfix this evening and everybody seems to be happy.  Sorry
that this was sort of a wild goose chase.  For others reading, maildir is so
much faster :)

Thanks again!

daryl