Red Hat Bugzilla – Bug 163550
dovecot seems to parse mbox file incorrectly
Last modified: 2014-01-21 17:52:04 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4
Description of problem:
I've got this mbox file that I access from evolution via dovecot. It's a simple mbox with three messages. The problem is that evolution/dovecot displays two messages, with the middle one appended to the first one. If I look at the mbox file from a shell with 'mail -f mbox', it shows all three messages correctly. I'll attach said mbox file next.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. access this certain mbox file via an imap client and dovecot
Actual Results: Two messages instead of three
Expected Results: Three messages
I've seen this before when I was migrating from local evolution mbox files to an imap setup. This is just a simple case that should be good for debugging.
Created attachment 116892 [details]
mbox file with three messages
Thank you for attaching a reproducer. I've been trying to reproduce some of the
problems in bug #162781 and have not come up with one, so this should help.
It's because the first message has a Content-Length header that includes the whole
second message. Something has written a wrong Content-Length header to it, and the
bug is really in whatever wrote it. I don't think it's a bug to use that header if
it's there and it's valid (i.e. it points exactly to the beginning of the next
From-line).
What timing! I had just figured out that it was the content-length that was the
Timo is there a way to configure dovecot not to use content-length? It's
As to who is writing it, I think it's procmail. This may suggest a solution
(http://www.mhonarc.org/archive/html/procmail/1997-08/msg00691.html), but I'm not
100% sure yet, still investigating.
The first discussion is almost 10 years old. Strict From-line parsing works
nowadays very well. I have had to do only a couple of minor changes to the
detection in last 3 years.
So, Dovecot uses Content-Length header *IF* it points to beginning of next From_
line, otherwise it's ignored (and recreated). This works well even with bogus
Content-Length headers as it's very unlikely that it contains the message body's
+ next message's exact length.
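To make the rule above concrete, here is a minimal sketch of that check. It is not Dovecot's actual code: the function name body_end and its arguments are made up for illustration, and it assumes a From_ line is simply a line beginning with "From ".

```python
def body_end(mbox: bytes, body_start: int, content_length):
    """Return the offset where a message body ends (hypothetical helper).

    Content-Length is trusted only if it lands exactly at end of file or
    at the start of a line beginning with b"From ". Otherwise it is
    ignored and the next From_ line is found by scanning.
    """
    if content_length is not None:
        end = body_start + content_length
        if end == len(mbox):
            return end
        at_line_start = mbox[end - 1:end] == b"\n"
        if end < len(mbox) and at_line_start and mbox.startswith(b"From ", end):
            return end
    # Fallback: scan for the next From_ line. Start one byte early so an
    # empty body followed directly by a From_ line is still found.
    idx = mbox.find(b"\nFrom ", max(body_start - 1, 0))
    return len(mbox) if idx == -1 else idx + 1
```

A random bogus length almost never lands on a From_ line, so it gets ignored. A length covering the first body plus the entire second message does land exactly on the third message's From_ line, so a check like this accepts it.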
I agree anyway that Dovecot should do >From line quoting for APPENDed messages,
and it's in my TODO. Should probably do dequoting as well. Just not that high
priority since the current way works pretty well too..
The reason why the attached mbox file got broken was because some software read
the mbox file, but for some reason didn't notice the second message's From_ line
and instead thought that it belonged to the first message. Then it wrote a
broken Content-Length header.
I suppose the above could have been done by Dovecot. I haven't heard that exact
problem before, but 0.99.x's mbox code isn't all that great..
No way to configure Dovecot to ignore Content-Length, but it'd be simple to
remove it from code.
FWIW, here is some history. I switched from using Evolution with local files to
Evolution with IMAP. I made a script that copied the local evo files to an imap
directory and I noticed that some of the mbox files had a few less messages in
them when they were read back in via Evo/IMAP. If from within Evo I copied all
the messages from a local mbox to a new imap mbox, I never had a problem. The
ones where I copied the mbox files outside of evolution sometimes had problems.
I don't know if Evolution wrote the bad content-length in the mbox files or if
an older version of dovecot wrote them upon initial import.
I'm closing this out as not a bug. The content-length header in the first
message is clearly wrong. The fact that it points to the 3rd message means dovecot
can't detect that it's bogus. Based on comment #6 I suspect some agent other
than dovecot incorrectly parsed the mbox file. I have not seen any other examples
of dovecot mangling content-length.
FWIW you may want to consider using maildir instead of mbox; it's much more
efficient and robust. Plus you won't have any problem with content-length. A
quick google will reveal many maildir conversion tools.
So I tried removing the Content-Length: line from the affected file so that
dovecot could regenerate it and it seems dovecot regenerates it incorrectly.
(I'm assuming that evolution via imap using dovecot = dovecot generating the
Content-Length field. Is this correct?) I'll attach four files next. First is
the original mbox with incorrect Content-Length:, should be the same as the
original attachment to this bug. Second is the simple script used to remove the
Content-Length line. Third is the mbox with Content-Length: lines removed. Last
is the file after I start up evolution and the Content-Length: is regenerated by
dovecot (note assumption above.)
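For anyone who wants to repeat this step, here is a rough sketch of what such a removal script could look like. It is not the attached script. The function name strip_content_length is made up, and it assumes every line starting with "From " begins a new message, so an unquoted "From " line inside a body would confuse it.

```python
def strip_content_length(mbox_text: str) -> str:
    """Drop Content-Length headers from every message in an mbox string.

    Only header lines are touched: headers run from a "From " separator
    line to the first blank line. Body lines are copied through unchanged.
    """
    out = []
    in_headers = False
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            in_headers = True            # separator line: headers follow
        elif in_headers and line in ("\n", "\r\n"):
            in_headers = False           # blank line ends the header block
        elif in_headers and line.lower().startswith("content-length:"):
            continue                     # skip the header we want removed
        out.append(line)
    return "".join(out)
```

Work on a copy of the mbox, with the IMAP server stopped, so nothing else rewrites the file in the meantime.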
Created attachment 117912 [details]
original mbox with incorrect Content-Length:
Created attachment 117913 [details]
simple script to remove Content-Length:
Created attachment 117914 [details]
mbox with Content-Length: removed
Created attachment 117915 [details]
mbox after Evo/IMAP/Dovecot regenerates Content-Length:
mbox as parsed by mail:
wintermute> mail -f Tech\ Tips
Mail version 8.1 6/6/93. Type ? for help.
"Tech Tips": 3 messages
> 1 email@example.com. Fri Nov 6 14:52: 78/2421 "cid 1035147"
2 firstname.lastname@example.org Fri Feb 4 18:18: 103/4093 "Re: SCSI Question"
3 CrystalMail2@sgi.com Wed Oct 20 11:37 64/2797 "Regarding Case Number"
Looks like that mbox has a bit special From-line. Dovecot assumes that if the
day has only one digit, it's prefixed by either zero or an extra space. Your
mbox doesn't have either. Do you know what program generated the From-lines? I
can change the parser to accept the digit without either zero or space prefix,
but I haven't before seen mbox without either of them..
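To illustrate the difference, here is a small sketch of a strict versus a lenient From-line match. These patterns are simplified assumptions for illustration, not Dovecot's parser: they expect exactly "From sender Www Mmm dd hh:mm:ss yyyy" and ignore time zones and other variants.

```python
import re

# Strict: the day field is two characters wide, padded with "0" or a space
# (asctime style), e.g. "Nov  6" or "Nov 06".
STRICT_FROM = re.compile(
    r"^From \S+ [A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-3]\d \d{2}:\d{2}:\d{2} \d{4}$"
)

# Lenient: additionally accept an unpadded single-digit day, e.g. "Nov 6".
LENIENT_FROM = re.compile(
    r"^From \S+ [A-Z][a-z]{2} [A-Z][a-z]{2} (?:[ 0-3]\d|\d) \d{2}:\d{2}:\d{2} \d{4}$"
)


def is_from_line(line: str, lenient: bool = False) -> bool:
    """Return True if the line looks like an mbox From_ separator."""
    pattern = LENIENT_FROM if lenient else STRICT_FROM
    return pattern.match(line) is not None
```

With the strict pattern, a line like "From a@example.com Fri Nov 6 14:52:00 1998" is not recognized as a separator, so the message it starts would be glued onto the previous one.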
Going by dates, I would guess that the first mail I read with MediaMail on an
SGI IRIX machine, the second and third were probably read with Evolution.
P.S. I guess it's possible the second was with MediaMail too. When did Evolution
I'll throw my 2 cents in here, worth what you paid for it :-) To me this is a
classic example of why mbox is a less than wonderful choice of mail storage
format. There is no official standardization. The more pieces of software that
get a chance to munge it with their own notion of what the format is the greater
the chance of problems. You can add to this the problems of performance and
If you can switch to maildir I think you'll be happier. There are many tools to
convert mbox to maildir.
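If you'd rather not hunt for a tool, a conversion can also be sketched with the mailbox module from Python's standard library. This is only a rough outline, not a tested migration tool: the function name mbox_to_maildir is made up, it does not carry over flags or folder structure, and as far as I know Python's mbox reader splits messages on "From " lines only, without consulting Content-Length.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a maildir.

    Returns the number of messages copied. The source mbox is not modified.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            dest.add(msg)   # converts the mbox message to a maildir message
            count += 1
    finally:
        src.close()
        dest.close()
    return count
```

Compare the returned count with what 'mail -f' reports for the same file before deleting anything.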
I've searched and found many mbox to maildir converters. Any experience with any
of them? I just want to use one that is known to work well.
And what about the system mailbox in /var/mail? Does it need to be converted,
can sendmail handle that, can dovecot handle an mbox system mailbox and maildir
personal mailboxes? Depending on the answers, converting to maildir may be more
involved/harder to do for widespread use. TIA...
I used http://batleth.sapienti-sat.org/projects/mb2md/ to do the conversion and
simply modified my .procmailrc to do maildir delivery. Google, as always, was my
friend. It fixed the problem with that mailbox too. For future reference, the
.procmailrc key ingredients were these:
So this didn't fix the bug but worked around it. I don't know if it should be
closed or not.
Since the mbox file that fails to parse is incorrect, I suppose this is not
a bug in dovecot. The From line parsing may be a bug in either the generator
or the parser. I am not sure if the changes in 1.0beta2, which is part of fc5,
affected this. If someone needs dovecot to parse such mbox files, I would
suggest trying fc5 dovecot first and, if it still does not work, filing an
enhancement request. Not that it is too likely.