Bug 163550 - dovecot seems to parse mbox file incorrectly
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: dovecot
Version: 4
Hardware: x86_64 Linux
Priority: medium, Severity: high
Assigned To: Petr Rockai
Reported: 2005-07-18 15:47 EDT by Thomas J. Baker
Modified: 2014-01-21 17:52 EST (History)
Last Closed: 2006-04-03 07:08:25 EDT
Attachments
mbox file with three messages (9.13 KB, text/plain)
2005-07-18 15:49 EDT, Thomas J. Baker
original mbox with incorrect Content-Length: (9.13 KB, text/plain)
2005-08-19 11:06 EDT, Thomas J. Baker
simple script to remove Content-Length: (108 bytes, text/plain)
2005-08-19 11:09 EDT, Thomas J. Baker
mbox with Content-Length: removed (9.09 KB, text/plain)
2005-08-19 11:11 EDT, Thomas J. Baker
mbox after Evo/IMAP/Dovecot regenerates Content-Length: (9.09 KB, text/plain)
2005-08-19 11:12 EDT, Thomas J. Baker

Description Thomas J. Baker 2005-07-18 15:47:32 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
I've got this mbox file that I access from evolution via dovecot. It's a simple mbox with three messages. The problem is that evolution/dovecot displays two messages, with the middle one appended to the first one. If I look at the mbox file from a shell with 'mail -f mbox', it shows all three messages correctly. I'll attach said mbox file next.

Version-Release number of selected component (if applicable):
dovecot-0.99.14-4.fc4

How reproducible:
Always

Steps to Reproduce:
1. Access this particular mbox file via an IMAP client and dovecot.

Actual Results:  Two messages instead of three

Expected Results:  Three messages

Additional info:

I've seen this before when I was migrating from local evolution mbox files to an imap setup. This is just a simple case that should be good for debugging.
Comment 1 Thomas J. Baker 2005-07-18 15:49:13 EDT
Created attachment 116892 [details]
mbox file with three messages
Comment 2 John Dennis 2005-07-18 15:50:57 EDT
Thank you for attaching a reproducer. I've been trying to reproduce some of the
problems in bug # 162781 and have not come up with a reproducer; this should help.
Comment 3 Timo Sirainen 2005-07-18 16:46:28 EDT
It's because the first message has a Content-Length header that includes the whole
second message. Something has written a wrong Content-Length header to it, and the
bug is really in whatever wrote it. I don't think it's a bug to use that header if
it's there and it's valid (i.e. it points exactly to the beginning of the next
From-line).
Comment 4 John Dennis 2005-07-18 16:58:28 EDT
What timing! I had just figured out that it was the content-length that was the
culprit.

Timo is there a way to configure dovecot not to use content-length? It's
problematical (see
http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html)

As to who is writing it, I think it's procmail. This may suggest a solution
(http://www.mhonarc.org/archive/html/procmail/1997-08/msg00691.html), but I'm not
100% sure yet; still investigating.
Comment 5 Timo Sirainen 2005-07-18 17:16:13 EDT
The first discussion is almost 10 years old. Strict From-line parsing works
very well nowadays. I have had to make only a couple of minor changes to the
detection in the last 3 years.

So, Dovecot uses Content-Length header *IF* it points to beginning of next From_
line, otherwise it's ignored (and recreated). This works well even with bogus
Content-Length headers as it's very unlikely that it contains the message body's
+ next message's exact length.
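
The rule described above can be sketched roughly as follows. This is only an illustration, not Dovecot's code: the function name `content_length_is_usable`, its parameters, and the simplified `FROM_LINE` pattern are all assumptions made for the example.

```python
import re

# Assumption: a deliberately loose From_ line test. Real parsers,
# Dovecot included, validate the sender and date fields more strictly.
FROM_LINE = re.compile(rb"From \S+ .+")


def content_length_is_usable(mbox: bytes, body_start: int, content_length: int) -> bool:
    """Return True if body_start + content_length lands exactly at the end
    of the file or at the start of a line that looks like a From_ line."""
    end = body_start + content_length
    if content_length < 0 or end > len(mbox):
        return False
    if end == len(mbox):
        return True
    # The offset must sit at the beginning of a line.
    if end > 0 and mbox[end - 1:end] != b"\n":
        return False
    line_end = mbox.find(b"\n", end)
    line = mbox[end:line_end if line_end != -1 else len(mbox)]
    return FROM_LINE.fullmatch(line) is not None
```

Note that a check like this accepts any Content-Length that happens to end at a later From_ line, which is the situation in the attached mbox: the first message's header spans the second message and stops exactly at the third.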

I agree anyway that Dovecot should do >From line quoting for APPENDed messages,
and it's in my TODO. Should probably do dequoting as well. Just not that high
priority since the current way works pretty well too..
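
For readers unfamiliar with >From quoting, here is a minimal sketch of one common convention (often called "mboxrd"), where every body line matching `>*From ` gains one extra `>` on write and loses one on read. Which variant Dovecot would adopt is not stated in this thread, so the choice of convention and the function names are assumptions.

```python
import re

# Body lines that are "From " preceded by zero or more ">" characters.
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_DEQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def quote_from_lines(body: str) -> str:
    """Add one '>' to each body line that could be mistaken for a From_ line."""
    return _QUOTE.sub(r">\1", body)


def dequote_from_lines(body: str) -> str:
    """Remove exactly one level of '>' quoting added by quote_from_lines."""
    return _DEQUOTE.sub(r"\1", body)
```

Quoting already-quoted lines one level deeper is what makes the round trip reversible; a scheme that only quotes bare `From ` lines cannot be dequoted unambiguously.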

The reason why the attached mbox file got broken was because some software read
the mbox file, but for some reason didn't notice the second message's From_ line
and instead thought that it belonged to the first message. Then it wrote  a
broken Content-Length header.

I suppose the above could have been done by Dovecot. I haven't heard that exact
problem before, but 0.99.x's mbox code isn't all that great..

No way to configure Dovecot to ignore Content-Length, but it'd be simple to
remove it from code.
Comment 6 Thomas J. Baker 2005-07-19 08:37:10 EDT
FWIW, here is some history. I switched from using Evolution with local files to
Evolution with IMAP. I made a script that copied the local evo files to an imap
directory and I noticed that some of the mbox files had a few less messages in
them when they were read back in via Evo/IMAP. If from within Evo I copied all
the messages from a local mbox to a new imap mbox, I never had a problem. The
ones where I copied the mbox files outside of evolution sometimes had problems.
I don't know if Evolution wrote the bad content-length in the mbox files or if
an older version of dovecot wrote them upon initial import. 
Comment 7 John Dennis 2005-07-22 15:44:07 EDT
I'm closing this out as not a bug. The content-length header in the first
message is clearly wrong. The fact that it points to the 3rd message means dovecot
can't detect that it's bogus. Based on comment #6 I suspect some agent other
than dovecot incorrectly parsed the mbox file. I have not seen any other examples
of dovecot mangling content-length.

FWIW you may want to consider using maildir instead of mbox; it's much more
efficient and robust. Plus you won't have any problem with content-length. A
quick google will reveal many mbox-to-maildir conversion tools.
Comment 8 Thomas J. Baker 2005-08-19 11:01:59 EDT
So I tried removing the Content-Length: line from the affected file so that
dovecot could regenerate it and it seems dovecot regenerates it incorrectly.
(I'm assuming that evolution via imap using dovecot = dovecot generating the
Content-Length field. Is this correct?) I'll attach four files next. First is
the original mbox with incorrect Content-Length:, should be the same as the
original attachment to this bug. Second is the simple script used to remove the
Content-Length line. Third is the mbox with Content-Length: lines removed. Last
is the file after I start up evolution and the Content-Length: is regenerated by
dovecot (note assumption above.)
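
The attached script is not reproduced in this thread. As a rough illustration of the same idea, the sketch below drops Content-Length: lines from message headers only and leaves body text alone. It is not the attached script; the function name and the naive boundary detection (a "From " line at the start of the file or after a blank line) are assumptions.

```python
def strip_content_length(mbox_text: str) -> str:
    """Remove Content-Length headers from every message in an mbox string."""
    out = []
    in_headers = False
    prev_blank = True  # the start of the file counts as a message boundary
    for line in mbox_text.splitlines(keepends=True):
        if prev_blank and line.startswith("From "):
            in_headers = True  # a new message begins, headers follow
        elif in_headers and line.strip() == "":
            in_headers = False  # blank line ends the header section
        elif in_headers and line.lower().startswith("content-length:"):
            prev_blank = False
            continue  # drop the header line
        prev_blank = line.strip() == ""
        out.append(line)
    return "".join(out)
```

Because the boundary test is naive, an unquoted body line starting with "From " right after a blank line would be treated as a new message, so a copy of the mailbox should be kept before running anything like this.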
Comment 9 Thomas J. Baker 2005-08-19 11:06:33 EDT
Created attachment 117912 [details]
original mbox with incorrect Content-Length:
Comment 10 Thomas J. Baker 2005-08-19 11:09:43 EDT
Created attachment 117913 [details]
simple script to remove Content-Length:
Comment 11 Thomas J. Baker 2005-08-19 11:11:24 EDT
Created attachment 117914 [details]
mbox with Content-Length: removed
Comment 12 Thomas J. Baker 2005-08-19 11:12:29 EDT
Created attachment 117915 [details]
mbox after Evo/IMAP/Dovecot regenerates Content-Length:
Comment 13 Thomas J. Baker 2005-08-19 11:14:31 EDT
mbox as parsed by mail:

wintermute> mail -f Tech\ Tips
Mail version 8.1 6/6/93.  Type ? for help.
"Tech Tips": 3 messages
>   1 carolynn@moomoo.csd.  Fri Nov 6 14:52:  78/2421  "cid 1035147"
    2 hewitt@sgi.com        Fri Feb 4 18:18: 103/4093  "Re: SCSI Question"
    3 CrystalMail2@sgi.com  Wed Oct 20 11:37  64/2797  "Regarding Case Number"
& x
wintermute>
Comment 14 Timo Sirainen 2005-08-19 11:40:38 EDT
Looks like that mbox has a bit special From-line. Dovecot assumes that if the
day has only one digit, it's prefixed by either zero or an extra space. Your
mbox doesn't have either. Do you know what program generated the From-lines? I
can change the parser to accept the digit without either zero or space prefix,
but I haven't before seen mbox without either of them..
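
To make the difference concrete, here is a small sketch of a strict day-of-month check next to a lenient one. The patterns are simplified assumptions for illustration, not Dovecot's parser, and the sample From-lines used with them are made up rather than taken from the attached mbox.

```python
import re

# Simplified From_ line layout: "From sender Www Mmm DD hh:mm:ss YYYY".
_PREFIX = r"From \S+ [A-Z][a-z]{2} [A-Z][a-z]{2} "
_SUFFIX = r" \d{2}:\d{2}:\d{2} \d{4}"

# Strict: a one-digit day must be padded with a zero or an extra space.
STRICT = re.compile(_PREFIX + r"(?:\d{2}| \d)" + _SUFFIX)
# Lenient: additionally accept a bare single digit with no padding.
LENIENT = re.compile(_PREFIX + r"(?:\d{2}| ?\d)" + _SUFFIX)


def is_from_line(line: str, lenient: bool = False) -> bool:
    """Return True if the line matches the chosen From_ line pattern."""
    pattern = LENIENT if lenient else STRICT
    return pattern.fullmatch(line) is not None
```

With these patterns, `From a@example.com Fri Nov 6 14:52:00 1998` is rejected by the strict check and accepted by the lenient one, while the zero-padded and space-padded forms pass both.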
Comment 15 Thomas J. Baker 2005-08-19 14:05:05 EDT
Going by dates, I would guess that I read the first mail with MediaMail on an
SGI IRIX machine; the second and third were probably read with Evolution.

Comment 16 Thomas J. Baker 2005-08-19 14:05:55 EDT
P.S. I guess it's possible the second was with MediaMail too. When did Evolution
come out?
Comment 17 John Dennis 2005-08-19 14:14:24 EDT
<2-cents>

I'll throw my 2 cents in here, worth what you paid for it :-) To me this is a
classic example of why mbox is a less than wonderful choice of mail storage
format. There is no official standardization. The more pieces of software that
get a chance to munge it with their own notion of what the format is the greater
the chance of problems. You can add to this the problems of performance and
catastrophic loss.

If you can switch to maildir I think you'll be happier. There are many tools to
convert mbox to maildir.

</ 2-cents>
Comment 18 Thomas J. Baker 2005-08-24 09:08:54 EDT
I've searched and found many mbox to maildir converters. Any experience with any
of them? I just want to use one that is known to work well.
Comment 19 Thomas J. Baker 2005-08-24 09:12:58 EDT
And what about the system mailbox in /var/mail? Does it need to be converted,
can sendmail handle that, can dovecot handle an mbox system mailbox and maildir
personal mailboxes? Depending on the answers, converting to maildir may be more
involved/harder to do for widespread use. TIA...
Comment 20 Thomas J. Baker 2005-08-24 09:55:52 EDT
I used http://batleth.sapienti-sat.org/projects/mb2md/ to do the conversion and
simply modified my .procmailrc to do maildir delivery. Google, as always, was my
friend. It fixed the problem with that mailbox too. For future reference, the
.procmailrc key ingredients were these:

MAILDIR=$HOME/Maildir/
DEFAULT=$MAILDIR

So this didn't fix the bug but worked around it. I don't know if it should be
closed or not.
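
For anyone without mb2md at hand, the same kind of conversion can be sketched with Python's standard mailbox module. This is not what was used above; the function name and paths are hypothetical. As far as I know, Python's mbox reader splits messages on "From " lines and does not use Content-Length, so it may split a damaged file differently than dovecot does.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a maildir.
    Returns the number of messages copied."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    try:
        count = 0
        for message in src:
            dst.add(message)  # the message is converted to maildir form on add
            count += 1
        dst.flush()
        return count
    finally:
        src.close()
        dst.close()
```

A call such as `mbox_to_maildir("Tech Tips", "Maildir-test")` leaves the original mbox untouched, so the result can be inspected before switching delivery over.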
Comment 21 Petr Rockai 2006-04-03 07:08:25 EDT
Since the mbox file that fails to parse is incorrect, I suppose this is not
a bug in dovecot. The From line parsing problem may be a bug in either the
generator or the parser. I am not sure whether the changes in 1.0beta2, which is
part of fc5, affected this. If someone needs dovecot to parse such mbox files, I
would suggest trying the fc5 dovecot first and, if it still does not work, filing
an enhancement request. Not that it is too likely.
