From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.8) Gecko/20050524 Fedora/1.0.4-4 Firefox/1.0.4

Description of problem:
I've got this mbox file that I access from Evolution via dovecot. It's a simple mbox with three messages. The problem is that Evolution/dovecot displays two messages, with the middle one appended to the first one. If I look at the mbox file from a shell with 'mail -f mbox', it shows all three messages correctly. I'll attach said mbox file next.

Version-Release number of selected component (if applicable):
dovecot-0.99.14-4.fc4

How reproducible:
Always

Steps to Reproduce:
1. Access this particular mbox file via an IMAP client and dovecot

Actual Results:
Two messages instead of three

Expected Results:
Three messages

Additional info:
I've seen this before when I was migrating from local Evolution mbox files to an IMAP setup. This is just a simple case that should be good for debugging.
Created attachment 116892 [details] mbox file with three messages
Thank you for attaching a reproducer. I've been trying to reproduce some of the problems in bug #162781 and had not come up with a reproducer, so this should help.
It's because the first message has a Content-Length header that includes the whole second message. Something has written a wrong Content-Length header to it, and the bug is really with that software. I don't think it's a bug to use that header if it's there and it's valid (i.e. it points exactly to the beginning of the next From-line).
What timing! I had just figured out that the Content-Length was the culprit. Timo, is there a way to configure dovecot not to use Content-Length? It's problematic (see http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html). As to who is writing it, I think it's procmail, and this may suggest a solution (http://www.mhonarc.org/archive/html/procmail/1997-08/msg00691.html). I'm not 100% sure yet, still investigating.
The first discussion is almost 10 years old. Strict From-line parsing works very well nowadays. I have had to make only a couple of minor changes to the detection in the last 3 years.

So, Dovecot uses the Content-Length header *IF* it points to the beginning of the next From_ line; otherwise it's ignored (and recreated). This works well even with bogus Content-Length headers, as it's very unlikely that the header contains the exact length of the message body plus the next message.

I agree anyway that Dovecot should do >From line quoting for APPENDed messages, and it's in my TODO. It should probably do dequoting as well. It's just not that high a priority, since the current way works pretty well too.

The reason the attached mbox file got broken is that some software read the mbox file but for some reason didn't notice the second message's From_ line, and instead thought that it belonged to the first message. Then it wrote a broken Content-Length header.

I suppose the above could have been done by Dovecot. I haven't heard of that exact problem before, but 0.99.x's mbox code isn't all that great.

There is no way to configure Dovecot to ignore Content-Length, but it'd be simple to remove it from the code.
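The rule described in the comment above (trust Content-Length only when it lands exactly on the next From_ line, otherwise fall back to scanning for From_ lines) can be sketched roughly as follows. This is an illustrative Python sketch, not Dovecot's code: the function names are hypothetical, the From_ check is a bare "From " prefix test rather than Dovecot's strict From-line parser, and Unix line endings are assumed.

```python
import re

# Matches a Content-Length header line inside a header block.
CONTENT_LENGTH = re.compile(rb"^Content-Length:[ \t]*(\d+)[ \t]*$", re.I | re.M)


def _next_from(data: bytes, start: int) -> int:
    """Offset of the next line beginning with 'From ', or len(data)."""
    idx = data.find(b"\nFrom ", start)
    return len(data) if idx == -1 else idx + 1


def _trusted_end(data: bytes, body_start: int, length: int):
    """Return the next message's offset if Content-Length looks valid, else None."""
    end = body_start + length
    if end > len(data):
        return None
    # After the body there should be only blank lines, then 'From ' or EOF.
    rest = data[end:].lstrip(b"\n")
    if rest == b"" or rest.startswith(b"From "):
        return len(data) - len(rest)
    return None


def split_mbox(data: bytes) -> list:
    """Split raw mbox bytes into per-message byte strings."""
    messages = []
    pos = 0
    while pos < len(data):
        hdr_end = data.find(b"\n\n", pos)
        body_start = len(data) if hdr_end == -1 else hdr_end + 2
        nxt = None
        match = CONTENT_LENGTH.search(data[pos:body_start])
        if match:
            nxt = _trusted_end(data, body_start, int(match.group(1)))
        if nxt is None:
            # Header missing or implausible: scan for the next From_ line.
            nxt = _next_from(data, body_start - 1)
        messages.append(data[pos:nxt])
        pos = nxt
    return messages
```

With this logic, a Content-Length that happens to cover the first body plus the entire second message still ends exactly at a From_ line, so it is accepted and the second message is swallowed, which matches the symptom reported here.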
FWIW, here is some history. I switched from using Evolution with local files to Evolution with IMAP. I made a script that copied the local evo files to an imap directory and I noticed that some of the mbox files had a few less messages in them when they were read back in via Evo/IMAP. If from within Evo I copied all the messages from a local mbox to a new imap mbox, I never had a problem. The ones where I copied the mbox files outside of evolution sometimes had problems. I don't know if Evolution wrote the bad content-length in the mbox files or if an older version of dovecot wrote them upon initial import.
I'm closing this out as not a bug. The Content-Length header in the first message is clearly wrong. The fact that it points to the 3rd message means dovecot can't detect that it's bogus. Based on comment #6, I suspect some agent other than dovecot incorrectly parsed the mbox file. I have not seen any other examples of dovecot mangling Content-Length. FWIW, you may want to consider using maildir instead of mbox; it's much more efficient and robust. Plus you won't have any problems with Content-Length. A quick google will reveal many maildir conversion tools.
So I tried removing the Content-Length: line from the affected file so that dovecot could regenerate it and it seems dovecot regenerates it incorrectly. (I'm assuming that evolution via imap using dovecot = dovecot generating the Content-Length field. Is this correct?) I'll attach four files next. First is the original mbox with incorrect Content-Length:, should be the same as the original attachment to this bug. Second is the simple script used to remove the Content-Length line. Third is the mbox with Content-Length: lines removed. Last is the file after I start up evolution and the Content-Length: is regenerated by dovecot (note assumption above.)
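For readers without access to the attachments, here is a minimal Python sketch of one way to strip Content-Length headers from an mbox. It is not the attached script (whose contents are not shown in this thread); the function name is hypothetical, and it assumes a new message starts at a "From " line that is either first in the file or preceded by a blank line.

```python
def strip_content_length(lines):
    """Return the mbox lines with Content-Length headers removed.

    Only lines inside a header block are dropped; body text that merely
    looks like a Content-Length header is left alone.
    """
    out = []
    in_headers = False
    prev_blank = True  # treat start of file like "after a blank line"
    for line in lines:
        if prev_blank and line.startswith("From "):
            in_headers = True       # From_ line opens a new header block
        elif in_headers and line.strip() == "":
            in_headers = False      # blank line ends the header block
        prev_blank = line.strip() == ""
        if in_headers and line.lower().startswith("content-length:"):
            continue                # drop the header
        out.append(line)
    return out
```

A typical use would be to read the file with `readlines()`, pass the list through `strip_content_length`, and write the result to a new file rather than overwriting the original.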
Created attachment 117912 [details] original mbox with incorrect Content-Length:
Created attachment 117913 [details] simple script to remove Content-Length:
Created attachment 117914 [details] mbox with Content-Length: removed
Created attachment 117915 [details] mbox after Evo/IMAP/Dovecot regenerates Content-Length:
mbox as parsed by mail:
wintermute> mail -f Tech\ Tips
Mail version 8.1 6/6/93. Type ? for help.
"Tech Tips": 3 messages
> 1 carolynn. Fri Nov 6 14:52: 78/2421 "cid 1035147"
  2 hewitt Fri Feb 4 18:18: 103/4093 "Re: SCSI Question"
  3 CrystalMail2 Wed Oct 20 11:37 64/2797 "Regarding Case Number"
& x
wintermute>
It looks like that mbox has a slightly unusual From-line. Dovecot assumes that if the day has only one digit, it's prefixed by either a zero or an extra space. Your mbox has neither. Do you know what program generated the From-lines? I can change the parser to accept the digit without a zero or space prefix, but I haven't seen an mbox without either of them before.
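The difference between the strict day-of-month check described above and a more lenient one can be illustrated with two regular expressions. This is a hedged Python sketch, not Dovecot's parser: it assumes an asctime-style From_ line ("From sender Www Mmm dd hh:mm[:ss] yyyy"), and the pattern and helper names are hypothetical.

```python
import re

# Strict: a one-digit day must be padded with a zero or an extra space.
STRICT_DAY = r"(?:[ 0][1-9]|[12]\d|3[01])"
# Lenient: the padding character is optional.
LENIENT_DAY = r"(?:[ 0]?[1-9]|[12]\d|3[01])"


def from_line_re(day_pattern):
    """Build a From_ line matcher using the given day-of-month pattern."""
    return re.compile(
        r"^From \S+ (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
        r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
        + day_pattern
        + r" \d{2}:\d{2}(?::\d{2})? \d{4}$"
    )


STRICT = from_line_re(STRICT_DAY)
LENIENT = from_line_re(LENIENT_DAY)

# Example address is made up; the third line has an unpadded day.
padded_space = "From user@example.com Fri Nov  6 14:52:00 1998"
padded_zero = "From user@example.com Fri Nov 06 14:52:00 1998"
unpadded = "From user@example.com Fri Nov 6 14:52:00 1998"

for line in (padded_space, padded_zero, unpadded):
    print(bool(STRICT.match(line)), bool(LENIENT.match(line)), line)
```

Under the strict pattern the unpadded line is not recognized as a message boundary, so a parser relying on it would treat that line as part of the previous message's body.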
Going by dates, I would guess that the first mail I read with MediaMail on an SGI IRIX machine, the second and third were probably read with Evolution.
P.S. I guess it's possible the second was with MediaMail too. When did Evolution come out?
<2-cents> I'll throw my 2 cents in here, worth what you paid for it :-) To me this is a classic example of why mbox is a less than wonderful choice of mail storage format. There is no official standardization. The more pieces of software that get a chance to munge it with their own notion of what the format is the greater the chance of problems. You can add to this the problems of performance and catastrophic loss. If you can switch to maildir I think you'll be happier. There are many tools to convert mbox to maildir. </ 2-cents>
I've searched and found many mbox to maildir converters. Does anyone have experience with any of them? I just want to use one that is known to work well.
And what about the system mailbox in /var/mail? Does it need to be converted? Can sendmail handle that? Can dovecot handle an mbox system mailbox together with maildir personal mailboxes? Depending on the answers, converting to maildir may be more involved or harder to do for widespread use. TIA...
I used http://batleth.sapienti-sat.org/projects/mb2md/ to do the conversion and simply modified my .procmailrc to do maildir delivery. Google, as always, was my friend. It fixed the problem with that mailbox too. For future reference, the key ingredients in .procmailrc were these:
MAILDIR=$HOME/Maildir/
DEFAULT=$MAILDIR
So this didn't fix the bug but worked around it. I don't know if it should be closed or not.
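As an alternative illustration of the conversion step (not mb2md, which is the tool actually used above), Python's standard `mailbox` module can copy messages from an mbox into a Maildir. This is a minimal sketch: the function name and paths are hypothetical, and it does no locking, so it should only be run on a mailbox that nothing else is writing to.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for message in src:
            dst.add(message)  # each message becomes its own file
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

Comparing the returned count with what `mail -f` reports for the same file is a quick sanity check that no messages were merged during parsing.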
Since the mbox file that fails to parse is incorrect, I suppose this is not a bug in dovecot. The From line parsing may be a bug in either the generator or the parser. I am not sure whether the changes in 1.0beta2, which is part of fc5, affected this. If someone needs dovecot to parse such mbox files, I would suggest trying the fc5 dovecot first and, if it still does not work, filing an enhancement request. Not that it is too likely.