Bug 192151

Summary: dovecot causing duplicate message to be created
Product: [Fedora] Fedora Reporter: Paul Dickson <paul>
Component: dovecotAssignee: Tomas Janousek <tjanouse>
Status: CLOSED RAWHIDE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: rawhideCC: tss, wtogami
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-04-24 07:25:55 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Maillog output from dovecot
none
Dovecot maillog entries
none
Maillog with dovecot download of 21 messages that has duplications none

Description Paul Dickson 2006-05-17 21:50:23 UTC
Description of problem:
Whenever I'm off-line a while and messages are queued up, when I download all
new message using POP3, not all of the messages get deleted after downloading.
So the second time I download my E-mail, I get a bunch of duplicate messages.

Version-Release number of selected component (if applicable):
dovecot-1.0-0.beta7.1

How reproducible:
Always

Steps to Reproduce:
1. Have at least a 1000 messages queued to download.
2. Download the messages
3.
  
Actual results:
All messages are downloaded the first time.  The second time, part of the first
download is again downloaded.

Expected results:
Messages are doenloaded only once.

Additional info:
I have been trying to track down the source of the duplicate messages for weeks.
 I noticed this morning that after downloading all 1820 message that I had 471
after just 50 minutes!  I get a lot of E-mail, but never this much in an hour.

Comment 1 Paul Dickson 2006-05-17 21:50:23 UTC
Created attachment 129368 [details]
Maillog output from dovecot

Comment 2 Paul Dickson 2006-05-18 00:59:42 UTC
Created attachment 129379 [details]
Dovecot maillog entries

I was read my mail and noticed more duplicate messages on Monday.  Here are
more entries in maillog from dovecot.  See second attached file.

Comment 3 Petr Rockai 2006-10-10 16:09:15 UTC
Could you please re-test with current dovecot release from devel? It should be 
up to rc7 and the problem may have been addressed upstream. In the case it 
wasn't, i will try to reproduce locally and fix the issue.

Thanks!

Comment 4 Paul Dickson 2006-10-10 23:29:42 UTC
This might not have been a dovecot bug.  I removed the arguments "--expunge 50
-B 256" from fetchmail about a month ago and it seems to have stopped occuring.
 I'm not absolutely sure because some of the duplicates I found later may been
added to mail folder prior to this change, and the few others may have happened
upstream.  I have definately not seen as many duplicates.

Comment 5 Petr Rockai 2006-10-11 06:15:35 UTC
Okey, since there are no other reports of dupes appearing due to dovecot, i'll 
close this report. If you ever run into the problem again and you find any 
hints that it may be dovecot fault, please reopen the report.

Comment 6 Paul Dickson 2007-03-09 18:25:50 UTC
The duplicate messages continues to be a problem.

I'm currently using dovecot-1.0-3.rc22.fc7

For a long time now, I've been getting duplicate E-mail messages.  I've
been trying to determine where the problem lies.  It seems to be coming
from an interaction between dovecot and sylpheed.

The problem is most noticeable when I have several hundred messages
queued to be downloaded by sylpheed (I get slightly more than 100 in
an hour; the dovecot mailbox gets filled every 12 minutes via fetchmail;
sylpheed empties it every 30 minutes, but only while it's running).  
I do not see this problem when everything is behaving normally and
sylpheed is downloading relatively small numbers of messages.  I only
see problems when there's a backlog of email (either fetchmail could
not download messages for a period or sylpheed was not running).

I did a diff of a duplicated message and both copies are identical, so
the problem lies between dovecot and sylpheed.

I've done some checking in the past, and the first pass of downloading
messages appears download the correct number of messages.  But checking
the time stamps on the above duplicated message says the duplicate was
downloaded again 31 minutes after the first.

[dickson@white 10:50:53 test]$ ll 55793 55866
-rw------- 1 dickson dickson 4933 2007-03-05 17:59 55793
-rw------- 1 dickson dickson 4933 2007-03-05 18:30 55866
[dickson@white 11:10:14 test]$ diff -c 55793 55866
[dickson@white 11:13:22 test]$

It seems that dovecot might not be deleting ALL messages after the
first downloading completes.

Comment 7 Paul Dickson 2007-03-12 04:02:02 UTC
Yesterday, after turning exiting sylpheed for a couple of hours, I restarted it.
 There were 376 messages to be downloaded.  Upon completion, there was still 95
messages remaining.  Getting those messages ended up retrieving duplicate
messages (of the those messages, the sizes of the first 5 and the last 5 had
matching sizes with the corresponding files log entries of the first download.

Today, I hibernated my notebook.  After resuming, I had sylpheed download 1024
messages.  After the download, a "mail -d /var/mail/dickson" said I still had
944 unread messages (I moved those messages and deleted them).

dovecot is not deleting the downloaded e-mail.

Comment 8 Timo Sirainen 2007-03-12 19:02:38 UTC
Usually problems like these have been because the connection has hanged, so that the client didn't have a 
chance to do QUIT command. Look at the POP3 logout messages, they tell what the reason was for the 
client quit, how many messages were downloaded and how many were deleted. What does it say for those 
sessions where it didn't delete all the mails?

Do your mails have CR+LF linefeeds in the server? I just found a bug related to that which could cause the 
connection to hang.


Comment 9 Paul Dickson 2007-03-12 23:43:42 UTC
I pretty sure the sylpheed got as far as sending the QUIT, but that part of
yesterday's activity has already scrolled out of the log.  I've just now
disabled automatic downloading, so I'll attempt a big download after several hours.

Looking at the block of duplicate messages shows no ^Ms in the mbox formated file.

I'm going to look to see if dovecot is doing any logging activity...

Comment 10 Paul Dickson 2007-03-12 23:51:39 UTC
Here are the logging activity that dovecot did when I downloaded the 1024
messages yesterday:

Mar 11 20:48:01 red dovecot: pop3-login: Login: user=<dickson>, method=PLAIN,
rip=::ffff:192.168.1.9, lip=::ffff:192.168.1.3, TLS
Mar 11 20:49:01 red dovecot: POP3(dickson): mbox sync: UID inserted in the
middle of mailbox /var/mail/dickson (540600 > 182, seq=82, i
dx_msgs=1024)
Mar 11 20:49:02 red dovecot: POP3(dickson): Disconnected: Logged out top=0/0,
retr=1024/6678735, del=1024/1024, size=6661165


Comment 11 Timo Sirainen 2007-03-12 23:59:27 UTC
That explains the problem. Now you'd just have to figure out why that happens.
See my reply to a similar problem:

http://dovecot.org/list/dovecot/2007-March/020592.html

One thing that comes to my mind is that you're using a root dir (or index dir)
that is shared by multiple pop3 users.

Comment 12 Paul Dickson 2007-03-13 00:27:16 UTC
Why the extremely large X-UID (540600)?  I certainly did not have 540600
messages, or might this be a byte offset into the mbox file?  Any way to see the
index file that dovecot is using?  

In my case, sendmail is passing the message to procmail before it gets placed in
/var/mail/dickson.  No other (known) processes are accessing this file except
dovecot.  I don't even read email on this server.  


Looking at last week's maillog file, the above event (comment #10) happened 4
times, once while it was transferring only 21 messages.  I'm going to see if I
can track down the last "official" message in this group from the sendmail
logged info.  Maybe I see something noteworthy (like UTF-8 or something).

Comment 13 Paul Dickson 2007-03-13 02:19:13 UTC
Created attachment 149895 [details]
Maillog with dovecot download of 21 messages that has duplications

The excerpt from maillog has 21 messages added to /var/mail/dickson (plus one
additional that went to root).	Of these 21, 5 were duplicated.

16  I  python-list
17  L  linux-kernel
18  M  linux-kernel
19  K  linux-ide
20  N  python-list
21  G  fedora-maintainers

The letter in column 2 is the 8th char in the sendmail unique ID.

Assuming I have the mbox order correct above, message 16 was not duplicated,
but the rest were (I only verified 16 through 18).

The only thing noteworthy of message 16 was that the subject line spanned two
lines.

Comment 14 Paul Dickson 2007-03-14 23:51:16 UTC
Well after nearly 48 hours of not downloading E-mail, I had 2968 message in
mailbox (plus the sentinel message, for 2969).  I downloaded the messages in
sylpheed:

[16:40:42] POP3> DELE 2967
[16:40:42] POP3< +OK Marked to be deleted.
[16:40:42] POP3> RETR 2968
[16:40:42] POP3< +OK 5126 octets
[16:40:42] POP3> DELE 2968
[16:40:42] POP3< +OK Marked to be deleted.
[16:40:42] POP3> QUIT
[16:40:44] POP3< +OK Logging out, messages deleted.

I went back to the mail box:

[root@red 16:42:17 log]# mail -f /var/mail/dickson 
Mail version 8.1 6/6/93.  Type ? for help.
"/var/mail/dickson": 2483 messages 1 new 2482 unread

Mar 14 16:36:37 red dovecot: pop3-login: Login: user=<dickson>, method=PLAIN,
rip=::ffff:192.168.1.9, lip=::ffff:192.168.1.3, TLS
Mar 14 16:40:47 red dovecot: POP3(dickson): mbox sync: UID inserted in the
middle of mailbox /var/mail/dickson (544514 > 285, seq=489, 
idx_msgs=2968)
Mar 14 16:40:48 red dovecot: POP3(dickson): Disconnected: Logged out top=0/0,
retr=2968/18360893, del=2968/2968, size=18309982

Apparently dovecot is downloading the E-mail, but is getting confused later as
to what it has done (the times on the two systems are synchronized to the same
time source).

Comment 15 Timo Sirainen 2007-03-15 00:22:13 UTC
I'd suggest you try if 1.0.rc27 has fixed this. There have been several fixes
related to this error since rc22.

Comment 16 Tomas Janousek 2007-03-19 15:59:27 UTC
(there's dovecot-1.0-7.rc27.fc7 available in dovecot since now)

Comment 17 Paul Dickson 2007-03-22 05:05:48 UTC
I've switched to dovecot-1.0-7.rc27.fc7, but in the past 27 hours I've only
accumulated 1700 messages.  I'm waiting for at least 2000.  So it appears I
won't be testing (or reading new email) for 20 more hours (when I'm available
again for testing).

I should note that with dovecot-1.0-3.rc22.fc7, I discovered I could avoid the
problem if sylpheed's logging window was open and watching each message getting
downloaded.  I guess this might have slowed sylpheed down enough that dovecot
didn't choke on itself. :=)

Comment 18 Paul Dickson 2007-03-23 01:04:42 UTC
After 47 hours I had 2962 messages queued, so I let sylpheed download them.

Afterwards, there was still 1625 messages left on the server.


Mar 22 17:32:26 red dovecot: pop3-login: Login: user=<dickson>, method=PLAIN,
rip=::ffff:192.168.1.9, lip=::ffff:192.168.1.3, TLS
Mar 22 17:35:42 red dovecot: POP3(dickson): mbox sync: UID inserted in the
middle of mailbox /var/mail/dickson (559678 > 1314, seq=1339, idx_msgs=2962)
Mar 22 17:35:42 red dovecot: POP3(dickson): Disconnected: Logged out top=0/0,
retr=2962/17659757, del=2962/2962, size=17609024


Sylpheed's log entries from when it starts (this is from the next email download):

* Connecting to POP3 server: red.pwd.internal...
[17:50:08] POP3< +OK Dovecot ready.
[17:50:08] POP3> STLS
[17:50:08] POP3< +OK Begin TLS negotiation now.
* SSL certificate of red.pwd.internal previously accepted
[17:50:08] POP3> USER dickson
[17:50:08] POP3< +OK
[17:50:08] POP3> PASS ********
[17:50:08] POP3< +OK Logged in.
[17:50:08] POP3> STAT
[17:50:08] POP3< +OK 11 83558
[17:50:08] POP3> UIDL
[17:50:08] POP3< +OK
[17:50:09] POP3> LIST
[17:50:09] POP3< +OK 11 messages:
[17:50:09] POP3> RETR 1
[17:50:09] POP3< +OK 581 octets
[17:50:09] POP3> DELE 1
[17:50:09] POP3< +OK Marked to be deleted.
...

Could there be a race condition within dovecot so at higher transfer speeds I
hit this problem?  I switched from a hub to switch more than a year ago.  And
the notebook running sylpheed has a newer cpu with slight better clock speed
than that of the dovecot server (1.6GHz Pentium M vs 1.333GHz Athon), plus 150%
more RAM.

Comment 19 Timo Sirainen 2007-03-23 02:48:13 UTC
Thanks, fixed: http://dovecot.org/list/dovecot-cvs/2007-March/008300.html

It would anyway be a good idea to strip X-UID headers before they're written to
mbox file (see http://wiki.dovecot.org/MboxProblems)


Comment 20 Paul Dickson 2007-03-23 16:33:20 UTC
I'm added this procmailrc recipe to see if I can capture some of these before
dovecot filters them out:

:0 c
* ^X-UID
/home/dickson/x-uid-messages


Comment 21 Tomas Janousek 2007-04-20 09:24:55 UTC
Paul, what's the status of this bug?

Comment 22 Paul Dickson 2007-04-20 18:58:34 UTC
The last time I saw the logged issue that indicated the duplication was March 26.

I installed dovecot-1.0-8.rc28.fc7 on March 28.

I've captured (meaning I've saved clones) 23 messages that have come through
with X-UID: headers during this period.

Comment 23 Tomas Janousek 2007-04-23 10:48:09 UTC
Does that mean you are no longer able to reproduce the issue with 1.0.rc28?

Comment 24 Paul Dickson 2007-04-24 06:26:00 UTC
> Does that mean you are no longer able to reproduce the issue with 1.0.rc28?

Yes, I haven't been able to reproduce this problem since updating.  I have not
done any exhaustive testing, but I'll reopen this bug report if this problem
reoccurs (I'm optimistic that I won't need to since it was fairly common before).

Comment 25 Tomas Janousek 2007-04-24 07:25:55 UTC
Ok, thx.