Bug 192151
Summary: | dovecot causing duplicate message to be created | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Paul Dickson <paul> | ||||||||
Component: | dovecot | Assignee: | Tomas Janousek <tjanouse> | ||||||||
Status: | CLOSED RAWHIDE | QA Contact: | |||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | rawhide | CC: | tss, wtogami | ||||||||
Target Milestone: | --- | Keywords: | Reopened | ||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2007-04-24 07:25:55 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Paul Dickson
2006-05-17 21:50:23 UTC
Created attachment 129368 [details]
Maillog output from dovecot
Created attachment 129379 [details]
Dovecot maillog entries
I was read my mail and noticed more duplicate messages on Monday. Here are
more entries in maillog from dovecot. See second attached file.
Could you please re-test with current dovecot release from devel? It should be up to rc7 and the problem may have been addressed upstream. In the case it wasn't, i will try to reproduce locally and fix the issue. Thanks! This might not have been a dovecot bug. I removed the arguments "--expunge 50 -B 256" from fetchmail about a month ago and it seems to have stopped occuring. I'm not absolutely sure because some of the duplicates I found later may been added to mail folder prior to this change, and the few others may have happened upstream. I have definately not seen as many duplicates. Okey, since there are no other reports of dupes appearing due to dovecot, i'll close this report. If you ever run into the problem again and you find any hints that it may be dovecot fault, please reopen the report. The duplicate messages continues to be a problem. I'm currently using dovecot-1.0-3.rc22.fc7 For a long time now, I've been getting duplicate E-mail messages. I've been trying to determine where the problem lies. It seems to be coming from an interaction between dovecot and sylpheed. The problem is most noticeable when I have several hundred messages queued to be downloaded by sylpheed (I get slightly more than 100 in an hour; the dovecot mailbox gets filled every 12 minutes via fetchmail; sylpheed empties it every 30 minutes, but only while it's running). I do not see this problem when everything is behaving normally and sylpheed is downloading relatively small numbers of messages. I only see problems when there's a backlog of email (either fetchmail could not download messages for a period or sylpheed was not running). I did a diff of a duplicated message and both copies are identical, so the problem lies between dovecot and sylpheed. I've done some checking in the past, and the first pass of downloading messages appears download the correct number of messages. But checking the time stamps on the above duplicated message says the duplicate was downloaded again 31 minutes after the first. [dickson@white 10:50:53 test]$ ll 55793 55866 -rw------- 1 dickson dickson 4933 2007-03-05 17:59 55793 -rw------- 1 dickson dickson 4933 2007-03-05 18:30 55866 [dickson@white 11:10:14 test]$ diff -c 55793 55866 [dickson@white 11:13:22 test]$ It seems that dovecot might not be deleting ALL messages after the first downloading completes. Yesterday, after turning exiting sylpheed for a couple of hours, I restarted it. There were 376 messages to be downloaded. Upon completion, there was still 95 messages remaining. Getting those messages ended up retrieving duplicate messages (of the those messages, the sizes of the first 5 and the last 5 had matching sizes with the corresponding files log entries of the first download. Today, I hibernated my notebook. After resuming, I had sylpheed download 1024 messages. After the download, a "mail -d /var/mail/dickson" said I still had 944 unread messages (I moved those messages and deleted them). dovecot is not deleting the downloaded e-mail. Usually problems like these have been because the connection has hanged, so that the client didn't have a chance to do QUIT command. Look at the POP3 logout messages, they tell what the reason was for the client quit, how many messages were downloaded and how many were deleted. What does it say for those sessions where it didn't delete all the mails? Do your mails have CR+LF linefeeds in the server? I just found a bug related to that which could cause the connection to hang. I pretty sure the sylpheed got as far as sending the QUIT, but that part of yesterday's activity has already scrolled out of the log. I've just now disabled automatic downloading, so I'll attempt a big download after several hours. Looking at the block of duplicate messages shows no ^Ms in the mbox formated file. I'm going to look to see if dovecot is doing any logging activity... Here are the logging activity that dovecot did when I downloaded the 1024 messages yesterday: Mar 11 20:48:01 red dovecot: pop3-login: Login: user=<dickson>, method=PLAIN, rip=::ffff:192.168.1.9, lip=::ffff:192.168.1.3, TLS Mar 11 20:49:01 red dovecot: POP3(dickson): mbox sync: UID inserted in the middle of mailbox /var/mail/dickson (540600 > 182, seq=82, i dx_msgs=1024) Mar 11 20:49:02 red dovecot: POP3(dickson): Disconnected: Logged out top=0/0, retr=1024/6678735, del=1024/1024, size=6661165 That explains the problem. Now you'd just have to figure out why that happens. See my reply to a similar problem: http://dovecot.org/list/dovecot/2007-March/020592.html One thing that comes to my mind is that you're using a root dir (or index dir) that is shared by multiple pop3 users. Why the extremely large X-UID (540600)? I certainly did not have 540600 messages, or might this be a byte offset into the mbox file? Any way to see the index file that dovecot is using? In my case, sendmail is passing the message to procmail before it gets placed in /var/mail/dickson. No other (known) processes are accessing this file except dovecot. I don't even read email on this server. Looking at last week's maillog file, the above event (comment #10) happened 4 times, once while it was transferring only 21 messages. I'm going to see if I can track down the last "official" message in this group from the sendmail logged info. Maybe I see something noteworthy (like UTF-8 or something). Created attachment 149895 [details]
Maillog with dovecot download of 21 messages that has duplications
The excerpt from maillog has 21 messages added to /var/mail/dickson (plus one
additional that went to root). Of these 21, 5 were duplicated.
16 I python-list
17 L linux-kernel
18 M linux-kernel
19 K linux-ide
20 N python-list
21 G fedora-maintainers
The letter in column 2 is the 8th char in the sendmail unique ID.
Assuming I have the mbox order correct above, message 16 was not duplicated,
but the rest were (I only verified 16 through 18).
The only thing noteworthy of message 16 was that the subject line spanned two
lines.
Well after nearly 48 hours of not downloading E-mail, I had 2968 message in mailbox (plus the sentinel message, for 2969). I downloaded the messages in sylpheed: [16:40:42] POP3> DELE 2967 [16:40:42] POP3< +OK Marked to be deleted. [16:40:42] POP3> RETR 2968 [16:40:42] POP3< +OK 5126 octets [16:40:42] POP3> DELE 2968 [16:40:42] POP3< +OK Marked to be deleted. [16:40:42] POP3> QUIT [16:40:44] POP3< +OK Logging out, messages deleted. I went back to the mail box: [root@red 16:42:17 log]# mail -f /var/mail/dickson Mail version 8.1 6/6/93. Type ? for help. "/var/mail/dickson": 2483 messages 1 new 2482 unread Mar 14 16:36:37 red dovecot: pop3-login: Login: user=<dickson>, method=PLAIN, rip=::ffff:192.168.1.9, lip=::ffff:192.168.1.3, TLS Mar 14 16:40:47 red dovecot: POP3(dickson): mbox sync: UID inserted in the middle of mailbox /var/mail/dickson (544514 > 285, seq=489, idx_msgs=2968) Mar 14 16:40:48 red dovecot: POP3(dickson): Disconnected: Logged out top=0/0, retr=2968/18360893, del=2968/2968, size=18309982 Apparently dovecot is downloading the E-mail, but is getting confused later as to what it has done (the times on the two systems are synchronized to the same time source). I'd suggest you try if 1.0.rc27 has fixed this. There have been several fixes related to this error since rc22. (there's dovecot-1.0-7.rc27.fc7 available in dovecot since now) I've switched to dovecot-1.0-7.rc27.fc7, but in the past 27 hours I've only accumulated 1700 messages. I'm waiting for at least 2000. So it appears I won't be testing (or reading new email) for 20 more hours (when I'm available again for testing). I should note that with dovecot-1.0-3.rc22.fc7, I discovered I could avoid the problem if sylpheed's logging window was open and watching each message getting downloaded. I guess this might have slowed sylpheed down enough that dovecot didn't choke on itself. :=) After 47 hours I had 2962 messages queued, so I let sylpheed download them. Afterwards, there was still 1625 messages left on the server. Mar 22 17:32:26 red dovecot: pop3-login: Login: user=<dickson>, method=PLAIN, rip=::ffff:192.168.1.9, lip=::ffff:192.168.1.3, TLS Mar 22 17:35:42 red dovecot: POP3(dickson): mbox sync: UID inserted in the middle of mailbox /var/mail/dickson (559678 > 1314, seq=1339, idx_msgs=2962) Mar 22 17:35:42 red dovecot: POP3(dickson): Disconnected: Logged out top=0/0, retr=2962/17659757, del=2962/2962, size=17609024 Sylpheed's log entries from when it starts (this is from the next email download): * Connecting to POP3 server: red.pwd.internal... [17:50:08] POP3< +OK Dovecot ready. [17:50:08] POP3> STLS [17:50:08] POP3< +OK Begin TLS negotiation now. * SSL certificate of red.pwd.internal previously accepted [17:50:08] POP3> USER dickson [17:50:08] POP3< +OK [17:50:08] POP3> PASS ******** [17:50:08] POP3< +OK Logged in. [17:50:08] POP3> STAT [17:50:08] POP3< +OK 11 83558 [17:50:08] POP3> UIDL [17:50:08] POP3< +OK [17:50:09] POP3> LIST [17:50:09] POP3< +OK 11 messages: [17:50:09] POP3> RETR 1 [17:50:09] POP3< +OK 581 octets [17:50:09] POP3> DELE 1 [17:50:09] POP3< +OK Marked to be deleted. ... Could there be a race condition within dovecot so at higher transfer speeds I hit this problem? I switched from a hub to switch more than a year ago. And the notebook running sylpheed has a newer cpu with slight better clock speed than that of the dovecot server (1.6GHz Pentium M vs 1.333GHz Athon), plus 150% more RAM. Thanks, fixed: http://dovecot.org/list/dovecot-cvs/2007-March/008300.html It would anyway be a good idea to strip X-UID headers before they're written to mbox file (see http://wiki.dovecot.org/MboxProblems) I'm added this procmailrc recipe to see if I can capture some of these before dovecot filters them out: :0 c * ^X-UID /home/dickson/x-uid-messages Paul, what's the status of this bug? The last time I saw the logged issue that indicated the duplication was March 26. I installed dovecot-1.0-8.rc28.fc7 on March 28. I've captured (meaning I've saved clones) 23 messages that have come through with X-UID: headers during this period. Does that mean you are no longer able to reproduce the issue with 1.0.rc28? > Does that mean you are no longer able to reproduce the issue with 1.0.rc28?
Yes, I haven't been able to reproduce this problem since updating. I have not
done any exhaustive testing, but I'll reopen this bug report if this problem
reoccurs (I'm optimistic that I won't need to since it was fairly common before).
Ok, thx. |