Bug 1974508

Summary: Dovecot 2.3.8 regression - can not replicate using dsync
Product: Red Hat Enterprise Linux 8
Reporter: Salatiel <salatiel.filho>
Component: dovecot
Assignee: Michal Hlavinka <mhlavink>
Status: CLOSED ERRATA
QA Contact: Evgeny Fedin <efedin>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.4
CC: efedin, gdelross, mhlavink
Target Milestone: beta
Keywords: Rebase, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: dovecot-2.3.16-1.el8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-10 14:27:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Salatiel 2021-06-21 21:29:30 UTC
Description of problem:

The latest dovecot package for Red Hat Enterprise Linux 8 is dovecot-2.3.8-9.el8.x86_64. Dovecot 2.3.8 has a serious bug that blocks replication using dsync.
The synchronization status stays at "Waiting for dsync to finish" for 10 minutes, after which the logs show "I/O has stalled, no activity for 600 seconds (version not received). Error: Timeout during state=master_recv_handshake"
This is a known regression introduced in Dovecot 2.3.8. It has been fixed upstream in 2.3.9, but Red Hat has not backported the fix yet. This is a critical bug affecting core functionality.

The Dovecot changelog for 2.3.9 lists the fix:
dsync: Remote dsync started hanging if the initial doveadm
  "dsync-server" command was sent in the same TCP packet as the
  following dsync handshake. v2.3.8 regression.

Please backport this patch or update dovecot on RHEL 8 to version 2.3.9; otherwise the replication feature will not work.


Version-Release number of selected component (if applicable):

dovecot-2.3.8-9.el8.x86_64
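
A quick way to tell whether a host carries the affected release is to match the package or version string against 2.3.8, the version the upstream changelog names. The following is only a sketch: the function name `is_affected_dovecot` is hypothetical, and it inspects nothing but the string it is given, so a distribution build of 2.3.8 that already carries a backported fix would still match.

```shell
# Hypothetical helper: succeeds (exit 0) when the given package name or
# version string refers to upstream Dovecot 2.3.8, the release named in the
# 2.3.9 changelog entry. It does not detect backported fixes.
is_affected_dovecot() {
  case "$1" in
    dovecot-2.3.8-*|2.3.8|2.3.8\ *) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on an RPM-based host (assumes rpm is installed):
#   is_affected_dovecot "$(rpm -q dovecot)" && echo "dsync regression likely present"
```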

How reproducible: 

Steps to Reproduce:
1. Create two servers and install the dovecot package on each. Configure postfix or sendmail to deliver mail to dovecot using LMTP.

2. Enable replication between them. Copy the following content to /etc/dovecot/conf.d/99-replica.conf and replace $PEER_IP with the IP address of the remote peer.

mail_plugins = $mail_plugins notify replication
service replicator {
  process_min_avail = 1
}
service replicator {
  unix_listener replicator-doveadm {
    mode = 0666
    user = dovecot
  }
}
service aggregator {
  fifo_listener replication-notify-fifo {
    mode = 0666
    user = dovecot
  }
  unix_listener replication-notify {
    mode = 0666
    user = dovecot
  }
}
service doveadm {
  inet_listener {
    port = 26
  }
}
doveadm_password = SOME_SHARED_PASSWORD_HERE
plugin {
  mail_replica = tcp:$PEER_IP:26       
}


3. Send an email to either node. The node never replicates to the peer, and 10 minutes later the stalled error message appears in the logs.
doveadm replicator dsync-status stays stuck at 'Waiting for dsync to finish' for 10 minutes and then gives up with that stalled error message.
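
Steps 2 and 3 above can be sketched as two small shell helpers. Both function names are hypothetical, `sed -i` assumes GNU sed, and the address 192.0.2.10 and the log path /var/log/maillog in the usage comments are placeholders that depend on the local setup.

```shell
# Hypothetical helper: replace the literal $PEER_IP placeholder in a config
# file with the peer's address. Arguments: <config file> <peer address>.
set_replica_peer() {
  sed -i 's/\$PEER_IP/'"$2"'/g' "$1"
}

# Hypothetical helper: succeeds (exit 0) when the given log file contains the
# handshake timeout quoted in this report.
has_dsync_stall() {
  grep -q 'Timeout during state=master_recv_handshake' "$1"
}

# Example use on one node (run as root; paths and address are assumptions):
#   set_replica_peer /etc/dovecot/conf.d/99-replica.conf 192.0.2.10
#   doveconf -n | grep mail_replica     # confirm the effective setting
#   systemctl restart dovecot
#   doveadm replicator dsync-status     # stays at "Waiting for dsync to finish" while stuck
#   has_dsync_stall /var/log/maillog && echo "dsync handshake timed out"
```

Keeping the log check as a separate function makes it easy to rerun after the 10-minute timeout has elapsed, or after upgrading, to confirm the message no longer appears.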


Actual results:
No replication is done.

Expected results:
The replication should be done immediately.

Additional info:

Exactly the same configuration works with third-party packages for 2.3.7 and 2.3.9, confirming that the bug is a regression in 2.3.8.
Several threads on the dovecot mailing list describe this problem, for example:
https://dovecot.org/pipermail/dovecot/2019-December/117774.html

Comment 14 errata-xmlrpc 2022-05-10 14:27:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: dovecot security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1950