Comment 2Daniel Alvarez Sanchez
2019-10-10 07:57:30 UTC
Under pressure, the default monitor timeout value of 20 seconds is not enough to prevent unnecessary failovers of the ovn-dbs pacemaker resource.
While spawning a few VMs in the same time this could lead to unnecessary movements of master DB, then re-connections of ovn-controllers (slaves are read-only), further peaks of load on DBs, and at the end it could lead to snowball effect.
We should bump the default value in puppet to 60 seconds and provide an option to change it in the future from THT.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0760