Bug 1412306

Summary: Document to upgrade IPA servers one at a time with enough time between updates to allow replication to finish
Product: Red Hat Enterprise Linux 7 Reporter: Petr Vobornik <pvoborni>
Component: doc-Linux_Domain_Identity_Management_Guide Assignee: Aneta Šteflová Petrová <apetrova>
Status: CLOSED CURRENTRELEASE QA Contact: Namita Soman <nsoman>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: pvoborni, rcritten, rhel-docs, tbordaz
Target Milestone: rc Keywords: Documentation
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-14 09:35:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1415716    

Description Petr Vobornik 2017-01-11 17:24:37 UTC
An IPA server is upgraded simply by updating its RPM packages, e.g. `yum update ipa-server`.

The upgrader, especially when moving between major versions, internally performs a data update: it changes data and possibly the schema in LDAP. Because IPA is a multi-master solution, these changes are then replicated automatically to the other IPA masters, which accept them without problems.

Trouble comes when two or more servers are updated at the same time or right after each other. In that case, the update from one server might not yet have been replicated to the others, and because the servers make the same or similar changes, this creates conflicting replication events. These can result in replication conflicts, which in the end might break IPA server functionality.

Therefore, the documentation needs to state that servers must be upgraded one at a time, with enough time between upgrades to let replication finish.

Comment 1 thierry bordaz 2017-01-11 17:46:11 UTC
During the upgrade, some updates will be made. Ideally, we would like to wait for all the upgrade updates to be replicated to all servers before attempting to upgrade the next server.

Under normal load, we could expect updates to be replicated within a few seconds, let's say within 10 seconds.

If we want to be sure, we may check for the operation in the access logs of the other servers. On the upgraded server, we can monitor the access log, looking for the last 'csn=xxx' value in the log (note that the access log is flushed periodically, so one may need to wait a bit to be sure all updates are in the access log).
Then a "grep 'csn=xxx' <non_upgraded_servers>:/var/log/dirsrv/slapd-<instance>/access" will indicate whether this last update was replicated.
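The check described above could be scripted roughly as follows. This is only a sketch, assuming the CSN appears in access log lines as `csn=` followed by hexadecimal digits. The function names `last_csn` and `csn_replicated` are made up for illustration, and the log paths are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of IPA or 389 DS.

# Print the last "csn=..." token found in an access log file.
# Assumes the CSN is written as csn=<hex digits>.
last_csn() {
    grep -o 'csn=[0-9a-fA-F]*' "$1" | tail -n 1
}

# Succeed if the given "csn=..." token occurs in the given access log.
# An empty token is rejected, because an empty fixed string would
# match every line and report a false positive.
csn_replicated() {
    [ -n "$1" ] && grep -q -F -- "$1" "$2"
}
```

On the upgraded server, `last_csn /var/log/dirsrv/slapd-<instance>/access` would give the value to look for. On each server that has not been upgraded yet, `csn_replicated "$csn" /var/log/dirsrv/slapd-<instance>/access` then succeeds once that update has arrived. Because the access log is flushed periodically, a negative result shortly after the upgrade may only mean the log has not been written yet.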

Another solution would be to monitor the RUV (replica update vector), but that is likely more complex to interpret.
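As a rough illustration of what interpreting the RUV involves, the sketch below compares the maximum CSN per replica ID between two servers. It assumes the RUV has already been dumped on each server with `ldapsearch` (for example with `-o ldif-wrap=no` and the `nsds50ruv` attribute requested from the RUV tombstone entry), and that each value has the form `{replica <id> <url>} <min CSN> <max CSN>`. The function names and the file arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the input format is
# an assumption about unwrapped ldapsearch output for nsds50ruv.

# Read ldapsearch output on stdin and print "<replica id> <max CSN>"
# for each replica that has CSNs, sorted by replica ID.
ruv_max_csns() {
    awk '/^nsds50ruv: [{]replica [0-9]+ / {
        # Fields: nsds50ruv: {replica <id> <url>} <min CSN> <max CSN>
        # Replicas without any CSN yet have only four fields; skip them.
        if (NF >= 6) print $3, $NF
    }' | sort -n
}

# Succeed if two saved RUV dumps list the same max CSN for every replica.
ruv_in_sync() {
    [ "$(ruv_max_csns < "$1")" = "$(ruv_max_csns < "$2")" ]
}
```

If `ruv_in_sync` succeeds for the upgraded server and every other server, all of them have seen the same latest update from each replica. On a busy deployment the RUVs keep changing, so a mismatch does not necessarily mean that the upgrade updates are still missing.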

Comment 2 Aneta Šteflová Petrová 2017-02-06 14:05:10 UTC
The update is now pending review.

Comment 3 Aneta Šteflová Petrová 2017-02-14 11:00:29 UTC
The update was acked in peer and developer review.

Comment 6 Aneta Šteflová Petrová 2017-03-14 09:35:54 UTC
The update is now available on the Customer Portal.