Bug 1498523
| Summary: | second replica installation fails when master is restored from backup | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Mohammad Rizwan <myusuf> |
| Component: | ipa | Assignee: | IPA Maintainers <ipa-maint> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ipa-qe <ipa-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | cheimes, frenaud, gparente, mrhodes, myusuf, pvoborni, rcritten, tscherf |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-01-10 11:41:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1511607 | | |
| Bug Blocks: | | | |
Description
Mohammad Rizwan
2017-10-04 14:07:10 UTC
How reproducible is this? We'd need more logs to try to diagnose why the request is failing. We also need to know which services are configured on each of the masters (CA, DNS, etc.).

It is always reproducible by following the steps provided in the description. The master has the CA, DNS, KDC, DS, httpd, and ntpd services on it. What logs do you need? I have scrapped the systems, but it is reproducible, so I can set it up again. I will be setting up an environment and will share it with you, but it will be available for only 4 days, as we can reserve a beaker system only for that long.

Not sure how or if this is related to the master restore, but the replica is requesting a cert from itself, which doesn't have Apache configured yet, which is why the request is rejected.
Number of certificates and requests being tracked: 1.
Request ID '20171017063153':
status: CA_REJECTED
ca-error: Server at https://replica2.testrelm.test/ipa/xml failed request, will retry: -504 (libcurl failed to execute the HTTP POST transaction, explaining: Failed connect to replica2.testrelm.test:443; Connection refused).
stuck: yes
key pair storage: type=NSSDB,location='/etc/dirsrv/slapd-TESTRELM-TEST',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/dirsrv/slapd-TESTRELM-TEST/pwdfile.txt'
certificate: type=NSSDB,location='/etc/dirsrv/slapd-TESTRELM-TEST',nickname='Server-Cert'
CA: IPA
issuer:
subject:
expires: unknown
pre-save command:
post-save command: /usr/libexec/ipa/certmonger/restart_dirsrv TESTRELM-TEST
track: yes
auto-renew: yes
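The stuck request above can be spotted mechanically. The sketch below is illustrative only: it parses `getcert list`-style text like the output shown above, and the helper name is made up, not part of certmonger.

```python
import re

def stuck_requests(getcert_output):
    """Parse `getcert list`-style output and return (request_id, ca_error)
    pairs for every request marked 'stuck: yes'."""
    results = []
    # Split the output on "Request ID '...':" headers; the capture group
    # makes re.split return [prefix, id1, body1, id2, body2, ...].
    blocks = re.split(r"Request ID '([0-9]+)':", getcert_output)[1:]
    for req_id, body in zip(blocks[0::2], blocks[1::2]):
        if re.search(r"^\s*stuck:\s*yes", body, re.M):
            m = re.search(r"^\s*ca-error:\s*(.+)$", body, re.M)
            results.append((req_id, m.group(1) if m else ""))
    return results

# Abbreviated sample in the format shown above:
sample = """
Request ID '20171017063153':
    status: CA_REJECTED
    ca-error: Server at https://replica2.testrelm.test/ipa/xml failed request, will retry: -504
    stuck: yes
"""
print(stuck_requests(sample))
# [('20171017063153', 'Server at https://replica2.testrelm.test/ipa/xml failed request, will retry: -504')]
```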
I'm assuming you're doing this at domain level 1. Given that, the client that will become the replica should have server=master.example.com in /etc/ipa/default.conf. The promotion process uses this server as the bootstrap server because not all services have been set up on the new replica yet. Right after 389-ds is configured, it rewrites /etc/ipa/default.conf to drop the server entry and adds the options for the CA (enable_ra, dogtag_version, etc). Can you re-run this and, before promoting the replica, see if there is a server value in /etc/ipa/default.conf?

It is doing its own promotion. The client gets configured first. I reinstalled it and paused before certmonger is executed. As I had suspected, the xmlrpc_uri is pointing to the local server instead of the remote master, which is why it is failing. I'll need to poke at the code further. AFAIU it isn't supposed to rewrite /etc/ipa/default.conf until after this step, specifically for this reason.

I have been unable to reproduce this locally so far, and default.conf is similar, pointing to the new replica, so I think that's a dead end. Going to look at what certmonger is doing on the working replica I have, then see what it is doing on the reproducing machines.

I believe the issue is related to the server directive in /etc/ipa/default.conf. My working reproducer has it set to the remote master; replica2 in this case does not. certmonger will generate the xmlrpc_uri when the server directive exists; otherwise it will use the xmlrpc_uri defined in default.conf. This is why it is trying to talk to itself: no server directive. I've yet to determine under what condition(s) this value is being set.

Previous findings were red herrings. While the server directive will tell certmonger which master to go to, this isn't normally necessary. In case the xmlrpc_uri is incorrect, certmonger will look for available CAs against a configured master.
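The selection logic described above can be sketched as follows. This is a minimal illustration of the behavior as described (prefer a `server` directive, otherwise fall back to the local `xmlrpc_uri`), not the actual certmonger helper code, and the sample config contents are hypothetical:

```python
from configparser import ConfigParser
from io import StringIO

def bootstrap_uri(default_conf_text):
    """Pick the XML-RPC endpoint the way the comment describes:
    prefer a 'server' directive (the remote master), otherwise fall
    back to the local xmlrpc_uri -- the failure mode seen in this bug."""
    cfg = ConfigParser()
    cfg.read_file(StringIO(default_conf_text))
    if cfg.has_option("global", "server"):
        return "https://%s/ipa/xml" % cfg.get("global", "server")
    return cfg.get("global", "xmlrpc_uri")

# replica2's default.conf had no 'server' directive, so it talks to itself:
broken = """
[global]
host = replica2.testrelm.test
xmlrpc_uri = https://replica2.testrelm.test/ipa/xml
"""
print(bootstrap_uri(broken))  # https://replica2.testrelm.test/ipa/xml
```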
Because the client is configured first, /etc/openldap/ldap.conf is already configured properly. I confirmed that two CAs are returned for the equivalent search:

ldapsearch -Y GSSAPI -H ldaps://master.testrelm.test -b cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=test "(&(objectClass=ipaconfigobject)(cn=CA)(ipaConfigString=enabledservice))"

certmonger should be picking one of these to resubmit the request to. I am still working out why this isn't happening.

Ok, the whole certificate thing itself is a red herring. strace led the way with:

15907 write(2, "Fault 2100: (RPC failed at server. Insufficient access: Invalid credentials).\n", 80) = 80

I confirmed that replica1.testrelm.test had no knowledge of replica2.testrelm.test, hence the failure in obtaining the certificate (I used kvno). The current replication status with master.testrelm.test is:

Error (-2) Problem connecting to replica - LDAP error: Local error (connection error)

So I need the exact steps taken to uninstall master and restore it from backup. When there are existing masters, a full re-init on the other masters is necessary after the restore. Did you run this on replica1:

# ipa-replica-manage re-initialize --from=master.testrelm.test

I uninstalled master with the following commands:
1. ipa-server-install -U --uninstall
2. ipa-restore <backup-file>
3. on replica1, ran the following:
   $ ipa topologysegment-reinitialize domain segment-name --left

I ran the above command for re-initialization, not ipa-replica-manage re-initialize. The second replica installation failed with the same error.
I observed an "unknown host" error while re-initializing the replica:
[root@replica1 ~]# ipa-replica-manage re-initialize --from=master.testrelm.test
Unknown host master.testrelm.test: Host 'master.testrelm.test' does not have corresponding DNS A/AAAA record
I added the DNS A record for the master manually on replica1. I used the following command:
[root@replica1 ~]# ipa dnsrecord-mod testrelm.test master --a-rec <master-ip-address>
Record name: master
A record: <master-ip-address>
SSHFP record: 1 1 8FEFCC37BC3B34899F7E2B0F6AA08408A425370F, 1 2 B0250CE035B5FC4CB818D19513D7C73CB7D0426B746189DD3EC3CD4D63B8A657, 3 1 5D08FFF93CAEE4D3E5690E164D4378AA6DE4E62F, 3 2 4041C9E9C1BDC9AC7A512430C64A344169CC768F81B904A6954631774D205855, 4 1 F3F6B7363C8872E16ECE2523FC36C5E5C300AF8B, 4 2 48FFCDF5BEA6009A324CDE5F0E877E23A42BC0F6CAD3E6F069CC97E0865BC105
Again I tried to re-initialize the replica, but got the following error (I used kinit admin):
ACIError: Insufficient access: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server ldap/master.testrelm.test not found in Kerberos database)
Here is the full command I ran and the traceback:
[root@bkr-hv01-guest26 ~]# ipa-replica-manage re-initialize --from=master.testrelm.test --verbose
Traceback (most recent call last):
  File "/usr/sbin/ipa-replica-manage", line 1615, in <module>
    main(options, args)
  File "/usr/sbin/ipa-replica-manage", line 1558, in main
    options.nolookup)
  File "/usr/sbin/ipa-replica-manage", line 1200, in re_initialize
    repl = replication.ReplicationManager(realm, fromhost, dirman_passwd)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 222, in __init__
    self.conn.gssapi_bind()
  File "/usr/lib/python2.7/site-packages/ipapython/ipaldap.py", line 1124, in gssapi_bind
    '', auth_tokens, server_controls, client_controls)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/ipapython/ipaldap.py", line 1007, in error_handler
    raise errors.ACIError(info=info)
ACIError: Insufficient access: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server ldap/master.testrelm.test not found in Kerberos database)
Unexpected error: Insufficient access: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server ldap/master.testrelm.test not found in Kerberos database)
I think the issue lies with the procedure. As far as I can follow, the steps are:
- install master
- install replica1
- backup master
- uninstall master <---- this is where things go wrong
- restore master

When master is uninstalled, the records for master on replica1 are removed. This is why the re-initialize is failing: it has no knowledge of master to reconnect with. You could try disabling the replication agreement by setting nsds5ReplicaEnabled=off in the agreement on both sides before uninstalling. I think that re-initializing it will re-enable it on replica1, but you'd need to double-check that. I don't understand why you are changing the SELinux booleans at the same time. Is that to confirm that they get set properly during the restore?

It was legacy code that was changing the SELinux boolean. I thought it was relevant; that's why it is there. I will try disabling nsds5ReplicaEnabled on both servers before uninstalling master. I found a doc[1] which says that we need to enable nsds5ReplicaEnabled manually in order to re-enable the replication agreement.

[1] https://access.redhat.com/documentation/en-us/red_hat_directory_server/9.0/html/administration_guide/disabling-replication

So far, I have found 2 issues with ipa-restore:
1/ ipa-restore is broken with python2, see ticket https://pagure.io/freeipa/issue/7231
2/ ipa-restore does not enable oddjobd, see ticket https://pagure.io/freeipa/issue/7234

There are still other issues. For instance, when ipa-server-install --uninstall is run, the service entry dn: krbprincipalname=ldap/master.domain.com is removed from the replica. This entry is used for replication authentication from master to replica.
Thus, when calling ipa-replica-manage re-initialize --from=master.domain.com on the replica, the command fails with:

Directory Manager password:
Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more information
Unexpected error: Insufficient access: Invalid credentials

I am investigating whether ipa-server-install --uninstall should really delete the entry or simply disable the replication agreement. The deletion was added in commit https://github.com/freeipa/freeipa/commit/31ffe1a12922b5118c847cbd6ac1ca9ea232ef94

Doing the uninstall isn't really mimicking what was expected in the real world (e.g. a machine dying a horrible death). IMHO ipa-server-install is working properly: the user is requesting that the master go away, and IPA is obliging. There are several possible scenarios for restoration:
1. Your master simply died and you want to re-install it (hardware failure)
2. Somebody did something really bad and you need to restore data to an older state
3. The master is somehow hosed and restoring seems like a good way to get it back (it isn't)

For #2 in particular, all replication agreements need to be killed, because otherwise MMR will happily shove down all the "bad" data again after the restore. Doing a restore is a way of saying "this is the data I want", hence all the re-initializing. I think for this test, disabling the replication agreement(s) before calling the uninstall will have the desired effect: the entry can be deleted without propagating the deletion of the needed entries. What I'm less sure of is whether 389-ds will automatically re-enable the agreement on re-initialize.

With the following procedure, the 2nd replica can successfully be installed:
1. Configure master with ipa-server-install.
2. Configure replica1 with ipa-replica-install --server=master.domain.com ...
3. Back up master with ipa-backup.
4.
Perform ldapmodify on the master to disable replication:

$ ldapmodify -h master.domain.com -p 389 -D "cn=directory manager" -W
dn: cn=meToreplica1.domain.com,cn=replica,cn=dc\3Ddomain\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaEnabled
nsds5ReplicaEnabled: off

dn: cn=caToreplica1.domain.com,cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaEnabled
nsds5ReplicaEnabled: off

5. Uninstall the server with ipa-server-install --uninstall -U. As replication is already disabled, the replication agreements and service entries are not deleted on replica1.
6. Restore the master with ipa-restore /path/to/backup. The new master contains the replication agreements and service entries.
7. On replica1, re-initialize replication with:
$ ipa-replica-manage re-initialize --from=master.domain.com
$ ipa-csreplica-manage re-initialize --from=master.domain.com
At this point replication is working.
8. Install the second replica with ipa-replica-install --server=master.domain.com ...

There is no issue with the 2nd replica installation when the master<->replica1 replication is working.

Using the steps provided in comment #24, replication between master <-> replica1 is working fine. However, it is failing to install replica2, throwing the error:

ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR 406 Client Error: Failed to validate message: No recipient matched the provided key ["Failed: [ValueError('Decryption failed.',)]"]
ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

I'll set up another env to see if it is reproducible. I have a couple of doubts/concerns:
1. Disabling the replication agreement should be part of ipa-server-uninstall itself, shouldn't it?
2. I have gone through the doc[1]; there is no mention of disabling the replication agreement.
However, the man page for ipa-restore has some discussion of replication agreements; should we update the doc?
3. What about the ipa-replica-manage re-initialize command: is it deprecated in favour of "ipa topologysegment-reinitialize", or can both be used to re-initialize the servers?

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/linux_domain_identity_authentication_and_policy_guide/#backup-restore

The problems you're seeing are related to your attempt to legitimately manufacture a state where a server has gone away, and the tools are working against you. Normally a master isn't uninstalled by a user only to be restored again; it goes away due to hardware or some other failure.
1. The agreement is deleted as part of the uninstall automatically in DL1. By disabling the agreement in advance, we aren't replicating the changes out, which is why Flo's suggestion works.
2. No, this is due to your trying to force a bad situation.
3. For DL1 the preference is to use the topology commands. They are more or less functionally equivalent for the time being.

I used the test script provided by https://github.com/freeipa/freeipa/pull/948 and it shows an issue on the master:

# /tmp/ipa-custodia-check `hostname` --verbose
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Platform: Linux-3.10.0-693.11.1.el7.x86_64-x86_64-with-redhat-7.4-Maipo
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: IPA version: 4.5.0
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: IPA vendor version: 4.5.0-22.el7_4
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Realm: TESTRELM.TEST
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Host: master.testrelm.test
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Remote server: master.testrelm.test
[2017-11-07T10:47:05 ipa-custodia-tester] <WARNING>: Performing self-test only.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/ipa/default.conf' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/krb5.keytab' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/ipa/custodia/custodia.conf' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/ipa/custodia/server.keys' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Custodia client created.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Loaded key for usage 'sig' from '/etc/ipa/custodia/server.keys'.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: JWK KID matches host's service principal name 'host/master.testrelm.test'.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Checked host LDAP keys 'host/master.testrelm.test' for usage sig.
[2017-11-07T10:47:05 ipa-custodia-tester] <ERROR>: Host key in LDAP does not match local key.
None
[ERROR] One or more tests have failed.

This means that the content of cn=sig/master.testrelm.test,cn=custodia,cn=ipa,cn=etc,dc=testrelm,dc=test does not match the content of /etc/ipa/custodia/server.keys. Can you try (on the master) to delete /etc/ipa/custodia/server.keys, then run ipa-server-update (this should re-generate the server.keys file), then retry the replica2 install?

I didn't find an ipa-server-update command, so I ran ipa-server-upgrade after removing /etc/ipa/custodia/server.keys. It failed to upgrade, throwing:

[root@master ~]# ipa-server-upgrade
Upgrading IPA:. Estimated time: 1 minute 30 seconds
  [1/10]: stopping directory server
  [2/10]: saving configuration
  [3/10]: disabling listeners
  [4/10]: enabling DS global lock
  [5/10]: starting directory server
  [6/10]: updating schema
  [7/10]: upgrading server
  [8/10]: stopping directory server
  [9/10]: restoring configuration
  [10/10]: starting directory server
Done.
Update complete
Upgrading IPA services
Upgrading the configuration of the IPA services
[Verifying that root certificate is published]
[Migrate CRL publish directory]
Publish directory already set to new location
[Verifying that CA proxy configuration is correct]
[Verifying that KDC configuration is using ipa-kdb backend]
[Fix DS schema file syntax]
[Removing RA cert from DS NSS database]
[Enable sidgen and extdom plugins by default]
[Updating HTTPD service IPA configuration]
[Updating mod_nss protocol versions]
[Updating mod_nss cipher suite]
[Updating mod_nss enabling OCSP]
[Fixing trust flags in /etc/httpd/alias]
[Moving HTTPD service keytab to gssproxy]
[Removing self-signed CA]
[Removing Dogtag 9 CA]
[Checking for deprecated KDC configuration files]
[Checking for deprecated backups of Samba configuration files]
[Add missing CA DNS records]
Updating DNS system records
[Removing deprecated DNS configuration options]
[Ensuring minimal number of connections]
[Updating GSSAPI configuration in DNS]
[Updating pid-file configuration in DNS]
[Enabling "dnssec-enable" configuration in DNS]
[Setting "bindkeys-file" option in named.conf]
[Including named root key in named.conf]
[Checking global forwarding policy in named.conf to avoid conflicts with automatic empty zones]
[Masking named]
[Fix bind-dyndb-ldap IPA working directory]
[Adding server_id to named.conf]
Changes to named.conf have been made, restart named
IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
Unexpected error - see /var/log/ipaupgrade.log for details:
OSError: [Errno 2] No such file or directory: '/etc/ipa/custodia/server.keys'
The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

So it failed on the same file which I had deleted. I then restored the master from the backup previously taken and ran the ipa-server-upgrade command without removing /etc/ipa/custodia/server.keys.
Then I tried to install replica2 and it failed with the same error. I ran ipa-certupdate on master and replica1 and then tried installing replica2, but it failed with the same error.

Christian, does this look familiar? Could it be that something isn't being backed up?

It looks like ipa-backup does not back up the Custodia files. I'll work on a PR.

Christian created https://pagure.io/freeipa/issue/7247 (ipa-backup does not backup Custodia keys and file). The issue is that the ipa-backup tool does not back up /etc/ipa/custodia/custodia.conf and /etc/ipa/custodia/server.keys.

Using the steps from comment #24, the second replica installation succeeded. Replication is working. No error has been observed.
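Given the finding above (ipa-backup omitting the Custodia files), a backup archive can be checked for them directly. This is a hedged sketch, assuming the backup is a tar archive with files stored under their /etc paths; the helper name is made up:

```python
import tarfile

# The two Custodia files reported missing from backups in this bug:
CUSTODIA_FILES = (
    "etc/ipa/custodia/custodia.conf",
    "etc/ipa/custodia/server.keys",
)

def missing_custodia_files(backup_tar_path):
    """Return the Custodia files absent from a backup tar archive."""
    with tarfile.open(backup_tar_path) as tar:
        # Normalize any leading './' or '/' in archive member names.
        members = {m.name.lstrip("./") for m in tar.getmembers()}
    return [f for f in CUSTODIA_FILES if f not in members]
```

An empty return value means both files made it into the backup; anything else reproduces the failure mode seen here.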