Bug 1498523 - second replica installation fails when master is restored from backup
Summary: second replica installation fails when master is restored from backup
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipa
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: IPA Maintainers
QA Contact: ipa-qe
URL:
Whiteboard:
Keywords: Regression
Depends On: 1511607
Blocks:
Reported: 2017-10-04 14:07 UTC by Mohammad Rizwan
Modified: 2018-01-10 11:41 UTC (History)

Clone Of:
Last Closed: 2018-01-10 11:41:13 UTC




External Trackers
Tracker ID | Priority | Status | Summary | Last Updated
Red Hat Bugzilla 1511607 | None | CLOSED | ipa-backup does not backup Custodia keys and files | 2019-05-16 01:47 UTC

Internal Trackers: 1511607

Description Mohammad Rizwan 2017-10-04 14:07:10 UTC
Description of problem:
second replica installation fails when master is restored from backup.

Version-Release number of selected component (if applicable):
ipa-server-4.5.0-21.el7_4.2.2.x86_64

How reproducible:
always

Steps to Reproduce:
1. Install Master
2. Install replica1
3. Take Backup of Master
4. Uninstall the Master configured in step 1
5. Restore Master
6. Install replica2 against restored Master


Actual results:
Replica installation failed with the following error:
 ERROR    Certificate issuance failed (CA_REJECTED)

Expected results:
Replica installation succeeds.

Additional info:
Re-initialization of replica1 with the master succeeds.

Command run:
ipa topologysegment-reinitialize domain segment-name --left

Comment 4 Rob Crittenden 2017-10-16 11:06:50 UTC
How reproducible is this? We'd need more logs to try to diagnose why the request is failing. Also need to know what services are configured on each of the masters (CA, DNS, etc).

Comment 5 Mohammad Rizwan 2017-10-16 12:34:35 UTC
It is always reproducible by following the steps provided in the description.

The master has the CA, DNS, KDC, DS, httpd, and ntpd services on it.

Which logs do you need? I have scrapped the systems, but the issue is reproducible, so I can set it up again.

I will set up an environment and share it with you, but it will be available for only 4 days, as we can reserve a Beaker system for only that long.

Comment 10 Rob Crittenden 2017-10-17 11:37:56 UTC
Not sure how or if this is related to the master restore, but the replica is requesting a cert from itself. Apache isn't configured there yet, which is why the request is rejected.

Number of certificates and requests being tracked: 1.
Request ID '20171017063153':
        status: CA_REJECTED
        ca-error: Server at https://replica2.testrelm.test/ipa/xml failed request, will retry: -504 (libcurl failed to execute the HTTP POST transaction, explaining:  Failed connect to replica2.testrelm.test:443; Connection refused).
        stuck: yes
        key pair storage: type=NSSDB,location='/etc/dirsrv/slapd-TESTRELM-TEST',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/dirsrv/slapd-TESTRELM-TEST/pwdfile.txt'
        certificate: type=NSSDB,location='/etc/dirsrv/slapd-TESTRELM-TEST',nickname='Server-Cert'
        CA: IPA
        issuer: 
        subject: 
        expires: unknown
        pre-save command: 
        post-save command: /usr/libexec/ipa/certmonger/restart_dirsrv TESTRELM-TEST
        track: yes
        auto-renew: yes
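
A minimal sketch for spotting stuck requests in output like the listing above, assuming the text of `getcert list` has been captured into a string. The helper names `parse_getcert_list` and `rejected_requests` are hypothetical and are not part of certmonger; the parser only relies on the `Request ID '...':` and `key: value` layout visible in this comment.

```python
import re

# Abbreviated copy of the output pasted in comment 10.
SAMPLE = """\
Number of certificates and requests being tracked: 1.
Request ID '20171017063153':
        status: CA_REJECTED
        ca-error: Server at https://replica2.testrelm.test/ipa/xml failed request, will retry: -504 (libcurl failed to execute the HTTP POST transaction, explaining:  Failed connect to replica2.testrelm.test:443; Connection refused).
        stuck: yes
        CA: IPA
"""


def parse_getcert_list(text):
    """Split the listing into one dict per tracked request."""
    requests = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"Request ID '([^']+)':", line)
        if match:
            current = {"id": match.group(1)}
            requests.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return requests


def rejected_requests(text):
    """Return only the requests whose status is CA_REJECTED."""
    return [r for r in parse_getcert_list(text) if r.get("status") == "CA_REJECTED"]


if __name__ == "__main__":
    for req in rejected_requests(SAMPLE):
        print(req["id"], req.get("ca-error", ""))
```

The `ca-error` value is the useful part here: it names the server the request was actually sent to.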

Comment 11 Rob Crittenden 2017-10-17 14:15:30 UTC
I'm assuming you're doing this at domain level 1.

Given that, the client that will become the replica should have:

server=master.example.com

in /etc/ipa/default.conf. The promotion process uses this server as the bootstrap server because not all servers have been set up on the new replica yet.

Right after 389-ds is configured, it rewrites /etc/ipa/default.conf to drop the server entry and adds the options for the CA (enable_ra, dogtag_version, etc).

Can you re-run this and before promoting the replica see if there is a server value in /etc/ipa/default.conf?

Comment 13 Rob Crittenden 2017-10-18 10:54:53 UTC
It is doing its own promotion. The client gets configured first.

I reinstalled it and paused before certmonger is executed. As I had suspected the xmlrpc_uri is pointing to the local server instead of the remote master which is why it is failing.

I'll need to poke at the code further. AFAIU it isn't supposed to rewrite /etc/ipa/default.conf until after this step specifically for this reason.

Comment 14 Rob Crittenden 2017-10-19 11:08:14 UTC
Have been unable to reproduce locally so far and default.conf is similar, pointing to the new replica, so I think that's a dead-end. Going to look to see what certmonger is doing on the working replica I have, then see what it is doing on the reproducing machines.

Comment 15 Rob Crittenden 2017-10-20 10:33:28 UTC
I believe the issue is related to the server directive in /etc/ipa/default.conf. My working reproducer has it set to the remote master, replica2 in this case does not.

certmonger will generate the xmlrpc_uri in the case that the server directive exists otherwise it will use the xmlrpc_uri defined in default.conf. This is why it is trying to talk to itself: no server directive.

I've yet to determine under what condition(s) this value is being set.
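
The selection rule described in this comment can be modeled in a few lines. This is a sketch of the behavior as stated here, not certmonger's actual code: the function name `effective_xmlrpc_uri` is hypothetical, the `https://<server>/ipa/xml` form is taken from the URL in the certmonger error in comment 10, and the `[global]` section header is assumed to be how /etc/ipa/default.conf is laid out.

```python
import configparser


def effective_xmlrpc_uri(conf_text):
    """Return the URI the request would go to under the rule in comment 15.

    If a `server` directive exists, a URI is generated from it;
    otherwise the configured `xmlrpc_uri` is used as-is.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["global"]
    server = section.get("server")
    if server:
        return "https://{}/ipa/xml".format(server)
    return section.get("xmlrpc_uri")


if __name__ == "__main__":
    # Path taken from the thread; reading it needs a configured IPA client.
    with open("/etc/ipa/default.conf") as handle:
        print(effective_xmlrpc_uri(handle.read()))
```

With no `server` line and an `xmlrpc_uri` pointing at the new replica, the result is the replica's own URL, which matches the "talking to itself" symptom.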

Comment 16 Rob Crittenden 2017-10-27 18:31:03 UTC
Previous findings were red herrings. While the server directive tells certmonger which master to go to, this isn't normally necessary. If the xmlrpc_uri is incorrect, certmonger will look for available CAs against a configured master.

Because the client is configured first /etc/openldap/ldap.conf is already configured properly. I confirmed that two CAs are returned for the equivalent search:

ldapsearch -Y GSSAPI -H ldaps://master.testrelm.test  -b cn=masters,cn=ipa,cn=etc,dc=testrelm,dc=test "(&(objectClass=ipaconfigobject)(cn=CA)(ipaConfigString=enabledservice))"

certmonger should be picking one of these to resubmit the request to. Still working out why this isn't happening.
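
For repeating this lookup against another deployment, the search arguments can be derived from the domain name. A small sketch, assuming the directory suffix is built from the DNS domain labels as in this setup (`testrelm.test` gives `dc=testrelm,dc=test`); the function names are hypothetical, and the `ldapsearch` options are the ones used in the command above.

```python
def domain_to_suffix(domain):
    """Turn 'testrelm.test' into 'dc=testrelm,dc=test' (assumed mapping)."""
    return ",".join("dc=" + label for label in domain.split("."))


def ca_master_search(domain):
    """Return (base DN, filter) for listing masters with an enabled CA."""
    base = "cn=masters,cn=ipa,cn=etc," + domain_to_suffix(domain)
    ldap_filter = (
        "(&(objectClass=ipaconfigobject)(cn=CA)"
        "(ipaConfigString=enabledservice))"
    )
    return base, ldap_filter


def ldapsearch_argv(master, domain):
    """Build the argument list for the ldapsearch call shown in comment 16."""
    base, ldap_filter = ca_master_search(domain)
    return ["ldapsearch", "-Y", "GSSAPI", "-H", "ldaps://" + master,
            "-b", base, ldap_filter]


if __name__ == "__main__":
    print(" ".join(ldapsearch_argv("master.testrelm.test", "testrelm.test")))
```

Passing the list to `subprocess.run` avoids having to quote the filter for the shell.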

Comment 17 Rob Crittenden 2017-10-27 18:55:57 UTC
Ok, the whole certificate thing itself is a red herring. strace led the way with

15907 write(2, "Fault 2100: (RPC failed at server.  Insufficient access:  Invalid credentials).\n", 80) = 80

I confirmed that replica1.testrelm.test had no knowledge of replica2.testrelm.test hence the failure in obtaining the certificate (I used kvno).

The current replication status with master.testrelm.test is:

Error (-2) Problem connecting to replica - LDAP error: Local error (connection error)

So I need the exact steps taken to uninstall master and restore it from backup.

When there are existing masters a full re-init on the other masters is necessary after the restore. Did you run this on replica1:

# ipa-replica-manage re-initialize --from=master.testrelm.test

Comment 18 Mohammad Rizwan 2017-10-30 06:42:16 UTC
I uninstalled and restored the master with the following commands:

1. ipa-server-install -U --uninstall

2. ipa-restore <backup-file>

3. On replica1, I ran the following:
   $ ipa topologysegment-reinitialize domain segment-name --left

I ran the above command for re-initialization, not ipa-replica-manage re-initialize.

Comment 19 Mohammad Rizwan 2017-10-31 09:27:14 UTC
Second replica installation failed with the same error.


I observed an "Unknown host" error while re-initializing the replica.

[root@replica1 ~]# ipa-replica-manage re-initialize --from=master.testrelm.test
Unknown host master.testrelm.test: Host 'master.testrelm.test' does not have corresponding DNS A/AAAA record

I added the DNS A record for master manually on replica1, using the following command:

[root@replica1 ~]# ipa dnsrecord-mod testrelm.test master --a-rec <master-ip-address>
  Record name: master
  A record: <master-ip-address>
  SSHFP record: 1 1 8FEFCC37BC3B34899F7E2B0F6AA08408A425370F, 1 2 B0250CE035B5FC4CB818D19513D7C73CB7D0426B746189DD3EC3CD4D 63B8A657, 3 1
                5D08FFF93CAEE4D3E5690E164D4378AA6DE4E62F, 3 2 4041C9E9C1BDC9AC7A512430C64A344169CC768F81B904A695463177 4D205855, 4 1
                F3F6B7363C8872E16ECE2523FC36C5E5C300AF8B, 4 2 48FFCDF5BEA6009A324CDE5F0E877E23A42BC0F6CAD3E6F069CC97E0 865BC105


I tried again to re-initialize the replica but got the following error (I used kinit admin):

ACIError: Insufficient access: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server ldap/master.testrelm.test@TESTRELM.TEST not found in Kerberos database)

Here is the full command I ran and the traceback:

[root@bkr-hv01-guest26 ~]# ipa-replica-manage re-initialize --from=master.testrelm.test --verbose
Traceback (most recent call last):
  File "/usr/sbin/ipa-replica-manage", line 1615, in <module>
    main(options, args)
  File "/usr/sbin/ipa-replica-manage", line 1558, in main
    options.nolookup)
  File "/usr/sbin/ipa-replica-manage", line 1200, in re_initialize
    repl = replication.ReplicationManager(realm, fromhost, dirman_passwd)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py", line 222, in __init__
    self.conn.gssapi_bind()
  File "/usr/lib/python2.7/site-packages/ipapython/ipaldap.py", line 1124, in gssapi_bind
    '', auth_tokens, server_controls, client_controls)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/ipapython/ipaldap.py", line 1007, in error_handler
    raise errors.ACIError(info=info)
ACIError: Insufficient access: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server ldap/master.testrelm.test@TESTRELM.TEST not found in Kerberos database)
Unexpected error: Insufficient access: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server ldap/master.testrelm.test@TESTRELM.TEST not found in Kerberos database)

Comment 20 Rob Crittenden 2017-11-01 19:14:47 UTC
I think the issue lies with the procedure.

As far as I can follow, the steps are:

- install master
- install replica1
- backup master
- uninstall master <---- this is where things go wrong
- restore master

When master is uninstalled the records for master on replica1 are removed. This is why the re-initialize is failing, because it has no knowledge of master to reconnect with.

You could try disabling the replication agreement by setting nsds5ReplicaEnabled=off in the agreement on both sides before uninstalling.

I think that re-initializing it will re-enable it on replica1 but you'd need to double-check that.

I don't understand why you are changing the SELinux booleans at the same time. Is that to confirm that they get set properly during the restore?

Comment 21 Mohammad Rizwan 2017-11-02 13:33:05 UTC
It was legacy code that was changing the SELinux boolean. I thought it was relevant, which is why it is there.

I will try disabling nsds5ReplicaEnabled on both servers before uninstalling the master.

I found a doc [1] which says that we need to enable nsds5ReplicaEnabled manually in order to re-enable the replication agreement.


[1] https://access.redhat.com/documentation/en-us/red_hat_directory_server/9.0/html/administration_guide/disabling-replication

Comment 22 Florence Blanc-Renaud 2017-11-03 09:14:54 UTC
So far, I found 2 issues with ipa-restore.

1/ ipa-restore is broken with python2, see ticket https://pagure.io/freeipa/issue/7231
2/ ipa-restore does not enable oddjobd, see ticket https://pagure.io/freeipa/issue/7234

There are still other issues. For instance, when ipa-server-install --uninstall is run, the service entry dn: krbprincipalname=ldap/master.domain.com@DOMAIN.COM is removed from the replica. This entry is used for replication authentication from master to replica.
Thus, when calling ipa-replica-manage re-initialize --from=master.domain.com on the replica, the command fails with:
Directory Manager password: 

Re-run /usr/sbin/ipa-replica-manage with --verbose option to get more information
Unexpected error: Insufficient access:  Invalid credentials


I am investigating if ipa-server-install --uninstall should really delete the entry or simply disable the replication agreement. The deletion was added in commit https://github.com/freeipa/freeipa/commit/31ffe1a12922b5118c847cbd6ac1ca9ea232ef94

Comment 23 Rob Crittenden 2017-11-03 12:50:49 UTC
Doing the uninstall isn't really mimicking what was expected in the real world (e.g. machine dying horrible death).

IMHO ipa-server-install is working properly. The user is requesting that the master go away and IPA is obliging.

There are several possible scenarios for restoration:

1. Your master simply died and you want to re-install it (hardware failure)
2. Somebody did something really bad and you need to restore data to an older state
3. The master is somehow hosed and restoring seems like a good way to get it back (it isn't)

For #2 in particular all replication agreements need to be killed because otherwise MMR will happily shove down all the "bad" data again after the restore. By doing a restore it is a way of saying "This is the data I want" hence all the re-initialize.

I think for this test disabling the replication agreement(s) before calling the uninstall will have the desired effect. The entry can be deleted without propagating the deletion of the needed entries. What I'm less sure of is whether 389-ds will automatically re-enable the agreement on re-initialize.

Comment 24 Florence Blanc-Renaud 2017-11-03 15:40:13 UTC
With the following procedure, the 2nd replica can successfully be installed:

1. configure master with ipa-server-install
2. configure replica1 with ipa-replica-install --server=master.domain.com ...
3. back-up master with ipa-backup
4. perform ldapmodify on the master to disable replication:
$ ldapmodify -h master.domain.com -p 389 -D "cn=directory manager" -W
dn: cn=meToreplica1.domain.com,cn=replica,cn=dc\3Ddomain\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaEnabled
nsds5ReplicaEnabled: off

dn: cn=caToreplica1.domain.com,cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaEnabled
nsds5ReplicaEnabled: off

5. uninstall server with ipa-server-install --uninstall -U. As replication is already disabled, the repl agreements and service entries are not deleted on replica1

6. restore the master with ipa-restore /path/to/backup. The new master contains the replication agreements and service entries.

7. on the replica1, re-initialize replication with:
$ ipa-replica-manage re-initialize --from=master.domain.com
$ ipa-csreplica-manage re-initialize --from=master.domain.com

At this point the replication is working.

8. install the second replica with ipa-replica-install --server=master.domain.com ...

No issue with the 2nd replica installation when the master<->replica1 replication is working.
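
Step 4 of the procedure above can be scripted when the host names differ. The sketch below only generates the LDIF text for `ldapmodify`; it assumes the agreement entries follow the `meTo<replica>` and `caTo<replica>` naming shown in step 4, and the function names are hypothetical.

```python
def escape_suffix(suffix):
    """Escape '=' and ',' so a suffix can be embedded in a cn value."""
    return suffix.replace("=", r"\3D").replace(",", r"\2C")


def agreement_dn(prefix, replica_host, suffix):
    """Build the DN of a replication agreement under cn=mapping tree."""
    return "cn={}To{},cn=replica,cn={},cn=mapping tree,cn=config".format(
        prefix, replica_host, escape_suffix(suffix))


def disable_agreements_ldif(replica_host, domain_suffix):
    """Return LDIF that sets nsds5ReplicaEnabled to off on both agreements."""
    entries = []
    # "me" is the domain agreement, "ca" the CA agreement (suffix o=ipaca).
    for prefix, suffix in (("me", domain_suffix), ("ca", "o=ipaca")):
        entries.append(
            "dn: {}\n"
            "changetype: modify\n"
            "replace: nsds5ReplicaEnabled\n"
            "nsds5ReplicaEnabled: off\n".format(
                agreement_dn(prefix, replica_host, suffix)))
    # A blank line separates the two modify operations.
    return "\n".join(entries)


if __name__ == "__main__":
    print(disable_agreements_ldif("replica1.domain.com", "dc=domain,dc=com"))
```

The output can be piped to the `ldapmodify` invocation from step 4. Whether the agreements are re-enabled automatically on re-initialize was left open in comment 23, so it is worth checking nsds5ReplicaEnabled after step 7.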

Comment 25 Mohammad Rizwan 2017-11-06 12:27:26 UTC
Using the steps provided in comment #24, replication between master <-> replica1 is working fine.

However, installing replica2 fails with the error:

ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    406 Client Error: Failed to validate message: No recipient matched the provided key["Failed: [ValueError('Decryption failed.',)]"]
ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information

I'll set up another environment to see if it is reproducible.

I have a couple of doubts/concerns:

1. Disabling the replication agreement should be part of the server uninstall (ipa-server-install --uninstall) itself, shouldn't it?

2. I have gone through the doc [1]; there is no mention of disabling the replication agreement. However, the man page for ipa-restore has some discussion of replication agreements. Should we update the doc?

3. What about the ipa-replica-manage re-initialize command: is it deprecated in favour of "ipa topologysegment-reinitialize", or can both be used to re-initialize the servers?


[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/linux_domain_identity_authentication_and_policy_guide/#backup-restore

Comment 26 Rob Crittenden 2017-11-06 14:21:29 UTC
The problems you're seeing are related to your attempt to legitimately manufacture a state where a server has gone away, and the tools are working against you.

Normally a master isn't uninstalled by a user to then be restored again. It goes away due to hardware or some other failure.

1. The agreement is deleted as part of the uninstall automatically in DL1. By disabling the agreement in advance we aren't replicating the changes out which is why Flo's suggestion works.

2. No, this is due to your trying to force a bad situation.

3. For DL1 the preference is to use the topology commands. They are more or less functionally equivalent for the time being.

Comment 28 Florence Blanc-Renaud 2017-11-07 16:00:47 UTC
I used the test script provided by https://github.com/freeipa/freeipa/pull/948 and it shows an issue on the master:

# /tmp/ipa-custodia-check `hostname` --verbose
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Platform: Linux-3.10.0-693.11.1.el7.x86_64-x86_64-with-redhat-7.4-Maipo
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: IPA version: 4.5.0
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: IPA vendor version: 4.5.0-22.el7_4
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Realm: TESTRELM.TEST
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Host: master.testrelm.test
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Remote server: master.testrelm.test
[2017-11-07T10:47:05 ipa-custodia-tester] <WARNING>: Performing self-test only.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/ipa/default.conf' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/krb5.keytab' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/ipa/custodia/custodia.conf' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: File '/etc/ipa/custodia/server.keys' exists.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Custodia client created.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Loaded key for usage 'sig' from '/etc/ipa/custodia/server.keys'.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: JWK KID matches host's service principal name 'host/master.testrelm.test@TESTRELM.TEST'.
[2017-11-07T10:47:05 ipa-custodia-tester] <INFO>: Checked host LDAP keys 'host/master.testrelm.test@TESTRELM.TEST' for usage sig.
[2017-11-07T10:47:05 ipa-custodia-tester] <ERROR>: Host key in LDAP does not match local key.
None
[ERROR] One or more tests have failed.


This means that the content of cn=sig/master.testrelm.test,cn=custodia,cn=ipa,cn=etc,dc=testrelm,dc=test does not match the content in /etc/ipa/custodia/server.keys.

Can you try (on the master) to delete /etc/ipa/custodia/server.keys then run ipa-server-update (this should re-generate the server.keys file), then retry replica2 install?

Comment 29 Mohammad Rizwan 2017-11-08 10:54:58 UTC
I didn't find an ipa-server-update command, so I ran ipa-server-upgrade after removing /etc/ipa/custodia/server.keys. The upgrade failed with:

[root@master ~]# ipa-server-upgrade 
Upgrading IPA:. Estimated time: 1 minute 30 seconds
  [1/10]: stopping directory server
  [2/10]: saving configuration
  [3/10]: disabling listeners
  [4/10]: enabling DS global lock
  [5/10]: starting directory server
  [6/10]: updating schema
  [7/10]: upgrading server
  [8/10]: stopping directory server
  [9/10]: restoring configuration
  [10/10]: starting directory server
Done.
Update complete
Upgrading IPA services
Upgrading the configuration of the IPA services
[Verifying that root certificate is published]
[Migrate CRL publish directory]
Publish directory already set to new location
[Verifying that CA proxy configuration is correct]
[Verifying that KDC configuration is using ipa-kdb backend]
[Fix DS schema file syntax]
[Removing RA cert from DS NSS database]
[Enable sidgen and extdom plugins by default]
[Updating HTTPD service IPA configuration]
[Updating mod_nss protocol versions]
[Updating mod_nss cipher suite]
[Updating mod_nss enabling OCSP]
[Fixing trust flags in /etc/httpd/alias]
[Moving HTTPD service keytab to gssproxy]
[Removing self-signed CA]
[Removing Dogtag 9 CA]
[Checking for deprecated KDC configuration files]
[Checking for deprecated backups of Samba configuration files]
[Add missing CA DNS records]
Updating DNS system records
[Removing deprecated DNS configuration options]
[Ensuring minimal number of connections]
[Updating GSSAPI configuration in DNS]
[Updating pid-file configuration in DNS]
[Enabling "dnssec-enable" configuration in DNS]
[Setting "bindkeys-file" option in named.conf]
[Including named root key in named.conf]
[Checking global forwarding policy in named.conf to avoid conflicts with automatic empty zones]
[Masking named]
[Fix bind-dyndb-ldap IPA working directory]
[Adding server_id to named.conf]
Changes to named.conf have been made, restart named
IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
Unexpected error - see /var/log/ipaupgrade.log for details:
OSError: [Errno 2] No such file or directory: '/etc/ipa/custodia/server.keys'
The ipa-server-upgrade command failed. See /var/log/ipaupgrade.log for more information

So it failed because of the same file I had deleted.

Then I restored the master from the backup taken previously and ran the ipa-server-upgrade command without removing "/etc/ipa/custodia/server.keys".

Then I tried to install replica2, and it failed with the same error.

I ran ipa-certupdate on master and replica1 and then tried installing replica2, but it failed with the same error.

Comment 30 Rob Crittenden 2017-11-08 13:02:11 UTC
Christian, does this look familiar? Could it be that something isn't being backed up?

Comment 31 Christian Heimes 2017-11-08 13:20:28 UTC
Looks like ipa-backup does not back up custodia stuff. I'll work on a PR.

Comment 32 Florence Blanc-Renaud 2017-11-08 14:34:52 UTC
Christian created https://pagure.io/freeipa/issue/7247  ipa-backup does not backup Custodia keys and file

The issue is that the ipa-backup tool does not back up /etc/ipa/custodia/custodia.conf and /etc/ipa/custodia/server.keys.
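
A quick way to check a given backup for this gap is to compare its file listing against the two paths named above. This is a sketch only: it assumes the list of paths has already been extracted from the backup by some other means, and the helper name `missing_custodia_files` is hypothetical.

```python
import posixpath

# The two files named in comment 32.
CUSTODIA_FILES = (
    "/etc/ipa/custodia/custodia.conf",
    "/etc/ipa/custodia/server.keys",
)


def normalize(path):
    """Make 'etc/x', './etc/x' and '/etc/x' compare equal as '/etc/x'."""
    return posixpath.normpath("/" + path.lstrip("/"))


def missing_custodia_files(backed_up_paths):
    """Return the Custodia files that do not appear in the listing."""
    present = {normalize(p) for p in backed_up_paths}
    return [f for f in CUSTODIA_FILES if f not in present]


if __name__ == "__main__":
    listing = ["etc/ipa/default.conf", "etc/ipa/custodia/custodia.conf"]
    print(missing_custodia_files(listing))
```

An empty result means both files are in the listing; anything else names what a restore from that backup would leave out.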

Comment 36 Mohammad Rizwan 2018-01-05 07:39:28 UTC
Using the steps from comment #24, the second replica installation succeeds.
Replication is working. No errors were observed.

