Bug 1492560 - ipa-replica-install --setup-kra broken on DL0 [rhel-7.4.z]
Summary: ipa-replica-install --setup-kra broken on DL0 [rhel-7.4.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pki-core
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Fraser Tweedale
QA Contact: Asha Akkiangady
Docs Contact: Petr Bokoc
URL:
Whiteboard:
Depends On: 1486225
Blocks:
Reported: 2017-09-18 07:27 UTC by Oneata Mircea Teodor
Modified: 2017-11-30 15:32 UTC
CC List: 12 users

Fixed In Version: pki-core-10.4.1-16.el7_4
Doc Type: Bug Fix
Doc Text:
An earlier change to PKCS #12 password encoding in NSS caused Certificate System to fail to import PKCS #12 files. As a consequence, CA clone installation could not be completed. With this update, PKI will retry a failed PKCS #12 decryption with a different password encoding, which allows it to import PKCS #12 files produced by both old and new versions of NSS, and CA clone installation succeeds.
Clone Of: 1486225
Environment:
Last Closed: 2017-11-30 15:32:00 UTC


Attachments
Console logs for bug verification (8.71 KB, text/plain)
2017-11-02 05:13 UTC, anuja
no flags


Links
System: Red Hat Product Errata
ID: RHBA-2017:3301
Priority: normal
Status: SHIPPED_LIVE
Summary: pki-core bug fix and enhancement update
Last Updated: 2017-11-30 20:14:57 UTC

Description Oneata Mircea Teodor 2017-09-18 07:27:23 UTC
This bug has been copied from bug #1486225 and has been proposed to be backported to 7.4 z-stream (EUS).

Comment 6 Fraser Tweedale 2017-09-22 07:01:31 UTC
Add doc text.

Comment 7 anuja 2017-10-04 08:51:58 UTC
How reproducible: Always

Steps to reproduce:
Version:
ipa-server-4.5.0-21.el7_4.2.2.x86_64

Master setup: 
ipa-server-install --setup-dns --forwarder --domain --realm --setup-kra --admin-password --ds-password --allow-zone-overlap --domain-level=0 -U

Prepare-Replica-file
ipa-replica-prepare -p --ip-address replica.hostname.com

ipa-replica-install -U --setup-dns --forwarder --allow-zone-overlap --setup-ca --admin-password --password --setup-kra replica-info-replicahostname.gpg 

Actual results:

[1/7]: configuring KRA instance
ipa.ipaserver.install.krainstance.KRAInstance: CRITICAL Failed to configure KRA instance: Command '/usr/sbin/pkispawn -s KRA -f /tmp/tmpmGBgN2' returned non-zero exit status 1
ipa.ipaserver.install.krainstance.KRAInstance: CRITICAL See the installation logs and the following files/directories for more information:
ipa.ipaserver.install.krainstance.KRAInstance: CRITICAL   /var/log/pki/pki-tomcat
  [error] RuntimeError: KRA configuration failed.
Your system may be partly configured.
Run /usr/sbin/ipa-server-install --uninstall to clean up.

ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    KRA configuration failed.
ipa.ipapython.install.cli.install_tool(CompatServerReplicaInstall): ERROR    The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information



Based on the above information, marking status as ASSIGNED.

Comment 10 Fraser Tweedale 2017-10-05 13:54:20 UTC
I now strongly suspect we're hitting the same problem as
https://bugzilla.redhat.com/show_bug.cgi?id=1402280.

The EE URL is only contacted when the admin URL fails.  I presume the admin URL
fails because of this error, but I need to see /var/log/pki/pki-tomcat/kra/debug
on the replica from the latest failure to confirm.
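The admin-then-EE ordering described above can be sketched as follows.  This is a hypothetical illustration, not Dogtag's ConfigurationUtils code: `update_number_range` and the injected `post` callable are assumed names, and the URL paths are the ones that appear in the replica's debug log further down.  It shows why an EE-side error (such as the 404) is what surfaces, even though the admin request is the one that failed first.

```python
# Hypothetical sketch of the admin-then-EE fallback; names are illustrative,
# not Dogtag's ConfigurationUtils API.
from typing import Callable


def update_number_range(post: Callable[[str], str], host: str,
                        admin_port: int, ee_port: int) -> str:
    """POST to the admin interface; only if that fails, retry on the EE interface."""
    admin_url = f"https://{host}:{admin_port}/kra/admin/kra/updateNumberRange"
    try:
        return post(admin_url)
    except Exception:
        # Any admin-side failure (e.g. an unparseable response) triggers the
        # fallback, so an EE-side error such as a 404 is what surfaces last.
        ee_url = f"https://{host}:{ee_port}/kra/ee/kra/updateNumberRange"
        return post(ee_url)
```

Injecting `post` keeps the sketch testable without a live server; a real client would perform the HTTPS request and parse the XML body there.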

Comment 12 Fraser Tweedale 2017-10-05 14:26:35 UTC
Nihkil, thanks.  It is confirmed to be the same issue as
https://bugzilla.redhat.com/show_bug.cgi?id=1402280, or at least,
the symptoms are identical.  I have reproduced it with current master
builds of FreeIPA and Dogtag, and will continue analysis ASAP.

Comment 13 Fraser Tweedale 2017-10-06 05:56:31 UTC
Further details:

- The problem doesn't occur on DL1.

- The problem does occur on DL0 when installing the KRA replica separately
  from the main replica installation, via `ipa-kra-install <replica-file>`.

The next step is to compare the DL0 and DL1 pkispawn configuration files for differences, and then to make some instrumented builds to get more information about what is happening.

Comment 14 Fraser Tweedale 2017-10-06 12:37:00 UTC
I haven't got to the bottom of this yet.  Here's what I know:

- KRA installer hits /ca/admin/ca/updateNumberRange on the master
                     ^^^^     ^^^^
  Is it correct that KRA is hitting the CA app?  I guess it is fine because
  DL1 installation works, but I thought it was worth noting/asking.

- This servlet attempts to authenticate the client via TokenAuthentication.
  This results in another HTTP call back into the master to
  /ca/admin/ca/tokenAuthentication which fails with a NullPointerException.
  Not sure why yet.

- There may be other places where an NPE could be raised.

Someone who is more familiar with cloning and security domain stuff might
want to push forward on this.  Otherwise I'll continue on Monday.

Comment 15 Ade Lee 2017-10-06 15:42:12 UTC
Please provide log files.

Comment 16 Ade Lee 2017-10-06 19:05:13 UTC
Incidentally, it does not make sense that the updateNumberRange servlet on the CA is being contacted.

Comment 17 Fraser Tweedale 2017-10-07 00:06:12 UTC
That's interesting, Ade.  Yesterday I changed it to contact the KRA updateNumberRange servlet and tested.  This then failed with an NPE due to
the TokenAuthentication "session table" being null.

Log snippets from my slightly more instrumented, slightly less NPE-prone build,
which contacts the KRA updateNumberRange resource instead of the CA one when configuring the KRA.

===== master /var/log/pki/pki-tomcat/kra/debug =====
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: UpdateNumberRange: initializing...
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: according to ccMode, authorization for servlet: kraUpdateNumberRange is LDAP based, not XML {1}, use default authz mgr: {2}.
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: UpdateNumberRange: done initializing...
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: CMSServlet:service() uri = /kra/admin/kra/updateNumberRange
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: CMSServlet::service() param name='xmlOutput' value='true'
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: CMSServlet::service() param name='sessionID' value='3442776556571635063'
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: CMSServlet::service() param name='type' value='request'
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: CMSServlet: kraUpdateNumberRange start to service.
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: UpdateNumberRange: processing...
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: UpdateNumberRange process: authentication starts
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: IP: 192.168.124.165
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: AuthMgrName: TokenAuth
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: CMSServlet: no client certificate found
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: TokenAuthentication: start
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: TokenAuthentication: content={hostname=[192.168.124.165], sessionID=[3442776556571635063]}
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: ConfigurationUtils: POST https://f27-1.ipa.local:443/kra/admin/kra/tokenAuthenticate
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: according to ccMode, authorization for servlet: kraTokenAuthenticate is LDAP based, not XML {1}, use default authz mgr: {2}.
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: CMSServlet:service() uri = /kra/admin/kra/tokenAuthenticate
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: CMSServlet::service() param name='hostname' value='192.168.124.165'
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: CMSServlet::service() param name='sessionID' value='3442776556571635063'
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: CMSServlet: kraTokenAuthenticate start to service.
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: TokenAuthentication: sessionId=3442776556571635063
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: TokenAuthentication: givenHost=192.168.124.165
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: TokenAuthentication: checking session in the session table
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: TokenAuthentication: session table is null
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: TokenAuthentication: response: <?xml version="1.0" encoding="UTF-8" standalone="no"?><XMLResponse><Status>1</Status><Error>Error: session table is null</Error></XMLResponse>
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: TokenAuthentication: status=1
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-15]: SignedAuditLogger: event AUTH_FAIL
[06/Oct/2017:22:47:24][ajp-nio-127.0.0.1-8009-exec-18]: CMSServlet: curDate=Fri Oct 06 22:47:24 AEDT 2017 id=kraTokenAuthenticate time=12


===== replica /var/log/pki/pki-tomcat/kra/debug =====
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: === Subsystem Configuration ===
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: SystemConfigService: validate clone URI: https://f27-1.ipa.local:443
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: SystemConfigService: get configuration entries from master
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: updateNumberRange start host=f27-1.ipa.local adminPort=443 eePort=443
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: updateNumberRange content: {xmlOutput=[true], sessionID=[3442776556571635063], type=[request]}
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: ConfigurationUtils: POST https://f27-1.ipa.local:443/kra/admin/kra/updateNumberRange
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: Server certificate:
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]:  - subject: CN=f27-1.ipa.local,O=IPA.LOCAL 201710061647
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]:  - issuer: CN=Certificate Authority,O=IPA.LOCAL 201710061647
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: content from admin interface =<HTML>
<BODY BGCOLOR=white>
<P>
The Certificate System has encountered an unrecoverable error.
<P>
Error Message:<BR>
<I>java.lang.NullPointerException</I>
<P>
Please contact your local administrator for assistance.
</BODY>
</HTML>


[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: updateNumberRange: Failed to contact master using admin portorg.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 15; Open quote is expected for attribute "BGCOLOR" associated with an  element type  "BODY".
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: updateNumberRange: Attempting to contact master using EE port
[06/Oct/2017:22:47:24][http-bio-8443-exec-3]: ConfigurationUtils: POST https://f27-1.ipa.local:443/kra/ee/kra/updateNumberRange
javax.ws.rs.NotFoundException: HTTP 404 Not Found
        at org.jboss.resteasy.client.jaxrs.internal.ClientInvocation.handleErrorStatus(ClientInvocation.java:201)
        at org.jboss.resteasy.client.jaxrs.internal.ClientInvocation.extractResult(ClientInvocation.java:174)
        at org.jboss.resteasy.client.jaxrs.internal.ClientInvocation.invoke(ClientInvocation.java:473)
        at org.jboss.resteasy.client.jaxrs.internal.ClientInvocationBuilder.post(ClientInvocationBuilder.java:201)
        at com.netscape.certsrv.client.PKIConnection.post(PKIConnection.java:509)
        at com.netscape.cms.servlet.csadmin.ConfigurationUtils.post(ConfigurationUtils.java:238)
        at com.netscape.cms.servlet.csadmin.ConfigurationUtils.updateNumberRange(ConfigurationUtils.java:661)
        at com.netscape.cms.servlet.csadmin.ConfigurationUtils.getConfigEntriesFromMaster(ConfigurationUtils.java:558)
        at org.dogtagpki.server.rest.SystemConfigService.configureClone(SystemConfigService.java:806)
        at org.dogtagpki.server.rest.SystemConfigService.configureSubsystem(SystemConfigService.java:939)
        at org.dogtagpki.server.rest.SystemConfigService.configure(SystemConfigService.java:143)
        at org.dogtagpki.server.rest.SystemConfigService.configure(SystemConfigService.java:100)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:139)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:236)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:402)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:209)
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:221)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:293)
        at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:290)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAsPrivileged(Subject.java:549)
        at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:325)
        at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:176)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:286)
        at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:56)
        at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:190)
        at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:186)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:185)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:293)
        at org.apache.catalina.security.SecurityUtil$1.run(SecurityUtil.java:290)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAsPrivileged(Subject.java:549)
        at org.apache.catalina.security.SecurityUtil.execute(SecurityUtil.java:325)
        at org.apache.catalina.security.SecurityUtil.doAsPrivilege(SecurityUtil.java:264)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)
        at org.apache.catalina.core.ApplicationFilterChain.access$000(ApplicationFilterChain.java:56)
        at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:190)
        at org.apache.catalina.core.ApplicationFilterChain$1.run(ApplicationFilterChain.java:186)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:185)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:94)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:504)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1132)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:283)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:748)
[06/Oct/2017:22:52:24][http-bio-8443-exec-4]: SignedAuditLogger: event ACCESS_SESSION_TERMINATED
(ends)


Note that the NPE that originally occurred (due to the session table being null) is
handled in this build, but an NPE still occurs somewhere, as indicated by the response
the master returned to the replica.  Unlike the first NPE, nothing in the journal or
Tomcat logs indicates where this one is raised, but the output appears to be produced
by CMSServlet.renderFinalError(), so modifying that method to log the exception should
reveal the location.  That is to be the next step in my analysis.
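The replica-side SAXParseException in the log above follows from the master returning an HTML error page where the replica expects an XML response: `BGCOLOR=white` has no quotes, which is not well-formed XML.  A minimal sketch of the same failure using Python's standard library parser (an illustration of the parsing behaviour only, not of Dogtag's Java code; `parse_master_response` is a hypothetical name):

```python
# Sketch: the replica expects an XML response, but the master's error page is
# HTML whose unquoted attribute (BGCOLOR=white) is not well-formed XML.
import xml.etree.ElementTree as ET
from typing import Optional

# Abbreviated copy of the error page from the replica's debug log.
ERROR_PAGE = """<HTML>
<BODY BGCOLOR=white>
<P>
The Certificate System has encountered an unrecoverable error.
</BODY>
</HTML>"""


def parse_master_response(body: str) -> Optional[ET.Element]:
    """Return the XML root element, or None if the body is not well-formed XML."""
    try:
        return ET.fromstring(body)
    except ET.ParseError:
        # Python's expat-based parser rejects this page, as the JVM's SAX parser did.
        return None
```

By contrast, an `<XMLResponse>` body like the one the master KRA's tokenAuthenticate servlet returned parses without error.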

Comment 18 Ade Lee 2017-10-09 17:05:07 UTC
Please attach the entire log for both the CA and KRA; it's hard to get a sense of the context.  Please also attach the pkispawn parameters being used and the proxy.conf file.  It seems that servlets may be getting routed or rewritten incorrectly.

The basic idea is:

1. The clone KRA contacts the security domain CA to get an install token.  That is the value 3442776556571635063 in the log above.  The CA maintains a session table for these tokens.

2. The clone KRA contacts the master KRA servlet /kra/admin/kra/updateNumberRange, passing in the install token.

3. The master KRA contacts the security domain CA to validate the token.  This is where things get weird, because it looks like the KRA is trying to contact another KRA, or maybe even itself, rather than the CA, which of course is why the session table does not exist.
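The three steps above can be modelled in a toy sketch, assuming a session table that exists only on the security domain CA.  `Subsystem`, `create_session`, `token_authenticate`, and `update_number_range` are hypothetical names, not Dogtag classes; only the "session table is null" message is taken from the debug log.  It shows that the same token validates against the CA but fails when the master KRA asks a subsystem that has no session table.

```python
# Toy model of the install-token flow; class and method names are
# hypothetical and do not correspond to Dogtag's classes.
import secrets
from typing import Dict, Optional


class Subsystem:
    def __init__(self, name: str, is_security_domain: bool) -> None:
        self.name = name
        # Only a security domain CA keeps a session table; others have none.
        self.session_table: Optional[Dict[str, str]] = {} if is_security_domain else None

    def create_session(self, client_host: str) -> str:
        """Step 1: issue an install token and record which host it belongs to."""
        if self.session_table is None:
            raise RuntimeError(f"{self.name} is not a security domain")
        session_id = str(secrets.randbelow(10**19))
        self.session_table[session_id] = client_host
        return session_id

    def token_authenticate(self, session_id: str, given_host: str) -> str:
        """Step 3: validate a token against this subsystem's session table."""
        if self.session_table is None:
            return "Error: session table is null"  # message seen in the debug log
        if self.session_table.get(session_id) != given_host:
            return "Error: invalid session"  # hypothetical message
        return "OK"


def update_number_range(verifier: Subsystem, session_id: str, client_host: str) -> str:
    """Step 2: the master KRA authenticates the clone by asking `verifier`."""
    return verifier.token_authenticate(session_id, client_host)
```

For example, a token created on the CA for 192.168.124.165 returns "OK" when the CA is the verifier, and "Error: session table is null" when a KRA is.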

Looking at the code, it seems that the ability to run a tokenAuthenticate servlet on the KRA was added in 47c77a67, to allow for a CA-less KRA install.  As far as I know, this is certainly not what we expect to be installing in IPA.

Looking at the auth code now...

Comment 19 Fraser Tweedale 2017-10-10 04:02:35 UTC
I've made some progress on the analysis.  First of all,
I was wrong earlier when I said KRA contacted /ca/admin/ca/updateNumberRange.
What I meant was that it contacts /ca/admin/ca/tokenAuthentication which,
now that I have a better understanding of how this part of cloning works,
is the correct behaviour.  For updateNumberRange it is contacting
/kra/admin/kra/updateNumberRange which is also correct.

I believe the issue arises as an authentication problem because:

1. the security domain session gets created on the REPLICA CA instance
2. the /updateNumberRange request (and the reentrant /tokenAuthentication
   request) are performed against the MASTER
3. LDAP replication lag means that the MASTER is not aware of the session,
   and it rejects the authentication.

The investigation now turns to working out how to get (1) and (2) happening
on the same host - presumably the MASTER's CA instance because that is where
the MASTER's KRA will validate the token during /updateNumberRange.
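The race described in steps 1-3 above can be reproduced in a small model, assuming two CA instances whose session tables only converge when replication runs.  `ReplicatedSessionStore` and its methods are hypothetical stand-ins for the LDAP-backed session data, not Dogtag code.  A session created on the replica CA is rejected by the master CA until replication catches up.

```python
# Toy model of the replication-lag race: the session is written on the
# replica's CA but checked on the master's CA. Everything here is hypothetical.
from typing import Dict, List, Tuple


class ReplicatedSessionStore:
    """Two CA instances whose session tables sync only when replicate() runs."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {"replica": {}, "master": {}}
        self.pending: List[Tuple[str, str, str]] = []  # (target, session_id, host)

    def create_session(self, on: str, session_id: str, client_host: str) -> None:
        """Write the session locally and queue it for the other instance."""
        self.tables[on][session_id] = client_host
        for other in self.tables:
            if other != on:
                self.pending.append((other, session_id, client_host))

    def replicate(self) -> None:
        """Apply queued changes, standing in for LDAP replication catching up."""
        for target, session_id, host in self.pending:
            self.tables[target][session_id] = host
        self.pending.clear()

    def is_valid(self, on: str, session_id: str, client_host: str) -> bool:
        return self.tables[on].get(session_id) == client_host
```

Checking on the master before `replicate()` returns False; checking after it returns True, which matches the observation that the failure depends on timing rather than on the token itself.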

Comment 20 Fraser Tweedale 2017-10-10 05:29:35 UTC
This can be solved by an adjustment to the pkispawn configuration,
i.e. on the IPA side; set

  [KRA]
  pki_security_domain_hostname = $MASTER_FQDN

(It is currently set to the replica FQDN.)

That resolves this specific scenario, but the general problem remains: the
security domain session may be created on a different host from the one where
the token gets checked.  Which hosts are used is controlled by two settings:

1. the pki_security_domain_hostname (pkispawn config) setting for the
   CLONE.

2. the securitydomain.host (CS.cfg) setting of the MASTER from which the
   clone is being created.
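Whether these two settings agree can be checked before running pkispawn.  The following is a hypothetical pre-flight sketch, not part of pkispawn: it assumes the clone's pkispawn file is INI-style with a `[KRA]` section (as in the fragment above) and that the master's CS.cfg uses plain `key=value` lines.  Interpolation is disabled, so `%(...)s` references are compared literally.

```python
# Hypothetical pre-flight check: compare the clone's pkispawn setting with the
# master's CS.cfg setting. Not part of pkispawn itself.
import configparser
from typing import Optional


def read_cs_cfg_value(cs_cfg_text: str, key: str) -> Optional[str]:
    """Return a value from CS.cfg-style key=value text, or None if absent."""
    for line in cs_cfg_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def security_domain_aligned(pkispawn_cfg_text: str, master_cs_cfg_text: str,
                            section: str = "KRA") -> bool:
    """True if the clone will create its session where the master verifies it."""
    parser = configparser.ConfigParser(interpolation=None)  # no %(...)s expansion
    parser.read_string(pkispawn_cfg_text)
    clone_host = parser.get(section, "pki_security_domain_hostname", fallback=None)
    master_host = read_cs_cfg_value(master_cs_cfg_text, "securitydomain.host")
    return clone_host is not None and clone_host == master_host
```

As Ade notes in the next comment, a check like this only covers one master; a TPS install that talks to several subsystems would need the comparison repeated for each.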

We can't really guarantee that these settings will be in alignment,
but I can see a few ways to approach this problem, with different
degrees of effort and different effectiveness of solving the problem:

a) [low effort; low effectiveness]
   Don't make any changes to Dogtag itself; just document that
   the pki_security_domain_hostname value should match the
   securitydomain.host setting of the subsystem being cloned.

b) [low effort; high effectiveness (probably)]
   Introduce a delay after security domain session creation to allow for
   LDAP replication to take place (say, 5s).  This is just a mitigation,
   not a guaranteed solution.

c) [high effort; perfect effectiveness]
   1. Add or enhance a security domain resource to allow a client to query
      what an individual server's own security domain settings are
      (i.e. to learn its securitydomain.host setting)
   2. Update the SystemConfigService to query this resource on the subsystem
      being cloned
   3. Use the returned data to create the security domain session on the
      same domain manager that will be used to verify the session token
      during clone configuration, ignoring the pki_security_domain_hostname
      config.

d) Do nothing at all in PKI (but fix IPA's pkispawn settings, of course).
   Leave admins in the dark if they encounter this issue when deploying
   Dogtag themselves :)
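Step 3 of option (c) above, choosing where to create the session, could look roughly like this sketch.  The query resource from step 1 did not exist at the time, so `fetch_master_settings` is a hypothetical callable standing in for it, and `choose_session_host` is likewise an illustrative name.  Falling back to the clone's configured hostname when the master does not report a value is an assumed design choice.

```python
# Sketch of option (c): ask the subsystem being cloned which security domain
# host it will use, and create the session there. The resource and helper
# names are hypothetical; no such query existed at the time of this bug.
from typing import Callable, Dict


def choose_session_host(fetch_master_settings: Callable[[str], Dict[str, str]],
                        master_url: str, configured_host: str) -> str:
    """Return the host on which to create the security domain session."""
    settings = fetch_master_settings(master_url)
    # Prefer the master's own securitydomain.host, ignoring the clone's
    # pki_security_domain_hostname, so creation and verification match.
    return settings.get("securitydomain.host", configured_host)
```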

Let me know your thoughts.

Comment 21 Ade Lee 2017-10-10 19:07:24 UTC
So, just to confirm: if IPA's pkispawn settings are set "correctly" (that is, all pointing to the same master CA), does cloning work?  Can we verify this before embarking on a solution to the replication problem?

As for the problem you suggest, ...

One thing that we should keep in mind is that security domain interactions can be a little more complex than the simple clone-the-KRA scenario.  

For instance, when installing a TPS, the TPS needs to contact the CA, KRA and TKS to set stuff up.  That means that each of those subsystems needs to contact the security domain.  So in your solution (c) above, you'd need the client to contact all three of the subsystems - and hope that the security domain matches.

Similarly, solution (a) above fails as well.  What do you document if the security domain differs for the above subsystems?

This is a case where some information about the replication topology stored in LDAP would have been valuable, because then we could potentially query all security domain CAs.  We don't have that, though, and adding it would be a lot of work.

I think that ultimately a better solution would be to create some kind of signed token system.  That is what I would suggest for 10.6.  So the security domain could issue a signed token -- maybe even ahead of time -- which would contain the necessary information (host, system_type, ticket expiration) - the same stuff that is in the security domain.  The token would be signed by the security domain CA.  The client would simply need to validate the signature without actually needing to contact the security domain CA.  As all security domain CAs are clones of each other, they share the same CA signing cert.  If we need to encrypt the data, then the process is a little more complicated, but we still could contact any security domain.
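A self-validating token of the kind proposed above can be sketched with Python's standard library.  This deliberately swaps in a simpler technique: it uses an HMAC over the claims with a shared secret (the "HMAC'd token" variant mentioned in the next comment) rather than a signature from the CA signing key.  `issue_token` and `validate_token` are hypothetical names, and the token format is an assumption.

```python
# Sketch of a self-validating install token, using an HMAC over the claims
# (host, system type, expiration). Ade's proposal signs with the CA signing
# key; an HMAC with a shared secret is a simpler stand-in here.
import base64
import hashlib
import hmac
import json
import time


def issue_token(key: bytes, host: str, system_type: str, ttl_seconds: int) -> str:
    """Encode the claims and append an HMAC-SHA256 tag over them."""
    payload = json.dumps({"host": host, "system_type": system_type,
                          "expires": int(time.time()) + ttl_seconds},
                         sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(tag).decode())


def validate_token(key: bytes, token: str, host: str) -> bool:
    """Check the tag and claims locally, without contacting the security domain."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong number of parts or bad base64
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False
    claims = json.loads(payload)
    return claims["host"] == host and claims["expires"] > time.time()
```

Because validation needs only the key and the token, any subsystem holding the key can check it, which removes the dependence on which CA's session table has the entry.  The trade-off is key distribution, which the public-key signature in Ade's proposal avoids.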

So, based on above, I'm not excited to make changes that are too large - particularly given the current schedule.

Therefore, I'm inclined towards (b) or (d).  If we do (b), we should make sure the delay is configurable.

Comment 22 Fraser Tweedale 2017-10-10 23:34:58 UTC
Ade, yes, a signed or HMAC'd token is a good idea!  I've filed a ticket:
https://pagure.io/dogtagpki/issue/2831.

I have verified that changing the pkispawn
settings so that we create the security domain session on the host that's
being cloned does avoid this issue.

But...

The resolution on the IPA side was NACKed - it re-introduces an issue
where creating a clone of a clone fails if the original subsystem was
deleted (because the first clone's security domain settings point to
the removed host).  There is a workaround (fix up the KRA clone's security
domain settings to point to the same host *after* clone configuration
has finished), but I'll implement the delay (suggestion (b)) in pkispawn
and see how that goes.

Comment 23 Fraser Tweedale 2017-10-11 07:19:33 UTC
Gerrit review for adding a short sleep after Security Domain login:
https://review.gerrithub.io/#/c/382097/ (several small patches in patch set).

The default of 5s is confirmed to make the IPA KRA replica install happy
(in my environment ^_^).
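The mitigation, a configurable pause after security domain login, can be sketched as follows.  This is not the code from the Gerrit review: `login_then_wait` and its parameters are hypothetical names, and the default of 5 seconds is the value reported above.  The `sleep` function is injectable so the delay can be tested without waiting.

```python
# Sketch of the post-login delay: pause after security domain login so that
# LDAP replication can propagate the new session. Function and parameter
# names are hypothetical, not pkispawn's actual configuration keys.
import time
from typing import Callable


def login_then_wait(login: Callable[[], str], delay_seconds: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Create a security domain session, then wait before it is used elsewhere."""
    session_id = login()
    if delay_seconds > 0:
        # A mitigation only: replication may still take longer than this.
        sleep(delay_seconds)
    return session_id
```

Setting the delay to 0 disables the pause, which suits deployments where the session is always created on the host that verifies it.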

Comment 24 Ade Lee 2017-10-26 17:26:50 UTC
Commits (in order):

386357c347f8433e14ccd8637576f4c4a4e42492
bc329a0162ae9af382c81e75742b282ea8c5df0d
9eb354883c9d965bb271223bf870839bb756db26
fa2d731b6ce51c5db9fb0b004d586b8f3e1decd3
8c0a7eee3bbfe01b2d965dbe09e95221c5031c8b

Comment 25 anuja 2017-11-02 05:09:45 UTC
Verified using the following IPA, PKI, JSS, and NSS-UTIL versions:

ipa-server-4.5.0-22.el7_4.x86_64
jss-4.4.0-9.el7_4.x86_64
pki-server-10.4.1-16.el7_4.noarch
pki-tools-10.4.1-16.el7_4.x86_64
pki-ca-10.4.1-16.el7_4.noarch
pki-base-java-10.4.1-16.el7_4.noarch
pki-kra-10.4.1-16.el7_4.noarch
pki-base-10.4.1-16.el7_4.noarch
nss-util-3.28.4-3.el7.x86_64

Marking BZ as verified. Please see attachment for console log.

Comment 26 anuja 2017-11-02 05:13:35 UTC
Created attachment 1346838
Console logs for bug verification

Comment 29 errata-xmlrpc 2017-11-30 15:32:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3301

