RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1348976 - freeipa fails to start correctly after pki-core update on upgraded system
Summary: freeipa fails to start correctly after pki-core update on upgraded system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pki-core
Version: 7.3
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: rc
: 7.3
Assignee: RHCS Maintainers
QA Contact: Asha Akkiangady
URL:
Whiteboard:
Depends On: 1323400
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-06-22 12:30 UTC by Fraser Tweedale
Modified: 2020-10-04 21:08 UTC (History)
19 users (show)

Fixed In Version: pki-core-10.3.3-3.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1323400
Environment:
Last Closed: 2016-11-04 05:25:09 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github dogtagpki pki issues 2405 0 None None None 2020-10-04 21:08:05 UTC
Red Hat Product Errata RHBA-2016:2396 0 normal SHIPPED_LIVE pki-core bug fix and enhancement update 2016-11-03 13:55:03 UTC

Description Fraser Tweedale 2016-06-22 12:30:24 UTC
+++ This bug was initially created as a clone of Bug #1323400 +++

I woke up this morning to find my FreeIPA server down. Looks like an automatic DNF update happened on the box and FreeIPA failed to recover afterwards. Attaching the journal for now; please let me know what else I can attach (maybe there's somewhere with more details on the pki-tomcat issue, but tomcat logging is insane so I've no idea where to look). The SSL cipher errors look obviously suspicious but I don't know how fatal they are. This is a system which has been upgraded all the way from F19.

--- Additional comment from Adam Williamson on 2016-04-02 10:30 EDT ---



--- Additional comment from Adam Williamson on 2016-04-02 10:46:19 EDT ---

Trying to start IPA again fails in the same way. The TLS cipher issue seems the most likely culprit, I don't have time to start poking around at server.xml now, I'll try it later.

--- Additional comment from Adam Williamson on 2016-04-03 12:38:05 EDT ---

So pki-tomcat last started successfully on March 28 at 14:18. It first failed to start on April 1 at 15:28. Between those times these packages were updated:

tzdata
pki-core (10.2.6-13.fc23 to 10.2.6-16.fc23)
freeipa (4.2.3-2.fc23 to 4.2.4-1.fc23)
sssd (1.13.3-5.fc23 to 1.13.3-6.fc23)
apache-commons-collections

the pki-core update did not trigger a pki-tomcat restart, which is why the problem only showed up after the freeipa update, but it's actually the cause of the problem. I just downgraded to 10.2.6-13.fc23 and IPA started up successfully after that.

Will attach my server.xml for inspection.

--- Additional comment from Adam Williamson on 2016-04-03 12:39 EDT ---



--- Additional comment from Ade Lee on 2016-04-04 09:41:29 EDT ---

I don't see anything that necessarily indicates that the problem is ciphers.
Can you check the dogtag logs under /var/log/pki/pki-tomcat/ for any updates during the specified time?  We would particularly be interested in the ca/debug log, but sme exception/stack trace might be seen in the system log or others.

--- Additional comment from Adam Williamson on 2016-04-04 11:47:38 EDT ---

I'll have a look, but I was looking at specifically these messages:

Apr 01 15:28:38 id.happyassassin.net server[6192]: WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'strictCiphers' to 'true' did not find a matching property.
Apr 01 15:28:38 id.happyassassin.net server[6192]: Apr 01, 2016 3:28:38 PM org.apache.catalina.startup.SetAllPropertiesRule begin
Apr 01 15:28:38 id.happyassassin.net server[6192]: WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'sslOptions' to 'ssl2=true,ssl3=true,tls=true' did not find a matching property.
Apr 01 15:28:38 id.happyassassin.net server[6192]: Apr 01, 2016 3:28:38 PM org.apache.catalina.startup.SetAllPropertiesRule begin
Apr 01 15:28:38 id.happyassassin.net server[6192]: WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'ssl2Ciphers' to '-SSL2_RC4_128_WITH_MD5,-SSL2_RC4_128_EXPORT40_WITH_MD5,-SSL2_RC2_128_CBC_WITH_MD5,-SSL2_RC2_128_CBC_EXPORT40_WITH_MD5,-SSL2_DES_64_CBC_WITH_MD5,-SSL2_DES_192_EDE3_CBC_WITH_MD5' did not find a matching property.
Apr 01 15:28:38 id.happyassassin.net server[6192]: Apr 01, 2016 3:28:38 PM org.apache.catalina.startup.SetAllPropertiesRule begin
Apr 01 15:28:38 id.happyassassin.net server[6192]: WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'ssl3Ciphers' to '-SSL3_FORTEZZA_DMS_WITH_NULL_SHA,-SSL3_FORTEZZA_DMS_WITH_RC4_128_SHA,+SSL3_RSA_WITH_RC4_128_SHA,-SSL3_RSA_EXPORT_WITH_RC4_40_MD5,+SSL3_RSA_WITH_3DES_EDE_CBC_SHA,+SSL3_RSA_WITH_DES_CBC_SHA,-SSL3_RSA_EXPORT_WITH_RC2_CBC_40_MD5,-SSL3_FORTEZZA_DMS_WITH_FORTEZZA_CBC_SHA,-SSL_RSA_FIPS_WITH_DES_CBC_SHA,+SSL_RSA_FIPS_WITH_3DES_EDE_CBC_SHA,-SSL3_RSA_WITH_NULL_MD5,-TLS_RSA_EXPORT1024_WITH_RC4_56_SHA,-TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,+TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA' did not find a matching property.
Apr 01 15:28:38 id.happyassassin.net server[6192]: Apr 01, 2016 3:28:38 PM org.apache.catalina.startup.SetAllPropertiesRule begin
Apr 01 15:28:38 id.happyassassin.net server[6192]: WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'tlsCiphers' to '-TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,-TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,+TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,+TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,+TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,-TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,+TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,+TLS_RSA_WITH_3DES_EDE_CBC_SHA,+TLS_RSA_WITH_AES_128_CBC_SHA,+TLS_RSA_WITH_AES_256_CBC_SHA,+TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,+TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,-TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,-TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,-TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,+TLS_DHE_DSS_WITH_3DES_EDE_CBC_SHA,+TLS_DHE_DSS_WITH_AES_128_CBC_SHA,+TLS_DHE_DSS_WITH_AES_256_CBC_SHA,+TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA,+TLS_DHE_RSA_WITH_AES_128_CBC_SHA,+TLS_DHE_RSA_WITH_AES_256_CBC_SHA' did not find a matching property.

there are several other 'did not find a matching property' warnings indeed, but those seem to cover the entire TLS/SSL configuration. Still, you're the expert :)

--- Additional comment from Endi Sukma Dewata on 2016-04-04 14:41:41 EDT ---

Hi, I think the above warnings are unrelated to the startup failure (see https://fedorahosted.org/pki/ticket/1352).

Could you attach the CA debug log file (i.e. /var/log/pki/pki-tomcat/ca/debug)?

--- Additional comment from Adam Williamson on 2016-04-04 15:12:58 EDT ---

there doesn't seem to be an awful lot there, for the affected tries:

[01/Apr/2016:15:28:42][localhost-startStop-1]: ============================================
[01/Apr/2016:15:28:42][localhost-startStop-1]: =====  DEBUG SUBSYSTEM INITIALIZED   =======
[01/Apr/2016:15:28:42][localhost-startStop-1]: ============================================
[01/Apr/2016:15:34:04][localhost-startStop-1]: ============================================
[01/Apr/2016:15:34:04][localhost-startStop-1]: =====  DEBUG SUBSYSTEM INITIALIZED   =======
[01/Apr/2016:15:34:04][localhost-startStop-1]: ============================================
[02/Apr/2016:07:32:03][localhost-startStop-1]: ============================================
[02/Apr/2016:07:32:03][localhost-startStop-1]: =====  DEBUG SUBSYSTEM INITIALIZED   =======
[02/Apr/2016:07:32:03][localhost-startStop-1]: ============================================
netscape.ldap.LDAPException: Server or network error (81)
        at netscape.ldap.LDAPConnThread.networkError(Unknown Source)
        at netscape.ldap.LDAPConnThread.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)
[03/Apr/2016:09:36:34][localhost-startStop-1]: ============================================
[03/Apr/2016:09:36:34][localhost-startStop-1]: =====  DEBUG SUBSYSTEM INITIALIZED   =======
[03/Apr/2016:09:36:34][localhost-startStop-1]: ============================================
[03/Apr/2016:09:36:40][http-bio-8443-exec-2]: CMSServlet: curDate=Sun Apr 03 09:36:40 PDT 2016 id=caGetStatus time=27
[03/Apr/2016:15:46:24][localhost-startStop-2]: ============================================
[03/Apr/2016:15:46:24][localhost-startStop-2]: =====  DEBUG SUBSYSTEM INITIALIZED   =======
[03/Apr/2016:15:46:24][localhost-startStop-2]: ============================================

Note there are two or three startup attempts there - 01/Apr/2016:15:28:42 is where it initially went down - and only one apparently couldn't hit the LDAP server, that was probably while I was playing around trying to fix it. The other failures don't seem to log anything interesting. 03/Apr/2016:09:36:34 is when it started successfully after downgrade of pki-core packages.

--- Additional comment from Adam Williamson on 2016-04-04 15:18:01 EDT ---

ca/system has some interesting (I think) lines:

0.profileChangeMonitor - [01/Apr/2016:15:28:12 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:13 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:14 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:15 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:16 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:17 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:18 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:19 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:20 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:21 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:22 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:23 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:24 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:25 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:26 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:27 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:28 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:29 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:30 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:31 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:32 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:33 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)
0.profileChangeMonitor - [01/Apr/2016:15:28:34 PDT] [8] [3] In Ldap (bound) connection pool to host id.happyassassin.net port 636, Cannot connect to LDAP server. Error: netscape.ldap.LDAPException: IO Error creating JSS SSL Socket (-1)

there is nothing after that in the log at all - no errors apparently appeared on the subsequent failed attempts to start IPA, or the successful one after the downgrade.

--- Additional comment from Adam Williamson on 2016-04-04 15:20 EDT ---

Now I look through the whole system log, similar SSL socket errors appear quite regularly, so that's probably not the whole story...here's the whole file.

--- Additional comment from Matthew Harmsen on 2016-04-04 19:50:52 EDT ---

(In reply to Adam Williamson from comment #3)
> So pki-tomcat last started successfully on March 28 at 14:18. It first
> failed to start on April 1 at 15:28. Between those times these packages were
> updated:
> 
> tzdata
> pki-core (10.2.6-13.fc23 to 10.2.6-16.fc23)
> freeipa (4.2.3-2.fc23 to 4.2.4-1.fc23)
> sssd (1.13.3-5.fc23 to 1.13.3-6.fc23)
> apache-commons-collections

What version of 'tomcat' are you using?

> 
> the pki-core update did not trigger a pki-tomcat restart, which is why the
> problem only showed up after the freeipa update, but it's actually the cause
> of the problem. I just downgraded to 10.2.6-13.fc23 and IPA started up
> successfully after that.
> 
> Will attach my server.xml for inspection.

--- Additional comment from Adam Williamson on 2016-04-04 19:51:41 EDT ---

tomcat-8.0.32-5.fc23.noarch

--- Additional comment from Endi Sukma Dewata on 2016-04-04 21:57:11 EDT ---

I'm unable to reproduce the problem on F23. I tried installing pki-core 10.2.6-13 and freeipa 4.2.3-2 then upgraded them to 10.2.6-16 and 4.2.4-1, respectively. The server seems to be working fine after upgrade and after restart.

It's possible there are remnants of F19 installation that are causing this problem that I cannot replicate with the above method, or there is something specific about your environment. Could you try increasing the debug level and try upgrading again? Thanks.

http://pki.fedoraproject.org/wiki/PKI_Server_Logs#Debug_Log

--- Additional comment from Adam Williamson on 2016-04-14 15:24:16 EDT ---

sorry I got distracted :(

Here is the CA debug file; the bug is still happening. I don't see anything here, but maybe you will.

This is everything from after I enabled the debug settings. First I did `systemctl restart pki-tomcatd` , which appeared to go OK and the service showed as 'running' afterwards, but I didn't leave it very long. I instead did `systemctl restart ipa.service`, which takes a long time and eventually fails, with `systemctl --failed` showing ipa.service , kadmin.service , and pki-tomcatd as failed.

--- Additional comment from Adam Williamson on 2016-04-14 15:24 EDT ---



--- Additional comment from Adam Williamson on 2016-04-14 15:28:07 EDT ---

I'm wondering if maybe tomcat is actually starting fine, but ipactl doesn't *think* it is, for some reason? I'm gonna take a quick look at how ipactl decides when something is done starting.

--- Additional comment from Adam Williamson on 2016-04-14 15:35:02 EDT ---

aha, yes, it seems that's something like what's happening. ipactl -d output:

ipa: DEBUG: Waiting until the CA is running
ipa: DEBUG: Starting external process
ipa: DEBUG: args='/usr/bin/wget' '-S' '-O' '-' '--timeout=30' '--no-check-certificate' 'https://id.happyassassin.net:8443/ca/admin/ca/getStatus'
ipa: DEBUG: Process finished, return code=4
ipa: DEBUG: stdout=
ipa: DEBUG: stderr=--2016-04-14 12:32:51--  https://id.happyassassin.net:8443/ca/admin/ca/getStatus
Resolving id.happyassassin.net (id.happyassassin.net)... 192.168.1.12
Connecting to id.happyassassin.net (id.happyassassin.net)|192.168.1.12|:8443... connected.
Unable to establish SSL connection.

ipa: DEBUG: The CA status is: check interrupted due to error: Command ''/usr/bin/wget' '-S' '-O' '-' '--timeout=30' '--no-check-certificate' 'https://id.happyassassin.net:8443/ca/admin/ca/getStatus'' returned non-zero exit status 4
ipa: DEBUG: Waiting for CA to start...
ipa: DEBUG: Starting external process
ipa: DEBUG: args='/usr/bin/wget' '-S' '-O' '-' '--timeout=30' '--no-check-certificate' 'https://id.happyassassin.net:8443/ca/admin/ca/getStatus'
ipa: DEBUG: Process finished, return code=4
ipa: DEBUG: stdout=
ipa: DEBUG: stderr=--2016-04-14 12:33:22--  https://id.happyassassin.net:8443/ca/admin/ca/getStatus
Resolving id.happyassassin.net (id.happyassassin.net)... 192.168.1.12
Connecting to id.happyassassin.net (id.happyassassin.net)|192.168.1.12|:8443... connected.
Unable to establish SSL connection.

ipa: DEBUG: The CA status is: check interrupted due to error: Command ''/usr/bin/wget' '-S' '-O' '-' '--timeout=30' '--no-check-certificate' 'https://id.happyassassin.net:8443/ca/admin/ca/getStatus'' returned non-zero exit status 4
ipa: DEBUG: Waiting for CA to start...
ipa: DEBUG: Starting external process
ipa: DEBUG: args='/usr/bin/wget' '-S' '-O' '-' '--timeout=30' '--no-check-certificate' 'https://id.happyassassin.net:8443/ca/admin/ca/getStatus'

i.e. it's trying to hit a URL and failing. indeed if I try to hit the same URL manually, it also just times out.

--- Additional comment from Adam Williamson on 2016-04-14 15:40:57 EDT ---

[adamw@id ~]$ curl -v https://localhost:8443/ca/admin/ca/getStatus
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
^C
[adamw@id ~]$ curl -v http://localhost:8443/ca/admin/ca/getStatus
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8443 (#0)
> GET /ca/admin/ca/getStatus HTTP/1.1
> Host: localhost:8443
> User-Agent: curl/7.43.0
> Accept: */*
> [hangs]

--- Additional comment from Adam Williamson on 2016-04-14 15:43:00 EDT ---

ok, it gives up after a while:

[adamw@id ~]$ curl -v http://localhost:8443/ca/admin/ca/getStatus
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8443 (#0)
> GET /ca/admin/ca/getStatus HTTP/1.1
> Host: localhost:8443
> User-Agent: curl/7.43.0
> Accept: */*
> 
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer

--- Additional comment from Matthew Harmsen on 2016-04-18 12:44:58 EDT ---

Upstream ticket:
https://fedorahosted.org/pki/ticket/2285

--- Additional comment from Fraser Tweedale on 2016-04-18 18:29:58 EDT ---

Let us eliminate connection or certificate authentication issues.

Adam, can you confirm that DS is up and running on port 636, and that the certificate that it presents is not expired?

  $ echo | openssl s_client -showcerts -connect $HOSTNAME:636

Then check the certificate for the user that Dogtag uses to authenticate to DS:

  $ ldapsearch -D 'cn=Directory Manager' -w $DM_PASSWORD -b uid=pkidbuser,ou=people,o=ipaca

And check the details of the certificate that is actually being used by Dogtag:

  $ sudo certutil -d /etc/pki/pki-tomcat/alias -L -n "subsystemCert cert-pki-ca"

--- Additional comment from Adam Williamson on 2016-04-19 14:51:31 EDT ---

Attaching results of each of those.

--- Additional comment from Adam Williamson on 2016-04-19 14:51 EDT ---



--- Additional comment from Adam Williamson on 2016-04-19 14:52 EDT ---



--- Additional comment from Adam Williamson on 2016-04-19 14:53 EDT ---



--- Additional comment from Fraser Tweedale on 2016-04-27 05:46:58 EDT ---

Thanks for the info Adam.  Dogtag 10.2.6-16 issue made profile
initialisation part of the startup process (it previously happened in
the background).  I guess the underlying problem is LDAP TLS connection,
and that the problem was already there, and simply exposed by the
startup changes in 10.2.6-16.  You may be able to start IPA with
v10.2.6-13, but are you able to issue certs or perform other operations
that hit the CA?

Let's eliminate authentication issues; the following command authenticates
to DS using Dogtag's client certificate:

  dnf install -y mozldap-tools

  /usr/lib64/mozldap/ldapsearch -Z \
    -P /etc/pki/pki-tomcat/alias \
    -W $(grep ^internal= /etc/pki/pki-tomcat/password.conf) \
    -N "subsystemCert cert-pki-ca"
    -b o=ipaca '(uid=pkidbuser)' seeAlso

If this command fails, we will focus on DS cert mapping.

If it succeeds, let's focus on cipher suites; please provide the contents
of /etc/dirsrv/slapd-HAPPYASSASSIN-NET/dse.ldif.

--- Additional comment from Adam Williamson on 2016-04-27 05:53:55 EDT ---

"You may be able to start IPA with v10.2.6-13, but are you able to issue certs or perform other operations that hit the CA?"

not sure if I've done anything recently which would've caused a cert to be issued, indeed.

thanks for the next steps, will get to it when i wake up.

--- Additional comment from Adam Williamson on 2016-05-03 14:25:53 EDT ---

Um. I don't know if I was supposed to substitute something for (uid=pkidbuser) (and if so, what?). I tried running the command as-is and this is what happens:

[root@id adamw]# /usr/lib64/mozldap/ldapsearch -Z \
>     -P /etc/pki/pki-tomcat/alias \
>     -W $(grep ^internal= /etc/pki/pki-tomcat/password.conf) \
>     -N "subsystemCert cert-pki-ca" \
> -b o=ipaca '(uid=pkidbuser)' seeAlso
Warning: Incorrect PIN may result in disabling the token
Enter PIN for NSS Certificate DB: 

I have no idea what the PIN for the NSS certificate DB would be. If I leave it blank I get:

ldapssl_enable_clientauth: Bad parameter to an ldap routine
ldapssl_enable_clientauth: additional info: bad key or key password
	SSL error -8177 (The security password entered is incorrect.)

--- Additional comment from Fraser Tweedale on 2016-05-03 19:04:25 EDT ---

Adam, apologies - I made a mistake in the command.  The `-W' argument should be:

  -W $(grep ^internal= /etc/pki/pki-tomcat/password.conf | cut -d = -f 2)

--- Additional comment from Adam Williamson on 2016-05-03 19:50:39 EDT ---

OK, with that change, I get:

[root@id adamw]# /usr/lib64/mozldap/ldapsearch -Z \
>     -P /etc/pki/pki-tomcat/alias \
> -W $(grep ^internal= /etc/pki/pki-tomcat/password.conf | cut -d = -f 2) \
> -N "subsystemCert cert-pki-ca" \
> -b o=ipaca '(uid=pkidbuser)' seeAlso
version: 1
dn: uid=pkidbuser,ou=people,o=ipaca
seeAlso: CN=CA Subsystem,O=HAPPYASSASSIN.NET

which Fraser tells me is fine. So we are now looking at cipher suites. So far I see this bit in dse.ldif:

allowWeakCipher: off
cn: encryption
createTimestamp: 20130923221704Z
creatorsName: cn=server,cn=plugins,cn=config
modifiersName: cn=Directory Manager
modifyTimestamp: 20141211014209Z
nsSSL2: off
nsSSL3: off
nsSSL3Ciphers: +all
nsSSLClientAuth: allowed
nsSSLSessionTimeout: 0
objectClass: top
objectClass: nsEncryptionConfig
numSubordinates: 1

I'm not sure what else you may need to see.

--- Additional comment from Fraser Tweedale on 2016-05-04 00:14:37 EDT ---

On a fresh install of freeipa-server-4.2.4-1.fc23,
my dse.ldif has:

    dn: cn=encryption,cn=config
    allowWeakCipher: off
    cn: encryption
    createTimestamp: 20160504035626Z
    creatorsName: cn=server,cn=plugins,cn=config
    modifiersName: cn=directory manager
    modifyTimestamp: 20160504035907Z
    nsSSL3Ciphers: +all
    nsSSLClientAuth: allowed
    nsSSLSessionTimeout: 0
    objectClass: top
    objectClass: nsEncryptionConfig
    sslVersionMin: TLS1.0
    numSubordinates: 1

Perhaps a long shot but if you stop DS, update dse.ldif to remove the extra attributes:

    nsSSL2: off
    nsSSL3: off

And add the missing attribute:

    sslVersionMin: TLS1.0

Then restart DS... does Dogtag then come up properly?

--- Additional comment from Adam Williamson on 2016-05-04 03:17:36 EDT ---

No, that doesn't seem to help.

--- Additional comment from Adam Williamson on 2016-06-14 19:27:57 EDT ---

this is still happening, fwiw, even with latest pki-core (19).

--- Additional comment from Adam Williamson on 2016-06-21 11:58:47 EDT ---

still happening after upgrade of server box to f24, and now I don't know if I can fix it with a package downgrade. argh.

--- Additional comment from Adam Williamson on 2016-06-21 15:14:07 EDT ---

OK, luckily, can still get 10.2.6-13 installed on f24. So I've pinned this down a bit tighter: not just between 10.2.6-13 and 10.2.6-16, but 10.2.6-13 and 10.2.6-14. It works with 10.2.6-13.fc24.x86_64 , fails with 10.2.6-14.fc24.x86_64 . So one of the changes in -14 must be the culprit here.

--- Additional comment from Adam Williamson on 2016-06-21 15:49:47 EDT ---

OK, so I've narrowed this down to a specific commit:

https://git.fedorahosted.org/cgit/pki.git/commit?id=5fae5826e5442d7266681d19f282dc7728062b89

that was included in -14.fc23 / -14.fc24 as a patch. I did a scratch build of 10.3.3 for F24 with that commit reverted:

http://koji.fedoraproject.org/koji/taskinfo?taskID=14596519

and with that, IPA starts up fine.

Of course, simply reverting that change doesn't look like the right 'fix', I guess the question is, why does this check for profile load completion break things in my setup? Is there a bug in the code, or do the profiles in question truly never load properly in my config, or what?

I guess I'll try poking at it with a big stick and spreading some debug statements around and see if I can work anything else, though I'm no Java or Tomcat expert.

--- Additional comment from Adam Williamson on 2016-06-21 17:56:35 EDT ---

So I threw a couple of debugs into LDAPProfileSubsystem.java run():

        while (!stopped) {
            forgetAllProfiles();
            try {
                conn = dbFactory.getConn();
                LDAPSearchConstraints cons = conn.getSearchConstraints();
                cons.setServerControls(persistCtrl);
                cons.setBatchSize(1);
                cons.setServerTimeLimit(0 /* seconds */);
                String[] attrs = {"*", "entryUSN", "nsUniqueId", "numSubordinates"};
                LDAPSearchResults results = conn.search(
                    dn, LDAPConnection.SCOPE_SUB, "(objectclass=*)",
                    attrs, false, cons);
                CMS.debug("Got here #1");
                while (!stopped && results.hasMoreElements()) {
                    CMS.debug("Got here #2");

and we reach #1, but we never reach #2. The debug log hangs at this point:

[21/Jun/2016:14:47:31][profileChangeMonitor]: Profile change monitor: starting.
[21/Jun/2016:14:47:31][profileChangeMonitor]: In LdapBoundConnFactory::getConn()
[21/Jun/2016:14:47:31][profileChangeMonitor]: masterConn is connected: true
[21/Jun/2016:14:47:31][profileChangeMonitor]: getConn: conn is connected true
[21/Jun/2016:14:47:31][profileChangeMonitor]: getConn: mNumConns now 2
[21/Jun/2016:14:47:31][profileChangeMonitor]: Got here #1
[21/Jun/2016:14:47:31][CRLIssuingPoint-MasterCRL]: findNextUpdate:  fromLastUpdate: true  delta: false
[21/Jun/2016:14:47:31][CRLIssuingPoint-MasterCRL]: findNextUpdate:  Tue Jun 21 17:00:00 PDT 2016  delay: 7948646

I note that init() does this:

        monitor = new Thread(this, "profileChangeMonitor");
        monitor.start();
        try {
            initialLoadDone.await();
        } catch (InterruptedException e) {
            CMS.debug("LDAPProfileSubsystem: caught InterruptedException "
                    + "while waiting for initial load of profiles.");
        }
        CMS.debug("LDAPProfileSubsystem: finished init");
    }

So init() is never going to return until initialLoadDone - which is a CountDownLatch, see https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CountDownLatch.html - hits 0 (when that await() call will return). The *only* call that decrements initialLoadDone is in checkInitialLoadDone(), which is only ever called from within the LDAP search result loop.

So, basically, it's clear at one level what's going on: init() will never return if there are no results for the LDAPsearch near the start of run(), and on my system, there aren't any. I guess the question is, why not? And is it expected that the search may not return any results (in which case, the logic is broken and needs fixing), or should there *always* be results?

--- Additional comment from Adam Williamson on 2016-06-21 18:44:42 EDT ---

Well, here's a thing. If I do this on my deployment:

/usr/lib64/mozldap/ldapsearch -Z -P /etc/pki/pki-tomcat/alias -W $(grep ^internal= /etc/pki/pki-tomcat/password.conf | cut -d = -f 2) -N "subsystemCert cert-pki-ca" -b o=ipaca '(objectclass=organizationalUnit) | grep "^ou"

I get the following ous:

ou: people
ou: groups
ou: requests
ou: ca
ou: ranges
ou: replica
ou: requests
ou: certificateRepository

on a clean F24 FreeIPA deployment, I get these:

ou: people
ou: groups
ou: requests
ou: ca
ou: ranges
ou: replica
ou: requests
ou: certificateRepository
ou: certificateProfiles
ou: authorities

note how my server has no certificateProfiles (or authorities). Could this be the problem here? My server was installed as F19 and has been upgraded from there (I think through 21 and 23 to 24), so perhaps this should have been added on upgrade at some point but was not?

--- Additional comment from Adam Williamson on 2016-06-21 19:00:04 EDT ---

Note that pki_profiles_in_ldap is 'False' in /etc/pki/default.cfg on both my server and the fresh F24 deployment...

--- Additional comment from Adam Williamson on 2016-06-21 19:30:13 EDT ---

I cannot find anything in the pki server upgrade scripts which would trigger import of profiles to LDAP:

[root@id upgrade]# pwd
/usr/share/pki/server/upgrade
[root@id upgrade]# grep -R -i profil *
10.3.0/02-AddAuthzRealmToRegistry:            'com.netscape.cms.profile.def.AuthzRealmDefault',
10.3.0/02-AddAuthzRealmToRegistry:            'com.netscape.cms.profile.constraint.AuthzRealmConstraint',

This commit added some code for importing profiles:

https://git.fedorahosted.org/cgit/pki.git/commit?id=e4869e62f432b510dc99eb7e00d16a23caa6ea64

but I just can't see anything suggesting that would get triggered on upgrade of a deployed system, it *seems* like that configure() stuff only gets hit on deployment, unless I'm misunderstanding that (it's not exactly easy for a Java noob to follow).

--- Additional comment from Fraser Tweedale on 2016-06-21 21:56:24 EDT ---

Adam, this is super info, thanks for it!

(Sorry for lack of attention to this issue; very busy on IPA v4.4 work).

PKI is by default deployed to use the file-based profile subsystem.  FreeIPA
upgrade is responsible for switching Dogtag instances to the LDAP profile
subsystem, adding the out=certificateProfiles container and importing the
profiles to LDAP.  It seems that the instance was switched to 
LDAPProfileSubsystem but the container was not added and profiles not imported.

If you send me /var/log/ipaupgrade.log I'll have a look and try to
identify where things went wrong.

Could you please modify /etc/pki/pki-tomcat/ca/CS.cfg and replace:

   subsystem.1.class=com.netscape.cmscore.profile.LDAPProfileSubsystem

with:

   subsystem.1.class=com.netscape.cmscore.profile.ProfileSubsystem

Switch back to an official build (10.2.6-14 or later), then
run `ipa-server-upgrade`, and see how that goes.

Thanks!

--- Additional comment from Adam Williamson on 2016-06-21 22:14:48 EDT ---

So I did a bit of testing to see when the certificateProfiles ou showed up. It's there in a fresh deployment on a clean network install of F22, but not of F21 - so it showed up either between F21 and F22, or during F22's lifetime. The F21 install has pki-server-10.2.1-3.fc21.noarch , F22 would be pki-server-10.2.6-12.fc22 , I think, so somewhere between 10.2.1 and 10.2.6 I guess.

I also successfully reproduced the bug by simply trying to upgrade the clean F21 freeipa deployment to F23. All I had to was net install F21, do 'ipa-server-install', then use dnf-plugin-system-upgrade to upgrade to F23.

The upgrade process takes a very long time, because as part of the scriptlets (after all the 'clean' tasks have apparently run), it winds up trying to start IPA, and this takes a long time and eventually fails, the failure I believe caused by this bug. After booting the upgraded system, ipa service start up again fails. The ldapsearch command shows the certificateProfiles ou is not present.

Will attach ipaupgrade.log from the test VM, then try the re-run of ipa-server-upgrade from #c41 and see what happens.

--- Additional comment from Adam Williamson on 2016-06-21 22:17 EDT ---



--- Additional comment from Fraser Tweedale on 2016-06-21 22:46:32 EDT ---

Analysis indicdates that PKI restart fails because
LDAPProfileSubsystem startup fails due to ou=certificateProfiles
container not yet being added (race condition; the FreeIPA
upgrade behaviour that adds the container and imports profiles
waits on the restart, which fails because the container isn't there).

Proposed fix is the enhance LDAPProfileSubsystem to add the
ou=certificateProfiles container if it does not exist.
Fix should be backported to 10.2.x branch on f23.

What initially broke the upgrade is still unknown (the version
of pki was before the CountDownLatch was introduced).  The
exact versions were:

   12:37 < adamw> looks like i went from pki-core 10.2.1-3.fc21 to 10.2.6-11.fc23
   12:38 < adamw> and freeipa 4.1.4-1.fc21 to 4.2.2-1.fc23

--- Additional comment from Adam Williamson on 2016-06-21 22:53:04 EDT ---

OK, so we figured out why it fails *now* in an F21 -> F23 upgrade. The upgrade code does this:

    if needs_update:
        installutils.set_directive(
            paths.CA_CS_CFG_PATH,
            directive,
            'com.netscape.cmscore.profile.LDAPProfileSubsystem',
            quotes=False,
            separator='=')

        ca.restart('pki-tomcat')

    root_logger.info('[Migrating certificate profiles to LDAP]')
    cainstance.migrate_profiles_to_ldap()

i.e. it changes the config file, then tries to restart pki-tomcat and migrate the profiles...only that's never going to work with pki-core -14 or higher, because pki-tomcat will fail to start as soon as the config file is changed and the whole thing then fails.

I confirmed that if I re-run the upgrade with pki-core downgraded to -13 , it works.

This doesn't explain why the upgrade failed *on my prod server*, however, because at the time I upgraded that, the pki-core change had not yet happened. Unfortunately ipaupgrade.log has been overwritten since that upgrade with later IPA upgrade attempts, and I can't find anything terribly useful in dnf history or system logs to explain why the ipa upgrade process failed on my initial F23 upgrade. It certainly *did* fail, though, we know that much. This is an extract from the journal during the system upgrade, right at the end of the 'Cleanup' step:

Nov 01 16:00:49 id.happyassassin.net dnf[302]: Cleanup     : libgcc-4.9.2-6.fc21.x86_64                            2304/2304
Nov 01 16:00:49 id.happyassassin.net dnf[302]: 1038 blocks
Nov 01 16:00:50 id.happyassassin.net systemd[1]: Reached target Network.
Nov 01 16:00:50 id.happyassassin.net systemd[1]: Starting Network.
Nov 01 16:00:50 id.happyassassin.net systemd[1]: Starting Identity, Policy, Audit...
Nov 01 16:00:51 id.happyassassin.net ipactl[16557]: Upgrade required: please run ipa-server-upgrade command
Nov 01 16:00:51 id.happyassassin.net ipactl[16557]: Aborting ipactl
Nov 01 16:00:51 id.happyassassin.net systemd[1]: ipa.service: Main process exited, code=exited, status=1/FAILURE
Nov 01 16:00:51 id.happyassassin.net systemd[1]: Failed to start Identity, Policy, Audit.
Nov 01 16:00:51 id.happyassassin.net systemd[1]: ipa.service: Unit entered failed state.
Nov 01 16:00:51 id.happyassassin.net systemd[1]: ipa.service: Failed with result 'exit-code'.
Nov 01 16:00:51 id.happyassassin.net dnf[302]: IPA server upgrade failed: Inspect /var/log/ipaupgrade.log and run command ipa-server-upgrade manually.
Nov 01 16:00:51 id.happyassassin.net dnf[302]: Certmonger is not running. Start certmonger and run upgrade again.

I'm not sure if that certmonger error is the real one. Looking at the logs, I do recall that the upgrade to F23 was a big mess at the time, it's obvious that I ran into https://bugzilla.redhat.com/show_bug.cgi?id=1277224 , and I think there were some other bugs around too , like https://bugzilla.redhat.com/show_bug.cgi?id=1274905 and https://fedorahosted.org/pki/ticket/1310 . So I guess we'd just have to chalk my production server failure up to an unfortunate incomplete upgrade process - I guess one of those bugs that I ran into happened to cause it to fail in between changing the config file and importing the profiles...

Comment 2 Matthew Harmsen 2016-06-24 00:13:47 UTC
Per PKI Bug Council of 06/23/2016: 10.3.4

Comment 3 Matthew Harmsen 2016-07-01 15:30:01 UTC
On June 30, 2016, ftweedal checked fix into master:
* 2dea243d51765e3a8f01f7680592143c842921ce

Comment 5 Nikhil Dehadrai 2016-09-20 06:41:40 UTC
IPA server version: ipa-server-4.4.0-12.el7.x86_64
PKI-server: pki-server-10.3.3-10.el7.noarch


Verified the bug on the basis of below observations:
1. Verified that IPA server is successfully upgraded to latest version.
2. IPA server service is successfully restarted after the upgrade.
# ipactl restart
Stopping pki-tomcatd Service
Restarting Directory Service
Restarting krb5kdc Service
Restarting kadmin Service
Restarting named Service
Restarting ipa_memcached Service
Restarting httpd Service
Restarting ipa-custodia Service
Restarting ntpd Service
Restarting pki-tomcatd Service
Restarting ipa-otpd Service
Restarting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful

3. Kinit command is run successfully after the upgrade.

# kinit admin
Password for admin: 

4. Also noticed following meesages inside logs :

cat /var/log/pki/pki-tomcat/ca/debug | grep LDAPConnection, as expected due to persistent LDAP search to watch for changes to profiles


[19/Sep/2016:16:46:37][profileChangeMonitor]: Profile change monitor: failed to get LDAPConnection. Retrying in 1 second.
[19/Sep/2016:16:46:38][profileChangeMonitor]: Profile change monitor: failed to get LDAPConnection. Retrying in 1 second.
	at netscape.ldap.LDAPConnection.connect(Unknown Source)
	at netscape.ldap.LDAPConnection.connect(Unknown Source)
	at netscape.ldap.LDAPConnection.connect(Unknown Source)
[19/Sep/2016:16:46:39][profileChangeMonitor]: Profile change monitor: failed to get LDAPConnection. Retrying in 1 second.
[19/Sep/2016:16:46:40][profileChangeMonitor]: Profile change monitor: failed to get LDAPConnection. Retrying in 1 second.

5. Verified the same for following upgrade paths:
- 7.2.z > 7.3
- 7.1.z > 7.3

# tail -10 /var/log/ipaupgrade.log 
2016-09-19T20:48:39Z DEBUG response status 200
2016-09-19T20:48:39Z DEBUG response headers {'date': 'Mon, 19 Sep 2016 20:48:38 GMT', 'content-length': '168', 'content-type': 'application/xml', 'server': 'Apache-Coyote/1.1'}
2016-09-19T20:48:39Z DEBUG response body '<?xml version="1.0" encoding="UTF-8" standalone="no"?><XMLResponse><State>1</State><Type>CA</Type><Status>running</Status><Version>10.3.3-10.el7</Version></XMLResponse>'
2016-09-19T20:48:39Z DEBUG Starting external process
2016-09-19T20:48:39Z DEBUG args=/bin/systemctl stop pki-tomcatd
2016-09-19T20:48:40Z DEBUG Process finished, return code=0
2016-09-19T20:48:40Z DEBUG stdout=
2016-09-19T20:48:40Z DEBUG stderr=
2016-09-19T20:48:40Z INFO The IPA services were upgraded
2016-09-19T20:48:40Z INFO The ipa-server-upgrade command was successful
 
Thus on the basis of above observations, marking status of bug to "VERIFIED".

Comment 7 errata-xmlrpc 2016-11-04 05:25:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2396.html


Note You need to log in before you can comment on or make changes to this bug.