Bug 1018804 - Reinstalling ipa server hangs when configuring certificate server
Summary: Reinstalling ipa server hangs when configuring certificate server
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipa
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Martin Kosek
QA Contact: Namita Soman
URL:
Whiteboard:
Depends On: 1012827 1023157
Blocks: 1020711
Reported: 2013-10-14 12:41 UTC by Namita Soman
Modified: 2014-06-18 00:12 UTC

Fixed In Version: ipa-3.3.2-3.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1020711
Environment:
Last Closed: 2014-06-13 09:53:59 UTC
Target Upstream Version:
Embargoed:


Attachments
ipaserver uninstall log (32.35 KB, text/plain)
2013-10-14 17:19 UTC, Namita Soman
python backtrace (14.74 KB, text/plain)
2013-10-14 23:02 UTC, Nathan Kinder
gdb py-bt backtrace (8.11 KB, text/plain)
2013-10-15 23:00 UTC, Scott Poore
gdb backtrace (25.24 KB, text/plain)
2013-10-15 23:04 UTC, Scott Poore

Description Namita Soman 2013-10-14 12:41:06 UTC
Description of problem:
Do an uninstall and re-install of ipa server and it looks like it's hanging on the re-install at:
2013-10-11T17:06:47Z DEBUG   [8/22]: importing CA chain to RA certificate database

Version-Release number of selected component (if applicable):
ipa-server-3.3.2-2.el7.x86_64.

How reproducible:
always

Steps to Reproduce:
1. Install ipa server
2. uninstall
3. reinstall

Actual results:
reinstall hangs

Expected results:
reinstall successfully

Additional info:
# ps -ef|grep ipa-server-install
root     12209  4969  0 15:51 pts/0    00:00:00 grep --color=auto ipa-server-install
root     15046 18725  0 13:05 ?        00:00:03 /usr/bin/python -E /usr/sbin/ipa-server-install --setup-dns --no-forwarder -p Secret123 -P Secret123 -a Secret123 -r TESTRELM.COM -n testrelm.com --ip-address=10.16.98.182 --hostname=ipaqa64vma.testrelm.com -U

# date
Fri Oct 11 15:54:54 EDT 2013


# tail /var/log/ipaserver-install.log
2013-10-11T17:06:46Z DEBUG The httpd proxy is not installed, skipping wait for CA
2013-10-11T17:06:46Z DEBUG   duration: 4 seconds
2013-10-11T17:06:46Z DEBUG   [7/22]: creating RA agent certificate database
2013-10-11T17:06:46Z DEBUG Starting external process
2013-10-11T17:06:46Z DEBUG args=/usr/bin/certutil -d /etc/httpd/alias -f XXXXXXXX -N
2013-10-11T17:06:47Z DEBUG Process finished, return code=0
2013-10-11T17:06:47Z DEBUG stdout=
2013-10-11T17:06:47Z DEBUG stderr=
2013-10-11T17:06:47Z DEBUG   duration: 0 seconds
2013-10-11T17:06:47Z DEBUG   [8/22]: importing CA chain to RA certificate database

From the previous ipaserver-uninstall.log, this was the only thing that stood out:

Uninstalling CA from /var/lib/pki/pki-tomcat.

Uninstallation complete.

2013-10-11T17:04:24Z DEBUG stderr=pkidestroy  : WARNING  ....... this 'CA' entry will NOT be deleted from security domain 'IPA'!
pkidestroy  : WARNING  ....... security domain 'IPA' may be offline or unreachable!
pkidestroy  : ERROR    ....... subprocess.CalledProcessError:  Command '/usr/bin/sslget -n 'subsystemCert cert-pki-ca' -p '588648796016' -d '/etc/pki/pki-tomcat/alias' -e 'name="/var/lib/pki/pki-tomcat"&type=CA&list=caList&host=ipaqa64vma.testrelm.com&sport=443&ncsport=8443&adminsport=8443&agentsport=8443&operation=remove' -v -r '/ca/agent/ca/updateDomainXML' ipaqa64vma.testrelm.com:443 2>&1' returned non-zero exit status 6!


# strace -p 15046
Process 15046 attached
recvfrom(5,

Comment 1 Nathan Kinder 2013-10-14 14:48:06 UTC
Could you attach the full ipaserver-uninstall.log?

The removal of the CA is failing when pkidestroy tries to remove the CA from the Security Domain.  It looks like it is unable to connect to the Security Domain over port 443, which results in leaving some stuff behind.  Perhaps the proxy used for the CA is already removed?

Comment 2 Martin Kosek 2013-10-14 16:15:24 UTC
When the IPA server is being removed, it first shuts down all its services and then removes the configuration:

# ipa-server-install --uninstall --unattended
Shutting down all IPA services
Removing IPA client configuration
Unconfiguring ntpd
Unconfiguring CA
Unconfiguring named
Unconfiguring web server
Unconfiguring krb5kdc
Unconfiguring kadmin
Unconfiguring directory server
Unconfiguring ipa_memcached
Unconfiguring ipa-otpd

So by the time pkidestroy is called, nothing is running. But the CA has been uninstalled this way all along, and I see this error on my F19 instance. I would assume this is something different - we need to investigate.

Comment 3 Namita Soman 2013-10-14 17:19:02 UTC
Created attachment 812126 [details]
ipaserver uninstall log

Comment 4 Nathan Kinder 2013-10-14 18:00:06 UTC
(In reply to Martin Kosek from comment #2) 
> So by the time pkidestroy is called, nothing is running. But the CA has been
> uninstalled this way all along, and I see this error on my F19 instance. I
> would assume this is something different - we need to investigate.

Ok, it's possible that the security domain error is a red herring that has nothing to do with the reinstallation failure.

Comment 5 Scott Poore 2013-10-14 20:27:09 UTC
And I've got another server hanging at a different location:

Configuring the web interface (httpd): Estimated time 1 minute
  [1/15]: disabling mod_ssl in httpd
  [2/15]: setting mod_nss port to 443
  [3/15]: setting mod_nss password file
  [4/15]: enabling mod_nss renegotiate
  [5/15]: adding URL rewriting rules
  [6/15]: configuring httpd
  [7/15]: setting up ssl

/var/log/ipaserver-install.log:

2013-10-14T19:46:44Z DEBUG   [7/15]: setting up ssl
2013-10-14T19:46:44Z DEBUG Loading Index file from '/var/lib/ipa/sysrestore/sysrestore.index'
2013-10-14T19:46:44Z DEBUG Loading Index file from '/var/lib/ipa/sysrestore/sysrestore.index'
2013-10-14T19:46:44Z DEBUG Starting external process
2013-10-14T19:46:44Z DEBUG args=/usr/bin/certutil -d /etc/httpd/alias -R -s CN=beast.testrelm.com,O=TESTRELM.COM -o /var/lib/ipa/ipa-YGxSHf/tmpcertreq -k rsa -g 2048 -z /etc/httpd/alias/noise.txt -f /etc/httpd/alias/pwdfile.txt -a
2013-10-14T19:46:45Z DEBUG Process finished, return code=0
2013-10-14T19:46:45Z DEBUG stdout=
2013-10-14T19:46:45Z DEBUG stderr=

Generating key.  This may take a few moments...


2013-10-14T19:46:45Z DEBUG request 'https://beast.testrelm.com:8443/ca/ee/ca/profileSubmitSSLClient'
2013-10-14T19:46:45Z DEBUG request body 'profileId=caIPAserviceCert&requestor_name=IPA+Installer&cert_request=...<trunc>...&cert_request_type=pkcs10&xmlOutput=true'
2013-10-14T19:46:45Z DEBUG NSSConnection init beast.testrelm.com
2013-10-14T19:46:45Z DEBUG Connecting: <beast_ip_address>:0

Comment 6 Nathan Kinder 2013-10-14 21:35:02 UTC
I wonder if this is related to the recent nss build that was made on 10/11:

    nss-3.15.1-4.el7.x86_64

We haven't built new pki-* packages recently, so I'm not sure why these issues would start popping up all of a sudden.  Does this issue occur if you downgrade nss?

Comment 7 Nathan Kinder 2013-10-14 23:01:24 UTC
Strace shows that the python process for pkispawn is stuck on a read: 

----------------------------
# strace -p 6178
Process 6178 attached
read(5, 
----------------------------

Attaching to the python process with gdb shows that it's trying to read from a socket that's using SSL:

----------------------------
(gdb) py-list
 155    
 156            """Read up to LEN bytes and return them.
 157            Return zero-length string on EOF."""
 158    
 159            try:
>160                return self._sslobj.read(len)
 161            except SSLError, x:
 162                if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:
 163                    return ''
 164                else:
 165                    raise
(gdb) where
#0  0x00007f4c49400230 in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00007f4c3dd6d30b in sock_read () from /lib64/libcrypto.so.10
#2  0x00007f4c3dd6b31b in BIO_read () from /lib64/libcrypto.so.10
#3  0x00007f4c3e0a0964 in ssl3_read_n () from /lib64/libssl.so.10
#4  0x00007f4c3e0a1ab5 in ssl3_read_bytes () from /lib64/libssl.so.10
#5  0x00007f4c3e09ef16 in ssl3_read_internal () from /lib64/libssl.so.10
#6  0x00007f4c3bc5ff5c in ?? () from /usr/lib64/python2.7/lib-dynload/_ssl.so
#7  0x00007f4c496ebcee in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
----------------------------

I'll attach the python backtrace separately, as it's a bit long.
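
For illustration only: a read on a socket that has no timeout blocks indefinitely if the peer never answers, which matches what strace and gdb show above. The sketch below is not the pkispawn or PKI client code; it is a plain-TCP, Python 3 example, and the function name `read_with_timeout` is hypothetical. It assumes the stuck client had no socket timeout configured, which this bug does not confirm.

```python
import socket


def read_with_timeout(host, port, timeout):
    """Connect, then try to read; return None if the peer stays silent.

    Hypothetical helper for illustration. The timeout passed to
    create_connection() also applies to the later recv() call, so a
    silent peer produces an exception instead of an endless block.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(4096)
        except socket.timeout:
            # The peer accepted the connection but sent nothing in time.
            return None
```

With a timeout in place, a lost request surfaces as an error that the caller can log or retry, rather than as a process that sits in read() for hours.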

Comment 8 Nathan Kinder 2013-10-14 23:02:48 UTC
Created attachment 812247 [details]
python backtrace

Comment 9 Scott Poore 2013-10-14 23:29:21 UTC
Nathan, FYI:

The version in the hang I saw Friday was nss-3.15.1-3.el7.x86_64.

Scott

Comment 11 Martin Kosek 2013-10-15 07:00:39 UTC
Judging by Scott's post, it may not be NSS.

I checked Nathan's stack trace and see that it froze at this point in PKI:

#56 Frame 0x2de8470, for file /usr/lib/python2.7/site-packages/pki/client.py, line 63, in post ...
    headers=headers)
#60 Frame 0x2de8280, for file /usr/lib/python2.7/site-packages/pki/system.py, line 80, in configure ...
    r = self.connection.post('/rest/installer/configure', data, headers)

So it is apparently making the REST call '/rest/installer/configure' and freezing there. The question is why. Nathan or Ade, can you please follow up on this one?

Comment 13 Scott Poore 2013-10-15 22:57:56 UTC
FYI: I see the original failure again using a repo from a few days ago:

2013-10-11T17:06:47Z DEBUG   [8/22]: importing CA chain to RA certificate database

(gdb) py-list
 471                    self._rbuf.write(buf.read())
 472                    return rv
 473                self._rbuf = StringIO()  # reset _rbuf.  we consume it via buf.
 474                while True:
 475                    try:
>476                        data = self._sock.recv(self._rbufsize)
 477                    except error, e:
 478                        if e.args[0] == EINTR:
 479                            continue
 480                        raise
 481                    if not data:


And I'll attach the backtrace separately.

Comment 14 Scott Poore 2013-10-15 23:00:10 UTC
Created attachment 812711 [details]
gdb py-bt backtrace

Comment 15 Scott Poore 2013-10-15 23:04:46 UTC
Created attachment 812712 [details]
gdb backtrace

Comment 16 Martin Kosek 2013-10-16 07:35:00 UTC
Ade Lee investigated this issue further and found that it is caused by Bug 1005446: when the HTTP proxy is not configured, the installer does not wait for the CA to be up, which may cause some requests to get lost.

I am working on a fix to make the installer use local ports and thus always wait.
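
A minimal sketch of the idea of waiting on local ports, for readers unfamiliar with the technique. This is not the actual FreeIPA patch: the function name `wait_for_ports`, its parameters, and the polling interval are assumptions, loosely modeled on the `wait_for_open_ports: localhost [8080, 8443] timeout 120` log line quoted in the verification comment of this bug. It is written for Python 3, while the installer in this bug ran on Python 2.7.

```python
import socket
import time


def wait_for_ports(host, ports, timeout=120.0, interval=1.0):
    """Block until every port accepts TCP connections, or raise TimeoutError.

    Hypothetical helper for illustration; not the installer's own code.
    """
    deadline = time.monotonic() + timeout
    pending = list(ports)
    while pending:
        port = pending[0]
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                pending.pop(0)
                continue
        except OSError:
            # Connection refused or timed out: the service is not up yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("port %d on %s did not open in time" % (port, host))
        time.sleep(interval)
```

A call such as `wait_for_ports("localhost", [8080, 8443], timeout=120)` would return once both ports accept connections. An open port only shows that the server is listening, not that the CA is ready, which is why a status request after the port check is still useful.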

Comment 17 Martin Kosek 2013-10-16 07:37:13 UTC
Upstream ticket:
https://fedorahosted.org/freeipa/ticket/3973

Comment 20 Martin Kosek 2013-10-16 14:57:11 UTC
Namita confirmed that the patch proposed for https://fedorahosted.org/freeipa/ticket/3973 fixes the issue.

Comment 23 Martin Kosek 2013-10-18 08:11:22 UTC
Even though ipa-server-install now properly waits on PKI to start in all situations, the installation still occasionally freezes (actually in the waiting code). I will clone this bug to PKI to help us address it.

Comment 24 Scott Poore 2013-11-04 17:52:01 UTC
Verified.

Version ::
ipa-server-3.3.2-3.el7.x86_64

Manual Test Results ::

This was verified by running many test jobs that re-installed IPA.  With this piece of the fix in place, the only hang we still saw was the abrt-java-connector issue from bug #1012827.

Even when a run failed with the abrt-java-connector issue, the fix was still visible in /var/log/ipaserver-install.log:

2013-11-04T17:38:36Z DEBUG stderr=
2013-11-04T17:38:36Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:38:40Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:38:40Z DEBUG Waiting until the CA is running

A quick check here that IPA installer is waiting properly:

[root@rhel7-5 yum.repos.d]# grep wait_for_open_ports:.*8443.*120 /var/log/ipaserver-install.log -A 5
2013-11-04T17:38:36Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:38:40Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:38:40Z DEBUG Waiting until the CA is running
2013-11-04T17:38:40Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:38:40Z DEBUG request body ''
2013-11-04T17:38:53Z DEBUG request status 200
--
2013-11-04T17:38:56Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:39:00Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:39:00Z DEBUG Waiting until the CA is running
2013-11-04T17:39:00Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:39:00Z DEBUG request body ''
2013-11-04T17:39:11Z DEBUG request status 200
--
2013-11-04T17:39:48Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:39:51Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:39:51Z DEBUG Waiting until the CA is running
2013-11-04T17:39:51Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:39:51Z DEBUG request body ''
2013-11-04T17:40:02Z DEBUG request status 200

[root@rhel7-5 yum.repos.d]#
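
The same check can be scripted. The sketch below is an illustration, not a tool used in this verification: it measures how long the installer waited between each "Waiting until the CA is running" line and the next "request status 200" line. The function name `ca_wait_durations` is hypothetical, and the parser assumes the `timestamp LEVEL message` layout seen in the excerpts above.

```python
from datetime import datetime

SAMPLE = """\
2013-11-04T17:38:36Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:38:40Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:38:40Z DEBUG Waiting until the CA is running
2013-11-04T17:38:40Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:38:40Z DEBUG request body ''
2013-11-04T17:38:53Z DEBUG request status 200
"""


def ca_wait_durations(lines):
    """Return the seconds spent in each 'wait for CA' phase found in lines."""
    durations = []
    started = None
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue
        try:
            stamp = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            # Continuation lines (e.g. multi-line stderr) carry no timestamp.
            continue
        message = parts[2].strip()
        if "Waiting until the CA is running" in message:
            started = stamp
        elif started is not None and message.startswith("request status 200"):
            durations.append((stamp - started).total_seconds())
            started = None
    return durations


if __name__ == "__main__":
    print(ca_wait_durations(SAMPLE.splitlines()))
```

To run it against a real log, pass an open file object for /var/log/ipaserver-install.log instead of the sample lines.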

Comment 25 Ludek Smid 2014-06-13 09:53:59 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

