| Summary: | Reinstalling ipa server hangs when configuring certificate server | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Namita Soman <nsoman> | ||||||||||
| Component: | ipa | Assignee: | Martin Kosek <mkosek> | ||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Namita Soman <nsoman> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | high | ||||||||||||
| Version: | 7.0 | CC: | alee, dpal, jgalipea, lmiksik, nkinder, nsoman, rcritten, spoore | ||||||||||
| Target Milestone: | rc | Keywords: | TestBlocker | ||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | ipa-3.3.2-3.el7 | Doc Type: | Bug Fix | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | |||||||||||||
| : | 1020711 | Environment: | |||||||||||
| Last Closed: | 2014-06-13 09:53:59 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Bug Depends On: | 1012827, 1023157 | ||||||||||||
| Bug Blocks: | 1020711 | ||||||||||||
| Attachments: |
|
Could you attach the full ipaserver-uninstall.log? The removal of the CA is failing when pkidestroy tries to remove the CA from the Security Domain. It looks like it is unable to connect to the Security Domain over port 443, which leaves some state behind. Perhaps the proxy used for the CA has already been removed?

When the IPA server is being removed, it first shuts down all its services and then removes the configuration:
----------------------------
# ipa-server-install --uninstall --unattended
Shutting down all IPA services
Removing IPA client configuration
Unconfiguring ntpd
Unconfiguring CA
Unconfiguring named
Unconfiguring web server
Unconfiguring krb5kdc
Unconfiguring kadmin
Unconfiguring directory server
Unconfiguring ipa_memcached
Unconfiguring ipa-otpd
----------------------------
So by the time pkidestroy is called, nothing is running. But the CA has been uninstalled this way all along, and I see this error on my F19 instance too. I would assume this is something different; we need to investigate.

Created attachment 812126
ipaserver uninstall log
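The theory in these comments is that pkidestroy cannot reach the Security Domain over port 443 because the services have already been stopped. One way to confirm that on a test machine is a plain TCP probe of the port at the moment pkidestroy would run. This is a minimal sketch that assumes nothing about IPA internals; `port_is_open` is a hypothetical helper, not part of ipa-server-install or pkidestroy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Hypothetical diagnostic helper: it only checks that something accepts
    TCP connections on the port, not that the Security Domain answers.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, `port_is_open("ipaqa64vma.testrelm.com", 443)` would be expected to return False once "Shutting down all IPA services" has completed.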
(In reply to Martin Kosek from comment #2)
> So in time when pkidestroy is called, nothing is running. But CA is being
> uninstalled this way for the whole time, I see this error in my F19
> instance. I would assume this is something different - we need to
> investigate.

OK, it's possible that the security domain error is a red herring that has nothing to do with the reinstallation failure.

I've also got another server hanging at a different location:
----------------------------
Configuring the web interface (httpd): Estimated time 1 minute
  [1/15]: disabling mod_ssl in httpd
  [2/15]: setting mod_nss port to 443
  [3/15]: setting mod_nss password file
  [4/15]: enabling mod_nss renegotiate
  [5/15]: adding URL rewriting rules
  [6/15]: configuring httpd
  [7/15]: setting up ssl
----------------------------
/var/log/ipaserver-install.log:
----------------------------
2013-10-14T19:46:44Z DEBUG [7/15]: setting up ssl
2013-10-14T19:46:44Z DEBUG Loading Index file from '/var/lib/ipa/sysrestore/sysrestore.index'
2013-10-14T19:46:44Z DEBUG Loading Index file from '/var/lib/ipa/sysrestore/sysrestore.index'
2013-10-14T19:46:44Z DEBUG Starting external process
2013-10-14T19:46:44Z DEBUG args=/usr/bin/certutil -d /etc/httpd/alias -R -s CN=beast.testrelm.com,O=TESTRELM.COM -o /var/lib/ipa/ipa-YGxSHf/tmpcertreq -k rsa -g 2048 -z /etc/httpd/alias/noise.txt -f /etc/httpd/alias/pwdfile.txt -a
2013-10-14T19:46:45Z DEBUG Process finished, return code=0
2013-10-14T19:46:45Z DEBUG stdout=
2013-10-14T19:46:45Z DEBUG stderr= Generating key. This may take a few moments...
2013-10-14T19:46:45Z DEBUG request 'https://beast.testrelm.com:8443/ca/ee/ca/profileSubmitSSLClient'
2013-10-14T19:46:45Z DEBUG request body 'profileId=caIPAserviceCert&requestor_name=IPA+Installer&cert_request=...<trunc>...&cert_request_type=pkcs10&xmlOutput=true'
2013-10-14T19:46:45Z DEBUG NSSConnection init beast.testrelm.com
2013-10-14T19:46:45Z DEBUG Connecting: <beast_ip_address>:0
----------------------------
I wonder if this is related to the recent nss build that was made on 10/11:
nss-3.15.1-4.el7.x86_64
We haven't built new pki-* packages recently, so I'm not sure why these issues would start popping up all of a sudden. Does this issue occur if you downgrade nss?
Strace shows that the python process for pkispawn is stuck on a read:
----------------------------
# strace -p 6178
Process 6178 attached
read(5,
----------------------------
Attaching to the python process with gdb shows that it's trying to read from a socket that's using SSL:
----------------------------
(gdb) py-list
155
156 """Read up to LEN bytes and return them.
157 Return zero-length string on EOF."""
158
159 try:
>160 return self._sslobj.read(len)
161 except SSLError, x:
162 if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:
163 return ''
164 else:
165 raise
(gdb) where
#0 0x00007f4c49400230 in __read_nocancel () from /lib64/libpthread.so.0
#1 0x00007f4c3dd6d30b in sock_read () from /lib64/libcrypto.so.10
#2 0x00007f4c3dd6b31b in BIO_read () from /lib64/libcrypto.so.10
#3 0x00007f4c3e0a0964 in ssl3_read_n () from /lib64/libssl.so.10
#4 0x00007f4c3e0a1ab5 in ssl3_read_bytes () from /lib64/libssl.so.10
#5 0x00007f4c3e09ef16 in ssl3_read_internal () from /lib64/libssl.so.10
#6 0x00007f4c3bc5ff5c in ?? () from /usr/lib64/python2.7/lib-dynload/_ssl.so
#7 0x00007f4c496ebcee in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
----------------------------
I'll attach the python backtrace separately, as it's a bit long.
Created attachment 812247
python backtrace
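The strace and gdb output above show pkispawn blocked in a read() that has no deadline, so the hang is silent. As a sketch of how a client can turn such a hang into a visible error, setting a timeout on the socket makes recv() raise instead of blocking forever. `recv_with_deadline` is a hypothetical helper; nothing here claims that pkispawn or the PKI client works this way.

```python
import socket


def recv_with_deadline(sock, nbytes, timeout):
    """Read up to `nbytes` from `sock`, giving up after `timeout` seconds.

    Raises socket.timeout if no data arrives in time, instead of blocking
    indefinitely like the read() seen in the strace output.
    """
    sock.settimeout(timeout)
    return sock.recv(nbytes)
```

A socket that is later wrapped with `ssl.SSLContext.wrap_socket` keeps the timeout set on it, so the same approach would apply to the SSL read shown in the gdb backtrace.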
Nathan, FYI: the nss version in the hang I saw on Friday was nss-3.15.1-3.el7.x86_64.
Scott

Judging by Scott's post, it may not be NSS.
I checked Nathan's stack trace; it is frozen at this point in PKI:
#56 Frame 0x2de8470, for file /usr/lib/python2.7/site-packages/pki/client.py, line 63, in post ...
headers=headers)
#60 Frame 0x2de8280, for file /usr/lib/python2.7/site-packages/pki/system.py, line 80, in configure ...
r = self.connection.post('/rest/installer/configure', data, headers)
So it is apparently making the REST call '/rest/installer/configure', and that call freezes. The question is why. Nathan or Ade, can you please follow up on this one?
FYI: I see the original failure again using a repo from a few days ago:
2013-10-11T17:06:47Z DEBUG [8/22]: importing CA chain to RA certificate database
(gdb) py-list
471 self._rbuf.write(buf.read())
472 return rv
473 self._rbuf = StringIO() # reset _rbuf. we consume it via buf.
474 while True:
475 try:
>476 data = self._sock.recv(self._rbufsize)
477 except error, e:
478 if e.args[0] == EINTR:
479 continue
480 raise
481 if not data:
And I'll attach the backtrace separately.
Created attachment 812711
gdb py-bt backtrace
Created attachment 812712
gdb backtrace
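The Python-level stacks above were collected by attaching gdb. As a gdb-free alternative, the standard library's `faulthandler` module can dump the stack of every thread; it has to be enabled inside the installer process ahead of time, so this is an illustrative sketch rather than something that can be applied to an already hung process. `capture_stacks`, `arm_watchdog`, and `disarm_watchdog` are hypothetical helper names.

```python
import faulthandler
import sys
import tempfile


def capture_stacks():
    """Return the current Python stack of every thread as a string.

    faulthandler writes straight to a file descriptor, so a real temporary
    file is used rather than an io.StringIO.
    """
    with tempfile.TemporaryFile(mode="w+") as tmp:
        faulthandler.dump_traceback(file=tmp, all_threads=True)
        tmp.seek(0)
        return tmp.read()


def arm_watchdog(seconds):
    """Dump all thread stacks to stderr if still running after `seconds`.

    Hypothetical use: arm before a step that might hang (for example the
    CA configuration request) and disarm once the step returns.
    """
    faulthandler.dump_traceback_later(seconds, repeat=False, file=sys.stderr)


def disarm_watchdog():
    """Cancel a pending dump scheduled by arm_watchdog()."""
    faulthandler.cancel_dump_traceback_later()
```

On Linux, `faulthandler.register(signal.SIGUSR1)` is another option: once registered, `kill -USR1 <pid>` makes the process print its stacks without stopping it.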
Ade Lee investigated this issue further and found that it is caused by Bug 1005446: when the HTTP proxy is not configured, the installer does not wait for the CA to be up, which may cause some requests to get lost. I am working on a fix to make the installer use local ports and thus always wait.

Upstream ticket:
https://fedorahosted.org/freeipa/ticket/3973

Namita confirmed that the patch proposed for https://fedorahosted.org/freeipa/ticket/3973 fixes the issue.

Fixed upstream:
master: https://fedorahosted.org/freeipa/changeset/dd3295ac32c0cae3234723e65175e337761ddf38
ipa-3-3: https://fedorahosted.org/freeipa/changeset/122d5ce286c74dbeb7c243093721f5e2ded837ff

Even though ipa-server-install now properly waits for PKI to start in all situations, the installation still occasionally freezes (actually in the waiting code). I will clone this bug to PKI to help us address it.

Verified.

Version ::
ipa-server-3.3.2-3.el7.x86_64

Manual Test Results ::
This was verified by running many test jobs that re-installed IPA. After this part of the fix, the only hang we would still see was the abrt-java-connector issue from bug #1012827.
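The fix makes the installer wait for the CA's local ports before it sends requests. The loop below illustrates that kind of wait; it is a simplified, hypothetical stand-in, not the code from the upstream changesets, and its name and signature are assumptions.

```python
import socket
import time


def wait_for_open_ports(host, ports, timeout=120.0, interval=1.0):
    """Block until every TCP port in `ports` on `host` accepts connections.

    Hypothetical sketch: raises TimeoutError if the ports are not all open
    within `timeout` seconds. Ports are checked one after another.
    """
    deadline = time.monotonic() + timeout
    pending = list(ports)
    while pending:
        try:
            with socket.create_connection((host, pending[0]), timeout=interval):
                pass
        except OSError:
            # Not listening yet (refused) or no answer within `interval`.
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{host}: ports {pending} not open after {timeout} seconds"
                ) from None
            time.sleep(interval)
        else:
            # Connected: this port is up, move on to the next one.
            pending.pop(0)
```

Calling `wait_for_open_ports("localhost", [8080, 8443], timeout=120)` mirrors the host, ports, and timeout that appear in the installer log in the verification comment. Per that log, the installer then also polls `/ca/admin/ca/getStatus` until it answers, which this sketch does not do.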
Even when a run still failed with the abrt-java-connector issue, the fix was visible in /var/log/ipaserver-install.log:
----------------------------
2013-11-04T17:38:36Z DEBUG stderr=
2013-11-04T17:38:36Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:38:40Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:38:40Z DEBUG Waiting until the CA is running
----------------------------
A quick check that the IPA installer is waiting properly:
----------------------------
[root@rhel7-5 yum.repos.d]# grep wait_for_open_ports:.*8443.*120 /var/log/ipaserver-install.log -A 5
2013-11-04T17:38:36Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:38:40Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:38:40Z DEBUG Waiting until the CA is running
2013-11-04T17:38:40Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:38:40Z DEBUG request body ''
2013-11-04T17:38:53Z DEBUG request status 200
--
2013-11-04T17:38:56Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:39:00Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:39:00Z DEBUG Waiting until the CA is running
2013-11-04T17:39:00Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:39:00Z DEBUG request body ''
2013-11-04T17:39:11Z DEBUG request status 200
--
2013-11-04T17:39:48Z DEBUG wait_for_open_ports: localhost [8080, 8443] timeout 120
2013-11-04T17:39:51Z DEBUG The httpd proxy is not installed, wait on local port
2013-11-04T17:39:51Z DEBUG Waiting until the CA is running
2013-11-04T17:39:51Z DEBUG request 'https://rhel7-5.testrelm.com:8443/ca/admin/ca/getStatus'
2013-11-04T17:39:51Z DEBUG request body ''
2013-11-04T17:40:02Z DEBUG request status 200
[root@rhel7-5 yum.repos.d]#
----------------------------
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
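Checking many re-install runs by hand with grep is tedious. The helper below is a hypothetical sketch that relies only on the log messages quoted in this bug: for each `wait_for_open_ports` call it reports whether the installer took the local-port branch and which HTTP status the first following getStatus request returned, and it flags logs that show the pre-fix "skipping wait for CA" message.

```python
import re

# Matches the "request status NNN" lines logged after each getStatus request.
STATUS_RE = re.compile(r"DEBUG request status (\d{3})")


def check_ca_waits(log_lines):
    """Return one (waited_on_local_port, status) tuple per wait_for_open_ports call."""
    results = []
    current = None
    for line in log_lines:
        if "wait_for_open_ports:" in line:
            # Start tracking a new wait block.
            current = {"local": False, "status": None}
            results.append(current)
        elif current is None:
            continue  # lines before the first wait are irrelevant
        elif "The httpd proxy is not installed, wait on local port" in line:
            current["local"] = True
        else:
            match = STATUS_RE.search(line)
            if match and current["status"] is None:
                current["status"] = int(match.group(1))
    return [(r["local"], r["status"]) for r in results]


def uses_old_skip(log_lines):
    """True if the log shows the pre-fix behaviour of skipping the CA wait."""
    return any("skipping wait for CA" in line for line in log_lines)
```

Typical use would be `with open("/var/log/ipaserver-install.log") as f: print(check_ca_waits(f))`; on the verification log above, each entry would be expected to be `(True, 200)`.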
Description of problem:
Do an uninstall and re-install of the ipa server, and the re-install appears to hang at:
2013-10-11T17:06:47Z DEBUG [8/22]: importing CA chain to RA certificate database

Version-Release number of selected component (if applicable):
ipa-server-3.3.2-2.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Install ipa server
2. uninstall
3. reinstall

Actual results:
reinstall hangs

Expected results:
reinstall succeeds

Additional info:
----------------------------
# ps -ef|grep ipa-server-install
root 12209 4969 0 15:51 pts/0 00:00:00 grep --color=auto ipa-server-install
root 15046 18725 0 13:05 ? 00:00:03 /usr/bin/python -E /usr/sbin/ipa-server-install --setup-dns --no-forwarder -p Secret123 -P Secret123 -a Secret123 -r TESTRELM.COM -n testrelm.com --ip-address=10.16.98.182 --hostname=ipaqa64vma.testrelm.com -U
# date
Fri Oct 11 15:54:54 EDT 2013
# tail /var/log/ipaserver-install.log
2013-10-11T17:06:46Z DEBUG The httpd proxy is not installed, skipping wait for CA
2013-10-11T17:06:46Z DEBUG duration: 4 seconds
2013-10-11T17:06:46Z DEBUG [7/22]: creating RA agent certificate database
2013-10-11T17:06:46Z DEBUG Starting external process
2013-10-11T17:06:46Z DEBUG args=/usr/bin/certutil -d /etc/httpd/alias -f XXXXXXXX -N
2013-10-11T17:06:47Z DEBUG Process finished, return code=0
2013-10-11T17:06:47Z DEBUG stdout=
2013-10-11T17:06:47Z DEBUG stderr=
2013-10-11T17:06:47Z DEBUG duration: 0 seconds
2013-10-11T17:06:47Z DEBUG [8/22]: importing CA chain to RA certificate database
----------------------------
From the previous ipaserver-uninstall.log, this was the only thing that stood out:
----------------------------
Uninstalling CA from /var/lib/pki/pki-tomcat.
Uninstallation complete.
2013-10-11T17:04:24Z DEBUG stderr=pkidestroy : WARNING ....... this 'CA' entry will NOT be deleted from security domain 'IPA'!
pkidestroy : WARNING ....... security domain 'IPA' may be offline or unreachable!
pkidestroy : ERROR ....... subprocess.CalledProcessError: Command '/usr/bin/sslget -n 'subsystemCert cert-pki-ca' -p '588648796016' -d '/etc/pki/pki-tomcat/alias' -e 'name="/var/lib/pki/pki-tomcat"&type=CA&list=caList&host=ipaqa64vma.testrelm.com&sport=443&ncsport=8443&adminsport=8443&agentsport=8443&operation=remove' -v -r '/ca/agent/ca/updateDomainXML' ipaqa64vma.testrelm.com:443 2>&1' returned non-zero exit status 6!
----------------------------
# strace -p 15046
Process 15046 attached
recvfrom(5,
----------------------------
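The `-e` payload in the failing sslget command is an ordinary URL-encoded form, and decoding it makes the request easier to read: it is an `operation=remove` for the CA entry, and the command itself targets `ipaqa64vma.testrelm.com:443`, the port the first comment suspected was unreachable. This is an illustrative sketch using only the standard library; `decode_domain_update` is a hypothetical helper, and the meaning of individual fields such as `sport` is not spelled out in this bug, so none is assumed here.

```python
from urllib.parse import parse_qs

# The -e argument passed to sslget in the failing pkidestroy command.
PAYLOAD = (
    'name="/var/lib/pki/pki-tomcat"&type=CA&list=caList'
    '&host=ipaqa64vma.testrelm.com&sport=443&ncsport=8443'
    '&adminsport=8443&agentsport=8443&operation=remove'
)


def decode_domain_update(payload):
    """Flatten an updateDomainXML form payload into a dict of single values."""
    # parse_qs returns a list of values per key; each key appears once here.
    return {key: values[0] for key, values in parse_qs(payload).items()}
```

`decode_domain_update(PAYLOAD)["operation"]` gives `'remove'`, and the `name` value keeps its literal double quotes, `'"/var/lib/pki/pki-tomcat"'`.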