Bug 1193584
| Summary: | Race condition causes qpid-configure command to fail | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Og Maciel <omaciel> |
| Component: | Installation | Assignee: | Eric Helms <ehelms> |
| Status: | CLOSED ERRATA | QA Contact: | Jan Hutař <jhutar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | Nightly | CC: | bbuckingham, bkearney, jhutar, jmontleo |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| URL: | http://projects.theforeman.org/issues/9364 | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-12 05:26:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
10:52:17 al: omaciel, okay, I think I have an idea
10:52:31 al: here's what happens when I try to connect with a totally bogus certificate:
10:52:38 al: Failed: ConnectError: [Errno 1] _ssl.c:504: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
10:52:54 al: but in your paste I see [Errno 111] Connection refused
10:53:22 al: is it possible that the qpid daemon is still in the process of restarting when the qpid-configure command is issued?
10:54:08 al: yeah, if I try to connect to a bogus port, I get the same error.
10:54:24 al: You may need a "sleep 10" in the puppet configs somewhere
10:55:27 al: this is on RHEL 7 too and I have noticed that systemd is a lot more lenient than sysVinit about believing that services have started.
10:55:32 omaciel: al: hmmm Let me file a BZ and maybe you can chime in?
10:55:55 al: e.g. when I "systemctl restart tomcat" it returns instantly even though Tomcat will take 20 seconds to start
10:58:42 al: looks like someone else has the same issue: https://github.com/Stebalien/systemd-wait/blob/master/systemd-wait

Adding link to upstream redmine issue which looks very similar.

After some investigation, I think the problem is that Puppet is trying to connect to the qpid daemon before it has started up completely.
If I try to connect to a port that I know qpid is NOT running on, I get the same error message (note that I'm just running "exchanges" to list the existing exchanges instead of trying to create a new one):

    # qpid-config --ssl-certificate /etc/pki/katello/certs/katello-apache.crt --ssl-key /etc/pki/katello/private/katello-apache.key -b 'amqps://cloud-qe-8.idmqe.lab.eng.bos.redhat.com:9000' exchanges
    Failed: ConnectError: [Errno 111] Connection refused

If I try to connect with a certificate that I know is bogus, I get a different type of error:

    # openssl genrsa -out foo.key 2048
    # openssl req -new -key foo.key -out foo.csr -subj "/C=US/CN=localhost"
    # openssl x509 -req -in foo.csr -days 365 -signkey foo.key -out foo.crt
    # qpid-config --ssl-certificate foo.crt --ssl-key foo.key -b 'amqps://cloud-qe-8.idmqe.lab.eng.bos.redhat.com:5671' exchanges
    Failed: ConnectError: [Errno 1] _ssl.c:504: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca

And if I run the command after Puppet has already quit, I am able to connect.

This error appears on a RHEL 7 box and systemd is much more aggressive than SysVInit at telling you services have started. For example, if I restart Tomcat on my system, systemctl returns almost immediately even though Tomcat actually takes about 20 seconds to start.

Other people seem to have this issue as well and have written various solutions:

https://gist.github.com/ghedo/4b005ce111591781e3fd
https://github.com/Stebalien/systemd-wait

The puppet configuration scripts will probably either need something like this or just a "sleep 10" if a simpler (albeit less robust) approach is desired.
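The alternative to a fixed "sleep 10" suggested above is to poll until the broker's port accepts connections. A minimal Python sketch of that idea follows. It is not the fix that was applied in katello-installer, and the function name `wait_for_port` and its parameters are hypothetical. A successful TCP connect only shows that something is listening; it does not validate certificates or prove that qpid has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5, connect_timeout=1.0):
    """Poll (host, port) until a TCP connect succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connect.
    All names here are illustrative, not part of any installer API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener exists on the port.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Covers "Connection refused" (Errno 111 on Linux) and timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper could call `wait_for_port("localhost", 5671, timeout=60)` and run qpid-config only when it returns True. Port 5671 is taken from the commands in this report; the host and timeout values are assumptions.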
Connecting redmine issue http://projects.theforeman.org/issues/9364 from this bug

Moving to POST since upstream bug http://projects.theforeman.org/issues/9364 has been closed

-------------

Eric Helms
Testing this patch for this issue -- https://github.com/Katello/puppet-service_wait/pull/8

-------------

Eric Helms
Applied in changeset commit:katello-installer|7097861d5de1c233170050417afaf2e721871ff7.

This bug is slated to be released with Satellite 6.1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1592
Created attachment 992766
foreman-debug

Description of problem:

While configuring Katello nightly on my RHEL 7.0 system I kept getting stuck with the following error:

    [DEBUG 2015-02-17 10:10:10 main] Executing 'qpid-config --ssl-certificate /etc/pki/katello/certs/java-client.crt --ssl-key /etc/pki/katello/private/java-client.key -b 'amqps://<SERVER>:5671' exchanges event'
    [DEBUG 2015-02-17 10:10:10 main] /Stage[main]/Certs::Candlepin/Exec[create candlepin qpid exchange]/unless: Failed: ConnectError: [Errno 111] Connection refused
    [DEBUG 2015-02-17 10:10:10 main] Exec[create candlepin qpid exchange](provider=posix): Executing 'qpid-config --ssl-certificate /etc/pki/katello/certs/java-client.crt --ssl-key /etc/pki/katello/private/java-client.key -b 'amqps://<SERVER>:5671' add exchange topic event --durable'
    [DEBUG 2015-02-17 10:10:10 main] Executing 'qpid-config --ssl-certificate /etc/pki/katello/certs/java-client.crt --ssl-key /etc/pki/katello/private/java-client.key -b 'amqps://<SERVER>:5671' add exchange topic event --durable'
    [ WARN 2015-02-17 10:10:10 main] /Stage[main]/Certs::Candlepin/Exec[create candlepin qpid exchange]/returns: Failed: ConnectError: [Errno 111] Connection refused
    [ERROR 2015-02-17 10:10:10 main] qpid-config --ssl-certificate /etc/pki/katello/certs/java-client.crt --ssl-key /etc/pki/katello/private/java-client.key -b 'amqps://<SERVER>:5671' add exchange topic event --durable returned 1 instead of one of [0]

I talked to Al, who suggested that systemd may be reporting qpid as restarted while it is still starting up, which would cause the qpid-config command to fail over and over.
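Because the exec fails only while the broker is not yet accepting connections, retrying the same command after a short pause is one way to ride out the startup window. The sketch below illustrates that retry pattern in Python. It is not how the installer or Puppet implements it, and `run_with_retries`, `tries`, and `delay` are hypothetical names chosen for illustration.

```python
import subprocess
import time


def run_with_retries(cmd, tries=5, delay=2.0):
    """Run `cmd` (a list of arguments) until it exits with status 0.

    Makes at most `tries` attempts (must be >= 1), sleeping `delay`
    seconds between failed attempts. Returns the CompletedProcess of
    the last attempt so the caller can inspect returncode and output.
    """
    result = None
    for attempt in range(1, tries + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            break
        # Do not sleep after the final failed attempt.
        if attempt < tries:
            time.sleep(delay)
    return result
```

The command list passed in would be the same qpid-config invocation shown in the log above, split into arguments. A retry loop hides the race rather than removing it, so waiting for the service to be ready remains the more robust option.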
Version-Release number of selected component (if applicable):

* candlepin-0.9.41-1.el7.noarch
* candlepin-common-1.0.20-1.el7.noarch
* candlepin-selinux-0.9.41-1.el7.noarch
* candlepin-tomcat-0.9.41-1.el7.noarch
* elasticsearch-0.90.10-7.el7.noarch
* foreman-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-compute-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-debug-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-gce-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-libvirt-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-ovirt-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-postgresql-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-proxy-1.8.0-0.develop.201502121459git0207401.el7.noarch
* foreman-release-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* foreman-selinux-1.8.0-0.develop.201412151103gite2863e4.el7.noarch
* foreman-vmware-1.8.0-0.develop.201502121510git9edf91b.el7.noarch
* katello-2.2.0-1.201502161312git49289e5.el7.noarch
* katello-certs-tools-2.0.1-1.el7.noarch
* katello-common-2.2.0-1.201502161312git49289e5.el7.noarch
* katello-default-ca-1.0-1.noarch
* katello-installer-2.2.0-1.201502160627gite8ff373.el7.noarch
* katello-installer-base-2.2.0-1.201502160627gite8ff373.el7.noarch
* katello-repos-2.1.1-1.el7.noarch
* katello-server-ca-1.0-1.noarch
* openldap-2.4.39-3.el7.x86_64
* pulp-docker-plugins-0.2.1-0.2.beta.el7.noarch
* pulp-katello-0.3-3.el7.noarch
* pulp-nodes-common-2.5.1-1.el7.noarch
* pulp-nodes-parent-2.5.1-1.el7.noarch
* pulp-puppet-plugins-2.5.1-2.katello.el7.noarch
* pulp-puppet-tools-2.5.1-2.katello.el7.noarch
* pulp-rpm-plugins-2.5.1-1.el7.noarch
* pulp-selinux-2.5.1-1.el7.noarch
* pulp-server-2.5.1-1.el7.noarch
* python-ldap-2.4.6-6.el7.x86_64
* ruby193-rubygem-ldap_fluff-0.3.3-1.el7.noarch
* ruby193-rubygem-net-ldap-0.10.0-1.el7.noarch
* ruby193-rubygem-runcible-1.3.1-1.el7.noarch
* rubygem-hammer_cli-0.1.4-1.201502121207git0ab2866.el7.noarch
* rubygem-hammer_cli_foreman-0.1.4-1.201501221305git706b057.el7.noarch
* rubygem-hammer_cli_foreman_bootdisk-0.1.2-1.el7.noarch
* rubygem-hammer_cli_foreman_tasks-0.0.3-2.201409091410git163c264.git.0.988ca80.el7.noarch
* rubygem-hammer_cli_gutterball-0.0.1-1.201501072024git01fe139.git.0.06e884f.el7.noarch
* rubygem-hammer_cli_import-0.10.4-1.el7.noarch
* rubygem-hammer_cli_katello-0.0.7-1.201502061831git68a34d6.git.0.dd5c904.el7.noarch

How reproducible:

Steps to Reproduce:
1. Install katello nightly using the katello-deploy script on a RHEL 7.0 system

Actual results:

Expected results:

Additional info: