Bug 1644201 - satellite-installer waits for qpidd service status, not for qpid listening on port 5671
Summary: satellite-installer waits for qpidd service status, not for qpid listening on...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Qpid
Version: 6.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: 6.6.0
Assignee: Evgeni Golov
QA Contact: Vladimír Sedmík
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-30 08:46 UTC by Pavel Moravec
Modified: 2022-03-13 15:54 UTC (History)

Fixed In Version: foreman-installer-1.21.0
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-22 12:46:44 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 25909 | 0 | Normal | Closed | satellite-installer --upgrade qpid-config returned 1 instead of one of [0] | 2021-01-18 09:11:25 UTC
Red Hat Knowledge Base (Solution) | 3675361 | 0 | None | None | None | 2018-11-01 11:21:08 UTC
Red Hat Product Errata | RHSA-2019:3172 | 0 | None | None | None | 2019-10-22 12:47:07 UTC

Description Pavel Moravec 2018-10-30 08:46:33 UTC
Description of problem:
Before firing some actions (such as starting pulp services, or creating the bindings of katello_event_queue), the installer waits *only* until the service status returns OK and service-wait succeeds. With an increased number of pulp.agent.* queues in qpidd, qpidd can take longer than the default 30s to recover all such durable queues, causing service-wait to fail and the whole installer to fail as well (or, on one occasion, service-wait succeeded BUT the subsequent check of katello_event_queue failed, again failing the installer).

Please increase WAIT_MAX in

/usr/share/katello-installer-base/modules/service_wait/plugins/qpidd.sh

to some higher value.
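
For illustration, waiting on actual readiness rather than on service status could look roughly like the sketch below. This is not the code in qpidd.sh: the helper names wait_until and port_is_listening are hypothetical, the default of 120 seconds is an arbitrary assumption, and the port check assumes bash (it uses the /dev/tcp pseudo-device).

```shell
#!/bin/bash
# Sketch only: wait_until and port_is_listening are hypothetical helpers,
# not part of service-wait or qpidd.sh.

# Upper bound in seconds; 120 is an arbitrary illustrative default.
WAIT_MAX="${WAIT_MAX:-120}"

# Succeeds if something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp pseudo-device.
port_is_listening() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# usage: wait_until <max_seconds> <command...>
# Retries the command once per second until it succeeds or the limit is hit.
wait_until() {
  local max="$1" waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$max" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A caller would then run something like `wait_until "$WAIT_MAX" port_is_listening localhost 5671` after restarting qpidd, and treat a non-zero return as a failed start.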


Version-Release number of selected component (if applicable):
6.4 or any older.


How reproducible:
100% with some tweaking


Steps to Reproduce:
1. Mimic having many durable queues in qpidd:

for i in $(seq 1 30000); do
  qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add queue satellite-installer.test.$i --durable &
  sleep 0.1
done

2. Trigger the installer to restart the qpidd service by modifying qpidd.conf:
echo >> /etc/qpid/qpidd.conf
satellite-installer


Actual results:
satellite-installer fails with:

 /Stage[main]/Candlepin::Qpid/Qpid::Config::Exchange[event]/Qpid::Config_cmd[ensure exchange event]/Exec[qpid-config ensure exchange event]/returns: change from notrun to 0 failed: qpid-config --ssl-certificate /etc/pki/katello/certs/pmoravec-sat63.gsslab.brq2.redhat.com-qpid-broker.crt --ssl-key /etc/pki/katello/private/pmoravec-sat63.gsslab.brq2.redhat.com-qpid-broker.key -b amqps://localhost:5671 add exchange topic event --durable returned 1 instead of one of [0]

usually (but not always) preceded (and triggered) by:
 /Stage[main]/Qpid::Service/Service[qpidd]: Failed to call refresh: Could not restart Service[qpidd]: Execution of '/usr/share/katello-installer-base/modules/service_wait/bin/service-wait restart qpidd' returned 6: 


Expected results:
No such installer error.


Additional info:

Comment 2 Chris Roberts 2018-10-31 17:38:15 UTC
This is fixed upstream and will be in 6.5. We dropped the service-wait puppet module, so everything will use foreman-maintain, which should give services enough time to come up.

If you feel this needs to be fixed in a 6.4.z feel free to reopen.

Comment 3 Pavel Moravec 2018-10-31 18:18:05 UTC
I can't see where foreman-maintain waits for qpidd port to be listening or so - my understanding of the 

https://github.com/theforeman/foreman_maintain/tree/master/definitions/procedures/service

is that foreman-maintain just waits for systemd service status.


Martin, how does foreman-maintain check that qpidd is operational (since there is a period of time when "systemctl status qpidd.service" returns OK but qpidd is not yet listening on its port - technically a qpidd bug underneath)?

Comment 4 Pavel Moravec 2018-11-01 09:42:13 UTC
Bit artificial reproducer but IMHO this should not happen:

1) create many durable queues to qpidd (to mimic pulp.agent.* queues from katello-agent)
2) restart qpidd via foreman-maintain
3) check its status (directly rather than via f-m, as f-m takes a long time to kick off)
4) netstat -anp | grep 5671 | grep LISTEN

Step 3) must report success if and only if netstat shows qpidd listening on 5671.

BUT:

[root@pmoravec-sat64-beta ~]# check_netstat() { echo "$(date): LISTENing sockets on 5671: $(netstat -anp | grep 5671 | grep -c LISTEN)"; }
[root@pmoravec-sat64-beta ~]# date; foreman-maintain service restart --only qpidd; check_netstat; service qpidd status; echo "service qpidd status: $?"; while true; do check_netstat; sleep 1; done
Thu Nov  1 09:36:57 UTC 2018
Running preparation steps required to run the next scenarios
================================================================================
Setup hammer:                                                         [SKIPPED]
Satellite server is not running. Hammer can't be setup now.
--------------------------------------------------------------------------------


Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 
Stopping the following service(s):

qpidd
| All services stopped                                                          
Starting the following service(s):

qpidd
/ All services started                                                [OK]      
--------------------------------------------------------------------------------

Thu Nov  1 09:37:17 UTC 2018: LISTENing sockets on 5671: 0
Redirecting to /bin/systemctl status qpidd.service
● qpidd.service - An AMQP message broker daemon.
   Loaded: loaded (/usr/lib/systemd/system/qpidd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/qpidd.service.d
           └─limits.conf
   Active: active (running) since Thu 2018-11-01 09:37:17 UTC; 185ms ago
     Docs: man:qpidd(1)
           http://qpid.apache.org/
 Main PID: 2462 (qpidd)
   CGroup: /system.slice/qpidd.service
           └─2462 /usr/sbin/qpidd --config /etc/qpid/qpidd.conf

Nov 01 09:37:17 pmoravec-sat64-beta.sysmgmt.lan systemd[1]: Started An AMQP message broker daemon..
Nov 01 09:37:17 pmoravec-sat64-beta.sysmgmt.lan systemd[1]: Starting An AMQP message broker daemon....
service qpidd status: 0
Thu Nov  1 09:37:17 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:18 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:19 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:20 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:21 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:22 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:24 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:25 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:26 UTC 2018: LISTENing sockets on 5671: 2


See the:

service qpidd status: 0
Thu Nov  1 09:37:17 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:18 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:19 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:20 UTC 2018: LISTENing sockets on 5671: 0

that shouldn't happen.

If foreman-maintain (the only instance that restarted qpidd while applying an installer change, and which then checks qpidd status and continues with qpid-config commands) tries at that time to invoke qpid-config (e.g. to check katello_event_queue), that check or command will fail - that is wrong.
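
As a side note on the check itself: `grep 5671` matches that string anywhere in the line (a PID, or a port such as 56710), so a stricter count may be useful when repeating this test. The sketch below is a hypothetical helper (count_listeners is not an existing tool) and assumes the usual netstat column layout, with the local address in column 4 and the state in column 6.

```shell
#!/bin/bash
# Sketch only: count_listeners is a hypothetical helper.
# Reads `netstat -ant`-style lines on stdin and prints the number of
# sockets in LISTEN state whose local address ends in :<port>.
# Assumes column 4 is the local address and column 6 is the state.
count_listeners() {
  local port="$1"
  awk -v port="$port" '
    $6 == "LISTEN" && $4 ~ (":" port "$") { n++ }
    END { print n + 0 }
  '
}
```

It would be used as `netstat -ant | count_listeners 5671`.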

Comment 5 Pavel Moravec 2018-11-01 09:45:16 UTC
(In reply to Pavel Moravec from comment #4)
> Bit artificial reproducer but IMHO this should not happen:

I mean on 6.4 where WAIT_MAX is removed.

Comment 6 Pavel Moravec 2018-11-01 11:21:08 UTC
Workaround: tune systemd qpidd.service unit to wait until qpidd listens on port 5671:

https://access.redhat.com/solutions/3675361

Mike C., this sounds like a bug in qpidd to me: qpidd should notify systemd *only* once it starts to listen on its port(s), not before recovering journals.

Isn't this worth fixing at the source of the problem (in qpidd) rather than working around it in Satellite / the systemd qpidd.service unit?
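
For readers without access to the linked solution, a workaround of this kind might look roughly like the sketch below. This is an assumption about the general approach, not a copy of the KB article: the function name, the drop-in file name wait-for-port.conf, the use of 127.0.0.1, and the 300-second timeout are all illustrative. It relies on systemd keeping a unit in the "activating" state until its ExecStartPost commands finish.

```shell
#!/bin/bash
# Sketch only: write_qpidd_wait_dropin and wait-for-port.conf are
# hypothetical names; the linked KB solution may differ.
# Writes a systemd drop-in that delays "active" until port 5671 accepts
# TCP connections. Pass a directory to override the default location.
write_qpidd_wait_dropin() {
  local dir="${1:-/etc/systemd/system/qpidd.service.d}"
  mkdir -p "$dir"
  cat > "$dir/wait-for-port.conf" <<'EOF'
[Service]
# Hold the unit in "activating" until something accepts connections on 5671.
ExecStartPost=/bin/bash -c 'until (exec 3<>/dev/tcp/127.0.0.1/5671) 2>/dev/null; do sleep 1; done'
# Upper bound for the whole start-up, including the loop above (value is a guess).
TimeoutStartSec=300
EOF
}
```

After calling `write_qpidd_wait_dropin` as root, `systemctl daemon-reload` is needed for the drop-in to take effect. Note that the probe is a bare TCP connect without a TLS handshake.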

Comment 7 Martin Bacovsky 2018-11-19 18:09:26 UTC
> Martin, how foreman-maintain checks qpidd is operational (since there is a 
> period of time when "systemctl status qpidd.service" returns OK but qpidd
> is not yet listening on its port - technically qpidd bug beneath)?

"foreman-maintain service start" just wraps systemctl start and calls it for all Satellite related services. "foreman-maintain service restart" also waits till "hammer ping" is successful.

Comment 9 Evgeni Golov 2019-01-31 14:04:43 UTC
This is a duplicate of #1665466 (or it of this, whatever).

Comment 10 Bryan Kearney 2019-05-06 10:06:41 UTC
Upstream bug assigned to egolov

Comment 11 Bryan Kearney 2019-05-06 10:06:43 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25909 has been resolved.

Comment 15 errata-xmlrpc 2019-10-22 12:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172

