Bug 1644201

Summary: satellite-installer waits for qpidd service status, not for qpid listening on port 5671
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: Qpid Assignee: Evgeni Golov <egolov>
Status: CLOSED ERRATA QA Contact: Vladimír Sedmík <vsedmik>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.4 CC: andrew.schofield, chrobert, daniele, egolov, ehelms, hmore, ktordeur, mbacovsk, mcressma, vsedmik
Target Milestone: 6.6.0 Keywords: Reopened, Triaged
Target Release: Unused   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: foreman-installer-1.21.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-22 12:46:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pavel Moravec 2018-10-30 08:46:33 UTC
Description of problem:
The installer holds back some actions (such as starting the pulp services, or namely creating the bindings of katello_event_queue) *only* until the qpidd service status returns OK and service-wait succeeds. With an increased number of pulp.agent.* queues in qpidd, qpidd can take longer than the default 30s to recover all these durable queues, so service-wait fails and the whole installer fails with it. (Once, service-wait succeeded BUT the subsequent check of katello_event_queue failed, again failing the installer.)

Please increase WAIT_MAX in

/usr/share/katello-installer-base/modules/service_wait/plugins/qpidd.sh

to some higher value.


Version-Release number of selected component (if applicable):
6.4 or any older.


How reproducible:
100% with some tweaking


Steps to Reproduce:
1. mimic having many durable queues for qpidd

for i in $(seq 1 30000); do
  qpid-config --ssl-certificate=/etc/pki/katello/qpid_client_striped.crt -b amqps://localhost:5671 add queue satellite-installer.test.$i --durable &
  sleep 0.1
done

2. trigger installer restarting qpidd service by touching qpidd.conf:
echo >> /etc/qpid/qpidd.conf
satellite-installer


Actual results:
satellite-installer fails with:

 /Stage[main]/Candlepin::Qpid/Qpid::Config::Exchange[event]/Qpid::Config_cmd[ensure exchange event]/Exec[qpid-config ensure exchange event]/returns: change from notrun to 0 failed: qpid-config --ssl-certificate /etc/pki/katello/certs/pmoravec-sat63.gsslab.brq2.redhat.com-qpid-broker.crt --ssl-key /etc/pki/katello/private/pmoravec-sat63.gsslab.brq2.redhat.com-qpid-broker.key -b amqps://localhost:5671 add exchange topic event --durable returned 1 instead of one of [0]

usually (but not always) preceded (and triggered) by:
 /Stage[main]/Qpid::Service/Service[qpidd]: Failed to call refresh: Could not restart Service[qpidd]: Execution of '/usr/share/katello-installer-base/modules/service_wait/bin/service-wait restart qpidd' returned 6: 


Expected results:
No such installer error.


Additional info:

Comment 2 Chris Roberts 2018-10-31 17:38:15 UTC
This is fixed upstream and will be in 6.5. We dropped the service-wait puppet module, so everything will use foreman-maintain, which should give services enough time to come up.

If you feel this needs to be fixed in a 6.4.z feel free to reopen.

Comment 3 Pavel Moravec 2018-10-31 18:18:05 UTC
I can't see where foreman-maintain waits for qpidd port to be listening or so - my understanding of the 

https://github.com/theforeman/foreman_maintain/tree/master/definitions/procedures/service

is that foreman-maintain just waits for systemd service status.


Martin, how does foreman-maintain check that qpidd is operational (since there is a period of time when "systemctl status qpidd.service" returns OK but qpidd is not yet listening on its port - technically qpidd bug beneath)?

Comment 4 Pavel Moravec 2018-11-01 09:42:13 UTC
Bit artificial reproducer but IMHO this should not happen:

1) create many durable queues in qpidd (to mimic pulp.agent.* queues from katello-agent)
2) restart qpidd via foreman-maintain
3) check its status (rather directly than via f-m as f-m takes a long time to kick off)
4) netstat -anp | grep 5671 | grep LISTEN

Step 3) must report success if and only if netstat in step 4) shows qpidd listening on 5671.

BUT:

[root@pmoravec-sat64-beta ~]# check_netstat() { echo "$(date): LISTENing sockets on 5671: $(netstat -anp | grep 5671 | grep -c LISTEN)"; }
[root@pmoravec-sat64-beta ~]# date; foreman-maintain service restart --only qpidd; check_netstat; service qpidd status; echo "service qpidd status: $?"; while true; do check_netstat; sleep 1; done
Thu Nov  1 09:36:57 UTC 2018
Running preparation steps required to run the next scenarios
================================================================================
Setup hammer:                                                         [SKIPPED]
Satellite server is not running. Hammer can't be setup now.
--------------------------------------------------------------------------------


Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 
Stopping the following service(s):

qpidd
| All services stopped                                                          
Starting the following service(s):

qpidd
/ All services started                                                [OK]      
--------------------------------------------------------------------------------

Thu Nov  1 09:37:17 UTC 2018: LISTENing sockets on 5671: 0
Redirecting to /bin/systemctl status qpidd.service
● qpidd.service - An AMQP message broker daemon.
   Loaded: loaded (/usr/lib/systemd/system/qpidd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/qpidd.service.d
           └─limits.conf
   Active: active (running) since Thu 2018-11-01 09:37:17 UTC; 185ms ago
     Docs: man:qpidd(1)
           http://qpid.apache.org/
 Main PID: 2462 (qpidd)
   CGroup: /system.slice/qpidd.service
           └─2462 /usr/sbin/qpidd --config /etc/qpid/qpidd.conf

Nov 01 09:37:17 pmoravec-sat64-beta.sysmgmt.lan systemd[1]: Started An AMQP message broker daemon..
Nov 01 09:37:17 pmoravec-sat64-beta.sysmgmt.lan systemd[1]: Starting An AMQP message broker daemon....
service qpidd status: 0
Thu Nov  1 09:37:17 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:18 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:19 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:20 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:21 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:22 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:24 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:25 UTC 2018: LISTENing sockets on 5671: 2
Thu Nov  1 09:37:26 UTC 2018: LISTENing sockets on 5671: 2


See the:

service qpidd status: 0
Thu Nov  1 09:37:17 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:18 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:19 UTC 2018: LISTENing sockets on 5671: 0
Thu Nov  1 09:37:20 UTC 2018: LISTENing sockets on 5671: 0

that shouldn't happen.

If foreman-maintain (the only component that restarts qpidd while an installer change is applied, and that then checks qpidd status and proceeds to qpid-config commands) invokes qpid-config during that window (e.g. to check katello_event_queue), that check or command will fail - that is wrong.
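
To illustrate the kind of check that is missing: a minimal bash sketch that polls until the port actually accepts TCP connections, instead of trusting the service status. The function names wait_until and port_accepts_connections are hypothetical helpers made up for this example; they are not part of foreman-maintain or the installer.

```shell
#!/bin/bash
# Hypothetical helper: wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds until it succeeds or TIMEOUT
# seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Hypothetical helper: succeeds when a TCP connection to HOST:PORT can be
# opened. Uses bash's built-in /dev/tcp; no TLS handshake is attempted,
# it only proves that something is listening.
port_accepts_connections() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}
```

After restarting qpidd, one could call "wait_until 300 1 port_accepts_connections localhost 5671" and run qpid-config only if it returns 0. The 300s limit is an arbitrary value for illustration.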

Comment 5 Pavel Moravec 2018-11-01 09:45:16 UTC
(In reply to Pavel Moravec from comment #4)
> Bit artificial reproducer but IMHO this should not happen:

I mean on 6.4 where WAIT_MAX is removed.

Comment 6 Pavel Moravec 2018-11-01 11:21:08 UTC
Workaround: tune the systemd qpidd.service unit to wait until qpidd listens on port 5671:

https://access.redhat.com/solutions/3675361
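
A rough sketch of what such a unit tweak could look like. This is an assumption for illustration and not necessarily what the linked solution does. The helper render_qpidd_dropin is hypothetical; it prints a systemd drop-in whose ExecStartPost polls "ss -ltn" until something listens on the port, which should keep the unit in "activating" until then. The timeout value is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: render_qpidd_dropin [PORT] [TIMEOUT]
# Prints a systemd drop-in fragment to stdout; it does not touch the system.
render_qpidd_dropin() {
    local port="${1:-5671}" timeout="${2:-300}"
    cat <<EOF
[Service]
# Do not report the unit as started until something listens on the port.
ExecStartPost=/bin/bash -c 'until ss -ltn | grep -q ":${port} "; do sleep 1; done'
# Upper bound for the whole start phase, including ExecStartPost.
TimeoutStartSec=${timeout}
EOF
}
```

The output could be saved as a file under /etc/systemd/system/qpidd.service.d/ (the file name is up to the admin), followed by "systemctl daemon-reload".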

Mike C., this sounds like a bug in qpidd to me: qpidd should notify systemd *only* once it starts to listen on its port(s), not before recovering journals.

Isn't this worth fixing at the source of the problem (in qpidd) rather than working around it in Satellite / the systemd qpidd.service unit?

Comment 7 Martin Bacovsky 2018-11-19 18:09:26 UTC
> Martin, how does foreman-maintain check that qpidd is operational (since
> there is a period of time when "systemctl status qpidd.service" returns OK
> but qpidd is not yet listening on its port - technically qpidd bug beneath)?

"foreman-maintain service start" just wraps systemctl start and calls it for all Satellite related services. "foreman-maintain service restart" also waits till "hammer ping" is successful.

Comment 9 Evgeni Golov 2019-01-31 14:04:43 UTC
This is a duplicate of #1665466 (or it of this, whatever).

Comment 10 Bryan Kearney 2019-05-06 10:06:41 UTC
Upstream bug assigned to egolov

Comment 11 Bryan Kearney 2019-05-06 10:06:43 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/25909 has been resolved.

Comment 15 errata-xmlrpc 2019-10-22 12:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3172