Bug 1907801

Summary:	satellite-service restart takes 3 to 5 minutes
Product:	Red Hat Satellite	Reporter:	Devendra Singh <desingh>
Component:	Satellite Maintain	Assignee:	Amit Upadhye <aupadhye>
Status:	CLOSED ERRATA	QA Contact:	Gaurav Talreja <gtalreja>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.9.0	CC:	ahumbe, apatel, aupadhye, ehelms, inecas, jsherril, kgaikwad, mmccune, pcreech, smallamp, swadeley, vsedmik
Target Milestone:	6.9.0	Keywords:	PrioBumpGSS, Regression, Triaged
Target Release:	Unused
Hardware:	x86_64
OS:	Linux
Fixed In Version:	rubygem-foreman_maintain-0.7.8
Clones:	1917883
Last Closed:	2021-04-21 14:48:24 UTC	Type:	Bug
Bug Blocks:	1917883

Description Devendra Singh 2020-12-15 09:42:26 UTC

Description of problem: satellite-service restart takes 3 to 5 minutes 


Version-Release number of selected component (if applicable):
6.9 Snap5

How reproducible:
always

Steps to Reproduce:
1. Install Satellite 6.9.
2. Restart the Satellite services.
3. Observe that the services restart successfully.
4. Observe that the restart takes 3 to 5 minutes.


# date;foreman-maintain service restart;date
Tue Dec 15 03:40:36 EST 2020
Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 
.....
.....                                                      

Starting the following service(s):
....
....
Tue Dec 15 03:44:31 EST 2020


Actual results:
satellite-service restart takes 3 to 5 minutes 

Expected results:
satellite-service restart should take 1 to 2 minutes 

Additional info:

Comment 1 Eric Helms 2020-12-17 18:57:34 UTC

This is going to vary from system to system. If you can point to a particular service taking longer than it ought to, we can investigate this. Otherwise, I would ask that this be closed as not a bug.
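
One way to point at a particular service is to restart each unit on its own and time it. The following is a minimal sketch only: time_restart and the SYSTEMCTL override are illustrative names, not part of foreman-maintain, and restarting units one at a time ignores dependencies between services, so the numbers are rough.

```shell
#!/bin/sh
# Sketch only: time_restart and SYSTEMCTL are illustrative names,
# not foreman-maintain functionality.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=true) to dry-run without systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Restart each unit given as an argument, one at a time, and print
# "<unit><TAB><seconds>s" per unit so that slow ones stand out.
time_restart() {
  for unit in "$@"; do
    start=$(date +%s)
    # systemctl blocks until the restart job has finished.
    "$SYSTEMCTL" restart "$unit"
    end=$(date +%s)
    printf '%s\t%ss\n' "$unit" "$((end - start))"
  done
}
```

Run as root with the units of interest as arguments, for example: time_restart foreman puppetserver dynflow-sidekiq@orchestrator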

Comment 2 Devendra Singh 2020-12-18 10:10:23 UTC

I have seen this problem on all 6.9 setups. The dynflow-sidekiq@orchestrator, foreman, puppetserver, dynflow-sidekiq@worker, and dynflow-sidekiq@worker-hosts-queue services take a bit longer to start.

Comment 3 Devendra Singh 2020-12-18 10:57:00 UTC

On the upgraded setup, the Satellite service restart took 12 minutes, which I think is unexpected.

# date; foreman-maintain service restart; date
Fri Dec 18 05:04:50 EST 2020
Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 

Stopping the following service(s):
rh-mongodb34-mongod, postgresql, qdrouterd, qpidd, rh-redis5-redis, squid, pulp_celerybeat, pulp_resource_manager, pulp_streamer, pulp_workers, smart_proxy_dynflow_core, tomcat, dynflow-sidekiq@orchestrator, elasticsearch, foreman, httpd, puppetserver, dynflow-sidekiq@worker, dynflow-sidekiq@worker-hosts-queue, foreman-proxy
\ stopping foreman                                                              
Warning: Stopping foreman.service, but it can still be activated by:
  foreman.socket
- All services stopped                                                          

Starting the following service(s):
rh-mongodb34-mongod, postgresql, qdrouterd, qpidd, rh-redis5-redis, squid, pulp_celerybeat, pulp_resource_manager, pulp_streamer, pulp_workers, smart_proxy_dynflow_core, tomcat, dynflow-sidekiq@orchestrator, elasticsearch, foreman, httpd, puppetserver, dynflow-sidekiq@worker, dynflow-sidekiq@worker-hosts-queue, foreman-proxy
/ starting dynflow-sidekiq@orchestrator                                         
\ starting dynflow-sidekiq@orchestrator                                         
/ starting dynflow-sidekiq@orchestrator                                         
| starting dynflow-sidekiq@orchestrator                                         
\ All services started                                                [OK]      
--------------------------------------------------------------------------------

Fri Dec 18 05:16:00 EST 2020

#
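
For reference, the elapsed time can be worked out from the two date lines in a transcript. This is a small sketch, assuming both timestamps fall on the same day and use two-digit HH:MM:SS fields; to_seconds and elapsed are illustrative helper names.

```shell
#!/bin/sh
# Sketch only: to_seconds and elapsed are illustrative helper names.

# Convert HH:MM:SS to seconds since midnight (plain POSIX sh, no GNU date).
# The function body runs in a subshell, so h, m and s do not leak.
to_seconds() (
  IFS=: read -r h m s <<EOF
$1
EOF
  # Strip one leading zero so "08" and "09" are not parsed as invalid octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
)

# Seconds between two same-day timestamps: elapsed START END
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 05:04:50 05:16:00   # prints 670
elapsed 03:40:36 03:44:31   # prints 235
```

By this arithmetic the run above took 670 seconds (11 min 10 s), and the run in the original description took 235 seconds (just under 4 minutes).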

Comment 7 Eric Helms 2021-01-19 15:20:36 UTC

After some digging, we found that on slower machines any service or action that loads the underlying Rails application stack can take a long time (~2 minutes on the reproducer machine). This can be mitigated by:

 * changing foreman-maintain service actions to run in parallel, given that systemd can handle this
 * investigating speed-ups to the Rails loading process

Therefore, this BZ has been modified to focus on the foreman-maintain improvements. An additional bug has been created as a clone (https://bugzilla.redhat.com/show_bug.cgi?id=1917883) to handle future investigations into loading the stack more quickly.
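
The first mitigation can be illustrated roughly as follows. This is a sketch of the general idea, not the actual foreman-maintain change: restart_serial, restart_batched, and the SYSTEMCTL override are hypothetical names. It relies on systemctl accepting several units in one invocation, which lets systemd queue the jobs together and order them from the unit dependencies instead of waiting on each unit in turn.

```shell
#!/bin/sh
# Sketch only: the function names and SYSTEMCTL are hypothetical,
# not foreman-maintain code.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) to see the commands.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# One systemctl call per unit: each call blocks until its unit is up,
# so the total time is the sum of all individual start times.
restart_serial() {
  for unit in "$@"; do
    "$SYSTEMCTL" restart "$unit" || return 1
  done
}

# One systemctl call for all units: systemd receives every job at once
# and may run independent units concurrently.
restart_batched() {
  "$SYSTEMCTL" restart "$@"
}
```

With SYSTEMCTL=echo, restart_batched foreman httpd prints a single "restart foreman httpd" line, while restart_serial foreman httpd prints one line per unit.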

Comment 8 Eric Helms 2021-01-19 15:23:50 UTC

Created redmine issue https://projects.theforeman.org/issues/31680 from this bug

Comment 9 Amit Upadhye 2021-02-10 12:33:24 UTC

Hello Devendra,

I have raised a PR [1], and with it I see the restart complete about 1 minute faster. Can you give it a try?
[1] https://github.com/theforeman/foreman_maintain/pull/446

Thank You,
Amit Upadhye.

Comment 10 Sudhir Mallamprabhakara 2021-02-12 03:10:32 UTC

Adding Need Info on Devendra.

Comment 11 Devendra Singh 2021-02-12 05:57:23 UTC

(In reply to Amit Upadhye from comment #9)
> Hello Devendra,
> 
> I have raised PR[1] and I can see 1 minute faster restart. Can you give it a
> try?
> [1] https://github.com/theforeman/foreman_maintain/pull/446
> 
> Thank You,
> Amit Upadhye.

Hi Amit, 

I have verified your PR on 6.9 Snap 12 and observed a great performance improvement. In 6.9 Snap 9, the service restart took ~12 minutes on the upgraded box, and now it takes about 5 minutes.

# date; foreman-maintain service restart;date
Thu Feb 11 23:36:16 EST 2021
Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 

Stopping the following service(s):
rh-mongodb34-mongod, postgresql, qdrouterd, qpidd, rh-redis5-redis, squid, pulp_celerybeat, pulp_resource_manager, pulp_streamer, pulp_workers, smart_proxy_dynflow_core, tomcat, dynflow-sidekiq@orchestrator, elasticsearch, foreman, httpd, puppetserver, dynflow-sidekiq@worker, dynflow-sidekiq@worker-hosts-queue, foreman-proxy
\ All services stopped                                                          

Starting the following service(s):
rh-mongodb34-mongod, postgresql, qdrouterd, qpidd, rh-redis5-redis, squid, pulp_celerybeat, pulp_resource_manager, pulp_streamer, pulp_workers, smart_proxy_dynflow_core, tomcat, dynflow-sidekiq@orchestrator, elasticsearch, foreman, httpd, puppetserver, dynflow-sidekiq@worker, dynflow-sidekiq@worker-hosts-queue, foreman-proxy
- All services started                                                [OK]      
--------------------------------------------------------------------------------

Thu Feb 11 23:41:45 EST 2021
# 

I have also tested this PR on a newly installed Satellite and saw the performance improvement there too. In 6.9 Snap 9, the service restart took 4 to 5 minutes, but now it takes 2 to 3 minutes.


# date; foreman-maintain service restart;date
Fri Feb 12 00:21:08 EST 2021
Running Restart Services
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Restart applicable services: 

Stopping the following service(s):
rh-mongodb34-mongod, postgresql, qdrouterd, qpidd, rh-redis5-redis, squid, pulp_celerybeat, pulp_resource_manager, pulp_streamer, pulp_workers, smart_proxy_dynflow_core, tomcat, dynflow-sidekiq@orchestrator, foreman, httpd, puppetserver, dynflow-sidekiq@worker, dynflow-sidekiq@worker-hosts-queue, foreman-proxy
| All services stopped                                                          

Starting the following service(s):
rh-mongodb34-mongod, postgresql, qdrouterd, qpidd, rh-redis5-redis, squid, pulp_celerybeat, pulp_resource_manager, pulp_streamer, pulp_workers, smart_proxy_dynflow_core, tomcat, dynflow-sidekiq@orchestrator, foreman, httpd, puppetserver, dynflow-sidekiq@worker, dynflow-sidekiq@worker-hosts-queue, foreman-proxy
\ All services started                                                [OK]      
--------------------------------------------------------------------------------

Fri Feb 12 00:23:45 EST 2021

Comment 12 Bryan Kearney 2021-03-02 12:03:37 UTC

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31680 has been resolved.

Comment 13 Justin Sherrill 2021-03-10 17:50:28 UTC

This breaks restarting of the pulpcore-worker@* services. These services need --all passed for both start and stop, and it doesn't seem like the change does that.

Comment 15 Justin Sherrill 2021-03-10 19:12:59 UTC

Amit, this particular code:

https://github.com/theforeman/foreman_maintain/blob/master/lib/foreman_maintain/utils/service/systemd.rb#L23 

is what handled this, alongside:

https://github.com/theforeman/foreman_maintain/blob/master/definitions/features/pulpcore.rb#L16


Maybe we could look for that "all" option and start those services separately?
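
A rough sketch of that idea is shown below. It assumes that units needing --all can be recognised by a glob character in their name, whereas foreman-maintain itself appears to track this through an option on the service definition (see the links above). restart_split and the SYSTEMCTL override are illustrative names only.

```shell
#!/bin/sh
# Sketch only: restart_split and SYSTEMCTL are illustrative names,
# not foreman-maintain code.
# SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) to see the commands.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Restart pattern units (e.g. 'pulpcore-worker@*') separately with --all,
# then restart all remaining units together in one call.
restart_split() {
  n=$#
  # Rotate through the arguments once: pattern units are handled on the
  # spot, plain units are appended back onto the argument list.
  while [ "$n" -gt 0 ]; do
    unit=$1
    shift
    n=$((n - 1))
    case "$unit" in
      *'*'*) "$SYSTEMCTL" restart --all "$unit" ;;
      *)     set -- "$@" "$unit" ;;
    esac
  done
  # Only plain units are left in "$@" at this point.
  if [ "$#" -gt 0 ]; then
    "$SYSTEMCTL" restart "$@"
  fi
}
```

With SYSTEMCTL=echo, restart_split foreman 'pulpcore-worker@*' httpd prints "restart --all pulpcore-worker@*" followed by "restart foreman httpd". The pattern must be quoted so that the shell does not expand it against local file names.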

Comment 16 Amit Upadhye 2021-03-15 15:25:31 UTC

Moving this to the ASSIGNED state to fix the concern mentioned in comment 15.

Comment 17 Mike McCune 2021-03-19 21:23:17 UTC

Amit, it looks like https://github.com/theforeman/foreman_maintain/pull/458/files got merged. Should this be in POST?

Comment 18 Gaurav Talreja 2021-04-02 13:57:11 UTC

Verified.

Tested on: 
Satellite 6.9.0 Snap 19.1
Version: rubygem-foreman_maintain-0.7.8-1.el7sat.noarch

Steps:
# time foreman-maintain service restart

Observations:
- On a newly installed 6.9.0 Satellite/Capsule, a service restart takes 2 to 3 minutes.
- On an upgraded 6.9.0 Satellite/Capsule, a service restart takes 4 to 5 minutes.
- On a 6.9.0 Satellite with pulpcore services, all pulpcore-worker@* services are restarted, and a service restart takes 2 to 3 minutes.

Comment 21 errata-xmlrpc 2021-04-21 14:48:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.9 Satellite Maintenance Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1312