Bug 2048470

Summary: Leapp upgrade fails after reboot with disabled postgresql redis tomcat services
Product: Red Hat Satellite Reporter: Lukas Pramuk <lpramuk>
Component: Upgrades Assignee: Evgeni Golov <egolov>
Status: CLOSED ERRATA QA Contact: Lukas Pramuk <lpramuk>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.11.0 CC: egolov, gtalreja
Target Milestone: 6.11.0 Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-07-05 14:32:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Lukas Pramuk 2022-01-31 10:35:52 UTC
Description of problem:
After the LEAPP upgrade and reboot, the postgresql, redis and tomcat services are left disabled.
Enabling the services is not enough to fix the issue:
PostgreSQL is not only disabled but also misconfigured and refuses to start.
Only running satellite-installer explicitly fixes the issues.


Version-Release number of selected component (if applicable):
Satellite 7.0.0 Snap7

How reproducible:
deterministic


Steps to Reproduce:
1. Prepare a Satellite 7.0 server on RHEL 7 for the LEAPP upgrade

2. Perform the LEAPP upgrade to RHEL 8 and reboot

3. After the reboot, check the Satellite status

# systemctl status tomcat postgresql redis
● tomcat.service - Apache Tomcat Web Application Container
   Loaded: loaded (/usr/lib/systemd/system/tomcat.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/postgresql.service.d
           └─postgresql.conf
   Active: inactive (dead)

● redis.service - Redis persistent key-value database
   Loaded: loaded (/usr/lib/systemd/system/redis.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/redis.service.d
           └─limit.conf
   Active: inactive (dead)


Actual results:
disabled services: postgresql, redis, tomcat
misconfigured: postgresql

Expected results:
all satellite services run successfully
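
The check in step 3 can be scripted. The following is a minimal sketch, not part of the original report: the function names report_not_enabled and collect_states are hypothetical, and the unit list is taken from the "Actual results" above. It only relies on `systemctl is-enabled`, which prints the enablement state of a unit.

```shell
#!/bin/sh
# Hypothetical helper: read "unit state" pairs on stdin (one per line)
# and print every unit whose state is anything other than "enabled".
report_not_enabled() {
    while read -r unit state; do
        [ "$state" = "enabled" ] || printf '%s: %s\n' "$unit" "$state"
    done
}

# Hypothetical helper: emit "unit state" pairs for the three services
# named in this report.
collect_states() {
    for unit in postgresql redis tomcat; do
        # is-enabled exits non-zero for disabled units, so the exit
        # status is ignored and only the printed state is kept.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        printf '%s %s\n' "$unit" "${state:-unknown}"
    done
}

# Usage on the upgraded host:
#   collect_states | report_not_enabled
```

On a host in the state described here, the pipeline would be expected to list all three units; an empty result means every unit is enabled, which by itself does not prove the services actually start.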

Comment 1 Lukas Pramuk 2022-01-31 10:41:11 UTC
After enabling postgresql, it still fails to start:

# systemctl status  postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/postgresql.service.d
           └─postgresql.conf
   Active: failed (Result: exit-code) since Mon 2022-01-31 04:58:35 EST; 38min ago
  Process: 1301 ExecStartPre=/usr/libexec/postgresql-check-db-dir postgresql (code=exited, status=1/FAILURE)

Jan 31 04:58:35 sat.example.com systemd[1]: Starting PostgreSQL database server...
Jan 31 04:58:35 sat.example.com systemd[1]: postgresql.service: Control process exited, code=exited status=1
Jan 31 04:58:35 sat.example.com systemd[1]: postgresql.service: Failed with result 'exit-code'.
Jan 31 04:58:35 sat.example.com systemd[1]: Failed to start PostgreSQL database server.

Apache doesn't look healthy either:

# systemctl status  httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2022-01-31 04:58:35 EST; 39min ago
     Docs: man:httpd.service(8)
  Process: 1314 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
 Main PID: 1314 (code=exited, status=1/FAILURE)

Jan 31 04:58:35 sat.example.com httpd[1314]: [Mon Jan 31 04:58:35.544207 2022] [so:warn] [pid 1314] AH01574: module proxy_module is already loaded, skipping
Jan 31 04:58:35 sat.example.com httpd[1314]: [Mon Jan 31 04:58:35.565848 2022] [so:warn] [pid 1314] AH01574: module proxy_http_module is already loaded, skipping
Jan 31 04:58:35 sat.example.com httpd[1314]: [Mon Jan 31 04:58:35.571191 2022] [so:warn] [pid 1314] AH01574: module proxy_wstunnel_module is already loaded, skipping
Jan 31 04:58:35 sat.example.com httpd[1314]: [Mon Jan 31 04:58:35.572644 2022] [so:warn] [pid 1314] AH01574: module ssl_module is already loaded, skipping
Jan 31 04:58:35 sat.example.com httpd[1314]: [Mon Jan 31 04:58:35.573149 2022] [so:warn] [pid 1314] AH01574: module systemd_module is already loaded, skipping
Jan 31 04:58:35 sat.example.com httpd[1314]: [Mon Jan 31 04:58:35.577067 2022] [so:warn] [pid 1314] AH01574: module cgi_module is already loaded, skipping
Jan 31 04:58:35 sat.example.com httpd[1314]: AH00534: httpd: Configuration error: More than one MPM loaded.
Jan 31 04:58:35 sat.example.com systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Jan 31 04:58:35 sat.example.com systemd[1]: httpd.service: Failed with result 'exit-code'.
Jan 31 04:58:35 sat.example.com systemd[1]: Failed to start The Apache HTTP Server.
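
AH00534 means that more than one MPM module ended up being loaded by the httpd configuration. A rough way to confirm this is to count the active `LoadModule mpm_*` directives. The sketch below is an illustration only: the function name count_mpm_modules is hypothetical, and the path /etc/httpd/conf.modules.d in the usage comment assumes the default RHEL httpd layout, which this report does not show.

```shell
#!/bin/sh
# Hypothetical helper: count uncommented "LoadModule mpm_*_module"
# directives in the httpd configuration text read from stdin.
count_mpm_modules() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the function successful.
    grep -Ec '^[[:space:]]*LoadModule[[:space:]]+mpm_[a-z]+_module' || true
}

# Usage on the upgraded host (path is an assumption, see above):
#   cat /etc/httpd/conf.modules.d/*.conf | count_mpm_modules
```

A count greater than 1 is consistent with the "More than one MPM loaded" error; which file contributes the extra directive still has to be checked by hand.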

Comment 3 Evgeni Golov 2022-01-31 12:57:41 UTC
the *underlying* issue is in foreman-maintain, which I filed BZ#2048517 for

but I'll take this BZ to make the code in leapp more robust

Comment 4 Evgeni Golov 2022-02-22 15:19:24 UTC
the code was made more robust and the latest builds in our repo have it, moving ON_DEV

Comment 5 Lukas Pramuk 2022-03-17 14:05:23 UTC
VERIFIED.

@Satellite 7.0.0 Snap13
leapp-0.13.0-100.202203021701Z.8d426bb.master.el7.noarch
leapp-upgrade-el7toel8-0.15.0-100.202203031950Z.9d7f141.master.el7.noarch

by the manual reproducer described in comment#0:

3) After the reboot, once the leapp_resume service has finished, check the Satellite status

# journalctl -qg 'leapp_resume.service: Succeeded'
Mar 17 07:52:37 sat.example.com systemd[1]: leapp_resume.service: Succeeded.

# satellite-maintain service status -b
Running Status Services
================================================================================
Get status of applicable services: 

Displaying the following service(s):
redis, postgresql, pulpcore-api, pulpcore-content, pulpcore-worker, pulpcore-worker, pulpcore-worker, pulpcore-worker, pulpcore-worker, pulpcore-worker, tomcat, dynflow-sidekiq@orchestrator, foreman, httpd, dynflow-sidekiq@worker-1, dynflow-sidekiq@worker-hosts-queue-1, foreman-proxy
- displaying redis                                 [OK]                         
- displaying postgresql                            [OK]                         
- displaying pulpcore-api                          [OK]                         
- displaying pulpcore-content                      [OK]                         
\ displaying pulpcore-worker             [OK]                         
\ displaying pulpcore-worker             [OK]                         
\ displaying pulpcore-worker             [OK]                         
\ displaying pulpcore-worker             [OK]                         
\ displaying pulpcore-worker             [OK]                         
\ displaying pulpcore-worker             [OK]                         
\ displaying tomcat                                [OK]                         
\ displaying dynflow-sidekiq@orchestrator          [OK]                         
\ displaying foreman                               [OK]                         
\ displaying httpd                                 [OK]                         
\ displaying dynflow-sidekiq@worker-1              [OK]                         
\ displaying dynflow-sidekiq@worker-hosts-queue-1  [OK]                         
\ displaying foreman-proxy                         [OK]                         
\ All services are running                                            [OK]      
--------------------------------------------------------------------------------

# hammer ping
database:         
    Status:          ok
    Server Response: Duration: 0ms
candlepin:        
    Status:          ok
    Server Response: Duration: 382ms
candlepin_auth:   
    Status:          ok
    Server Response: Duration: 68ms
candlepin_events: 
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 0ms
katello_events:   
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 1ms
pulp3:            
    Status:          ok
    Server Response: Duration: 222ms
pulp3_content:    
    Status:          ok
    Server Response: Duration: 198ms
foreman_tasks:    
    Status:          ok
    Server Response: Duration: 5ms

>>> after LEAPP upgrade to RHEL8 all Satellite services are running successfully
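
The `hammer ping` check above can be automated along these lines. This is a sketch under assumptions: the function name failed_ping_services is hypothetical, and the parsing relies only on the output layout shown in this comment (a service name ending in a colon, followed by an indented "Status:" line).

```shell
#!/bin/sh
# Hypothetical helper: read `hammer ping` output on stdin and print the
# name of every service whose Status is not "ok".
failed_ping_services() {
    awk '
        # Unindented line ending in a colon: remember the service name.
        /^[^ \t].*:[ \t]*$/ { svc = $1; sub(/:$/, "", svc) }
        # Indented status line: report the service unless it is ok.
        $1 == "Status:" && $2 != "ok" { print svc }
    '
}

# Usage on the upgraded host:
#   hammer ping | failed_ping_services
```

No output means every listed service reported "ok", matching the result shown above.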

Comment 9 errata-xmlrpc 2022-07-05 14:32:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498