Bug 2124215

Summary: During customer backup restore installer fails with qpidd.service start-post operation timed out. Stopping.
Product: Red Hat Satellite
Reporter: Lukas Pramuk <lpramuk>
Component: Upgrades
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED WONTFIX
QA Contact: Lukas Pramuk <lpramuk>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.10.7
CC: ahumbe, egolov, ehelms, jbhatia, mjivraja, smallamp
Target Milestone: Unspecified
Keywords: Reopened, Triaged, Upgrades
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-17 18:32:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Lukas Pramuk 2022-09-05 10:01:36 UTC
Description of problem:
During restore of a customer DB backup, the installer fails with:
qpidd.service start-post operation timed out. Stopping.


Version-Release number of selected component (if applicable):
6.10.7 (the backup is from an older 6.10.z)

How reproducible:
Always, with this backup

Steps to Reproduce:
1. Restore using satellite-clone 
# satellite-clone -y
...
TASK [satellite-clone : restore using foreman-maintain] ************************
Sunday 04 September 2022  20:17:19 -0400 (0:00:00.699)       0:27:37.141 ****** 
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["foreman-maintain", "restore", "--assumeyes", "/tmp/backup"], "delta": "0:20:18.813265", "end": "2022-09-04 20:37:38.924989", "msg": "non-zero return code", "rc": 1, "start": "2022-09-04 20:17:20.111724", "stderr": "", "stderr_lines": [], "stdout":
---
Running Restore backup
================================================================================
Check if command is run as root user:                                 [OK]
--------------------------------------------------------------------------------
Validate backup has appropriate files:                                [OK]
--------------------------------------------------------------------------------
Confirm dropping databases and running restore: 

WARNING: This script will drop and restore your database.
Your existing installation will be replaced with the backup database.
Once this operation is complete there is no going back.
Do you want to proceed? (assuming yes)
                                                                      [OK]      
--------------------------------------------------------------------------------
Validate hostname is the same as backup:                              [OK]
--------------------------------------------------------------------------------
Setting file security: 
/ Restoring SELinux context                                           [OK]      
--------------------------------------------------------------------------------
Restore configs from backup: 
- Restoring configs                                                   [OK]      
--------------------------------------------------------------------------------
Run installer reset: 
\ Installer reset                                                     [FAIL]    
Failed executing yes | satellite-installer -v --reset-data --disable-system-checks , exit status 6:
 2022-09-03 06:18:43 [NOTICE] [root] Loading installer configuration. This will take some time.
2022-09-03 06:18:48 [NOTICE] [root] Running installer with log based terminal output at level NOTICE.
2022-09-03 06:18:48 [NOTICE] [root] Use -l to set the terminal output log level to ERROR, WARN, NOTICE, INFO, or DEBUG. See --full-help for definitions.
Are you sure you want to continue? This will drop the databases, reset all configurations that you have made and bring all application data back to a fresh install. [y/n]
Package versions are locked. Continuing with unlock.
2022-09-03 06:20:30 [NOTICE] [pre] Dropping foreman database!
2022-09-03 06:20:30 [NOTICE] [pre] Dropping candlepin database!
2022-09-03 06:20:30 [NOTICE] [pre] Dropping pulpcore database!
2022-09-03 06:20:30 [WARN  ] [pre] Pulpcore content directory not present at '/var/lib/pulp/docroot'
2022-09-03 06:20:30 [WARN  ] [pre] Skipping system checks.
2022-09-03 06:20:30 [WARN  ] [pre] Skipping system checks.
2022-09-03 06:20:36 [NOTICE] [configure] Starting system configuration.
2022-09-03 06:20:50 [NOTICE] [configure] 250 configuration steps out of 2125 steps complete.
2022-09-03 06:21:28 [NOTICE] [configure] 500 configuration steps out of 2125 steps complete.
2022-09-03 06:21:28 [NOTICE] [configure] 750 configuration steps out of 2127 steps complete.
2022-09-03 06:21:38 [NOTICE] [configure] 1000 configuration steps out of 2132 steps complete.
2022-09-03 06:21:38 [NOTICE] [configure] 1250 configuration steps out of 2136 steps complete.
2022-09-03 06:23:09 [ERROR ] [configure] Systemd start for qpidd failed!
2022-09-03 06:23:09 [ERROR ] [configure] journalctl log for qpidd:
2022-09-03 06:23:09 [ERROR ] [configure] -- Logs begin at Sat 2022-09-03 05:46:46 EDT, end at Sat 2022-09-03 06:23:09 EDT. --
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:19:00 sat.local systemd[1]: Stopping An AMQP message broker daemon....
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: qpidd.service stop-sigterm timed out. Killing.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: qpidd.service: main process exited, code=killed, status=9/KILL
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: Stopped An AMQP message broker daemon..
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: Unit qpidd.service entered failed state.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: qpidd.service failed.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:21:39 sat.local systemd[1]: Starting An AMQP message broker daemon....
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: qpidd.service start-post operation timed out. Stopping.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: Failed to start An AMQP message broker daemon..
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: Unit qpidd.service entered failed state.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: qpidd.service failed.
2022-09-03 06:23:09 [ERROR ] [configure] /Stage[main]/Qpid::Service/Service[qpidd]/ensure: change from 'stopped' to 'running' failed: Systemd start for qpidd failed!
2022-09-03 06:23:09 [ERROR ] [configure] journalctl log for qpidd:
2022-09-03 06:23:09 [ERROR ] [configure] -- Logs begin at Sat 2022-09-03 05:46:46 EDT, end at Sat 2022-09-03 06:23:09 EDT. --
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:19:00 sat.local systemd[1]: Stopping An AMQP message broker daemon....
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: qpidd.service stop-sigterm timed out. Killing.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: qpidd.service: main process exited, code=killed, status=9/KILL
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: Stopped An AMQP message broker daemon..
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: Unit qpidd.service entered failed state.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:20:30 sat.local systemd[1]: qpidd.service failed.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:21:39 sat.local systemd[1]: Starting An AMQP message broker daemon....
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: qpidd.service start-post operation timed out. Stopping.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: Failed to start An AMQP message broker daemon..
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: Unit qpidd.service entered failed state.
2022-09-03 06:23:09 [ERROR ] [configure] Sep 03 06:23:09 sat.local systemd[1]: qpidd.service failed.
2022-09-03 06:25:57 [NOTICE] [configure] 1500 configuration steps out of 2137 steps complete.
2022-09-03 06:35:16 [NOTICE] [configure] 1750 configuration steps out of 2137 steps complete.
2022-09-03 06:36:35 [NOTICE] [configure] 2000 configuration steps out of 2137 steps complete.
2022-09-03 06:37:55 [NOTICE] [configure] System configuration has finished.

  There were errors detected during install.
  Please address the errors and re-run the installer to ensure the system is properly configured.
  Failing to do so is likely to result in broken functionality.

  The full log is at /var/log/foreman-installer/satellite.log
Package versions are being locked.
--------------------------------------------------------------------------------
Scenario [Restore backup] failed.

The following steps ended up in failing state:

  [restore-installer-reset]


Actual results:
restore fails for this backup

Expected results:
restore succeeds for this backup
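
Triage note: the journal lines quoted above let us measure how long systemd waited before giving up on qpidd. The sketch below is only an illustration and assumes the four relevant lines are copied verbatim from the log, that all timestamps fall in 2022, and that month names parse in an English/C locale. The helper names (parse, seconds_between) are hypothetical and not part of any Satellite tooling.

```python
import re
from datetime import datetime

# Journal lines copied from the installer output in this report.
JOURNAL = """\
Sep 03 06:19:00 sat.local systemd[1]: Stopping An AMQP message broker daemon....
Sep 03 06:20:30 sat.local systemd[1]: qpidd.service stop-sigterm timed out. Killing.
Sep 03 06:21:39 sat.local systemd[1]: Starting An AMQP message broker daemon....
Sep 03 06:23:09 sat.local systemd[1]: qpidd.service start-post operation timed out. Stopping.
"""

# "Mon DD HH:MM:SS host systemd[1]: message"
LINE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: (.*)$")


def parse(journal):
    """Return a list of (timestamp, message) tuples from journal text."""
    events = []
    for line in journal.splitlines():
        m = LINE.match(line)
        if m:
            # The journal omits the year; 2022 is assumed from the report date.
            ts = datetime.strptime("2022 " + m.group(1), "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def seconds_between(events, start_prefix, end_marker):
    """Seconds from the first message starting with start_prefix
    to the first message containing end_marker."""
    start = next(ts for ts, msg in events if msg.startswith(start_prefix))
    end = next(ts for ts, msg in events if end_marker in msg)
    return int((end - start).total_seconds())


events = parse(JOURNAL)
stop_wait = seconds_between(events, "Stopping", "stop-sigterm timed out")
start_wait = seconds_between(events, "Starting", "start-post operation timed out")
print(stop_wait, start_wait)  # prints "90 90"
```

Both the stop and the start attempt were abandoned after exactly 90 seconds. That matches the usual systemd default timeout, although the effective setting for qpidd.service is not shown in this report.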

Comment 16 Eric Helms 2023-01-17 18:32:32 UTC
We are unable to reliably reproduce this, and it appears to be related to load on the system causing a one-off race condition. Given that the fix is simple and this is an outlier bug, I am opting to close it for now.