Bug 1901631

Summary: [OSP16.1] overcloud deploy fails due to tripleo_swift_account_reaper container timeout
Product: Red Hat OpenStack Reporter: ggrimaux
Component: openstack-tripleo-heat-templates Assignee: Christian Schwede (cschwede) <cschwede>
Status: CLOSED ERRATA QA Contact: Gal Amado <gamado>
Severity: high Docs Contact:
Priority: high    
Version: 16.1 (Train) CC: cschwede, derekh, gcharot, gfidente, igallagh, lbezdick, ljozsa, mburns, mvalsecc, pmorey, slinaber, spower, tkajinam, zaitcev
Target Milestone: z4 Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210104205661.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-17 15:36:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description ggrimaux 2020-11-25 17:06:02 UTC
Description of problem:
During a stack deploy with a high number of objects in Swift, the tripleo_swift_account_reaper service can take several minutes to start, which causes a timeout in the deployment:
~~~
time systemctl restart tripleo_swift_account_reaper.service

real	7m0.675s
user	0m0.021s
sys	0m0.027s
~~~
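
To relate the measured start time to the systemd start timeout, a small sketch along these lines may help. It assumes the unit inherits systemd's stock DefaultTimeoutStartSec of 90 seconds, which the report does not state; the function names duration_to_seconds and exceeds_timeout are hypothetical helpers, not part of any Red Hat tooling.

```shell
#!/usr/bin/env bash
# Convert a `time`-style duration such as "7m0.675s" to seconds, rounded up.
duration_to_seconds() {
  local d="$1"
  local min="${d%%m*}"   # text before the first "m"
  local sec="${d#*m}"    # text after the first "m"
  sec="${sec%s}"         # drop the trailing "s"
  awk -v m="$min" -v s="$sec" \
    'BEGIN { t = m * 60 + s; r = int(t); if (r < t) r++; print r }'
}

# Succeeds (exit 0) when the measured start time exceeds the timeout in seconds.
exceeds_timeout() {
  [ "$(duration_to_seconds "$1")" -gt "$2" ]
}

# 90 is systemd's stock DefaultTimeoutStartSec; it is an assumption here,
# not a value taken from the report. On a real node the effective value
# can be read with:
#   systemctl show -p TimeoutStartUSec tripleo_swift_account_reaper.service
if exceeds_timeout "7m0.675s" 90; then
  echo "start time exceeds timeout"
fi
```

If the effective timeout on the node differs from the assumed 90 seconds, pass that value as the second argument instead.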

The client raised the timeout values on the server where the service runs, and the deployment worked fine afterwards:
grep Timeout /etc/systemd/system.conf
DefaultTimeoutStartSec=1800s
DefaultTimeoutStopSec=1800s
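
The change above raises the default for every unit on the host. A narrower alternative, sketched here as an assumption rather than as the client's workaround or the fix shipped in the erratum, is a per-unit systemd drop-in that sets TimeoutStartSec only for this service. The helper name write_timeout_dropin and the file name timeout.conf are hypothetical, and the deployment tooling may manage or overwrite files for this unit.

```shell
#!/usr/bin/env bash
# Sketch: write a systemd drop-in that raises TimeoutStartSec for one unit.
# write_timeout_dropin and the file name timeout.conf are illustrative only.
write_timeout_dropin() {
  local base_dir="$1" unit="$2" seconds="$3"
  local dropin_dir="${base_dir}/${unit}.d"
  mkdir -p "${dropin_dir}"
  printf '[Service]\nTimeoutStartSec=%s\n' "${seconds}" \
    > "${dropin_dir}/timeout.conf"
  # Print the path of the file that was written.
  echo "${dropin_dir}/timeout.conf"
}

# On a real node (as root), the call would look something like:
#   write_timeout_dropin /etc/systemd/system tripleo_swift_account_reaper.service 1800
#   systemctl daemon-reload
```

Taking the base directory as a parameter keeps the helper usable against a scratch directory before touching /etc/systemd/system.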

I suspect it does a scan when the service is started, so the number of objects (millions if it's handling telemetry data) influences the start time.
Maybe it should start first and then do the scan/verification.

I will share the error message in the next private comment.
Also have sosreport from the node in question.

If you need anything else please let me know.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.1 GA (Train)
rhosp-rhel8/openstack-swift-account:16.1-50

How reproducible:
100%

Steps to Reproduce:
1. Have a lot of objects in swift
2. Run a stack deploy; tripleo_swift_account_reaper may take too long to start.

Actual results:
Stack deploy failing (timeout)

Expected results:
Stack deploy does not fail.

Additional info:
sosreport on supportshell.

Comment 31 Gal Amado 2021-02-17 16:33:29 UTC
Verified in core_puddle: RHOS-16.1-RHEL-8-20210205.n.0

Comment 44 errata-xmlrpc 2021-03-17 15:36:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817