Bug 1901631 - [OSP16.1] overcloud deploy fails to tripleo_swift_account_reaper container timeout
Summary: [OSP16.1] overcloud deploy fails to tripleo_swift_account_reaper container ti...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Christian Schwede (cschwede)
QA Contact: Gal Amado
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-25 17:06 UTC by ggrimaux
Modified: 2021-03-17 15:46 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210104205661.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-17 15:36:09 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1907070 0 None None None 2020-12-07 10:41:00 UTC
OpenStack gerrit 765778 0 None MERGED Do not relabel Swift files on every container start 2021-02-19 11:54:20 UTC
Red Hat Product Errata RHBA-2021:0817 0 None None None 2021-03-17 15:36:38 UTC

Description ggrimaux 2020-11-25 17:06:02 UTC
Description of problem:
During a stack deploy with a large number of objects in Swift, the tripleo_swift_account_reaper service can take several minutes to start, which causes the deployment to time out:
~~~
time systemctl restart tripleo_swift_account_reaper.service

real	7m0.675s
user	0m0.021s
sys	0m0.027s
~~~

The client raised the systemd timeout values on the server where the service runs, and the deployment worked fine afterwards:
grep Timeout /etc/systemd/system.conf
DefaultTimeoutStartSec=1800s
DefaultTimeoutStopSec=1800s
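
The workaround above changes the default timeout for every unit on the host. A narrower variant, sketched here only as an illustration, is a systemd drop-in that raises TimeoutStartSec for the affected unit alone. The helper name write_timeout_dropin and its optional root-prefix argument are hypothetical and not part of TripleO, and whether such a drop-in survives a later redeploy has not been checked here.

```shell
#!/usr/bin/env bash
# Sketch: raise the start timeout for one systemd unit via a drop-in file,
# instead of changing DefaultTimeoutStartSec in /etc/systemd/system.conf.
# write_timeout_dropin is a hypothetical helper, not something TripleO ships.

write_timeout_dropin() {
    local unit="$1"       # e.g. tripleo_swift_account_reaper.service
    local timeout="$2"    # seconds, e.g. 1800
    local root="${3:-}"   # optional path prefix (assumption: used for dry runs)
    local dir="${root}/etc/systemd/system/${unit}.d"

    # Create the drop-in directory and write the override.
    mkdir -p "$dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "${dir}/override.conf"
}

# Usage on the affected node (as root):
#   write_timeout_dropin tripleo_swift_account_reaper.service 1800
#   systemctl daemon-reload
#   systemctl restart tripleo_swift_account_reaper.service
```

Passing a temporary directory as the third argument writes the file under that prefix, which allows the generated override to be inspected before touching /etc.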

My impression is that the service performs a scan when it starts, so the number of objects (millions, if it is handling telemetry data) influences the start time.
Maybe it should start first and then do the scan/verification.
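
The upstream change linked above ("Do not relabel Swift files on every container start") suggests that the per-start cost grows with the number of files on the Swift storage devices. A rough way to gauge that number on a node is sketched below. The helper name count_files is hypothetical, and /srv/node is only an assumed storage path that may differ per deployment.

```shell
#!/usr/bin/env bash
# Sketch: count regular files under a directory tree to estimate how much
# work a per-file operation at container start would have to do.
# count_files is a hypothetical helper; the path is supplied by the caller.

count_files() {
    local dir="$1"
    # One output line per regular file; wc -l counts them.
    # tr strips the padding some wc implementations add.
    find "$dir" -type f -print | wc -l | tr -d ' '
}

# Usage (assumed storage path, adjust to the deployment):
#   count_files /srv/node
```

On a tree with millions of files the count itself can take a while, so it is best run outside a deployment window.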

I will share the error message in the next private comment.
I also have a sosreport from the node in question.

If you need anything else please let me know.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.1.1 GA (Train)
rhosp-rhel8/openstack-swift-account:16.1-50

How reproducible:
100%

Steps to Reproduce:
1. Have a lot of objects in swift
2. Run a stack deploy; tripleo_swift_account_reaper may take too long to start.

Actual results:
Stack deploy failing (timeout)

Expected results:
Stack deploy does not fail.

Additional info:
A sosreport is available on supportshell.

Comment 31 Gal Amado 2021-02-17 16:33:29 UTC
Verified in core_puddle: RHOS-16.1-RHEL-8-20210205.n.0

Comment 44 errata-xmlrpc 2021-03-17 15:36:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 director bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0817

