Bug 1763175 - [FFU][ceph-ansible] Impossible to set health_osd_check_retries from THT when migrating to containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: z10
Target Release: 13.0 (Queens)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-18 12:09 UTC by Alex Stupnikov
Modified: 2023-09-07 20:50 UTC (History)

Fixed In Version: openstack-tripleo-common-8.7.1-6.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:22:02 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1849625 | 0 | None | None | None | 2019-10-24 08:56:51 UTC
OpenStack gerrit | 690886 | 0 | 'None' | MERGED | Pass CephAnsibleExtraConfig as ansible extra-vars | 2020-11-09 12:24:00 UTC
Red Hat Issue Tracker | OSP-28281 | 0 | None | None | None | 2023-09-07 20:50:52 UTC
Red Hat Product Errata | RHBA-2020:0760 | 0 | None | None | None | 2020-03-10 11:22:54 UTC

Description Alex Stupnikov 2019-10-18 12:09:11 UTC
Description of problem:

Our official documentation recommends increasing the restart delays for large Ceph clusters [1] (merged as a solution for bug #1620699). Specifically, we recommend that customers set the following parameters using THT:

parameter_defaults:
  CephAnsibleExtraConfig:
    health_osd_check_delay: 40
    health_osd_check_retries: 30
    health_mon_check_retries: 10
    health_mon_check_delay: 20

The truth is that this configuration change is not a silver bullet and does not actually work, even for bug #1620699 itself: the specified parameters are hard-coded in two playbooks, rolling_update.yml (where the values are reasonably high) and switch-from-non-containerized-to-containerized-ceph-daemons.yml (where they are quite low).
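
To illustrate why the THT settings have no effect, here is a minimal sketch of a play that hard-codes the retry values in its own vars section. The play name, host group, task, and numeric values are hypothetical and are not copied from ceph-ansible. The sketch assumes the THT values reach ceph-ansible with lower precedence than play vars (for example as inventory or group_vars values). In Ansible's variable precedence, play vars outrank inventory and group_vars, so in that case only extra-vars (-e on the ansible-playbook command line) can override them.

```yaml
# Hypothetical sketch, not the actual ceph-ansible playbook.
- name: switch OSDs to containerized daemons (illustrative)
  hosts: osds
  vars:
    # Values defined at play level outrank inventory and group_vars values,
    # so a setting delivered through group_vars is silently ignored.
    health_osd_check_retries: 5   # illustrative value
    health_osd_check_delay: 15    # illustrative value
  tasks:
    - name: wait for the cluster to report PG status (illustrative)
      command: ceph pg stat
      register: pg_stat
      retries: "{{ health_osd_check_retries }}"
      delay: "{{ health_osd_check_delay }}"
      until: pg_stat.rc == 0
```

With a play shaped like this, passing the same variable names as extra-vars would take effect, because extra-vars have the highest precedence in Ansible.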

I understand that this issue should likely be handled in ceph-ansible (we can increase the hard-coded values) or in the documentation (we can tell customers to adjust the playbook), but I wanted to ask the THT developers to take a first look and decide which approach will work for us here.


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/fast_forward_upgrades/assembly-preparing_for_overcloud_upgrade#increasing-the-restart-delay-for-large-ceph-clusters

Comment 12 errata-xmlrpc 2020-03-10 11:22:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

