Bug 2028171 - [OSP17] RetryFilter nova-scheduler template are deprecated
Summary: [OSP17] RetryFilter nova-scheduler template are deprecated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: beta
Target Release: 17.0
Assignee: James Slagle
QA Contact: Khomesh Thakre
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-01 16:13 UTC by Eran Kuris
Modified: 2022-09-21 12:18 UTC (History)

Fixed In Version: openstack-tripleo-validations-14.2.2-0.20220322010856.6614654.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2061740
Environment:
Last Closed: 2022-09-21 12:18:03 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary                                     Last Updated
Gerrithub.io            528887          0        None      None    None                                        2021-12-07 11:47:26 UTC
Launchpad               1897331         0        None      None    None                                        2021-12-01 16:13:32 UTC
OpenStack gerrit        832904          0        None      MERGED  Validating that RetryFilter is not in nova  2022-07-27 13:31:55 UTC
Red Hat Issue Tracker   OSP-11136       0        None      None    None                                        2021-12-01 16:15:46 UTC
Red Hat Issue Tracker   UPG-4922        0        None      None    None                                        2022-01-24 14:21:19 UTC
Red Hat Product Errata  RHEA-2022:6543  0        None      None    None                                        2022-09-21 12:18:37 UTC

Description Eran Kuris 2021-12-01 16:13:32 UTC
Description of problem:
It is possible to deploy OSP17 with the RetryFilter set in the nova-scheduler template even though the filter is deprecated.
The deployment passes, but when we check the nova services, nova-scheduler is not running.

(overcloud) [stack@undercloud-0 ~]$ nova service-list
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| Id                                   | Binary         | Host                      | Zone     | Status  | State | Updated_at                 | Disabled Reason | Forced down |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| 2797fef1-2503-456b-a60d-5f0025d5ba62 | nova-conductor | controller-1.redhat.local | internal | enabled | up    | 2021-12-01T09:47:10.000000 |                 | False       |
| 63a3f84c-c52e-4a7d-9fe5-6fb2081e09b4 | nova-conductor | controller-2.redhat.local | internal | enabled | up    | 2021-12-01T09:47:11.000000 |                 | False       |
| 5d9c46ff-d3d6-4a03-b8ca-8325877f1467 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2021-12-01T09:47:11.000000 |                 | False       |
| 7943715f-af52-4dcd-9a18-e2381fbed829 | nova-compute   | compute-1.redhat.local    | nova     | enabled | up    | 2021-12-01T09:47:14.000000 |                 | False       |
| 29002683-7e8c-49e0-9d02-1b3e31555a52 | nova-compute   | compute-0.redhat.local    | nova     | enabled | up    | 2021-12-01T09:47:14.000000 |                 | False       |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
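
For illustration only, the gap in the listing above (no nova-scheduler row) could be caught by a small post-deployment check. This is a minimal sketch, not part of any shipped validation: the function name, the expected-binaries set, and the reduced row layout are all assumptions made for the example.

```python
# Hypothetical post-deployment check; names and data layout are assumptions.
EXPECTED_BINARIES = {"nova-conductor", "nova-scheduler", "nova-compute"}


def missing_binaries(services, expected=EXPECTED_BINARIES):
    """Return the expected binaries that have no service reported as 'up'."""
    running = {s["binary"] for s in services if s["state"] == "up"}
    return sorted(set(expected) - running)


# Rows from the listing above, reduced to the two relevant columns.
services = [
    {"binary": "nova-conductor", "state": "up"},
    {"binary": "nova-conductor", "state": "up"},
    {"binary": "nova-conductor", "state": "up"},
    {"binary": "nova-compute", "state": "up"},
    {"binary": "nova-compute", "state": "up"},
]

print(missing_binaries(services))  # ['nova-scheduler']
```

A check like this only reports the symptom after the deployment has finished; it does not explain why the service is absent.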

We need to prevent users from using those templates: the deployment should fail at the beginning if the user provides deprecated templates.

When the THT is provided, this traceback appears in the logs:

2021-11-30 15:09:19.844 8 CRITICAL nova [req-e4c01711-5492-405b-ada8-74d46d458cdb - - - - -] Unhandled error: nova.exception.SchedulerHostFilterNotFound: Scheduler Host Filter RetryFilter could not be found.
2021-11-30 15:09:19.844 8 ERROR nova Traceback (most recent call last):
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/bin/nova-scheduler", line 10, in <module>
2021-11-30 15:09:19.844 8 ERROR nova     sys.exit(main())
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/cmd/scheduler.py", line 48, in main
2021-11-30 15:09:19.844 8 ERROR nova     binary='nova-scheduler', topic=rpcapi.RPC_TOPIC)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 256, in create
2021-11-30 15:09:19.844 8 ERROR nova     periodic_interval_max=periodic_interval_max)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 116, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.manager = manager_class(host=self.host, *args, **kwargs)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/manager.py", line 59, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.driver = filter_scheduler.FilterScheduler()
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 44, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     super(FilterScheduler, self).__init__(*args, **kwargs)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/driver.py", line 45, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.host_manager = host_manager.HostManager()
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py", line 344, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.enabled_filters = self._choose_host_filters(self._load_filters())
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py", line 484, in _choose_host_filters
2021-11-30 15:09:19.844 8 ERROR nova     raise exception.SchedulerHostFilterNotFound(filter_name=msg)
2021-11-30 15:09:19.844 8 ERROR nova nova.exception.SchedulerHostFilterNotFound: Scheduler Host Filter RetryFilter could not be found.
2021-11-30 15:09:19.844 8 ERROR nova 
2021-11-30 15:10:27.253 7 INFO oslo_service.periodic_task [-] Skipping periodic task _discover_hosts_in_cells because its interval is negative
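
The traceback shows the scheduler failing at startup while resolving configured filter names to filter classes. The sketch below imitates that step to show why one unknown name is fatal. It is a simplified stand-in, not nova's code: the exception class, the function, and the two empty filter classes are defined locally for the example.

```python
class SchedulerHostFilterNotFound(Exception):
    """Local stand-in for nova.exception.SchedulerHostFilterNotFound."""


def choose_host_filters(enabled_names, available_filters):
    """Map configured filter names to classes; raise if any name is unknown."""
    by_name = {cls.__name__: cls for cls in available_filters}
    chosen, missing = [], []
    for name in enabled_names:
        if name in by_name:
            chosen.append(by_name[name])
        else:
            missing.append(name)
    if missing:
        raise SchedulerHostFilterNotFound(
            "Scheduler Host Filter %s could not be found." % ", ".join(missing))
    return chosen


# Empty placeholder classes; only their names matter for the lookup.
class ComputeFilter:
    pass


class AvailabilityZoneFilter:
    pass


try:
    choose_host_filters(["RetryFilter", "ComputeFilter"],
                        [ComputeFilter, AvailabilityZoneFilter])
except SchedulerHostFilterNotFound as exc:
    print(exc)  # Scheduler Host Filter RetryFilter could not be found.
```

Because the lookup runs in the service constructor, the error surfaces only when the scheduler container starts, which is after the deployment tooling has already accepted the parameter.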

Version-Release number of selected component (if applicable):
OSP17 
tripleo-ansible-3.3.1-0.20211029010151.85b8610.el8ost.noarch
openstack-tripleo-common-15.4.1-0.20211102001918.c404125.el8ost.noarch
puppet-tripleo-14.2.3-0.20211030221907.4c2c990.el8ost.noarch
ansible-tripleo-ipa-0.2.3-0.20211006201837.63d70bb.el8ost.noarch
ansible-tripleo-ipsec-11.0.1-0.20210910002917.b5559c8.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.3-0.20210908102348.0b9fdcc.el8ost.noarch
openstack-tripleo-heat-templates-14.3.1-0.20211030221907.6b66b70.el8ost.noarch
openstack-tripleo-common-containers-15.4.1-0.20211102001918.c404125.el8ost.noarch
python3-tripleo-common-15.4.1-0.20211102001918.c404125.el8ost.noarch
python3-tripleoclient-16.4.1-0.20211030001911.e2f3acd.el8ost.noarch
openstack-tripleo-validations-14.2.2-0.20211014002028.d91ed58.el8ost.noarch               
(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version 
RHOS-17.0-RHEL-8-20211105.n.0
How reproducible:
100% 

Steps to Reproduce:
1. Run the deployment and provide the set-nova-scheduler template files.

Actual results:


Expected results:
remove the RetryFilter from the template

Additional info:

Comment 1 Brendan Shephard 2021-12-01 22:34:21 UTC
I don't see RetryFilter being enabled by default in any templates on the Wallaby branch? Do you see it enabled in your templates by default?
Or, do you mean that you're adding it to NovaSchedulerEnabledFilters: in one of your environment files and then nova-scheduler is failing?

If the latter is the case, I don't believe that would be a bug, we would allow users to specify any filters they like. But we can let dfg:compute weigh in on that.

Comment 2 Eran Kuris 2021-12-02 06:28:37 UTC
(In reply to Brendan Shephard from comment #1)
> I don't see RetryFilter being enabled by default in any templates on the
> Wallaby branch? Do you see it enabled in your templates by default?
> Or, do you mean that you're adding it to NovaSchedulerEnabledFilters: in one
> of your environment files and then nova-scheduler is failing?
> 
> If the latter is the case, I don't believe that would be a bug, we would
> allow users to specify any filters they like. But we can let dfg:compute
> weigh in on that.

Hi Brendan,
In my job, I am using the infrared parameter set-nova-scheduler-filter {https://github.com/redhat-openstack/infrared/blob/master/plugins/tripleo-overcloud/vars/overcloud/templates/set-nova-scheduler-filter.yml}
and, as you can see, it includes the RetryFilter.
You are right that this is wrong user input, but from my point of view as QE, when the user provides an incorrect THT/configuration, we have to block or fail the deployment.
In this case I provided the wrong THT and the deployment passed.

Comment 3 Brendan Shephard 2021-12-02 07:01:33 UTC
Hey,

Yeah, I see where you're coming from. But I believe it's possible to use custom Nova filters, for example, like we do with TripleO in Queens/Train with Tripleo Capabilities Filter:
https://github.com/openstack/tripleo-common/blob/stable/train/tripleo_common/filters/capabilities_filter.py

If we restrict this to a fixed set of filters, we will be limiting the customer's ability to use Nova features. I think we should let dfg:compute rule on this, though, and see what they say; maybe we don't want to support users with custom scheduler filters.

Comment 4 Eran Kuris 2021-12-02 07:09:58 UTC
(In reply to Brendan Shephard from comment #3)
> Hey,
> 
> Yeah, I see where you're coming from. But I believe it's possible to use
> custom Nova filters, for example, like we do with TripleO in Queens/Train
> with Tripleo Capabilities Filter:
> https://github.com/openstack/tripleo-common/blob/stable/train/tripleo_common/
> filters/capabilities_filter.py
> 
> If we restrict this to n number of filters, we will be limiting the
> customers ability to use Nova features. I think we should let dfg:compute
> rule on this though and see what they say, maybe we don't want to support
> users with custom scheduler filters.

I am adding 2 compute DFG members for more thoughts

Comment 5 smooney 2021-12-02 13:28:50 UTC
The user can create custom filters, and we allow them to be used via a support exception,
so from a compute DFG point of view I do not think it would be correct to limit this in THT.

A pre-check validation could be created for people upgrading, but we should not block this via the THT parameter in my view.
It should really just be a warning. While it is unlikely that someone will create a custom filter called RetryFilter, it should still be possible to
do so and deploy it. We also should not need to update the validation in ooo every time we remove filters. The RetryFilter is not the first filter we have removed;
in the past we have always just removed it from the default templates after it was deprecated in nova.
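
As a rough illustration of the warning-only behaviour described here, a check could flag removed filter names without ever failing. This is a sketch under assumptions: the function name and the hard-coded set of removed filters are hypothetical and do not reflect the validation that was eventually merged.

```python
import warnings

# Assumption: a hand-maintained set of filter names removed from nova.
REMOVED_FILTERS = {"RetryFilter"}


def check_enabled_filters(enabled_filters, removed=REMOVED_FILTERS):
    """Warn about removed filter names; return the flagged names, never raise."""
    flagged = [name for name in enabled_filters if name in removed]
    for name in flagged:
        warnings.warn(
            "%s was removed from nova; nova-scheduler will fail to start "
            "unless a custom filter with this name is installed" % name)
    return flagged


flagged = check_enabled_filters(["RetryFilter", "ComputeFilter"])
print(flagged)  # ['RetryFilter']
```

The hard-coded set is exactly the maintenance cost raised in this comment: it has to be updated each time nova removes a filter.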

Comment 6 Eran Kuris 2021-12-05 07:46:28 UTC
(In reply to smooney from comment #5)
> The user can create custom filters, and we allow them to be used via a
> support exception,
> so from a compute DFG point of view I do not think it would be correct to
> limit this in THT.
> 
> A pre-check validation could be created for people upgrading, but
> we should not block this via the THT parameter in my view.
> It should really just be a warning. While it is unlikely that someone will
> create a custom filter called RetryFilter, it should still be possible to
> do so and deploy it. We also should not need to update the validation in ooo
> every time we remove filters. The RetryFilter is not the first filter we
> have removed;
> in the past we have always just removed it from the default templates after
> it was deprecated in nova.

smooney, the point is that we allow customers to use templates that do not fail the deployment but cause problems in the system.
That is what I faced: when I enabled the RetryFilter, the deployment passed but nova-scheduler did not run on the system.

Comment 7 Bogdan Dobrelya 2021-12-07 11:52:19 UTC
Please note that the deployment framework provides no control over user inputs other than listing allowed values, where applicable.
Improvements should be made to the validations framework to cover upgrade cases impacted by the removed RetryFilter.

Comment 8 Jan Buchta 2022-01-19 13:24:38 UTC
Moving to DFG:Upgrades, as this needs a validation from Upgrades, not an adjustment of the framework, IMHO.
If I am wrong, give me a shout.
Jan

Comment 9 Sofer Athlan-Guyot 2022-01-24 14:17:51 UTC
Understood that we need to take care somehow of the RetryFilter before upgrading from 16 to 17.

Comment 10 Sofer Athlan-Guyot 2022-02-22 11:55:58 UTC
Hi,

So, after another look at this, what I understand is that:
 1. RetryFilter is removed in OSP17;
 2. We could implement a validation that outputs a warning if the RetryFilter is in the OSP17 configuration:
    - a warning, not an error, because of the unlikely but possible chance that the user created a custom RetryFilter;
 3. When RetryFilter is defined and doesn't exist, nova-scheduler crashes.

So 2. should be implemented by Compute as they see fit; Upgrades cannot really tackle this because:
 - there is no clear way yet to test osp16->17
 - Compute will have far more knowledge about how to do that :)

But I think that 3. could be solved. Maybe this should be a warning in the nova-scheduler instead of a "silent" (from the user's point of view) error.

Furthermore, it seems to me that having a successful deployment with a failed service is an issue on its own, and that should be fixed.

So, problem 2. would be for Compute, and 3. for Compute/DF?

WDYT: @smooney , @bdobreli 

Overall Compute DFG seems the most appropriate place to land this.

Comment 11 Bogdan Dobrelya 2022-03-03 10:17:22 UTC
I think we (compute together with the DF/validations teams) should provide a pre-overcloud-prepare validation interface to look for some data patterns in the --templates path used with the deployment command. That interface, once provided, could address point #2 in the above comment.
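
One way to picture such an interface is a plain text scan over the template files. This is a minimal sketch assuming a line-based regex search is sufficient; the function name, its parameters, and the directory used in the usage line are hypothetical.

```python
import os
import re


def find_pattern_in_templates(templates_dir, pattern, suffixes=(".yaml", ".yml")):
    """Return (path, line_number, line) for each template line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for root, _dirs, files in os.walk(templates_dir):
        for fname in sorted(files):
            if not fname.endswith(suffixes):
                continue
            path = os.path.join(root, fname)
            # Read as text and tolerate odd bytes rather than abort the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if regex.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


# Example call; "./templates" is a placeholder path.
for path, lineno, line in find_pattern_in_templates("./templates", r"\bRetryFilter\b"):
    print("%s:%d: %s" % (path, lineno, line.strip()))
```

A text scan cannot tell which parameter the match belongs to, or whether a later environment file overrides it, so the result is better treated as a hint for a warning than as proof of a misconfiguration.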

Comment 28 errata-xmlrpc 2022-09-21 12:18:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

