Description of problem:
It is possible to deploy OSP17 with the RetryFilter set in the nova-scheduler template even though the filter is deprecated. The deployment passes, but when we check the nova services, nova-scheduler is not running.

(overcloud) [stack@undercloud-0 ~]$ nova service-list
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| Id                                   | Binary         | Host                      | Zone     | Status  | State | Updated_at                 | Disabled Reason | Forced down |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+-----------------+-------------+
| 2797fef1-2503-456b-a60d-5f0025d5ba62 | nova-conductor | controller-1.redhat.local | internal | enabled | up    | 2021-12-01T09:47:10.000000 |                 | False       |
| 63a3f84c-c52e-4a7d-9fe5-6fb2081e09b4 | nova-conductor | controller-2.redhat.local | internal | enabled | up    | 2021-12-01T09:47:11.000000 |                 | False       |
| 5d9c46ff-d3d6-4a03-b8ca-8325877f1467 | nova-conductor | controller-0.redhat.local | internal | enabled | up    | 2021-12-01T09:47:11.000000 |                 | False       |
| 7943715f-af52-4dcd-9a18-e2381fbed829 | nova-compute   | compute-1.redhat.local    | nova     | enabled | up    | 2021-12-01T09:47:14.000000 |                 | False       |
| 29002683-7e8c-49e0-9d02-1b3e31555a52 | nova-compute   | compute-0.redhat.local    | nova     | enabled | up    | 2021-12-01T09:47:14.000000 |                 | False       |
+--------------------------------------+----------------+---------------------------+----------+---------+-------+----------------------------+-----------------+-------------+

We need to prevent users from deploying with such templates by failing the deployment at the start when user-provided templates contain deprecated settings.

When the THT is provided, the following traceback appears in the logs:

2021-11-30 15:09:19.844 8 CRITICAL nova [req-e4c01711-5492-405b-ada8-74d46d458cdb - - - - -] Unhandled error: nova.exception.SchedulerHostFilterNotFound: Scheduler Host Filter RetryFilter could not be found.
2021-11-30 15:09:19.844 8 ERROR nova Traceback (most recent call last):
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/bin/nova-scheduler", line 10, in <module>
2021-11-30 15:09:19.844 8 ERROR nova     sys.exit(main())
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/cmd/scheduler.py", line 48, in main
2021-11-30 15:09:19.844 8 ERROR nova     binary='nova-scheduler', topic=rpcapi.RPC_TOPIC)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 256, in create
2021-11-30 15:09:19.844 8 ERROR nova     periodic_interval_max=periodic_interval_max)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 116, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.manager = manager_class(host=self.host, *args, **kwargs)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/manager.py", line 59, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.driver = filter_scheduler.FilterScheduler()
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py", line 44, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     super(FilterScheduler, self).__init__(*args, **kwargs)
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/driver.py", line 45, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.host_manager = host_manager.HostManager()
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py", line 344, in __init__
2021-11-30 15:09:19.844 8 ERROR nova     self.enabled_filters = self._choose_host_filters(self._load_filters())
2021-11-30 15:09:19.844 8 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/scheduler/host_manager.py", line 484, in _choose_host_filters
2021-11-30 15:09:19.844 8 ERROR nova     raise exception.SchedulerHostFilterNotFound(filter_name=msg)
2021-11-30 15:09:19.844 8 ERROR nova nova.exception.SchedulerHostFilterNotFound: Scheduler Host Filter RetryFilter could not be found.
2021-11-30 15:09:19.844 8 ERROR nova
2021-11-30 15:10:27.253 7 INFO oslo_service.periodic_task [-] Skipping periodic task _discover_hosts_in_cells because its interval is negative

Version-Release number of selected component (if applicable):
OSP17
tripleo-ansible-3.3.1-0.20211029010151.85b8610.el8ost.noarch
openstack-tripleo-common-15.4.1-0.20211102001918.c404125.el8ost.noarch
puppet-tripleo-14.2.3-0.20211030221907.4c2c990.el8ost.noarch
ansible-tripleo-ipa-0.2.3-0.20211006201837.63d70bb.el8ost.noarch
ansible-tripleo-ipsec-11.0.1-0.20210910002917.b5559c8.el8ost.noarch
ansible-role-tripleo-modify-image-1.2.3-0.20210908102348.0b9fdcc.el8ost.noarch
openstack-tripleo-heat-templates-14.3.1-0.20211030221907.6b66b70.el8ost.noarch
openstack-tripleo-common-containers-15.4.1-0.20211102001918.c404125.el8ost.noarch
python3-tripleo-common-15.4.1-0.20211102001918.c404125.el8ost.noarch
python3-tripleoclient-16.4.1-0.20211030001911.e2f3acd.el8ost.noarch
openstack-tripleo-validations-14.2.2-0.20211014002028.d91ed58.el8ost.noarch

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-17.0-RHEL-8-20211105.n.0

How reproducible:
100%

Steps to Reproduce:
1. Run the deployment and provide the set-nova-scheduler template files.

Actual results:

Expected results:
Remove the RetryFilter from the template.

Additional info:
I don't see RetryFilter enabled by default in any templates on the Wallaby branch. Do you see it enabled in your templates by default? Or do you mean that you're adding it to NovaSchedulerEnabledFilters in one of your environment files and nova-scheduler then fails? If it is the latter, I don't believe that would be a bug: we allow users to specify any filters they like. But we can let dfg:compute weigh in on that.
(In reply to Brendan Shephard from comment #1)

Hi Brendan,

In my job I am using the infrared parameter set-nova-scheduler-filter (https://github.com/redhat-openstack/infrared/blob/master/plugins/tripleo-overcloud/vars/overcloud/templates/set-nova-scheduler-filter.yml) and, as you can see, it includes the RetryFilter. You are right that this is wrong input from the user, but from my point of view as QE, when the user provides an incorrect THT or configuration we have to block or fail the deployment. In this case I provided the wrong THT and the deployment passed.
Hey,

Yeah, I see where you're coming from. But I believe it's possible to use custom Nova filters, as we do in TripleO in Queens/Train with the TripleO Capabilities Filter: https://github.com/openstack/tripleo-common/blob/stable/train/tripleo_common/filters/capabilities_filter.py

If we restrict this to a fixed set of filters, we will be limiting customers' ability to use Nova features. I think we should let dfg:compute rule on this, though, and see what they say; maybe we don't want to support users with custom scheduler filters.
(In reply to Brendan Shephard from comment #3)

I am adding two Compute DFG members for more thoughts.
Users can create custom filters, and we allow them to be used via a support exception, so from a Compute DFG point of view I do not think it would be correct to limit this in THT.

A pre-check validation could be created for people upgrading, but in my view we should not block this via the THT parameter; it should really just be a warning. While it is unlikely that someone will create a custom filter called RetryFilter, it should still be possible to do so and deploy it. We also should not need to update validations in TripleO every time we remove a filter. The RetryFilter is not the first filter we have removed; in the past we have always just removed the filter from the default templates after it was deprecated in nova.
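A warning-only pre-check of the kind described above might look like the following minimal sketch. It is an illustration under stated assumptions, not an existing tripleo-validations check: the function name and the `REMOVED_FILTERS` set are hypothetical, and the input is assumed to be the already-parsed `parameter_defaults` mapping from an environment file.

```python
import warnings

# Hypothetical list kept by the validation; only RetryFilter comes from this report.
REMOVED_FILTERS = {"RetryFilter"}


def warn_on_removed_filters(parameter_defaults):
    """Warn (never fail) for each removed filter named in NovaSchedulerEnabledFilters.

    Returns the list of offending filter names so a caller can report them.
    """
    configured = parameter_defaults.get("NovaSchedulerEnabledFilters") or []
    # Assumption: the value may arrive as a list or as a comma-separated string.
    if isinstance(configured, str):
        configured = [name.strip() for name in configured.split(",") if name.strip()]
    found = [name for name in configured if name in REMOVED_FILTERS]
    for name in found:
        warnings.warn(
            f"{name} is no longer shipped with nova; nova-scheduler will fail to "
            "start unless a custom filter with that name is installed."
        )
    return found
```

Because the check only warns, a deployment that ships its own custom filter named RetryFilter would still be allowed to proceed.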
(In reply to smooney from comment #5)

smooney, the point is that we allow customers to use templates that do not fail the deployment but do cause problems in the system. That is what I faced: when I enabled the RetryFilter, the deployment passed but nova-scheduler did not run on the system.
Please note that the deployment framework provides no control over user inputs other than listing allowed values, where applicable. Improvements should be made to the validations framework to cover upgrade cases impacted by the removed RetryFilter.
Moving to DFG:Upgrades, as IMHO this needs a validation from Upgrades, not an adjustment of the framework. If I am wrong, give me a shout.

Jan
Understood that we need to take care somehow of the RetryFilter before upgrading from 16 to 17.
Hi,

After another look at this, what I understand is:

1. RetryFilter is removed in OSP17.
2. We could implement a validation that outputs a warning if RetryFilter is in the OSP17 configuration (a warning, not an error, because of the unlikely but possible chance that the user created a custom RetryFilter).
3. When RetryFilter is defined and doesn't exist, nova-scheduler crashes.

Point 2 should be implemented by Compute as they see fit. Upgrades cannot really tackle this because:
- there is no clear way yet to test OSP16 -> OSP17;
- Compute will have far more knowledge about how to do that :)

But I think that point 3 could be solved. Maybe this should be a warning in nova-scheduler instead of a "silent" (from the user's point of view) error. Furthermore, it seems to me that a successful deployment with a failed service is an issue in its own right and should be fixed.

So problem 2 would be for Compute, and problem 3 for Compute/DF? WDYT, @smooney, @bdobreli?

Overall, Compute DFG seems the most appropriate place to land this.
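The "successful deployment with a failed service" case could be caught by a post-deployment check along these lines. This is a rough sketch only: `missing_services` and `EXPECTED_BINARIES` are hypothetical names, and the records are assumed to be dictionaries carrying the Binary, Host and State columns of the `nova service-list` output in the description.

```python
# Hypothetical set of binaries a healthy overcloud is expected to report.
EXPECTED_BINARIES = {"nova-conductor", "nova-scheduler", "nova-compute"}


def missing_services(services, expected=EXPECTED_BINARIES):
    """Return the expected binaries that have no service record in state 'up'."""
    running = {s["binary"] for s in services if s.get("state") == "up"}
    return sorted(set(expected) - running)


# Records mirroring the service list in the description of this report.
services = [
    {"binary": "nova-conductor", "host": "controller-1.redhat.local", "state": "up"},
    {"binary": "nova-conductor", "host": "controller-2.redhat.local", "state": "up"},
    {"binary": "nova-conductor", "host": "controller-0.redhat.local", "state": "up"},
    {"binary": "nova-compute", "host": "compute-1.redhat.local", "state": "up"},
    {"binary": "nova-compute", "host": "compute-0.redhat.local", "state": "up"},
]

print(missing_services(services))  # ['nova-scheduler']
```

A deployment step could treat a non-empty result as a failure, which would have flagged the missing nova-scheduler instead of reporting success.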
I think we (for example Compute together with the DF/validations teams) should provide a pre-overcloud-prepare validation interface that looks for data patterns in the --templates path used with the deployment command. Once provided, that interface could address point 2 in the comment above.
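As a rough illustration of such a pattern scan, the sketch below walks a templates directory and reports lines that match known patterns. Everything here is an assumption made for the example: the `scan_templates` name, the `PATTERNS` table and the choice to look only at `.yaml`/`.yml` files are hypothetical and do not describe an existing validation interface.

```python
import re
from pathlib import Path

# Hypothetical pattern table (regex -> message); only RetryFilter comes from this report.
PATTERNS = {r"\bRetryFilter\b": "RetryFilter was removed from nova"}


def scan_templates(root, patterns=PATTERNS):
    """Return (path, line number, message) for every pattern hit in YAML files under root."""
    compiled = [(re.compile(rx), msg) for rx, msg in patterns.items()]
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in {".yaml", ".yml"}:
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for rx, msg in compiled:
                if rx.search(line):
                    findings.append((str(path), lineno, msg))
    return findings
```

Returning findings rather than raising leaves the caller free to decide whether a hit is a warning or an error, in line with the warning-only position taken earlier in this thread.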
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543