Bug 1803150

Summary: Hint for nova-scheduler seems to be ignored
Product: Red Hat OpenStack Reporter: Filip Hubík <fhubik>
Component: openstack-tempest Assignee: Chandan Kumar <chkumar>
Status: CLOSED ERRATA QA Contact: Martin Kopec <mkopec>
Severity: low Docs Contact:
Priority: low    
Version: 13.0 (Queens) CC: apevec, dasmith, eglynn, jhakimra, kchamart, lhh, lyarwood, mkopec, sbauza, sgordon, slinaber, udesale, vromanso, wznoinsk
Target Milestone: --- Keywords: Reopened, Triaged, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tempest-18.0.0-14.el7ost Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-24 11:41:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description          Flags
nova_logs_all        none
tempest.log          none
tempest.conf         none
verification output  none

Description Filip Hubík 2020-02-14 15:05:26 UTC
Created attachment 1663137 [details]
nova_logs_all

Description of problem:

The deployment uses 2 compute nodes, and Tempest tries to spawn 2 VMs on the same node. The second VM is always scheduled to the other node, though, even when we explicitly ask the scheduler to co-locate it using hints.

Running Tempest test, OSP13 OC environment:

    def test_create_servers_on_same_host(self):
        hints = {'same_host': self.server01}
        server02 = self.create_test_server(scheduler_hints=hints,
                                           wait_until='ACTIVE')['id']
        host02 = self._get_host(server02)
        self.assertEqual(self.host01, host02)

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/api/compute/admin/test_servers_on_multinodes.py", line 80, in test_create_servers_on_same_host
    self.assertEqual(self.host01, host02)
  File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: u'compute-1.redhat.local' != u'compute-0.redhat.local'

Ran 1 test in 44.117s
FAILED (failures=1)
---

The scheduler seems to be ignoring the hints provided by the Tempest test(s). I tried both "same_host" and "different_host" (see attached logs).
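
For reference, here is a minimal sketch of the request body that carries such a hint to the Nova API. The helper name `build_server_request` and the placeholder IDs are hypothetical; the `os:scheduler_hints` key sitting next to `server` is the documented place for hints in the compute API.

```python
def build_server_request(name, image_ref, flavor_ref, same_host_ids):
    """Build a POST /servers body asking for placement next to other servers.

    Hypothetical helper for illustration only; it is not part of Tempest.
    """
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
        },
        # Hints live beside "server", not inside it. The scheduler only
        # acts on "same_host" when SameHostFilter is among its enabled
        # filters; otherwise the hint has no effect on placement.
        "os:scheduler_hints": {"same_host": list(same_host_ids)},
    }


body = build_server_request("server02", "IMAGE_ID", "FLAVOR_ID", ["SERVER01_ID"])
```

The request itself is accepted either way, which is why the test only fails later at the host comparison.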

enabled_filters in nova.conf is kept default:
...
# Deprecated group;name - DEFAULT;scheduler_default_filters
#enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

python2-tempestconf-2.4.0-1.el7ost.noarch
python2-tempest-18.0.0-12.el7ost.noarch
openstack-tempest-18.0.0-12.el7ost.noarch
openstack-nova-scheduler-17.0.12-1.el7ost.noarch (in nova_scheduler container)

Puddle: 2020-02-06.2

Attached:
nova logs from 3 controllers and 2 compute nodes, covering only the duration of this test case
tempest.log

Comment 1 Filip Hubík 2020-02-14 15:06:01 UTC
Created attachment 1663138 [details]
tempest.log

Comment 3 Filip Hubík 2020-02-17 16:37:53 UTC
Created attachment 1663570 [details]
tempest.conf

Comment 5 Filip Hubík 2020-02-18 12:45:31 UTC
Yes, this smells like a Tempest issue to me too, just a few notes:
AFAIK we have not been adding custom scheduler filters to the Tempest configuration in CI in the past (see the attached tempest.conf). From the Tempest code in tempest/common/compute.py:

def is_scheduler_filter_enabled(filter_name):
    """Check the list of enabled compute scheduler filters from config.

    This function checks whether the given compute scheduler filter is
    available and configured in the config file. If the
    scheduler_available_filters option is set to 'all' (Default value. which
    means default filters are configured in nova) in tempest.conf then, this
    function returns True with assumption that requested filter 'filter_name'
    is one of available filter in nova ("nova.scheduler.filters.all_filters").
    """

and also tempest/api/compute/admin/test_servers_on_multinodes.py:

    @decorators.idempotent_id('26a9d5df-6890-45f2-abc4-a659290cb130')
    @testtools.skipUnless(
        compute.is_scheduler_filter_enabled("SameHostFilter"),
        'SameHostFilter is not available.')
    def test_create_servers_on_same_host(self):
        hints = {'same_host': self.server01}


From https://docs.openstack.org/tempest/latest/sampleconf.html I read that the nova configuration is taken by default, which I assume means all available filters in this case. Should we really have to change the Tempest config explicitly if it tries to gather this information from nova?
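
The behaviour described in that docstring can be sketched as follows. This is a stand-alone approximation written from the docstring, not the packaged Tempest code; the function name `filter_assumed_enabled` and the explicit `configured` argument (standing in for the tempest.conf option) are assumptions made for illustration.

```python
def filter_assumed_enabled(filter_name, configured=("all",)):
    """Approximate the check as the docstring describes it.

    `configured` stands in for scheduler_available_filters from
    tempest.conf; the default of 'all' mirrors the documented default.
    """
    if "all" in configured:
        # With 'all', every filter name is assumed to be usable,
        # whether or not nova actually has it enabled.
        return True
    return filter_name in configured


# With an untouched tempest.conf the skip condition never triggers:
print(filter_assumed_enabled("SameHostFilter"))
# With an explicit list that lacks the filter, the test would be skipped:
print(filter_assumed_enabled("SameHostFilter", ["ComputeFilter"]))
```

Under this reading, 'all' says nothing about what nova has enabled, so the test runs against a scheduler that may not honour the hint.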

Also, nova.conf on the controllers has the default config, with these options commented out:
#available_filters=nova.scheduler.filters.all_filters
#enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

However, SameHostFilter is not part of the default "enabled_filters" and must be enabled explicitly. I added it to the end of "enabled_filters" (in /var/lib/config-data/puppet-generated/nova/etc/nova/nova.conf, followed by docker restart nova_scheduler on the controllers), and the test in question passed.
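
A rough way to check which filters a given nova.conf actually enables is sketched below. It assumes the option lives in the `[filter_scheduler]` group and falls back to the default list quoted in the description when the option is commented out; the function name is hypothetical, and real nova.conf files may contain constructs that Python's `configparser` handles differently from oslo.config.

```python
import configparser

# Default list as quoted from the commented-out nova.conf line above.
QUEENS_DEFAULT_FILTERS = [
    "RetryFilter", "AvailabilityZoneFilter", "ComputeFilter",
    "ComputeCapabilitiesFilter", "ImagePropertiesFilter",
    "ServerGroupAntiAffinityFilter", "ServerGroupAffinityFilter",
]


def effective_enabled_filters(nova_conf_text):
    """Return the scheduler filters a nova.conf snippet enables."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    # Commented-out lines are ignored by the parser, so a missing value
    # means nova uses its built-in default.
    raw = parser.get("filter_scheduler", "enabled_filters", fallback=None)
    if raw is None:
        return list(QUEENS_DEFAULT_FILTERS)
    return [name.strip() for name in raw.split(",") if name.strip()]


default_conf = "[filter_scheduler]\n#enabled_filters=RetryFilter,ComputeFilter\n"
print("SameHostFilter" in effective_enabled_filters(default_conf))
```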

This leads me to 2 possible conclusions:
1) Tempest shouldn't detect "SameHostFilter" as enabled, since it is not among the filters enabled by default. That would make this a Tempest bug.
2) Nova should include it in the filters enabled by default, since it may have been omitted from the default set unintentionally. That would make this a nova bug, unless I have misunderstood the Tempest/nova documentation.

Comment 6 Artom Lifshitz 2020-02-18 18:16:34 UTC
(In reply to Filip Hubík from comment #5)

We've been through this before, although I can't find anything besides the patch I did upstream [1]. As you can read in that commit message, the problem was that the Nova and Tempest defaults for "what scheduler filters are enabled in this deployment" didn't match. The confusingly named tempest.conf option `scheduler_available_filters` did not help: `available` doesn't mean anything, a filter is either enabled or it isn't.

So I did [1] upstream to fix this, and IIRC there was something done in InfraRed or THT or some other non-Tempest and non-Nova code to make Nova and Tempest agree on what filters are enabled in the deployment - but as I said, I can't find any written trace of that.

[1] could be backported to 13, or the deployment tooling could make sure to match Nova's and Tempest's configuration of enabled filters.

[1] https://review.opendev.org/#/c/570207/
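
The second option, having deployment tooling keep the two configurations in agreement, could look roughly like the sketch below. It renders a tempest.conf fragment from the list of filters nova has enabled. The function name is hypothetical, the option name follows the one used in this thread, and the `compute-feature-enabled` section name is taken from the Tempest sample configuration rather than from the attached tempest.conf.

```python
import configparser
import io


def render_tempest_filters(nova_enabled_filters,
                           option="scheduler_available_filters"):
    """Render a tempest.conf fragment that mirrors nova's enabled filters."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["compute-feature-enabled"] = {
        option: ",".join(nova_enabled_filters),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_tempest_filters(["ComputeFilter", "SameHostFilter"]))
```

With the two lists generated from one source, the skip condition in the test reflects what the scheduler actually does.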

Comment 7 Filip Hubík 2020-02-19 15:38:56 UTC
OK, after discussion with the Tempest maintainer (mkopec), it seems the best solution is to ask for a backport. In its current state Tempest makes a wrong assumption: that with the default config all filters are enabled for use, when in fact nova enables only these 6 by default: https://github.com/openstack/nova/blob/master/nova/conf/scheduler.py#L320

Comment 8 Filip Hubík 2020-02-19 15:42:59 UTC
Reopened, re-targeted against Tempest now.

Comment 9 Lee Yarwood 2020-02-20 12:58:41 UTC
(In reply to Filip Hubík from comment #7)

Thanks Filip, apologies for missing your previous Trello ping about this.

ACK to backporting the change but remember this is for OSP 13 so use the following list:

https://github.com/openstack/nova/blob/stable/queens/nova/conf/scheduler.py#L268-L277

This can also change depending on the value of the NovaSchedulerDefaultFilters parameter in TripleO envs:

https://github.com/openstack/tripleo-heat-templates/blob/24fa8936738c9b45eb7dd7e96506c27a8abe5cd5/puppet/services/nova-scheduler.yaml#L37-L43

For example within the undercloud we set the following:

https://github.com/openstack/tripleo-heat-templates/blob/24fa8936738c9b45eb7dd7e96506c27a8abe5cd5/environments/undercloud.yaml#L24

Comment 18 Martin Kopec 2020-06-01 06:15:56 UTC
Created attachment 1694018 [details]
verification output

The Fixed In Version package contains the fix: since openstack-tempest-18.0.0-14, scheduler_enabled_filters is no longer set to 'all' by default, so the failing test is skipped by default. When the enabled_filters option in nova.conf on all nodes contains SameHostFilter and [compute_feature_enabled].scheduler_available_filters in tempest.conf contains SameHostFilter as well, the test passes.
The output from testing is attached.
The BZ is VERIFIED.
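
The verified behaviour (skipped by default, run once the filter is listed) can be reproduced in isolation with plain `unittest`, as sketched below. The default list is an assumption modelled on the nova.conf default quoted in the description; the exact default shipped in the fixed package is not shown in this report. `filter_enabled`, `make_case` and `count_skipped` are hypothetical names, and the test body is a placeholder.

```python
import unittest

# Assumed Tempest default after the fix, modelled on nova's default list.
ASSUMED_DEFAULT_FILTERS = [
    "RetryFilter", "AvailabilityZoneFilter", "ComputeFilter",
    "ComputeCapabilitiesFilter", "ImagePropertiesFilter",
    "ServerGroupAntiAffinityFilter", "ServerGroupAffinityFilter",
]


def filter_enabled(filter_name, configured):
    """'all' still means every filter; otherwise the name must be listed."""
    return "all" in configured or filter_name in configured


def make_case(configured):
    """Build a test case whose skip condition uses the given filter list."""
    class SameHostCase(unittest.TestCase):
        @unittest.skipUnless(filter_enabled("SameHostFilter", configured),
                             "SameHostFilter is not available.")
        def test_create_servers_on_same_host(self):
            pass  # placeholder for the real server-creation test

    return SameHostCase


def count_skipped(configured):
    """Run the case and report how many tests were skipped."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        make_case(configured))
    result = unittest.TestResult()
    suite.run(result)
    return len(result.skipped)


print(count_skipped(ASSUMED_DEFAULT_FILTERS))
print(count_skipped(ASSUMED_DEFAULT_FILTERS + ["SameHostFilter"]))
```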

Comment 20 errata-xmlrpc 2020-06-24 11:41:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2719