Bug 1441809 - Nova scheduler_driver regression
Summary: Nova scheduler_driver regression
Keywords:
Status: CLOSED DUPLICATE of bug 1566148
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: async
Target Release: 10.0 (Newton)
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1335596 1356451
 
Reported: 2017-04-12 18:41 UTC by David Paterson
Modified: 2023-09-18 00:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-06 22:29:13 UTC
Target Upstream Version:
Embargoed:



Description David Paterson 2017-04-12 18:41:02 UTC
Description of problem:
I am seeing some new test failures in OSP 10:
tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_servers_on_different_hosts[id-cc7ca884-6e3e-42a3-a92f-c522fcf25e8e]
tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_servers_on_same_host[id-26a9d5df-6890-45f2-abc4-a659290cb130]

I did some digging, and those tests depend on the FilterScheduler being configured in nova.conf.

In OSP 9, nova.conf's scheduler_driver correctly points to a specific FilterScheduler class:
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler

Yet in OSP 10 the default looks incorrect and appears to be a bogus value:
scheduler_driver=filter_scheduler


Version-Release number of selected component (if applicable):
OSP 10

How reproducible:
Every time

Steps to Reproduce:
1. Run tempest against default install of OSP 10
2.
3.

Actual results:
Tests fail

Expected results:
Tests pass, and the scheduler_hints passed to the filter should affect which host a server is spawned on.

Additional info:

Comment 1 Vladik Romanovsky 2017-04-21 14:23:46 UTC
Hello,

The scheduler_driver option was changed in this release: it is now set to the driver's entry point name instead of the full class path. Seeing the "scheduler_driver=filter_scheduler" setting in the config file is expected.
Here is an excerpt from the release notes for this release [1]:

The option "scheduler_driver" is now changed to use entrypoint
  instead of full class path. Set one of the entrypoints under the
  namespace 'nova.scheduler.driver' in 'setup.cfg'. Its default value
  is 'host_manager'. The full class path style is still supported in
  current release. But it is not recommended because class path can be
  changed and this support will be dropped in the next major release.

Here is a link to the patch that makes this change. [2]
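
Concretely, either of the following nova.conf settings selects the filter scheduler in OSP 10; the entry point form is the new default, while the full class path form used in OSP 9 is still accepted but deprecated:

  [DEFAULT]
  # entry point form (OSP 10 default)
  scheduler_driver=filter_scheduler

  # full class path form (OSP 9 style, still accepted in this release but deprecated)
  #scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler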

Regarding the test failures, could you please provide the errors/traces that you see?

Thanks,
Vladik   


[1] http://lists.openstack.org/pipermail/openstack-announce/2016-April/001071.html
[2] https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=bbfa3b4f7251b1ae9a90c84692304ecb34b4b376

Comment 2 David Paterson 2017-04-21 14:32:19 UTC
Here are the two tempest test failures; basically, the scheduler hints are not taking any effect:

tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_servers_on_different_hosts[id-cc7ca884-6e3e-42a3-a92f-c522fcf25e8e]

failure: Traceback (most recent call last):
  File "/home/osp_admin/tempest/tempest/api/compute/admin/test_servers_on_multinodes.py", line 64, in test_create_servers_on_different_hosts
    self.assertNotEqual(host01, host02)
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 842, in assertNotEqual
    raise self.failureException(msg)
AssertionError: u'r7-13g-compute-1.oss.labs' == u'r7-13g-compute-1.oss.labs'

==================================================================================================================================================================

tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_servers_on_same_host[id-26a9d5df-6890-45f2-abc4-a659290cb130]

failure: Traceback (most recent call last):
  File "/home/osp_admin/tempest/tempest/api/compute/admin/test_servers_on_multinodes.py", line 50, in test_create_servers_on_same_host
    self.assertEqual(host01, host02)
  File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 350, in assertEqual
    self.assertThat(observed, matcher, message)
  File "/usr/lib/python2.7/site-packages/testtools/testcase.py", line 435, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: u'r7-13g-compute-3.oss.labs' != u'r7-13g-compute-1.oss.labs'

Comment 3 Sean Merrow 2017-05-04 15:40:47 UTC
Hopefully the traces/errors provided by David are helpful. Let us know if any other data is needed.

Thanks,
Sean

Comment 4 Sean Merrow 2017-05-30 17:51:14 UTC
Hi Vladik, Any thoughts on the traces/errors provided by David? Are they helpful?

Thanks,
Sean

Comment 5 Dan Smith 2017-07-06 18:48:38 UTC
Ollie, is this some deployment config thing that needs changing? I'm not sure how this could be wrong without entirely breaking everything, unless it's only breaking certain types of environments.

Comment 6 Stephen Gordon 2017-07-06 18:58:17 UTC
(In reply to Dan Smith from comment #5)
> Ollie, is this some deployment config thing that needs changing? I'm not
> sure how this could be wrong without entirely breaking everything, unless
> it's only breaking certain types of environments.

Which is actually the correct/expected value here? I get the impression from the description that a specific test or set of tests is failing, but maybe this is a case where it's actually working and we need to take a closer look at the test itself rather than the deployment?

Comment 7 Ollie Walsh 2017-07-06 22:26:17 UTC

(In reply to Stephen Gordon from comment #6)
> (In reply to Dan Smith from comment #5)
> > Ollie, is this some deployment config thing that needs changing? I'm not
> > sure how this could be wrong without entirely breaking everything, unless
> > it's only breaking certain types of environments.
> 
> Which is actually the correct/expected value here? I get the impression from
> the description that a specific test or set of tests is failing, but maybe
> this is a case where it's actually working and we need to take a closer look
> at the test itself rather than the deployment?

Either format should work for OSP9/OSP10: 
https://github.com/openstack/nova/commit/33d906a2e060447778e95449a78e6583f18afcfd
From OSP11 on the entrypoint approach must be used:
https://github.com/openstack/nova/commit/fe3d6dba3d1db8f97dab4baffcc11dda56368096

AFAICT those tests should have been skipped:
https://github.com/openstack/tempest/commit/10d5af250b60c2e161851d75d0055c0d7368ddfc

because we don't enable any filters OOTB: https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/puppet/services/nova-scheduler.yaml#L25

Comment 8 Ollie Walsh 2017-07-06 22:46:41 UTC
To clarify the last item: when NovaSchedulerDefaultFilters is empty we use Nova's default filters, which do not include the filters that these tests require:
https://github.com/openstack/nova/blob/stable/newton/nova/conf/scheduler.py#L52
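
For illustration, a hedged nova.conf sketch, assuming these two tempest tests rely on the SameHostFilter/DifferentHostFilter scheduler-hint filters (neither of which is in the default list linked above); in a TripleO deployment the same list would be supplied through the NovaSchedulerDefaultFilters parameter:

  [DEFAULT]
  # Newton default filter list (see the scheduler.py link above) plus the two
  # scheduler-hint filters these tests appear to need
  scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,SameHostFilter,DifferentHostFilter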

Comment 9 Ollie Walsh 2017-07-07 14:45:16 UTC
Hi David,

If possible, could you please provide the following info:
 * the tempest rpm version
 * the tempest config
 * the nova config

Thanks,
Ollie

Comment 10 David Paterson 2017-07-14 19:03:25 UTC
==========================================================
tempest packages on director node
==========================================================

openstack-tempest.noarch      1:13.0.0-12.bafe630git.el7ost
python-tempest.noarch         1:13.0.0-12.bafe630git.el7ost
python-tempest-tests.noarch   1:13.0.0-12.bafe630git.el7ost
=========================================================
nova.conf from controller
=========================================================
[neutron]
project_name=service
service_metadata_proxy=True
extension_sync_interval=600
password=78pKhanbpVWYJZ6rFZmz2jCTZ
auth_url=http://192.168.120.250:35357/v3
user_domain_name=Default
timeout=30
project_domain_name=Default
url=http://192.168.140.250:9696
ovs_bridge=br-int
region_name=RegionOne
auth_type=v3password
metadata_proxy_shared_secret=6VgPttc67mJtzJabefbDkFnFE
username=neutron
[DEFAULT]
use_neutron=True
log_dir=/var/log/nova
notify_api_faults=False
osapi_compute_listen=192.168.140.21
instance_name_template=instance-%08x
state_path=/var/lib/nova
report_interval=10
enabled_apis=osapi_compute,metadata
osapi_compute_listen_port=8774
image_service=nova.image.glance.GlanceImageService
notify_on_state_change=vm_and_task_state
firewall_driver=nova.virt.firewall.NoopFirewallDriver
ram_allocation_ratio=1.0
default_floating_pool=public
use_ipv6=False
vif_plugging_is_fatal=True
metadata_listen_port=8775
service_down_time=60
host=rlpple-controller-0.ol.rllple.edu
use_forwarded_for=False
osapi_volume_listen=192.168.140.21
metadata_listen=192.168.140.21
auth_strategy=keystone
osapi_compute_workers=0
rootwrap_config=/etc/nova/rootwrap.conf
rpc_backend=rabbit
vif_plugging_timeout=300
metadata_workers=0
dhcp_domain=novalocal
allow_resize_to_same_host=False
fping_path=/usr/sbin/fping
scheduler_use_baremetal_filters=False
max_io_ops_per_host=8
scheduler_weight_classes=nova.scheduler.weights.all_weighers
scheduler_host_subset_size=1
scheduler_driver=filter_scheduler
scheduler_max_attempts=3
max_instances_per_host=50
scheduler_host_manager=host_manager
[keystone_authtoken]
username=nova
project_name=service
memcached_servers=192.168.140.21:11211,192.168.140.22:11211,192.168.140.23:11211
auth_type=password
auth_url=http://192.168.120.250:35357
password=76pTn89tvymhQZ4gVw6FcVnP9
auth_uri=http://192.168.140.250:5000/v2.0
[api_database]
connection=mysql+pymysql://nova_api:76pTn89tvymhQZ4gVw6FcVnP9.140.250/nova_api?bind_address=192.168.140.21
[oslo_messaging_rabbit]
rabbit_userid=guest
rabbit_password=F9Tfq8BPD84Ec92GQJkz8TQjg
rabbit_ha_queues=True
heartbeat_timeout_threshold=60
rabbit_use_ssl=False
rabbit_hosts=192.168.140.21:5672,192.168.140.22:5672,192.168.140.23:5672
[oslo_messaging_notifications]
driver=messagingv2
[database]
max_retries=-1
connection=mysql+pymysql://nova:76pTn89tvymhQZ4gVw6FcVnP9.140.250/nova?bind_address=192.168.140.21
db_max_retries=-1
[glance]
api_servers=http://192.168.170.250:9292
[cinder]
catalog_info=volumev2:cinderv2:internalURL
[oslo_policy]
policy_file=/etc/nova/policy.json
[cache]
backend=oslo_cache.memcache_pool
enabled=True
memcache_servers=192.168.140.21:11211,192.168.140.22:11211,192.168.140.23:11211
[wsgi]
api_paste_config=api-paste.ini
[oslo_middleware]
enable_proxy_headers_parsing=True
[oslo_concurrency]
lock_path=/var/lib/nova/tmp
[vnc]
novncproxy_port=6080
novncproxy_host=192.168.140.21
novncproxy_base_url=http://192.168.196.250:6080/vnc_auto.html
[conductor]
workers=0

========================================================
tempest.conf
========================================================
[DEFAULT]
debug = true
use_stderr = false
log_file = tempest.log

[auth]
tempest_roles = _member_

[compute]
image_ssh_user = cirros
flavor_ref = cb3a365c-a49e-430b-9e6c-938af7923b53
flavor_ref_alt = 4c894164-9586-4da5-883b-b5c8c4baf522
image_ref = 8320bfdf-8af1-4f4c-82a2-aae33e7f364c
image_ref_alt = cd915a04-5881-41fe-84e2-84bfb46dc931

[identity]
username = demo
tenant_name = demo
password = secrete
alt_username = alt_demo
alt_tenant_name = alt_demo
alt_password = secrete
admin_username = admin
admin_tenant_name = admin
admin_domain_name = Default
disable_ssl_certificate_validation = true
uri = http://192.168.196.250:5000/v2.0
admin_password = Kg84P6qpJPMhBPkjVqeyXnE7f
uri_v3 = http://192.168.196.250:5000/v3
admin_tenant_id = 0a1abb3045ee4744b9e8c15cfb886dd3

[object-storage]
operator_role = SwiftOperator

[data_processing]
catalog_type = data-processing

[orchestration]

[scenario]
img_dir = etc

[oslo_concurrency]
lock_path = /tmp

[volume-feature-enabled]
bootable = true
volume_services = true
api_v1 = True
api_v2 = True
api_extensions = OS-SCH-HNT,os-hosts,os-vol-tenant-attr,os-quota-sets,os-availability-zone,os-volume-encryption-metadata,backups,os-snapshot-actions,cgsnapshots,os-snapshot-manage,os-volume-unmanage,consistencygroups,os-vol-host-attr,encryption,os-vol-image-meta,os-types-manage,capabilities,os-volume-actions,os-types-extra-specs,os-used-limits,os-vol-mig-status-attr,os-volume-type-access,os-image-create,os-extended-services,os-extended-snapshot-attributes,os-snapshot-unmanage,qos-specs,os-quota-class-sets,os-volume-transfer,os-volume-manage,os-admin-actions,os-services,scheduler-stats

[compute-feature-enabled]
live_migration = false
live_migrate_paused_instances = true
preserve_ports = true
api_extensions = NMN,OS-DCF,OS-EXT-AZ,OS-EXT-IMG-SIZE,OS-EXT-IPS,OS-EXT-IPS-MAC,OS-EXT-SRV-ATTR,OS-EXT-STS,OS-FLV-DISABLED,OS-FLV-EXT-DATA,OS-SCH-HNT,OS-SRV-USG,os-access-ips,os-admin-actions,os-admin-password,os-agents,os-aggregates,os-assisted-volume-snapshots,os-attach-interfaces,os-availability-zone,os-baremetal-ext-status,os-baremetal-nodes,os-block-device-mapping,os-block-device-mapping-v2-boot,os-cell-capacities,os-cells,os-certificates,os-cloudpipe,os-cloudpipe-update,os-config-drive,os-console-auth-tokens,os-console-output,os-consoles,os-create-backup,os-create-server-ext,os-deferred-delete,os-evacuate,os-extended-evacuate-find-host,os-extended-floating-ips,os-extended-hypervisors,os-extended-networks,os-extended-quotas,os-extended-rescue-with-image,os-extended-services,os-extended-services-delete,os-extended-status,os-extended-volumes,os-fixed-ips,os-flavor-access,os-flavor-extra-specs,os-flavor-manage,os-flavor-rxtx,os-flavor-swap,os-floating-ip-dns,os-floating-ip-pools,os-floating-ips,os-floating-ips-bulk,os-fping,os-hide-server-addresses,os-hosts,os-hypervisor-status,os-hypervisors,os-instance-actions,os-instance_usage_audit_log,os-keypairs,os-lock-server,os-migrate-server,os-migrations,os-multiple-create,os-networks,os-networks-associate,os-pause-server,os-personality,os-preserve-ephemeral-rebuild,os-quota-class-sets,os-quota-sets,os-rescue,os-security-group-default-rules,os-security-groups,os-server-diagnostics,os-server-external-events,os-server-group-quotas,os-server-groups,os-server-list-multi-status,os-server-password,os-server-sort-keys,os-server-start-stop,os-services,os-shelve,os-simple-tenant-usage,os-suspend-server,os-tenant-networks,os-used-limits,os-used-limits-for-admin,os-user-data,os-user-quotas,os-virtual-interfaces,os-volume-attachment-update,os-volumes

[network-feature-enabled]
ipv6_subnet_attributes = False
api_extensions = default-subnetpools,qos,network-ip-availability,network_availability_zone,auto-allocated-topology,ext-gw-mode,binding,trunk-details,agent,subnet_allocation,l3_agent_scheduler,tag,external-net,flavors,net-mtu,availability_zone,quotas,l3-ha,provider,multi-provider,address-scope,trunk,extraroute,subnet-service-types,standard-attr-timestamp,service-type,l3-flavors,port-security,extra_dhcp_opt,standard-attr-revisions,pagination,sorting,security-group,dhcp_agent_scheduler,router_availability_zone,rbac-policies,standard-attr-description,router,allowed-address-pairs,project-id,dvr

[service_available]
swift = False
sahara = False
aodh = True
glance = True
manila = False
cinder = True
nova = True
neutron = True
trove = False
ceilometer = True
ironic = False
heat = True
zaqar = False
horizon = True

[object-storage-feature-enabled]
discoverability = False
discoverable_apis =

[network]
public_network_id = 79eeecb5-da2d-49d9-96c6-d63ccb9a86ce

[image-feature-enabled]
api_v1 = True
api_v2 = True

[identity-feature-enabled]
api_v2 = True
api_v3 = True
api_extensions = s3tokens,OS-EP-FILTER,OS-REVOKE,OS-FEDERATION,OS-INHERIT,OS-KSCRUD,OS-SIMPLE-CERT,OS-TRUST,OS-PKI,OS-ENDPOINT-POLICY,OS-OAUTH1,OS-EC2

[dashboard]
dashboard_url = http://192.168.196.250/dashboard/
login_url = http://192.168.196.250/dashboard/auth/login/

Comment 11 Matthew Booth 2017-07-21 12:12:50 UTC
The immediate cause is that scheduler_available_filters isn't specified in the tempest config, so we're using the default value, which is 'all'. Tempest therefore assumes the required filters are configured, but by default they aren't.

You probably just need to set this appropriately in tempest.conf, but as our defaults don't seem to line up here we'll discuss it in the weekly bug meeting. Perhaps we should tweak something.
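
For example, a hedged tempest.conf sketch (assuming the option added by the tempest commit referenced in comment 7 lives in the [compute-feature-enabled] group): listing only the filters actually enabled in nova.conf should make tempest skip tests whose required filters, such as SameHostFilter/DifferentHostFilter, are missing:

  [compute-feature-enabled]
  # list only the filters actually enabled in Nova; tests needing other
  # filters (e.g. SameHostFilter/DifferentHostFilter) should then be skipped
  scheduler_available_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter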

Comment 12 Matthew Booth 2017-07-21 14:22:16 UTC
Let's discuss upstream. I think the resolution is to change the defaults in tempest.conf to match Nova.

Comment 13 Sean Merrow 2017-10-12 16:32:07 UTC
Has the discussion upstream happened, and if so, what was the outcome? Is there an upstream link you can add?

Comment 14 Mike Orazi 2018-11-06 21:49:05 UTC
Can we get a status update on this bz?

Comment 15 Ollie Walsh 2018-11-06 22:29:13 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1566148#c42

*** This bug has been marked as a duplicate of bug 1566148 ***

Comment 16 Red Hat Bugzilla 2023-09-18 00:12:23 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

