Description of problem:
OpenStack Nova's API service has a default max_limit of 1000 results returned per request. The OpenShift on OpenStack implementation uses a dynamic inventory, openshift-ansible/playbooks/openstack/inventory.py, which uses Python to generate a list of all the nodes in the OpenShift cluster. When attempting to scale past 1000 nodes, we found that inventory.py returned only 1000 hosts, and these were the most recently created nodes, so the output did not include the master, infra, or cns nodes. Returning only 1000 records could break scale-up or any other Ansible operation on a large cluster that uses the dynamic inventory file.

There are two workarounds for this issue:
1) From the command line you can run `openstack server list --limit -1`. This removes the limit and shows all the nodes in the OpenStack cluster. However, inventory.py does not use the command line and will still return only 1000 records.
2) On all controller systems you can edit /etc/nova/nova.conf, uncomment "max_limit", and set a value greater than 1000, then restart the nova services on all controllers (we have 3 controllers). This also works around the 1000-record limit, but editing a production OpenStack cluster may be an issue for customers, and restarting services could cause downtime.

Version-Release number of selected component (if applicable):
We noticed this in 3.9 while using the OpenShift on OpenStack ansible installer, but this is an OpenStack limitation.

How reproducible:
100% of the time once we are over 1000 nodes in our OpenShift on OpenStack cluster.

Steps to Reproduce:
1. Deploy OpenShift on OpenStack following the method outlined here: https://github.com/openshift/openshift-ansible/tree/master/playbooks/openstack
2. Scale the cluster up past 1000 nodes.
3. Notice that inventory.py only returns 1000 nodes.

Actual results:
The inventory returns only 1000 hosts at a time.

Expected results:
The inventory should return all hosts in the cluster. If there is an API limit, the inventory should keep calling the API until all nodes have been returned, and then return the full cluster listing.

Additional info:
The nova limit is described here: https://docs.openstack.org/ocata/config-reference/compute/api.html
This appears to be a limit for all OpenStack versions, not just ocata. Let me know what other information you may need.
We'll have to fix the inventory script to handle pagination. We could document and advise the nova.conf change, but I'd prefer to do that only as a last resort. This looks like something we should be able to fix on our side.
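For reference, the Nova API supports marker-based paging: request up to max_limit servers, then repeat the request with "marker" set to the ID of the last server received, until a short page signals the end. Below is a minimal sketch of that loop against the raw compute endpoint (not the actual inventory.py or shade code; list_all_servers, compute_url, token, and page_size are illustrative names, and token/endpoint handling is assumed to happen elsewhere):

import requests

def list_all_servers(compute_url, token, page_size=1000):
    """Yield every server, following the 'marker' parameter until the
    API returns a page smaller than the requested limit."""
    headers = {"X-Auth-Token": token}
    marker = None
    while True:
        params = {"limit": page_size}
        if marker is not None:
            params["marker"] = marker
        resp = requests.get(compute_url + "/servers",
                            headers=headers, params=params)
        resp.raise_for_status()
        page = resp.json()["servers"]
        for server in page:
            yield server
        if len(page) < page_size:
            break  # last (possibly partial or empty) page reached
        marker = page[-1]["id"]  # resume after the last server we saw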
This is addressed by the following patch in python-shade: https://review.openstack.org/#/c/555876/
It should be fixed in version 1.28.0.
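A quick way to check what shade version is installed and whether list_servers() pages past the limit — a verification sketch only, assuming credentials that shade.openstack_cloud() can pick up (e.g. clouds.yaml) and that max_limit has been lowered below the real server count, as in the verification comment below:

import pkg_resources
import shade

# The upstream fix is expected in shade >= 1.28.0 per the review above;
# downstream packages may carry it as a backport under an older version.
print(pkg_resources.get_distribution("shade").version)

cloud = shade.openstack_cloud()
# With the fix, list_servers() keeps paging past Nova's max_limit, so
# this count should match the real number of servers in the project.
print(len(cloud.list_servers()))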
Based on comment 4, this is essentially an OpenStack bug fix, so I am moving this bug to the OpenStack component.
Verified on python2-shade-1.27.1-2.el7ost

Set max_limit to 2 in nova.conf:

(shiftstack) [cloud-user@ansible-host-0 ~]$ openstack server list
+--------------------------------------+----------------------------------+--------+----------------------------------------------------------------------+----------+---------+
| ID                                   | Name                             | Status | Networks                                                             | Image    | Flavor  |
+--------------------------------------+----------------------------------+--------+----------------------------------------------------------------------+----------+---------+
| 68345b69-09e6-4839-aaf6-1d096deafe84 | master-0.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.6, 10.0.0.223 | rhel-7.6 |         |
| 735bad92-7f66-4791-9a2c-7bf42433b289 | app-node-0.openshift.example.com | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.9, 10.0.0.219 | rhel-7.6 | m1.node |
+--------------------------------------+----------------------------------+--------+----------------------------------------------------------------------+----------+---------+

With shade, more than the max_limit (2) servers are returned:

(shiftstack) [cloud-user@ansible-host-0 ~]$ python -c 'import shade; cloud = shade.openstack_cloud(); print [server.name for server in cloud.list_servers()]'
[u'master-0.openshift.example.com', u'app-node-0.openshift.example.com', u'infra-node-0.openshift.example.com', u'app-node-1.openshift.example.com', u'ansible_host-0', u'openshift_dns-0']

Same for inventory.py:

(shiftstack) [cloud-user@ansible-host-0 ~]$ /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py --list
{
    "OSEv3": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com",
            "master-0.openshift.example.com",
            "infra-node-0.openshift.example.com"
        ],
        "vars": {}
    },
    "_meta": {
        "hostvars": {
            "app-node-0.openshift.example.com": {
                "ansible_host": "10.0.0.219",
                "openshift_ip": "192.168.99.9",
                "openshift_node_group_name": "node-config-compute",
                "openshift_public_hostname": "app-node-0.openshift.example.com",
                "openshift_public_ip": "10.0.0.219",
                "private_v4": "192.168.99.9",
                "public_v4": "10.0.0.219"
            },
            "app-node-1.openshift.example.com": {
                "ansible_host": "10.0.0.210",
                "openshift_ip": "192.168.99.4",
                "openshift_node_group_name": "node-config-compute",
                "openshift_public_hostname": "app-node-1.openshift.example.com",
                "openshift_public_ip": "10.0.0.210",
                "private_v4": "192.168.99.4",
                "public_v4": "10.0.0.210"
            },
            "infra-node-0.openshift.example.com": {
                "ansible_host": "10.0.0.224",
                "openshift_ip": "192.168.99.15",
                "openshift_node_group_name": "node-config-infra",
                "openshift_public_hostname": "infra-node-0.openshift.example.com",
                "openshift_public_ip": "10.0.0.224",
                "private_v4": "192.168.99.15",
                "public_v4": "10.0.0.224"
            },
            "master-0.openshift.example.com": {
                "ansible_host": "10.0.0.223",
                "openshift_ip": "192.168.99.6",
                "openshift_node_group_name": "node-config-master",
                "openshift_public_hostname": "master-0.openshift.example.com",
                "openshift_public_ip": "10.0.0.223",
                "private_v4": "192.168.99.6",
                "public_v4": "10.0.0.223"
            }
        }
    },
    "app": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com"
        ]
    },
    "cluster_hosts": {
        "hosts": [
            "master-0.openshift.example.com",
            "app-node-0.openshift.example.com",
            "infra-node-0.openshift.example.com",
            "app-node-1.openshift.example.com"
        ]
    },
    "dns": {
        "hosts": []
    },
    "etcd": {
        "hosts": [
            "master-0.openshift.example.com"
        ]
    },
    "glusterfs": {
        "hosts": []
    },
    "infra.openshift.example.com": {
        "hosts": [
            "infra-node-0.openshift.example.com"
        ]
    },
    "infra_hosts": {
        "hosts": [
            "infra-node-0.openshift.example.com"
        ]
    },
    "lb": {
        "hosts": []
    },
    "localhost": {
        "ansible_connection": "local"
    },
    "masters": {
        "hosts": [
            "master-0.openshift.example.com"
        ]
    },
    "masters.openshift.example.com": {
        "hosts": [
            "master-0.openshift.example.com"
        ]
    },
    "nodes": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com",
            "master-0.openshift.example.com",
            "infra-node-0.openshift.example.com"
        ]
    },
    "nodes.openshift.example.com": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com"
        ]
    }
}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611