Bug 1559947

Summary: OpenShift on OpenStack - Nova has a max_limit of 1000
Product: Red Hat OpenStack
Reporter: Matt Bruzek <mbruzek>
Component: python-shade
Assignee: Antoni Segura Puimedon <asegurap>
Status: CLOSED ERRATA
QA Contact: Udi Shkalim <ushkalim>
Severity: high
Priority: high
Version: unspecified
CC: aos-bugs, asegurap, jokerman, jschluet, knylande, mfojtik, mmccomas, tsedovic, tzumainn
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-shade-1.27.1-2.el7ost
Doc Type: Bug Fix
Doc Text:
When listing servers, OpenStack Nova's API paging returned at most N elements per call, with N being the page size configured in Nova. This was not apparent to shade library users. Shade now handles the paging on behalf of its users, so they receive all the servers they request.
Last Closed: 2018-11-14 01:14:59 UTC
Type: Bug

Description Matt Bruzek 2018-03-23 15:23:24 UTC
Description of problem:

OpenStack Nova has a default max_limit of 1000 results returned by the API service. The OpenShift on OpenStack implementation uses a dynamic inventory script, openshift-ansible/playbooks/openstack/inventory.py, which uses Python to generate a list of all the nodes in the OpenShift cluster.
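For context, the host-gathering step of inventory.py boils down to a shade call along these lines (a simplified sketch, not the script verbatim):

    import shade

    # Simplified view of the inventory's host discovery. Before the fix,
    # list_servers() returned at most one Nova API page (max_limit,
    # 1000 by default), silently truncating larger clusters.
    cloud = shade.openstack_cloud()    # credentials from clouds.yaml/env
    servers = cloud.list_servers()
    print(len(servers))                # capped at 1000 on a large cloud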

When attempting to scale past 1000 nodes, we found that inventory.py returned only 1000 hosts, and these were the most recently created nodes, so the master, infra, and CNS nodes were missing from the output.

Returning only 1000 records can break scale-up and any other Ansible operations that rely on the dynamic inventory file for a large cluster.

There are two workarounds for this issue.

1) From the command line, `openstack server list --limit -1` removes the limit, so you can see all the nodes in the OpenStack cluster. However, inventory.py does not use the command line and still returns only 1000 records.
2) On all controller systems, edit /etc/nova/nova.conf, uncomment "max_limit" with a value greater than 1000, and restart the Nova services on all controllers (we have 3 controllers); see the snippet below. This also works around the 1000-record limit, but editing a production OpenStack cluster is risky for customers, and restarting services could cause downtime.
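For reference, a sketch of the nova.conf stanza in question (the value 2000 is illustrative; in recent releases the option lives in the [api] section):

    [api]
    # Maximum number of items a single API request may return (default: 1000).
    max_limit = 2000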

Version-Release number of selected component (if applicable): 
We noticed this in OpenShift 3.9 while using the OpenShift on OpenStack Ansible installer, but the limit itself is imposed by OpenStack.


How reproducible:
100% of the time once our OpenShift on OpenStack cluster exceeds 1000 nodes.

Steps to Reproduce:
1. Deploy OpenShift on OpenStack following the method outlined here: https://github.com/openshift/openshift-ansible/tree/master/playbooks/openstack
2. Scale the cluster up past 1000 nodes.
3. Notice that inventory.py only returns 1000 nodes.

Actual results:
The inventory was only returning 1000 hosts at a time.


Expected results:
The inventory should return all hosts in the cluster. If the API imposes a page limit, the inventory should keep calling the API until all nodes have been retrieved and then return the full cluster listing.


Additional info:

The nova limit is described here: https://docs.openstack.org/ocata/config-reference/compute/api.html

This appears to be a limit in all OpenStack versions, not just Ocata.

Let me know what other information you may need.

Comment 1 Tomas Sedovic 2018-03-23 15:37:01 UTC
We'll have to fix the inventory script to handle pagination.

We could document and advise the Nova configuration change, but I'd prefer to do that only as a last resort. This looks like something we should be able to fix on our side.
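A minimal sketch of what handling pagination means here, assuming direct calls to the Nova API with the requests library (the endpoint and token plumbing are illustrative; a real client would use a keystoneauth session). Nova includes a "servers_links" entry with rel=next while more pages remain and omits it on the last page:

    import requests

    def list_all_servers(compute_url, token, page_size=1000):
        """Collect every server by walking Nova's paged /servers API."""
        servers = []
        url = "%s/servers?limit=%d" % (compute_url, page_size)
        while url:
            resp = requests.get(url, headers={"X-Auth-Token": token})
            resp.raise_for_status()
            body = resp.json()
            servers.extend(body["servers"])
            # Follow the rel=next link, if any; absent means last page.
            url = None
            for link in body.get("servers_links", []):
                if link.get("rel") == "next":
                    url = link["href"]
        return servers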

Comment 4 Tomas Sedovic 2018-07-17 12:06:20 UTC
This is addressed by the following patch in python-shade:

https://review.openstack.org/#/c/555876/

It should be fixed in version 1.28.0.

Comment 5 Johnny Liu 2018-08-31 08:49:47 UTC
Based on comment 4, this is an OpenStack-side bug fix, so I am moving this bug to the OpenStack component.

Comment 16 Udi Shkalim 2018-11-12 11:12:13 UTC
Verified on python2-shade-1.27.1-2.el7ost.
Set max_limit to 2 in nova.conf:

(shiftstack) [cloud-user@ansible-host-0 ~]$ openstack server list
+--------------------------------------+----------------------------------+--------+----------------------------------------------------------------------+----------+---------+
| ID                                   | Name                             | Status | Networks                                                             | Image    | Flavor  |
+--------------------------------------+----------------------------------+--------+----------------------------------------------------------------------+----------+---------+
| 68345b69-09e6-4839-aaf6-1d096deafe84 | master-0.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.6, 10.0.0.223 | rhel-7.6 |         |
| 735bad92-7f66-4791-9a2c-7bf42433b289 | app-node-0.openshift.example.com | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.9, 10.0.0.219 | rhel-7.6 | m1.node |
+--------------------------------------+----------------------------------+--------+----------------------------------------------------------------------+----------+---------+




With shade, the result exceeds the max limit (2):

(shiftstack) [cloud-user@ansible-host-0 ~]$ python -c 'import shade; cloud = shade.openstack_cloud(); print [server.name for server in cloud.list_servers()]'
[u'master-0.openshift.example.com', u'app-node-0.openshift.example.com', u'infra-node-0.openshift.example.com', u'app-node-1.openshift.example.com', u'ansible_host-0', u'openshift_dns-0']




Same for inventory.py:

(shiftstack) [cloud-user@ansible-host-0 ~]$ /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py --list
{
    "OSEv3": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com",
            "master-0.openshift.example.com",
            "infra-node-0.openshift.example.com"
        ],
        "vars": {}
    },
    "_meta": {
        "hostvars": {
            "app-node-0.openshift.example.com": {
                "ansible_host": "10.0.0.219",
                "openshift_ip": "192.168.99.9",
                "openshift_node_group_name": "node-config-compute",
                "openshift_public_hostname": "app-node-0.openshift.example.com",
                "openshift_public_ip": "10.0.0.219",
                "private_v4": "192.168.99.9",
                "public_v4": "10.0.0.219"
            },
            "app-node-1.openshift.example.com": {
                "ansible_host": "10.0.0.210",
                "openshift_ip": "192.168.99.4",
                "openshift_node_group_name": "node-config-compute",
                "openshift_public_hostname": "app-node-1.openshift.example.com",
                "openshift_public_ip": "10.0.0.210",
                "private_v4": "192.168.99.4",
                "public_v4": "10.0.0.210"
            },
            "infra-node-0.openshift.example.com": {
                "ansible_host": "10.0.0.224",
                "openshift_ip": "192.168.99.15",
                "openshift_node_group_name": "node-config-infra",
                "openshift_public_hostname": "infra-node-0.openshift.example.com",
                "openshift_public_ip": "10.0.0.224",
                "private_v4": "192.168.99.15",
                "public_v4": "10.0.0.224"
            },
            "master-0.openshift.example.com": {
                "ansible_host": "10.0.0.223",
                "openshift_ip": "192.168.99.6",
                "openshift_node_group_name": "node-config-master",
                "openshift_public_hostname": "master-0.openshift.example.com",
                "openshift_public_ip": "10.0.0.223",
                "private_v4": "192.168.99.6",
                "public_v4": "10.0.0.223"
            }
        }
    },
    "app": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com"
        ]
    },
    "cluster_hosts": {
        "hosts": [
            "master-0.openshift.example.com",
            "app-node-0.openshift.example.com",
            "infra-node-0.openshift.example.com",
            "app-node-1.openshift.example.com"
        ]
    },
    "dns": {
        "hosts": []
    },
    "etcd": {
        "hosts": [
            "master-0.openshift.example.com"
        ]
    },
    "glusterfs": {
        "hosts": []
    },
    "infra.openshift.example.com": {
        "hosts": [
            "infra-node-0.openshift.example.com"
        ]
    },
    "infra_hosts": {
        "hosts": [
            "infra-node-0.openshift.example.com"
        ]
    },
    "lb": {
        "hosts": []
    },
    "localhost": {
        "ansible_connection": "local"
    },
    "masters": {
        "hosts": [
            "master-0.openshift.example.com"
        ]
    },
    "masters.openshift.example.com": {
        "hosts": [
            "master-0.openshift.example.com"
        ]
    },
    "nodes": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com",
            "master-0.openshift.example.com",
            "infra-node-0.openshift.example.com"
        ]
    },
    "nodes.openshift.example.com": {
        "hosts": [
            "app-node-0.openshift.example.com",
            "app-node-1.openshift.example.com"
        ]
    }
}

Comment 18 errata-xmlrpc 2018-11-14 01:14:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3611