Created attachment 1037552: logs from one compute node

Description of problem:

This issue was originally reported via email. It is not critical yet. It can only be triggered by a very specific sequence of events based on the old implementation of the NovaCompute resource agent. Reproducing it requires a fresh install of both the controllers and the compute nodes every time. Rolling back the database or similar is not enough.

The resource agent code (as provided to us) used to do:

    export LIBGUESTFS_ATTACH_METHOD=appliance
    su nova -s /bin/sh -c /usr/bin/nova-compute &
    rc=$OCF_NOT_RUNNING
    ocf_log info "Waiting for nova to start"
    while [ $rc != $OCF_SUCCESS ]; do
        nova_monitor
        rc=$?
    done
    if [ "x${OCF_RESKEY_domain}" != x ]; then
        export service_host="${NOVA_HOST}.${OCF_RESKEY_domain}"
    else
        export service_host="${NOVA_HOST}"
    fi
    python -c "import os; from novaclient import client as nova_client; nova = nova_client.Client('2', os.environ.get('OCF_RESKEY_username'), os.environ.get('OCF_RESKEY_password'), os.environ.get('OCF_RESKEY_tenant_name'), os.environ.get('OCF_RESKEY_auth_url')); nova.services.enable(os.environ.get('service_host'), 'nova-compute');"

From what we were able to see, nova-compute would start. While it was still registering itself as a hypervisor, the subsequent call to nova (nova.services.enable) would happen too early. This race left the database in an inconsistent state, and any attempt to start an instance on that compute node would then fail. After a full environment reset, with the call to python dropped, everything worked fine.
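The race may be between nova-compute registering its service record and the `nova.services.enable()` call. If so, one possible mitigation is to poll for the registration with a bounded timeout before enabling. This mitigation was not tested as part of this report. Note also that the `nova_monitor` loop above has no timeout of its own. Below is a minimal POSIX shell sketch. `wait_for` is a hypothetical helper and is not part of the resource agent:

```shell
# Hypothetical helper (not part of the original NovaCompute resource agent):
# re-run a check command once per second until it succeeds or until
# <timeout> seconds have elapsed.
# Usage: wait_for <timeout-seconds> <command> [args...]
# Returns 0 as soon as the check succeeds, 1 if the timeout is reached.
wait_for() {
    local timeout=$1
    local elapsed=0
    shift
    # "until" keeps looping while the check command exits non-zero.
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In the agent, the enable step would then be wrapped as `if wait_for 60 <check>; then python -c "..."; fi`. Here `<check>` is a hypothetical command that succeeds once the nova-compute service for `$service_host` appears in the service list. The 60-second bound is an arbitrary example value.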
Version-Release number of selected component (if applicable):

controllers:
openstack-nova-common-2015.1.0-4.el7ost.noarch
openstack-nova-console-2015.1.0-4.el7ost.noarch
openstack-nova-scheduler-2015.1.0-4.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-4.el7ost.noarch
openstack-nova-conductor-2015.1.0-4.el7ost.noarch
openstack-nova-api-2015.1.0-4.el7ost.noarch
python-nova-2015.1.0-4.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch

computes:
openstack-nova-common-2015.1.0-4.el7ost.noarch
python-nova-2015.1.0-4.el7ost.noarch
python-novaclient-2.23.0-1.el7ost.noarch
openstack-nova-compute-2015.1.0-4.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install a completely fresh environment without compute nodes.
2. Prepare one compute node (fully configured) WITHOUT starting nova-compute.
3. Put the above code in a small shell script so that it runs as quickly as pacemaker would run it when starting the resource agent.
4. Run the script to start nova-compute.
5. Try to boot an instance.

Actual results:
Instances fail to start.

Expected results:
Instances should start.

Additional info:
A nova-compute.log from an old run is attached. Hopefully it is useful; otherwise it might take a few days before we can collect another one. We temporarily worked around this issue by disabling the call to python. That call provides an optimization but is not on the critical path. However, a user could potentially trigger the same race by starting nova-compute via systemctl and enabling the service right away.
Given the age of this bug, the rather specific reproduction steps, and the lack of a customer case, I think we can safely close this.