Created attachment 1430151 [details]
Artifacts from the job

Description of problem:
VMs sometimes fail to start after one of the controllers is taken down (either by stopping the container or by blocking port 2550).

Version-Release number of selected component (if applicable):
opendaylight-8.0.0-8.el7ost

How reproducible:
Random

Steps to Reproduce:
1. Remove a controller from the cluster
2. Start a VM

Actual results:
The VM stays in BUILD, eventually transitions to ERROR, and reports that it can't get a valid host.
- https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/28/robot/report/log.html

Expected results:
VM should start

Additional info:
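For anyone reproducing this by hand, here is a minimal sketch (not part of the CSIT job) for checking whether port 2550 is reachable on each controller before starting the VM. The function name and the controller-1/controller-2 host names are assumptions for illustration; only port 2550 and controller-0 appear in this report.

```python
import socket


def port_is_reachable(host, port=2550, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host names (assumed); substitute the real controller addresses.
    for host in ("controller-0", "controller-1", "controller-2"):
        state = "reachable" if port_is_reachable(host) else "unreachable"
        print(f"{host}:2550 {state}")
```

This only tells you whether the TCP port accepts connections from the machine running the script; it says nothing about the health of the ODL cluster itself.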
Another job that hit this is here:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-opendaylight-odl-netvirt-13_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/26/robot/report/log.html#s1-s5-t11-k4-k1-k3-k4-k2

The "fault" in "server show" is:

{"message": "Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 464b5640-8ef3-4237-9a3b-22fb05f29787.", "code": 500, "details": " File \"/usr/lib/python2.7/site-packages/nova/conductor/manager.py\", line 580, in build_instances
    raise exception.MaxRetriesExceeded(reason=msg)

I think we need to dig into the other OpenStack logs, such as nova and maybe neutron.
Please attach logs from neutron & ODL.
Created attachment 1432698 [details] controller-0 karaf.log
Created attachment 1432699 [details] odl and neutron logs for all three controllers
Judging by the logs, this looks like a duplicate of bug 1575150: both show the same timeouts while trying to contact ODL. The difference is that here the "agents" were detected as "dead" before the VM creation, so the error looks different, but the root cause is the same. If the fix for the other bug doesn't resolve this one, please reopen.

*** This bug has been marked as a duplicate of bug 1575150 ***