Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1618989

Summary: [Infra] OSP13 Exhausted all hosts available for retrying build failures for instance
Product: Red Hat OpenStack
Reporter: Noam Manos <nmanos>
Component: opendaylight
Assignee: Stephen Kitt <skitt>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Noam Manos <nmanos>
Severity: high
Docs Contact:
Priority: unspecified
Version: 13.0 (Queens)
CC: aadam, mkolesni, nmanos, nyechiel, oblaut, tfreger
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: Infra
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 14:21:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
full console output (flags: none)
Could not determine a suitable URL for the plugin (flags: none)

Description Noam Manos 2018-08-19 11:16:59 UTC
Created attachment 1476874 [details]
full console output

Description of problem:
Creating a cirros VM fails with the error: Exhausted all hosts available for retrying build failures for instance.

Version-Release number of selected component (if applicable):
OSP13 with ODL, puddle 2018-08-08.2
virt on RHEL 7.4 (3.10.0-693.21.1.el7.x86_64)

How reproducible:


Steps to Reproduce:
. overcloudrc
wget -N https://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img
openstack image create --container-format bare --disk-format qcow2 --public --file cirros-0.3.5-x86_64-disk.img cirros35
flavor=cirros_flavor
openstack flavor create --public $flavor --id auto --ram 512 --disk 1 --vcpus 1
openstack router create Router_eNet
openstack network create net_ipv64_1
openstack subnet create --subnet-range 10.0.1.0/24 --network net_ipv64_1 --dhcp subnet_ipv4_1
openstack subnet create --subnet-range 2001::/64 --network net_ipv64_1 --ipv6-address-mode slaac --ipv6-ra-mode slaac --ip-version 6 subnet_ipv6_1
openstack network create net_ipv64_2
openstack subnet create --subnet-range 10.0.2.0/24 --network net_ipv64_2 --dhcp subnet_ipv4_2
openstack subnet create --subnet-range 2002::/64 --network net_ipv64_2 --ipv6-address-mode slaac --ipv6-ra-mode slaac --ip-version 6 subnet_ipv6_2
openstack router add subnet $router_id subnet_ipv4_2
openstack router add subnet $router_id subnet_ipv6_2
openstack router set --external-gateway $ext_net $router_id
sec_id=$(openstack security group create sec_group | awk -F'[ \t]*\\|[ \t]*' '/ id / {print $3}')
openstack security group rule create $sec_id --protocol tcp --dst-port 80 --remote-ip 0.0.0.0/0
openstack security group rule create $sec_id --protocol tcp --dst-port 22 --remote-ip 0.0.0.0/0
openstack security group rule create $sec_id --protocol tcp --dst-port 443 --remote-ip 0.0.0.0/0
openstack security group rule create $sec_id --protocol icmp --dst-port -1 --remote-ip 0.0.0.0/0
openstack keypair create my_rsa-key --private-key my_key.pem
chmod 400 my_key.pem
fip=$(openstack floating ip create $ext_net -c floating_ip_address -f value)
image_id=$(openstack image list | grep $image | head -1 | cut -d " " -f 2)
vm_name=${image}_vm1_net1
openstack server create --flavor $flavor --image $image_id --nic net-id=net_ipv64_1 --security-group $sec_id --key-name my_rsa-key $vm_name
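
The steps above use $router_id, $ext_net and $image without showing where they are set. A minimal sketch of a guard that fails early when such variables are empty; require_vars is a hypothetical helper name, not part of the original steps, and the right values for these variables are not given in this report.

```shell
# Hypothetical helper (not part of the original steps): fail early if any
# named shell variable is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called before the first command that needs these values, for example `require_vars router_id ext_net image || exit 1`.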




Actual results:
(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+-------------------+--------+----------+----------+---------------+
| ID                                   | Name              | Status | Networks | Image    | Flavor        |
+--------------------------------------+-------------------+--------+----------+----------+---------------+
| 4c4a9cba-9024-4075-91b2-9eb2d14a6a9a | cirros35_vm1_net1 | ERROR  |          | cirros35 | cirros_flavor |
+--------------------------------------+-------------------+--------+----------+----------+---------------+

(overcloud) [stack@undercloud-0 ~]$ openstack server show cirros35_vm1_net1 
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                                                                                                                                                                                                                                                                                                        |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                                                                                                                                                                                                                                                                                                                       |
| OS-EXT-AZ:availability_zone         | nova                                                                                                                                                                                                                                                                                                                                                                                         |
| OS-EXT-SRV-ATTR:host                | None                                                                                                                                                                                                                                                                                                                                                                                         |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                                                                                                                                                                                                                                                                                                                                                         |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000016                                                                                                                                                                                                                                                                                                                                                                            |
| OS-EXT-STS:power_state              | NOSTATE                                                                                                                                                                                                                                                                                                                                                                                      |
| OS-EXT-STS:task_state               | None                                                                                                                                                                                                                                                                                                                                                                                         |
| OS-EXT-STS:vm_state                 | error                                                                                                                                                                                                                                                                                                                                                                                        |
| OS-SRV-USG:launched_at              | None                                                                                                                                                                                                                                                                                                                                                                                         |
| OS-SRV-USG:terminated_at            | None                                                                                                                                                                                                                                                                                                                                                                                         |
| accessIPv4                          |                                                                                                                                                                                                                                                                                                                                                                                              |
| accessIPv6                          |                                                                                                                                                                                                                                                                                                                                                                                              |
| addresses                           |                                                                                                                                                                                                                                                                                                                                                                                              |
| config_drive                        |                                                                                                                                                                                                                                                                                                                                                                                              |
| created                             | 2018-08-19T09:52:15Z                                                                                                                                                                                                                                                                                                                                                                         |
| fault                               | {u'message': u'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 4c4a9cba-9024-4075-91b2-9eb2d14a6a9a.', u'code': 500, u'details': u'  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 581, in build_instances\n    raise exception.MaxRetriesExceeded(reason=msg)\n', u'created': u'2018-08-19T09:52:53Z'} |
| flavor                              | cirros_flavor (1cd2b5ae-8028-403f-b69e-9b0b39c6d6f9)                                                                                                                                                                                                                                                                                                                                         |
| hostId                              |                                                                                                                                                                                                                                                                                                                                                                                              |
| id                                  | 4c4a9cba-9024-4075-91b2-9eb2d14a6a9a                                                                                                                                                                                                                                                                                                                                                         |
| image                               | cirros35 (a0ab2e66-1d4c-4c17-a6f2-18189b3604e9)                                                                                                                                                                                                                                                                                                                                              |
| key_name                            | my_rsa-key                                                                                                                                                                                                                                                                                                                                                                                   |
| name                                | cirros35_vm1_net1                                                                                                                                                                                                                                                                                                                                                                            |
| project_id                          | e6b24e7ebe3f423a8228a415599de901                                                                                                                                                                                                                                                                                                                                                             |
| properties                          |                                                                                                                                                                                                                                                                                                                                                                                              |
| status                              | ERROR                                                                                                                                                                                                                                                                                                                                                                                        |
| updated                             | 2018-08-19T09:52:52Z                                                                                                                                                                                                                                                                                                                                                                         |
| user_id                             | d59a72fbb96441d4a0a8fbed1674a570                                                                                                                                                                                                                                                                                                                                                             |
| volumes_attached                    |                                                                                                                                                                                                                                                                                                                                                                                              |
+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(overcloud) [stack@undercloud-0 ~]$ 
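
The status and fault rows are the relevant part of the wide table above. A minimal sketch for pulling a single field out of such table output, reusing the awk field separator idea from the steps to reproduce; table_value is a hypothetical helper name.

```shell
# Print the Value column for one Field of an "openstack ... show" table
# read from standard input. table_value is a hypothetical helper name.
table_value() {
    awk -F'[ \t]*[|][ \t]*' -v field="$1" '$2 == field { print $3 }'
}
```

For example, `openstack server show cirros35_vm1_net1 | table_value status` would print only the status value. The CLI's own formatter options, as already used for the floating IP in the steps (`-c <column> -f value`), are an alternative that avoids parsing the table.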


Expected results:
The cirros VM should be up and running (Active server).

Additional info:
1. A similar problem also occurred when I tried to run a RHEL 7.5 VM.
The VM status changed from Build to Error, and the message was: "No valid host was found".

2. This bug might be related to https://bugs.launchpad.net/nova/+bug/1758112.
(It looks like a Nova issue, but since it happens with cirros and not only with the RHEL image, it might be related to Neutron, as mentioned in https://bugs.launchpad.net/nova/+bug/1758112/comments/5.)

Comment 1 Noam Manos 2018-08-19 12:37:20 UTC
sosreports from all nodes were uploaded to:

http://rhos-release.virt.bos.redhat.com/log/bz1618989/

Comment 2 Noam Manos 2018-08-19 15:41:10 UTC
Created attachment 1476908 [details]
Could not determine a suitable URL for the plugin

After creating the sosreport, I'm getting:

(overcloud) [stack@undercloud-0 ~]$ openstack server list

Failed to discover available identity versions when contacting http://10.0.0.105:5000//v3. Attempting to parse version from URL.
Could not determine a suitable URL for the plugin

Comment 3 Mike Kolesnik 2018-08-29 11:25:58 UTC
I see a lot of errors in the openstack logs (nova, neutron, etc):

  Can't connect to MySQL server on '172.17.1.15'

2018-08-19 12:32:36.769 20 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 110] Connection timed out
2018-08-19 12:32:36.861 19 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused

Also in the rabbitmq log you can see:
=INFO REPORT==== 19-Aug-2018::12:32:21 ===
RabbitMQ is asked to stop...


I suspect the controller or some of these containers were shutting down. Either way, this does not look related to ODL or to OpenStack itself; perhaps the deployment was already in a bad state beforehand?

From the sosreport it's visible that the system was up, but it's not clear what happened to it before this failure.
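
The connectivity errors quoted in this comment can be counted in a log stream with a small filter. This is a sketch only: count_infra_errors is a hypothetical helper name, and the patterns are just the three messages quoted above, not a complete list of failure signatures.

```shell
# Count log lines that match the connectivity errors quoted in this comment.
# Reads log text from standard input; count_infra_errors is a hypothetical name.
count_infra_errors() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c -E "Can't connect to MySQL server|\[Errno 11[01]\]|RabbitMQ is asked to stop" || true
}
```

Piping the nova, neutron and rabbitmq logs extracted from a sosreport through this function gives a quick count per file; the log locations depend on the deployment and are not specified here.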


Noam, I would ask that you reproduce this on a clean deployment and provide clear reproduction steps.

Comment 5 Noam Manos 2018-09-03 14:02:01 UTC
I just hit a similar issue (this time on OSP13 with RHEL 7.5):
Bug 1624875 - OSP13 with ODL - Build of instance failed: Binding failed for port.

It might be the same issue; please take a look.

Comment 6 Mike Kolesnik 2018-09-04 08:02:28 UTC
Bug 1624875 has incomplete information, and this bug also lacks sufficient information, so I don't see a reason to open another bug for the same issue.

Please try to reproduce on a clean deployment that completed successfully and where you have verified that everything is working.

If the deployment itself is not working, then an appropriate bug should be opened for that issue.
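
One possible way to check that a deployment is working before reproducing is to confirm that every compute service reports the state "up". A minimal sketch, assuming the input is one state value per line; all_up is a hypothetical helper name, and this check alone does not prove the deployment is healthy.

```shell
# Succeed only if there is at least one non-empty input line and every
# non-empty line is exactly "up". all_up is a hypothetical helper name.
all_up() {
    awk 'NF { seen = 1; if ($0 != "up") bad = 1 } END { exit (bad || !seen) }'
}
```

With sourced overcloud credentials, the input could come from `openstack compute service list -f value -c State | all_up`, which returns a non-zero status if any service is not up or if nothing is listed.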

Comment 7 Noam Manos 2018-09-04 08:58:27 UTC
Both bugs (this one and 1624875) were hit on a working OSP13 with ODL, on 2 different systems (not the same baremetal host), and both deployments completed cleanly with no errors.

In this bug I've also linked to sosreports from ALL nodes. What other information do you need?

Comment 8 Mike Kolesnik 2018-09-04 14:21:47 UTC
After talking with Noam, it seems there is not enough information to investigate the bug properly at this point, since the information given is too broad.

Should the bug be reproduced, we will reopen it.