Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1619015

Summary: LB failed to start - Amphora VM which was marked for delete during startup.
Product: Red Hat OpenStack
Reporter: Udi Shkalim <ushkalim>
Component: openstack-octavia
Assignee: Carlos Goncalves <cgoncalves>
Status: CLOSED WORKSFORME
QA Contact: Alexander Stafeyev <astafeye>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 13.0 (Queens)
CC: amuller, cgoncalves, ihrachys, juriarte, lpeer, majopela, nyechiel, oblaut, tsedovic, ushkalim
Target Milestone: ---
Keywords: AutomationBlocker, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-16 15:02:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Compute-0 sos report (Flags: none)

Description Udi Shkalim 2018-08-19 14:45:50 UTC
Created attachment 1476907 [details]
Compute-0 sos report

Description of problem:
During installation of OpenShift on OpenStack, three load balancers went into an ERROR state:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address    | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
| c3777e3b-16a8-437a-a3c3-85cce1f9a24c | openshift-ansible-openshift.example.com-api-lb | 4233e63b718e45238317316947bbe7fe | 172.30.0.1     | ERROR               | octavia  |
| c6d52d5e-b866-4826-bb0e-9d931a8c04d8 | openshift-web-console/webconsole               | 4233e63b718e45238317316947bbe7fe | 172.30.184.224 | ERROR               | octavia  |
| edf39e7f-abe1-49e4-a8bb-fce4730a6b1b | default/registry-console                       | 4233e63b718e45238317316947bbe7fe | 172.30.97.20   | ERROR               | octavia  |
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+

"openstack loadbalancer amphora list" was empty.
The deployment was done on Aug 16.

Worker log error:

2018-08-16 13:00:46.907 23 WARNING octavia.controller.worker.tasks.database_tasks [-] Reverting create amphora in DB for amp id 4c31b056-3d1c-4246-86d8-ce0eeff533a5
2018-08-16 13:00:46.919 23 WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-create-amphora-indb' (f32b1e1a-af35-41c2-a156-01782009165d) transitioned into state 'REVERTED' from state 'REVERTING'
2018-08-16 13:00:46.922 23 WARNING octavia.controller.worker.tasks.database_tasks [-] Reverting Amphora allocation for the load balancer 20036c95-e677-4c33-881f-d392b7a979a8 in the database.
2018-08-16 13:00:46.928 23 WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-get-amphora-for-lb-subflow-octavia-mapload-balancer-to-amphora' (12d51513-1f2e-45ef-ba49-7f5f0fc514a0) transitioned into state 'REVERTED' from state 'REVERTING'
2018-08-16 13:00:46.935 23 WARNING octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.lifecycle_tasks.LoadBalancerIDToErrorOnRevertTask' (dfe42b1d-4425-4829-ae13-d4f080e38c28) transitioned into state 'REVERTED' from state 'REVERTING'
2018-08-16 13:00:46.941 23 WARNING octavia.controller.worker.controller_worker [-] Flow 'octavia-create-loadbalancer-flow' (ac2f3f8b-b95c-4902-92e3-cc9e646af494) transitioned into state 'REVERTED' from state 'RUNNING'
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server [-] Exception during message handling: ComputeWaitTimeoutException: Waiting for compute to go active timeout.
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/octavia/controller/queue/endpoint.py", line 44, in create_load_balancer
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     self.worker.create_load_balancer(load_balancer_id)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/controller_worker.py", line 284, in create_load_balancer
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     create_lb_tf.run()
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 247, in run
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     for _state in self.run_iter(timeout=timeout):
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py", line 340, in run_iter
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     failure.Failure.reraise_if_any(er_failures)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 336, in reraise_if_any
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     failures[0].reraise()
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/types/failure.py", line 343, in reraise
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     six.reraise(*self._exc_info)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     result = task.execute(**arguments)
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 198, in execute
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server     raise exceptions.ComputeWaitTimeoutException()
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server ComputeWaitTimeoutException: Waiting for compute to go active timeout.
2018-08-16 13:00:46.941 23 ERROR oslo_messaging.rpc.server
2018-08-16 13:00:48.809 23 INFO octavia.controller.queue.endpoint [-] Deleting load balancer '20036c95-e677-4c33-881f-d392b7a979a8'...
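The ComputeWaitTimeoutException in the traceback above comes from the worker polling the compute service until the amphora VM goes ACTIVE, then giving up after a fixed budget. This is a minimal, simplified sketch of that behavior, not Octavia's actual implementation; the callback name and the retry/interval defaults here are illustrative assumptions:

```python
import time


class ComputeWaitTimeoutException(Exception):
    """Raised when the amphora VM never reaches ACTIVE within the budget."""


def wait_for_amphora_active(get_status, retries=25, wait_interval=10,
                            sleep=time.sleep):
    """Poll a compute-status callback until it reports ACTIVE.

    If the VM is deleted or stuck in BUILD for longer than
    retries * wait_interval seconds, the wait raises and the
    create-loadbalancer flow is reverted, as in the log above.
    """
    for _ in range(retries):
        status = get_status()
        if status == "ACTIVE":
            return status
        sleep(wait_interval)
    raise ComputeWaitTimeoutException(
        "Waiting for compute to go active timeout.")
```

With a slow hypervisor (here, 149 seconds to spawn), a budget shorter than the spawn time guarantees this revert even though the VM would eventually have come up.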

Tracking the compute logs with the instance id we took from the worker logs, we saw the following in the compute log:

[root@compute-0 nova]# zgrep -i 1ad6b6d5-8940-4097-8222-3008d328ae89 *
nova-compute.log.3.gz:2018-08-16 12:59:03.709 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Attempting claim on node compute-0.localdomain: memory 1024 MB, disk 3 GB, vcpus 1 CPU
nova-compute.log.3.gz:2018-08-16 12:59:03.710 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Total memory: 130946 MB, used: 8192.00 MB
nova-compute.log.3.gz:2018-08-16 12:59:03.710 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] memory limit not specified, defaulting to unlimited
nova-compute.log.3.gz:2018-08-16 12:59:03.710 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Total disk: 419 GB, used: 20.00 GB
nova-compute.log.3.gz:2018-08-16 12:59:03.710 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] disk limit not specified, defaulting to unlimited
nova-compute.log.3.gz:2018-08-16 12:59:03.711 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Total vcpu: 32 VCPU, used: 2.00 VCPU
nova-compute.log.3.gz:2018-08-16 12:59:03.711 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] vcpu limit not specified, defaulting to unlimited
nova-compute.log.3.gz:2018-08-16 12:59:03.712 1 INFO nova.compute.claims [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Claim successful on node compute-0.localdomain
nova-compute.log.3.gz:2018-08-16 12:59:05.134 1 INFO nova.virt.libvirt.driver [req-3d0172ce-a0fa-4d43-8a5e-5ea95eb72086 4b44a292f83d43a3a04d5bff3e8cd999 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Creating image
nova-compute.log.3.gz:2018-08-16 13:01:31.405 1 INFO nova.virt.libvirt.driver [req-c51e7a96-7b83-4b73-8cd9-a994fe19a4e8 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Using config drive
nova-compute.log.3.gz:2018-08-16 13:01:31.536 1 INFO nova.virt.libvirt.driver [req-c51e7a96-7b83-4b73-8cd9-a994fe19a4e8 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Creating config drive at /var/lib/nova/instances/1ad6b6d5-8940-4097-8222-3008d328ae89/disk.config
nova-compute.log.3.gz:2018-08-16 13:01:32.406 1 INFO nova.compute.manager [req-c51e7a96-7b83-4b73-8cd9-a994fe19a4e8 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] VM Started (Lifecycle Event)
nova-compute.log.3.gz:2018-08-16 13:01:32.450 1 INFO nova.compute.manager [req-c51e7a96-7b83-4b73-8cd9-a994fe19a4e8 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] VM Paused (Lifecycle Event)
nova-compute.log.3.gz:2018-08-16 13:01:32.535 1 INFO nova.compute.manager [req-c51e7a96-7b83-4b73-8cd9-a994fe19a4e8 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] During sync_power_state the instance has a pending task (deleting). Skip.
nova-compute.log.3.gz:2018-08-16 13:01:34.240 1 INFO nova.compute.manager [req-5aab9583-3e7d-40f6-ac30-7f565275cfe4 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] VM Resumed (Lifecycle Event)
nova-compute.log.3.gz:2018-08-16 13:01:34.245 1 INFO nova.virt.libvirt.driver [req-5aab9583-3e7d-40f6-ac30-7f565275cfe4 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Instance spawned successfully.
nova-compute.log.3.gz:2018-08-16 13:01:34.246 1 INFO nova.compute.manager [req-5aab9583-3e7d-40f6-ac30-7f565275cfe4 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Took 149.11 seconds to spawn the instance on the hypervisor.
nova-compute.log.3.gz:2018-08-16 13:01:34.331 1 INFO nova.compute.manager [req-5aab9583-3e7d-40f6-ac30-7f565275cfe4 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] During sync_power_state the instance has a pending task (deleting). Skip.
nova-compute.log.3.gz:2018-08-16 13:01:34.332 1 INFO nova.compute.manager [req-5aab9583-3e7d-40f6-ac30-7f565275cfe4 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] VM Resumed (Lifecycle Event)
nova-compute.log.3.gz:2018-08-16 13:01:34.419 1 INFO nova.compute.manager [req-5aab9583-3e7d-40f6-ac30-7f565275cfe4 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] During sync_power_state the instance has a pending task (deleting). Skip.
nova-compute.log.3.gz:2018-08-16 13:01:34.987 1 INFO nova.compute.manager [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Neutron deleted interface 25b8c937-8f8f-4591-beb4-f215871565c0; detaching it from the instance and deleting it from the info cache
nova-compute.log.3.gz:2018-08-16 13:01:35.014 1 INFO nova.compute.manager [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Took 0.67 seconds to deallocate network for instance.
nova-compute.log.3.gz:2018-08-16 13:01:35.144 1 INFO nova.scheduler.client.report [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] Deleted allocation for instance 1ad6b6d5-8940-4097-8222-3008d328ae89
nova-compute.log.3.gz:2018-08-16 13:01:35.147 1 INFO nova.compute.manager [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Terminating instance
nova-compute.log.3.gz:2018-08-16 13:01:35.361 1 INFO nova.virt.libvirt.driver [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Instance destroyed successfully.
nova-compute.log.3.gz:2018-08-16 13:01:35.388 1 INFO nova.virt.libvirt.driver [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Deleting instance files /var/lib/nova/instances/1ad6b6d5-8940-4097-8222-3008d328ae89_del
nova-compute.log.3.gz:2018-08-16 13:01:35.389 1 INFO nova.virt.libvirt.driver [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Deletion of /var/lib/nova/instances/1ad6b6d5-8940-4097-8222-3008d328ae89_del complete
nova-compute.log.3.gz:2018-08-16 13:01:35.476 1 INFO nova.compute.manager [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Took 0.33 seconds to destroy the instance on the hypervisor.
nova-compute.log.3.gz:2018-08-16 13:01:35.561 1 INFO nova.compute.manager [req-5444191c-03a0-4d7d-bb86-ba86d9e9b660 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] Took 0.08 seconds to deallocate network for instance.
nova-compute.log.3.gz:2018-08-16 13:01:50.358 1 INFO nova.compute.manager [req-6b4051a6-ca4a-4801-b090-73e5bd60b804 2921a2a310184d40a66c506212593846 5574ecf19dad4e63bf5ce82a93fbfc1f - default default] [instance: 1ad6b6d5-8940-4097-8222-3008d328ae89] VM Stopped (Lifecycle Event)

It seems the VM was marked for deletion before it became active: the worker timed out and reverted while the VM was still spawning, and the spawn itself took 149.11 seconds.
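Comparing the worker and nova timestamps makes the race visible: the resource claim succeeded at 12:59:03, the worker reverted the flow at 13:00:46, and the hypervisor only finished spawning at 13:01:34. A quick check of the arithmetic from the log lines above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
claim   = datetime.strptime("2018-08-16 12:59:03.709", FMT)  # nova claim succeeded
imaging = datetime.strptime("2018-08-16 12:59:05.134", FMT)  # "Creating image"
gave_up = datetime.strptime("2018-08-16 13:00:46.941", FMT)  # worker reverted the flow
spawned = datetime.strptime("2018-08-16 13:01:34.246", FMT)  # "Instance spawned successfully"

print((gave_up - claim).total_seconds())    # 103.232 s: Octavia stopped waiting here
print((spawned - imaging).total_seconds())  # 149.112 s, matching nova's "149.11 seconds"
```

So the worker's wait budget expired roughly 46 seconds before the VM came up, and the resulting delete raced the spawn ("pending task (deleting)" in the nova log).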

sos reports: the compute-0 sos report is attached; see comments for the controller-0 sos report.

Version-Release number of selected component (if applicable):
puppet-octavia-12.4.0-2.el7ost.noarch
python2-octaviaclient-1.4.0-1.el7ost.noarch
octavia-amphora-image-x86_64-13.0-20180808.1.el7ost.noarch
openstack-octavia-health-manager-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-common-2.0.1-6.d137eaagit.el7ost.noarch
python-octavia-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-api-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-housekeeping-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-worker-2.0.1-6.d137eaagit.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Deploy OpenStack 13
2. Run openshift-ansible 
3. 

Actual results:
Playbook failed - Pods got into an error state
OpenStack LB failed to start

Expected results:
Playbook passed - pods are created successfully
OpenStack LB created successfully

Additional info:
(overcloud) [stack@undercloud-0 ~]$ openstack loadbalancer list
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
| id                                   | name                                           | project_id                       | vip_address    | provisioning_status | provider |
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+
| c3777e3b-16a8-437a-a3c3-85cce1f9a24c | openshift-ansible-openshift.example.com-api-lb | 4233e63b718e45238317316947bbe7fe | 172.30.0.1     | ERROR               | octavia  |
| c6d52d5e-b866-4826-bb0e-9d931a8c04d8 | openshift-web-console/webconsole               | 4233e63b718e45238317316947bbe7fe | 172.30.184.224 | ERROR               | octavia  |
| edf39e7f-abe1-49e4-a8bb-fce4730a6b1b | default/registry-console                       | 4233e63b718e45238317316947bbe7fe | 172.30.97.20   | ERROR               | octavia  |
+--------------------------------------+------------------------------------------------+----------------------------------+----------------+---------------------+----------+


(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------------------------------------+--------+------------------------------------------------------------------------+----------+-----------+
| ID                                   | Name                               | Status | Networks                                                               | Image    | Flavor    |
+--------------------------------------+------------------------------------+--------+------------------------------------------------------------------------+----------+-----------+
| 0bb03f42-5325-4128-933c-1dbaeccb6c0e | infra-node-0.openshift.example.com | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.6, 10.46.22.170 | rhel-7.5 | m1.node   |
| a3771084-23a4-4f6c-aac2-01ce0f756e7e | master-0.openshift.example.com     | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.7, 10.46.22.164 | rhel-7.5 | m1.master |
| a70e5b60-86a0-48c0-bd8a-9575e92b4322 | app-node-1.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.8, 10.46.22.162 | rhel-7.5 | m1.node   |
| 2fd257ac-e584-4dd4-98aa-064f272cf82e | app-node-0.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.4, 10.46.22.174 | rhel-7.5 | m1.node   |
| 2f545aa3-6453-4a79-aaed-ba0c6d04ae3c | openshift-dns                      | ACTIVE | openshift-dns=192.168.23.10, 10.46.22.163                              |          | m1.small  |
| 7c1c023c-76d1-4201-b410-eef8c8539e23 | ansible_host-0                     | ACTIVE | private_openshift=172.16.40.4, 10.46.22.171                            | rhel-7.5 | m1.small  |
+--------------------------------------+------------------------------------+--------+------------------------------------------------------------------------+----------+-----------+



(overcloud) [stack@undercloud-0 ~]$ openstack network list
+--------------------------------------+-----------------------------------------------------+--------------------------------------+
| ID                                   | Name                                                | Subnets                              |
+--------------------------------------+-----------------------------------------------------+--------------------------------------+
| 03bfb687-01e5-4b9b-ac1d-24e4e729773a | openshift-dns                                       | 5f924684-c52f-462a-8a17-e518f57a8d9d |
| 4800aa94-fa1d-4926-a739-1902a2809840 | lb-mgmt-net                                         | d0df698a-04ca-4871-8d9d-77b6258ecf22 |
| 53b00d39-ed2d-4e45-a135-cb6a350b9714 | openshift-ansible-openshift.example.com-net         | 8ae35b74-8b2e-4dc8-aed0-ac272b4f1771 |
| 645679a4-251a-4899-905a-89afbc61908e | openshift-ansible-openshift.example.com-pod-net     | 11c5fcc7-6ebf-410f-926d-7a81ded93013 |
| 807f6adb-33ef-4332-98ed-9379036601d6 | openshift-ansible-openshift.example.com-service-net | d6e2f6a5-90f8-46f3-ad4e-06ea69120239 |
| a616c074-cddf-4a16-a197-86517ba814fc | public                                              | 46e1e3b3-8514-4ac9-b8e4-d0508820c0cf |
| d9238d4e-8dc3-4936-950d-e4ababf123b7 | private_openshift                                   | 1d476bd0-be6b-48d9-851a-86c6b73e7230 |
+--------------------------------------+-----------------------------------------------------+--------------------------------------+

Comment 6 Carlos Goncalves 2018-08-22 14:16:51 UTC
The Octavia log indicates issues coming from the compute service. Looking further, the nova compute log shows errors connecting to the MySQL server, as well as ECONNREFUSED when trying to reach the AMQP server.

Please make sure compute node is fully operational and retry creation of load balancers.

Comment 7 Udi Shkalim 2018-08-23 08:01:39 UTC
The errors you see can happen during startup of the cluster when the amqp and db services are still down.

I've just booted an instance (the first one) and all compute services reported status OK:
(overcloud) [stack@undercloud-0 ~]$ openstack server list
+--------------------------------------+------------------------------------+--------+-------------------------------------------------------------------------+----------+-----------+
| ID                                   | Name                               | Status | Networks                                                                | Image    | Flavor    |
+--------------------------------------+------------------------------------+--------+-------------------------------------------------------------------------+----------+-----------+
| 504fbed2-7377-417d-8188-8ff2f3facaef | test-for-octavia-bug               | ACTIVE | private_openshift=172.16.40.6                                           | rhel-7.5 | m1.node   |
| ed1d7dd1-13dc-417b-b3e7-9628013a8865 | master-0.openshift.example.com     | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.14, 10.46.22.162 | rhel-7.5 | m1.master |
| c108d532-82dc-494a-a752-08df53adac39 | app-node-0.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.10, 10.46.22.175 | rhel-7.5 | m1.node   |
| ef29c4d2-5f29-4412-923e-ccc38c0ba61d | app-node-1.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.12, 10.46.22.174 | rhel-7.5 | m1.node   |
| b21a657c-0352-4f87-b085-7dd1bf4b189e | infra-node-0.openshift.example.com | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.15, 10.46.22.167 | rhel-7.5 | m1.node   |
| a1ec1bd6-48ca-43ae-a912-e6649bdc47b3 | ansible_host-0                     | ACTIVE | private_openshift=172.16.40.4, 10.46.22.168                             | rhel-7.5 | m1.small  |
| f8617c5f-3403-4679-a551-1a3983cb04db | openshift-dns                      | ACTIVE | openshift-dns=192.168.23.5, 10.46.22.161                                |          |           |
+--------------------------------------+------------------------------------+--------+-------------------------------------------------------------------------+----------+-----------+


(overcloud) [stack@undercloud-0 ~]$ openstack compute service list
+----+------------------+--------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                     | Zone     | Status  | State | Updated At                 |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+
|  1 | nova-scheduler   | controller-0.localdomain | internal | enabled | up    | 2018-08-23T07:52:12.000000 |
|  2 | nova-consoleauth | controller-0.localdomain | internal | enabled | up    | 2018-08-23T07:52:12.000000 |
|  3 | nova-conductor   | controller-0.localdomain | internal | enabled | up    | 2018-08-23T07:52:13.000000 |
|  5 | nova-compute     | compute-0.localdomain    | nova     | enabled | up    | 2018-08-23T07:52:10.000000 |
+----+------------------+--------------------------+----------+---------+-------+----------------------------+

Comment 8 Carlos Goncalves 2018-09-08 19:44:01 UTC
(In reply to Udi Shkalim from comment #7)
> The errors you see can happen during startup of the cluster when the amqp
> and db services are still down.

The cluster was rebooted? How do you define "cluster" (OSP cluster or OCP cluster)?

The controller logs seem to have been truncated to the last lines. Would it be possible to share untruncated ones, please?

This rhbz could be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1623146.

Comment 9 Udi Shkalim 2018-09-12 13:08:14 UTC
(In reply to Carlos Goncalves from comment #8)
> (In reply to Udi Shkalim from comment #7)
> > The errors you see can happen during startup of the cluster when the amqp
> > and db services are still down.
> 
> The cluster was rebooted? How do you define "cluster" (OSP cluster or OSP
> cluster)?
Cluster as OpenStack Pacemaker Cluster. No, it was not rebooted.
> 
> The controller logs seem have been truncated to last lines. Would it be
> possible share untruncated ones please?
So you are saying that we have a bug in the sosreport? The sosreport should include _all_ logs so we can free the setup. I'm sure you understand that keeping a setup till the bug is resolved is not an option.
> This rhbz could be a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=1623146.
Could be. We found out that there was an issue with the neutron firewall driver, and now we no longer have a problem deploying LBs.

Comment 10 Carlos Goncalves 2018-09-16 15:02:16 UTC
(In reply to Udi Shkalim from comment #9)
> (In reply to Carlos Goncalves from comment #8)
> > This rhbz could be a duplicate of
> > https://bugzilla.redhat.com/show_bug.cgi?id=1623146.
> Could be. We found out that there was an issue with the neutron firewall
> driver and now we don't have a problem to deploy LB.

It could be that the firewall was preventing communication between the amphorae and the Octavia services. Please reopen if it turns out not to be a firewall or DB connection issue, and I'll be glad to revisit this rhbz.