Bug 1264690 - neutron service is not running after a reboot
Summary: neutron service is not running after a reboot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: Hugh Brock
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-20 18:51 UTC by bigswitch
Modified: 2023-09-14 03:05 UTC
CC: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-28 07:46:22 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1263777 0 high CLOSED neutron agent-list on undercloud show service as down but its actually up on nova controller 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1266183 0 medium CLOSED neutron services is not running on one controller node 2021-02-22 00:41:40 UTC

Internal Links: 1263777 1266183

Description bigswitch 2015-09-20 18:51:21 UTC
Description of problem:
Related to bugzilla 1264688. After all three controller nodes were rebooted and rabbitmq was running, I noticed that 'ip netns' shows no namespaces on two of the controller nodes. There is supposed to be one DHCP agent per controller; it is running only on controller-0, and there with a different IP address.
10.1.1.2, 10.1.1.3 and 10.1.1.4 are the DHCP ports for network 0e4dc72c-343f-49e9-98cc-a77e9311c280.
According to Neutron, 10.1.1.3 is supposed to be running on controller-1, while 10.1.1.2 is supposed to be running on controller-0.
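
For reference, assuming the Kilo-era python-neutronclient used elsewhere in this report, the DHCP agents Neutron schedules for this network can be listed with:

neutron agent-list
neutron dhcp-agent-list-hosting-net 0e4dc72c-343f-49e9-98cc-a77e9311c280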

 neutron port-list
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                         |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
| 025443dd-f7af-4a5d-87e3-ae17158cd357 |      | fa:16:3e:35:3d:15 | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.3"}   |
| 064f2a04-a354-419d-be26-ad2791d0cdc1 |      | fa:16:3e:f9:09:86 | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.4"}   |
| 09c269a8-ff3f-4fe4-a399-225335d0868d |      | fa:16:3e:fe:2e:cc | {"subnet_id": "6e0923b6-24cb-4b8d-a8fc-b140f7eb89c2", "ip_address": "10.2.2.3"}   |
| 19605ab3-d95d-4f7a-941a-a1cefda65729 |      | fa:16:3e:77:15:e1 | {"subnet_id": "b1f7b878-4993-4fe9-8c7b-05ee021951a6", "ip_address": "10.1.2.3"}   |
| 20e5c70c-1a9b-4c35-81a4-c818e7fcf920 |      | fa:16:3e:cc:38:9f | {"subnet_id": "4d9a5df1-39b4-43e7-9c41-7fba6d22e732", "ip_address": "10.2.1.4"}   |
| 2f2f2e52-6f3e-43ab-b733-19852cce58d2 |      | fa:16:3e:66:c4:5c | {"subnet_id": "6e0923b6-24cb-4b8d-a8fc-b140f7eb89c2", "ip_address": "10.2.2.5"}   |
| 3b0a66f2-cf7b-4b2d-9740-13a29ef7ca18 |      | fa:16:3e:e1:19:ff | {"subnet_id": "b1f7b878-4993-4fe9-8c7b-05ee021951a6", "ip_address": "10.1.2.4"}   |
| 3d3d8521-4a2e-4fc5-bdbc-fde97783e2ba |      | fa:16:3e:c1:f7:97 | {"subnet_id": "c7ed0f3c-537c-483e-b9aa-43b4e4520f3f", "ip_address": "10.8.86.12"} |
| 47895e82-ee16-4428-b25a-5238c230340b |      | fa:16:3e:97:f9:36 | {"subnet_id": "4d9a5df1-39b4-43e7-9c41-7fba6d22e732", "ip_address": "10.2.1.6"}   |
| 68de1b0f-cddd-46b8-85ac-1b6e05533b88 |      | fa:16:3e:4e:0b:72 | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.2"}   |
| 7366ed70-22ff-4953-aaf2-dfa7c5ff448e |      | fa:16:3e:41:ef:b9 | {"subnet_id": "b1f7b878-4993-4fe9-8c7b-05ee021951a6", "ip_address": "10.1.2.1"}   |
| 73d1ef94-351b-4608-ac8a-7ce9aa0a9171 |      | fa:16:3e:bc:e8:65 | {"subnet_id": "6e0923b6-24cb-4b8d-a8fc-b140f7eb89c2", "ip_address": "10.2.2.4"}   |
| 7ddce143-004e-4f8e-8dc8-5b324cc4b8e1 |      | fa:16:3e:1f:1a:1d | {"subnet_id": "4d9a5df1-39b4-43e7-9c41-7fba6d22e732", "ip_address": "10.2.1.3"}   |
| 8208da39-af2f-4d95-aff4-5eade41bff90 |      | fa:16:3e:d1:7d:18 | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.6"}   |
| 8897f981-6909-4406-a6e4-d8eac5bcb6ea |      | fa:16:3e:a3:ab:c7 | {"subnet_id": "4d9a5df1-39b4-43e7-9c41-7fba6d22e732", "ip_address": "10.2.1.1"}   |
| 907f5282-f846-4dbe-a1ff-13c1ce918d40 |      | fa:16:3e:5a:f2:ed | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.1"}   |
| 9fc454e8-d14e-49ef-a395-681f53e350b5 |      | fa:16:3e:d4:33:d6 | {"subnet_id": "6e0923b6-24cb-4b8d-a8fc-b140f7eb89c2", "ip_address": "10.2.2.6"}   |
| a4682b7d-bd75-431c-9f41-e7d34d0910a3 |      | fa:16:3e:b3:4b:40 | {"subnet_id": "c7ed0f3c-537c-483e-b9aa-43b4e4520f3f", "ip_address": "10.8.86.11"} |
| a55b5227-c7e9-4be9-8ce8-5209551d0c08 |      | fa:16:3e:cd:27:9d | {"subnet_id": "b1f7b878-4993-4fe9-8c7b-05ee021951a6", "ip_address": "10.1.2.5"}   |
| a649f035-d534-48d8-8c5d-777983a8e40e |      | fa:16:3e:21:56:a8 | {"subnet_id": "b1f7b878-4993-4fe9-8c7b-05ee021951a6", "ip_address": "10.1.2.2"}   |
| ae3ea413-ecc7-479b-b9a0-b6fc3a11657b |      | fa:16:3e:37:63:9b | {"subnet_id": "6e0923b6-24cb-4b8d-a8fc-b140f7eb89c2", "ip_address": "10.2.2.1"}   |
| b4be4230-bb33-40cc-a3af-85e921b9d7e5 |      | fa:16:3e:43:87:a4 | {"subnet_id": "b1f7b878-4993-4fe9-8c7b-05ee021951a6", "ip_address": "10.1.2.6"}   |
| b56c3ca5-1748-4f09-a483-0a6c0b1fc3d2 |      | fa:16:3e:78:8c:40 | {"subnet_id": "c7ed0f3c-537c-483e-b9aa-43b4e4520f3f", "ip_address": "10.8.86.13"} |
| b5fab58f-527e-42e7-9189-3f3592f2b855 |      | fa:16:3e:0d:f5:2f | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.5"}   |
| bc0422c2-460d-4430-adb6-72b5cf10cbc3 |      | fa:16:3e:bc:20:9c | {"subnet_id": "4d9a5df1-39b4-43e7-9c41-7fba6d22e732", "ip_address": "10.2.1.5"}   |
| ddc56141-24a6-40b1-8012-011ddaef5cac |      | fa:16:3e:0c:7f:e0 | {"subnet_id": "4d9a5df1-39b4-43e7-9c41-7fba6d22e732", "ip_address": "10.2.1.2"}   |
| f6d8e59d-3ded-4de5-864f-16fd9a87ec2d |      | fa:16:3e:1f:e3:f0 | {"subnet_id": "6e0923b6-24cb-4b8d-a8fc-b140f7eb89c2", "ip_address": "10.2.2.2"}   |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------------+
[stack@c5220-01 ~]$ neutron port-show 025443dd-f7af-4a5d-87e3-ae17158cd357
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:host_id       | overcloud-controller-1.localdomain                                              |
| binding:profile       | {}                                                                              |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                  |
| binding:vif_type      | ovs                                                                             |
| binding:vnic_type     | normal                                                                          |
| device_id             | dhcp827da361-9c56-50f7-913f-5a01f7bfed2c-0e4dc72c-343f-49e9-98cc-a77e9311c280   |
| device_owner          | network:dhcp                                                                    |
| extra_dhcp_opts       |                                                                                 |
| fixed_ips             | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.3"} |
| id                    | 025443dd-f7af-4a5d-87e3-ae17158cd357                                            |
| mac_address           | fa:16:3e:35:3d:15                                                               |
| name                  |                                                                                 |
| network_id            | 0e4dc72c-343f-49e9-98cc-a77e9311c280                                            |
| security_groups       |                                                                                 |
| status                | ACTIVE                                                                          |
| tenant_id             | 07c3ac8ce96b4e938c4917df5b1f3ce9                                                |
+-----------------------+---------------------------------------------------------------------------------+
[stack@c5220-01 ~]$ neutron port-show 064f2a04-a354-419d-be26-ad2791d0cdc1
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:host_id       | overcloud-controller-2.localdomain                                              |
| binding:profile       | {}                                                                              |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                  |
| binding:vif_type      | ovs                                                                             |
| binding:vnic_type     | normal                                                                          |
| device_id             | reserved_dhcp_port                                                              |
| device_owner          | network:dhcp                                                                    |
| extra_dhcp_opts       |                                                                                 |
| fixed_ips             | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.4"} |
| id                    | 064f2a04-a354-419d-be26-ad2791d0cdc1                                            |
| mac_address           | fa:16:3e:f9:09:86                                                               |
| name                  |                                                                                 |
| network_id            | 0e4dc72c-343f-49e9-98cc-a77e9311c280                                            |
| security_groups       |                                                                                 |
| status                | ACTIVE                                                                          |
| tenant_id             | 07c3ac8ce96b4e938c4917df5b1f3ce9                                                |
+-----------------------+---------------------------------------------------------------------------------+
[stack@c5220-01 ~]$ neutron port-show 68de1b0f-cddd-46b8-85ac-1b6e05533b88
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:host_id       | overcloud-controller-0.localdomain                                              |
| binding:profile       | {}                                                                              |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                  |
| binding:vif_type      | ovs                                                                             |
| binding:vnic_type     | normal                                                                          |
| device_id             | reserved_dhcp_port                                                              |
| device_owner          | network:dhcp                                                                    |
| extra_dhcp_opts       |                                                                                 |
| fixed_ips             | {"subnet_id": "15294705-732c-44c2-a416-5b6bcfbc210f", "ip_address": "10.1.1.2"} |
| id                    | 68de1b0f-cddd-46b8-85ac-1b6e05533b88                                            |
| mac_address           | fa:16:3e:4e:0b:72                                                               |
| name                  |                                                                                 |
| network_id            | 0e4dc72c-343f-49e9-98cc-a77e9311c280                                            |
| security_groups       |                                                                                 |
| status                | ACTIVE                                                                          |
| tenant_id             | 07c3ac8ce96b4e938c4917df5b1f3ce9                                                |
+-----------------------+---------------------------------------------------------------------------------+
[stack@c5220-01 ~]$

On controller-0, the namespace interface is using 10.1.1.3 instead of the expected 10.1.1.2:

ip netns show
qrouter-b916dadb-673a-4660-8773-311df803e25a
qrouter-5681857e-d21c-4e7b-8bd0-346984816dfb
qdhcp-5190f71b-d2e7-4815-a7d4-7d7c021bd76d
qdhcp-4a101684-84aa-405a-8559-d200d6a377e4
qdhcp-7356dbf7-f909-4fb5-9c64-dc6ce06c9193
qdhcp-0e4dc72c-343f-49e9-98cc-a77e9311c280
[root@overcloud-controller-0 heat-admin]# ip netns exec qdhcp-0e4dc72c-343f-49e9-98cc-a77e9311c280 ifconfig
ns-025443dd-f7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.1.3  netmask 255.255.255.0  broadcast 10.1.1.255
        inet6 fe80::f816:3eff:fe35:3d15  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:35:3d:15  txqueuelen 1000  (Ethernet)
        RX packets 1710  bytes 102750 (100.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11  bytes 774 (774.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@overcloud-controller-0 heat-admin]#

On the other two controllers:
[root@overcloud-controller-1 heat-admin]# sudo ip netns show
[root@overcloud-controller-1 heat-admin]#

[root@overcloud-controller-2 heat-admin]# sudo ip netns show
[root@overcloud-controller-2 heat-admin]#



Version-Release number of selected component (if applicable):


How reproducible:
Seen once; did not attempt to reproduce.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
sosreports from all three controllers are at

https://bigswitch.box.com/s/05sukvk0jzi5g5rsroyilpq583uy64m6

Comment 2 bigswitch 2015-09-20 19:14:18 UTC
Changed the title/description: the neutron server is not running on two controller nodes. After starting those services, 'ip netns' shows the namespaces;
however, the IPs are still mixed up.

on controller0
[root@overcloud-controller-0 heat-admin]# ip netns exec qdhcp-0e4dc72c-343f-49e9-98cc-a77e9311c280 ifconfig
ns-025443dd-f7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.1.3  netmask 255.255.255.0  broadcast 10.1.1.255
        inet6 fe80::f816:3eff:fe35:3d15  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:35:3d:15  txqueuelen 1000  (Ethernet)
        RX packets 2370  bytes 142350 (139.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11  bytes 774 (774.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@overcloud-controller-0 heat-admin]#

on controller1
[root@overcloud-controller-1 heat-admin]#  ip netns exec qdhcp-0e4dc72c-343f-49e9-98cc-a77e9311c280 ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ns-064f2a04-a3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.1.4  netmask 255.255.255.0  broadcast 10.1.1.255
        inet6 fe80::f816:3eff:fef9:986  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:f9:09:86  txqueuelen 1000  (Ethernet)
        RX packets 107  bytes 6588 (6.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 107  bytes 4806 (4.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@overcloud-controller-1 heat-admin]#

on controller2
[root@overcloud-controller-2 heat-admin]#  ip netns exec qdhcp-0e4dc72c-343f-49e9-98cc-a77e9311c280 ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ns-68de1b0f-cd: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.1.1.2  netmask 255.255.255.0  broadcast 10.1.1.255
        inet6 fe80::f816:3eff:fe4e:b72  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:4e:0b:72  txqueuelen 1000  (Ethernet)
        RX packets 106  bytes 6528 (6.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 648 (648.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@overcloud-controller-2 heat-admin]#

Comment 3 Mike Orazi 2015-09-24 17:42:54 UTC
What was triggering the reboot of all nodes?

Do we have any status information from pacemaker during this interaction?

Comment 4 Jason Guiditta 2015-09-24 18:16:04 UTC
Some questions. First, if this is related to the BZ you list above, is it the same setup with a later problem? If so, there could be additional complications with neutron due to the rabbit failure. Next, how did you restart the services after the reboot: 'pcs resource cleanup {name}'? If not, what did you use? Can you give us the output of 'pcs status' and 'pcs constraint', as well as the crm_report?

Comment 5 bigswitch 2015-09-25 16:34:09 UTC
I attempted to recover the setup by rebooting all three controller nodes after bugzilla 1264688.
It is the same setup with a later problem. I did a 'systemctl start neutron-server' on the controller nodes.

Comment 6 Mike Burns 2015-09-28 17:18:01 UTC
Re-adding needinfo.  

Can you provide the pcs status and pcs constraint output?

Also, just a note: systemctl should not be used. You should be using 'pcs resource cleanup <name>' and then 'pcs resource start <name>'.
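
For example, a recovery sequence along these lines (the resource name is taken from the 'pcs resource show' output in comment 11 below):

sudo pcs resource cleanup neutron-server-clone
sudo pcs status

The cleanup clears the recorded failure state so pacemaker can schedule the resource to start again.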

Comment 8 Andrew Beekhof 2015-09-29 22:20:58 UTC
Sep 20 14:17:03 overcloud-controller-1 pengine[3163]: warning: unpack_rsc_op_failure: Processing failed op stop for openstack-cinder-volume on overcloud-controller-2: OCF_TIMEOUT (198)
Sep 20 14:17:03 overcloud-controller-1 pengine[3163]: warning: unpack_rsc_op_failure: Processing failed op stop for openstack-cinder-volume on overcloud-controller-2: OCF_TIMEOUT (198)


Failed stops with no fencing == unsupportable

Comment 9 Andrew Beekhof 2015-09-30 03:00:09 UTC
Whatever else is going on, there are a bunch of missing constraints as covered in bug #1257414

Comment 10 Fabio Massimo Di Nitto 2015-09-30 03:02:16 UTC
(In reply to Andrew Beekhof from comment #9)
> Whatever else is going on, there are a bunch of missing constraints as
> covered in bug #1257414

Andrew, thanks for checking.

The root problems are:

1) missing constraints, as described in bug #1257414
2) missing fencing configuration

Problem 1 specifically will break any stop/restart actions. You can apply those fixes manually as described in that bugzilla; they will also be part of the new OSPd update/release.
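
As an illustration only (the full constraint set is defined in bug #1257414; resource names as in comment 11), a missing start-order dependency can be added with:

sudo pcs constraint order start rabbitmq-clone then neutron-server-clone

Order constraints are symmetrical by default, so this also makes neutron-server stop before rabbitmq on shutdown.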

Comment 11 bigswitch 2015-09-30 22:40:44 UTC
Did the following steps:

[heat-admin@overcloud-controller-1 ~]$ sudo pcs resource cleanup
All resources/stonith devices successfully cleaned up

Restarted neutron-l3-agent:

[heat-admin@overcloud-controller-1 ~]$ sudo systemctl status neutron-l3-agent
neutron-l3-agent.service - Cluster Controlled neutron-l3-agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-l3-agent.service; disabled)
  Drop-In: /run/systemd/system/neutron-l3-agent.service.d
           └─50-pacemaker.conf
   Active: failed (Result: exit-code) since Wed 2015-09-30 18:35:24 EDT; 1min 10s ago
 Main PID: 17916 (code=exited, status=1/FAILURE)

Sep 30 18:30:17 overcloud-controller-1.localdomain systemd[1]: Started Cluster Controlled neutron-l3-agent.
Sep 30 18:35:24 overcloud-controller-1.localdomain systemd[1]: neutron-l3-agent.service: main process exited, code=exited, status=1/FAILURE
Sep 30 18:35:24 overcloud-controller-1.localdomain systemd[1]: Unit neutron-l3-agent.service entered failed state.
[heat-admin@overcloud-controller-1 ~]$ sudo su

[heat-admin@overcloud-controller-1 ~]$ sudo pcs resource show
 ip-172.17.0.11	(ocf::heartbeat:IPaddr2):	Started
 ip-192.0.2.12	(ocf::heartbeat:IPaddr2):	Started
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-172.18.0.10	(ocf::heartbeat:IPaddr2):	Started
 ip-10.17.66.11	(ocf::heartbeat:IPaddr2):	Started
 ip-172.17.0.10	(ocf::heartbeat:IPaddr2):	Started
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-172.19.0.10	(ocf::heartbeat:IPaddr2):	Started
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-2 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     openstack-nova-scheduler	(systemd:openstack-nova-scheduler):	FAILED
     Started: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Started: [ overcloud-controller-0 overcloud-controller-2 ]
     Stopped: [ overcloud-controller-1 ]
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-controller-0 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ overcloud-controller-1 overcloud-controller-2 ]
     Stopped: [ overcloud-controller-0 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     openstack-nova-api	(systemd:openstack-nova-api):	FAILED
     Stopped: [ overcloud-controller-0 overcloud-controller-1 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     openstack-nova-consoleauth	(systemd:openstack-nova-consoleauth):	FAILED
     openstack-nova-consoleauth	(systemd:openstack-nova-consoleauth):	FAILED
     openstack-nova-consoleauth	(systemd:openstack-nova-consoleauth):	FAILED
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     openstack-nova-novncproxy	(systemd:openstack-nova-novncproxy):	FAILED
     Stopped: [ overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: delay-clone [delay]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-server-clone [neutron-server]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: httpd-clone [httpd]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Stopped
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     openstack-nova-conductor	(systemd:openstack-nova-conductor):	FAILED
     Stopped: [ overcloud-controller-1 overcloud-controller-2 ]


2015-09-30 18:34:23.839 17916 DEBUG oslo_messaging._drivers.amqpdriver [req-102987a4-c9a1-4412-9975-32bf95a20daf ] MSG_ID is 5d8139a41dd740f5b6adbc6084480f2e _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:311
2015-09-30 18:34:23.839 17916 DEBUG oslo_messaging._drivers.amqp [req-102987a4-c9a1-4412-9975-32bf95a20daf ] UNIQUE_ID is f9c244d7ae21450d91817cec3cc05b1c. _add_unique_id /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqp.py:258
2015-09-30 18:35:18.646 17916 DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" acquired by "_check_child_processes" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:444
2015-09-30 18:35:18.646 17916 DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" released by "_check_child_processes" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:456
2015-09-30 18:35:24.510 17916 CRITICAL neutron [req-102987a4-c9a1-4412-9975-32bf95a20daf ] MessagingTimeout: Timed out waiting for a reply to message ID 5d8139a41dd740f5b6adbc6084480f2e
2015-09-30 18:35:24.510 17916 TRACE neutron Traceback (most recent call last):
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/bin/neutron-l3-agent", line 10, in <module>
2015-09-30 18:35:24.510 17916 TRACE neutron     sys.exit(main())
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/agents/l3.py", line 17, in main
2015-09-30 18:35:24.510 17916 TRACE neutron     l3_agent.main()
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/agent/l3_agent.py", line 53, in main
2015-09-30 18:35:24.510 17916 TRACE neutron     manager=manager)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/service.py", line 264, in create
2015-09-30 18:35:24.510 17916 TRACE neutron     periodic_fuzzy_delay=periodic_fuzzy_delay)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/service.py", line 197, in __init__
2015-09-30 18:35:24.510 17916 TRACE neutron     self.manager = manager_class(host=host, *args, **kwargs)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 548, in __init__
2015-09-30 18:35:24.510 17916 TRACE neutron     super(L3NATAgentWithStateReport, self).__init__(host=host, conf=conf)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 208, in __init__
2015-09-30 18:35:24.510 17916 TRACE neutron     continue
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-09-30 18:35:24.510 17916 TRACE neutron     six.reraise(self.type_, self.value, self.tb)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 188, in __init__
2015-09-30 18:35:24.510 17916 TRACE neutron     self.plugin_rpc.get_service_plugin_list(self.context))
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 124, in get_service_plugin_list
2015-09-30 18:35:24.510 17916 TRACE neutron     return cctxt.call(context, 'get_service_plugin_list')
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 156, in call
2015-09-30 18:35:24.510 17916 TRACE neutron     retry=self.retry)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
2015-09-30 18:35:24.510 17916 TRACE neutron     timeout=timeout, retry=retry)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2015-09-30 18:35:24.510 17916 TRACE neutron     retry=retry)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2015-09-30 18:35:24.510 17916 TRACE neutron     result = self._waiter.wait(msg_id, timeout)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2015-09-30 18:35:24.510 17916 TRACE neutron     message = self.waiters.get(msg_id, timeout=timeout)
2015-09-30 18:35:24.510 17916 TRACE neutron   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2015-09-30 18:35:24.510 17916 TRACE neutron     'to message ID %s' % msg_id)
2015-09-30 18:35:24.510 17916 TRACE neutron MessagingTimeout: Timed out waiting for a reply to message ID 5d8139a41dd740f5b6adbc6084480f2e
2015-09-30 18:35:24.510 17916 TRACE neutron

Comment 12 Fabio Massimo Di Nitto 2015-10-01 03:54:28 UTC
We still need you to manually apply the fixes from the bug mentioned in comment #10 and to configure fencing.

The message you see is potentially related to the fact that neutron (and other services) are missing a start-order dependency on rabbitmq. On restart, rabbitmq can be shut down before neutron has finished stopping.

Please apply the fixes, configure fencing, and clean up the failed services.
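
As a sketch only, fencing for one controller could be configured with fence_ipmilan; the device name, BMC address, and credentials below are placeholders, not values from this setup:

sudo pcs stonith create controller-0-ipmi fence_ipmilan \
    pcmk_host_list="overcloud-controller-0" \
    ipaddr="192.0.2.201" login="admin" passwd="changeme" lanplus="1"
sudo pcs property set stonith-enabled=true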

Comment 13 bigswitch 2015-10-13 18:29:24 UTC
Hi,

Does the 7.1 GA code have the fixes?

Thanks

Comment 14 Assaf Muller 2015-10-13 21:07:58 UTC
Comment 13.

Comment 15 Fabio Massimo Di Nitto 2015-10-14 04:45:45 UTC
(In reply to bigswitch from comment #13)
> Hi,
> 
> Does the 7.1 GA code have the fixes?
> 
> Thanks

you need to have:
  fence-agents-4.0.11-13.el7_1.1 (or greater)
  pacemaker-1.1.12-22.el7_1.4.x86_64 (or greater)
  resource-agents-3.9.5-40.el7_1.5.x86_64 (or greater)

Those updates shipped after 7.1 GA, but they are available in the normal update channels.
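
You can check the installed versions on each controller with, for example:

rpm -qa | grep -E 'fence-agents|pacemaker|resource-agents'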

Did you configure fencing and apply the fixes from comment #10?

Comment 16 Mike Burns 2015-10-16 00:33:15 UTC
Just a note, because I suspect the question was actually about OSPd 7.1, not RHEL 7.1.

In the OSPd 7.1 image:

pacemaker-1.1.12-22.el7_1.4
resource-agents-3.9.5-40.el7_1.9
fence-agents-4.0.11-13.el7_1.2

This meets or exceeds the minimums in comment 15.

Comment 17 Eran Kuris 2015-10-28 09:59:47 UTC
Saw the issue in OSPd and OSP 7 puddle: 2015-10-16.2.

Comment 20 Hugh Brock 2016-02-28 07:46:22 UTC
This should be fixed in OSP 7.3 and 8.0. Please re-open if you find the behavior again.

Comment 21 Red Hat Bugzilla 2023-09-14 03:05:35 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

