Description of problem:
After some time and an unknown trigger, router namespaces stop removing old floating IPs, and newly attached floating IPs are not added to the namespace. As a result, the VMs' floating IPs are not pingable.

Version-Release number of selected component (if applicable):
openstack-neutron-7.0.4-3.el7ost.noarch

How reproducible:
unknown

Steps to Reproduce:
1. unknown
2.
3.

Actual results:
Floating IPs are not pingable.

Expected results:
Floating IPs are pingable.

Additional info:

[stack@osstage-undercloud ~]$ nova list --all | grep cirros5
| 35381639-fd44-4767-a2d2-13426ea1dcac | cirros5 | af273396f30b45cdae05873f90867ceb | ACTIVE | - | Running | sr-kanata_net=192.168.110.33, 135.121.90.212 |

| 114a72fe-02ae-4b87-a2e7-70f962df0951 | sr-kanata_router | {"network_id": "ce4b20aa-5c61-49d0-80ab-03697c818822", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "1ff39be2-ab26-4a17-8598-26f632dc6caf", "ip_address": "135.121.90.221"}]} | False | True |

### The namespace isn't clearing out old floating IPs, and new IPs are not added:

[heat-admin@overcloud-controller-1 ~]$ sudo ip netns exec qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
55: ha-086c409c-62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:fd:19:b4 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-086c409c-62
       valid_lft forever preferred_lft forever
    inet 169.254.0.1/24 scope global ha-086c409c-62
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fefd:19b4/64 scope link
       valid_lft forever preferred_lft forever
56: qg-63602c60-a2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:d7:53:b0 brd ff:ff:ff:ff:ff:ff
    inet 135.121.90.221/23 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.241/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.242/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.207/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.187/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.190/32 brd 135.121.90.190 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fed7:53b0/64 scope link nodad
       valid_lft forever preferred_lft forever
58: qr-e63406e1-e7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:72:4e:a5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.110.1/24 scope global qr-e63406e1-e7
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe72:4ea5/64 scope link nodad
       valid_lft forever preferred_lft forever
[heat-admin@overcloud-controller-1 ~]$

### A service restart clears the issue.
### Namespace after l3-agent and neutron-server restart (note the new 135.121.90.213/32 on qg-63602c60-a2):

[heat-admin@overcloud-controller-1 ~]$ sudo ip netns exec qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
55: ha-086c409c-62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:fd:19:b4 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-086c409c-62
       valid_lft forever preferred_lft forever
    inet 169.254.0.1/24 scope global ha-086c409c-62
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fefd:19b4/64 scope link
       valid_lft forever preferred_lft forever
56: qg-63602c60-a2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:d7:53:b0 brd ff:ff:ff:ff:ff:ff
    inet 135.121.90.221/23 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.241/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.242/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.207/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.187/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.190/32 brd 135.121.90.190 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet 135.121.90.213/32 scope global qg-63602c60-a2
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fed7:53b0/64 scope link nodad
       valid_lft forever preferred_lft forever
58: qr-e63406e1-e7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8900 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:72:4e:a5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.110.1/24 scope global qr-e63406e1-e7
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe72:4ea5/64 scope link nodad
       valid_lft forever preferred_lft forever
[heat-admin@overcloud-controller-1 ~]$

From /var/log/neutron/l3-agent.log:

2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '114a72fe-02ae-4b87-a2e7-70f962df0951'
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 504, in _process_router_update
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     self._process_router_if_compatible(router)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 443, in _process_router_if_compatible
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     self._process_updated_router(router)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 457, in _process_updated_router
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     ri.process(self)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 376, in process
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     super(HaRouter, self).process(agent)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     self.logger(e)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 356, in call
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     return func(*args, **kwargs)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 702, in process
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     self._process_internal_ports(agent.pd)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 390, in _process_internal_ports
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     self.internal_network_added(p)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 274, in internal_network_added
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     self._disable_ipv6_addressing_on_interface(interface_name)
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 234, in _disable_ipv6_addressing_on_interface
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 216, in _should_delete_ipv6_lladdr
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent     if manager.get_process().active:
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'get_process'
2016-11-08 16:20:19.530 13191 ERROR neutron.agent.l3.agent
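For context on the traceback above: on an HA router, the keepalived manager only exists once the agent has seen the router's HA port, so a router payload that arrives without its HA interface leaves the manager unset, and the unguarded get_process() call then crashes on every retry. Below is a minimal, self-contained sketch of that failure mode. The stub classes and the guard are illustrative; only the method names and the '_ha_interface' key come from neutron itself, and this is not the shipped patch.

class ProcessStub(object):
    """Hypothetical stand-in for the keepalived process wrapper."""
    active = True


class KeepalivedManagerStub(object):
    """Hypothetical stand-in for neutron's KeepalivedManager."""
    def get_process(self):
        return ProcessStub()


class HaRouterSketch(object):
    def __init__(self, router):
        # In the real agent, the manager is only created when the HA port
        # ('_ha_interface' in the router dict) was delivered by
        # neutron-server. A partial sync leaves keepalived_manager as None.
        ha_port = router.get('_ha_interface')
        self.keepalived_manager = KeepalivedManagerStub() if ha_port else None

    def _should_delete_ipv6_lladdr(self, ipv6_lladdr):
        manager = self.keepalived_manager
        if manager is None:
            # Guard against the AttributeError seen in the log: with no HA
            # port data there is no keepalived to ask, so leave the address
            # alone and let a later full resync repair the router.
            return False
        # Unguarded logic from the traceback: only delete the lladdr while
        # keepalived is not yet active.
        return not manager.get_process().active


# The broken case from this bug: an HA router delivered with no HA interface.
router = HaRouterSketch({'ha': True})
print(router._should_delete_ipv6_lladdr('fe80::1/64'))  # False, no crash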
Hello,

I enabled debug logs and reproduced the issue; I am seeing MySQL deadlocks:

2016-11-09 19:15:43.880 17574 DEBUG neutron.agent.l3.agent [-] Sending floating ip statuses: {u'57538b3a-48a5-497e-94e6-fc36ee31720d': 'DOWN', u'f2d3d46c-faee-460c-b858-eaf23253324e': 'DOWN', u'74608000-7e41-4a3f-8b06-91cdb54a6e37': 'DOWN', u'f73c6ee8-1c43-483b-9b4c-ea9087e87db9': 'DOWN'} update_fip_statuses /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:377
2016-11-09 19:15:43.880 17574 DEBUG oslo_messaging._drivers.amqpdriver [-] MSG_ID is cbb6228848dc43d79e54c377753e5ef3 _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:392
2016-11-09 19:15:43.968 17574 ERROR neutron.agent.l3.router_info [-] Remote error: DBDeadlock (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'UPDATE floatingips SET status=%s WHERE floatingips.id = %s'] [parameters: (u'DOWN', u'f73c6ee8-1c43-483b-9b4c-ea9087e87db9')] [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply\n executor_callback))\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch\n executor_callback)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch\n result = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/site-packages/neutron/api/rpc/handlers/l3_rpc.py", line 219, in update_floatingip_statuses\n status)\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/l3_db.py", line 1045, in update_floatingip_status\n fip_query.update({\'status\': status}, synchronize_session=False)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2950, in update\n update_op.exec_()\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1106, in exec_\n self._do_exec()\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 1255, in _do_exec\n mapper=self.mapper)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1023, in execute\n bind, close_with_result=True).execute(clause, params or {})\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute\n return meth(self, multiparams, params)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection\n return connection._execute_clauseelement(self, multiparams, params)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement\n compiled_sql, distilled_params\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context\n context)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception\n util.raise_from_cause(newraise, exc_info)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause\n reraise(type(exception), exception, tb=exc_tb)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context\n context)\n', u' File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute\n cursor.execute(statement, parameters)\n', u' File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 146, in execute\n result = self._query(query)\n', u' File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 296, in _query\n conn.query(q)\n', u' File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 781, in query\n self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n', u' File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 942, in _read_query_result\n result.read()\n', u' File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1138, in read\n first_packet = self.connection._read_packet()\n', u' File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 906, in _read_packet\n packet.check_error()\n', u' File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 367, in check_error\n err.raise_mysql_exception(self._data)\n', u' File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 120, in raise_mysql_exception\n _check_mysql_exception(errinfo)\n', u' File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 115, in _check_mysql_exception\n raise InternalError(errno, errorvalue)\n', u"DBDeadlock: (pymysql.err.InternalError) (1213, u'Deadlock found when trying to get lock; try restarting transaction') [SQL: u'UPDATE floatingips SET status=%s WHERE floatingips.id = %s'] [parameters: (u'DOWN', u'f73c6ee8-1c43-483b-9b4c-ea9087e87db9')]\n"].
From a quick glance, this sounds like some kind of race condition in neutron-server causing incomplete data to be sent to the agents. That puts the agents into endless retry loops, unable to make progress (e.g. adding new FIPs). I'll look into the sosreport and report back with more findings.
It would seem that, for router_id=1e52bb8d-39d6-4057-9f4e-8d95a5734562 for example, a partial sync occurred such that the l3-agent on overcloud-controller-1 received an HA router with no HA interface for that agent. In short, there was a race condition somewhere and the agent received partial data. In such cases, the agent enters an endless loop trying to recover, but the server never sends the missing data. There are a few patches that should deal with this error. I'm going to hunt them down and make sure they are in rhos-8, and if they already are, determine in what version.
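To make the failure mode concrete, here is a hypothetical illustration, not necessarily what the eventual patches do, of the kind of completeness check the agent side needs: reject an HA router payload that arrived without its HA interface and requeue it for a full resync, rather than retrying the same broken payload forever. The '_ha_interface' key mirrors neutron's HA_INTERFACE_KEY constant; everything else is illustrative.

# Hypothetical completeness check for a router payload received over RPC.
HA_INTERFACE_KEY = '_ha_interface'  # mirrors neutron.common.constants


def ha_payload_is_complete(router):
    """Return False when a router marked HA arrived without its HA port."""
    return not (router.get('ha') and router.get(HA_INTERFACE_KEY) is None)


# The racy payload from this bug would be rejected up front:
print(ha_payload_is_complete({'id': '1e52bb8d', 'ha': True}))        # False
print(ha_payload_is_complete({'id': '1e52bb8d', 'ha': True,
                              HA_INTERFACE_KEY: {'id': 'ha-port'}}))  # True

An agent that detects the incomplete payload could then drop the update and schedule the router for a full resync instead of crash-looping on it.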
Hi,

These bugs have been fixed in openstack-neutron-7.1.1-4 (as part of bz#1360352). Can we upgrade them to a newer version and see if this solves their issue?

John.
Please see comment 5.
@Jeremy, can you please check the path of keepalived? Is it /usr/sbin/keepalived?
Yes:

[root@overcloud-controller-0 ~]# ls -l /usr/sbin/keepalived
-rwxr-xr-x. 1 root root 317992 Jul 1 2016 /usr/sbin/keepalived
[root@overcloud-controller-1 ~]# ls -l /usr/sbin/keepalived
-rwxr-xr-x. 1 root root 317992 Jul 1 2016 /usr/sbin/keepalived
[root@overcloud-controller-2 ~]# ls -l /usr/sbin/keepalived
-rwxr-xr-x. 1 root root 317992 Jul 1 2016 /usr/sbin/keepalived
Customer just saw a recurrence of this bug in the wild. It looks like it was triggered by clustered services restarting rabbitmq:

Feb 6 18:33:41 overcloud-controller-1 lrmd[13038]: warning: rabbitmq_monitor_10000 process (PID 23576) timed out
Feb 6 18:33:41 overcloud-controller-1 lrmd[13038]: warning: rabbitmq_monitor_10000:23576 - timed out after 40000ms
Feb 6 18:33:41 overcloud-controller-1 crmd[13041]: error: Operation rabbitmq_monitor_10000: Timed Out (node=overcloud-controller-1, call=2506, timeout=40000ms)
Feb 6 18:33:41 overcloud-controller-1 crmd[13041]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: warning: Processing failed op monitor for rabbitmq:2 on overcloud-controller-1: unknown error (1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:1 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: error: Internal error: No action found for demote in redis:2 (then)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Recover rabbitmq:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-nova-scheduler:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart neutron-l3-agent:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-ceilometer-alarm-notifier:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-heat-engine:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-ceilometer-api:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart neutron-metadata-agent:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-heat-api:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-cinder-scheduler:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-nova-api:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-heat-api-cloudwatch:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-ceilometer-collector:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-keystone:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-nova-consoleauth:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-glance-registry:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-ceilometer-notification:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-cinder-api:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart neutron-dhcp-agent:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-glance-api:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart neutron-openvswitch-agent:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-nova-novncproxy:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart delay:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart neutron-server:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-ceilometer-central:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-ceilometer-alarm-evaluator:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-heat-api-cfn:2#011(Started overcloud-controller-1)
Feb 6 18:33:42 overcloud-controller-1 pengine[13040]: notice: Restart openstack-nova-conductor:2#011(Started overcloud-controller-1)

The current error trace for the l3-agent is:

2017-02-06 18:34:18.539 26120 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', 'qr-e63406e1-e7'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:101
2017-02-06 18:34:18.544 26120 DEBUG neutron.agent.linux.utils [-] Command: ['ip', 'netns', 'exec', u'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', u'qr-e63406e1-e7'] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:156
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info     return func(*args, **kwargs)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info     self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info     self.internal_network_added(p)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info     self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info     if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info     if manager.get_process().active:
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info AttributeError: 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '114a72fe-02ae-4b87-a2e7-70f962df0951'
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 506, in _process_router_update
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     self._process_router_if_compatible(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 445, in _process_router_if_compatible
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     self._process_updated_router(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 459, in _process_updated_router
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     ri.process(self)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 377, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     super(HaRouter, self).process(agent)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 362, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     self.logger(e)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     six.reraise(self.type_, self.value, self.tb)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     return func(*args, **kwargs)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     self.internal_network_added(p)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent     if manager.get_process().active:
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent
2017-02-06 18:34:18.565 26120 DEBUG neutron.agent.l3.agent [-] Starting router update for 9a54a769-4a0e-487b-9eb1-3d9a1923f3fb, action None, priority 1 _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:472
Customer ran pcs resource restart neutron-server; this did not solve the problem. I then ran pcs resource restart neutron-netns-cleanup (which also restarts neutron-ovs-cleanup); now the only neutron processes running from 2016 are:

root      7125      1  0  2016 ?        00:00:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root      7126   7125  0  2016 ?        00:00:00 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root     24727      1  0  2016 ?        00:00:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root     24729  24727  0  2016 ?        00:00:00 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

This appears to have resolved our issues.
I will report this bug upstream and provide the fix. Meanwhile, can you please provide complete logs (OVS agent, L3 agent, and server logs)?
Upstream change https://review.openstack.org/#/c/431026/ submitted.
How should I verify this bz?
Is it solved at the customer site?
Customer is still waiting for a fix.
Build openstack-neutron-7.2.0-12.el7ost has the fix.
OK, thanks. When that package is shipped, I will notify the customer to apply the fix.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1742