Bug 2028164 - [FFWD 13->16.1] Can't spawn instance after second controller upgrade. Ovn fails with: "no response to inactivity probe after 60 seconds, disconnecting"
Keywords:
Status: CLOSED DUPLICATE of bug 1942449
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Jakub Libosvar
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-01 15:58 UTC by Jose Luis Franco
Modified: 2022-02-15 07:15 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-14 12:56:20 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker OSP-11135 - last updated 2021-12-01 16:04:29 UTC
Red Hat Issue Tracker UPG-4833 - last updated 2021-12-01 17:55:16 UTC

Description Jose Luis Franco 2021-12-01 15:58:51 UTC
Description of problem:

All upgrade jobs from OSP13 to OSP16.1 with OVN configured recently started failing when trying to spawn a VM after the second controller upgrade:

2021-12-01 04:38:50 | Error while executing command: No Network found for internal_net_a4c179d257
2021-12-01 04:38:50 | Creating overcloud instance instance_a4c179d257
2021-12-01 04:38:53 | Invalid nic argument 'net-id'. Nic arguments must be of the form --nic <net-id=net-uuid,v4-fixed-ip=ip-addr,v6-fixed-ip=ip-addr,port-id=port-uuid>.
2021-12-01 04:38:55 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:00 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:05 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:10 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:15 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:20 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:25 | No server with a name or ID of 'instance_a4c179d257' exists.
2021-12-01 04:39:30 | No server with a name or ID of 'instance_a4c179d257' exists.
Log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/179/undercloud-0/home/stack/overcloud_upgrade_run-ceph-1,controller-1,ceph-0,controller-0.log.gz

Looking at the neutron server log, we can see several connectivity issues around the same timestamp:

2021-12-01 04:38:18.423 32 DEBUG futurist.periodics [req-382946a2-d2f6-4fea-832c-d96656ae9ca3 - - - - -] Submitting periodic callback 'networking_ovn.common.maintenance.HashRingHealthCheckPeriodics.touch_hash_ring_nodes' _process_scheduled /usr/lib/python3.6/site-packages/futurist/periodics.py:639
2021-12-01 04:38:20.234 28 INFO neutron.wsgi [req-deac2cc3-1a07-4630-b290-faf417309443 90c4b664775a48da9f426459539574d6 fec58c39bb2f4ee3964ce323f6e0efbd - default default] 172.17.1.74,::1 "OPTIONS / HTTP/1.1" status: 200  len: 252 time: 0.0107551
2021-12-01 04:38:20.240 28 INFO neutron.wsgi [req-c38771ce-1542-4cde-9bdf-638348fbb9f7 90c4b664775a48da9f426459539574d6 fec58c39bb2f4ee3964ce323f6e0efbd - default default] 172.17.1.50,::1 "OPTIONS / HTTP/1.1" status: 200  len: 252 time: 0.0051837
2021-12-01 04:38:22.250 26 INFO neutron.wsgi [req-51fb3f0d-583c-44aa-8648-67da47fd18cb 90c4b664775a48da9f426459539574d6 fec58c39bb2f4ee3964ce323f6e0efbd - default default] 172.17.1.50,::1 "OPTIONS / HTTP/1.1" status: 200  len: 252 time: 0.0061150
2021-12-01 04:38:22.253 28 INFO neutron.wsgi [req-6cd4c329-f202-4bf9-a097-6d94b6397704 90c4b664775a48da9f426459539574d6 fec58c39bb2f4ee3964ce323f6e0efbd - default default] 172.17.1.74,::1 "OPTIONS / HTTP/1.1" status: 200  len: 252 time: 0.0099258
2021-12-01 04:38:24.267 29 INFO neutron.wsgi [req-b590641b-52d8-4655-b351-f781ea7da679 f174c325fb04443fb4783ce8c7009f41 8923a209f40e46e7b7bae9b843fa5e29 - default default] 172.17.1.74,::1 "OPTIONS / HTTP/1.1" status: 200  len: 252 time: 0.0088241
2021-12-01 04:38:24.267 28 INFO neutron.wsgi [req-abfa3c15-1ee0-476c-b905-328c85e87d6f 90c4b664775a48da9f426459539574d6 fec58c39bb2f4ee3964ce323f6e0efbd - default default] 172.17.1.50,::1 "OPTIONS / HTTP/1.1" status: 200  len: 252 time: 0.0102952
2021-12-01 04:38:24.597 32 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.598 32 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.605 28 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.606 28 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.608 30 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.608 26 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.609 30 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.609 26 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.610 31 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.611 31 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.613 31 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.613 30 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.613 31 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:24.614 30 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:24.615 29 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.616 29 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.616 26 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.617 26 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:24.618 29 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.618 29 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:24.621 27 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.622 27 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.624 27 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.624 27 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:24.626 28 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.626 28 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:24.643 33 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.644 33 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connection dropped
2021-12-01 04:38:24.646 33 ERROR ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: no response to inactivity probe after 60 seconds, disconnecting
2021-12-01 04:38:24.647 33 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connection dropped
2021-12-01 04:38:25.602 32 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.609 32 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.611 28 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.612 30 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.614 26 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.614 31 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.618 26 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connecting...
2021-12-01 04:38:25.618 30 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connecting...
2021-12-01 04:38:25.618 31 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connecting...
2021-12-01 04:38:25.620 26 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.622 28 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.625 29 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.626 29 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connecting...
2021-12-01 04:38:25.626 27 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.628 27 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connecting...
2021-12-01 04:38:25.629 31 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.629 28 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connecting...
2021-12-01 04:38:25.631 30 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.632 29 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.632 26 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connected
2021-12-01 04:38:25.633 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
2021-12-01 04:38:25.633 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=2 command(idx=0): LsAddCommand(switch=neutron-76ad301f-c782-48e6-9607-dbe32da25b12, columns={'external_ids': {'neutron:network_name': 'internal_net_a4c179d257', 'neutron:mtu': '1442', 'neutron:revision_number': '1'}, 'other_config': {'mcast_snoop': 'true', 'mcast_flood_unregistered': 'false', 'vlan-passthru': 'false'}}, may_exist=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2021-12-01 04:38:25.635 31 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connected
2021-12-01 04:38:25.635 30 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connected
2021-12-01 04:38:25.636 27 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connected
2021-12-01 04:38:25.639 27 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connected
2021-12-01 04:38:25.641 29 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connected
2021-12-01 04:38:25.647 28 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6642: connected
2021-12-01 04:38:25.648 33 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:172.17.1.129:6641: connecting...
2021-12-01 04:38:25.647 29 ERROR ovsdbapp.backend.ovs_idl.transaction [req-1fa52d2f-43bb-4426-a8fc-8c5e9f21cf96 f174c325fb04443fb4783ce8c7009f41 8923a209f40e46e7b7bae9b843fa5e29 - default default] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 123, in do_commit
    self.post_commit(txn)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 70, in post_commit
    command.post_commit(txn)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 90, in post_commit
    row = self.api.tables[self.table_name].rows[real_uuid]
  File "/usr/lib64/python3.6/collections/__init__.py", line 991, in __getitem__
    raise KeyError(key)
KeyError: UUID('efd81156-c1fb-4f53-ac28-0c80927a59be')

Logs: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/179/controller-0/var/log/containers/neutron/server.log.gz
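
The traceback above fails on the lookup 'row = self.api.tables[self.table_name].rows[real_uuid]': the row the transaction just created is missing from the client-side replica of the OVN database right after the reconnect. A minimal, self-contained sketch of that failure mode (plain Python standing in for the ovsdbapp IDL cache; not the real implementation):

# Simplified illustration of why the KeyError appears: the IDL keeps an
# in-memory replica of the OVN NB tables, and post_commit() resolves the row
# it just inserted by UUID in that replica. If the connection is dropped and
# re-established while the transaction is retried (TRY_AGAIN above), the
# replica can momentarily lack the row, so the UUID lookup fails.
import uuid


class FakeTable(object):
    """Stands in for the IDL's local copy of an OVSDB table."""

    def __init__(self):
        self.rows = {}          # uuid -> row, like the IDL's Table.rows


class FakeIdl(object):
    """Stands in for the client-side replica that is rebuilt on reconnect."""

    def __init__(self):
        self.tables = {'Logical_Switch': FakeTable()}

    def reconnect(self):
        # On "connection dropped" + "connecting..." the replica is flushed
        # and repopulated from the server; until that happens, recently
        # inserted rows are missing from the local cache.
        self.tables = {'Logical_Switch': FakeTable()}


def post_commit(idl, table_name, real_uuid):
    # Mirrors the failing line in the traceback:
    #   row = self.api.tables[self.table_name].rows[real_uuid]
    return idl.tables[table_name].rows[real_uuid]


idl = FakeIdl()
switch_uuid = uuid.uuid4()
idl.tables['Logical_Switch'].rows[switch_uuid] = {'name': 'neutron-76ad301f'}

# Normal case: the row is found after commit.
print(post_commit(idl, 'Logical_Switch', switch_uuid))

# A reconnect between the commit and post_commit raises KeyError, as in the
# neutron server log above.
idl.reconnect()
try:
    post_commit(idl, 'Logical_Switch', switch_uuid)
except KeyError as exc:
    print('KeyError, as seen in the traceback: %s' % exc)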

No changes were made to the test automation; only the source RHEL version changed.
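
For reference, the "no response to inactivity probe after 60 seconds" messages are logged by the neutron workers' own OVSDB IDL connections to the OVN NB (6641) and SB (6642) databases, i.e. the client-side probe. A hedged diagnostic sketch, assuming ovn-nbctl/ovn-sbctl are available on the node hosting the OVN databases (for example inside the OVN DB containers), to print the server-side inactivity_probe configured on the Connection rows:

# Hedged diagnostic sketch (assumption: ovn-nbctl/ovn-sbctl are on PATH where
# the OVN NB/SB databases run). It prints the server-side inactivity_probe on
# the NB/SB Connection rows; this is distinct from the 60-second client-side
# probe logged by the neutron workers above, but either side can drop an idle
# or overloaded session.
import subprocess

for tool in ('ovn-nbctl', 'ovn-sbctl'):
    proc = subprocess.run(
        [tool, 'list', 'Connection'],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print('--- %s ---' % tool)
    if proc.returncode != 0:
        print(proc.stderr.strip())
        continue
    for line in proc.stdout.splitlines():
        if 'inactivity_probe' in line or 'target' in line:
            print(line.strip())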

 
Version-Release number of selected component (if applicable):

OSP13:
- imagename: registry-proxy.engineering.redhat.com/rh-osbs/rhosp13-openstack-ovn-controller:20210908.1
- imagename: registry-proxy.engineering.redhat.com/rh-osbs/rhosp13-openstack-ovn-nb-db-server:20210908.1
- imagename: registry-proxy.engineering.redhat.com/rh-osbs/rhosp13-openstack-ovn-northd:20210908.1
- imagename: registry-proxy.engineering.redhat.com/rh-osbs/rhosp13-openstack-ovn-sb-db-server:20210908.1

OSP16.1:

undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-northd:16.1_20211111.1 
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-ovn-controller:16.1_20211111.1
undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-neutron-server-ovn:16.1_20211111.1


How reproducible:

Re-run CI job: https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/upgrades/view/ffu/job/DFG-upgrades-ffu-16.1-from-13-latest_cdn-3cont_2comp_3ceph_1ipa-ipv4-ovn_dvr/

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Flavio Leitner 2021-12-01 17:58:07 UTC
Hi,

Could you please attach the relevant OVS and OVSDB logs for us to review?
Thanks,
fbl

Comment 8 Sergii Golovatiuk 2021-12-14 12:56:20 UTC

*** This bug has been marked as a duplicate of bug 1942449 ***

