Bug 1720947

Summary: osp15: After controller restart, ovn-controller containers are in Dead state
Product: Red Hat Enterprise Linux Fast Datapath
Component: openvswitch2.11
Version: FDP 19.F
Reporter: pkomarov
Assignee: Numan Siddique <nusiddiq>
QA Contact: haidong li <haili>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Keywords: AutomationBlocker, Regression, Triaged, ZStream
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: puppet-tripleo-10.5.1-0.20190812120435.ed6c6b0.el8ost
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2021-03-25 14:36:53 UTC
Clones: 1746120, 1746200, 1757254
Bug Blocks: 1726217, 1731269, 1740114, 1740115, 1757254
CC: afariasa, apevec, astupnik, atragler, camorris, chrisw, ctrautma, dalvarez, ekuris, fhallal, jhsiao, jishi, jjoyce, jlibosva, jschluet, ldenny, lhh, liali, lmartins, lmiccini, ltoscano, majopela, michele, nusiddiq, qding, ralongi, rhos-maint, rsafrono, sasha, scohen, slinaber, tfreger, tredaelli, tvignaud, twilson

Description pkomarov 2019-06-16 18:21:06 UTC
Description of problem:
After a controller restart, the ovn-controller containers are in a Dead state.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190523.n.1

How reproducible:
Rerun the following Jenkins job:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/pidone/view/sanity/job/DFG-pidone-sanity-15_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-sanity/

Steps to Reproduce:
1. Deploy OSP 15.
2. Restart a controller.
3. Observe that the OVN network agents are in a dead state.

Actual results:
(overcloud) [stack@undercloud-0 ~]$ openstack network agent list 
+--------------------------------------+----------------------+--------------------------+-------------------+-------+-------+-------------------------------+
| ID                                   | Agent Type           | Host                     | Availability Zone | Alive | State | Binary                        |
+--------------------------------------+----------------------+--------------------------+-------------------+-------+-------+-------------------------------+
| a70204c4-c5a6-4fa6-8927-39b5cb4392e1 | OVN Controller agent | controller-0.localdomain | n/a               | XXX   | UP    | ovn-controller                |
| 6a19511d-8bb2-4485-b338-41165481ebac | OVN Controller agent | compute-0.localdomain    | n/a               | XXX   | UP    | ovn-controller                |
| 51d095a9-3e19-4f04-bcbd-70e74f7e302e | OVN Metadata agent   | compute-0.localdomain    | n/a               | :-)   | UP    | networking-ovn-metadata-agent |
| 7ac54a99-2a0a-44a1-9a77-e48e8c81de87 | OVN Controller agent | controller-1.localdomain | n/a               | XXX   | UP    | ovn-controller                |
| f41e8e6a-e6a2-4a36-8736-32080a6cc8fd | OVN Controller agent | compute-1.localdomain    | n/a               | XXX   | UP    | ovn-controller                |
| e9bdb550-4cc6-4ab5-bfc9-55d68d0caa30 | OVN Metadata agent   | compute-1.localdomain    | n/a               | :-)   | UP    | networking-ovn-metadata-agent |
| 0018c150-8e34-4806-b795-d772a6bbac52 | OVN Controller agent | controller-2.localdomain | n/a               | XXX   | UP    | ovn-controller                |
+--------------------------------------+----------------------+--------------------------+-------------------+-------+-------+-------------------------------+
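As a side note, the same check can be scripted from the undercloud. The sketch below is illustrative only (not part of the report): it assumes the OpenStack client accepts '-f json' for this command and that the JSON field names match the column headers in the table above, both of which can vary between client releases.

import json
import subprocess

# Ask the OpenStack client for machine-readable output instead of the
# ASCII table shown above.
raw = subprocess.check_output(
    ['openstack', 'network', 'agent', 'list', '-f', 'json'],
    universal_newlines=True)

for agent in json.loads(raw):
    # The table renders liveness as ':-)' or 'XXX'; JSON output may use a
    # boolean instead, so accept either form here.
    if agent.get('Alive') in (False, 'XXX'):
        print('dead agent: {0} on {1}'.format(agent['Binary'], agent['Host']))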


Expected results:
All OVN agents report as alive (:-)) after the restarted controller comes back up.

Additional info:

Comment 1 pkomarov 2019-06-16 18:49:45 UTC
sosreports and the stack user's home directory are available at: http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ_1720947/

Comment 2 pkomarov 2019-06-16 18:55:31 UTC
I'm seeing an ovsdbapp.backend error:

server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event [req-cc22c728-3c61-4d2e-97a5-a945e5d04759 - - - - -] Unexpected exception in notify_loop: RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event Traceback (most recent call last):
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 104, in transaction
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     yield self._nested_txns_map[cur_thread_id]
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event KeyError: 139842817010120
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event 
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event During handling of the above exception, another exception occurred:
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event 
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event Traceback (most recent call last):
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/event.py", line 137, in notify_loop
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     match.run(event, row, updates)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py", line 183, in run
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     self.driver.set_port_status_up(row.name)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 756, in set_port_status_up
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     self._update_dnat_entry_if_needed(port_id)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 744, in _update_dnat_entry_if_needed
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     ('external_mac', mac)).execute(check_error=True)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 40, in execute
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     txn.add(self)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     next(self.gen)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 183, in transaction
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     yield t
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     next(self.gen)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 112, in transaction
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     del self._nested_txns_map[cur_thread_id]
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     self.result = self.commit()
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     raise result.ex
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     txn.results.put(txn.do_commit())
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 115, in do_commit
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     raise RuntimeError(msg)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event 
server.log.11.gz:2019-06-15 20:27:26.703 76 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
server.log.11.gz:2019-06-15 20:27:26.704 76 ERROR ovsdbapp.backend.ovs_idl.transaction [req-b1096a70-2581-4e5b-a879-02f0a662240e - - - - -] Traceback (most recent call last):
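For context, this error comes from the OVSDB IDL's locking mechanism: networking-ovn configures its IDL connection to require a named OVSDB lock so that only one neutron-server worker applies writes, and any transaction committed while the lock is not held (not yet granted, or lost after a reconnect such as a controller restart) fails with exactly this RuntimeError. The sketch below illustrates the mechanism with the python-ovs bindings; the schema path, server address, and lock name are illustrative, not taken from this environment.

from ovs.db import idl

# Load the OVN Northbound schema so the IDL knows the tables
# (schema path is an assumption).
helper = idl.SchemaHelper('/usr/share/ovn/ovn-nb.ovsschema')
helper.register_all()

# Connect and request a named lock; the server grants it to one client
# at a time (address and lock name are illustrative).
conn = idl.Idl('tcp:127.0.0.1:6641', helper)
conn.set_lock('neutron_ovn_db_lock')

conn.run()  # process server replies; this updates conn.has_lock
if not conn.has_lock:
    # Any transaction attempted in this state fails with the RuntimeError
    # quoted in the traceback above.
    print('lock not yet granted or already lost; writes would be rejected')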

Comment 3 pkomarov 2019-06-16 18:56:46 UTC
A workaround is:

podman restart ovn_controller

The container then comes back up in a healthy state.
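For what it's worth, the check-and-restart can be scripted on a controller node. This is a minimal sketch only (container name taken from the workaround above; it assumes podman manages the overcloud containers, so it needs to run as root on the host):

import subprocess

def container_state(name):
    """Return podman's state string for the container, e.g. 'running'."""
    return subprocess.check_output(
        ['podman', 'inspect', '--format', '{{.State.Status}}', name],
        universal_newlines=True).strip()

# Apply the workaround only when the container is not running.
if container_state('ovn_controller') != 'running':
    subprocess.check_call(['podman', 'restart', 'ovn_controller'])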

Comment 5 Lucas Alvares Gomes 2019-06-25 12:10:26 UTC
*** Bug 1721560 has been marked as a duplicate of this bug. ***

Comment 12 Jakub Libosvar 2019-07-25 11:31:40 UTC
*** Bug 1732070 has been marked as a duplicate of this bug. ***

Comment 33 Jakub Libosvar 2019-09-17 14:26:43 UTC
*** Bug 1714949 has been marked as a duplicate of this bug. ***

Comment 40 Jakub Libosvar 2020-02-06 14:28:22 UTC
*** Bug 1726217 has been marked as a duplicate of this bug. ***