Bug 1720947

Summary: osp15 After controller restart ovn-controller containers are in Dead state
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: pkomarov
Component: openvswitch2.11
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED CURRENTRELEASE
QA Contact: haidong li <haili>
Severity: urgent
Priority: urgent
Version: FDP 19.F
CC: afariasa, apevec, astupnik, atragler, camorris, chrisw, ctrautma, dalvarez, ekuris, fhallal, jhsiao, jishi, jjoyce, jlibosva, jschluet, ldenny, lhh, liali, lmartins, lmiccini, ltoscano, majopela, michele, nusiddiq, qding, ralongi, rhos-maint, rsafrono, sasha, scohen, slinaber, tfreger, tredaelli, tvignaud, twilson
Target Milestone: ---
Keywords: AutomationBlocker, Regression, Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: puppet-tripleo-10.5.1-0.20190812120435.ed6c6b0.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1746120 1746200 1757254 (view as bug list)
Environment:
Last Closed: 2021-03-25 14:36:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1726217, 1731269, 1740114, 1740115, 1757254

Description pkomarov 2019-06-16 18:21:06 UTC
Description of problem:
After a controller restart, the ovn-controller containers are left in the Dead state.

Version-Release number of selected component (if applicable):
RHOS_TRUNK-15.0-RHEL-8-20190523.n.1

How reproducible:
Rerun the following Jenkins job:
https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/pidone/view/sanity/job/DFG-pidone-sanity-15_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-sanity/

Steps to Reproduce:
1. Deploy OSP 15.
2. Restart one of the controller nodes (see the command sketch after this list).
3. Observe that the OVN network agents are reported as dead.
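
Hedged command-level sketch of steps 2-3, assuming the standard OSP 15 virthost layout used by this job (node name controller-0, the heat-admin user, and the overcloudrc path may differ in other environments):

  # step 2: restart one controller node
  ssh heat-admin@controller-0 'sudo reboot'

  # step 3: once the node is back up, the OVN Controller agents stay dead (Alive = XXX)
  source ~/overcloudrc
  openstack network agent list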

Actual results:
(overcloud) [stack@undercloud-0 ~]$ openstack network agent list 
+--------------------------------------+----------------------+--------------------------+-------------------+-------+-------+-------------------------------+
| ID                                   | Agent Type           | Host                     | Availability Zone | Alive | State | Binary                        |
+--------------------------------------+----------------------+--------------------------+-------------------+-------+-------+-------------------------------+
| a70204c4-c5a6-4fa6-8927-39b5cb4392e1 | OVN Controller agent | controller-0.localdomain | n/a               | XXX   | UP    | ovn-controller                |
| 6a19511d-8bb2-4485-b338-41165481ebac | OVN Controller agent | compute-0.localdomain    | n/a               | XXX   | UP    | ovn-controller                |
| 51d095a9-3e19-4f04-bcbd-70e74f7e302e | OVN Metadata agent   | compute-0.localdomain    | n/a               | :-)   | UP    | networking-ovn-metadata-agent |
| 7ac54a99-2a0a-44a1-9a77-e48e8c81de87 | OVN Controller agent | controller-1.localdomain | n/a               | XXX   | UP    | ovn-controller                |
| f41e8e6a-e6a2-4a36-8736-32080a6cc8fd | OVN Controller agent | compute-1.localdomain    | n/a               | XXX   | UP    | ovn-controller                |
| e9bdb550-4cc6-4ab5-bfc9-55d68d0caa30 | OVN Metadata agent   | compute-1.localdomain    | n/a               | :-)   | UP    | networking-ovn-metadata-agent |
| 0018c150-8e34-4806-b795-d772a6bbac52 | OVN Controller agent | controller-2.localdomain | n/a               | XXX   | UP    | ovn-controller                |
+--------------------------------------+----------------------+--------------------------+-------------------+-------+-------+-------------------------------+


Expected results:
All OVN agents are reported as alive (:-)) once the restarted controller is back up.

Additional info:

Comment 1 pkomarov 2019-06-16 18:49:45 UTC
sosreports and stack home are at : http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ_1720947/

Comment 2 pkomarov 2019-06-16 18:55:31 UTC
I'm seeing an ovsdbapp.backend error:

server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event [req-cc22c728-3c61-4d2e-97a5-a945e5d04759 - - - - -] Unexpected exception in notify_loop: RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event Traceback (most recent call last):
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 104, in transaction
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     yield self._nested_txns_map[cur_thread_id]
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event KeyError: 139842817010120
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event 
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event During handling of the above exception, another exception occurred:
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event 
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event Traceback (most recent call last):
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/event.py", line 137, in notify_loop
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     match.run(event, row, updates)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py", line 183, in run
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     self.driver.set_port_status_up(row.name)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 756, in set_port_status_up
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     self._update_dnat_entry_if_needed(port_id)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 744, in _update_dnat_entry_if_needed
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     ('external_mac', mac)).execute(check_error=True)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 40, in execute
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     txn.add(self)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     next(self.gen)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 183, in transaction
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     yield t
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     next(self.gen)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 112, in transaction
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     del self._nested_txns_map[cur_thread_id]
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     self.result = self.commit()
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     raise result.ex
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     txn.results.put(txn.do_commit())
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 115, in do_commit
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event     raise RuntimeError(msg)
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
server.log.11.gz:2019-06-15 20:27:12.946 76 ERROR ovsdbapp.event 
server.log.11.gz:2019-06-15 20:27:26.703 76 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
server.log.11.gz:2019-06-15 20:27:26.704 76 ERROR ovsdbapp.backend.ovs_idl.transaction [req-b1096a70-2581-4e5b-a879-02f0a662240e - - - - -] Traceback (most recent call last):
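
The repeated RuntimeError above is ovsdbapp refusing to commit: the Neutron server's OVSDB IDL connection is configured to require a database lock that it does not currently hold, so every transaction issued from the notify loop fails. A hedged sketch for confirming the symptom on a controller node; the log path and container name below are standard OSP 15 defaults and may differ:

  # count the lock errors shown above in the Neutron server log
  sudo grep -c 'require a database lock' /var/log/containers/neutron/server.log

  # the ovn_controller container is reported as dead/unhealthy after the reboot
  sudo podman ps -a --filter name=ovn_controller --format '{{.Names}} {{.Status}}'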

Comment 3 pkomarov 2019-06-16 18:56:46 UTC
A workaround (see the sketch below):
podman restart ovn_controller
The container then comes back up in a healthy state.
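
Hedged sketch of the workaround, to be run on each node whose OVN Controller agent shows as dead (container name as in this report, heat-admin user assumed):

  ssh heat-admin@<affected-node> 'sudo podman restart ovn_controller'

  # then, from the undercloud, the agent should flip back to alive (:-))
  source ~/overcloudrc
  openstack network agent list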

Comment 5 Lucas Alvares Gomes 2019-06-25 12:10:26 UTC
*** Bug 1721560 has been marked as a duplicate of this bug. ***

Comment 12 Jakub Libosvar 2019-07-25 11:31:40 UTC
*** Bug 1732070 has been marked as a duplicate of this bug. ***

Comment 33 Jakub Libosvar 2019-09-17 14:26:43 UTC
*** Bug 1714949 has been marked as a duplicate of this bug. ***

Comment 40 Jakub Libosvar 2020-02-06 14:28:22 UTC
*** Bug 1726217 has been marked as a duplicate of this bug. ***