Bug 1865781 - ovn-controller restart
Summary: ovn-controller restart
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 20.E
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: OVN Team
QA Contact: Ehsan Elahi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-04 07:16 UTC by Antonio Ojea
Modified: 2021-05-19 01:37 UTC (History)
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-08 08:41:02 UTC
Target Upstream Version:
Embargoed:


Attachments
ovn-controller logs (70.93 KB, application/x-xz)
2020-08-04 07:16 UTC, Antonio Ojea

Description Antonio Ojea 2020-08-04 07:16:49 UTC
Created attachment 1710266 [details]
ovn-controller logs

Description of problem:


ovn-controller restarts, but shortly before the restart it logs this error:

2020-08-03T18:46:21.721Z|00103|util|EMER|lib/ovsdb-idl.c:4612: assertion row->new_datum != NULL failed in ovsdb_idl_txn_write__()

Version-Release number of selected component (if applicable):


How reproducible:

It happens intermittently in OVN-Kubernetes CI jobs.

Steps to Reproduce:

This may be a red herring, but there is a warning shortly before the crash that could be the cause:

2020-08-03T18:44:22.247326319Z stdout F 2020-08-03T18:44:22.246Z|00090|lflow|WARN|error parsing match "((ct.new && !ct.est) || (!ct.new && ct.est && !ct.rpl && ct_label.blocked == 1)) && (ip4.src == {$a10956707444534956691, $a13122364957363372530, $a14245307639866612073, $a15617139200530899851, $a16235039932615691331, $a17794588778302438979, $a18363165982804349389, $a4433314167141470080, $a5154718082306775057, $a5270369249448027068, $a5675285926127865604, $a6536697762898383367, $a6937002112706621489, $a9320209671274442397} && outport == @a10019124622592575031)": Syntax error at `$a6536697762898383367' expecting address set name.
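
For context, the "$a..." tokens in that match are references to Address_Set rows, and a reference to a set that ovn-controller cannot resolve fails to parse with exactly this "expecting address set name" error. Below is a minimal illustration of how such a set and a match referencing it are normally created; the switch name and address are hypothetical, only the set name is taken from the warning above.

# Sketch only: "sw0" and "10.244.1.5" are made up for illustration.
ovn-nbctl create Address_Set name=a6536697762898383367 addresses='"10.244.1.5"'
ovn-nbctl acl-add sw0 to-lport 1001 'ip4.src == $a6536697762898383367' allow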


Actual results:

ovn-controller aborts on the assertion failure and restarts.

Expected results:

ovn-controller should handle the error gracefully instead of restarting.

Additional info:


It is more likely a problem with ovn-kubernetes' order of operations, but OVN should be more resilient and not restart.

Comment 1 Antonio Ojea 2020-09-08 08:41:02 UTC
The main problem here was that after the restart, ovnkube-node was not able to recover: the restart script deletes the ovn-remote field, so ovn-controller could not connect to the southbound database.
https://github.com/ovn-org/ovn-kubernetes/pull/1667
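
For context, ovn-controller learns the southbound database address from the "ovn-remote" key in external_ids of the local Open_vSwitch table, so clearing that key leaves the controller with nothing to connect to. A minimal illustration (the address is hypothetical; this is not the ovn-kubernetes script itself):

# Check and restore the southbound DB address; 172.18.0.2:6642 is made up.
ovs-vsctl get Open_vSwitch . external_ids:ovn-remote
ovs-vsctl set Open_vSwitch . external_ids:ovn-remote="tcp:172.18.0.2:6642"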

