Bug 1861087 - [upgrade] Various WARN messages under northd and sbdb logs with transaction errors
Keywords:
Status: CLOSED DUPLICATE of bug 1860522
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 20.E
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-27 18:51 UTC by Anurag saxena
Modified: 2020-08-07 19:35 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-07 19:35:03 UTC
Target Upstream Version:
Embargoed:


Attachments
nbdb sbdb and northd logs (110.83 KB, application/gzip)
2020-07-27 18:51 UTC, Anurag saxena

Description Anurag saxena 2020-07-27 18:51:56 UTC
Created attachment 1702569
nbdb sbdb and northd logs

Description of problem: Many transaction errors were seen during a bare-metal cluster upgrade from 4.5.2 to 4.5.3.

Excerpt from northd logs on master-1
------------------------------------

|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (openshift-insights_insights-operator-6bd7f47568-j2l5t) for index on column \"logical_port\".  First row, with UUID 9b34e9f1-dc8e-490f-9ec4-6d0eae081337, existed in the database before this transaction and was not modified by the transaction.  Second row, with UUID 7e81ba01-fe35-4e8d-b819-648aac99e5d4, was inserted by this transaction.","error":"constraint violation"}
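For triage, the JSON payload of the WARN line above can be broken into its parts. The following is only a minimal Python sketch, assuming the payload is the text after "transaction error: " and that the wording of the "details" string matches this excerpt; the function name summarize_constraint_violation is hypothetical and is not part of OVN or OVSDB.

```python
import json
import re

# Payload copied from the northd WARN line in this report
# (everything after "transaction error: ").
payload = r'''{"details":"Transaction causes multiple rows in \"Port_Binding\" table to have identical values (openshift-insights_insights-operator-6bd7f47568-j2l5t) for index on column \"logical_port\".  First row, with UUID 9b34e9f1-dc8e-490f-9ec4-6d0eae081337, existed in the database before this transaction and was not modified by the transaction.  Second row, with UUID 7e81ba01-fe35-4e8d-b819-648aac99e5d4, was inserted by this transaction.","error":"constraint violation"}'''

# Canonical 8-4-4-4-12 lowercase hex UUID.
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def summarize_constraint_violation(raw):
    """Hypothetical helper: pull table, column, value and row UUIDs out of the error."""
    err = json.loads(raw)
    details = err["details"]
    # These patterns assume the message wording seen in this excerpt.
    table = re.search(r'in "([^"]+)" table', details)
    column = re.search(r'index on column "([^"]+)"', details)
    value = re.search(r"identical values \(([^)]*)\)", details)
    return {
        "error": err["error"],
        "table": table.group(1) if table else None,
        "column": column.group(1) if column else None,
        "value": value.group(1) if value else None,
        "uuids": UUID_RE.findall(details),
    }


summary = summarize_constraint_violation(payload)
for key, val in summary.items():
    print(f"{key}: {val}")
```

Read this way, the error says that northd tried to insert a second Port_Binding row for a logical_port that already had a row in the SB database.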

Excerpt from sbdb logs on master-2
---------------------------------

2020-07-27T18:16:02Z|00624|ovsdb_server|ERR|syntax error: transaction deletes row 03055103-180b-4163-9739-4e7080534f63 that does not exist
2020-07-27T18:16:02Z|00625|ovsdb_server|INFO|referential integrity violation: Table Port_Binding column datapath row afdff1d1-b901-4d9b-960e-5d70c8bc9731 references nonexistent row 00000000-0000-0000-0000-000000000000 in table Datapath_Binding.
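The two sbdb lines above follow a pipe-separated layout (timestamp, sequence number, module, level, message). As a rough sketch for sifting through the attached logs, assuming every line of interest uses that same five-field layout, they could be split as below; parse_vlog_line and LogRecord are hypothetical names introduced here for illustration.

```python
from collections import namedtuple

# Field names are descriptive labels chosen for this sketch.
LogRecord = namedtuple("LogRecord", "timestamp sequence module level message")

# The all-zero UUID that appears in the referential integrity message.
NIL_UUID = "00000000-0000-0000-0000-000000000000"


def parse_vlog_line(line):
    """Split 'timestamp|seq|module|level|message'; return None if the layout differs."""
    parts = line.rstrip("\n").split("|", 4)  # the message itself may contain '|'
    if len(parts) != 5:
        return None
    return LogRecord(*parts)


# Lines copied from the sbdb excerpt in this report.
lines = [
    "2020-07-27T18:16:02Z|00624|ovsdb_server|ERR|syntax error: transaction deletes row 03055103-180b-4163-9739-4e7080534f63 that does not exist",
    "2020-07-27T18:16:02Z|00625|ovsdb_server|INFO|referential integrity violation: Table Port_Binding column datapath row afdff1d1-b901-4d9b-960e-5d70c8bc9731 references nonexistent row 00000000-0000-0000-0000-000000000000 in table Datapath_Binding.",
]

records = [parse_vlog_line(line) for line in lines]

# Messages that mention the all-zero UUID, i.e. a reference to a row that is not there.
dangling = [r for r in records if r and NIL_UUID in r.message]

for rec in records:
    print(rec.level, rec.message)
```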


Master-1 is the NB leader.

Version-Release number of selected component (if applicable): 4.5.3

How reproducible: Rare


Steps to Reproduce:
1. Various tests were performed, and the cluster was upgraded step by step over time: 4.4.x -> 4.5.0.rc.7 -> 4.5.1-rc.0 -> 4.5.2 -> 4.5.3

Actual results: Upgrade failed with various operator errors


Expected results:


Additional info:
$ oc get nodes
NAME                 STATUS     ROLES               AGE   VERSION
openshift-master-0   Ready      master              18d   v1.18.3+b74c5ed
openshift-master-1   Ready      master              18d   v1.18.3+b74c5ed
openshift-master-2   Ready      master              18d   v1.18.3+b74c5ed
openshift-worker-0   NotReady   worker,worker-cnf   16d   v1.18.3+b74c5ed
openshift-worker-1   Ready      worker,worker-cnf   16d   v1.18.3+b74c5ed
openshift-worker-2   Ready      worker,worker-lb    18d   v1.18.3+b74c5ed
openshift-worker-3   Ready      worker,worker-lb    18d   v1.18.3+b74c5ed


$ oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS      RESTARTS   AGE
ovnkube-master-72wdp   4/4     Running     0          7h21m
ovnkube-master-7grjf   4/4     Running     0          4h53m
ovnkube-master-hdwt7   4/4     Running     0          8h
ovnkube-node-5ccnc     2/2     Running     0          8h
ovnkube-node-7d79g     1/2     NotReady    0          8h
ovnkube-node-7r9qk     2/2     Running     0          8h
ovnkube-node-c7hkn     2/2     Running     0          8h
ovnkube-node-pxnbl     2/2     Running     0          7h25m
ovnkube-node-sw5jc     2/2     Running     0          8h
ovnkube-node-wmr2k     2/2     Running     4          4h56m
ovs-node-5kds4         0/1     Completed   0          8h
ovs-node-75phm         1/1     Running     0          8h
ovs-node-79lc5         1/1     Running     0          8h
ovs-node-c7778         1/1     Running     0          8h
ovs-node-n4lm9         1/1     Running     1          4h55m
ovs-node-pcwfg         1/1     Running     0          8h
ovs-node-tx6lh         1/1     Running     0          8h



$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.5.3     True        True          False      17d
cloud-credential                           4.5.3     True        False         False      18d
cluster-autoscaler                         4.5.3     True        False         False      18d
config-operator                            4.5.3     True        False         False      18d
console                                    4.5.3     False       False         False      5h3m
csi-snapshot-controller                    4.5.3     True        False         False      11d
dns                                        4.5.2     True        True          True       18d
etcd                                       4.5.3     True        False         False      18d
image-registry                             4.5.3     True        False         False      3d8h
ingress                                    4.5.3     True        False         False      6d7h
insights                                   4.5.3     True        False         False      18d
kube-apiserver                             4.5.3     True        False         False      18d
kube-controller-manager                    4.5.3     True        False         True       18d
kube-scheduler                             4.5.3     True        False         False      18d
kube-storage-version-migrator              4.5.3     True        False         False      3d8h
machine-api                                4.5.3     True        False         False      18d
machine-approver                           4.5.3     True        False         False      18d
machine-config                             4.5.2     False       False         True       11m
marketplace                                4.5.3     True        False         False      8h
monitoring                                 4.5.3     False       True          True       14m
network                                    4.5.3     True        True          True       18d
node-tuning                                4.5.3     True        False         False      3d7h
openshift-apiserver                        4.5.3     True        False         True       5h9m
openshift-controller-manager               4.5.3     True        False         False      3d6h
openshift-samples                          4.5.3     True        False         False      3d7h
operator-lifecycle-manager                 4.5.3     True        False         False      18d
operator-lifecycle-manager-catalog         4.5.3     True        False         False      18d
operator-lifecycle-manager-packageserver   4.5.3     True        False         False      3d6h
service-ca                                 4.5.3     True        False         False      18d
storage                                    4.5.3     True        False         False      3d7h

Comment 6 Anurag saxena 2020-07-28 16:16:09 UTC
AFAIK, NB/SB dbs should all be the same on all masters (since they are synchronized).

Comment 7 Dumitru Ceara 2020-07-29 08:03:30 UTC
(In reply to Anurag saxena from comment #6)
> AFAIK, NB/SB dbs should all be the same on all masters (since they are
> synchronized).

From the logs it seems like the DBs are not synchronized. It would be useful to see if that's the case. Next time you hit the issue could you please save all DBs?

Thanks,
Dumitru

Comment 12 Dan Williams 2020-08-07 19:35:03 UTC

*** This bug has been marked as a duplicate of bug 1860522 ***

