Bug 1779464

Summary: Misleading errors in ovnkube-node logs
Product: OpenShift Container Platform Reporter: Russell Bryant <rbryant>
Component: NetworkingAssignee: Alexander Constantinescu <aconstan>
Networking sub component: ovn-kubernetes QA Contact: Ross Brattain <rbrattai>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: aconstan, anusaxen, bbennett, zzhao
Version: 4.3.0   
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-13 17:12:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Russell Bryant 2019-12-04 03:05:29 UTC
I noticed several of the following errors in the first minute after ovnkube-node started.

time="2019-12-03T01:05:41Z" level=error msg="Error while obtaining gateway router addresses for ip-10-0-168-35.us-east-2.compute.internal, stdout: \"\", stderr: \"ovn-nbctl: etor-GR_ip-10-0-168-35.us-east-2.compute.internal: port name not found\\n\", error: OVN command '/usr/bin/ovn-nbctl --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --bootstrap-ca-cert=/ovn-ca/ca-bundle.crt --db=ssl:[2600:1f16:81b:4f00:c2f2:ccf4:70ff:2d8f]:9641,ssl:[2600:1f16:81b:4f01:2b16:3a57:788c:b3ed]:9641,ssl:[2600:1f16:81b:4f02:40ff:5c0b:5644:905c]:9641 --timeout=15 lsp-get-addresses etor-GR_ip-10-0-168-35.us-east-2.compute.internal' failed: exit status 1"
time="2019-12-03T01:05:42Z" level=error msg="Error while obtaining gateway router addresses for ip-10-0-168-35.us-east-2.compute.internal, stdout: \"\", stderr: \"ovn-nbctl: etor-GR_ip-10-0-168-35.us-east-2.compute.internal: port name not found\\n\", error: OVN command '/usr/bin/ovn-nbctl --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --bootstrap-ca-cert=/ovn-ca/ca-bundle.crt --db=ssl:[2600:1f16:81b:4f00:c2f2:ccf4:70ff:2d8f]:9641,ssl:[2600:1f16:81b:4f01:2b16:3a57:788c:b3ed]:9641,ssl:[2600:1f16:81b:4f02:40ff:5c0b:5644:905c]:9641 --timeout=15 lsp-get-addresses etor-GR_ip-10-0-168-35.us-east-2.compute.internal' failed: exit status 1"
 
These errors happened several times at startup (once per second for 20-30 seconds).

This comes from GatewayReady(), where ovnkube-node is waiting for the gateway router logical switch port to be created.

ovnkube-master is creating it in GatewayInit() as it reconciles node resource state.

Since this is a normal expected condition, it would be nice not to log these as errors in the log, making it look like a real problem.

Comment 1 Alexander Constantinescu 2019-12-16 10:31:12 UTC
Hi

Just FYI concerning this, we had a discussion during out networking face2face regarding logging in OVN (see https://docs.google.com/document/d/11MvIjjmDy2sVpcej4b-tE2SMaSwhPtygmK_tO7-2StE/edit#heading=h.azeprfp4neg4) and we had agreed that we'd need to change all spurious error logs (which are not actually "errors", but more like "waiting for something to happen that has not happened yet"). 

This bug could either be fixed in US assigned to clean-up all logging in OVN, or the clean-up could be done during the resolution of this bug.

/Alex

Comment 2 Alexander Constantinescu 2020-03-09 10:22:26 UTC
hi

Moving this to ON-QA as a PR I made upstreams fixed those log messages:

https://github.com/ovn-org/ovn-kubernetes/pull/1047/commits/7f7a02a93a0ec2e12aa1b5dcf48f5eaa96eb4d2a

/Alex

Comment 3 Anurag saxena 2020-03-17 20:44:30 UTC
Looks good to me on 4.5.0-0.nightly-2020-03-16-004817

Steps taken
1) Searched for error logs under all ovnkube-node pods like "oc logs ovnkube-node-28zvc -c ovnkube-node | grep -i error". Couldn't find any
2) restarted all ovnkube-node pods 
3) Repeat step 1, seems good.

Comment 6 errata-xmlrpc 2020-07-13 17:12:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409