Bug 1779464 - Misleading errors in ovnkube-node logs
Summary: Misleading errors in ovnkube-node logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.5.0
Assignee: Alexander Constantinescu
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-12-04 03:05 UTC by Russell Bryant
Modified: 2020-07-13 17:13 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:12:38 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:13:05 UTC

Description Russell Bryant 2019-12-04 03:05:29 UTC
I noticed several of the following errors in the first minute after ovnkube-node started.

time="2019-12-03T01:05:41Z" level=error msg="Error while obtaining gateway router addresses for ip-10-0-168-35.us-east-2.compute.internal, stdout: \"\", stderr: \"ovn-nbctl: etor-GR_ip-10-0-168-35.us-east-2.compute.internal: port name not found\\n\", error: OVN command '/usr/bin/ovn-nbctl --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --bootstrap-ca-cert=/ovn-ca/ca-bundle.crt --db=ssl:[2600:1f16:81b:4f00:c2f2:ccf4:70ff:2d8f]:9641,ssl:[2600:1f16:81b:4f01:2b16:3a57:788c:b3ed]:9641,ssl:[2600:1f16:81b:4f02:40ff:5c0b:5644:905c]:9641 --timeout=15 lsp-get-addresses etor-GR_ip-10-0-168-35.us-east-2.compute.internal' failed: exit status 1"
time="2019-12-03T01:05:42Z" level=error msg="Error while obtaining gateway router addresses for ip-10-0-168-35.us-east-2.compute.internal, stdout: \"\", stderr: \"ovn-nbctl: etor-GR_ip-10-0-168-35.us-east-2.compute.internal: port name not found\\n\", error: OVN command '/usr/bin/ovn-nbctl --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --bootstrap-ca-cert=/ovn-ca/ca-bundle.crt --db=ssl:[2600:1f16:81b:4f00:c2f2:ccf4:70ff:2d8f]:9641,ssl:[2600:1f16:81b:4f01:2b16:3a57:788c:b3ed]:9641,ssl:[2600:1f16:81b:4f02:40ff:5c0b:5644:905c]:9641 --timeout=15 lsp-get-addresses etor-GR_ip-10-0-168-35.us-east-2.compute.internal' failed: exit status 1"
 
These errors happened several times at startup (once per second for 20-30 seconds).

This comes from GatewayReady(), where ovnkube-node is waiting for the gateway router logical switch port to be created.

ovnkube-master is creating it in GatewayInit() as it reconciles node resource state.

Since this is a normal expected condition, it would be nice not to log these as errors in the log, making it look like a real problem.

Comment 1 Alexander Constantinescu 2019-12-16 10:31:12 UTC
Hi

Just FYI concerning this, we had a discussion during out networking face2face regarding logging in OVN (see https://docs.google.com/document/d/11MvIjjmDy2sVpcej4b-tE2SMaSwhPtygmK_tO7-2StE/edit#heading=h.azeprfp4neg4) and we had agreed that we'd need to change all spurious error logs (which are not actually "errors", but more like "waiting for something to happen that has not happened yet"). 

This bug could either be fixed in US assigned to clean-up all logging in OVN, or the clean-up could be done during the resolution of this bug.

/Alex

Comment 2 Alexander Constantinescu 2020-03-09 10:22:26 UTC
hi

Moving this to ON-QA as a PR I made upstreams fixed those log messages:

https://github.com/ovn-org/ovn-kubernetes/pull/1047/commits/7f7a02a93a0ec2e12aa1b5dcf48f5eaa96eb4d2a

/Alex

Comment 3 Anurag saxena 2020-03-17 20:44:30 UTC
Looks good to me on 4.5.0-0.nightly-2020-03-16-004817

Steps taken
1) Searched for error logs under all ovnkube-node pods like "oc logs ovnkube-node-28zvc -c ovnkube-node | grep -i error". Couldn't find any
2) restarted all ovnkube-node pods 
3) Repeat step 1, seems good.

Comment 6 errata-xmlrpc 2020-07-13 17:12:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409


Note You need to log in before you can comment on or make changes to this bug.