Bug 1462445 - Useless log messages from AWS API calls
Useless log messages from AWS API calls
Status: VERIFIED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Kubernetes (Show other bugs)
3.5.1
Unspecified Unspecified
low Severity medium
: ---
: 3.7.0
Assigned To: Jeff Peeler
DeShuai Ma
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2017-06-17 14:20 EDT by Eric Paris
Modified: 2017-08-14 14:32 EDT (History)
10 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously messages originating from the AWS SDK were causing partial log entries due to newlines in the message itself. Error messages are now properly quoted so all messages are fully reported.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Github kubernetes/kubernetes/issues/47789 None None None 2017-06-20 10:39 EDT

  None (edit)
Description Eric Paris 2017-06-17 14:20:10 EDT
oc v3.5.5.19

/usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=2 --listen=https://0.0.0.0:8444

I see semi-regular spam in the logs like:

Jun 16 22:12:05 ip-172-31-54-162.ec2.internal atomic-openshift-master-controllers[48569]: status code: 400, request id:
Jun 16 22:12:05 ip-172-31-54-162.ec2.internal atomic-openshift-master-controllers[48569]: status code: 400, request id:
Jun 16 22:12:05 ip-172-31-54-162.ec2.internal atomic-openshift-master-controllers[48569]: status code: 400, request id:

This message tells me NOTHING of value and is just spam. We need to do two things.

1. Get rid of the useless spam (either by not logging or logging something useful)
2. Determine if it represents a problem and if so fix the problem.

It seems each useless spam is always proceeded by a log message which looks like one of the following:

Jun 16 22:13:40 ip-172-31-54-162.ec2.internal atomic-openshift-master-controllers[48569]: I0616 22:13:40.675705   48569 aws.go:1248] Assigned mount device bj -> volume vol-09e9fa5a07720c47e
Jun 16 22:13:40 ip-172-31-54-162.ec2.internal atomic-openshift-master-controllers[48569]: I0616 22:13:40.912467   48569 aws.go:1271] Releasing in-process attachment entry: ba -> volume vol-009ac4600758f6896
Comment 2 Robert Rati 2017-06-20 08:45:47 EDT
From the logs it looks like the attach detach controller is trying to attach volumes that are already attached to an instance to another instance (possibly the same instance?).  That doesn't seem like the appropriate behavior, but I'm not an expert in that controller.

The log message itself is coming from the aws sdk.  There has been some reworking of error handling in the newer sdk, but it isn't clear that would solve this issue.  Still tracking down where that message actually gets printed to see what control, if any, we have over it.

How is this issue reproduced?
Comment 3 Robert Rati 2017-06-20 08:46:43 EDT
Correction, the reconciler in the attach detach controller.
Comment 4 Eric Paris 2017-06-20 09:01:54 EDT
Unmarked your comments private. They contained no private information.

The attach/detach controller should definitely be better about not trying to mount volumes on a node while it is still mounted on another node. No question. But that is the not what this BZ is about. We get a very good reasonable log message when that happens:

We also get a completely useless spammy log message which provides nothing of value. This BZ is to address the completely useless spammy log message. You need to either make it provide useful actionable intelligent information not already being logged or you need to make it go away entirely.

I don't care which, but useless log spam must go.
Comment 5 Robert Rati 2017-06-20 09:11:22 EDT
Sure, I'm just highlighting that the error situation that brings this up might want to be investigated by someone more knowledgeable about the controller.  I'm looking at getting rid of the debug message.
Comment 6 Hemant Kumar 2017-06-20 09:17:21 EDT
The log line is coming from AWS SDK code. This is coming from - https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/aws/aws-sdk-go/aws/awserr/types.go#L142

And this is how AWS combines the error string - https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/aws/aws-sdk-go/aws/awserr/types.go#L9

They are joining those different error strings by "\n\t", so when we print a straight up error returned by AWS SDK, that is what we get. :(

It is bit crazy that AWS sdk joins the error strings by "\n\t" and thereby making them span several lines.
Comment 7 Eric Paris 2017-06-20 09:34:06 EDT
The handling of log message with \n are similarly a problem in a BZ I assigned to deads (his is about the trace subsystem).

I guess we need to make a concerted effort to drive all \n out of log messages?
Comment 8 Eric Paris 2017-06-20 09:38:05 EDT
Although, looking through the (raw) logs I don't see any following log lines that seem to include more related information. It looks like the spam is all I'm getting...
Comment 9 Robert Rati 2017-06-20 09:43:45 EDT
The information that looks like spam is part of the error message from the aws sdk.  I'm on the fence as to whether that information will ever be useful if the request id is never filled in by the sdk.

I'm going to log an issue upstream where kubernetes community can discuss.
Comment 10 Hemant Kumar 2017-06-20 09:44:35 EDT
I think log lines are getting interleaved because of multi-threaded nature of controller-manager and the fact that, error string from AWS spans multiple lines..
Comment 11 Hemant Kumar 2017-06-20 09:48:26 EDT
It also makes effort we put to improve that error kind of mute. https://trello.com/c/vy1B8QyL/467-1-improving-logging-of-volumeinuse-errors 

because if AWS error is in middle of log line, it will automatically split the device error too. :(
Comment 12 Robert Rati 2017-06-20 09:50:12 EDT
That is my guess as to what is happening. However even if we were to introspect the error message from aws and remove the \n from the log message so that it attaches to a previous error message it isn't clear to me having the status code and a blank request id adds anything to the error message.
Comment 14 Jeff Peeler 2017-06-28 16:27:59 EDT
Origin cherry pick: https://github.com/openshift/origin/pull/14948

Note You need to log in before you can comment on or make changes to this bug.