Bug 1866791 - storage co operatorLogLevel is not validated
Summary: storage co operatorLogLevel is not validated
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.6
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.6.0
Assignee: Jan Safranek
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On: 1877408 1878007
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-08-06 11:18 UTC by Qin Ping
Modified: 2020-10-27 16:25 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1877408 (view as bug list)
Environment:
Last Closed: 2020-10-27 16:25:25 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift aws-ebs-csi-driver-operator pull 90 0 None closed Bug 1866791: Bump library-go to get better operatorLogLevel validation 2020-10-26 02:34:17 UTC
Github openshift cluster-storage-operator pull 86 0 None closed Bug 1866791: Bump library-go to get better operatorLogLevel validation 2020-10-26 02:34:17 UTC
Github openshift csi-driver-manila-operator pull 59 0 None closed Bug 1866791: Bump library-go to get better operatorLogLevel validation 2020-10-26 02:34:17 UTC
Github openshift ovirt-csi-driver-operator pull 28 0 None closed Bug 1866791: Bump library-go to get better operatorLogLevel validation 2020-10-26 02:34:17 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:25:42 UTC

Description Qin Ping 2020-08-06 11:18:07 UTC
Description of problem:
The description of OperatorLogLevelChange event is not correct

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-05-203353

How reproducible:
Always

Steps to Reproduce:
1.oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "Debug"}}' --type merge
2.oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "TraceALL"}}' --type merge
3.oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "test"}}' --type merge
4. oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "Normal"}}' --type merge
5. Check the log of cluster-storage-operator
6. Check the events of cluster-storage-operator deloyment

Actual results:
1. The logs did not change when updating the operatorLogLevel
2. All the events:
Normal  OperatorLogLevelChange  6m35s (x2 over 12m)  cluster-storage-operator-loggingsyncer  Operator log level changed from "Normal" to "Debug"
  Normal  OperatorLogLevelChange  6m20s                cluster-storage-operator-loggingsyncer  Operator log level changed from "Debug" to "TraceALL"
  Normal  OperatorLogLevelChange  5m54s (x3 over 30m)  cluster-storage-operator-loggingsyncer  Operator log level changed from "Normal" to "test"

When updating operatorloglevel from TraceALL to the test, the event report "changed from Normal to test"
When updating operatorloglevel from TraceALL to the normal, there is no event.

Expected results:
When changing the operatorLogLevel, the logs level is changed actually.
The events can record the operation correctly.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Jan Safranek 2020-08-06 13:25:28 UTC
3. oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "test"}}' --type merge

This one looks like a real bug, validation should reject value "test".

On the other hand, the other are working for me. Especially change from/to TraceAll is very noticable. Please make sure that CVO is not running, because it will overwrite your Storage changes back to Normal:
oc scale -n openshift-cluster-version  deployment/cluster-version-operator --replicas=0

On Trace level, you should see for example:
I0806 15:23:13.747678 2714590 crcontroller.go:154] CSIDriverOperatorCRController sync finished
I0806 15:23:13.811154 2714590 crcontroller.go:109] CSIDriverOperatorCRController sync started

On TraceAll, you should see: 
I0806 15:23:00.672438 2714590 request.go:1095] Request Body:
00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 43 6f 6e 66  |k8s.....v1..Conf|
[whole protobuff dump]

Comment 2 Qin Ping 2020-08-07 02:43:43 UTC
Yes, "TraceAll" is working but "TraceALL" is not working.

Comment 3 Jan Safranek 2020-08-07 06:54:57 UTC
Oh, ok. It's the validation then.

Comment 4 Jan Safranek 2020-08-12 20:48:26 UTC
Since we use library-go, we cannot add hard validation of all OperatorSpec.LogLevel there - other CRs that use the same OperatorSpec and thus have the same validation can already have a CR with wrong LogLevel in etcd and this CR would get unusable after upgrade to new version with tight validation.

Adding soft validation (log message, event), but keep the CR working: https://github.com/openshift/library-go/pull/864

Comment 6 Jan Safranek 2020-09-11 11:08:38 UTC
Note to QA: operatorLogLevel should have very soft validation in CSO and all our CSI operators. Unknown operatorLoglevel leads to a log message + "Normal" level is used.

Check bug #1878007 for solution for all operator CRs + both logLevel / operatorLogLevel fields.

Comment 8 Qin Ping 2020-09-16 06:09:23 UTC
verified with: 4.6.0-0.nightly-2020-09-14-225526

Comment 11 errata-xmlrpc 2020-10-27 16:25:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196


Note You need to log in before you can comment on or make changes to this bug.