1866791 – storage co operatorLogLevel is not validated

Bug 1866791 - storage co operatorLogLevel is not validated

Summary: storage co operatorLogLevel is not validated

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Jan Safranek
QA Contact:	Qin Ping
Docs Contact:
URL:
Whiteboard:
Depends On:	1877408 1878007
Blocks:
TreeView+	depends on / blocked

Reported:	2020-08-06 11:18 UTC by Qin Ping
Modified:	2020-10-27 16:25 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1877408 (view as bug list)
Environment:
Last Closed:	2020-10-27 16:25:25 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift aws-ebs-csi-driver-operator pull 90	None	closed	Bug 1866791: Bump library-go to get better operatorLogLevel validation	2020-10-26 02:34:17 UTC
Github	openshift cluster-storage-operator pull 86	None	closed	Bug 1866791: Bump library-go to get better operatorLogLevel validation	2020-10-26 02:34:17 UTC
Github	openshift csi-driver-manila-operator pull 59	None	closed	Bug 1866791: Bump library-go to get better operatorLogLevel validation	2020-10-26 02:34:17 UTC
Github	openshift ovirt-csi-driver-operator pull 28	None	closed	Bug 1866791: Bump library-go to get better operatorLogLevel validation	2020-10-26 02:34:17 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 16:25:42 UTC

Description Qin Ping 2020-08-06 11:18:07 UTC

Description of problem:
The description of OperatorLogLevelChange event is not correct

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-05-203353

How reproducible:
Always

Steps to Reproduce:
1.oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "Debug"}}' --type merge
2.oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "TraceALL"}}' --type merge
3.oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "test"}}' --type merge
4. oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "Normal"}}' --type merge
5. Check the log of cluster-storage-operator
6. Check the events of cluster-storage-operator deloyment

Actual results:
1. The logs did not change when updating the operatorLogLevel
2. All the events:
Normal  OperatorLogLevelChange  6m35s (x2 over 12m)  cluster-storage-operator-loggingsyncer  Operator log level changed from "Normal" to "Debug"
  Normal  OperatorLogLevelChange  6m20s                cluster-storage-operator-loggingsyncer  Operator log level changed from "Debug" to "TraceALL"
  Normal  OperatorLogLevelChange  5m54s (x3 over 30m)  cluster-storage-operator-loggingsyncer  Operator log level changed from "Normal" to "test"

When updating operatorloglevel from TraceALL to the test, the event report "changed from Normal to test"
When updating operatorloglevel from TraceALL to the normal, there is no event.

Expected results:
When changing the operatorLogLevel, the logs level is changed actually.
The events can record the operation correctly.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Jan Safranek 2020-08-06 13:25:28 UTC

3. oc patch  storage cluster -p '{"spec": {"operatorLogLevel" : "test"}}' --type merge

This one looks like a real bug, validation should reject value "test".

On the other hand, the other are working for me. Especially change from/to TraceAll is very noticable. Please make sure that CVO is not running, because it will overwrite your Storage changes back to Normal:
oc scale -n openshift-cluster-version  deployment/cluster-version-operator --replicas=0

On Trace level, you should see for example:
I0806 15:23:13.747678 2714590 crcontroller.go:154] CSIDriverOperatorCRController sync finished
I0806 15:23:13.811154 2714590 crcontroller.go:109] CSIDriverOperatorCRController sync started

On TraceAll, you should see: 
I0806 15:23:00.672438 2714590 request.go:1095] Request Body:
00000000  6b 38 73 00 0a 0f 0a 02  76 31 12 09 43 6f 6e 66  |k8s.....v1..Conf|
[whole protobuff dump]

Comment 2 Qin Ping 2020-08-07 02:43:43 UTC

Yes, "TraceAll" is working but "TraceALL" is not working.

Comment 3 Jan Safranek 2020-08-07 06:54:57 UTC

Oh, ok. It's the validation then.

Comment 4 Jan Safranek 2020-08-12 20:48:26 UTC

Since we use library-go, we cannot add hard validation of all OperatorSpec.LogLevel there - other CRs that use the same OperatorSpec and thus have the same validation can already have a CR with wrong LogLevel in etcd and this CR would get unusable after upgrade to new version with tight validation.

Adding soft validation (log message, event), but keep the CR working: https://github.com/openshift/library-go/pull/864

Comment 6 Jan Safranek 2020-09-11 11:08:38 UTC

Note to QA: operatorLogLevel should have very soft validation in CSO and all our CSI operators. Unknown operatorLoglevel leads to a log message + "Normal" level is used.

Check bug #1878007 for solution for all operator CRs + both logLevel / operatorLogLevel fields.

Comment 8 Qin Ping 2020-09-16 06:09:23 UTC

verified with: 4.6.0-0.nightly-2020-09-14-225526

Comment 11 errata-xmlrpc 2020-10-27 16:25:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Note You need to log in before you can comment on or make changes to this bug.