Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1980657

Summary:	fluentd ServiceMonitor in OpenShift Logging cannot be collected by user workload prometheus due to invalid tls config on ROSA
Product:	OpenShift Container Platform	Reporter:	Daein Park <dapark>
Component:	Logging	Assignee:	Jeff Cantrill <jcantril>
Status:	CLOSED DEFERRED	QA Contact:	Anping Li <anli>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	4.7	CC:	aos-bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	logging-core
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-10 03:23:07 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Daein Park 2021-07-09 07:09:33 UTC

Description of problem:

On ROSA, the OpenShift Logging (EFK) stack runs as a user workload.
As a result, the FluentdNodeDown critical alert is always firing, because the user workload Prometheus does not collect the metrics the alert requires.

The Prometheus Operator in the "openshift-user-workload-monitoring" project logs the following message:
~~~
level=warn ts=2021-07-02T02:39:00.701373761Z caller=operator.go:1675 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via tls config which Prometheus specification prohibits" servicemonitor=openshift-logging/fluentd namespace=openshift-user-workload-monitoring prometheus=user-workload
~~~

The user workload Prometheus skips the fluentd ServiceMonitor because its TLS config is not allowed there.

The message is produced by the following check in the Prometheus Operator, which rejects a ServiceMonitor whose TLS config sets a CA file, certificate file, or key file. The fluentd ServiceMonitor's TLS config does not pass this check:
https://github.com/openshift/prometheus-operator/blob/ce7d979635b9d1210db48d54485bc924aed37cdb/pkg/prometheus/operator.go#L1964-L1966
~~~
// Reject any TLS setting that points at a file on the filesystem.
if tlsConf.CAFile != "" || tlsConf.CertFile != "" || tlsConf.KeyFile != "" {
	return errors.New("it accesses file system via tls config which Prometheus specification prohibits")
}
~~~
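To make the rejection condition concrete, here is a minimal Go sketch of the same check. The `TLSConfig` struct, the `checkTLSConfig` function, and the CA file path are hypothetical stand-ins, not the prometheus-operator types; the sketch models only the file-path condition quoted above, not the surrounding operator logic that decides when the check applies. It compares a file-based config with the `insecureSkipVerify`/`serverName` config used in the manual test later in this report.

```go
package main

import (
	"errors"
	"fmt"
)

// TLSConfig is a hypothetical, trimmed-down stand-in for the ServiceMonitor
// tlsConfig fields involved in this bug; it is NOT the prometheus-operator type.
type TLSConfig struct {
	CAFile             string
	CertFile           string
	KeyFile            string
	ServerName         string
	InsecureSkipVerify bool
}

// errFSAccess mirrors the error text logged by the Prometheus Operator.
var errFSAccess = errors.New("it accesses file system via tls config which Prometheus specification prohibits")

// checkTLSConfig reproduces only the quoted condition: any file-based
// TLS setting (CA, certificate, or key file) causes a rejection.
func checkTLSConfig(c TLSConfig) error {
	if c.CAFile != "" || c.CertFile != "" || c.KeyFile != "" {
		return errFSAccess
	}
	return nil
}

func main() {
	// Hypothetical file-based config; the path is illustrative only.
	fileBased := TLSConfig{
		CAFile:     "/hypothetical/ca.crt",
		ServerName: "fluentd.openshift-logging.svc",
	}
	// The config from the manual test in this report: no file references.
	modified := TLSConfig{
		InsecureSkipVerify: true,
		ServerName:         "fluentd.openshift-logging.svc",
	}
	fmt.Println("file-based:", checkTLSConfig(fileBased))
	fmt.Println("modified:", checkTLSConfig(modified))
}
```

Running it prints the operator's error for the file-based config and `<nil>` for the modified one. Note that `insecureSkipVerify: true` disables verification of the fluentd server certificate, so it passes this check as a test workaround rather than as an equivalent replacement for CA-based verification.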

Version-Release number of selected component (if applicable):

ROSA (4.7.z)

How reproducible:

You can reproduce this issue by installing OpenShift Logging on ROSA.

Alternatively, you can reproduce it on OCP 4.7.z by installing OpenShift Logging without the "openshift.io/cluster-monitoring" label on the "openshift-logging" namespace.
The "FluentdNodeDown" critical alert starts firing within 10 minutes.

Actual results:

Although all fluentd pods are up and running without issues, the "FluentdNodeDown" critical alert is always firing, because the invalid TLS config in the fluentd ServiceMonitor prevents the required metrics from being collected.

Expected results:

The OpenShift Logging (EFK) stack should provide a valid TLS config for the fluentd ServiceMonitor so that the user workload Prometheus can collect its metrics.

Additional info:

I verified that the fluentd ServiceMonitor works with a valid TLS config as follows.

1. For testing, first stop the cluster-logging-operator so that it does not revert the manual change.
2. Modify the fluentd ServiceMonitor TLS config as follows:
~~~
:
spec:
  endpoints:
  - bearerTokenSecret:
      key: ""
    path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
      serverName: fluentd.openshift-logging.svc
  jobLabel: monitor-fluentd
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      logging-infra: support
~~~

3. Check that the fluentd metrics are now collected by the user workload Prometheus:

~~~
$ oc rsh -n openshift-user-workload-monitoring -c prometheus prometheus-user-workload-1 \
    curl 'http://localhost:9090/api/v1/query?query=up%7Bjob%3D"fluentd"%7D+%3D%3D+1' | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "container": "fluentd",
          "endpoint": "metrics",
          "instance": "10.128.0.7:24231",
          "job": "fluentd",
          "namespace": "openshift-logging",
          "pod": "fluentd-5rnpl",
          "service": "fluentd"
        },
        "value": [
          1625812558.084,
          "1"
        ]
      },
:
~~~
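The `up{job="fluentd"} == 1` query used in step 3 returns only the fluentd targets that the user workload Prometheus scrapes successfully, because the `== 1` comparison filters out all other series. The Go sketch below uses a hypothetical helper, `upPods` (not part of any OpenShift tool), to parse such an instant-query response and list the pods reported as up. The sample body reuses the single result shown in step 3; the remaining results are elided in the report and are not reconstructed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryResponse models only the parts of a Prometheus instant-query
// response (/api/v1/query with resultType "vector") that this check needs.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			// Value is [<unix timestamp as number>, "<sample value as string>"].
			Value [2]interface{} `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// upPods returns the "pod" label of every series whose sample value is "1".
func upPods(body []byte) ([]string, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" || r.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q", r.Status, r.Data.ResultType)
	}
	var pods []string
	for _, s := range r.Data.Result {
		if v, ok := s.Value[1].(string); ok && v == "1" {
			pods = append(pods, s.Metric["pod"])
		}
	}
	return pods, nil
}

func main() {
	// The single result shown in step 3; other results are elided in the report.
	body := []byte(`{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "up", "job": "fluentd", "namespace": "openshift-logging", "pod": "fluentd-5rnpl"},
        "value": [1625812558.084, "1"]
      }
    ]
  }
}`)
	pods, err := upPods(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fluentd pods up:", pods)
}
```

With the sample body this prints `fluentd pods up: [fluentd-5rnpl]`. An empty list with `status` still `success` would mean no fluentd target is being scraped, which matches the symptom described in this report.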

Comment 1 Jeff Cantrill 2021-07-09 13:00:18 UTC

Please close this issue and open a new one at issues.redhat.com for the Logging project. Issues with OpenShift Logging (5.x) for deployments on OCP 4.7+ are reported in JIRA.

Comment 2 Daein Park 2021-07-10 03:23:07 UTC

Thank you for pointing that out. I've reported this issue at https://issues.redhat.com/browse/LOG-1561
and am closing this ticket.