Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1980657

Summary: fluentd ServiceMonitor in OpenShift Logging cannot be collected by user workload prometheus due to invalid tls config on ROSA
Product: OpenShift Container Platform
Reporter: Daein Park <dapark>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED DEFERRED
QA Contact: Anping Li <anli>
Severity: medium
Priority: unspecified
Version: 4.7
CC: aos-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: logging-core
Last Closed: 2021-07-10 03:23:07 UTC
Type: Bug

Description Daein Park 2021-07-09 07:09:33 UTC
Description of problem:

On ROSA, the OpenShift Logging (EFK) stack runs as a user workload.
As a result, the FluentdNodeDown critical alert fires continuously, because the user workload Prometheus does not collect the required metrics.

// The message from Prometheus operator in "openshift-user-workload-monitoring" project.
level=warn ts=2021-07-02T02:39:00.701373761Z caller=operator.go:1675 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via tls config which Prometheus specification prohibits" servicemonitor=openshift-logging/fluentd namespace=openshift-user-workload-monitoring prometheus=user-workload

The ServiceMonitor is skipped because the TLS config of the fluentd ServiceMonitor is invalid for the user workload Prometheus.

// The message above is shown because the fluentd ServiceMonitor tlsConfig fails the following check.
https://github.com/openshift/prometheus-operator/blob/ce7d979635b9d1210db48d54485bc924aed37cdb/pkg/prometheus/operator.go#L1964-L1966
~~~
	if tlsConf.CAFile != "" || tlsConf.CertFile != "" || tlsConf.KeyFile != "" {
		return errors.New("it accesses file system via tls config which Prometheus specification prohibits")
	}
~~~
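
The check quoted above can be exercised in isolation. The following is a minimal sketch, not the operator's actual code: the `tlsConfig` struct and the `validateNoFileAccess` function name are hypothetical simplifications, and the CA file path is a placeholder. It shows that a config referencing any file is rejected, while a config that sets only `insecureSkipVerify` and `serverName` passes.

```go
package main

import (
	"errors"
	"fmt"
)

// tlsConfig is a simplified, hypothetical stand-in for the ServiceMonitor
// TLS settings. It is NOT the operator's real type.
type tlsConfig struct {
	CAFile             string
	CertFile           string
	KeyFile            string
	InsecureSkipVerify bool
	ServerName         string
}

// validateNoFileAccess mirrors the quoted condition: any file-based
// field (CA, certificate, or key file) causes the config to be rejected.
func validateNoFileAccess(c tlsConfig) error {
	if c.CAFile != "" || c.CertFile != "" || c.KeyFile != "" {
		return errors.New("it accesses file system via tls config which Prometheus specification prohibits")
	}
	return nil
}

func main() {
	// Placeholder path, used only to trigger the file-based branch.
	fileBased := tlsConfig{
		CAFile:     "/path/to/ca.crt",
		ServerName: "fluentd.openshift-logging.svc",
	}
	// No file references at all.
	inline := tlsConfig{
		InsecureSkipVerify: true,
		ServerName:         "fluentd.openshift-logging.svc",
	}

	fmt.Println("file-based config:", validateNoFileAccess(fileBased))
	fmt.Println("inline config:", validateNoFileAccess(inline))
}
```

Under these assumptions, only the first config produces the error that appears in the operator log.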

Version-Release number of selected component (if applicable):

ROSA(4.7.z)

How reproducible:

You can reproduce this issue by installing OpenShift Logging on ROSA.

Alternatively, you can reproduce it on OCP v4.7.z by installing OpenShift Logging without the "openshift.io/cluster-monitoring" label on the "openshift-logging" namespace.
The "FluentdNodeDown" critical alert starts firing within 10 minutes.

Actual results:

The "FluentdNodeDown" critical alert fires continuously even though all fluentd pods are up and running without issues, because the invalid TLS config in the fluentd ServiceMonitor prevents the required metrics from being collected.

Expected results:

The OpenShift Logging (EFK) stack should provide a valid TLS config for the fluentd ServiceMonitor so that the user workload Prometheus can collect the metrics.

Additional info:

I've verified that the fluentd ServiceMonitor works with a valid TLS config, as follows.

1. For testing, first stop the cluster-logging-operator.
2. Modify the fluentd ServiceMonitor TLS config as follows.
:
spec:
  endpoints:
  - bearerTokenSecret:
      key: ""
    path: /metrics
    port: metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
      serverName: fluentd.openshift-logging.svc
  jobLabel: monitor-fluentd
  namespaceSelector:
    matchNames:
    - openshift-logging
  selector:
    matchLabels:
      logging-infra: support

3. Check if the fluentd metrics are collected by user workload prometheus.

$ oc rsh -n openshift-user-workload-monitoring -c prometheus prometheus-user-workload-1 \
  curl 'http://localhost:9090/api/v1/query?query=up%7Bjob%3D"fluentd"%7D+%3D%3D+1' | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "container": "fluentd",
          "endpoint": "metrics",
          "instance": "10.128.0.7:24231",
          "job": "fluentd",
          "namespace": "openshift-logging",
          "pod": "fluentd-5rnpl",
          "service": "fluentd"
        },
        "value": [
          1625812558.084,
          "1"
        ]
      },
:
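
The verification in step 3 can also be scripted. The sketch below is illustrative only: `buildQueryURL`, `upPods`, and `sampleResponse` are hypothetical names, and the sample body is the single result shown above, closed off so that it parses as JSON. It builds the same `/api/v1/query` URL with the PromQL expression URL-encoded, then lists the pods whose `up` sample has the value "1".

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// queryResponse covers only the fields of the instant-query response
// that appear in the output above.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  []interface{}     `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// sampleResponse is the first result from the output above, trimmed to
// one entry for illustration.
const sampleResponse = `{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "container": "fluentd",
          "endpoint": "metrics",
          "instance": "10.128.0.7:24231",
          "job": "fluentd",
          "namespace": "openshift-logging",
          "pod": "fluentd-5rnpl",
          "service": "fluentd"
        },
        "value": [1625812558.084, "1"]
      }
    ]
  }
}`

// buildQueryURL URL-encodes the PromQL expression into an instant-query URL.
func buildQueryURL(base, promQL string) string {
	return base + "/api/v1/query?query=" + url.QueryEscape(promQL)
}

// upPods returns the "pod" label of every result whose sample value is "1".
func upPods(body []byte) ([]string, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query status: %q", resp.Status)
	}
	var pods []string
	for _, r := range resp.Data.Result {
		if len(r.Value) == 2 && r.Value[1] == "1" {
			pods = append(pods, r.Metric["pod"])
		}
	}
	return pods, nil
}

func main() {
	fmt.Println(buildQueryURL("http://localhost:9090", `up{job="fluentd"} == 1`))

	pods, err := upPods([]byte(sampleResponse))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("fluentd pods reporting up:", pods)
}
```

An empty list from `upPods` would correspond to the failure case described in this bug, where the user workload Prometheus has no `up` samples for the fluentd job.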

Comment 1 Jeff Cantrill 2021-07-09 13:00:18 UTC
Please close this issue and open a new one at issues.redhat.com for the Logging project. Issues for OpenShift Logging (5.x) deployments on OCP 4.7+ are reported in JIRA.

Comment 2 Daein Park 2021-07-10 03:23:07 UTC
Thank you for the pointer. I've reported this issue here: https://issues.redhat.com/browse/LOG-1561
I'm closing this ticket.