Created attachment 1722069 [details]
YAML file to reproduce the problem.

Description of problem:

Creating a CLF instance with an Output.Secret causes the CLO fluentd to crash-loop because of a missing file mount.

Version-Release number of selected component (if applicable):

Current master

How reproducible:

100%

Steps to Reproduce:
1. Deploy the CLO
2. oc create -f clf-secret.yaml # attached
3. oc get pod -l component=fluentd
4. oc logs $(oc get pod -l component=fluentd -o name)

Actual results:

fluentd pod is in CrashLoopBackOff state; the logs contain an error like this:

/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/config/literal_parser.rb:171:in `instance_eval': No such file or directory @ rb_sysopen - /var/run/ocp-collector/secrets/test-secret/shared_key (Errno::ENOENT)

Expected results:

No errors, fluentd running.

Additional info:

The missing file is supposed to be mounted from a secret, but for some reason this is not happening.

Question: do we even need the shared_key anymore? fluentd 1.0 can do TLS without shared keys, and the shared key adds no extra security since its value is fixed and plainly visible in the source code.
This is a case of bad error handling: the secret is expected to have a shared_key entry, but this is never checked. If the entry is missing, we crash trying to open a nil file name.

Proposed fix: verify all required secret keys before generating the fluentd configuration, and either take appropriate action or set a degraded status.
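A minimal sketch of that check in Go (hypothetical helper and names, not necessarily how the eventual fix does it; the `required` list would come from the output type, e.g. something like []string{"shared_key"} for the forward case in this bug):

package validation

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// verifySecretKeys checks, before any fluentd configuration is generated,
// that every key the output type requires is present in the referenced
// secret. On failure the operator can set a degraded status instead of
// emitting a config that points at a file that will never be mounted.
func verifySecretKeys(secret *corev1.Secret, required []string) error {
	var missing []string
	for _, key := range required {
		if _, ok := secret.Data[key]; !ok {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("secret %s/%s is missing required keys: %s",
			secret.Namespace, secret.Name, strings.Join(missing, ", "))
	}
	return nil
}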
Another problem: if more than one ClusterLogForwarder output has a Secret, the CLO fails to reconcile with:

{"level":"error","ts":"2020-10-16T20:50:01Z","logger":"cluster-logging-operator","msg":"clusterlogforwarder-controller returning, error","cause":{"msg":"Unable to reconcile collection for \"instance\": Failure creating Fluentd Daemonset DaemonSet.apps \"fluentd\" is invalid: spec.template.spec.containers[0].volumeMounts[15].mountPath: Invalid value: \"/var/run/ocp-collector/secrets/receiver\": must be unique"}}
(In reply to Alan Conway from comment #2)
> Another problem: if more than one ClusterLogForwarder output has a Secret,
> the CLO fails to reconcile with:
>
> {"level":"error","ts":"2020-10-16T20:50:01Z","logger":"cluster-logging-operator",
> "msg":"clusterlogforwarder-controller returning, error","cause":{"msg":"Unable to
> reconcile collection for \"instance\": Failure creating Fluentd Daemonset
> DaemonSet.apps \"fluentd\" is invalid:
> spec.template.spec.containers[0].volumeMounts[15].mountPath: Invalid value:
> \"/var/run/ocp-collector/secrets/receiver\": must be unique"}}

More than one secret with the same name? Secrets should be treated as a SET; if they are not, then this is a bug.
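A sketch of that set semantics in Go (the types here are pared-down stand-ins mirroring the OutputSpec/OutputSecretSpec fields quoted later in this bug; illustrative only, not the merged code):

package collector

// OutputSecretSpec and OutputSpec are minimal stand-ins for the
// ClusterLogForwarder API types; only the fields this sketch needs.
type OutputSecretSpec struct {
	Name string
}

type OutputSpec struct {
	Name   string
	Secret *OutputSecretSpec
}

// uniqueSecretNames treats output secrets as a set: outputs that reference
// the same secret contribute a single entry, so the generated DaemonSet
// gets one volume and one mount per secret instead of duplicate mountPaths,
// which the API server rejects with "must be unique".
func uniqueSecretNames(outputs []OutputSpec) []string {
	seen := map[string]bool{}
	var names []string
	for _, out := range outputs {
		if out.Secret == nil || out.Secret.Name == "" || seen[out.Secret.Name] {
			continue
		}
		seen[out.Secret.Name] = true
		names = append(names, out.Secret.Name)
	}
	return names
}

Volume and mount generation would then iterate over uniqueSecretNames(outputs) rather than over the raw output list.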
Ran into this issue and found a workaround. Basically, if you have multiple outputs using the same secret, the CLO won't update the DaemonSet properly. Reported in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1890072
Setting UpcomingSprint, as this cannot be resolved before EOD.
Fix in progress: an overhaul of secret validation and security configuration. It will also fix https://bugzilla.redhat.com/show_bug.cgi?id=1834414
Fix PR https://github.com/openshift/cluster-logging-operator/pull/823 is awaiting merge.
Verified with cluster-logging.5.0.0-32.

When the secret doesn't have `shared_key`, the fluentd pods can start and can forward logs to the fluentd server.
Moved the contents of the Doc Text field here for future reference:

Feature: ClusterLogForwarder TLS configuration
Reason: Improved control of TLS, need to clarify the text
Result: This is the spec documentation comment; update the appropriate user doc to be consistent.

// URL to send log records to.
//
// An absolute URL, with a scheme. Valid schemes depend on `type`.
// Special schemes `tcp`, `tls`, `udp` and `udps` are used for types that
// have no scheme of their own. For example, to send syslog records using
// secure UDP:
//
//     { type: syslog, url: udps://syslog.example.com:1234 }
//
// Basic TLS is enabled if the URL scheme requires it (for example 'https'
// or 'tls'). The 'username@password' part of `url` is ignored.
// Any additional authentication material is in the `secret`.
// See the `secret` field for more details.
//
// +kubebuilder:validation:Pattern:=`^$|[a-zA-z]+:\/\/.*`
// +optional
URL string `json:"url,omitempty"`

OutputTypeSpec `json:",inline"`

// Secret for authentication.
// Name of a secret in the same namespace as the cluster logging operator.
//
// For client authentication, set secret keys `tls.crt` and `tls.key` to
// the client certificate and private key.
//
// To use your own certificate authority, set secret key `ca-bundle.crt`.
//
// Depending on the `type` there may be other secret keys that have meaning.
//
// +optional
Secret *OutputSecretSpec `json:"secret,omitempty"`
Please describe this enhancement for our release notes using the Doc Text field. For examples of release notes for Logging-related enhancements, please [see the current release notes][1].

(The Doc Text field is poorly named. It doesn't create a request to update the product documentation. Instead, please [create a Jira task][2] with:
- Project = Docs for Red Hat Developers (RHDEVDOCS)
- Component = Logging)

[1] https://docs.openshift.com/container-platform/4.6/release_notes/ocp-4-6-release-notes.html#ocp-4-6-logging
[2] https://issues.redhat.com/secure/CreateIssue!default.jspa
Moving to POST to satisfy the merge bot
Moving back to verified per #c9
Updated doc text, created a documentation JIRA: https://issues.redhat.com/browse/LOG-1064
Doc text is good.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0652