[Description of problem]
After configuring the Logging stack through the ClusterLogForwarder API to send logs to an external Elasticsearch that uses PKI user authentication [1], the following error appears in the fluentd pods:
~~~
2021-01-22 08:17:06 +0000 [warn]: [elasticsearch_onprem_secure] failed to flush the buffer. retry_time=10 next_retry_seconds=2021-01-22 08:22:02 +0000 chunk="5b68f908f8fc670c5f84991151a12440" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.example.com\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): [401] {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [fluentd] for REST request [/_bulk]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [fluentd] for REST request [/_bulk]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}"
~~~

[Version-Release number of selected component (if applicable):]
OCP 4.6
clusterlogging.4.6.0-202011221454.p0

[How reproducible]
Always

Steps to Reproduce:
1. Deploy an Elasticsearch using PKI user authentication [1]
2. Configure the ClusterLogForwarder to send the logs to the external Elasticsearch, creating the secret that provides the CA bundle, tls.crt and tls.key:
~~~
spec:
  outputs:
  - name: elasticsearch-secure
    secret:
      name: external-tls-secret
    type: elasticsearch
    url: https://elasticsearch.example.com:9200
  pipelines:
  - inputRefs:
    - application
    - audit
    labels:
      logs: application
    name: application-logs
    outputRefs:
    - elasticsearch-secure
  - inputRefs:
    - infrastructure
    labels:
      logs: audit-infra
    name: infrastructure-audit-logs
    outputRefs:
    - elasticsearch-secure
~~~
3. Check the fluentd pod logs for the error showing that it is not able to authenticate against the Elasticsearch server:
~~~
2021-01-22 08:17:06 +0000 [warn]: [elasticsearch_onprem_secure] failed to flush the buffer. retry_time=10 next_retry_seconds=2021-01-22 08:22:02 +0000 chunk="5b68f908f8fc670c5f84991151a12440" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch.example.com\", :port=>9200, :scheme=>\"https\", :user=>\"fluentd\", :password=>\"obfuscated\"}): [401] {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [fluentd] for REST request [/_bulk]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [fluentd] for REST request [/_bulk]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"Bearer realm=\\\"security\\\"\",\"ApiKey\"]}},\"status\":401}"
~~~
4.
Check that it is possible to reach the external Elasticsearch, and even push data, using curl from inside a fluentd pod with the provided certificates:
~~~
$ oc rsh <fluentd pod>
$ server=elasticsearch.example.com:9200
$ cd /var/run/ocp-collector/secrets/external-tls-secret/
$ curl https://$server/_cat/health?v --key tls.key --cacert ca-bundle.crt --cert tls.crt
epoch      timestamp cluster      status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1612262302 10:38:22  azch-cluster green           3         3    450 225    0    0        0             0
$ curl -XPUT https://$server/test -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  }
}
' --key tls.key --cacert ca-bundle.crt --cert tls.crt
(...)
{"acknowledged":true,"shards_acknowledged":true,"index":"test"}
~~~
5. Check the fluentd configuration and observe that a default user and password exist in the definition:
~~~
<label @ELASTICSEARCH__SECURE>
  <match retry_retry_elasticsearch_secure>
    @type copy
    <store>
      @type elasticsearch
      @id retry_elasticsearch_secure
      host elasticsearch.example.com
      port 9200
      verify_es_version_at_startup false
      scheme https
      ssl_version TLSv1_2
      target_index_key viaq_index_name
      id_key viaq_msg_id
      remove_keys viaq_index_name
      user fluentd                 <--------------- This
      password changeme            <--------------- This
      client_key '/var/run/ocp-collector/secrets/external-tls-secret/tls.key'
      client_cert '/var/run/ocp-collector/secrets/external-tls-secret/tls.crt'
      ca_file '/var/run/ocp-collector/secrets/external-tls-secret/ca-bundle.crt'
      type_name _doc
      http_backend typhoeus
      write_operation create
      reload_connections 'true'
      # https://github.com/uken/fluent-plugin-elasticsearch#reload-after
      reload_after '200'
      # https://github.com/uken/fluent-plugin-elasticsearch#sniffer-class-name
      sniffer_class_name 'Fluent::Plugin::ElasticsearchSimpleSniffer'
~~~

In the Elasticsearch documentation [1], it is possible to read: "You can use a combination of PKI and username/password authentication. For example, you can enable SSL/TLS on the transport layer and define a PKI realm to require transport clients to authenticate with X.509 certificates, while still authenticating HTTP traffic using username and password credentials."

Since "user fluentd" and "password changeme" are added by default, the collector is effectively using PKI plus username/password authentication, and the request fails.

WORKAROUND:
- Move the CLO to Unmanaged
- Delete the "user fluentd" and "password changeme" lines from the fluentd configmap:
~~~
$ oc edit cm fluentd -n openshift-logging
<label @ELASTICSEARCH__SECURE>
  <match retry_retry_elasticsearch_secure>
    @type copy
    <store>
      @type elasticsearch
      @id retry_elasticsearch_secure
      host elasticsearch.example.com
      port 9200
      (...)
      user fluentd                 <--------------- delete this line
      password changeme            <--------------- delete this line
      client_key '/var/run/ocp-collector/secrets/external-tls-secret/tls.key'
      client_cert '/var/run/ocp-collector/secrets/external-tls-secret/tls.crt'
      ca_file '/var/run/ocp-collector/secrets/external-tls-secret/ca-bundle.crt'
      (...)
~~~
- Delete the fluentd pods:
~~~
$ oc delete pods -l component=fluentd
~~~
After doing this, fluentd is able to send the logs to the external Elasticsearch.

[Expected results:]
It should work without needing to delete the default "user fluentd" and "password changeme" entries. When those credentials are injected and the external Elasticsearch uses PKI user authentication, authentication fails because fluentd tries to use PKI plus user (fluentd) / password (changeme). Take into consideration that if PKI is used and a user/password is also defined, both will be sent and the request will fail, as noted in the Elasticsearch documentation quoted above [1].
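The configmap edit in the workaround can be illustrated on a local sample of the `<store>` block (the file below is a reduced, hypothetical copy of the config shown above; the real change is done by hand with `oc edit cm fluentd -n openshift-logging` while the CLO is Unmanaged):

```shell
# Reduced sample of the fluentd <store> block (illustrative file, not the
# actual configmap).
cat > /tmp/store.conf <<'EOF'
@type elasticsearch
host elasticsearch.example.com
port 9200
user fluentd
password changeme
client_key '/var/run/ocp-collector/secrets/external-tls-secret/tls.key'
client_cert '/var/run/ocp-collector/secrets/external-tls-secret/tls.crt'
ca_file '/var/run/ocp-collector/secrets/external-tls-secret/ca-bundle.crt'
EOF

# Drop the default basic-auth credentials so only the PKI client certificate
# is presented to Elasticsearch.
sed -i -e '/^user fluentd/d' -e '/^password changeme/d' /tmp/store.conf

cat /tmp/store.conf
```

After the two lines are removed, the output keeps only the TLS client material, which is what the PKI realm expects.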
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/pki-realm.html
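For completeness, a sketch of how the `external-tls-secret` consumed in step 2 could be created from the PKI material. The local file paths are placeholders; the key names ca-bundle.crt, tls.crt and tls.key are the ones the collector mounts under /var/run/ocp-collector/secrets/<secret-name>/. Shown with --dry-run=client so the manifest can be inspected before applying:

```shell
# Illustrative only: build the output secret from local PEM files (paths are
# placeholders for the real CA bundle and client certificate/key).
oc create secret generic external-tls-secret \
  -n openshift-logging \
  --from-file=ca-bundle.crt=./ca-bundle.crt \
  --from-file=tls.crt=./client.crt \
  --from-file=tls.key=./client.key \
  --dry-run=client -o yaml
```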
*** This bug has been marked as a duplicate of bug 1899334 ***