Bug 1835396
| Summary: | Cannot access logs in Kibana with the managed deployment orchestrated by the cluster logging operator | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Tyler Lisowski <lisowski> |
| Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.3.z | CC: | aos-bugs, bgottfri, brian_mckeown, brueckner, cewong, ewolinet, ikarpukh, jason.greene, jcantril, jtpape, nbziouec, periklis, pweil, rkonuru, smerrow, tnakajo, wili, yhuang |
| Target Milestone: | --- | Keywords: | Reopened, UpcomingSprint |
| Target Release: | 4.4.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1854608 (view as bug list) | Environment: | |
| Last Closed: | 2020-07-21 10:31:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1854610 | | |
| Bug Blocks: | 1854997 | | |
| Attachments: | | | |
Description
Tyler Lisowski
2020-05-13 17:51:21 UTC
Created attachment 1688158 [details]
Kibana UI screenshot
Hello, is there any update on this ticket? Is this going to be fixed anytime soon? We have a customer who wants to use this feature on IBM Red Hat OpenShift clusters by the end of next week. Thank you.

There are additional issues that I discovered independently:

* All CLO pods are running fine:

```
oc get pods -n openshift-logging
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-7647fbdfbf-gzbwq       1/1     Running     0          2d19h
curator-1589826600-dg5tf                        0/1     Completed   0          6m32s
elasticsearch-cdm-d9bzp7fs-1-549fd8d989-7wpdq   2/2     Running     0          2d19h
fluentd-t72c9                                   1/1     Running     0          2d19h
fluentd-xfvg6                                   1/1     Running     0          2d19h
fluentd-zvkf6                                   1/1     Running     0          2d19h
kibana-7479c479cc-mwmqj                         2/2     Running     0          2d19h
```

* I see Elasticsearch getting the documents, but I also see lots of exception messages in the Elasticsearch logs. What's even more weird is that Elasticsearch is creating all the document entries, but only some of the messages are correct and the rest are set to null. Not sure if it's a ROKS issue or a vanilla OpenShift issue at this point. Here is an excerpt from the Elasticsearch log:

```
[2020-05-18T03:32:05,231][WARN ][c.f.s.a.BackendRegistry ] Authentication finally failed for null
[2020-05-18T03:32:16,379][ERROR][i.f.e.p.OpenshiftAPIService] Error retrieving username from token
okhttp3.internal.http2.StreamResetException: stream was reset: PROTOCOL_ERROR
    at okhttp3.internal.http2.Http2Stream.takeHeaders(Http2Stream.java:158) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:131) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) ~[okhttp-3.12.6.jar:?]
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.jav
```

I log into an Elasticsearch node and get all the documents in my project index as follows:

```
curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key 'https://localhost:9200/project.appteam-one.d46b3da5-5b5e-4208-93d1-a5cde9c5f276.2020.05.18/_search?pretty=true' | grep -i sending
"message" : "Sending message 59 : Hello world from hello-world2-654d89c79d-lqt84! blah blah blah!",
"message" : "Sending message 60 : Hello world from hello-world2-654d89c79d-lqt84! blah blah blah!",
"message" : "Sending message 61 : Hello world from hello-world2-654d89c79d-lqt84! blah blah blah!",
"message" : "Sending message 113 : Hello world from hello-world2-654d89c79d-mbv4s! blah blah blah!",
"message" : "Sending message 114 : Hello world from hello-world2-654d89c79d-mbv4s! blah blah blah!",
```

There should be 302 such entries (the index count is correct at 302), but only the docs above are correct; the rest are null.

Hello folks - would someone be able to take a look/comment on the issue reported? Thanks in advance.

While it is a different version of OCP, it would be the same version of Kibana/ES. I'm wondering if this is the same as what is being seen here: https://bugzilla.redhat.com/show_bug.cgi?id=1829062#c12

What page were you on when you received the original error in Kibana?

Hey! The page is https://kibana-openshift-logging.cesar-oc-playground-1-9e37478581b5d9de33607f5926d1d18f-0000.us-south.stg.containers.appdomain.cloud/app/kibana#/discover?_g=()

When testing Kibana, what user is being used to log in? (Is this a predefined user like "kubeadmin"? If not, what sort of permissions does the user have?)
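The "rest are null" observation above can be spot-checked straight from the search response, without Kibana, by counting null `message` fields. A minimal sketch, using an inline stand-in for the pretty-printed `_search` output (in practice the input would be piped from the `curl` command shown earlier; the exact `"message" : null` layout is an assumption about the pretty-printer's output):

```shell
# Stand-in excerpt of a pretty-printed _search response; on a real node this
# would come from the curl against https://localhost:9200/<index>/_search?pretty=true
response='"message" : "Sending message 59 : Hello world from hello-world2-654d89c79d-lqt84! blah blah blah!",
"message" : null,
"message" : null,'

# Count documents whose message field was indexed as null:
printf '%s\n' "$response" | grep -c '"message" : null'   # prints 2
```

Comparing that count against the index's document total (302 in this report) shows how many entries were lost.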
I see this in the ES logs:

```
[2020-05-20T22:58:09,202][ERROR][i.f.e.p.OpenshiftAPIService] Error retrieving username from token
```

which would explain why there are missing user permissions. Also, I don't see any user Kibana indices in ES...

Tangentially, looking at the .kibana index, it seems an index pattern for "logstash-*" was created; this will not match any indices to be viewed (not seeing the expected index pattern is expected, given the above error message):
```
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : ".kibana",
        "_type" : "index-pattern",
        "_id" : "AXI3oSsJOb_Wj8MybJOU",
        "_score" : 1.0,
        "_source" : {
          "title" : "logstash-*",
          "notExpandable" : true
        }
      },
      {
        "_index" : ".kibana",
        "_type" : "config",
        "_id" : "5.6.16",
        "_score" : 1.0,
        "_source" : {
          "buildNum" : 15690,
          "defaultIndex" : "AXI3oSsJOb_Wj8MybJOU"
        }
      }
    ]
  }
}
```
So, turning the log level up, I'm not seeing the Kibana request get processed. I see the metrics endpoint and the other cluster members... for some reason we aren't getting a token for Kibana... Going to investigate why that is occurring tomorrow.

@Tyler I will keep this on needsinfo until you report back your investigation results.

@periklis In my cluster, which I created independently from Tyler's cluster on stage, I see exceptions in the initialization of Elasticsearch where it is not finding classes related to sgadmin, and a message that sgadmin needs to be run. For your question on who is bringing up the Kibana UI: both the cluster creation and the UI bring-up were by the same id (mine). I am not clearing the needinfo flag so that Tyler can add to my comments. Thanks.

Update with the logging team: We're still seeing protocol issues... Elasticsearch is only configured to use TLS v1.1 and v1.2 (not sure if that's currently what's causing an issue), but we confirmed that it's not anything related to the token we get back, because if I curl the k8s service with my bearer token I can hit my expected endpoint and get a valid response of who my user is... but our plugin isn't able to make that call. They are looking into why specifically the Kibana plugin is experiencing authentication issues. They will provide an update when they are able to root-cause why that is occurring.
Update with the logging team: We found the older version: `clusterlogging.4.3.16-202004240713`. You can look at and examine older versions with:

```
# Current versions seem to range from 57.0.0 (newest) to 1.0.0 (oldest)
curl https://quay.io/cnr/api/v1/packages/redhat-operators/cluster-logging/53.0.0
# (note the digest)
curl -XGET https://quay.io/cnr/api/v1/packages/redhat-operators/cluster-logging/blobs/sha256/41d7170cbca29fd933202053bfe525fcde7fd3546f64e31cc056f6eccfdede36 -o cluster-logging.tar.gz
# (the digest is plugged in after sha256/)
tar xvzf cluster-logging.tar.gz
```

Then you can examine the ClusterServiceVersion and its dependencies at that point in time. Tomorrow the team will look at the difference between the newer and older versions.

To note: Release 53.0.0 (clusterlogging.4.3.16-202004240713) was created on May 4th:

```
"created_at":"2020-05-04T11:58:01"
```

The next release was created May 12th:

```
2020-05-12T05
```

This is when I expect the regression might have first been introduced. I am not familiar with a way to deploy the old version yet, but I believe that version was working.

```
#! validate-crd: deploy/chart/templates/0000_30_02-clusterserviceversion.crd.yaml
#! parse-kind: ClusterServiceVersion
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
# The version value is substituted by the ART pipeline
name: clusterlogging.4.3.16-202004240713
namespace: openshift-logging
annotations:
capabilities: Seamless Upgrades
categories: "OpenShift Optional, Logging & Tracing"
certified: "false"
description: |-
The Cluster Logging Operator for OKD provides a means for configuring and managing your aggregated logging stack.
containerImage: registry.redhat.io/openshift4/ose-cluster-logging-operator@sha256:648b96c77f8b0068bd8323a092cf06793ebd7566046a6ffb88af1d7fabadeaa3
createdAt: 2018-08-01T08:00:00Z
support: AOS Logging
# The version value is substituted by the ART pipeline
olm.skipRange: ">=4.1.0 <4.3.16-202004240713"
alm-examples: |-
[
{
"apiVersion": "logging.openshift.io/v1",
"kind": "ClusterLogging",
"metadata": {
"name": "instance",
"namespace": "openshift-logging"
},
"spec": {
"managementState": "Managed",
"logStore": {
"type": "elasticsearch",
"elasticsearch": {
"nodeCount": 3,
"redundancyPolicy": "SingleRedundancy",
"storage": {
"storageClassName": "gp2",
"size": "200G"
}
}
},
"visualization": {
"type": "kibana",
"kibana": {
"replicas": 1
}
},
"curation": {
"type": "curator",
"curator": {
"schedule": "30 3 * * *"
}
},
"collection": {
"logs": {
"type": "fluentd",
"fluentd": {}
}
}
}
},
{
"apiVersion": "logging.openshift.io/v1alpha1",
"kind": "LogForwarding",
"metadata": {
"name": "instance",
"namespace": "openshift-logging"
},
"spec": {
"outputs": [
{
"name": "clo-default-output-es",
"type": "elasticsearch",
"endpoint": "elasticsearch.openshift-logging.svc:9200",
"secret": {
"name": "elasticsearch"
}
}
],
"pipelines": [
{
"name": "clo-default-app-pipeline",
"inputSource": "logs.app",
"outputRefs": ["clo-managaged-output-es"]
},
{
"name": "clo-default-infra-pipeline",
"inputSource": "logs.app",
"outputRefs": ["clo-managaged-output-es"]
}
]
}
}
]
spec:
relatedImages:
- name: ose-cluster-logging-operator
image: registry.redhat.io/openshift4/ose-cluster-logging-operator@sha256:648b96c77f8b0068bd8323a092cf06793ebd7566046a6ffb88af1d7fabadeaa3
- name: ose-logging-curator5
image: registry.redhat.io/openshift4/ose-logging-curator5@sha256:da8943a7eacfd34ac8687ae607e11fb1ad1f538e4bdcae95f3ed70039be72f04
- name: ose-logging-elasticsearch5
image: registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:f02e4f75617b706d9b8e2dc06777aa572a443ccc3dd604ce4c21667f55725435
- name: ose-logging-fluentd
image: registry.redhat.io/openshift4/ose-logging-fluentd@sha256:a43ba2606777a8b6e3a45443bac1ae697600731b34c2abb84e35624ed8ef0270
- name: ose-logging-kibana5
image: registry.redhat.io/openshift4/ose-logging-kibana5@sha256:8f3dc6d2e8c80fce660f65c3c7be1330d6a7b73d003998be8c333e993ccafc78
- name: ose-oauth-proxy
image: registry.redhat.io/openshift4/ose-oauth-proxy@sha256:5fc02d6d99203f2d437068315434b5ca926b992ec02e686ae8b47fbc5ddc89a1
- name: ose-promtail
image: registry.redhat.io/openshift4/ose-promtail@sha256:1264aa92ebc6cccf46da3a35fbb54421b806dda5640c7e9706e6e815d13f509d
# The version value is substituted by the ART pipeline
version: 4.3.16-202004240713
displayName: Cluster Logging
minKubeVersion: 1.16.0
description: |
# Cluster Logging
The Cluster Logging Operator orchestrates and manages the aggregated logging stack as a cluster-wide service.
## Features
* **Create/Destroy**: Launch and create an aggregated logging stack to support the entire OKD cluster.
* **Simplified Configuration**: Configure your aggregated logging cluster's structure like components and end points easily.
## Prerequisites and Requirements
### Cluster Logging Namespace
Cluster logging and the Cluster Logging Operator is only deployable to the **openshift-logging** namespace. This namespace
must be explicitly created by a cluster administrator (e.g. `oc create ns openshift-logging`). To enable metrics
service discovery add namespace label `openshift.io/cluster-monitoring: "true"`.
For additional installation documentation see [Deploying cluster logging](https://docs.openshift.com/container-platform/4.1/logging/efk-logging-deploying.html)
in the OpenShift product documentation.
### Elasticsearch Operator
The Elasticsearch Operator is responsible for orchestrating and managing cluster logging's Elasticsearch cluster. This
operator must be deployed to the global operator group namespace
### Memory Considerations
Elasticsearch is a memory intensive application. Cluster Logging will specify that each Elasticsearch node needs
16G of memory for both request and limit unless otherwise defined in the ClusterLogging custom resource. The initial
set of OKD nodes may not be large enough to support the Elasticsearch cluster. Additional OKD nodes must be added
to the OKD cluster if you desire to run with the recommended(or better) memory. Each ES node can operate with a
lower memory setting though this is not recommended for production deployments.
keywords: ['elasticsearch', 'kibana', 'fluentd', 'logging', 'aggregated', 'efk']
maintainers:
- name: Red Hat
email: aos-logging
provider:
name: Red Hat, Inc
links:
- name: Elastic
url: https://www.elastic.co/
- name: Fluentd
url: https://www.fluentd.org/
- name: Documentation
url: https://github.com/openshift/cluster-logging-operator/blob/master/README.md
- name: Cluster Logging Operator
url: https://github.com/openshift/cluster-logging-operator
installModes:
- type: OwnNamespace
supported: true
- type: SingleNamespace
supported: true
- type: MultiNamespace
supported: false
- type: AllNamespaces
supported: false
install:
strategy: deployment
spec:
permissions:
- serviceAccountName: cluster-logging-operator
rules:
- apiGroups:
- logging.openshift.io
resources:
- "*"
verbs:
- "*"
- apiGroups:
- ""
resources:
- pods
- services
- endpoints
- persistentvolumeclaims
- events
- configmaps
- secrets
- serviceaccounts
- serviceaccounts/finalizers
verbs:
- "*"
- apiGroups:
- apps
resources:
- deployments
- daemonsets
- replicasets
- statefulsets
verbs:
- "*"
- apiGroups:
- route.openshift.io
resources:
- routes
- routes/custom-host
verbs:
- "*"
- apiGroups:
- batch
resources:
- cronjobs
verbs:
- "*"
- apiGroups:
- rbac.authorization.k8s.io
resources:
- roles
- rolebindings
verbs:
- "*"
- apiGroups:
- security.openshift.io
resources:
- securitycontextconstraints
resourceNames:
- privileged
verbs:
- use
- apiGroups:
- monitoring.coreos.com
resources:
- servicemonitors
- prometheusrules
verbs:
- "*"
clusterPermissions:
- serviceAccountName: cluster-logging-operator
rules:
- apiGroups:
- console.openshift.io
resources:
- consoleexternalloglinks
verbs:
- "*"
- apiGroups:
- scheduling.k8s.io
resources:
- priorityclasses
verbs:
- "*"
- apiGroups:
- oauth.openshift.io
resources:
- oauthclients
verbs:
- "*"
- apiGroups:
- rbac.authorization.k8s.io
resources:
- clusterroles
- clusterrolebindings
verbs:
- "*"
- apiGroups:
- config.openshift.io
resources:
- proxies
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- pods
- namespaces
- services
- services/finalizers
verbs:
- get
- list
- watch
deployments:
- name: cluster-logging-operator
spec:
replicas: 1
selector:
matchLabels:
name: cluster-logging-operator
template:
metadata:
labels:
name: cluster-logging-operator
spec:
serviceAccountName: cluster-logging-operator
containers:
- name: cluster-logging-operator
image: registry.redhat.io/openshift4/ose-cluster-logging-operator@sha256:648b96c77f8b0068bd8323a092cf06793ebd7566046a6ffb88af1d7fabadeaa3
imagePullPolicy: IfNotPresent
command:
- cluster-logging-operator
env:
- name: WATCH_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.annotations['olm.targetNamespaces']
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: OPERATOR_NAME
value: "cluster-logging-operator"
- name: ELASTICSEARCH_IMAGE
value: "registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:f02e4f75617b706d9b8e2dc06777aa572a443ccc3dd604ce4c21667f55725435"
- name: FLUENTD_IMAGE
value: "registry.redhat.io/openshift4/ose-logging-fluentd@sha256:a43ba2606777a8b6e3a45443bac1ae697600731b34c2abb84e35624ed8ef0270"
- name: KIBANA_IMAGE
value: "registry.redhat.io/openshift4/ose-logging-kibana5@sha256:8f3dc6d2e8c80fce660f65c3c7be1330d6a7b73d003998be8c333e993ccafc78"
- name: CURATOR_IMAGE
value: "registry.redhat.io/openshift4/ose-logging-curator5@sha256:da8943a7eacfd34ac8687ae607e11fb1ad1f538e4bdcae95f3ed70039be72f04"
- name: OAUTH_PROXY_IMAGE
value: "registry.redhat.io/openshift4/ose-oauth-proxy@sha256:5fc02d6d99203f2d437068315434b5ca926b992ec02e686ae8b47fbc5ddc89a1"
- name: PROMTAIL_IMAGE
value: "registry.redhat.io/openshift4/ose-promtail@sha256:1264aa92ebc6cccf46da3a35fbb54421b806dda5640c7e9706e6e815d13f509d"
customresourcedefinitions:
owned:
- name: clusterloggings.logging.openshift.io
version: v1
kind: ClusterLogging
displayName: Cluster Logging
description: A Cluster Logging instance
resources:
- kind: Deployment
version: v1
- kind: DaemonSet
version: v1
- kind: CronJob
version: v1beta1
- kind: ReplicaSet
version: v1
- kind: Pod
version: v1
- kind: ConfigMap
version: v1
- kind: Secret
version: v1
- kind: Service
version: v1
- kind: Route
version: v1
- kind: Elasticsearch
version: v1
- kind: LogForwarding
version: v1alpha1
- kind: Collector
version: v1alpha1
specDescriptors:
- description: The desired number of Kibana Pods for the Visualization component
displayName: Kibana Size
path: visualization.kibana.replicas
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podCount'
- description: Resource requirements for the Kibana pods
displayName: Kibana Resource Requirements
path: visualization.kibana.resources
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:resourceRequirements'
- description: The node selector to use for the Kibana Visualization component
displayName: Kibana Node Selector
path: visualization.kibana.nodeSelector
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:nodeSelector'
- description: The desired number of Elasticsearch Nodes for the Log Storage component
displayName: Elasticsearch Size
path: logStore.elasticsearch.nodeCount
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podCount'
- description: Resource requirements for each Elasticsearch node
displayName: Elasticsearch Resource Requirements
path: logStore.elasticsearch.resources
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:resourceRequirements'
- description: The node selector to use for the Elasticsearch Log Storage component
displayName: Elasticsearch Node Selector
path: logStore.elasticsearch.nodeSelector
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:nodeSelector'
- description: Resource requirements for the Fluentd pods
displayName: Fluentd Resource Requirements
path: collection.logs.fluentd.resources
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:resourceRequirements'
- description: The node selector to use for the Fluentd log collection component
displayName: Fluentd node selector
path: collection.logs.fluentd.nodeSelector
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:nodeSelector'
- description: The list of output targets that receive log messages
displayName: Forwarding Outputs
path: forwarding.outputs
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:forwardingOutputs'
- description: The list of mappings between log sources (e.g. application logs) and forwarding outputs
displayName: Forwarding Pipelines
path: forwarding.pipelines
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:forwardingPipelines'
- description: Resource requirements for the Curator pods
displayName: Curator Resource Requirements
path: curation.curator.resources
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:resourceRequirements'
- description: The node selector to use for the Curator component
displayName: Curator Node Selector
path: curation.curator.nodeSelector
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:nodeSelector'
- description: The cron schedule for the Curator component
displayName: Curation Schedule
path: curation.curator.schedule
statusDescriptors:
- description: The status for each of the Kibana pods for the Visualization component
displayName: Kibana Status
path: visualization.kibanaStatus.pods
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podStatuses'
- description: The status for each of the Elasticsearch Client pods for the Log Storage component
displayName: Elasticsearch Client Pod Status
path: logStore.elasticsearchStatus.pods.client
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podStatuses'
- description: The status for each of the Elasticsearch Data pods for the Log Storage component
displayName: Elasticsearch Data Pod Status
path: logStore.elasticsearchStatus.pods.data
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podStatuses'
- description: The status for each of the Elasticsearch Master pods for the Log Storage component
displayName: Elasticsearch Master Pod Status
path: logStore.elasticsearchStatus.pods.master
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podStatuses'
- description: The cluster status for each of the Elasticsearch Clusters for the Log Storage component
displayName: Elasticsearch Cluster Health
path: logStore.elasticsearchStatus.clusterHealth
- description: The status for each of the Fluentd pods for the Log Collection component
displayName: Fluentd status
path: collection.logs.fluentdStatus.pods
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:podStatuses'
- description: The status for migration of a clusterlogging instance
displayName: Fluentd status
path: migration
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:migrationStatus'
- name: logforwardings.logging.openshift.io
version: v1alpha1
kind: LogForwarding
displayName: Log Forwarding
description: Log forwarding spec to define destinations for specific log sources
specDescriptors:
- description: The list of output targets that receive log messages
displayName: Forwarding Outputs
path: outputs
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:forwardingOutputs'
- description: The list of mappings between log sources (e.g. application logs) and forwarding outputs
displayName: Forwarding Pipelines
path: pipelines
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:forwardingPipelines'
statusDescriptors:
- description: The status of the sources being collected
displayName: Source Status
path: sources
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:sourceStatuses'
- description: The status of forwarding outputs
displayName: Outputs Status
path: outputs
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:outputStatuses'
- description: The status of forwarding pipelines
displayName: Pipelines Status
path: pipelines
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:pipelineStatuses'
- description: The status of log forwarding resourece
displayName: Log Forwarding Status
path: status
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:logforwardingStatuses'
- name: collectors.logging.openshift.io
version: v1alpha1
kind: Collector
displayName: Log Collector
description: Log Collector spec to define log collection
specDescriptors:
- description: The type of log collector
displayName: Collector type
path: type
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:collectorType'
```
That is the cluster service version for 4.3.16
Update: we can prove that the old one works by doing the following.

Ensure the olm-operator stays scaled down so the old cluster-logging-operator does not get destroyed:
```
watch -n 5 kubectl -n openshift-operator-lifecycle-manager scale --replicas 0 deploy olm-operator
```
Delete the existing ClusterLogging instance:
```
kubectl delete clusterlogging -n openshift-logging instance
```
Let everything clean up, then apply the older cluster-logging-operator Deployment:
```
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
labels:
name: cluster-logging-operator
namespace: openshift-logging
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
name: cluster-logging-operator
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
alm-examples: |-
[
{
"apiVersion": "logging.openshift.io/v1",
"kind": "ClusterLogging",
"metadata": {
"name": "instance",
"namespace": "openshift-logging"
},
"spec": {
"managementState": "Managed",
"logStore": {
"type": "elasticsearch",
"elasticsearch": {
"nodeCount": 3,
"redundancyPolicy": "SingleRedundancy",
"storage": {
"storageClassName": "gp2",
"size": "200G"
}
}
},
"visualization": {
"type": "kibana",
"kibana": {
"replicas": 1
}
},
"curation": {
"type": "curator",
"curator": {
"schedule": "30 3 * * *"
}
},
"collection": {
"logs": {
"type": "fluentd",
"fluentd": {}
}
}
}
},
{
"apiVersion": "logging.openshift.io/v1alpha1",
"kind": "LogForwarding",
"metadata": {
"name": "instance",
"namespace": "openshift-logging"
},
"spec": {
"outputs": [
{
"name": "clo-default-output-es",
"type": "elasticsearch",
"endpoint": "elasticsearch.openshift-logging.svc:9200",
"secret": {
"name": "elasticsearch"
}
}
],
"pipelines": [
{
"name": "clo-default-app-pipeline",
"inputSource": "logs.app",
"outputRefs": ["clo-managaged-output-es"]
},
{
"name": "clo-default-infra-pipeline",
"inputSource": "logs.app",
"outputRefs": ["clo-managaged-output-es"]
}
]
}
}
]
capabilities: Seamless Upgrades
categories: OpenShift Optional, Logging & Tracing
certified: "false"
containerImage: registry.redhat.io/openshift4/ose-cluster-logging-operator@sha256:2e08105b56f4f3d2f1842fdc13571720aa36754e64885ccf55987ba69a14a079
createdAt: "2018-08-01T08:00:00Z"
description: The Cluster Logging Operator for OKD provides a means for configuring
and managing your aggregated logging stack.
olm.operatorGroup: openshift-logging-92zpg
olm.operatorNamespace: openshift-logging
olm.skipRange: '>=4.1.0 <4.3.20-202005121847'
olm.targetNamespaces: openshift-logging
support: AOS Logging
labels:
name: cluster-logging-operator
spec:
containers:
- command:
- cluster-logging-operator
env:
- name: WATCH_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.annotations['olm.targetNamespaces']
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OPERATOR_NAME
value: cluster-logging-operator
- name: ELASTICSEARCH_IMAGE
value: registry.redhat.io/openshift4/ose-logging-elasticsearch5@sha256:f02e4f75617b706d9b8e2dc06777aa572a443ccc3dd604ce4c21667f55725435
- name: FLUENTD_IMAGE
value: registry.redhat.io/openshift4/ose-logging-fluentd@sha256:a43ba2606777a8b6e3a45443bac1ae697600731b34c2abb84e35624ed8ef0270
- name: KIBANA_IMAGE
value: registry.redhat.io/openshift4/ose-logging-kibana5@sha256:8f3dc6d2e8c80fce660f65c3c7be1330d6a7b73d003998be8c333e993ccafc78
- name: CURATOR_IMAGE
value: registry.redhat.io/openshift4/ose-logging-curator5@sha256:da8943a7eacfd34ac8687ae607e11fb1ad1f538e4bdcae95f3ed70039be72f04
- name: OAUTH_PROXY_IMAGE
value: registry.redhat.io/openshift4/ose-oauth-proxy@sha256:5fc02d6d99203f2d437068315434b5ca926b992ec02e686ae8b47fbc5ddc89a1
- name: PROMTAIL_IMAGE
value: registry.redhat.io/openshift4/ose-promtail@sha256:1264aa92ebc6cccf46da3a35fbb54421b806dda5640c7e9706e6e815d13f509d
image: registry.redhat.io/openshift4/ose-cluster-logging-operator@sha256:648b96c77f8b0068bd8323a092cf06793ebd7566046a6ffb88af1d7fabadeaa3
imagePullPolicy: IfNotPresent
name: cluster-logging-operator
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: cluster-logging-operator
serviceAccountName: cluster-logging-operator
terminationGracePeriodSeconds: 30
```
Then create a new Logging instance and let it initialize. Once it initializes, logs start flowing again, so this likely points to a regression somewhere between the 4.3.16 code and the 4.3.19 code.
The logging team is investigating the code delta between the versions and trying to pinpoint the change that caused it.

Hi folks - any recent news on how this is going? Btw, I have another customer case reporting the same issue, following these steps: https://cloud.ibm.com/docs/openshift?topic=openshift-health#oc_logging_operator

```
[object Object]: [security_exception] no permissions for [indices:data/read/field_caps] and User [name=CN=system.logging.kibana,OU=OpenShift,O=Logging, roles=[]]
```

- This client said they were using CLO v4.3.20-202005121847 when they hit the error.
- They said they were able to get cluster logging working with CLO v4.2.29-202004140532.

Does this tally with your understanding? Also, just for my own knowledge, what are the steps to install a lower version of the CLO than the default version available from OperatorHub, and is this then an environment we would support? [if they move from the defaults] Thanks a lot.

Hi folks, any update? Thank you.

I believe this to be a session and cookie issue as observed:
https://bugzilla.redhat.com/show_bug.cgi?id=1791837#c29
https://bugzilla.redhat.com/show_bug.cgi?id=1791837#c30
Closing NOTABUG.

Reopening, given that the solutions in other BZs have not worked in this case. Clearing all cookies and using incognito mode still results in the same error in the Kibana UI.

Looked at this a little bit more and narrowed the issue down to the elasticsearch container. There is no difference in Elasticsearch version (or plugin jar versions, for that matter) between the one that works and the one that doesn't. However, the JVM is different.

The one that works: java-1.8.0-openjdk-1.8.0.242.b08-1.el7.x86_64
The one that doesn't work: java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64

Which may explain the error mentioned above:

```
[2020-05-18T03:32:16,379][ERROR][i.f.e.p.OpenshiftAPIService] Error retrieving username from token
okhttp3.internal.http2.StreamResetException: stream was reset: PROTOCOL_ERROR
```

in the elasticsearch log. This may need to be resolved by ART. Moving to UpcomingSprint.

Created openjdk issue: https://issues.redhat.com/browse/OPENJDK-114

Hello, this is from IBM Cloud Support. We still have the issue with one of our customers, using IBM Cloud ROKS 4.3.12_1520_openshift. It is said that the root cause is the JDK version that the elasticsearch container used (works: java-1.8.0-openjdk-1.8.0.242.b08-1.el7.x86_64; doesn't work: java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64; see https://issues.redhat.com/browse/OPENJDK-114), but the issue still is not resolved. Can you investigate further? Please help to push Red Hat to update the elasticsearch image. The elasticsearch image in ROKS 4.3 is using openjdk 1.8.0.252.

@Jeff Cantrill @Cesar Wong, could you investigate the issue based on the last comment? Let us know if you need any more information from us.

Putting this back to low, because it is not a blocker for 4.5. The relevant workarounds provided by [1] need to be reevaluated. Releases earlier than 4.5 use the Kibana 5 related to this issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1835396#c21

@Periklis - I don't think this is accurate. The errors are seen immediately upon accessing Kibana using the 4.3-based CLO. This isn't a scenario of logging in, waiting a while for a token to expire, and then trying to refresh the page or access the dashboard links again: it straight-up fails immediately. What I mean is that the linked workarounds in that comment do *not* work in this 4.3 case. I'm sure the statements on 4.5 are accurate. ;)

Is the full log available for that Elasticsearch error?
It sounds like TLS is being used for this? I wonder if you are seeing:
https://github.com/square/okhttp/issues/5970
https://github.com/square/okhttp/pull/5971/files

Created attachment 1699378 [details]
elasticsearch log

Jason, that looks like it could be this bug. I am attaching the logs from one of the Elasticsearch nodes from a 4.3.23 ROKS cluster.
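For anyone triaging a similar cluster: the two JVM builds called out earlier differ only in the upstream 8u build, which can be extracted from the package NVR. A minimal sketch (the `oc exec` line in the comment is an assumption about pod and container names, not a command from this bug):

```shell
# On a live cluster one would query the installed JDK with something like:
#   oc -n openshift-logging exec <elasticsearch-pod> -c elasticsearch -- \
#     rpm -q java-1.8.0-openjdk
# The NVRs quoted in this bug differ only in the upstream version/build:
works='java-1.8.0-openjdk-1.8.0.242.b08-1.el7.x86_64'
fails='java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64'
for nvr in "$works" "$fails"; do
  # Strip the package-name prefix and dist suffix, keeping "1.8.0.NNN.bXX"
  echo "$nvr" | sed -E 's/^java-1\.8\.0-openjdk-(1\.8\.0\.[0-9]+\.b[0-9]+).*/\1/'
done
```

If the pod reports 1.8.0.252.b09 (or a later 8u252 build), it is running the JVM associated with the HTTP/2 PROTOCOL_ERROR behavior above; 1.8.0.242.b08 predates it.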
The bug above has been addressed higher up in the stack, in the kubernetes-client: https://github.com/fabric8io/kubernetes-client/pull/2227

I can confirm that running with a version of the elasticsearch plugin that includes the fix above does fix the issue.

@Cesar Wong, do you have any way to upgrade the kubernetes-client in the IBM Cloud ROKS cluster v4.3?

Excellent. Glad to hear the update addresses the issue.

@tnakajo.com - I've now submitted a PR to bump the k8s client version in the elasticsearch plugin: https://github.com/fabric8io/openshift-elasticsearch-plugin/pull/190

What I did to test it was to build the plugin from that repo and then inject that build into a new image for elasticsearch. Once the PR above merges, the elasticsearch build will need to be updated to pull it into the image.

@Cesar Wong It seems the PR above has been merged. Can you proceed further?

Hello, this is from IBM Cloud Support. I have another customer reporting the same issue using IBM Cloud ROKS 4.3 OpenShift. Could you provide an ETA for the bug fix?

@tnakajo Fixing this issue seems to be only a matter of merging the PR and integrating it into the elasticsearch images. AFAICS, the fix targets 4.3.z and thus needs to land first in 4.4.z and then in 4.3.z. The streams open and close weekly, e.g. 4.4.z by tomorrow, so we need to wait at least for the next cycle.

Put "UpcomingSprint", because the fix will not land in this week's 4.3.z release; the next releases are earliest in the next sprint.

@Periklis Tsirakidis @Cesar Wong Can you tell us the current status? Will the fix be merged in the 4.3.z release next week? FYI: We have 4 customers waiting for the fix now. By the way, once the 4.3.z fix is ready, how does the user update on their cluster?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2913

@Periklis Tsirakidis @Cesar Wong @Jeff Cantrill It seems the fix has been merged into 4.4.z. Is the fix scheduled to merge into 4.3.z in the next sprint?

The BZ chain tells me that the 4.3 fix is also merged. Have a look here: https://bugzilla.redhat.com/show_bug.cgi?id=1854997