Bug 1703546
Summary: | Changing clusterlogging CR for ES does not trigger a new ES deployment in a timely fashion | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle>
Component: | Logging | Assignee: | Josef Karasek <jkarasek>
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli>
Severity: | high | Priority: | unspecified
Version: | 4.1.0 | CC: | aos-bugs, ewolinet, jcantril, jmalde, rmeggins
Target Milestone: | --- | Keywords: | BetaBlocker
Target Release: | 4.1.0 | Doc Type: | No Doc Update
Hardware: | Unspecified | OS: | Unspecified
Whiteboard: | aos-scalability-41, logging-core | Type: | Bug
Last Closed: | 2019-06-04 10:48:05 UTC | Attachments: | logging operator logs and CR yaml (attachment 1559255)
Description
Mike Fiedler
2019-04-26 16:35:57 UTC
Created attachment 1559255
logging operator logs and CR yaml

Mike, are there any messages in the EO logs that refer to being unable to upgrade or waiting for a particular state? I was unable to recreate this with an image built from master (earlier today on my local system).

I did not see the EO log move at all when I saved the updated clusterlogging CR. Let me install from today's puddle and give it another try. In the meantime, the EO logs are in the attachment on this bz.

I re-ran this with the latest images as served by OperatorHub and I do see one message appear in the EO log. The diff of the ES operator pod log before/after the scenario is:

> time="2019-04-30T01:43:34Z" level=warning msg="Unable to perform synchronized flush: <nil>"

The diff of the cluster logging operator log before/after the scenario is:

> time="2019-04-30T01:42:57Z" level=info msg="Elasticsearch resources change found, updating elasticsearch"
> time="2019-04-30T01:43:16Z" level=info msg="Elasticsearch resources change found, updating elasticsearch"

Let me know if there is anything else that might help.

Modified the summary to reflect the cause. The issue is that it takes a long time for the operator to act on the change because it forces Elasticsearch to do a synced flush first. This operation pushes in-memory data to disk, which can take time depending on cluster size, amount of data, ingestion rate, etc. (minimal sketches of the CR edit and of the flush call are included at the end of this report).

Verified on quay.io/openshift/origin-elasticsearch-operator@sha256:59d5e2e988573428c0474c96c25d0fc48e0f80f64b657e5e2618b4372239a605:

> version   4.1.0-0.nightly-2019-05-02-131943   True   False   6h51m   Cluster version is 4.1.0-0.nightly-2019-05-02-131943

Elasticsearch redeployed with reasonable messages in the operator log.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
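For context, the reproduction above amounts to editing the Elasticsearch section of the ClusterLogging CR and waiting for the operator to roll out a new ES deployment. A minimal sketch of that kind of edit, assuming the default `instance` CR in the `openshift-logging` namespace (the patched field and value are illustrative, not the exact change used in the original test):

```sh
# Illustrative only: bump the ES memory request in the ClusterLogging CR so the
# cluster-logging-operator sees a resource change ("Elasticsearch resources
# change found") and rolls out a new Elasticsearch deployment.
oc -n openshift-logging patch clusterlogging instance --type merge \
  -p '{"spec":{"logStore":{"elasticsearch":{"resources":{"requests":{"memory":"4Gi"}}}}}}'

# Watch the operator log and the ES pods for the rollout (the label selector is
# the one typically set on the logging ES pods; adjust if it differs).
oc -n openshift-logging logs deployment/cluster-logging-operator -f
oc -n openshift-logging get pods -l component=elasticsearch -w
```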
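The synced flush that delays the rollout can also be issued by hand to see how long it takes on a loaded cluster. A minimal sketch, assuming the admin client certificates are mounted at the paths commonly used by the OpenShift logging ES pods (pod selection and cert paths are assumptions, not taken from this bug):

```sh
# Illustrative only: request a synced flush directly from Elasticsearch, the
# same operation the operator forces before redeploying nodes.
ES_POD=$(oc -n openshift-logging get pods -l component=elasticsearch -o name | head -1)
oc -n openshift-logging exec -c elasticsearch "${ES_POD#pod/}" -- \
  curl -s -XPOST "https://localhost:9200/_flush/synced" \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key
```

If the flush cannot complete (for example, because of ongoing indexing), Elasticsearch reports per-index failures in the response, which lines up with the "Unable to perform synchronized flush" warning seen in the EO log.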
Mike, Are there any messages in the EO logs that refer to being unable to upgrade or waiting for a particular state? I was unable to recreate this with an image built from master (earlier today on my local system). I did not see the EO log move at all when I saved the updated clusterlogging CR. Let me install from today's puddle and give it another try. The EO logs are in the attachment on this bz in the meantime. I re-ran this with the latest images as served by OperatorHub and I do see one message pop in the EO log. The diff of the ES operator pod before/after the scenario is: > time="2019-04-30T01:43:34Z" level=warning msg="Unable to perform synchronized flush: <nil>" The diff of the cluster logging op logs before/after the scenario is: > time="2019-04-30T01:42:57Z" level=info msg="Elasticsearch resources change found, updating elasticsearch" > time="2019-04-30T01:43:16Z" level=info msg="Elasticsearch resources change found, updating elasticsearch" Let me know if there is anything else that might help. Modified the summary to reflect the cause. The issue is that it is taking a long time for the operator to respect the change because it forces Elasticsearch to do a sync'd flush. This operation pushes inmemory data to disk which can take time depending on: cluster size, amount of data, ingestion rate, etc. Verified on: quay.io/openshift/origin-elasticsearch-operator@sha256:59d5e2e988573428c0474c96c25d0fc48e0f80f64b657e5e2618b4372239a605 version 4.1.0-0.nightly-2019-05-02-131943 True False 6h51m Cluster version is 4.1.0-0.nightly-2019-05-02-131943 elasticsearch redeployed with reasonable messages in the operator log Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758 |