
Bug 1840274

Summary: During upgrade, if CLO upgrades before EO fluentd writes to *-write index instead of alias
Product: OpenShift Container Platform
Reporter: ewolinet
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.5
CC: aos-bugs, jcantril
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:01:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1843271

Description ewolinet 2020-05-26 16:40:45 UTC
Description of problem:
During an upgrade from 4.4 to 4.5, if CLO upgrades before EO does, fluentd is updated to write to a '*-write' endpoint, which should be an alias. If EO has not yet created the alias, fluentd causes a concrete index with that name to be created instead. When EO then upgrades, this can cause problems with index management or loss of the data that fluentd has already written.

Version-Release number of selected component (if applicable):
4.4 -> 4.5

How reproducible:
Always

Steps to Reproduce:
1. Upgrade CLO
2. Check ES indices
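
For step 2, one way to spot the problem is to look for concrete indices whose names end in '-write'. The following is a minimal sketch, assuming the input is the parsed JSON body of GET /_aliases (a map of concrete index name to its aliases). The function name and the sample data are hypothetical and not taken from a real cluster.

```python
from typing import Dict, List


def misplaced_write_indices(aliases_response: Dict[str, dict]) -> List[str]:
    """Return concrete index names that end in '-write'.

    `aliases_response` is the parsed JSON body of GET /_aliases, which maps
    each concrete index name to {"aliases": {...}}. A name ending in '-write'
    appearing as a key means it exists as an index, not as an alias.
    """
    return sorted(name for name in aliases_response if name.endswith("-write"))


# Hypothetical sample data for illustration only.
healthy = {"app-000001": {"aliases": {"app-write": {}}}}
broken = {"app-write": {"aliases": {}}, ".operations.2020.05.26": {"aliases": {}}}

print(misplaced_write_indices(healthy))  # []
print(misplaced_write_indices(broken))   # ['app-write']
```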

Actual results:
Fluentd incorrectly causes a '*-write' index to be created

Expected results:
Fluentd should wait until the alias is in place and then proceed to push its logs
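
The expected behavior could be sketched roughly as follows. This is not the fix that shipped, only an illustration of "wait for the alias, then proceed". It assumes the Elasticsearch alias endpoint (HEAD /_alias/<name> answers 200 when the alias exists and 404 when it does not); the function names, retry count, and delay are hypothetical, and TLS client certificates and authentication are left out.

```python
import time
import urllib.request
from typing import Callable


def alias_exists_http(base_url: str, alias: str, timeout: float = 5.0) -> bool:
    """Return True only if Elasticsearch answers 200 for HEAD /_alias/<alias>.

    Any HTTP error (including 404) or connection problem is treated as
    "not ready yet", since ES may be unreachable while EO is upgrading.
    """
    req = urllib.request.Request(f"{base_url}/_alias/{alias}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # HTTPError, URLError and socket timeouts are all OSError subclasses.
        return False


def wait_for_alias(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `attempts` polls have been made."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next poll
    return False
```

A caller would pass something like `lambda: alias_exists_http(es_url, "app-write")` as `check` and start pushing logs only when `wait_for_alias` returns True. Taking the check as a callable keeps the polling loop testable without a running cluster.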

Additional info:

Comment 4 Anping Li 2020-06-23 03:51:39 UTC
1) The PATH for ruby and the library path for libruby.so.2.5 should be exported.
$oc logs fluentd-k6t8c -c fluentd-init
./wait_for_es_version.sh: line 3: ruby: command not found

$docker run -it --entrypoint /opt/rh/rh-ruby25/root/usr/bin/ruby ose-logging-fluentd:v4.5.0 --help
/opt/rh/rh-ruby25/root/usr/bin/ruby: error while loading shared libraries: libruby.so.2.5: cannot open shared object file: No such file or directory

2) wait_for_es_version.sh shouldn't be executed when deploying fluentd only.

Comment 6 Anping Li 2020-06-24 01:49:52 UTC
Verified on the CI images

1) Upgrade CLO to 4.6. One fluentd pod is in Init:CrashLoopBackOff.
$oc get pods
fluentd-2cxs9                                   1/1     Running                 0          7m42s
fluentd-2mkwn                                   1/1     Running                 0          7m42s
fluentd-c6vs2                                   1/1     Running                 0          7m42s
fluentd-fkdmv                                   1/1     Running                 0          7m42s
fluentd-qcvgv                                   1/1     Running                 0          7m42s
fluentd-rtcsn                                   1/1     Running                 0          7m42s
fluentd-vn8fd                                   0/1     Init:CrashLoopBackOff   5          4m33s
$ oc logs fluentd-vn8fd -c fluentd-init
Elasticsearch is currently version: 5.6.16 - Expecting it to be at least: 6
2) Upgrade EO to 4.6. The ES pods are not Ready during the upgrade. No data is received and no -write index is created.

3) After the EO upgrade, infra-000001 and app-000001 are created in the ES cluster. The fluentd pods start to upgrade. The doc.count increases in both the old indices (.operations.xxx and project.xxx) and the new indices (infra-000001 and app-000001).
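
The version gate whose output appears in step 1 can be illustrated with a small sketch. This is not the contents of wait_for_es_version.sh; it only assumes that the script compares the major part of the Elasticsearch version string (as reported under version.number by GET /) against a minimum. The function names are hypothetical.

```python
import re


def es_major_version(version_number: str) -> int:
    """Extract the leading major version from a string such as '5.6.16'."""
    match = re.match(r"(\d+)(\.|$)", version_number)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_number!r}")
    return int(match.group(1))


def es_version_ok(version_number: str, minimum_major: int = 6) -> bool:
    """Return True if the major version is at least `minimum_major`."""
    return es_major_version(version_number) >= minimum_major


def describe(version_number: str, minimum_major: int = 6) -> str:
    """Build a message in the same shape as the init container log line."""
    return (
        f"Elasticsearch is currently version: {version_number}"
        f" - Expecting it to be at least: {minimum_major}"
    )


if not es_version_ok("5.6.16"):
    print(describe("5.6.16"))
```

With the 5.6.16 cluster from step 1 the check fails, so the init container would keep restarting until EO has upgraded Elasticsearch to 6.x.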

Comment 8 errata-xmlrpc 2020-10-27 16:01:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196