Bug 1748478

Summary: readiness probe could show an invalid message in some conditions.
Product: OpenShift Container Platform Reporter: German Parente <gparente>
Component: Logging Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: agawand, anisal, aos-bugs, ddo, grodrigu, jcantril, mirollin, ocasalsa, rmeggins, stwalter
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-23 11:05:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1795393    

Description German Parente 2019-09-03 16:40:57 UTC
Description of problem:

We sometimes see this error:

I0902 10:31:51.476431   22783 prober.go:111] Readiness probe for "logging-es-xxxxxxxx-NN-yyyyy_logging(......):elasticsearch" failed (failure): cat: /opt/app-root/src/init_failures: No such file or directory

In fact, the readiness probe ends with this check:

check_for_init_complete || cat ${HOME}/init_failures

However, init.sh may not yet have created the file "${HOME}/init_complete" even though no errors have occurred, so ${HOME}/init_failures is either empty or, as in this case, does not exist.

We should handle this situation and avoid the misleading message:

cat: /opt/app-root/src/init_failures: No such file or directory
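
As an illustration only, here is a minimal sketch of how the final check could be guarded. It assumes check_for_init_complete simply tests for ${HOME}/init_complete; that body, the function name readiness_tail, the message text, and the non-zero return in both "not ready" branches are all hypothetical and may differ from the fix that actually shipped.

```shell
# Hypothetical stand-in for the probe's helper (assumed behaviour):
# succeeds once init.sh has written ${HOME}/init_complete.
check_for_init_complete() {
  [ -f "${HOME}/init_complete" ]
}

# Guarded variant of: check_for_init_complete || cat ${HOME}/init_failures
readiness_tail() {
  if check_for_init_complete; then
    return 0
  fi
  if [ -s "${HOME}/init_failures" ]; then
    # init.sh recorded real failures: show them in the probe output.
    cat "${HOME}/init_failures"
  else
    # File missing or empty: init.sh is most likely still running.
    echo "Elasticsearch initialization is not complete yet"
  fi
  return 1
}
```

With this shape, the probe still fails while initialization is pending, but the kubelet log carries a meaningful message instead of the "No such file or directory" error from cat.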


Version-Release number of selected component (if applicable): atomic-openshift-3.11.98-1.git.0.0cbaff3.el7.x86_64


How reproducible: at customer site. 


Steps to Reproduce:
1. I guess we could put a "sleep X" in init.sh to force this message.
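
The window can also be simulated locally without touching init.sh. The sketch below is an assumption-laden stand-in, not the real probe: it points HOME at an empty temporary directory (neither init_complete nor init_failures exists), defines a hypothetical check_for_init_complete, and runs the same final check. The function name repro_probe_tail is made up for this example.

```shell
# Simulate the state where init.sh is still running: HOME contains
# neither init_complete nor init_failures.
repro_probe_tail() {
  local fake_home
  fake_home="$(mktemp -d)"
  (
    HOME="${fake_home}"
    export LC_ALL=C   # keep the cat error message in English
    # Hypothetical helper: assumed to test only for the marker file.
    check_for_init_complete() { [ -f "${HOME}/init_complete" ]; }
    # Same shape as the final check in the readiness probe.
    check_for_init_complete || cat "${HOME}/init_failures"
  ) 2>&1 || true
  rm -rf "${fake_home}"
}
```

Calling repro_probe_tail should print the cat error for the temporary directory, which mirrors the message seen in the kubelet log.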

Comment 3 Greg Rodriguez II 2019-10-28 19:47:23 UTC
Added another customer experiencing this issue.  Is there a known workaround that has been developed?

Comment 5 Jeff Cantrill 2019-10-31 19:11:43 UTC
(In reply to Greg Rodriguez II from comment #4)
> Customer states this is affecting production and would like any type of
> workaround or resolution

Delete the readiness probe from the Deployment. You may then have to seed the permissions manually: 'oc exec -c elasticsearch <pod> -- es_seed_acl' (substitute the name of the Elasticsearch pod for <pod>).

Comment 8 Greg Rodriguez II 2019-11-08 20:21:36 UTC
Customer is requesting update on this ticket.  Has there been any progress?

Comment 10 Anping Li 2019-11-14 10:47:50 UTC
Waiting for another image.

Comment 12 Anping Li 2019-11-24 10:01:27 UTC
Verified with openshift/ose-logging-elasticsearch5:v4.3.0-201911220712

Comment 13 Greg Rodriguez II 2019-11-29 13:50:56 UTC
Are there any plans to port this fix to 4.2 in the near future?

Comment 15 errata-xmlrpc 2020-01-23 11:05:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062