Bug 1503563

Summary: Logging upgrade from 3.5 to 3.6 fails with "Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]"
Product: OpenShift Container Platform
Reporter: Peter Portante <pportant>
Component: Logging
Assignee: Jan Wozniak <jwozniak>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 3.6.0
CC: aos-bugs, jcantril, jwozniak, pdwyer, rmeggins, rromerom, stwalter, tatanaka, xtian
Target Milestone: ---
Keywords: OpsBlocker
Target Release: 3.6.z
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-07 07:13:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Peter Portante 2017-10-18 12:01:10 UTC
Image version: registry.ops.openshift.com/openshift3/logging-elasticsearch:v3.6.173.0.21

[root@yocum-test-5-master-151b5 ~]# oc logs -f logging-es-s6yxhcz4-3-rvq96
[2017-10-18 10:46:16,055][INFO ][container.run            ] Begin Elasticsearch startup script
[2017-10-18 10:46:16,066][INFO ][container.run            ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2017-10-18 10:46:16,067][INFO ][container.run            ] Inspecting the maximum RAM available...
[2017-10-18 10:46:16,070][INFO ][container.run            ] ES_HEAP_SIZE: '5632m'
[2017-10-18 10:46:16,072][INFO ][container.run            ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2017-10-18 10:46:16,074][INFO ][container.run            ] Checking if Elasticsearch is ready on https://localhost:9200
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]
	at org.elasticsearch.discovery.DiscoveryModule.configure(DiscoveryModule.java:100)
	at <<<guice>>>
	at org.elasticsearch.node.Node.<init>(Node.java:213)
	at org.elasticsearch.node.Node.<init>(Node.java:140)
	at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.
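
For triage, the failure signature in the log above can be matched mechanically. The following is only a minimal sketch: the function and pattern names are hypothetical and not part of any OpenShift or Elasticsearch tooling, and the regular expression assumes the message wording stays exactly as quoted in this report.

```python
import re

# Pattern for the startup failure quoted in this report. The wording is
# copied from the log above; other Elasticsearch versions may phrase it
# differently (assumption).
UNKNOWN_DISCOVERY = re.compile(
    r"IllegalArgumentException: Unknown Discovery type \[(?P<dtype>[^\]]+)\]"
)


def find_unknown_discovery_type(log_text):
    """Return the rejected discovery type, or None if the error is absent."""
    match = UNKNOWN_DISCOVERY.search(log_text)
    return match.group("dtype") if match else None


# Two lines taken from the log in this report.
sample = (
    "[2017-10-18 10:46:16,074][INFO ][container.run            ] "
    "Checking if Elasticsearch is ready on https://localhost:9200\n"
    'Exception in thread "main" java.lang.IllegalArgumentException: '
    "Unknown Discovery type [kubernetes]\n"
)
print(find_unknown_discovery_type(sample))
```

Text saved from `oc logs <pod>` could be passed to the function in the same way.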

Comment 1 Jeff Cantrill 2017-10-18 13:13:27 UTC
Jan,

I know you responded in the email, but can you provide the information here?  Is this a result of new images being pulled without being officially deployed via ansible?  Is there some other scenario that can lead us to this situation?

Comment 2 Jan Wozniak 2017-10-18 13:42:10 UTC
This occurs when deploying an outdated ES image with the newest ansible (after label discovery and the readiness probe were merged). At the time of writing, a couple of images in our registries still need a rebuild. Label discovery and the readiness probe were merged in the second half of September; they should be contained in 3.6 and will be contained in 3.7 once it is released.

1) https://access.redhat.com/containers/?tab=tags#/registry.access.redhat.com/openshift3/logging-elasticsearch
Here the 3.6 and latest tags have not been rebuilt in two months.

2) https://hub.docker.com/r/openshift/origin-logging-elasticsearch/tags/
Here latest contains the proper library, but 3.6 has not been updated for three months.


A fast fix could be either to get an ES image built after mid-September, when the feature was merged, or to remove the readiness probe from ES.

https://github.com/openshift/openshift-ansible/issues/5497#issuecomment-331372471
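
The build-date condition described in this comment can be sketched as a small check. This is an illustration only: the cutoff date is a placeholder for "mid-September" (the comment does not give the exact merge date), and the function name is hypothetical. The creation timestamp is assumed to be an RFC 3339 string such as the one printed by `docker inspect --format '{{.Created}}' IMAGE`.

```python
from datetime import date, datetime

# ASSUMPTION: placeholder for "mid-September" 2017. The report does not
# state the exact date on which the feature was merged.
ASSUMED_FEATURE_CUTOFF = date(2017, 9, 15)


def built_after_cutoff(created, cutoff=ASSUMED_FEATURE_CUTOFF):
    """Return True if an RFC 3339 creation timestamp falls after the cutoff.

    Only the leading YYYY-MM-DD part is parsed, so fractional seconds and
    the time zone suffix are ignored.
    """
    build_day = datetime.strptime(created[:10], "%Y-%m-%d").date()
    return build_day > cutoff


# Hypothetical timestamps, one on each side of the assumed cutoff.
print(built_after_cutoff("2017-08-01T00:00:00Z"))
print(built_after_cutoff("2017-10-20T12:00:00.123456789Z"))
```

An image that fails this check would be a candidate for a rebuild or for the probe workaround linked above.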

Comment 3 Peter Portante 2017-10-18 15:22:43 UTC
If the openshift-ansible logging tasks are designed to work with a certain version of the logging images, why don't those tasks require that minimum version?

Comment 4 Jan Wozniak 2017-10-18 15:32:51 UTC
I am not sure how to require a 3.6 image built after mid-September. The readiness probe was requested to be backported to 3.6, but images containing this functionality haven't been rebuilt yet.
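
One possible shape for such a requirement, assuming rebuilt images receive a new tag rather than reusing an old one, is a comparison of dotted version tags. This is a hedged sketch, not what openshift-ansible does: the function names are hypothetical, and the minimum tag below is chosen purely for illustration.

```python
def parse_tag(tag):
    """Turn a tag such as 'v3.6.173.0.21' into a tuple of integers."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def meets_minimum(tag, minimum):
    """Return True if `tag` is at least `minimum`, compared field by field."""
    return parse_tag(tag) >= parse_tag(minimum)


# HYPOTHETICAL minimum, used only to demonstrate the comparison. The
# report does not define an official minimum image version.
ASSUMED_MINIMUM = "v3.6.173.0.49"

# v3.6.173.0.21 is the image tag from the description of this bug.
print(meets_minimum("v3.6.173.0.21", ASSUMED_MINIMUM))
print(meets_minimum("v3.6.173.0.49", ASSUMED_MINIMUM))
```

A tag comparison cannot distinguish two builds published under the same tag, which is the difficulty raised in this comment; in that case the build date or a label inside the image would be needed instead.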

Comment 6 Jeff Cantrill 2017-10-24 17:18:11 UTC
*** Bug 1505860 has been marked as a duplicate of this bug. ***

Comment 8 Peter Portante 2017-10-24 21:00:31 UTC
While it is certainly good to make a short-term release to address this problem, the long-term problem is that, for any number of valid reasons, the openshift-ansible playbooks can be told to install a version of OpenShift with which those playbooks are not compatible.  This fact appears to be the core problem.

We need to engineer a way for the playbooks and images to work together to avoid these kinds of problems.  Please find a way to track this need via another BZ, an upstream issue in the repos, or Trello card.

Comment 9 Anping Li 2017-10-25 10:14:06 UTC
From the test results, the readiness probe will not be added to ES.

The following scenarios pass:

1) openshift-ansible:v3.6.173.0.49 deploy v3.6.173.0.5
2) openshift-ansible:v3.6.173.0.49 deploy v3.6.173.0.49
3) openshift-ansible:v3.6.173.0.49 upgrade logging from 3.5.0 to v3.6.173.0.49.
4) openshift-ansible:v3.6.173.0.49 upgrade logging from the current latest release images ( v3.6.173.0.5: elasticsearch, v3.6.173.0.21: fluentd/kibana/auth-proxy) to v3.6.173.0.49.

Comment 10 Anping Li 2017-10-25 10:31:52 UTC
Please ignore comment 9; I used the image openshift3/ose-ansible:v3.6.173.0.49.  I found that openshift3/ose-ansible:v3.6.173.0.49 is built with openshift-ansible-v3.6.173.0.5.

Comment 11 Anping Li 2017-10-25 12:26:37 UTC
The following scenarios pass testing with openshift-ansible-v3.6.173.0.59, so I am moving the bug to verified.

Scenario 1)
Deploy Logging v3.6.173.0.49 on OCP v3.6.173.0.49 

Scenario 2)
Upgrade Logging 3.5.0 deployed by openshift-ansible-3.5.132 to v3.6.173.0.49 on OCP v3.6.173.0.49

By the way, if you want to deploy Elasticsearch:v3.6.173.0.5, you must use openshift-ansible-3.7.0.21 or earlier.

Comment 12 Jan Wozniak 2017-11-07 11:29:36 UTC
This will provide a partial solution until we have a better way:
https://github.com/openshift/origin-aggregated-logging/pull/758

Comment 20 Jeff Cantrill 2017-11-21 14:23:43 UTC
Relevant information on how to revert the discovery mechanism or disable the probe: 
https://github.com/openshift/openshift-ansible/issues/5497#issuecomment-331372471

Comment 23 errata-xmlrpc 2017-12-07 07:13:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3389

Comment 25 Red Hat Bugzilla 2023-09-15 00:04:34 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days