1439554 – upgrade fails with error Could not resolve placeholder 'MIN_MASTERS'

Keywords:
Status:	CLOSED DUPLICATE of bug 1439356
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Logging
Sub Component:
Version:	3.4.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Cantrill
QA Contact:	Xia Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1440245

Reported:	2017-04-06 08:49 UTC by Ruben Romero Montes
Modified:	2020-07-16 09:23 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1440245
Environment:
Last Closed:	2017-04-17 20:59:21 UTC
Target Upstream Version:
Embargoed:

Attachments
migrate logs (486.96 KB, text/plain) 2017-04-07 14:47 UTC, Ruben Romero Montes	no flags
logging configmap (3.47 KB, text/x-vhdl) 2017-04-07 15:07 UTC, Boris Kurktchiev	no flags

Description Ruben Romero Montes 2017-04-06 08:49:29 UTC

Description of problem:
When upgrading from 3.4.0 to 3.4.1-17, the following error is shown and the deployment fails.

Could not resolve placeholder 'MIN_MASTERS'

Version-Release number of selected component (if applicable):
OpenShift 3.4.1.10
openshift3-logging-elasticsearch-3.4.1-17

How reproducible:
Always

Steps to Reproduce:
1. oc new-app logging-deployer-template --param IMAGE_VERSION=3.4.1 --param MODE=upgrade

Actual results:
Logging fails to deploy because the environment variable MIN_MASTERS is not present in the deployment config for logging-es-xxxx.

Expected results:
Logging is upgraded to the latest version and the deployment config defines the MIN_MASTERS environment variable.

Additional info:
By manually adding the variable, the deployment succeeds.
The most reliable workaround I found was to reinstall logging, i.e. MODE=reinstall.

Comment 2 Jeff Cantrill 2017-04-06 15:03:18 UTC

Should be using this template: https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/templates/es.yaml

Comment 5 Jeff Cantrill 2017-04-06 19:25:37 UTC

*** Bug 1439356 has been marked as a duplicate of this bug. ***

Comment 8 Jeff Cantrill 2017-04-07 12:15:12 UTC

The workaround to resolve this issue is:

Determine the number of min masters by:

min_masters = floor("# ES cluster nodes" / 2) + 1

1. oc edit dc $ES_DC_NAME
2. add env entry to container with name: 'MIN_MASTERS' value: $min_masters
3. oc rollout latest $ES_DC_NAME

Apply the change to each deploymentconfig.

Fix for the deployer is forthcoming.
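
The workaround steps can be sketched as a small script. This is only an illustration, not part of the deployer: the function names are made up here, and `oc set env` is swapped in as a non-interactive alternative to the `oc edit dc` step. The script only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# min_masters NODES: quorum size, floor(NODES / 2) + 1 (the formula from the workaround).
min_masters() {
    # Shell integer division truncates, which equals floor for positive node counts.
    echo $(( $1 / 2 + 1 ))
}

# workaround_cmds NODES DC...: print (do not run) the commands for each deploymentconfig.
# Assumption: `oc set env` replaces the interactive `oc edit dc` step.
workaround_cmds() {
    nodes=$1
    shift
    value=$(min_masters "$nodes")
    for dc in "$@"; do
        echo "oc set env dc/$dc MIN_MASTERS=$value"
        echo "oc rollout latest dc/$dc"
    done
}
```

For a three-node ES cluster, `workaround_cmds 3 "$ES_DC_NAME"` prints the two commands with MIN_MASTERS=2. Pass every logging-es deploymentconfig name as an extra argument to cover each DC.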

Comment 9 Boris Kurktchiev 2017-04-07 12:55:23 UTC

Can this be applied and the deployer then re-run with MODE=validate to make sure it did everything else it needed to do? And if some steps were not done, would the deployer complete them?

Comment 10 Boris Kurktchiev 2017-04-07 12:59:54 UTC

So I just tested this and it worked... sort of. Now there is a new error, but that may have to go in its own bug?
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx2048m'
Checking if Elasticsearch is ready on https://localhost:9200 ..[2017-04-07 12:58:00,722][INFO ][node                     ] [Wendigo] version[2.4.1], pid[1], build[945a6e0/2016-11-17T20:39:42Z]
[2017-04-07 12:58:00,725][INFO ][node                     ] [Wendigo] initializing ...
.[2017-04-07 12:58:01,659][INFO ][plugins                  ] [Wendigo] modules [reindex, lang-expression, lang-groovy], plugins [search-guard-ssl, openshift-elasticsearch, cloud-kubernetes, search-guard-2], sites []
[2017-04-07 12:58:01,709][INFO ][env                      ] [Wendigo] using [1] data paths, mounts [[/elasticsearch/persistent (v10itscampusdmz2193.isis.unc.edu:/ose_pv_infra_storage/logging)]], net usable_space [192.3gb], net total_space [200gb], spins? [possibly], types [nfs]
[2017-04-07 12:58:01,709][INFO ][env                      ] [Wendigo] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-04-07 12:58:02,220][INFO ][http                     ] [Wendigo] Using [org.elasticsearch.http.netty.NettyHttpServerTransport] as http transport, overridden by [search-guard2]
[2017-04-07 12:58:02,382][INFO ][transport                ] [Wendigo] Using [com.floragunn.searchguard.transport.SearchGuardTransportService] as transport service, overridden by [search-guard2]
[2017-04-07 12:58:02,383][INFO ][transport                ] [Wendigo] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
..[2017-04-07 12:58:03,644][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@483b0690, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@687e6293
[2017-04-07 12:58:03,645][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@6870c3c2, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@fb0a08c
[2017-04-07 12:58:03,645][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@1faf386c, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@483b0690
[2017-04-07 12:58:03,645][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@4debbf0, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@6870c3c2
[2017-04-07 12:58:03,645][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@60e06f7d, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@1faf386c
[2017-04-07 12:58:03,646][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@66a5755, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@4debbf0
[2017-04-07 12:58:03,648][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@40c2ce52, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@60e06f7d
[2017-04-07 12:58:03,648][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@18a19e, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@66a5755
[2017-04-07 12:58:03,648][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@2e13f304, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@40c2ce52
[2017-04-07 12:58:03,648][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@787508ca, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@18a19e
[2017-04-07 12:58:03,649][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@3d24420b, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@2e13f304
[2017-04-07 12:58:03,649][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@6274670b, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@787508ca
[2017-04-07 12:58:03,678][INFO ][plugins                  ] [Living Colossus] modules [], plugins [search-guard-ssl, search-guard2], sites []
[2017-04-07 12:58:03,718][INFO ][transport                ] [Living Colossus] Using [com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] as transport, overridden by [search-guard-ssl]
[2017-04-07 12:58:04,085][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@1ab14636, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@3d24420b
[2017-04-07 12:58:04,085][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@16b3c905, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@6274670b
[2017-04-07 12:58:04,085][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update, handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@49fdbe2b, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@1ab14636
[2017-04-07 12:58:04,085][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [Wendigo] registered two transport handlers for action cluster:admin/searchguard/config/update[n], handlers: com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@53eba4b8, com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor@16b3c905
Exception in thread "main" java.lang.IllegalStateException: This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.
	at <<<guice>>>
	at com.sun.proxy.$Proxy12.addLifecycleListener(Unknown Source)
	at com.floragunn.searchguard.action.configupdate.TransportConfigUpdateAction.<init>(TransportConfigUpdateAction.java:87)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at <<<guice>>>
	at org.elasticsearch.node.Node.<init>(Node.java:213)
	at org.elasticsearch.node.Node.<init>(Node.java:140)
	at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)


Also, I want to point out that ES_JAVA_OPTS is '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx2048m' even though INSTANCE_RAM=4GB in the DC.

Comment 11 Ruben Romero Montes 2017-04-07 14:47:43 UTC

Created attachment 1269787 [details]
migrate logs

Comment 12 Jeff Cantrill 2017-04-07 15:01:10 UTC

Please limit this BZ to the originally reported issue and open new ones as appropriate.  Additionally, it would be significantly more helpful if you attached stack traces instead of including them in the comments.

The issue is that the migrate script was not updated to account for the min masters change.  Consequently, you see the reported issue.  The resolution is to remove the 'zen.minimum_master_nodes' setting from the logging-elasticsearch configmap or edit each DC as described in comment 8.

We are currently working on a fix to the deployer which will resolve upgrades by adding the MIN_MASTERS env var to the deploymentconfigs.  Re-running the deployer with validate will not expose this or additional issues, as this is the only related change in the 3.4 code stream.

Also, see [1] to understand the memory settings.  The value is calculated based on the limits of the node and resources provided in the spec.

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/run.sh#L28
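
As a rough illustration of the memory calculation, the sketch below assumes the heap (-Xmx) is set to half of INSTANCE_RAM, which is consistent with the -Xmx2048m reported for a 4 GB instance. It is not the linked script: the function name is hypothetical, only `G` and `M` suffixes are handled, and the comparison against the RAM available on the node is left out.

```shell
#!/bin/sh
# heap_mb INSTANCE_RAM: print the heap size in MB.
# Assumption: heap is half of INSTANCE_RAM; the real script may apply further limits.
heap_mb() {
    case "$1" in
        *[Gg]) num=${1%[Gg]}; echo $(( num * 1024 / 2 )) ;;
        *[Mm]) num=${1%[Mm]}; echo $(( num / 2 )) ;;
        *) echo "unsupported INSTANCE_RAM: $1" >&2; return 1 ;;
    esac
}
```

With this assumption, `heap_mb 4G` gives 2048, matching the ES_JAVA_OPTS value quoted in comment 10.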

Comment 13 Boris Kurktchiev 2017-04-07 15:07:25 UTC

Sure, no problem, will do. Here is the configmap as it exists. I do not see the zen.minimum_master_nodes setting anywhere in it, so removing it does not seem to be possible?

Comment 14 Boris Kurktchiev 2017-04-07 15:07:45 UTC

Created attachment 1269790 [details]
logging configmap

Comment 15 Boris Kurktchiev 2017-04-07 15:09:10 UTC

Also, as I noted, I did edit the DC as suggested and ended up with the trace as posted. I am going to let the support guys open a BZ for that, and once they do I am more than happy to attach the ES errors there.

Comment 16 Jeff Cantrill 2017-04-07 18:51:36 UTC

Update after re-reviewing this issue: I can reproduce this problem and see a stack trace in the ES logs, but based on the attached logs and configs it is unclear to me where or how you hit it.  The attached logs have no reference to 'MIN_MASTERS' that I can find.

The 3.4 change introducing MIN_MASTERS was only added in an 'install' scenario and updates the config like [1]. ES expects an environment variable, which would be defined in the DC [2]. On install, the deployer sets the value in the DC as described in comment 8.

There is no way to see this error if you are using the default, which is what ES uses if zen.minimum_master_nodes is not in the config.

[1] https://github.com/openshift/origin-aggregated-logging/commit/4f789b4e97ba0cf77ab553dd3773820951833c74#diff-bb1ad2cd0b762aef9387d3b94258a8a9R32
[2] https://github.com/openshift/origin-aggregated-logging/commit/4f789b4e97ba0cf77ab553dd3773820951833c74#diff-3a3ae33bf79a43018ccbd0c4bc265f88R84
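
One way to tell whether a cluster is exposed is to check whether the Elasticsearch config references the placeholder at all. The sketch below is an assumption-laden helper, not deployer code: the function name and the file name `es-config.yaml` are hypothetical, and the placeholder text `${MIN_MASTERS}` is taken from the error message.

```shell
#!/bin/sh
# uses_min_masters FILE: succeed if FILE contains the literal ${MIN_MASTERS} placeholder.
uses_min_masters() {
    # -F treats the pattern as a fixed string, so "$", "{" and "}" need no escaping.
    grep -qF '${MIN_MASTERS}' "$1"
}

# Possible usage against a live cluster (not executed here):
#   oc get configmap logging-elasticsearch -o yaml > es-config.yaml
#   uses_min_masters es-config.yaml && echo "each ES DC must define MIN_MASTERS"
```

If the check succeeds, each ES deploymentconfig needs the MIN_MASTERS variable; if it fails, ES falls back to its default and the placeholder error should not appear.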

Comment 17 Jeff Cantrill 2017-04-07 19:33:47 UTC

Ruben,

Can you provide details of what you are seeing and where?  I have not been able to find any MIN_MASTERS errors in the attached logs or configs.  Moving work to fix MIN_MASTERS to https://bugzilla.redhat.com/show_bug.cgi?id=1439356, which better describes the issues I have been able to replicate.

Comment 18 Ruben Romero Montes 2017-04-10 14:27:59 UTC

@Jeff
The logs were indeed taken after the variable was added manually to the DC and the upgrade was performed again. Unfortunately, I no longer have the logs from the upgrade without MIN_MASTERS.

I see the mentioned bug provides the logs from the upgrade. This can therefore be set as a duplicate.

Thanks

Comment 19 Jeff Cantrill 2017-04-17 20:59:21 UTC


*** This bug has been marked as a duplicate of bug 1439356 ***
