Bug 1738758 - The logging-es-ops nodeSelector is changed after upgrade
Summary: The logging-es-ops nodeSelector is changed after upgrade
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Noriko Hosoi
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-08 05:57 UTC by Anping Li
Modified: 2020-02-02 01:32 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-02 01:32:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Deploymentconfig before upgrade (46.70 KB, text/plain)
2019-08-08 06:13 UTC, Anping Li
Deploymentconfig after upgrade (46.65 KB, text/plain)
2019-08-08 06:14 UTC, Anping Li

Description Anping Li 2019-08-08 05:57:56 UTC
Description of problem:

The nodeSelector values in the logging-es-ops deploymentconfigs are changed after the upgrade, so the logging-es-ops pods cannot be started.

1) nodeSelector before Upgrade:
cat elasticsearch-dc-before-upgrade.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-ops-node": "2"
}
{
  "logging-es-ops-node": "0"
}
{
  "logging-es-ops-node": "1"
}


2) Logging Inventory used for upgrade
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}

openshift_logging_elasticsearch_storage_type=hostmount
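
For reference, the inventory nodeselectors above target the infra and compute node roles. A quick way to check which nodes each selector matches (a sketch; the label values are taken from the inventory above):

# nodes matched by the logging-es nodeSelector from the inventory
oc get nodes -l node-role.kubernetes.io/infra=true

# nodes matched by the logging-es-ops nodeSelector from the inventory
oc get nodes -l node-role.kubernetes.io/compute=true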

3) nodeSelector after Upgrade
cat elasticsearch-dc-after.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "2"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "1"
}


Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135 

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with openshift_logging_use_ops=true and the following inventory:

openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}

2. Add a hostPath volume and nodeSelector to the ES and ES-Ops deploymentconfigs, so that the nodeSelectors look like this (a sketch of this step follows the output below):
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-ops-node": "2"
}
{
  "logging-es-ops-node": "0"
}
{
  "logging-es-ops-node": "1"
}
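
A rough sketch of how this step can be done for one of the ops DCs (the node name and host path are placeholders; the DC name, label value, and the elasticsearch-storage volume name are assumptions based on this report and the standard logging DCs):

# label a node for this ops DC (node name is a placeholder)
oc label node compute-node-1 logging-es-ops-node=2

# switch the DC's data volume to a hostPath on that node
oc set volume dc/logging-es-ops-data-master-0fr84k1a --add --overwrite \
  --name=elasticsearch-storage --type=hostPath --path=/var/lib/elasticsearch

# pin the DC to the labeled node
oc patch dc/logging-es-ops-data-master-0fr84k1a \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-ops-node":"2"}}}}}'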

3. Upgrade to the latest version using openshift3/ose-ansible:v3.11.135
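
One way this step is typically run (a sketch; the inventory path is a placeholder and the playbook path assumes the openshift-ansible layout shipped with the image):

# rerun the logging playbook with the v3.11.135 openshift-ansible bits
ansible-playbook -i /path/to/inventory \
  /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml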

Actual results:
The logging-es-ops pods cannot be started, because the nodeSelector values are changed after the upgrade:
cat elasticsearch-dc-after.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "2"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "1"
}

Comment 1 Anping Li 2019-08-08 06:13:34 UTC
Created attachment 1601703
Deploymentconfig before upgrade

Comment 2 Anping Li 2019-08-08 06:14:07 UTC
Created attachment 1601704
Deploymentconfig after upgrade

Comment 3 Anping Li 2019-08-08 06:35:21 UTC
There is a time span between when the deployment configuration is changed and when the logging-es-ops deploymentconfig is rolled out. Workaround: correct the nodeSelector in the logging-es-ops deploymentconfigs before the rollout; you can correct them at the point when logging-es is restarting.
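
A sketch of that workaround for one of the ops DCs, restoring the original label value from the before-upgrade output (repeat for each ops DC with its own value):

# restore the original ops nodeSelector before the DC is rolled out
oc patch dc/logging-es-ops-data-master-9961o92h \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-ops-node":"0"}}}}}'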

Comment 4 Noriko Hosoi 2019-09-17 22:53:34 UTC
Hi Anping,

Sorry for my ignorance, but I'd like to learn a couple more things...

1) Could you share these outputs?

   - the ansible log from the upgrade
   - ES log when it fails to start
   - oc get events | grep Warning

2) If you label the nodes and set the nodeSelector like this from the beginning, do the logging-es pods fail to start?

The logging-es-ops pods cannot be started, because the nodeSelector values are changed after the upgrade:
cat elasticsearch-dc-after.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "2"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "1"
}

Comment 5 Anping Li 2019-10-22 03:12:01 UTC
Sorry for the delay. I will provide the logs in the next round of 3.11 testing.

Comment 7 Jeff Cantrill 2020-02-02 01:32:09 UTC
Closing as DEFERRED. Please reopen if the problem persists and there are open customer cases.

