Bug 1738758 - The logging-es-ops nodeSelector is changed after upgrade
Summary: The logging-es-ops nodeSelector is changed after upgrade
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Noriko Hosoi
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-08 05:57 UTC by Anping Li
Modified: 2020-02-02 01:32 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-02 01:32:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Deploymentconfig before upgrade (46.70 KB, text/plain)
2019-08-08 06:13 UTC, Anping Li
Deploymentconfig after upgrade (46.65 KB, text/plain)
2019-08-08 06:14 UTC, Anping Li

Description Anping Li 2019-08-08 05:57:56 UTC
Description of problem:

The nodeSelector values in the logging-es-ops deploymentconfigs are changed after the upgrade, so the logging-es-ops pods cannot be started.

1) nodeSelector before Upgrade:
cat elasticsearch-dc-before-upgrade.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-ops-node": "2"
}
{
  "logging-es-ops-node": "0"
}
{
  "logging-es-ops-node": "1"
}


2) Logging Inventory used for upgrade
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}

openshift_logging_elasticsearch_storage_type=hostmount
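
For reference, the inventory nodeselectors above target the infra and compute node roles. A quick way to check which nodes each selector matches (a sketch; the label values are taken from the inventory above):

# nodes matched by the logging-es nodeSelector from the inventory
oc get nodes -l node-role.kubernetes.io/infra=true

# nodes matched by the logging-es-ops nodeSelector from the inventory
oc get nodes -l node-role.kubernetes.io/compute=true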

3) nodeSelector after Upgrade
cat elasticsearch-dc-after.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "2"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "1"
}


Version-Release number of selected component (if applicable):
openshift3/ose-ansible:v3.11.135 

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging with openshift_logging_use_ops=true and the following inventory:

openshift_logging_install_logging=true
openshift_logging_es_cluster_size=2
openshift_logging_es_number_of_replicas=1
openshift_logging_es_number_of_shards=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra": "true"}

openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=3
openshift_logging_es_ops_number_of_replicas=1
openshift_logging_es_ops_number_of_shards=1
openshift_logging_es_ops_memory_limit=2Gi
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/compute": "true"}

2. Add a hostPath volume and nodeSelector to the ES and ES-Ops deploymentconfigs, so that the nodeSelectors look like this (a sketch of this step follows the output below):
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-ops-node": "2"
}
{
  "logging-es-ops-node": "0"
}
{
  "logging-es-ops-node": "1"
}
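
A rough sketch of how this step can be done for one of the ops DCs (the node name and host path are placeholders; the DC name, label value, and the elasticsearch-storage volume name are assumptions based on this report and the standard logging DCs):

# label a node for this ops DC (node name is a placeholder)
oc label node compute-node-1 logging-es-ops-node=2

# switch the DC's data volume to a hostPath on that node
oc set volume dc/logging-es-ops-data-master-0fr84k1a --add --overwrite \
  --name=elasticsearch-storage --type=hostPath --path=/var/lib/elasticsearch

# pin the DC to the labeled node
oc patch dc/logging-es-ops-data-master-0fr84k1a \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-ops-node":"2"}}}}}'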

3. Upgrade to the latest version using openshift3/ose-ansible:v3.11.135
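
One way this step is typically run (a sketch; the inventory path is a placeholder and the playbook path assumes the openshift-ansible layout shipped with the image):

# rerun the logging playbook with the v3.11.135 openshift-ansible bits
ansible-playbook -i /path/to/inventory \
  /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml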

Actual results:
The logging-es-ops pods cannot be started, because the nodeSelector values are changed after the upgrade:
cat elasticsearch-dc-after.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "2"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "1"
}

Comment 1 Anping Li 2019-08-08 06:13:34 UTC
Created attachment 1601703
Deploymentconfig before upgrade

Comment 2 Anping Li 2019-08-08 06:14:07 UTC
Created attachment 1601704
Deploymentconfig after upgrade

Comment 3 Anping Li 2019-08-08 06:35:21 UTC
There is a time span between when the deployment configuration is changed and when the logging-es-ops deploymentconfig is rolled out. Workaround: correct the nodeSelector in the logging-es-ops deploymentconfigs before the rollout; you can correct them at the point when logging-es is restarting.
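
A sketch of that workaround for one of the ops DCs, restoring the original label value from the before-upgrade output (repeat for each ops DC with its own value):

# restore the original ops nodeSelector before the DC is rolled out
oc patch dc/logging-es-ops-data-master-9961o92h \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-ops-node":"0"}}}}}'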

Comment 4 Noriko Hosoi 2019-09-17 22:53:34 UTC
Hi Anping,

Sorry for my ignorance, but I'd like to learn a couple more things...

1) Could you share these outputs?

   - the ansible log from the upgrade
   - ES log when it fails to start
   - oc get events | grep Warning

2) If you label the nodes and set the nodeSelector like this from the beginning, do the logging-es pods fail to start?

The logging-es-ops pods cannot be started, because the nodeSelector values are changed after the upgrade:
cat elasticsearch-dc-after.json | jq '.items[].metadata.name, .items[].spec.template.spec.nodeSelector'
"logging-es-data-master-ajbqhp8h"
"logging-es-data-master-telafmeq"
"logging-es-ops-data-master-0fr84k1a"
"logging-es-ops-data-master-9961o92h"
"logging-es-ops-data-master-o7nhcbo4"
{
  "logging-es-node": "1"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "2"
}
{
  "logging-es-node": "0"
}
{
  "logging-es-node": "1"
}

Comment 5 Anping Li 2019-10-22 03:12:01 UTC
Sorry for the delay. I will provide the logs in the next round of 3.11 testing.

Comment 7 Jeff Cantrill 2020-02-02 01:32:09 UTC
Closing as DEFERRED. Please reopen if the problem persists and there are open customer cases.

