Bug 1848454 - OpenShift logging upgrade from 3.11.161 to 3.11.219 fails
Summary: OpenShift logging upgrade from 3.11.161 to 3.11.219 fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.11.0
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Sergey Yedrikov
QA Contact: Anping Li
URL:
Whiteboard: logging-core
Depends On:
Blocks:
 
Reported: 2020-06-18 11:52 UTC by Devendra Kulkarni
Modified: 2023-12-15 18:13 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A missing check for a matching number of Elasticsearch DCs, PVCs, and indices. Consequence: Ansible pads the missing elements with Nones, leading to a cryptic Cluster Logging playbook failure. Fix: Added the missing check. Result: The Cluster Logging playbook issues a diagnostic if the numbers of Elasticsearch DCs, PVCs, and indices do not match.
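
The check described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual openshift-ansible code; the function name and diagnostic wording are made up, modeled on the message quoted in comment 18:

```python
# Hypothetical sketch of the added sanity check: fail early with a
# clear diagnostic when the numbers of ES DeploymentConfigs, PVCs
# and indices differ, instead of letting a later loop pad with None.
def check_es_topology(dcs, pvcs, indices):
    """Return None if the counts match, else a diagnostic string."""
    if not (len(dcs) == len(pvcs) == len(indices)):
        return (
            "There must be the same number of ES DeploymentConfigs, "
            "ES PVCs and ES indices. "
            f"Found DC count={len(dcs)}, PVC count={len(pvcs)}, "
            f"index count={len(indices)}"
        )
    return None
```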
Clone Of:
Environment:
Last Closed: 2020-10-22 11:02:22 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Playbook logs (100.69 KB, application/gzip)
2020-10-11 16:15 UTC, Anping Li


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12233 0 None closed Bug 1848454: OpenShift logging upgrade from 3.11.161 to 3.11.219 fails 2020-11-11 19:19:49 UTC
Github openshift openshift-ansible pull 12252 0 None closed Revert "Bug 1848454: OpenShift logging upgrade from 3.11.161 to 3.11.219 fails" 2020-11-11 19:19:50 UTC
Red Hat Product Errata RHBA-2020:4170 0 None None None 2020-10-22 11:02:46 UTC

Description Devendra Kulkarni 2020-06-18 11:52:43 UTC
Description of problem:

OpenShift Logging upgrade from 3.11.161 to 3.11.219 fails on the Ansible task:

~~~
TASK [openshift_logging_elasticsearch : set_fact] ********************************************************************
fatal: [svvtocp1mastr01.vegvesen.no]: FAILED! => {"msg": "The conditional check 'openshift_logging_elasticsearch_deployment_name == \"\"' failed. The error was: error while evaluating conditional (openshift_logging_elasticsearch_deployment_name == \"\"): 'None' has no attribute 'name'\n\nThe error appears to be in '/usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/main.yaml': line 470, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- set_fact:\n  ^ here\n"}
********************************************************************
~~~

- The logging stack is currently running fine on version 3.11.161.
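
For context, a hypothetical illustration (not the actual openshift-ansible code) of how the cryptic "'None' has no attribute 'name'" arises: Ansible's with_together loop behaves like Python's itertools.zip_longest, padding the shorter list with None, and any attribute access on such a pad then fails:

```python
from itertools import zip_longest

# Illustrative only: one DC but zero matching PVCs. with_together
# pairs them up zip_longest-style, padding the PVC side with None.
dcs = [{"name": "logging-es-data-master-abc"}]  # made-up DC name
pvcs = []  # empty, e.g. because the PVC lookup matched nothing
pairs = list(zip_longest(dcs, pvcs))
# The second element of the pair is None; a later template that
# dereferences item.1.name on it produces the observed error.
```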

Version-Release number of selected component (if applicable): 


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results: Playbook fails at the TASK:[openshift_logging_elasticsearch : set_fact] 


Expected results: Upgrade should be successful.


Additional info:

Comment 3 Jeff Cantrill 2020-06-18 12:58:36 UTC
Please provide:
* The version of ansible
* The version of openshift-ansible

Comment 4 Devendra Kulkarni 2020-06-18 13:21:29 UTC
Hello @Jeff,

# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.11.219-1.git.0.8845382.el7.noarch
openshift-ansible-playbooks-3.11.219-1.git.0.8845382.el7.noarch
openshift-ansible-3.11.219-1.git.0.8845382.el7.noarch
openshift-ansible-docs-3.11.219-1.git.0.8845382.el7.noarch

# rpm -qa | grep ^ansible
ansible-2.9.6-1.el7ae.noarch

Comment 5 Jeff Cantrill 2020-06-18 21:02:28 UTC
Please confirm the attached inventory is the correct one. Its content indicates logging is not to be installed at all:

# Aggregated logging
openshift_logging_install_logging=False

Comment 6 Devendra Kulkarni 2020-06-19 04:34:01 UTC
Hello Jeff,

I have already pointed this out to the customer; they are passing the extra var (openshift_logging_install_logging=True) when running the playbook.

Regards,
Devendra Kulkarni

Comment 8 Periklis Tsirakidis 2020-07-08 08:22:19 UTC
Adding the UpcomingSprint label; this is not likely to land until EOS.

Comment 9 IgorKarpukhin 2020-07-30 18:40:07 UTC
Moving to UpcomingSprint

Comment 10 Jeff Cantrill 2020-08-20 13:39:00 UTC
*** Bug 1842608 has been marked as a duplicate of this bug. ***

Comment 11 Jeff Cantrill 2020-08-21 14:11:10 UTC
Moving to UpcomingSprint for future evaluation

Comment 13 Jeff Cantrill 2020-09-12 01:52:48 UTC
Moving to UpcomingSprint while awaiting PRs to merge, etc.

Comment 18 Anping Li 2020-10-11 16:12:49 UTC
The upgrade failed even when the ES DC count equals the ES PVC count:

Sunday 11 October 2020  15:53:05 +0000 (0:00:00.148)       0:01:04.922 ******** 
fatal: [ec2-54-89-6-220.compute-1.amazonaws.com]: FAILED! => {
    "changed": false, 
    "msg": "There must be the same number of ES DeploymentConfigs, ES PVCs and ES indices. Found ES DeploymentConfigs - \"[u'logging-es-data-master-5cg1if76', u'logging-es-data-master-2i7wgnxz', u'logging-es-data-master-r1ijyoz5', u'logging-es-data-master-v3pbddal']\", ES DC count - \"4\", ES PVCs - \"{}\", ES PVC length - \"0\" and ES indices - \"[0, 1, 2, 3]\", ES indices length - \"4\""
}


# ES PVC length is reported as 0 even though there are 4 PVCs:
$ oc get dc
NAME                              REVISION   DESIRED   CURRENT   TRIGGERED BY
logging-es-data-master-2i7wgnxz   3          1         1         
logging-es-data-master-5cg1if76   3          1         1         
logging-es-data-master-r1ijyoz5   3          1         1         
logging-es-data-master-v3pbddal   3          1         1         
logging-kibana                    2          1         1         config

$ oc get pvc
NAME   STATUS   VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS   AGE
es-0   Bound    logginges1   10G        RWO                           57m
es-1   Bound    logginges2   10G        RWO                           56m
es-2   Bound    logginges0   10G        RWO                           55m
es-3   Bound    logginges3   10G        RWO                           55m

Comment 19 Anping Li 2020-10-11 16:15:18 UTC
Created attachment 1720679 [details]
Playbook logs

#######Logging-Variables##########
openshift_logging_install_logging=true
openshift_logging_es_cluster_size=4
openshift_logging_es_number_of_replicas=0
openshift_logging_es_allows_cluster_reader=True
openshift_logging_es_nodeselector={"role": "node"}
openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_size=5Gi
#openshift_logging_es_pvc_storage_class_name=''
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_prefix=es
openshift_logging_es_pv_selector={'logging-infra':'es'}
openshift_logging_es_memory_limit=2Gi

Comment 20 Anping Li 2020-10-13 02:19:00 UTC
The PR pulled in a new issue: the upgrade always fails with the new code.

Comment 21 Anping Li 2020-10-13 02:24:48 UTC
Workaround: please don't use openshift-ansible:v3.11.306; use openshift-ansible:v3.11.286 if you need to upgrade logging.

Comment 23 Anping Li 2020-10-15 12:28:51 UTC
@Sergey,
openshift_logging_es_pvc_prefix=logging-es passes. We only hit the failure in comment 18 when using a customized PVC prefix, i.e. openshift_logging_es_pvc_prefix=es.
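
A hypothetical sketch of the regression described here (not the actual openshift-ansible code): if the PVC lookup filters on the default "logging-es" name prefix, PVCs created with a custom prefix such as "es" are missed, yielding a PVC count of 0 and a false check failure:

```python
# Illustrative only: prefix-based PVC selection. With the default
# "logging-es" prefix, custom-prefixed PVCs like "es-0" are missed.
def select_pvcs(pvc_names, prefix):
    return [n for n in pvc_names if n.startswith(prefix)]

pvcs = ["es-0", "es-1", "es-2", "es-3"]
select_pvcs(pvcs, "logging-es")  # -> [] : triggers the false failure
select_pvcs(pvcs, "es")          # -> all four PVCs
```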

Comment 24 Sergey Yedrikov 2020-10-15 18:54:19 UTC
Revert PR: https://github.com/openshift/openshift-ansible/pull/12252

Comment 26 Anping Li 2020-10-20 07:16:52 UTC
Reverted the PR as it caused a regression. The support case behind BZ 1848454 was closed, and there is no customer waiting for the fix at the moment, so closing this bug.

Comment 28 errata-xmlrpc 2020-10-22 11:02:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.306 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4170

