Created attachment 1270519 [details]
indices

Description of problem:
Occurs after an upgrade to the latest logging images, 3.4.1-17.

Version-Release number of selected component (if applicable):
OpenShift 3.4.1.10
openshift3-logging-elasticsearch-3.4.1-17

How reproducible:
Apparently always when there is a large number of indices.

Steps to Reproduce:
1. oc new-app logging-deployer-template --param IMAGE_VERSION=3.4.1 --param MODE=upgrade

Actual results:
Some indices with the 3.3 pattern <namespace>.<uuid>.<date> do not get their corresponding alias in the 3.4 format created: project.<namespace>.<uuid>.<date>
Logs cannot be accessed from OpenShift.

Expected results:
All aliases are created for the 3.3 indices.

Additional info:
Created attachment 1270520 [details] migrate logs
Created attachment 1270522 [details] aliases
Created attachment 1270662 [details] elasticsearch logs from the deployers
Created attachment 1271543 [details]
create "project.name.*" aliases for all of the "name.*" indices

This script will go through all of the indices in the server. Any indices beginning with "." or "project." are ignored. For each remaining index of the form "name.*", an alias "project.name.*" is created. This is essentially what the deployer MODE=upgrade does, except that it doesn't restart pods, redeploy, etc.
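A minimal sketch of what such a migration script might look like. The function names (`alias_for`, `migrate_aliases`) are hypothetical, and the Elasticsearch endpoint and admin cert paths are assumptions based on the commands shown later in this bug; the actual attached script may differ.

```shell
# Derive the 3.4-format alias name from a 3.3-format index name.
# Indices beginning with "." (internal) or "project." (already in the
# new format) are skipped by returning a non-zero status.
alias_for() {
  case "$1" in
    .*|project.*) return 1 ;;
    *) echo "project.$1" ;;
  esac
}

# Walk every index on the server and add the corresponding alias via
# the _aliases API. Endpoint and cert paths are assumed, not verified.
migrate_aliases() {
  ES_URL="https://localhost:9200"
  CURL="curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key"
  for idx in $($CURL "$ES_URL/_cat/indices?h=index"); do
    alias=$(alias_for "$idx") || continue
    $CURL -XPOST "$ES_URL/_aliases" \
      -d "{\"actions\":[{\"add\":{\"index\":\"$idx\",\"alias\":\"$alias\"}}]}"
  done
}
```

Running `migrate_aliases` inside an Elasticsearch pod (e.g. via `oc exec`) would only add aliases; it does not restart pods or redeploy anything.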
Apparently the script didn't work. A possible reason is the new indices created following the new project.name... format: when an alias with that name already exists for the pattern, the corresponding 3.3 indices are ignored. It was possible to work around this by manually creating the remaining aliases.
https://github.com/openshift/origin-aggregated-logging/pull/380
Considering that the original issue can't be reproduced with a small number of projects, and that there are 1500+ indices in the bug reporter's env, I'm contacting the performance team to test the bug fix in an env containing a large number of projects.
(In reply to Xia Zhao from comment #11)
> Consider that the original issue can't be reproduced with small number of
> projects, and there are 1500+ indices in bug reporter's env, I'm contacting
> performance team to test the bug fix with env containing large number of
> projects.

Have you tried to reproduce with a small number of projects?
Forgot to build openshift-elasticsearch-plugin 2.4.1.7 for RH; building now. Then I will need to get tdawson to rebuild the puddle, then I will need to rebuild the image, and then tdawson will need to push it to the QE registry.
(In reply to Rich Megginson from comment #12)
> (In reply to Xia Zhao from comment #11)
> > Consider that the original issue can't be reproduced with small number of
> > projects, and there are 1500+ indices in bug reporter's env, I'm contacting
> > performance team to test the bug fix with env containing large number of
> > projects.
>
> Have you tried to reproduce with a small number of projects?

@Rich, according to the logs here, I've got the alias created in an env with 3 projects: https://bugzilla.redhat.com/show_bug.cgi?id=1395170#c38

Thanks,
Xia
Yes, we need a test which can stress with a large number of projects/indices.
@wabouham Just FYI: the testing work will be blocked here for the moment: https://bugzilla.redhat.com/show_bug.cgi?id=1446504
@wabouham Just FYI: the blocker mentioned in comment #19 will not block us from testing the bug fix for "missing aliases after upgrade"; it only blocks the upgraded logging stack from collecting new log entries. Please feel free to let me know if any help is needed from my side to proceed with testing the bug fix here. Thanks!
@xiazhao: please send me access info for the environment where I need to create a large number of projects. I'll run our tools and run curl commands to create one index per project to try to reproduce the issue. You can email me or reply with a private comment. Thanks!
@wabouham I can set one up for you if needed, but I'm afraid it can't support the test with a large number of projects. Considering that a large number of applications will consume more resources, I wonder if the SVT team has an env with more resources.
@xiazhao Please set up an env or let me access yours. I'll try to create the projects on your environment first, and run curl commands to create a log entry from each project. We should not need to create an app in each project. If we run into resource issues, I'll set up a larger env and work from there. Thanks.
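For reference, a sketch of how seeding one log entry per project with curl might look. The index name format <project>.<uuid>.<date> is taken from this bug; the helper name `index_for`, the placeholder UUID, the document type, and the endpoint/cert paths are all assumptions, not the exact commands that were run.

```shell
# Build a 3.3-format index name <project>.<uuid>.<date> for today (UTC).
index_for() {
  echo "$1.$2.$(date -u +%Y.%m.%d)"
}

# Index one test document per project so that the upgrade has a 3.3-format
# index to create an alias for. Endpoint and cert paths are assumptions
# based on the commands elsewhere in this bug.
seed_log_entries() {
  ES_URL="https://localhost:9200"
  CURL="curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key"
  i=1
  while [ "$i" -le 2000 ]; do
    # Placeholder UUID; a real run would use each project's actual UID.
    idx=$(index_for "clusterproj$i" "00000000-0000-0000-0000-000000000000")
    $CURL -XPOST "$ES_URL/$idx/test/" -d '{"message":"seed entry"}'
    i=$((i + 1))
  done
}
```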
@Walid, the 3.3.1 env with logging deployed is ready; I sent the details by email. If you have any questions, please let me know. Thanks!
(In reply to Xia Zhao from comment #19)
> @wabouham Just FYI: the testing work will be blocked here for this moment:
> https://bugzilla.redhat.com/show_bug.cgi?id=1446504

FYI, we've worked out that this blocker can be worked around by the solution here: https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c10
Verified fix.

Created an AWS env with 1 master/etcd node, 1 infra node, and 2 application nodes:

# openshift version
openshift v3.4.1.24
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift_ansible: latest from release-1.4 branch (May 15, 2017)

# rpm -qva | grep ansible
openshift-ansible-filter-plugins-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-playbooks-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-roles-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-docs-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-callback-plugins-3.4.69-1.git.0.e3b8949.el7.noarch
ansible-2.2.1.0-2.el7.noarch
openshift-ansible-lookup-plugins-3.4.69-1.git.0.e3b8949.el7.noarch

Verification procedure:
-----------------------
Logging stack deployed initially on version 3.3.1.
Created 2000 projects and one index for each project (2000 indices total).
Upgraded logging stack to version 3.4.1.
Verified that 2000 aliases were created for the 2000 indices, with correct patterns:

# oc exec logging-es-rh4rw47l-3-uqu4m -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices | grep clusterproj | wc -l
2000

# oc exec logging-es-rh4rw47l-3-uqu4m -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/aliases | grep clusterproj | wc -l
2000

Each existing project had the following index pattern before and after the upgrade:

clusterproj1811.944e7597-3a2c-11e7-8ab2-026d0776ce7c.2017.05.16

Corresponding alias created after the logging stack upgrade to v3.4.1:

project.clusterproj1811.944e7597-3a2c-11e7-8ab2-026d0776ce7c.cdm-alias.2017.05.16 clusterproj1811.944e7597-3a2c-11e7-8ab2-026d0776ce7c.2017.05.16
Is there some text missing in the doc text? I see "Part of this was related" but it seems like this thought is not complete.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3049