Created attachment 1270519 [details]
indices

Description of problem:
Occurs after an upgrade to the latest logging images, 3.4.1-17.

Version-Release number of selected component (if applicable):
OpenShift 3.4.1.10
openshift3-logging-elasticsearch-3.4.1-17

How reproducible:
Apparently always when there is a large number of indices.

Steps to Reproduce:
1. oc new-app logging-deployer-template --param IMAGE_VERSION=3.4.1 --param MODE=upgrade

Actual results:
Some indices with the 3.3 pattern <namespace>.<uuid>.<date> do not get their corresponding alias in the 3.4 format created: project.<namespace>.<uuid>.<date>
Logs cannot be accessed from OpenShift.

Expected results:
All aliases are created for the 3.3 indices.

Additional info:
Created attachment 1270520 [details] migrate logs
Created attachment 1270522 [details] aliases
Created attachment 1270662 [details] elasticsearch logs from the deployers
Created attachment 1271543 [details]
create "project.name.*" aliases for all of the "name.*" indices

This script will go through all of the indices in the server. Any indices beginning with "." or "project." are ignored. For each remaining index of the form "name.*", an alias "project.name.*" is created. This is essentially what the deployer MODE=upgrade does, except that it doesn't restart pods, redeploy, etc.
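A minimal sketch of what such a migration script might look like. The function names (`alias_for`, `migrate_aliases`) are hypothetical, and the Elasticsearch endpoint and admin cert paths are assumptions based on the commands shown later in this bug; the actual attached script may differ.

```shell
# Derive the 3.4-format alias name from a 3.3-format index name.
# Indices beginning with "." (internal) or "project." (already in the
# new format) are skipped by returning a non-zero status.
alias_for() {
  case "$1" in
    .*|project.*) return 1 ;;
    *) echo "project.$1" ;;
  esac
}

# Walk every index on the server and add the corresponding alias via
# the _aliases API. Endpoint and cert paths are assumed, not verified.
migrate_aliases() {
  ES_URL="https://localhost:9200"
  CURL="curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key"
  for idx in $($CURL "$ES_URL/_cat/indices?h=index"); do
    alias=$(alias_for "$idx") || continue
    $CURL -XPOST "$ES_URL/_aliases" \
      -d "{\"actions\":[{\"add\":{\"index\":\"$idx\",\"alias\":\"$alias\"}}]}"
  done
}
```

Running `migrate_aliases` inside an Elasticsearch pod (e.g. via `oc exec`) would only add aliases; it does not restart pods or redeploy anything.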
Apparently the script didn't work. A possible reason is the new indices created following the new project.name... format: when an alias with that name already exists for the pattern, the corresponding 3.3 indices are ignored. It was possible to work around this by manually creating the remaining aliases.
https://github.com/openshift/origin-aggregated-logging/pull/380
Considering that the original issue can't be reproduced with a small number of projects, and that there are 1500+ indices in the bug reporter's env, I'm contacting the performance team to test the bug fix in an env containing a large number of projects.
(In reply to Xia Zhao from comment #11)
> Consider that the original issue can't be reproduced with small number of
> projects, and there are 1500+ indices in bug reporter's env, I'm contacting
> performance team to test the bug fix with env containing large number of
> projects.

Have you tried to reproduce with a small number of projects?
Forgot to build openshift-elasticsearch-plugin 2.4.1.7 for RH; building now. Then I will need to get tdawson to rebuild the puddle, then I will need to rebuild the image, and then tdawson will need to push it to the QE registry.
(In reply to Rich Megginson from comment #12)
> (In reply to Xia Zhao from comment #11)
> > Consider that the original issue can't be reproduced with small number of
> > projects, and there are 1500+ indices in bug reporter's env, I'm contacting
> > performance team to test the bug fix with env containing large number of
> > projects.
>
> Have you tried to reproduce with a small number of projects?

@Rich, according to the logs here, I've got the alias created in an env with 3 projects: https://bugzilla.redhat.com/show_bug.cgi?id=1395170#c38

Thanks,
Xia
Yes, we need a test which can stress with a large number of projects/indices.
@wabouham Just FYI: the testing work will be blocked here for the moment: https://bugzilla.redhat.com/show_bug.cgi?id=1446504
@wabouham Just FYI: the blocker mentioned in comment #19 will not block us from testing the bug fix for "missing aliases after upgrade"; it only blocks the upgraded logging stack from collecting new log entries. Please feel free to let me know if any help is needed from my side to proceed with testing the bug fix here. Thanks!
@xiazhao: please send me access info for the environment where I need to create a large number of projects. I'll run our tools and run curl commands to create one index per project to try to reproduce the issue. You can email me or reply with a private comment. Thanks!
@wabouham I can set one up for you if needed, but I'm afraid it can't support the test with a large number of projects. Considering that a large number of applications will consume more resources, I wonder if the SVT team has an env with more resources.
@xiazhao Please set up an env or let me access yours. I'll try to create the projects on your environment first, and run curl commands to create a log entry from each project. We should not need to create an app in each project. If we run into resource issues, I'll set up a larger env and work from there. Thanks.
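For reference, a sketch of how seeding one log entry per project with curl might look. The index name format <project>.<uuid>.<date> is taken from this bug; the helper name `index_for`, the placeholder UUID, the document type, and the endpoint/cert paths are all assumptions, not the exact commands that were run.

```shell
# Build a 3.3-format index name <project>.<uuid>.<date> for today (UTC).
index_for() {
  echo "$1.$2.$(date -u +%Y.%m.%d)"
}

# Index one test document per project so that the upgrade has a 3.3-format
# index to create an alias for. Endpoint and cert paths are assumptions
# based on the commands elsewhere in this bug.
seed_log_entries() {
  ES_URL="https://localhost:9200"
  CURL="curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key"
  i=1
  while [ "$i" -le 2000 ]; do
    # Placeholder UUID; a real run would use each project's actual UID.
    idx=$(index_for "clusterproj$i" "00000000-0000-0000-0000-000000000000")
    $CURL -XPOST "$ES_URL/$idx/test/" -d '{"message":"seed entry"}'
    i=$((i + 1))
  done
}
```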
@Walid, the 3.3.1 env with logging deployed is ready; I sent the details by email. If you have any questions, please let me know. Thanks!
(In reply to Xia Zhao from comment #19)
> @wabouham Just FYI: the testing work will be blocked here for this moment:
> https://bugzilla.redhat.com/show_bug.cgi?id=1446504

FYI, we've worked out that this blocker can be worked around by the solution here: https://bugzilla.redhat.com/show_bug.cgi?id=1446504#c10
Verified fix.

Created an AWS env with 1 master/etcd node, 1 infra node, and 2 application nodes:

# openshift version
openshift v3.4.1.24
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

openshift_ansible: latest from release-1.4 branch (May 15, 2017)

# rpm -qva | grep ansible
openshift-ansible-filter-plugins-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-playbooks-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-roles-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-docs-3.4.69-1.git.0.e3b8949.el7.noarch
openshift-ansible-callback-plugins-3.4.69-1.git.0.e3b8949.el7.noarch
ansible-2.2.1.0-2.el7.noarch
openshift-ansible-lookup-plugins-3.4.69-1.git.0.e3b8949.el7.noarch

Verification procedure:
-----------------------
Logging stack deployed initially on version 3.3.1.
Created 2000 projects and one index for each project (2000 indices total).
Upgraded logging stack to version 3.4.1.
Verified that 2000 aliases were created for the 2000 indices, with correct patterns:

# oc exec logging-es-rh4rw47l-3-uqu4m -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/indices | grep clusterproj | wc -l
2000

# oc exec logging-es-rh4rw47l-3-uqu4m -- curl -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/aliases | grep clusterproj | wc -l
2000

Each existing project had the following index pattern before and after the upgrade:

clusterproj1811.944e7597-3a2c-11e7-8ab2-026d0776ce7c.2017.05.16

Corresponding alias created after the logging stack upgrade to v3.4.1:

project.clusterproj1811.944e7597-3a2c-11e7-8ab2-026d0776ce7c.cdm-alias.2017.05.16 clusterproj1811.944e7597-3a2c-11e7-8ab2-026d0776ce7c.2017.05.16
Is there some text missing in the doc text? I see "Part of this was related" but it seems like this thought is not complete.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3049