Bug 1336625 - [intservice_public_249]logging upgrade failed because serviceaccounts "aggregated-logging-curator" not found
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Logging
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: chunchen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-17 05:24 UTC by Xia Zhao
Modified: 2016-09-30 02:17 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-19 13:55:09 UTC
Target Upstream Version:
Embargoed:


Attachments
pod log (57.47 KB, text/plain)
2016-05-17 05:24 UTC, Xia Zhao
no flags
pod_dump (4.21 KB, text/plain)
2016-05-17 05:24 UTC, Xia Zhao
no flags

Description Xia Zhao 2016-05-17 05:24:00 UTC
Created attachment 1158160
pod log

Problem description: 
When upgrading from a normal 3.2.0 (stage) installation, the deployer failed because serviceaccounts "aggregated-logging-curator" was not found.

Version-Release number of selected component (if applicable):
docker.io/openshift/origin-logging-deployment                 latest              21fc80bb6c46        9 hours ago         706.3 MB

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging system at 3.2.0 level:
image_prefix = registry.access.redhat.com/openshift3/
image_version = 3.2.0
2. Wait for EFK pods running
3. Run the logging deployer with -p MODE=upgrade -p IMAGE_PREFIX=openshift/origin- -p IMAGE_VERSION=latest
4. Check logging upgrade log

Actual Result:
4. upgrade failed because serviceaccounts "aggregated-logging-curator" not found

Expected Result:
4. upgrade should be successful

Additional info:
1. Upgrade deployer pod dump attached
2. Upgrade pod log attached (captured with ENABLE_OPS_CLUSTER=false)
3. Issue reproduces with both ENABLE_OPS_CLUSTER=true and ENABLE_OPS_CLUSTER=false

Comment 1 Xia Zhao 2016-05-17 05:24:35 UTC
Created attachment 1158161
pod_dump

Comment 2 Xia Zhao 2016-05-17 06:11:32 UTC
I'm curious whether we expect customers to run this command separately before initiating the upgrade process inside the deployer pod:

oc new-app logging-deployer-account-template

If so, they may hit unwanted errors such as "error: serviceaccounts "aggregated-logging-elasticsearch" already exists". The upgrade behavior would also be inconsistent with scenarios like https://tcms-openshift.rhcloud.com/case/5284/?from_plan=3, where "oc new-app logging-deployer-account-template" is not needed.

Comment 3 ewolinet 2016-05-17 13:17:18 UTC
This sounds like a documentation bug.  A customer would need to run `oc new-app logging-deployer-account-template` on the new template in this case.  They may also need to update the deployer service account depending on how old their installation is.

I'll update our upgrade documentation to reflect these 'gotcha' steps.
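
A pre-flight check along the following lines could surface a missing service account before the deployer runs. This is only a sketch under stated assumptions: the helper name `missing_service_accounts` is hypothetical, the expected account names are limited to the two quoted in this report, and the input is assumed to be one name per line in the form printed by `oc get serviceaccounts -o name`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the deployer): print every expected
# service account that is absent from the list read on stdin.
# stdin: one existing service account per line, with or without a
#        "serviceaccount/" style prefix.
# args:  the service account names the upgrade is expected to need.
missing_service_accounts() {
    # Strip any "<kind>/" prefix so only the bare names remain.
    existing=$(sed -e 's#.*/##')
    for sa in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: quiet.
        printf '%s\n' "$existing" | grep -qxF -- "$sa" || printf '%s\n' "$sa"
    done
}

# Example use against a live cluster (assumes `oc` is logged in to the
# logging project; left commented out so the sketch has no side effects):
#   oc get serviceaccounts -o name \
#     | missing_service_accounts aggregated-logging-curator aggregated-logging-elasticsearch
```

If the helper prints anything, the account template step from this comment would be needed before running the deployer in upgrade mode. Empty output means every listed account already exists.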

Comment 4 Luke Meyer 2016-05-17 13:21:11 UTC
FWIW the upgrade probably *could* create the missing SA... I don't believe any more special permissions are required.

Comment 5 ewolinet 2016-05-17 13:25:09 UTC
True; however, if the Curator service account hasn't been created yet, there's a good chance the deployer also lacks the daemonset-admin and oauth-editor roles, in which case it would fail at those steps instead.

Comment 7 Xia Zhao 2016-05-24 06:53:01 UTC
Verified with the latest deployer image on Docker Hub, upgrading as described in the new upgrade doc: https://github.com/openshift/origin-aggregated-logging#upgrading-your-efk-stack. The issue is fixed.

Comment 8 Xia Zhao 2016-05-24 09:08:59 UTC
Created https://github.com/openshift/origin-aggregated-logging/pull/147 to refine the current hyperlinks in the upgrade doc.

