Created attachment 1158160 [details]
pod log

Problem description:
Upgrade from a 3.2.0 stage normal installation, deployer failed because serviceaccounts "aggregated-logging-curator" not found

Version-Release number of selected component (if applicable):
docker.io/openshift/origin-logging-deployment    latest    21fc80bb6c46    9 hours ago    706.3 MB

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging system at 3.2.0 level:
   image_prefix = registry.access.redhat.com/openshift3/
   image_version = 3.2.0
2. Wait for EFK pods running
3. Run logging deployer with -p MODE=upgrade -p IMAGE_PREFIX=openshift/origin- -p image_version = latest
4. Check logging upgrade log

Actual Result:
4. upgrade failed because serviceaccounts "aggregated-logging-curator" not found

Expected Result:
4. upgrade should be successful

Additional info:
1. Upgrade deployer pod dump attached
2. Upgrade pod log attached (when ENABLE_OPS_CLUSTER=false)
3. Issue reproduced when ENABLE_OPS_CLUSTER=true and ENABLE_OPS_CLUSTER=false
Created attachment 1158161 [details] pod_dump
I'm curious whether we expect customers to run the following command before initiating the upgrade process inside the deployer pod:

oc new-app logging-deployer-account-template

If so, they may encounter unwanted errors such as "error: serviceaccounts "aggregated-logging-elasticsearch" already exists", and the upgrade behavior will be inconsistent with scenarios like https://tcms-openshift.rhcloud.com/case/5284/?from_plan=3 where "oc new-app logging-deployer-account-template" is not needed.
This sounds like a documentation bug. A customer would need to run `oc new-app logging-deployer-account-template` on the new template in this case. They may also need to update the deployer service account depending on how old their installation is. I'll update our upgrade documentation to reflect these 'gotcha' steps.
FWIW the upgrade probably *could* create the missing SA... I don't believe any more special permissions are required.
True; however, if the Curator service account hasn't been created yet, there's a good chance the deployer doesn't have the daemonset-admin and oauth-editor roles either, and it would fail at those steps instead.
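As a rough illustration of the situation discussed above, a pre-upgrade check could list which of the service accounts named in this report are absent from the current project. This is only a sketch: the helper name missing_service_accounts is hypothetical and not part of the deployer, and it assumes nothing beyond `oc get serviceaccount NAME` exiting non-zero when the object does not exist. It does not check the deployer's roles.

```shell
#!/bin/sh
# Hypothetical helper (not part of the deployer): print each service
# account name passed as an argument that `oc get` cannot find in the
# current project. Assumes `oc get serviceaccount NAME` exits non-zero
# when the object is absent.
missing_service_accounts() {
    for sa in "$@"; do
        if ! oc get serviceaccount "$sa" >/dev/null 2>&1; then
            echo "$sa"
        fi
    done
}
```

For example, `missing_service_accounts aggregated-logging-curator aggregated-logging-elasticsearch` would print only the names that still need to be created. An empty result suggests the account template was already applied, which is the case where rerunning `oc new-app logging-deployer-account-template` may report "already exists".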
Verified with the latest deployer image on Docker Hub, upgrading as described in the new upgrade doc: https://github.com/openshift/origin-aggregated-logging#upgrading-your-efk-stack. This issue is fixed.
Created https://github.com/openshift/origin-aggregated-logging/pull/147 to refine the current hyperlinks in the upgrade doc.