Description of problem:
Upgrading logging from 3.2.1 to 3.3 fails after upgrading OSE 3.2 -> OCP 3.3.

Version-Release number of selected component (if applicable):
openshift-ansible-3.3.20

How reproducible:
Always

Steps to Reproduce:
1. Install OSE 3.2.
2. Deploy logging.
3. Upgrade OSE 3.2.1 to OCP 3.3.
4. Check the status of the logging applications.

[root@anli-working ha2]# oc get pods
NAME                              READY     STATUS    RESTARTS   AGE
logging-curator-1-9gmwb           1/1       Running   0          26m
logging-curator-ops-1-aeykn       1/1       Running   0          30m
logging-es-2ct7rh6u-2-0czhe       1/1       Running   0          27m
logging-es-ops-hrp6pnho-1-tfmpe   1/1       Running   0          27m
logging-fluentd-1-agmf9           1/1       Running   0          27m
logging-fluentd-1-ewfst           1/1       Running   0          29m
logging-fluentd-1-ezbyq           1/1       Running   0          26m
logging-fluentd-1-gm3vk           1/1       Running   0          31m
logging-fluentd-1-p52a7           1/1       Running   0          30m
logging-kibana-1-ywbq2            2/2       Running   0          29m
logging-kibana-ops-1-syq3k        2/2       Running   2          26m

5. Deploy the deployer account.

[root@anli-working ha2]# oc new-app logging-deployer-account-template
--> Deploying template "logging-deployer-account-template" in project "openshift"

     logging-deployer-account-template
     ---------
     Template for creating the deployer account and roles needed for the aggregated logging deployer. Create as cluster-admin.

--> Creating resources with label app=logging-deployer-account-template ...
    error: serviceaccounts "logging-deployer" already exists
    error: serviceaccounts "aggregated-logging-kibana" already exists
    error: serviceaccounts "aggregated-logging-elasticsearch" already exists
    error: serviceaccounts "aggregated-logging-fluentd" already exists
    error: serviceaccounts "aggregated-logging-curator" already exists
    clusterrole "oauth-editor" created
    clusterrole "daemonset-admin" created
    rolebinding "logging-deployer-edit-role" created
    rolebinding "logging-deployer-dsadmin-role" created
--> Failed

[root@anli-working ha2]# oc policy add-role-to-user edit --serviceaccount logging-deployer
[root@anli-working ha2]# oc policy add-role-to-user daemonset-admin --serviceaccount logging-deployer
[root@anli-working ha2]# oadm policy add-cluster-role-to-user oauth-editor system:serviceaccount:logging:logging-deployer

6. Upgrade logging.

[root@anli-working ha2]# oc new-app logging-deployer-template -p ENABLE_OPS_CLUSTER=true,IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/,KIBANA_HOSTNAME=kibana.0823-voo.qe.rhcloud.com,KIBANA_OPS_HOSTNAME=kibana-ops.0823-voo.qe.rhcloud.com,PUBLIC_MASTER_URL=https://openshift-166.lab.sjc.redhat.com:443,ES_INSTANCE_RAM=2048M,MASTER_URL=https://openshift-166.lab.sjc.redhat.com:443,MODE=upgrade
--> Deploying template "logging-deployer-template" in project "openshift"

     logging-deployer-template
     ---------
     Template for running the aggregated logging deployer in a pod. Requires empowered 'logging-deployer' service account.
     * With parameters:
        * MODE=upgrade
        * IMAGE_PREFIX=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/
        * IMAGE_VERSION=latest
        * IMAGE_PULL_SECRET=
        * INSECURE_REGISTRY=false
        * ENABLE_OPS_CLUSTER=true
        * KIBANA_HOSTNAME=kibana.0823-voo.qe.rhcloud.com
        * KIBANA_OPS_HOSTNAME=kibana-ops.0823-voo.qe.rhcloud.com
        * PUBLIC_MASTER_URL=https://openshift-166.lab.sjc.redhat.com:443
        * MASTER_URL=https://openshift-166.lab.sjc.redhat.com:443
        * ES_CLUSTER_SIZE=1
        * ES_INSTANCE_RAM=2048M
        * ES_PVC_SIZE=
        * ES_PVC_PREFIX=logging-es-
        * ES_PVC_DYNAMIC=
        * ES_NODE_QUORUM=
        * ES_RECOVER_AFTER_NODES=
        * ES_RECOVER_EXPECTED_NODES=
        * ES_RECOVER_AFTER_TIME=5m
        * ES_OPS_CLUSTER_SIZE=
        * ES_OPS_INSTANCE_RAM=8G
        * ES_OPS_PVC_SIZE=
        * ES_OPS_PVC_PREFIX=logging-es-ops-
        * ES_OPS_PVC_DYNAMIC=
        * ES_OPS_NODE_QUORUM=
        * ES_OPS_RECOVER_AFTER_NODES=
        * ES_OPS_RECOVER_EXPECTED_NODES=
        * ES_OPS_RECOVER_AFTER_TIME=5m
        * FLUENTD_NODESELECTOR=logging-infra-fluentd=true
        * ES_NODESELECTOR=
        * ES_OPS_NODESELECTOR=
        * KIBANA_NODESELECTOR=
        * KIBANA_OPS_NODESELECTOR=
        * CURATOR_NODESELECTOR=
        * CURATOR_OPS_NODESELECTOR=

--> Creating resources with label app=logging-deployer-template ...
    pod "logging-deployer-67o1q" created
--> Success
    Run 'oc status' to view your app.

7. Check the pod status.

[root@anli-working ha2]# oc get pods
NAME                              READY     STATUS        RESTARTS   AGE
logging-deployer-67o1q            1/1       Running       0          3m
logging-es-2ct7rh6u-2-0czhe       1/1       Running       0          31m
logging-es-ops-hrp6pnho-1-tfmpe   1/1       Running       0          31m
logging-kibana-1-ywbq2            2/2       Terminating   0          32m
logging-kibana-ops-1-syq3k        2/2       Terminating   2          29m

NAME                           READY     STATUS              RESTARTS   AGE
logging-curator-2-deploy       0/1       ContainerCreating   0          1m
logging-deployer-67o1q         1/1       Running             0          5m
logging-es-2ct7rh6u-3-deploy   1/1       Running             0          1m
logging-kibana-2-deploy        1/1       Running             0          1m

[root@anli-working ha2]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
logging-curator-2-deploy   1/1       Running   0          1m
logging-deployer-67o1q     1/1       Running   0          5m

[root@anli-working ha2]# oc get pods
NAME                     READY     STATUS    RESTARTS   AGE
logging-deployer-67o1q   1/1       Running   0          5m

[root@anli-working ha2]# oc get pods
NAME                           READY     STATUS              RESTARTS   AGE
logging-curator-1-deploy       1/1       Running             0          1m
logging-curator-1-g5mgc        0/1       RunContainerError   1          1m
logging-curator-ops-1-bqfe2    0/1       RunContainerError   0          1m
logging-curator-ops-1-deploy   1/1       Running             0          1m
logging-deployer-67o1q         0/1       Error               0          7m

[root@anli-working ha2]# oc get pods
NAME                           READY     STATUS             RESTARTS   AGE
logging-curator-1-deploy       1/1       Running            0          8m
logging-curator-1-g5mgc        0/1       CrashLoopBackOff   6          7m
                               0/1       CrashLoopBackOff   6          8m
logging-curator-ops-1-deploy   1/1       Running            0          8m
logging-deployer-67o1q         0/1       Error              0          14m

[root@anli-working ha2]# oc get pods
NAME                           READY     STATUS    RESTARTS   AGE
logging-curator-1-deploy       0/1       Error     0          14m
logging-curator-ops-1-deploy   0/1       Error     0          14m
logging-deployer-67o1q         0/1       Error     0          20m

8. Check the deployer logs (please refer to the attached files).
oc logs logging-curator-1-deploy > logging-curator-1-deploy.logs
oc logs logging-curator-ops-1-deploy > logging-curator-ops-1-deploy.logs
oc logs logging-deployer-67o1q > logging-deployer-67o1q.logs
oc describe pod logging-curator-1-g5mgc > logging-curator-1-g5mgc.describe

Actual results:

tailf logging-deployer-67o1q.logs
+ oc patch deploymentconfig/logging-es-2ct7rh6u --type=json --patch '[{"op": "replace", "path": "/spec/template/spec/containers/0/volumeMounts/0/mountPath", "value": "/etc/elasticsearch/secret"},{"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/1", "value": {"name": "elasticsearch-config", "mountPath": "/usr/share/elasticsearch/config", "readOnly": true}},{"op": "add", "path": "/spec/template/spec/volumes/1", "value": {"name": "elasticsearch-config", "configMap": {"name": "logging-elasticsearch"}}}]'
"logging-es-2ct7rh6u" patched
+ oc deploy deploymentconfig/logging-es-2ct7rh6u --latest
Error from server: Operation cannot be fulfilled on deploymentconfigs "logging-es-2ct7rh6u": the object has been modified; please apply your changes to the latest version and try again

In logging-curator-1-g5mgc.describe:

3m  3m  1  {kubelet openshift-155.lab.sjc.redhat.com}  spec.containers{curator}  Warning  Failed  Failed to start container with docker id de85f02852f4 with error: Error response from daemon: Cannot start container de85f02852f4f76fabf4752a4b076e6cacf6aa7f470cde4ae8adb57c4bd0196c: [9] System error: invalid character '}' looking for beginning of value
2m  2m  1  {kubelet openshift-155.lab.sjc.redhat.com}  spec.containers{curator}  Normal  Created  Created container with docker id 8d6946ca3815
2m  2m  1  {kubelet openshift-155.lab.sjc.redhat.com}  spec.containers{curator}  Warning  Failed  Failed to start container with docker id 8d6946ca3815 with error: Error response from daemon: Cannot start container 8d6946ca3815b85459e5f4a1dda88842773aa4a8e0270b446f680b260ada395f: [9] System error: invalid character '}' looking for beginning of value
2m  2m  1  {kubelet openshift-155.lab.sjc.redhat.com}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "curator" with RunContainerError: "runContainer: Error response from daemon: Cannot start container 8d6946ca3815b85459e5f4a1dda88842773aa4a8e0270b446f680b260ada395f: [9] System error: invalid character '}' looking for beginning of value"

In logging-curator-ops-1-bqfe2.describe:

6m  6m  1  {kubelet openshift-114.lab.sjc.redhat.com}  spec.containers{curator}  Warning  Failed  Failed to start container with docker id 1620eda5c6be with error: Error response from daemon: Cannot start container 1620eda5c6be602917675e02debbac2e26b503924ff87a135912cca8afc5b261: [9] System error: invalid character '}' looking for beginning of value
6m  6m  1  {kubelet openshift-114.lab.sjc.redhat.com}  spec.containers{curator}  Normal  Created  Created container with docker id 2e366425c3d8
6m  6m  1  {kubelet openshift-114.lab.sjc.redhat.com}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "curator" with RunContainerError: "runContainer: Error response from daemon: Cannot start container 2e366425c3d83954faf323937db2f575725835739247593fe992a7164221fb55: [9] System error: invalid character '}' looking for beginning of value"
6m  6m  1  {kubelet openshift-114.lab.sjc.redhat.com}  spec.containers{curator}  Warning  Failed  Failed to start container with docker id 2e366425c3d8 with error: Error response from daemon: Cannot start container 2e366425c3d83954faf323937db2f575725835739247593fe992a7164221fb55: [9] System error: invalid character '}' looking for beginning of value
6m  6m  1  {kubelet openshift-114.lab.sjc.redhat.com}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "curator" with CrashLoopBackOff: "Back-off 20s restarting failed container=curator pod=logging-curator-ops-1-bqfe2_logging(db3b6ad0-70b8-11e6-b266-fa163e493d67)"
5m  5m  1  {kubelet openshift-114.lab.sjc.redhat.com}  spec.containers{curator}  Normal  Created  Created container with docker id 95341f392f63
5m  5m  1  {kubelet openshift-114.lab.sjc.redhat.com}  spec.containers{curator}  Warning  Failed  Failed to start container with docker id 95341f392f63 with error: Error response from daemon: Cannot start container 95341f392f6394306d3dcda45b860c13a81d4afa2ab27524dd666bf2dc6abb56: [9] System error: could not synchronise with container process
5m  5m  1  {kubelet openshift-114.lab.sjc.redhat.com}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "curator" with RunContainerError: "runContainer: Error response from daemon: Cannot start container 95341f392f6394306d3dcda45b860c13a81d4afa2ab27524dd666bf2dc6abb56: [9] System error: could not synchronise with container process"
5m  5m  3  {kubelet openshift-114.lab.sjc.redhat.com}  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "curator" with CrashLoopBackOff: "Back-off 40s restarting failed container=curator pod=logging-curator-ops-1-bqfe2_logging(db3b6ad0-70b8-11e6-b266-fa163e493d67)"

Expected results:


Additional info:
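As a possible interim step, the failed ES rollout can be re-triggered by hand after the deployer errors out. This is only a sketch assuming the default 'logging' project and the DC name from this report; the retry loop is my own and is not something the deployer does itself:

# Re-trigger the latest rollout of the patched ES deployment config.
# Retry a few times in case of "the object has been modified" conflicts.
for i in 1 2 3; do
    oc deploy deploymentconfig/logging-es-2ct7rh6u --latest -n logging && break
    sleep 5
done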
What was the IMAGE_PREFIX when installing 3.2? I ask because the upgraded version is obviously internal-only, and I'm wondering whether a different registry was used for the initial install. The upgrade is in-place, and while it does update image tags, it doesn't update image names/repos (unfortunately... maybe it should), so the upgrade could be trying to deploy e.g. registry.access.redhat.com/openshift3/logging-curator:3.3.0, which will not resolve until release. In other words, you can't upgrade to a different IMAGE_PREFIX. A full describe of one of the upgraded DCs and/or a full list of namespace events would probably shed light on what it's trying to do.
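One quick way to confirm which registry the upgraded DCs actually reference is to print the image from each logging deployment config. Just a sketch, assuming the standard 'logging' project:

# List the image reference used by each logging deployment config
oc get dc -n logging -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'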
I just ran an upgrade from 3.2.1 to 3.3.0 for aggregated logging and wasn't able to recreate this; in my run the deployer completed successfully. In your case it looks like an 'oc deploy' failed during the deployer run for one of the ES deployment configs: the deployer pod for your MODE=upgrade run ends up in status Error. A list of events from that time, as Luke is requesting, would be useful.
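Something along these lines should capture what we're asking for (the output file names here are only examples):

# Collect the namespace events and a full describe of one upgraded ES DC
oc get events -n logging > logging-events.txt
oc describe dc/logging-es-2ct7rh6u -n logging > logging-es-dc.describe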
Yes, I used a different docker registry server during the upgrade: registry.access.redhat.com -> brew-pulp-docker01.web.prod.ext.phx2.redhat.com.
Anping, can you verify whether rerunning this test (first installing logging 3.2 from the brew-pulp repo, then upgrading to 3.3 using the same repo) still causes you to see this issue? When I ran my test and was unable to recreate it, I was using the same repo for both 3.2.1 and 3.3.
I'm tempted to close this NOTABUG, but maybe we should keep it around to remind ourselves to handle this situation (upgrading to a different registry) better.
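If we keep it open, one manual workaround we could document for the registry switch is rewriting the image repo on the logging DCs before re-running the upgrade. Rough sketch only, using the two registries from this report; exporting and sed-ing DC JSON like this is not a supported procedure:

# Rewrite the image registry prefix on every logging deployment config
OLD=registry.access.redhat.com/openshift3/
NEW=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/
for dc in $(oc get dc -n logging -o name); do
    oc get "$dc" -n logging -o json | sed "s|${OLD}|${NEW}|g" | oc replace -n logging -f -
done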
(In reply to ewolinet from comment #4)
> Anping, can you verify if rerunning this test with first installing logging
> 3.2 from the brew-pulp repo and then upgrade to 3.3 using the same repo
> still causes you to see this issue?
>
> When I ran my test and was unable to recreate, it was while using the same
> repo for 3.2.1 and 3.3

If I rerun with the same install configuration (the same registry for both the install and the upgrade), the upgrade works well.
The issue will be addressed in the repo, so QA is moving this to VERIFIED. Please feel free to reopen it if needed.