Description of problem:
I am trying to upgrade my system from 3.3 to 3.4 using the advanced installation method. The upgrade is complete. I tried to upgrade metrics using the procedure described here:
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/automated_upgrades.html#automated-upgrading-cluster-metrics
Metrics deployment failed.

Next I cleaned up the openshift-infra project and tried to install metrics afresh as per the docs here:
https://docs.openshift.com/container-platform/3.4/install_config/cluster_metrics.html#install-config-cluster-metrics
The deployment process failed with the following error:

Deploying Hawkular Metrics & Cassandra Components
scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable

This issue seems to have been fixed in origin in a different context:
https://bugzilla.redhat.com/show_bug.cgi?id=1395267
https://github.com/openshift/origin-metrics/commit/643515aa28919240c01e5ee70d1d692116f104f6
However, I am not sure if it made it into the current version. The metrics deployer template seems to be dated September, although the folder was downloaded yesterday.

# ls -l /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml
-rw-r--r--. 1 root root 5375 Sep 4 22:59 /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml
#
# ls -ld /usr/share/openshift/examples/infrastructure-templates/enterprise
drwxr-xr-x. 2 root root 90 Jan 19 22:11 /usr/share/openshift/examples/infrastructure-templates/enterprise

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Clean up openshift-infra project on a 3.4 cluster
2.
   Deploy metrics following the documentation here:
   https://docs.openshift.com/container-platform/3.4/install_config/cluster_metrics.html#install-config-cluster-metrics
   In my case I used the following command to start the deployer:

   oc new-app --as=system:serviceaccount:openshift-infra:metrics-deployer -f metrics-deployer.yaml -p HAWKULAR_METRICS_HOSTNAME=hawkular.apps.testv3.osecloud.com -p USE_PERSISTENT_STORAGE=false -p MASTER_URL=https://master.testv3.osecloud.com:8443 -p IMAGE_PREFIX=openshift3/ -p IMAGE_VERSION=v3.4

   The deployer starts, but it fails with the following error:

   "Deploying Hawkular Metrics & Cassandra Components
   scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable"

Actual results:
Metrics deployer fails

Expected results:
Metrics deployer should be successfully completed

Additional info:
Here are the deployer logs

# oc logs -f metrics-deployer-1uasp
++ parse_bool false CONTINUE_ON_ERROR
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ continue_on_error=false
+ '[' false == false ']'
+ set -eu
+ deployer_mode=deploy
+ image_prefix=openshift3/
+ image_version=v3.4
+ master_url=https://master.testv3.osecloud.com:8443
+ [[ 3 == \/ ]]
++ parse_bool false REDEPLOY
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ redeploy=false
+ '[' false == true ']'
+ mode=deploy
+ '[' deploy = redeploy ']'
++ parse_bool false IGNORE_PREFLIGHT
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ ignore_preflight=false
+ cassandra_nodes=1
++ parse_bool false USE_PERSISTENT_STORAGE
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ use_persistent_storage=false
++ parse_bool false DYNAMICALLY_PROVISION_STORAGE
++ local v=false
++ '[' false '!=' true -a false '!=' false ']'
++ echo false
+ dynamically_provision_storage=false
+ cassandra_pv_size=10Gi
+ metric_duration=7
+ user_write_access=false
+ heapster_node_id=nodename
+ metric_resolution=15s
+ project=openshift-infra
+ master_ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+ token_file=/var/run/secrets/kubernetes.io/serviceaccount/token
+ dir=/etc/deploy/_output
+ secret_dir=/secret
+ rm -rf /etc/deploy/_output
+ mkdir -p /etc/deploy/_output
+ chmod 700 /etc/deploy/_output
+ mkdir -p /secret
+ chmod 700 /secret
chmod: changing permissions of '/secret': Read-only file system
+ :
+ hawkular_metrics_hostname=hawkular.apps.testv3.osecloud.com
+ hawkular_metrics_alias=hawkular-metrics
+ hawkular_cassandra_alias=hawkular-cassandra
++ date +%s
+ openshift admin ca create-signer-cert --key=/etc/deploy/_output/ca.key --cert=/etc/deploy/_output/ca.crt --serial=/etc/deploy/_output/ca.serial.txt --name=metrics-signer@1484886470
+ '[' -n 1 ']'
+ oc config set-cluster master --api-version=v1 --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --server=https://master.testv3.osecloud.com:8443
cluster "master" set.
++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
+ oc config set-credentials account --token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoibWV0cmljcy1kZXBsb3llci10b2tlbi1scnZ4NiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJtZXRyaWNzLWRlcGxveWVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOTlkZTI2YzQtZGVjOC0xMWU2LWFlZTktZmExNjNlNjlhMTk3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1pbmZyYTptZXRyaWNzLWRlcGxveWVyIn0.svwTz9KivfoHFibn3dLKa6OdVwXNmIKUSELO-fNNK288rTbeuCBloUANtwEF2TtZdrjELtVmnCNgHeGz_rgHfIQPsUIT_JAhHGMBHMO4qDbCKSORou8cgJb8AUTGiADg_A9sGo7RqBuGNgj79TdjDOZpMlE6m4NYjY4tarq1d3mlLzbQb1xOetsDdDL8K9QHu4nL6H44pOkE6c2MIXLrB71ZFBZx4j2B4hkPVPq_5-aKA-0dM8yXYJ0PiFz895ntAT8lOUixahzwar4OKxeS-mN-4AfWTpnx-vrHQsk8D5QeqK-O8M2Pfow90mvTFBz-YTcsQl6k6R7otM6pZGw7QA
user "account" set.
+ oc config set-context current --cluster=master --user=account --namespace=openshift-infra
context "current" set.
+ oc config use-context current
switched to context "current".
+ old_kc=/etc/deploy/.kubeconfig
+ KUBECONFIG=/etc/deploy/_output/kube.conf
+ '[' -z 1 ']'
+ oc config set-cluster deployer-master --api-version=v1 --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --server=https://master.testv3.osecloud.com:8443
cluster "deployer-master" set.
++ cat /var/run/secrets/kubernetes.io/serviceaccount/token
+ oc config set-credentials deployer-account --token=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoibWV0cmljcy1kZXBsb3llci10b2tlbi1scnZ4NiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJtZXRyaWNzLWRlcGxveWVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOTlkZTI2YzQtZGVjOC0xMWU2LWFlZTktZmExNjNlNjlhMTk3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1pbmZyYTptZXRyaWNzLWRlcGxveWVyIn0.svwTz9KivfoHFibn3dLKa6OdVwXNmIKUSELO-fNNK288rTbeuCBloUANtwEF2TtZdrjELtVmnCNgHeGz_rgHfIQPsUIT_JAhHGMBHMO4qDbCKSORou8cgJb8AUTGiADg_A9sGo7RqBuGNgj79TdjDOZpMlE6m4NYjY4tarq1d3mlLzbQb1xOetsDdDL8K9QHu4nL6H44pOkE6c2MIXLrB71ZFBZx4j2B4hkPVPq_5-aKA-0dM8yXYJ0PiFz895ntAT8lOUixahzwar4OKxeS-mN-4AfWTpnx-vrHQsk8D5QeqK-O8M2Pfow90mvTFBz-YTcsQl6k6R7otM6pZGw7QA
user "deployer-account" set.
+ oc config set-context deployer-context --cluster=deployer-master --user=deployer-account --namespace=openshift-infra
context "deployer-context" set.
+ '[' -n 1 ']'
+ oc config use-context deployer-context
switched to context "deployer-context".
+ case $deployer_mode in
+ '[' false '!=' true ']'
+ validate_preflight
+ set +x
PREFLIGHT CHECK SUCCEEDED
validate_master_accessible: ok
validate_hostname: The HAWKULAR_METRICS_HOSTNAME value is deemed acceptable.
validate_deployer_secret: ok
Generating randomized passwords for the Hawkular Metrics and Cassandra keystores and truststores
Creating the Hawkular Metrics keystore from the PEM file
Entry for alias hawkular-metrics successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
[Storing /etc/deploy/_output/hawkular-metrics.keystore]
Creating the Hawkular Cassandra keystore from the PEM file
Entry for alias hawkular-cassandra successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
[Storing /etc/deploy/_output/hawkular-cassandra.keystore]
Creating the Hawkular Metrics Certificate
Certificate stored in file </etc/deploy/_output/hawkular-metrics.cert>
Creating the Hawkular Cassandra Certificate
Certificate stored in file </etc/deploy/_output/hawkular-cassandra.cert>
Importing the Hawkular Metrics Certificate into the Cassandra Truststore
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-cassandra.truststore]
Importing the Hawkular Cassandra Certificate into the Hawkular Metrics Truststore
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-metrics.truststore]
Importing the Hawkular Cassandra Certificate into the Cassandra Truststore
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-cassandra.truststore]
Importing the CA Certificate into the Cassandra Truststore
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-cassandra.truststore]
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-cassandra.truststore]
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-cassandra.truststore]
Importing the CA Certificate into the Hawkular Metrics Truststore
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-metrics.truststore]
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-metrics.truststore]
Certificate was added to keystore
[Storing /etc/deploy/_output/hawkular-metrics.truststore]
Adding password for user hawkular
Generating the JGroups Keystore
Creating the Hawkular Metrics Secrets configuration json file
Creating the Hawkular Metrics Certificate Secrets configuration json file
Creating the Hawkular Metrics User Account Secrets
Creating the Cassandra Secrets configuration file
Creating the Cassandra Certificate Secrets configuration json file
Creating Hawkular Metrics & Cassandra Secrets
secret "hawkular-metrics-secrets" created
secret "hawkular-metrics-certificate" created
secret "hawkular-metrics-account" created
secret "hawkular-cassandra-secrets" created
secret "hawkular-cassandra-certificate" created
Creating Hawkular Metrics & Cassandra Templates
template "hawkular-metrics" created
template "hawkular-cassandra-services" created
template "hawkular-cassandra-node-pv" created
template "hawkular-cassandra-node-dynamic-pv" created
template "hawkular-cassandra-node-emptydir" created
template "hawkular-support" created
Deploying Hawkular Metrics & Cassandra Components
scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable
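For readers unfamiliar with this class of failure: the trace shows the deployer running under `set -eu`, and with `-u` the shell aborts as soon as it expands a variable that was never set. The sketch below reproduces that behaviour in isolation. It is not the code from scripts/hawkular.sh; the function names and the fallback value of 500 are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only: shows how `set -u` turns an unset variable into a fatal error.
# Function names and the default of 500 are hypothetical, not from hawkular.sh.

# References STARTUP_TIMEOUT under `set -eu` while it is unset.
# The subshell aborts, so this function returns a non-zero status.
strict_reference() {
  ( unset STARTUP_TIMEOUT; set -eu; echo "timeout=${STARTUP_TIMEOUT}" ) 2>&1
}

# Same reference with a fallback value, which `set -u` accepts.
defaulted_reference() {
  ( unset STARTUP_TIMEOUT; set -eu; echo "timeout=${STARTUP_TIMEOUT:-500}" ) 2>&1
}

strict_reference || echo "strict reference failed, as in the deployer log"
defaulted_reference
```

If the deployer expects STARTUP_TIMEOUT to arrive as a template parameter, an older template that does not pass it would leave the variable unset and produce exactly this kind of abort.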
It seems the latest version of metrics-deployer is not downloaded with atomic-openshift-utils(?). I downloaded it from GitHub:

wget https://raw.githubusercontent.com/openshift/openshift-ansible/master/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml

I tried this version of metrics-deployer, and metrics deployed successfully.
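A quick way to tell a stale template from a current one might be to check whether it mentions STARTUP_TIMEOUT at all. This is a sketch under the assumption that the corrected template names that parameter, which this report does not confirm. The helper name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a fixed metrics-deployer.yaml mentions STARTUP_TIMEOUT.
# template_has_startup_timeout is a hypothetical helper name.

# Returns 0 if the given file mentions STARTUP_TIMEOUT, non-zero otherwise.
template_has_startup_timeout() {
  grep -q 'STARTUP_TIMEOUT' "$1"
}

# Path taken from the report; the check is skipped if the file is absent.
tpl=/usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml
if [ -f "$tpl" ]; then
  if template_has_startup_timeout "$tpl"; then
    echo "template mentions STARTUP_TIMEOUT"
  else
    echo "template does not mention STARTUP_TIMEOUT; it may be stale"
  fi
fi
```

The same function can be pointed at the copy fetched with wget to compare the two files.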
Is this just a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1416213, or did the v1.4 metrics deployer not get brought into the system?
Also, setting this back to deployments, as it's not directly related to the metrics images but is an installation problem.
Yes, this is an installation problem. The version of metrics-deployer.yaml is incorrect.
(In reply to Matt Wringe from comment #3)
> Also, setting this back to deployments, as its not directly related to the
> metric images but an installation problem.

I don't think the PM team know how to fix:
scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable
@Veer, can you tell us where the template was sourced from? Did it come from installing an rpm? Please also provide additional details and version information.
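One way to gather the details requested here is to list the file's timestamp and ask rpm which package owns it. The sketch below wraps that in a small helper. The helper name is hypothetical, and it only assumes the standard `rpm -qf` query.

```shell
#!/bin/sh
# Sketch only: describe_template_source is a hypothetical helper name.

# Prints the file's long listing and, where rpm is available,
# the package that owns it (or rpm's message that no package does).
describe_template_source() {
  file=$1
  [ -e "$file" ] || { echo "missing: $file"; return 1; }
  ls -l "$file"
  if command -v rpm >/dev/null 2>&1; then
    rpm -qf "$file" || true
  else
    echo "rpm not available; cannot query package ownership"
  fi
}

# Path taken from the report; prints "missing: ..." on hosts without it.
describe_template_source /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml || true
```

The package name and version printed by rpm would show whether the template came from an rpm and which build shipped it.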
"scripts/hawkular.sh: line 200: STARTUP_TIMEOUT: unbound variable" appears because you are using the wrong template, which is why I was asking in https://bugzilla.redhat.com/show_bug.cgi?id=1415268#c2 whether this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1416213

This is not an issue with the metrics images themselves, but with the template that is being used. The problem is either with the docs or with how ansible is being used here.
@Matt, the template that gets downloaded to the box with 3.4 (I think when you install atomic-openshift-utils) is incorrect:
/usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml

That is exactly what I showed in my description above. It is dated Sep 4, whereas the defect was fixed later.
Adding https://bugzilla.redhat.com/show_bug.cgi?id=1418700. We may have a way to update these files in the installer and a way to document this.
(In reply to Veer Muchandi from comment #11)
> @Matt, The template that gets downloaded the box with 3.4 (I think when you
> install atomic-openshift-utils) is incorrect.
> /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml
>
> That is exactly what I showed in my description above. It is dated Sep 4,
> whereas the defect was fixed later.

So is this already fixed then by updating the version of ansible you are using? Or is the file on your system still bad?

The fix is already in https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_hosted_templates/files/v1.4/origin/metrics-deployer.yaml

I am not sure what else the metrics team needs to do with this, or if it can be closed as being fixed already.
(In reply to Matt Wringe from comment #14)
> (In reply to Veer Muchandi from comment #11)
> > @Matt, The template that gets downloaded the box with 3.4 (I think when you
> > install atomic-openshift-utils) is incorrect.
> > /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml
> >
> > That is exactly what I showed in my description above. It is dated Sep 4,
> > whereas the defect was fixed later.
>
> So is this already fixed then by updating the version of ansible you are
> using? or is the file on your system still bad?

When I opened this defect, metrics-deployer.yaml was incorrect. I was able to get the right version and use it. If this file has been changed and yum install atomic-openshift-utils gets the right version downloaded (I guess that is how it is copied to the box), then we should be good. A good way to test is to run the installer and see that the metrics are deployed.

>
> The fix is already in
> https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_hosted_templates/files/v1.4/origin/metrics-deployer.yaml
>
> I am not sure what else the metric team needs to do with this, or if it can
> be closed as being fixed already.