Created attachment 1259013
Part of the ansible execution log when upgrade the logging stacks

Description of problem:
After upgrading the logging stacks from 3.3.1 to 3.5.0 with the ansible scripts, the ansible run completed successfully, but the ES pod stayed at the 3.3.1 level (while all the other components, including curator, kibana and fluentd, are at the 3.5.0 level) and reported "java.lang.IllegalArgumentException: Could not resolve placeholder 'NAMESPACE'".

Version-Release number of selected component (if applicable):
openshift-ansible-3.5.20-1.git.0.5a5fcd5.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install logging 3.3.1 stacks on an OCP 3.5.0 master, attach elasticsearch with the HostPath PV
2. Upgrade logging stacks to 3.5.0 by using ansible scripts (inventory file attached)
3. Check elasticsearch status post upgrade

Actual results:
Elasticsearch stayed at the 3.3.1 level and reported "java.lang.IllegalArgumentException: Could not resolve placeholder 'NAMESPACE'" after logging was upgraded to 3.5.0:

# oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-curator-2-l8jb8       1/1       Running     8          44m
logging-deployer-4bjvf        0/1       Completed   0          20h
logging-es-5bd3p6ko-3-rkbt9   0/1       Error       9          26m
logging-fluentd-8k1m2         1/1       Running     0          46m
logging-fluentd-rg7d4         1/1       Running     0          41m
logging-kibana-2-4v0hx        2/2       Running     0          44m

# oc get dc logging-es-5bd3p6ko -o yaml | grep image
        image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch:3.3.1
        imagePullPolicy: Always

# oc logs -f logging-es-5bd3p6ko-3-rkbt9
Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Des.path.home=/usr/share/java/elasticsearch -Des.config=/usr/share/java/elasticsearch/config/elasticsearch.yml -Xms128M -Xmx512m'
{1.5.2}: Setup Failed ...
- IllegalArgumentException[Could not resolve placeholder 'NAMESPACE']
java.lang.IllegalArgumentException: Could not resolve placeholder 'NAMESPACE'
	at org.elasticsearch.common.property.PropertyPlaceholder.parseStringValue(PropertyPlaceholder.java:124)
	at org.elasticsearch.common.property.PropertyPlaceholder.replacePlaceholders(PropertyPlaceholder.java:81)
	at org.elasticsearch.common.settings.ImmutableSettings$Builder.replacePropertyPlaceholders(ImmutableSettings.java:1098)
	at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareSettings(InternalSettingsPreparer.java:101)
	at org.elasticsearch.bootstrap.Bootstrap.initialSettings(Bootstrap.java:112)
	at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:183)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32)

Expected results:
ES should be running fine after upgrade

Additional info:
Ansible log (part of it) attached
Upgrade inventory file attached
Created attachment 1259029
the inventory file used for logging upgrade to 3.5.0
Xia, Can you please provide the entirety of the ansible playbook output? The logic we would need to look at is where we would generate the ES DC template, the portion of logs you pasted is just the 'oc apply' of them.
So, in the old deployer/scripts/upgrade.sh there is this function:

# this is required for the upgrade to ES 2.3.5
function update_es_for_235() {

That adds the downward API NAMESPACE var to the ES config.

Do we still need that in the new ansible es_migration.sh?
We already provide that as part of the ES DC we generate:
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging/templates/es.j2#L61-L64

The problem, it seems, is that we didn't even generate an ES DC template...
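For anyone who wants to check a given DC for this, here is a minimal sketch. It assumes the DC has been loaded into a Python dict (for example from `oc get dc <name> -o json`) and that the variable is injected through the standard downward API `fieldRef` on `metadata.namespace`. The helper names `find_env` and `has_downward_namespace` are hypothetical and are not part of openshift-ansible.

```python
def find_env(dc, name):
    """Return the first container env entry called `name`, or None."""
    pod_spec = dc.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            if env.get("name") == name:
                return env
    return None


def has_downward_namespace(dc):
    """True if NAMESPACE is populated from metadata.namespace via the downward API."""
    env = find_env(dc, "NAMESPACE")
    if env is None:
        return False
    field_ref = env.get("valueFrom", {}).get("fieldRef", {})
    return field_ref.get("fieldPath") == "metadata.namespace"


# Illustrative DCs only: one without the variable, one with it.
old_dc = {"spec": {"template": {"spec": {"containers": [{"name": "elasticsearch"}]}}}}
new_dc = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "elasticsearch",
        "env": [{
            "name": "NAMESPACE",
            "valueFrom": {"fieldRef": {"fieldPath": "metadata.namespace"}},
        }],
    }]}}}
}

print(has_downward_namespace(old_dc), has_downward_namespace(new_dc))
```

A DC that still runs the old image and fails this check would be consistent with the new template never having been applied.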
@eric I wonder if we suffer from the same thing that was fixed for the PVC: https://github.com/openshift/openshift-ansible/pull/3548/files#diff-8484225afbf0539375b973fb21b46838R68
(In reply to ewolinet from comment #3)
> Xia,
>
> Can you please provide the entirety of the ansible playbook output? The
> logic we would need to look at is where we would generate the ES DC
> template, the portion of logs you pasted is just the 'oc apply' of them.

No problem, let me redo the upgrade to provide the full log. I just need a few more hours for the 3.3.1 logging system to generate log entries on a system that uses the journald log driver. I will attach it soon.
Created attachment 1259549
full ansible upgrade log
@ewolinet The original issue was reproduced and the full ansible upgrade log has been attached.
Thanks Xia,

I see this in the log:

TASK [openshift_logging : Applying /tmp/openshift-logging-ansible-VcmLKX/templates/logging-logging-es-5a36v5kl-dc.yaml] ***
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging/tasks/oc_apply.yaml:13
Using module file /usr/lib/python2.7/site-packages/ansible/modules/core/commands/command.py
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/root/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r host-8-173-207.host.centralci.eng.rdu2.redhat.com '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589 `" && echo ansible-tmp-1488547884.75-30025641072589="` echo ~/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589 `" ) && sleep 0'"'"''
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> PUT /tmp/tmpcn7Mcq TO /root/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589/command.py
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/root/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r '[host-8-173-207.host.centralci.eng.rdu2.redhat.com]'
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/root/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r host-8-173-207.host.centralci.eng.rdu2.redhat.com '/bin/sh -c '"'"'chmod u+x /root/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589/ /root/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589/command.py && sleep 0'"'"''
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<host-8-173-207.host.centralci.eng.rdu2.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o 'IdentityFile="/root/libra.pem"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt host-8-173-207.host.centralci.eng.rdu2.redhat.com '/bin/sh -c '"'"'/usr/bin/python /root/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589/command.py; rm -rf "/root/.ansible/tmp/ansible-tmp-1488547884.75-30025641072589/" > /dev/null 2>&1 && sleep 0'"'"''
ok: [host-8-173-207.host.centralci.eng.rdu2.redhat.com] => {
    "changed": false,
    "cmd": [
        "/usr/local/bin/oc",
        "--config=/tmp/openshift-logging-ansible-VcmLKX/admin.kubeconfig",
        "apply",
        "-f",
        "/tmp/openshift-logging-ansible-VcmLKX/templates/logging-logging-es-5a36v5kl-dc.yaml",
        "-n",
        "logging"
    ],
    "delta": "0:00:00.167743",
    "end": "2017-03-03 13:31:24.028327",
    "failed": false,
    "failed_when_result": false,
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/oc --config=/tmp/openshift-logging-ansible-VcmLKX/admin.kubeconfig apply -f /tmp/openshift-logging-ansible-VcmLKX/templates/logging-logging-es-5a36v5kl-dc.yaml -n logging",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "warn": true
        },
        "module_name": "command"
    },
    "rc": 1,
    "start": "2017-03-03 13:31:23.860584",
    "warnings": []
}

STDERR:

The DeploymentConfig "logging-es-5a36v5kl" is invalid:

* spec.template.spec.volumes[2].hostPath: Forbidden: may not specify more than 1 volume type
* spec.template.spec.containers[0].volumeMounts[2].name: Not found: "elasticsearch-storage"

I'll log on and see how the DC was generated.
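Note that the task result above carries "rc": 1 while "failed" and "failed_when_result" are both false and the task is reported as ok, which may be why the playbook run looked successful. A minimal sketch for spotting such results in a saved log, assuming the result block has already been extracted as JSON; the helper name `silently_failed` is hypothetical, and only the fields shown in the log above are used.

```python
import json


def silently_failed(result):
    """True when the command exited non-zero but the task was not marked failed."""
    return result.get("rc", 0) != 0 and not result.get("failed", False)


# Trimmed copy of the fields from the task result in the log above.
raw = """
{
    "changed": false,
    "failed": false,
    "failed_when_result": false,
    "rc": 1
}
"""
result = json.loads(raw)

if silently_failed(result):
    print("oc apply returned rc=%d but the task was not marked failed" % result["rc"])
```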
So this looks to be due to the fact that the currently deployed ES DC uses a host mount, whereas the generated DC template for ES uses an emptyDir. So when the role tries to apply it, it clobbers the existing definition and then it seems that it's left without an 'elasticsearch-storage' volume definition.

Current DC snippet:

        volumeMounts:
        - mountPath: /etc/elasticsearch/secret
          name: elasticsearch
          readOnly: true
        - mountPath: /usr/share/elasticsearch/config
          name: elasticsearch-config
          readOnly: true
        - mountPath: /elasticsearch/persistent
          name: elasticsearch-storage
      volumes:
      - name: elasticsearch
        secret:
          defaultMode: 420
          secretName: logging-elasticsearch
      - configMap:
          defaultMode: 420
          name: logging-elasticsearch
        name: elasticsearch-config
      - hostPath:
          path: /usr/local/es-storage
        name: elasticsearch-storage

Template snippet:

        volumeMounts:
        - name: elasticsearch
          mountPath: /etc/elasticsearch/secret
          readOnly: true
        - name: elasticsearch-config
          mountPath: /usr/share/java/elasticsearch/config
          readOnly: true
        - name: elasticsearch-storage
          mountPath: /elasticsearch/persistent
      volumes:
      - name: elasticsearch
        secret:
          secretName: logging-elasticsearch
      - name: elasticsearch-config
        configMap:
          name: logging-elasticsearch
      - name: elasticsearch-storage
        emptyDir: {}
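To illustrate how the two snippets could combine into the "may not specify more than 1 volume type" error, here is a simplified model. It assumes the apply step merges list entries that share the same `name`, so keys from the template are added to the existing entry rather than replacing it. This is not the actual `oc apply` implementation, and `merge_by_name` and `volume_types` are hypothetical helpers.

```python
import copy

# Volume source keys relevant to this example (not the full Kubernetes list).
VOLUME_TYPES = {"hostPath", "emptyDir", "secret", "configMap", "persistentVolumeClaim"}


def merge_by_name(current, desired):
    """Merge two volume lists keyed on 'name'; desired keys are added to matching entries."""
    merged = {vol["name"]: copy.deepcopy(vol) for vol in current}
    for vol in desired:
        merged.setdefault(vol["name"], {}).update(copy.deepcopy(vol))
    return list(merged.values())


def volume_types(volume):
    """Return the sorted volume source keys present on a single volume."""
    return sorted(key for key in volume if key in VOLUME_TYPES)


current = [{"name": "elasticsearch-storage", "hostPath": {"path": "/usr/local/es-storage"}}]
desired = [{"name": "elasticsearch-storage", "emptyDir": {}}]

merged = merge_by_name(current, desired)
for vol in merged:
    types = volume_types(vol)
    if len(types) > 1:
        print("%s: more than 1 volume type: %s" % (vol["name"], types))
```

Under that assumption the merged 'elasticsearch-storage' entry carries both `hostPath` and `emptyDir`, which matches the first validation error; the second error about the missing volume name would then follow from the volume being rejected.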
Prefer hostMount to other storage if exists in facts: https://github.com/openshift/openshift-ansible/pull/3596
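Roughly, the idea is the following. This is only a sketch of the selection logic under stated assumptions, not the code from the PR: the function name `choose_es_storage`, the `pvc_name` parameter, and the shape of the existing-volume facts (a list of volume dicts as they appear in the DC) are all hypothetical.

```python
def choose_es_storage(existing_volumes, pvc_name=None):
    """Pick the volume source for 'elasticsearch-storage'.

    An existing hostPath mount wins over everything else, then a PVC if one
    is configured, and emptyDir is the fallback.
    """
    for vol in existing_volumes:
        if vol.get("name") == "elasticsearch-storage" and "hostPath" in vol:
            return {"hostPath": {"path": vol["hostPath"]["path"]}}
    if pvc_name:
        return {"persistentVolumeClaim": {"claimName": pvc_name}}
    return {"emptyDir": {}}


# Volumes as found on the currently deployed DC in this bug.
deployed = [
    {"name": "elasticsearch", "secret": {"secretName": "logging-elasticsearch"}},
    {"name": "elasticsearch-storage", "hostPath": {"path": "/usr/local/es-storage"}},
]

print(choose_es_storage(deployed))
print(choose_es_storage([]))
```

With the deployed DC from this bug the existing host path is kept, so the generated template no longer swaps it for an emptyDir.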
1.5 fix in https://github.com/openshift/openshift-ansible/pull/3608
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/5e952859247d28abe6d5efb794ff6a1f8639000d
bug 1428249. Use ES hostmount storage if it exists
blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1431935
Verified with openshift-ansible-3.5.35-1.git.0.7aa4728.el7.noarch. After the upgrade, I hit the exception described in https://bugzilla.redhat.com/show_bug.cgi?id=1428711, which is not considered to be a real support case. Set to verified since the original issue did not reproduce.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0903