Created attachment 1320886 [details]
ansible running log

Description of problem:
Deploy logging 3.7 via ansible with openshift_logging_image_version=v3.7 set, but it failed with "Invalid version specified for Elasticsearch"; maybe "es_version": "3_7" is the root cause. This issue blocks the whole installation.

TASK [openshift_logging_elasticsearch : set_fact] *****************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/determine_version.yaml:14
ok: [qe-juzhao-37-gce-master-container-etcd-nfs-1.0831-gwf.qe.rhcloud.com] => {
    "ansible_facts": {
        "es_version": "3_7"
    },
    "changed": false
}

TASK [openshift_logging_elasticsearch : fail] *********************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/determine_version.yaml:17
fatal: [qe-juzhao-37-gce-master-container-etcd-nfs-1.0831-gwf.qe.rhcloud.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Invalid version specified for Elasticsearch

        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.retry

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.7 via ansible; the inventory file is in the "Additional info" part
2.
3.

Actual results:
Deployment failed.

Expected results:
Deployment should be successful.

Additional info:
Inventory file:

[OSEv3:children]
masters

[masters]
${MASTER_URL} openshift_public_hostname=${MASTER_URL}

[OSEv3:vars]
ansible_ssh_user=root
ansible_ssh_private_key_file="~/libra.pem"
deployment_type=openshift-enterprise

# Logging
openshift_logging_install_logging=true
openshift_logging_kibana_hostname=kibana.${SUB_DOMAIN}
public_master_url=https://${MASTER_URL}:8443
openshift_logging_image_prefix=${IMAGE_PREFIX}
openshift_logging_image_version=v3.7
openshift_logging_namespace=logging
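For reference, the deployment is driven by the byo openshift-logging playbook; a typical invocation against this inventory would look roughly like the sketch below (the inventory path is a placeholder; the playbook path is inferred from the .retry file mentioned in the failure message above):

# ansible-playbook -i <inventory_file> /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml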
Version:
# rpm -qa | grep openshift-ansible
openshift-ansible-lookup-plugins-3.7.0-0.123.0.git.0.248cba6.el7.noarch
openshift-ansible-3.7.0-0.123.0.git.0.248cba6.el7.noarch
openshift-ansible-callback-plugins-3.7.0-0.123.0.git.0.248cba6.el7.noarch
openshift-ansible-filter-plugins-3.7.0-0.123.0.git.0.248cba6.el7.noarch
openshift-ansible-playbooks-3.7.0-0.123.0.git.0.248cba6.el7.noarch
openshift-ansible-docs-3.7.0-0.123.0.git.0.248cba6.el7.noarch
openshift-ansible-roles-3.7.0-0.123.0.git.0.248cba6.el7.noarch

This issue blocks the whole installation.
The list of allowed versions specified only 3.5 and 3.6, but given that master is going to become 3.7, I updated the allowed versions. Perhaps in future releases, when we branch off master and start working on a new version, we can have a "branch out" script that automates these chores.

https://github.com/openshift/openshift-ansible/pull/5297
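For context, the guard that raises this failure is the fail task in determine_version.yaml. A minimal sketch of that kind of check is shown below; the fact derivation and the allowed-version list here are illustrative assumptions, not the exact upstream tasks:

# Sketch: derive "3_7" from openshift_logging_image_version=v3.7 and
# fail when it is not in the allowed list (values are illustrative only).
- set_fact:
    es_version: "{{ openshift_logging_image_version | regex_replace('^v', '') | replace('.', '_') }}"

- fail:
    msg: Invalid version specified for Elasticsearch
  when: es_version not in ['3_5', '3_6', '3_7']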
It seems this was already resolved by Jeff; I missed his comment.
The bug fix is not in openshift-ansible-playbooks-3.7.0-0.125.0.git.0.91043b6.el7.noarch.
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/d2bf958251e4092ba90218cd3cc20621483b3057
bug 1487573. Bump the allowed ES versions
Fixed in openshift-ansible-3.7.0-0.125.1. The latest available openshift-ansible version is openshift-ansible-3.7.0-0.125.0; we will verify this defect once we get the fixed openshift-ansible packages.
Tested with openshift-ansible-3.7.0-0.126.0 and logging-elasticsearch:v3.7.0-0.125.0.0.

It no longer fails at "Invalid version specified for Elasticsearch", but the ES pod fails to start up due to a java.lang.IllegalArgumentException:

# oc logs logging-es-data-master-hm7cbr4f-1-tp76j
[2017-09-12 06:53:54,431][INFO ][container.run ] Begin Elasticsearch startup script
[2017-09-12 06:53:54,546][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2017-09-12 06:53:54,548][INFO ][container.run ] Inspecting the maximum RAM available...
[2017-09-12 06:53:54,595][INFO ][container.run ] ES_HEAP_SIZE: '512m'
[2017-09-12 06:53:54,597][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof
[2017-09-12 06:53:54,602][INFO ][container.run ] Checking if Elasticsearch is ready on https://localhost:9200
Exception in thread "main" java.lang.IllegalArgumentException: Unknown Discovery type [kubernetes]
        at org.elasticsearch.discovery.DiscoveryModule.configure(DiscoveryModule.java:100)
        at <<<guice>>>
        at org.elasticsearch.node.Node.<init>(Node.java:213)
        at org.elasticsearch.node.Node.<init>(Node.java:140)
        at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:143)
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194)
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)
Refer to the log for complete error details.
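The "Unknown Discovery type [kubernetes]" error points at the discovery section of the deployed Elasticsearch config. A quick way to inspect what the pod was actually given is sketched below, assuming the default configmap name and the logging namespace from the inventory above:

# oc get configmap logging-elasticsearch -n logging -o yaml | grep -A 4 'cloud:'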
Created attachment 1324725 [details]
es pod log
3.7 should include the new discovery mechanism for ES pods; it was merged into master on Thursday and Friday.

brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch   latest   9e932136598b   23 hours ago   434.4 MB
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-elasticsearch   v3.7     9e29edfea4b3   6 days ago     434.4 MB

From what I see in the brew registry, the 'latest' image already has it and the 'v3.7' image does not, but the ansible 'master' branch requires it. We should update the elasticsearch:v3.7 image.
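To confirm which image tag an already-deployed ES pod is actually running, something like the following should work (a sketch; the pod name is the one from the log in Comment 7 and the namespace is from the inventory):

# oc get pod logging-es-data-master-hm7cbr4f-1-tp76j -n logging -o jsonpath='{.spec.containers[0].image}'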
Moving back to ON_QA since the 'new' issue in comment #7 is separate from the originally reported one. Please use older 3.7 images to validate that the ansible fix resolves the problem.
Jeff, on the contrary, using an older 3.7 image doesn't help because it is already too old. We either need a new 3.7 image matching the 3.7 openshift-ansible, or QE needs to temporarily set this in the openshift-ansible template roles/openshift_logging_elasticsearch/templates/elasticsearch.yml.j2:

cloud:
  kubernetes:
    service: ${SERVICE_DNS}
Closing this defect as VERIFIED, since the original issue was fixed (see Comment 7). The new issue in Comment 7 is currently blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1491171; we will open a new defect if the error still exists once BZ #1491171 is fixed.
This is most likely a result of the images being updated from under the deployment without re-running ansible. This can occur if the image pull policy is set to always pull and the inventory does not explicitly set logging image versions.

If the logging-elasticsearch configmap does not have this section:

cloud:
  kubernetes:
    pod_label: ${POD_LABEL}
    pod_port: 9300
    namespace: ${NAMESPACE}

then most likely the problem is that they have old configs but a new readiness probe. They should be able to work around it by updating the configmap to:

cloud:
  kubernetes:
    service: ${SERVICE_DNS}
    namespace: ${NAMESPACE}
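A minimal sketch of applying that workaround, assuming the logging namespace from the inventory and the ES deploymentconfig name implied by the pod in Comment 7 (edit the configmap, then redeploy so the pod picks up the new config):

# oc edit configmap/logging-elasticsearch -n logging
# oc rollout latest dc/logging-es-data-master-hm7cbr4f -n logging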
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188