Created attachment 1405454 [details]
Patch files from control host

Description of problem:
During a standard openshift-ansible upgrade of logging, an error was reported:

TASK [openshift_logging : command] *********************************************
Wednesday 07 March 2018 15:13:17 +0000 (0:00:00.389) 0:01:54.715 *******
fatal: [54.193.4.223 -> localhost]: FAILED! => {"changed": true, "cmd": ["patch", "--force", "--quiet", "-u", "/tmp/openshift-logging-ansible-ICBJx8/configmap_new_file", "/tmp/openshift-logging-ansible-ICBJx8/patch.patch"], "delta": "0:00:00.003628", "end": "2018-03-07 15:13:17.665273", "msg": "non-zero return code", "rc": 1, "start": "2018-03-07 15:13:17.661645", "stderr": "", "stderr_lines": [], "stdout": "1 out of 1 hunk FAILED -- saving rejects to file /tmp/openshift-logging-ansible-ICBJx8/configmap_new_file.rej", "stdout_lines": ["1 out of 1 hunk FAILED -- saving rejects to file /tmp/openshift-logging-ansible-ICBJx8/configmap_new_file.rej"]}

Version-Release number of selected component (if applicable):
v3.9.1

Additional info:
Attaching patch information from control host.
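For anyone trying to reproduce the hunk failure outside the playbook, a minimal sketch (the file contents below are simplified stand-ins, not the actual generated configmap or patch): `patch` exits non-zero and saves a `.rej` file whenever the target no longer matches the context the diff was generated against, which is exactly the stdout seen in the task output.

```shell
# Stand-in reproduction of the failing task: apply a unified diff whose
# context no longer matches the target file. File contents here are
# hypothetical, simplified versions of the generated configmap files.
work=$(mktemp -d)

printf 'index:\n  translog:\n    flush_threshold_size: 256mb\n' \
  > "$work/configmap_new_file"

cat > "$work/patch.patch" <<'EOF'
--- a/configmap_new_file
+++ b/configmap_new_file
@@ -1,3 +1,4 @@
 index:
   number_of_shards: 1
   number_of_replicas: 1
+  unassigned.node_left.delayed_timeout: 2m
EOF

rc=0
patch --force --quiet -u "$work/configmap_new_file" "$work/patch.patch" || rc=$?
echo "rc=$rc"   # non-zero return code, as in the report
ls "$work"      # configmap_new_file.rej holds the rejected hunk
```

The rejected hunk lands next to the target as `configmap_new_file.rej`, matching the "saving rejects to file" message in the failure.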
https://github.com/openshift/openshift-ansible/pull/7423
The number_of_shards and number_of_replicas are not set in the configmap logging-elasticsearch. I think the expected values should be the same as the inventory variables.

1) Inventory variables:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1

2) The final config in the configmap:
# oc get configmap logging-elasticsearch -o yaml | head -20
apiVersion: v1
data:
  elasticsearch.yml: |
    cluster:
      name: ${CLUSTER_NAME}
    script:
      inline: on
      indexed: on
    index:
      unassigned.node_left.delayed_timeout: 2m
      translog:
        flush_threshold_size: 256mb
        flush_threshold_period: 5m
    node:
      name: ${DC_NAME}
      master: ${IS_MASTER}
      data: ${HAS_DATA}

3) The final config file in the pod:
# oc exec logging-es-data-master-ew9eniev-1-g7h77 -- head -20 /usr/share/java/elasticsearch/config/elasticsearch.yml
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-ew9eniev-1-g7h77 -n logging' to see all of the containers in this pod.
cluster:
  name: ${CLUSTER_NAME}
script:
  inline: on
  indexed: on
index:
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb
    flush_threshold_period: 5m
node:
  name: ${DC_NAME}
  master: ${IS_MASTER}
  data: ${HAS_DATA}
  max_local_storage_nodes: 1
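The check above can be scripted. A sketch that greps the rendered elasticsearch.yml for the two keys; the sample text below mirrors the broken configmap from this comment, and on a live cluster the text would instead come from `oc get configmap logging-elasticsearch -o yaml`:

```shell
# Sketch: flag shard/replica settings missing from the rendered
# elasticsearch.yml. The sample below mirrors the broken configmap in
# this report; on a cluster, feed in the real configmap contents.
rendered='cluster:
  name: ${CLUSTER_NAME}
index:
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb'

missing=""
for key in number_of_shards number_of_replicas; do
  printf '%s\n' "$rendered" | grep -q "^  $key:" || missing="$missing $key"
done
echo "missing:$missing"
# -> missing: number_of_shards number_of_replicas
```

After the fix, both keys should appear indented under the `index:` block with the values taken from the inventory variables.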
@Eric, thoughts on the fact that the existing deployment is 3.7 (early 3.7) and we are running the 3.9 playbooks to upgrade it? I have not looked at the details to understand whether there is any interplay here that may be causing this to fail. On the surface I believe it should work if we truly support N-1.
@Jeff - since 3.8 will not be an official release, 3.9 playbooks must support upgrading from 3.7->3.9.
@Anping, regarding #c4, are you past the error, with the result being that the block is missing from the configmap? Or does the upgrade still fail as in #c1?
<anli> jcantril, Just no config values in configmap.
Moving to 3.9.z, as the installer failure is resolved but there still seems to be an issue with the content of the configmap.
I think this is actually still a bug and should not be pushed off to the next release. The issue is the regular expression I used to resolve this. I'll have a PR opened to fix this soon.
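For illustration only (this is not the actual expression from the PR): a substitution whose pattern is keyed to text that is no longer present in the rendered file does nothing and reports nothing, which matches the symptom of values silently missing from the configmap rather than an error:

```shell
# Illustration of the failure mode only -- not the actual expression
# used in openshift-ansible. A sed substitution whose pattern does not
# match the rendered file is silently skipped: no error, no change.
conf='index:
  translog:
    flush_threshold_size: 256mb'

# Hypothetical edit expecting an existing number_of_shards line to
# rewrite; the line is absent, so the output equals the input.
updated=$(printf '%s\n' "$conf" | sed 's/^  number_of_shards: .*/  number_of_shards: 1/')
[ "$updated" = "$conf" ] && echo "substitution silently skipped"
```

A fix for this class of bug typically has to insert the key when it is missing, not only rewrite it when present.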
https://github.com/openshift/openshift-ansible/pull/7476
3.9 cherry-pick https://github.com/openshift/openshift-ansible/pull/7479
The number of shards and replicas can be overridden with Ansible variables, and the existing values are kept when no variables are specified. Moving to verified. Test version: ose-ansible/images/v3.9.7-1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748