Created attachment 1405454 [details]
Patch files from control host

Description of problem:
During a standard openshift-ansible upgrade of logging, an error was reported:

TASK [openshift_logging : command] *********************************************
Wednesday 07 March 2018 15:13:17 +0000 (0:00:00.389) 0:01:54.715 *******
fatal: [54.193.4.223 -> localhost]: FAILED! => {"changed": true, "cmd": ["patch", "--force", "--quiet", "-u", "/tmp/openshift-logging-ansible-ICBJx8/configmap_new_file", "/tmp/openshift-logging-ansible-ICBJx8/patch.patch"], "delta": "0:00:00.003628", "end": "2018-03-07 15:13:17.665273", "msg": "non-zero return code", "rc": 1, "start": "2018-03-07 15:13:17.661645", "stderr": "", "stderr_lines": [], "stdout": "1 out of 1 hunk FAILED -- saving rejects to file /tmp/openshift-logging-ansible-ICBJx8/configmap_new_file.rej", "stdout_lines": ["1 out of 1 hunk FAILED -- saving rejects to file /tmp/openshift-logging-ansible-ICBJx8/configmap_new_file.rej"]}

Version-Release number of selected component (if applicable):
v3.9.1

Additional info:
Attaching patch information from control host.
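For anyone trying to reproduce the hunk failure outside the playbook, a minimal sketch (the file contents below are simplified stand-ins, not the actual generated configmap or patch): `patch` exits non-zero and saves a `.rej` file whenever the target no longer matches the context the diff was generated against, which is exactly the stdout seen in the task output.

```shell
# Stand-in reproduction of the failing task: apply a unified diff whose
# context no longer matches the target file. File contents here are
# hypothetical, simplified versions of the generated configmap files.
work=$(mktemp -d)

printf 'index:\n  translog:\n    flush_threshold_size: 256mb\n' \
  > "$work/configmap_new_file"

cat > "$work/patch.patch" <<'EOF'
--- a/configmap_new_file
+++ b/configmap_new_file
@@ -1,3 +1,4 @@
 index:
   number_of_shards: 1
   number_of_replicas: 1
+  unassigned.node_left.delayed_timeout: 2m
EOF

rc=0
patch --force --quiet -u "$work/configmap_new_file" "$work/patch.patch" || rc=$?
echo "rc=$rc"   # non-zero return code, as in the report
ls "$work"      # configmap_new_file.rej holds the rejected hunk
```

The rejected hunk lands next to the target as `configmap_new_file.rej`, matching the "saving rejects to file" message in the failure.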
https://github.com/openshift/openshift-ansible/pull/7423
The number_of_shards and number_of_replicas are not set in the configmap logging-elasticsearch. I think the expected values should be the same as the inventory variables.

1) Inventory variables:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1

2) The final config in the configmap:
# oc get configmap logging-elasticsearch -o yaml | head -20
apiVersion: v1
data:
  elasticsearch.yml: |
    cluster:
      name: ${CLUSTER_NAME}
    script:
      inline: on
      indexed: on
    index:
      unassigned.node_left.delayed_timeout: 2m
      translog:
        flush_threshold_size: 256mb
        flush_threshold_period: 5m
    node:
      name: ${DC_NAME}
      master: ${IS_MASTER}
      data: ${HAS_DATA}

3) The final config file in the pod:
# oc exec logging-es-data-master-ew9eniev-1-g7h77 -- head -20 /usr/share/java/elasticsearch/config/elasticsearch.yml
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-ew9eniev-1-g7h77 -n logging' to see all of the containers in this pod.
cluster:
  name: ${CLUSTER_NAME}
script:
  inline: on
  indexed: on
index:
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb
    flush_threshold_period: 5m
node:
  name: ${DC_NAME}
  master: ${IS_MASTER}
  data: ${HAS_DATA}
  max_local_storage_nodes: 1
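The check above can be scripted. A sketch that greps the rendered elasticsearch.yml for the two keys; the sample text below mirrors the broken configmap from this comment, and on a live cluster the text would instead come from `oc get configmap logging-elasticsearch -o yaml`:

```shell
# Sketch: flag shard/replica settings missing from the rendered
# elasticsearch.yml. The sample below mirrors the broken configmap in
# this report; on a cluster, feed in the real configmap contents.
rendered='cluster:
  name: ${CLUSTER_NAME}
index:
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb'

missing=""
for key in number_of_shards number_of_replicas; do
  printf '%s\n' "$rendered" | grep -q "^  $key:" || missing="$missing $key"
done
echo "missing:$missing"
# -> missing: number_of_shards number_of_replicas
```

After the fix, both keys should appear indented under the `index:` block with the values taken from the inventory variables.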
@Eric, thoughts on the fact that the existing deployment is 3.7 (early 3.7) and we are running the 3.9 playbooks to upgrade it? I have not looked at the details to understand whether there is any interplay here that may be causing this to fail. On the surface I believe it should work if we truly support N-1.
@Jeff - since 3.8 will not be an official release, 3.9 playbooks must support upgrading from 3.7->3.9.
@Anping, regarding #c4, are you past the error, with the result being that the block is missing from the configmap? Or does the upgrade still fail as in #c1?
<anli> jcantril, Just no config values in configmap.
Moving to 3.9.z, as the installer failure is resolved but there still seems to be an issue with the content of the configmap.
I think this is actually still a bug and should not be pushed off to the next release. The issue is the regular expression I used to resolve this. I'll have a PR opened to fix this soon.
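For illustration only (this is not the actual expression from the PR): a substitution whose pattern is keyed to text that is no longer present in the rendered file does nothing and reports nothing, which matches the symptom of values silently missing from the configmap rather than an error:

```shell
# Illustration of the failure mode only -- not the actual expression
# used in openshift-ansible. A sed substitution whose pattern does not
# match the rendered file is silently skipped: no error, no change.
conf='index:
  translog:
    flush_threshold_size: 256mb'

# Hypothetical edit expecting an existing number_of_shards line to
# rewrite; the line is absent, so the output equals the input.
updated=$(printf '%s\n' "$conf" | sed 's/^  number_of_shards: .*/  number_of_shards: 1/')
[ "$updated" = "$conf" ] && echo "substitution silently skipped"
```

A fix for this class of bug typically has to insert the key when it is missing, not only rewrite it when present.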
https://github.com/openshift/openshift-ansible/pull/7476
3.9 cherry-pick https://github.com/openshift/openshift-ansible/pull/7479
The number of shards and replicas can be overridden with Ansible variables, and the existing values are kept when no variables are specified. Moving to verified. Test version: ose-ansible/images/v3.9.7-1
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748