Bug 1552744 - [starter-us-west-1] error during logging upgrade patch operation
Summary: [starter-us-west-1] error during logging upgrade patch operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-07 16:26 UTC by Justin Pierce
Modified: 2018-12-13 19:27 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Patch files for the logging configmaps were generated incorrectly.
Consequence: Applying the generated patch failed because lines it referenced were not present in the file being patched.
Fix: The handling of whitelisted lines was changed so that they are still kept out of the generated patch files, while the patch can still be applied afterwards.
Result: Logging configmaps are now patched correctly based on the differences from the current deployment.
Clone Of:
Environment:
Last Closed: 2018-12-13 19:26:59 UTC
Target Upstream Version:
Embargoed:
jupierce: needinfo-


Attachments
Patch files from control host (1.43 KB, application/x-gzip)
2018-03-07 16:26 UTC, Justin Pierce


Links
Red Hat Product Errata RHBA-2018:3748 (last updated 2018-12-13 19:27:10 UTC)

Description Justin Pierce 2018-03-07 16:26:49 UTC
Created attachment 1405454 [details]
Patch files from control host

Description of problem:
During a standard openshift-ansible upgrade of logging, an error was reported:

TASK [openshift_logging : command] *********************************************
Wednesday 07 March 2018  15:13:17 +0000 (0:00:00.389)       0:01:54.715 ******* 
fatal: [54.193.4.223 -> localhost]: FAILED! => {"changed": true, "cmd": ["patch", "--force", "--quiet", "-u", "/tmp/openshift-logging-ansible-ICBJx8/configmap_new_file", "/tmp/openshift-logging-ansible-ICBJx8/patch.patch"], "delta": "0:00:00.003628", "end": "2018-03-07 15:13:17.665273", "msg": "non-zero return code", "rc": 1, "start": "2018-03-07 15:13:17.661645", "stderr": "", "stderr_lines": [], "stdout": "1 out of 1 hunk FAILED -- saving rejects to file /tmp/openshift-logging-ansible-ICBJx8/configmap_new_file.rej", "stdout_lines": ["1 out of 1 hunk FAILED -- saving rejects to file /tmp/openshift-logging-ansible-ICBJx8/configmap_new_file.rej"]}
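
The failure mode can be reproduced outside the playbook: when the unified diff is generated against content that contains a line the target file never had, patch cannot place the hunk and writes a .rej file. A minimal sketch follows; file names and contents are illustrative only, not the actual configmap data.

# Hypothetical reproduction of the hunk failure (illustrative names and content,
# not the real configmap data): the generated patch removes a line that the
# target file never contained, so patch cannot place the hunk.
workdir=$(mktemp -d)
cd "$workdir"

# File the playbook tries to patch (as pulled from the live deployment).
cat > configmap_new_file <<'EOF'
index:
  unassigned.node_left.delayed_timeout: 2m
EOF

# Base the diff was generated from; it contains a line the live file never had.
cat > generated_base <<'EOF'
index:
  number_of_shards: 3
  unassigned.node_left.delayed_timeout: 2m
EOF

# Desired content.
cat > desired <<'EOF'
index:
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb
EOF

diff -u generated_base desired > patch.patch || true

# The "-  number_of_shards: 3" line in the hunk has no match in
# configmap_new_file, so the hunk is rejected and a .rej file is written,
# mirroring the task output above (rc=1).
patch --force --quiet -u configmap_new_file patch.patch
echo "rc=$?"
cat configmap_new_file.rej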



Version-Release number of selected component (if applicable):
v3.9.1


Additional info:
Attaching patch information from control host.

Comment 4 Anping Li 2018-03-09 09:38:03 UTC
The number_of_shards and number_of_replicas are not set in the configmap logging-elasticsearch. I think the expected values should be the same as the inventory variables.

1) Inventory variable:
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1


2) The resulting configmap:

# oc get configmap logging-elasticsearch -o yaml  |head -20
apiVersion: v1
data:
  elasticsearch.yml: |
    cluster:
      name: ${CLUSTER_NAME}

    script:
      inline: on
      indexed: on

    index:
      unassigned.node_left.delayed_timeout: 2m
      translog:
        flush_threshold_size: 256mb
        flush_threshold_period: 5m

    node:
      name: ${DC_NAME}
      master: ${IS_MASTER}
      data: ${HAS_DATA}


3) The configuration file inside the Elasticsearch pod:
oc exec logging-es-data-master-ew9eniev-1-g7h77 -- head -20 /usr/share/java/elasticsearch/config/elasticsearch.yml
Defaulting container name to elasticsearch.
Use 'oc describe pod/logging-es-data-master-ew9eniev-1-g7h77 -n logging' to see all of the containers in this pod.
cluster:
  name: ${CLUSTER_NAME}

script:
  inline: on
  indexed: on

index:
  unassigned.node_left.delayed_timeout: 2m
  translog:
    flush_threshold_size: 256mb
    flush_threshold_period: 5m

node:
  name: ${DC_NAME}
  master: ${IS_MASTER}
  data: ${HAS_DATA}
  max_local_storage_nodes: 1
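
A quick way to spot the missing settings (hypothetical check; it assumes the logging project and the object names shown above, and that grep is available in the Elasticsearch image):

# Hypothetical check: both the configmap and the file inside the pod should
# carry the values set in the inventory; no output means they were dropped.
oc get configmap logging-elasticsearch -n logging -o yaml \
  | grep -E 'number_of_(shards|replicas)'
oc exec logging-es-data-master-ew9eniev-1-g7h77 -n logging -c elasticsearch -- \
  grep -E 'number_of_(shards|replicas)' /usr/share/java/elasticsearch/config/elasticsearch.yml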

Comment 5 Jeff Cantrill 2018-03-09 14:05:36 UTC
@Eric,

Any thoughts about the fact that the existing deployment is 3.7 (early 3.7) and we are running the 3.9 playbooks to upgrade it? I have not looked at the details to understand whether there is any interplay here that may be causing this to fail. On the surface I believe it should work if we truly support N-1.

Comment 6 Justin Pierce 2018-03-09 14:13:20 UTC
@Jeff - since 3.8 will not be an official release, 3.9 playbooks must support upgrading from 3.7->3.9.

Comment 7 Jeff Cantrill 2018-03-09 15:40:39 UTC
@Anping,

Regarding #c4, are you past the error now, with the result being that the block is missing from the configmap? Does the upgrade still fail as in #c1?

Comment 8 Jeff Cantrill 2018-03-09 15:59:56 UTC
<anli> jcantril, Just no config values in configmap.

Comment 9 Jeff Cantrill 2018-03-09 16:15:36 UTC
Moving to 3.9.z, as the installer failure is resolved but there still seems to be an issue with the content of the configmap.

Comment 10 ewolinet 2018-03-09 17:56:59 UTC
I think this is actually still a bug and should not be pushed off to the next release. The issue is the regular expression I used to resolve this. I'll open a PR to fix this soon.
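
For illustration only (this is not the openshift-ansible implementation): whatever regular expression is used to whitelist lines, it has to be applied consistently to both inputs of the diff, so that the generated patch never references a whitelisted line that is missing from the file being patched. A rough sketch with hypothetical file names:

# Illustrative sketch only, not the actual openshift-ansible code.
# Hypothetical inputs: current.yml (from the running deployment) and
# desired.yml (rendered from the role defaults/templates).
WHITELIST='^[[:space:]]*number_of_(shards|replicas):'

# Strip whitelisted lines from BOTH sides before diffing, so the generated
# patch never references them.
grep -Ev "$WHITELIST" current.yml > current.filtered
grep -Ev "$WHITELIST" desired.yml > desired.filtered
diff -u current.filtered desired.filtered > patch.patch || true

# The patch contains no whitelisted lines, so it applies cleanly and the
# shard/replica values already present in the deployment stay untouched.
patch --force --quiet -u current.filtered patch.patch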

Comment 12 Jeff Cantrill 2018-03-09 19:44:38 UTC
3.9 cherry-pick https://github.com/openshift/openshift-ansible/pull/7479

Comment 14 Anping Li 2018-03-12 06:34:50 UTC
The number of shards and replicas can be overridden with Ansible variables, and the existing values are kept when no variables are specified. Moving to VERIFIED.

Test version: ose-ansible/images/v3.9.7-1
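
A possible spot check of the "values are kept" behaviour (hypothetical commands, assuming the logging project and configmap name used above):

# Hypothetical spot check: capture the configmap before and after an upgrade
# run with no openshift_logging_es_number_of_* variables set, then compare
# just the shard/replica lines.
oc get configmap logging-elasticsearch -n logging -o yaml > cm-before.yml
# ...run the logging upgrade playbook here...
oc get configmap logging-elasticsearch -n logging -o yaml > cm-after.yml
diff -u <(grep -E 'number_of_(shards|replicas)' cm-before.yml) \
        <(grep -E 'number_of_(shards|replicas)' cm-after.yml)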

Comment 17 errata-xmlrpc 2018-12-13 19:26:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748

