Bug 1439356 - Attempts to upgrade logging to 3.4.1.12-1 failing
Summary: Attempts to upgrade logging to 3.4.1.12-1 failing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.z
Assignee: Jeff Cantrill
QA Contact: Xia Zhao
URL:
Whiteboard:
Duplicates: 1439554 1440245 1446499
Depends On:
Blocks: 1440245
 
Reported: 2017-04-05 19:18 UTC by Brennan Vincello
Modified: 2020-07-16 09:23 UTC
CC: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The minimum-master change was not taken into account when performing an upgrade.
Consequence: The minimum-master config value referenced an environment variable that was never set.
Fix: Replace the MIN_MASTERS environment variable with the node quorum variable, which has been available since logging's inception.
Result: ES starts without throwing a missing-variable exception.
Clone Of:
Environment:
Last Closed: 2017-05-18 09:27:48 UTC
Target Upstream Version:


Attachments


Links
System: Red Hat Product Errata
ID: RHBA-2017:1235
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform 3.5, 3.4, 3.3, and 3.1 bug fix update
Last Updated: 2017-05-18 13:15:52 UTC

Description Brennan Vincello 2017-04-05 19:18:39 UTC
Description of problem:

Attempting to upgrade the logging stack from 3.4.1.7 to 3.4.1.12-1 fails: Elasticsearch does not start after the upgrade.

Version-Release number of selected component (if applicable): 3.4.1.12-1

How reproducible: Very

Steps to Reproduce:
1. oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=v3.4 -n openshift-logging 

Actual results:

ES is failing to start with:

Comparing the specificed RAM to the maximum recommended for ElasticSearch...
Inspecting the maximum RAM available...
ES_JAVA_OPTS: '-Dmapper.allow_dots_in_name=true -Xms128M -Xmx2048m'
Exception in thread "main" java.lang.IllegalArgumentException: Could not resolve placeholder 'MIN_MASTERS'
	at org.elasticsearch.common.property.PropertyPlaceholder.parseStringValue(PropertyPlaceholder.java:128)
	at org.elasticsearch.common.property.PropertyPlaceholder.replacePlaceholders(PropertyPlaceholder.java:81)
	at org.elasticsearch.common.settings.Settings$Builder.replacePropertyPlaceholders(Settings.java:1179)
	at org.elasticsearch.node.internal.InternalSettingsPreparer.initializeSettings(InternalSettingsPreparer.java:131)
	at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:100)
	at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:107)
	at org.elasticsearch.common.cli.CliTool.<init>(CliTool.java:100)
	at org.elasticsearch.bootstrap.BootstrapCLIParser.<init>(BootstrapCLIParser.java:48)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:242)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)

Expected results:

ES starts.

Additional info:
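For reference, a minimal way to observe the failure after kicking off the upgrade (the pod name below is illustrative; substitute the actual logging-es-* pod in your project):

$ oc get pods -n openshift-logging -w                  # watch the deployer and ES pods as the upgrade runs
$ oc logs -f <logging-es-pod> -n openshift-logging     # shows the MIN_MASTERS placeholder exception above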

Comment 1 Jeff Cantrill 2017-04-05 20:10:22 UTC
How are you installing the dependent template?  It should be the same as:

https://github.com/openshift/origin-aggregated-logging/blob/release-1.4/deployer/templates/es.yaml

Comment 2 Boris Kurktchiev 2017-04-05 20:22:58 UTC
The docs ask to import the following logging-deployer template:
root@osmaster0d:~/ocp-configs/ansible:
----> cat /usr/share/openshift/hosted/logging-deployer.yaml
apiVersion: "v1"
kind: "List"
items:
-
  apiVersion: "v1"
  kind: "Template"
  metadata:
    name: logging-deployer-account-template
    annotations:
      description: "Template for creating the deployer account and roles needed for the aggregated logging deployer. Create as cluster-admin."
      tags: "infrastructure"
  objects:
  -
    apiVersion: v1
    kind: ServiceAccount
    name: logging-deployer
    metadata:
      name: logging-deployer
      labels:
        logging-infra: deployer
        provider: openshift
        component: deployer
  -
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: aggregated-logging-kibana
  -
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: aggregated-logging-elasticsearch
  -
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: aggregated-logging-fluentd
  -
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: aggregated-logging-curator
  - apiVersion: v1
    kind: ClusterRole
    metadata:
      name: oauth-editor
    rules:
    - resources:
      - oauthclients
      verbs:
      - create
      - delete
  - apiVersion: v1
    kind: ClusterRole
    metadata:
      name: daemonset-admin
    rules:
    - resources:
      - daemonsets
      apiGroups:
      - extensions
      verbs:
      - create
      - get
      - list
      - watch
      - delete
      - update
  - apiVersion: v1
    kind: ClusterRole
    metadata:
      name: rolebinding-reader
    rules:
    - resources:
      - clusterrolebindings
      verbs:
      - get
  -
    apiVersion: v1
    kind: RoleBinding
    metadata:
      name: logging-deployer-edit-role
    roleRef:
      name: edit
    subjects:
    - kind: ServiceAccount
      name: logging-deployer
  -
    apiVersion: v1
    kind: RoleBinding
    metadata:
      name: logging-deployer-dsadmin-role
    roleRef:
      name: daemonset-admin
    subjects:
    - kind: ServiceAccount
      name: logging-deployer
  -
    apiVersion: v1
    kind: RoleBinding
    metadata:
      name: logging-elasticsearch-view-role
    roleRef:
      name: view
    subjects:
    - kind: ServiceAccount
      name: aggregated-logging-elasticsearch
-
  apiVersion: "v1"
  kind: "Template"
  metadata:
    name: logging-deployer-template
    annotations:
      description: "Template for running the aggregated logging deployer in a pod. Requires empowered 'logging-deployer' service account."
      tags: "infrastructure"
  labels:
    logging-infra: deployer
    provider: openshift
  objects:
  -
    apiVersion: v1
    kind: Pod
    metadata:
      generateName: logging-deployer-
    spec:
      containers:
      - image: ${IMAGE_PREFIX}logging-deployer:${IMAGE_VERSION}
        imagePullPolicy: Always
        name: deployer
        volumeMounts:
        - name: empty
          mountPath: /etc/deploy
        env:
          - name: PROJECT
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: IMAGE_PREFIX
            value: ${IMAGE_PREFIX}
          - name: IMAGE_VERSION
            value: ${IMAGE_VERSION}
          - name: IMAGE_PULL_SECRET
            value: ${IMAGE_PULL_SECRET}
          - name: INSECURE_REGISTRY
            value: ${INSECURE_REGISTRY}
          - name: ENABLE_OPS_CLUSTER
            value: ${ENABLE_OPS_CLUSTER}
          - name: KIBANA_HOSTNAME
            value: ${KIBANA_HOSTNAME}
          - name: KIBANA_OPS_HOSTNAME
            value: ${KIBANA_OPS_HOSTNAME}
          - name: PUBLIC_MASTER_URL
            value: ${PUBLIC_MASTER_URL}
          - name: MASTER_URL
            value: ${MASTER_URL}
          - name: ES_INSTANCE_RAM
            value: ${ES_INSTANCE_RAM}
          - name: ES_PVC_SIZE
            value: ${ES_PVC_SIZE}
          - name: ES_PVC_PREFIX
            value: ${ES_PVC_PREFIX}
          - name: ES_PVC_DYNAMIC
            value: ${ES_PVC_DYNAMIC}
          - name: ES_CLUSTER_SIZE
            value: ${ES_CLUSTER_SIZE}
          - name: ES_NODE_QUORUM
            value: ${ES_NODE_QUORUM}
          - name: ES_RECOVER_AFTER_NODES
            value: ${ES_RECOVER_AFTER_NODES}
          - name: ES_RECOVER_EXPECTED_NODES
            value: ${ES_RECOVER_EXPECTED_NODES}
          - name: ES_RECOVER_AFTER_TIME
            value: ${ES_RECOVER_AFTER_TIME}
          - name: ES_OPS_INSTANCE_RAM
            value: ${ES_OPS_INSTANCE_RAM}
          - name: ES_OPS_PVC_SIZE
            value: ${ES_OPS_PVC_SIZE}
          - name: ES_OPS_PVC_PREFIX
            value: ${ES_OPS_PVC_PREFIX}
          - name: ES_OPS_PVC_DYNAMIC
            value: ${ES_OPS_PVC_DYNAMIC}
          - name: ES_OPS_CLUSTER_SIZE
            value: ${ES_OPS_CLUSTER_SIZE}
          - name: ES_OPS_NODE_QUORUM
            value: ${ES_OPS_NODE_QUORUM}
          - name: ES_OPS_RECOVER_AFTER_NODES
            value: ${ES_OPS_RECOVER_AFTER_NODES}
          - name: ES_OPS_RECOVER_EXPECTED_NODES
            value: ${ES_OPS_RECOVER_EXPECTED_NODES}
          - name: ES_OPS_RECOVER_AFTER_TIME
            value: ${ES_OPS_RECOVER_AFTER_TIME}
          - name: FLUENTD_NODESELECTOR
            value: ${FLUENTD_NODESELECTOR}
          - name: ES_NODESELECTOR
            value: ${ES_NODESELECTOR}
          - name: ES_OPS_NODESELECTOR
            value: ${ES_OPS_NODESELECTOR}
          - name: KIBANA_NODESELECTOR
            value: ${KIBANA_NODESELECTOR}
          - name: KIBANA_OPS_NODESELECTOR
            value: ${KIBANA_OPS_NODESELECTOR}
          - name: CURATOR_NODESELECTOR
            value: ${CURATOR_NODESELECTOR}
          - name: CURATOR_OPS_NODESELECTOR
            value: ${CURATOR_OPS_NODESELECTOR}
          - name: MODE
            value: ${MODE}
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      serviceAccount: logging-deployer
      volumes:
      - name: empty
        emptyDir: {}
  parameters:
  -
    description: "The mode that the deployer runs in."
    name: MODE
    value: "install"
  -
    description: 'Specify prefix for logging components; e.g. for "registry.access.redhat.com/openshift3/logging-deployer:3.4.0", set prefix "registry.access.redhat.com/openshift3/"'
    name: IMAGE_PREFIX
    value: "registry.access.redhat.com/openshift3/"
  -
    description: 'Specify version for logging components; e.g. for "registry.access.redhat.com/openshift3/logging-deployer:3.4.0", set version "3.4.0"'
    name: IMAGE_VERSION
    value: "3.4.0"
  -
    description: "(Deprecated) Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry."
    name: IMAGE_PULL_SECRET
  -
    description: "(Deprecated) Allow the registry for logging component images to be non-secure (not secured with a certificate signed by a known CA)"
    name: INSECURE_REGISTRY
    value: "false"
  -
    description: "(Deprecated) If true, set up to use a second ES cluster for ops logs."
    name: ENABLE_OPS_CLUSTER
    value: "false"
  -
    description: "(Deprecated) External hostname where clients will reach kibana"
    name: KIBANA_HOSTNAME
    value: "kibana.example.com"
  -
    description: "(Deprecated) External hostname at which admins will visit the ops Kibana."
    name: KIBANA_OPS_HOSTNAME
    value: kibana-ops.example.com
  -
    description: "(Deprecated) External URL for the master, for OAuth purposes"
    name: PUBLIC_MASTER_URL
    value: "https://localhost:8443"
  -
    description: "(Deprecated) Internal URL for the master, for authentication retrieval"
    name: MASTER_URL
    value: "https://kubernetes.default.svc.cluster.local"
  -
    description: "(Deprecated) How many instances of ElasticSearch to deploy."
    name: ES_CLUSTER_SIZE
    value: "1"
  -
    description: "(Deprecated) Amount of RAM to reserve per ElasticSearch instance."
    name: ES_INSTANCE_RAM
    value: "8G"
  -
    description: "(Deprecated) Size of the PersistentVolumeClaim to create per ElasticSearch instance, e.g. 100G. If empty, no PVCs will be created and emptyDir volumes are used instead."
    name: ES_PVC_SIZE
  -
    description: "(Deprecated) Prefix for the names of PersistentVolumeClaims to be created; a number will be appended per instance. If they don't already exist, they will be created with size ES_PVC_SIZE."
    name: ES_PVC_PREFIX
    value: "logging-es-"
  -
    description: '(Deprecated) Set to "true" to request dynamic provisioning (if enabled for your cluster) of a PersistentVolume for the ES PVC. '
    name: ES_PVC_DYNAMIC
  -
    description: "(Deprecated) Number of nodes required to elect a master (ES minimum_master_nodes). By default, derived from ES_CLUSTER_SIZE / 2 + 1."
    name: ES_NODE_QUORUM
  -
    description: "(Deprecated) Number of nodes required to be present before the cluster will recover from a full restart. By default, one fewer than ES_CLUSTER_SIZE."
    name: ES_RECOVER_AFTER_NODES
  -
    description: "(Deprecated) Number of nodes desired to be present before the cluster will recover from a full restart. By default, ES_CLUSTER_SIZE."
    name: ES_RECOVER_EXPECTED_NODES
  -
    description: "(Deprecated) Timeout for *expected* nodes to be present when cluster is recovering from a full restart."
    name: ES_RECOVER_AFTER_TIME
    value: "5m"
  -
    description: "(Deprecated) How many ops instances of ElasticSearch to deploy. By default, ES_CLUSTER_SIZE."
    name: ES_OPS_CLUSTER_SIZE
  -
    description: "(Deprecated) Amount of RAM to reserve per ops ElasticSearch instance."
    name: ES_OPS_INSTANCE_RAM
    value: "8G"
  -
    description: "(Deprecated) Size of the PersistentVolumeClaim to create per ElasticSearch ops instance, e.g. 100G. If empty, no PVCs will be created and emptyDir volumes are used instead."
    name: ES_OPS_PVC_SIZE
  -
    description: "(Deprecated) Prefix for the names of PersistentVolumeClaims to be created; a number will be appended per instance. If they don't already exist, they will be created with size ES_OPS_PVC_SIZE."
    name: ES_OPS_PVC_PREFIX
    value: "logging-es-ops-"
  -
    description: '(Deprecated) Set to "true" to request dynamic provisioning (if enabled for your cluster) of a PersistentVolume for the ES ops PVC. '
    name: ES_OPS_PVC_DYNAMIC
  -
    description: "(Deprecated) Number of ops nodes required to elect a master (ES minimum_master_nodes). By default, derived from ES_CLUSTER_SIZE / 2 + 1."
    name: ES_OPS_NODE_QUORUM
  -
    description: "(Deprecated) Number of ops nodes required to be present before the cluster will recover from a full restart. By default, one fewer than ES_OPS_CLUSTER_SIZE."
    name: ES_OPS_RECOVER_AFTER_NODES
  -
    description: "(Deprecated) Number of ops nodes desired to be present before the cluster will recover from a full restart. By default, ES_OPS_CLUSTER_SIZE."
    name: ES_OPS_RECOVER_EXPECTED_NODES
  -
    description: "(Deprecated) Timeout for *expected* ops nodes to be present when cluster is recovering from a full restart."
    name: ES_OPS_RECOVER_AFTER_TIME
    value: "5m"
  -
    description: "(Deprecated) The nodeSelector used for the Fluentd DaemonSet."
    name: FLUENTD_NODESELECTOR
    value: "logging-infra-fluentd=true"
  -
    description: "(Deprecated) Node selector Elasticsearch cluster (label=value)."
    name: ES_NODESELECTOR
    value: ""
  -
    description: "(Deprecated) Node selector Elasticsearch operations cluster (label=value)."
    name: ES_OPS_NODESELECTOR
    value: ""
  -
    description: "(Deprecated) Node selector Kibana cluster (label=value)."
    name: KIBANA_NODESELECTOR
    value: ""
  -
    description: "(Deprecated) Node selector Kibana operations cluster (label=value)."
    name: KIBANA_OPS_NODESELECTOR
    value: ""
  -
    description: "(Deprecated) Node selector Curator (label=value)."
    name: CURATOR_NODESELECTOR
    value: ""
  -
    description: "(Deprecated) Node selector operations Curator (label=value)."
    name: CURATOR_OPS_NODESELECTOR
    value: ""

Post-import, I just execute the command as stated by Brennan (oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=v3.4 -n openshift-logging), because the template that gets installed on the system points to 3.4.0 by default and therefore does not pull the latest image releases, judging by the tags available on registry.access.redhat.com.
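A sketch of that sequence, using the template file and command quoted above (standard oc commands; adjust the project name if yours differs):

$ oc create -f /usr/share/openshift/hosted/logging-deployer.yaml -n openshift-logging
$ oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=v3.4 -n openshift-logging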

Comment 4 Jeff Cantrill 2017-04-06 19:25:37 UTC

*** This bug has been marked as a duplicate of bug 1439554 ***

Comment 5 Jeff Cantrill 2017-04-07 19:27:21 UTC
Reopening this bug over https://bugzilla.redhat.com/show_bug.cgi?id=1439554 because it more correctly describes the issue.

Comment 6 Boris Kurktchiev 2017-04-07 19:44:21 UTC
OK so to update, this definitely happens when using MODE=upgrade. The workaround during upgrade is to edit the ES DC and add the ENV variable MIN_MASTERS=1 (in this test case I only have 1 ES node). 

Then the deployer finishes deploying as it should and everything works seemingly fine. 

If you do not add the MIN_MASTERS env variable, the deployer and upgrade script eventually time out.

To reiterate, this is how I am upgrading: oc new-app logging-deployer-template -p MODE=upgrade -p IMAGE_VERSION=v3.4 -n openshift-logging
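A minimal sketch of the workaround described above, assuming a single ES node; the deployment config name is illustrative (use the actual logging-es-* dc in your project):

$ oc set env dc/<logging-es-dc> MIN_MASTERS=1 -n openshift-logging
$ oc rollout latest dc/<logging-es-dc> -n openshift-logging    # redeploy so ES picks up the variable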

Comment 9 Jeff Cantrill 2017-04-17 20:48:32 UTC
The manual workaround is to (see the command sketch after these steps):

1. oc edit configmap logging-elasticsearch
2. Replace the value of zen.minimum_master_nodes with: ${NODE_QUORUM}
3. foreach ES deployment config: oc rollout latest $ES_DC
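A sketch of these steps, assuming the components live in the openshift-logging project and the ES deployment configs carry the component=es label (the label selector is an assumption; adjust to match your dcs):

$ oc edit configmap logging-elasticsearch -n openshift-logging
    (in elasticsearch.yml, under discovery, change
     zen.minimum_master_nodes: ${MIN_MASTERS}
     to
     zen.minimum_master_nodes: ${NODE_QUORUM})
$ for dc in $(oc get dc -l component=es -o name -n openshift-logging); do oc rollout latest "$dc" -n openshift-logging; done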

Comment 10 Jeff Cantrill 2017-04-17 20:59:21 UTC
*** Bug 1439554 has been marked as a duplicate of this bug. ***

Comment 11 Jeff Cantrill 2017-04-18 15:50:47 UTC
*** Bug 1440245 has been marked as a duplicate of this bug. ***

Comment 12 Jeff Cantrill 2017-04-26 20:22:33 UTC
Upstream fix: https://github.com/openshift/origin-aggregated-logging/pull/368
Working on merging into the enterprise branch.

Comment 14 Xia Zhao 2017-04-28 09:00:32 UTC
Upgraded from 3.3.1 to 3.4.1; the upgrade deployer pod completed successfully. Checked the ES dc after the upgrade; it contains:

$ oc get dc logging-es-28vo1wb4 -o yaml
        ...
        - name: NODE_QUORUM
          value: "1"

$ oc get configmap logging-elasticsearch -o yaml
    ...

    discovery:
      ...
      zen.minimum_master_nodes: ${NODE_QUORUM}

    gateway:
      expected_master_nodes: ${NODE_QUORUM}
      ...
    
    ...

However, encountered this regression; it is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1446504

Comment 15 Jeff Cantrill 2017-05-11 17:33:53 UTC
*** Bug 1446499 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2017-05-18 09:27:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1235

