While upgrading OCP from 3.6 to 3.7.23 using openshift-ansible 3.7.37, I encountered this error:

TASK [openshift_logging_elasticsearch : set_fact] ******************************
Thursday 15 March 2018 19:04:57 +0000 (0:00:00.621) 0:03:55.458 ********
fatal: [204.236.205.127]: FAILED! => {"msg": "The conditional check 'item.status.containerStatuses[1].ready == true' failed. The error was: error while evaluating conditional (item.status.containerStatuses[1].ready == true): list object has no element 1\n\nThe error appears to have been in '/home/opsmedic/aos-cd/git/openshift-tools/openshift/installer/vendored/openshift-ansible-3.7.37/roles/openshift_logging_elasticsearch/tasks/get_es_version.yml': line 9, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- set_fact:\n ^ here\n"}

PLAY RECAP *********************************************************************
18.232.94.26               : ok=47   changed=5    unreachable=0    failed=0
204.236.205.127            : ok=105  changed=8    unreachable=0    failed=1
34.207.102.240             : ok=47   changed=5    unreachable=0    failed=0
34.227.95.221              : ok=47   changed=5    unreachable=0    failed=0
34.228.247.65              : ok=47   changed=5    unreachable=0    failed=0
35.173.181.73              : ok=47   changed=5    unreachable=0    failed=0
35.174.0.226               : ok=48   changed=6    unreachable=0    failed=0
52.71.96.22                : ok=47   changed=5    unreachable=0    failed=0
52.91.74.69                : ok=48   changed=6    unreachable=0    failed=0
54.152.199.184             : ok=47   changed=5    unreachable=0    failed=0
localhost                  : ok=11   changed=0    unreachable=0    failed=0

INSTALLER STATUS ***************************************************************
Initialization             : Complete
Logging Install            : In Progress

Deployment Config
-----------------
apiVersion: v1
kind: DeploymentConfig
metadata:
  creationTimestamp: 2018-03-14T18:02:25Z
  generation: 2
  labels:
    component: es
    deployment: logging-es-data-master-1stfd5d8
    logging-infra: elasticsearch
    provider: openshift
  name: logging-es-data-master-1stfd5d8
  namespace: logging
  resourceVersion: "57277"
  selfLink: /oapi/v1/namespaces/logging/deploymentconfigs/logging-es-data-master-1stfd5d8
  uid: d9f830ac-27b1-11e8-9294-0ede7012d9c6
spec:
  replicas: 1
  selector:
    component: es
    deployment: logging-es-data-master-1stfd5d8
    logging-infra: elasticsearch
    provider: openshift
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        component: es
        deployment: logging-es-data-master-1stfd5d8
        logging-infra: elasticsearch
        provider: openshift
      name: logging-es-data-master-1stfd5d8
    spec:
      containers:
      - env:
        - name: DC_NAME
          value: logging-es-data-master-1stfd5d8
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: KUBERNETES_TRUST_CERT
          value: "true"
        - name: SERVICE_DNS
          value: logging-es-cluster
        - name: CLUSTER_NAME
          value: logging-es
        - name: INSTANCE_RAM
          value: 12Gi
        - name: HEAP_DUMP_LOCATION
          value: /elasticsearch/persistent/heapdump.hprof
        - name: NODE_QUORUM
          value: "2"
        - name: RECOVER_EXPECTED_NODES
          value: "3"
        - name: RECOVER_AFTER_TIME
          value: 5m
        - name: READINESS_PROBE_TIMEOUT
          value: "30"
        - name: POD_LABEL
          value: component=es
        - name: IS_MASTER
          value: "true"
        - name: HAS_DATA
          value: "true"
        image: registry.reg-aws.openshift.com:443/openshift3/logging-elasticsearch:v3.6
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9200
          name: restapi
          protocol: TCP
        - containerPort: 9300
          name: cluster
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /usr/share/java/elasticsearch/probe/readiness.sh
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            memory: 12Gi
          requests:
            cpu: 375m
            memory: 12Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/elasticsearch/secret
          name: elasticsearch
          readOnly: true
        - mountPath: /usr/share/java/elasticsearch/config
          name: elasticsearch-config
          readOnly: true
        - mountPath: /elasticsearch/persistent
          name: elasticsearch-storage
      dnsPolicy: ClusterFirst
      nodeSelector:
        type: infra
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        supplementalGroups:
        - 65534
      serviceAccount: aggregated-logging-elasticsearch
      serviceAccountName: aggregated-logging-elasticsearch
      terminationGracePeriodSeconds: 30
      volumes:
      - name: elasticsearch
        secret:
          defaultMode: 420
          secretName: logging-elasticsearch
      - configMap:
          defaultMode: 420
          name: logging-elasticsearch
        name: elasticsearch-config
      - name: elasticsearch-storage
        persistentVolumeClaim:
          claimName: logging-es-0
  test: false
  triggers:
  - type: ConfigChange
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2018-03-14T18:03:54Z
    lastUpdateTime: 2018-03-14T18:03:54Z
    message: replication controller "logging-es-data-master-1stfd5d8-1" successfully rolled out
    reason: NewReplicationControllerAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: 2018-03-14T22:36:42Z
    lastUpdateTime: 2018-03-14T22:36:42Z
    message: Deployment config has minimum availability.
    status: "True"
    type: Available
  details:
    causes:
    - type: ConfigChange
    message: config change
  latestVersion: 1
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  unavailableReplicas: 0
  updatedReplicas: 1
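Note that the 3.6 deployment config above runs only a single "elasticsearch" container, so item.status.containerStatuses has no element 1; the check appears to assume a two-container pod layout. A rough sketch of a more defensive conditional (placeholder task and variable names such as _es_pod_ready and _es_pods, not whatever the role actually uses) would be something like:

# Sketch only, placeholder names -- not the role's real task. Guard the index
# so a second container is only dereferenced when it actually exists:
- set_fact:
    _es_pod_ready: true
  when:
  - item.status.containerStatuses | length > 1
  - item.status.containerStatuses[1].ready | bool
  with_items: "{{ _es_pods.results['items'] }}"

# ...or make the check independent of how many containers the pod runs, by
# requiring every containerStatuses entry to report ready:
- set_fact:
    _es_pod_ready: true
  when: >-
    (item.status.containerStatuses | selectattr('ready') | list | length)
    == (item.status.containerStatuses | length)
  with_items: "{{ _es_pods.results['items'] }}"

Either guarding the index or counting ready containers against the total would cover both one-container and two-container pods.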
https://github.com/openshift/openshift-ansible/pull/7546
Changing to "installer" component since this was an openshift-ansible issue.
I retested a 3.6->3.7 upgrade with openshift-ansible 3.7.42 on RHEL7, but it failed for a different reason:

TASK [Upgrade master packages] *************************************************
Monday 23 April 2018 16:32:29 +0000 (0:00:05.043) 0:03:23.697 **********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TemplateRuntimeError: no test named 'equalto'
fatal: [mbarnestest-master-84785]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

The "equalto" test was added in Jinja 2.8 [1], but the latest available package on RHEL7 is python-jinja2-2.7.2-2.el7. Because of that failure, I can't confirm yet whether this particular bug is fixed.

[1] http://jinja.pocoo.org/docs/2.10/changelog/#version-2-8
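For reference, here is an illustration (not the playbook's actual expression; pkg_list and the package name are made-up placeholders) of the kind of construct that requires 'equalto', and a rewrite that stays on Jinja 2.7 by leaning on Ansible's 'match' test instead:

# Requires the 'equalto' test (Jinja >= 2.8); fails on RHEL7's python-jinja2 2.7.2:
- debug:
    msg: "{{ pkg_list | selectattr('name', 'equalto', 'atomic-openshift-master') | list }}"

# Jinja 2.7-compatible alternative: Ansible registers a 'match' test, so
# selectattr can filter on an anchored regex instead of an equality test.
- debug:
    msg: "{{ pkg_list | selectattr('name', 'match', '^atomic-openshift-master$') | list }}"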
Looks like the equalto issue might be fixed in 3.7.43. Will retry.
Okay, I finally got a successful upgrade with openshift-ansible 3.7.44 after adding a 3rd "infra" node for the 3rd ES pod to run on. Apparently I had been getting away with running 3 ES pods on only 2 nodes before, but I guess the extra container per pod pushed the memory requirement beyond what 2 nodes could handle. In any case, sorry for the tangent. Looks like this is fixed.
Logging can be upgraded from v3.6.173.0.118 to v3.7.46 via openshift-ansible:v3.7.46. After the upgrade, the indices can be retrieved in Kibana.

The key variables:
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_number_of_shards=1
openshift_logging_es_number_of_replicas=1
openshift_logging_es_memory_limit=2Gi
openshift_logging_es_cluster_size=3

After the upgrade:

1) The cluster is healthy.
oc exec -c elasticsearch logging-es-data-master-ew54449w-2-hrnn7 -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "logging-es",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 10,
  "active_shards" : 23,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

2) New logs are gathered and can be viewed in Kibana:
project.anlitest.a840e525-5431-11e8-9ea4-fa163e36cc89.2018.05.10

3) The old index can also be viewed in Kibana:
project.install-test.20a798cb-53fe-11e8-8ad7-fa163e36cc89.2018.05.10
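If you keep these settings in YAML (for example a group_vars file) rather than the INI inventory, the equivalent would be roughly the following; the values are just the ones used in this verification run:

# Same variables in YAML form, shown only for readability.
openshift_logging_es_pvc_dynamic: true
openshift_logging_es_number_of_shards: 1
openshift_logging_es_number_of_replicas: 1
openshift_logging_es_memory_limit: 2Gi
openshift_logging_es_cluster_size: 3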
I am trying to upgrade to 3.7.46, but I still hit the same error. Is the fix really in 3.7.46?
@Marc, what version of openshift-ansible are you using?
Hello, we are using 3.7.46.
Thanks for that, Marc; I think I see what the issue is. Just to confirm, though: do you happen to have a Logging Ops deployment? And which line is the failure occurring on? Is it line 45 of get_es_version.yml?
Hello, yes, we deployed a Logging Ops stack, and yes, it is line 45!

fatal: [XXX]: FAILED! => {"msg": "The conditional check 'item.status.containerStatuses[1].ready == true' failed. The error was: error while evaluating conditional (item.status.containerStatuses[1].ready == true): list object has no element 1\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/get_es_version.yml': line 45, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- set_fact:\n ^ here\n"}
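To compare notes, it may help to paste what that block actually looks like in your installed copy. A quick throwaway play like the one below (the path is taken from the traceback above; nothing else about it is official) dumps the lines around line 45 of get_es_version.yml:

# Rough diagnostic sketch, not an official tool: print lines 40-50 of the
# installed role file so we can see the exact conditional in use.
- hosts: localhost
  gather_facts: false
  tasks:
  - name: Show the conditional around line 45 of get_es_version.yml
    command: >-
      sed -n '40,50p'
      /usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/get_es_version.yml
    register: _snippet
  - debug:
      var: _snippet.stdout_lines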
The upgrade passes with openshift3/ose-ansible/images/v3.7.55-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2009