Bug 1609019 - Prometheus deployment failed due to "No schedulable nodes found matching node selector" but it is not true
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Vadim Rutkovsky
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-26 18:16 UTC by Johnny Liu
Modified: 2018-11-11 16:40 UTC (History)
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The sync pod is not yet available when the Prometheus install checks for available nodes.
Consequence: If a custom label is used for the Prometheus install to select an appropriate node, the sync pods must already have applied the label to the nodes; otherwise the Prometheus installer will not find any nodes with a matching label.
Fix: Add a check to the install process that waits for the sync pods to become available before continuing.
Result: This ensures that the node labelling process is complete and the nodes have the correct labels for the Prometheus pod to be scheduled.
Clone Of:
Environment:
Last Closed: 2018-11-11 16:39:10 UTC
Target Upstream Version:
Embargoed:


Attachments
installation log with inventory file embedded (1.84 MB, text/plain)
2018-07-26 18:19 UTC, Johnny Liu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1617990 0 unspecified CLOSED Task [Retrieve list of schedulable nodes matching selector] returns an empty list caused prometheus installation along w... 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2018:2709 0 None None None 2018-11-11 16:40:08 UTC

Internal Links: 1617990

Description Johnny Liu 2018-07-26 18:16:56 UTC
Description of problem:
See the following details.

Version-Release number of the following components:
openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch

How reproducible:
50%

Steps to Reproduce:
1. Enable Prometheus deployment together when installing cluster
2.
3.

Actual results:
TASK [openshift_control_plane : Retrieve list of schedulable nodes matching selector] ***
Thursday 26 July 2018  13:37:36 -0400 (0:00:00.101)       0:28:14.786 *********
ok: [host-8-248-147.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "failed": false, "results": {"cmd": "/usr/bin/oc get node --selector=node-role.kubernetes.io/infra=true --field-selector=spec.unschedulable!=true -o json -n default", "results": [{"apiVersion": "v1", "items": [], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}

TASK [openshift_control_plane : Ensure that Prometheus has nodes to run on] ****
Thursday 26 July 2018  13:37:37 -0400 (0:00:00.944)       0:28:15.730 *********
fatal: [host-8-248-147.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "assertion": false,
    "changed": false,
    "evaluated_to": false,
    "failed": true,
    "msg": "No schedulable nodes found matching node selector for Prometheus - 'node-role.kubernetes.io/infra=true'"
}


Expected results:
installation is completed successfully.

Additional info:
After the failure, log into the cluster and run "/usr/bin/oc get node --selector=node-role.kubernetes.io/infra=true --field-selector=spec.unschedulable!=true -o json -n default"; it returns the infra node successfully.

It seems there is some delay before the infra node becomes ready and labeled: when the "Retrieve list of schedulable nodes matching selector" task runs, the infra node is not ready yet, but after several minutes it becomes ready.
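
As an illustration only (not anything taken from openshift-ansible), a minimal shell loop that polls with the same oc query the installer uses until a schedulable node matching the Prometheus selector appears:

# Hypothetical helper for manual checking: retry for up to ~10 minutes.
for i in $(seq 1 60); do
  nodes="$(/usr/bin/oc get node \
    --selector=node-role.kubernetes.io/infra=true \
    --field-selector=spec.unschedulable!=true -o name)"
  if [ -n "$nodes" ]; then
    echo "found schedulable infra node(s): $nodes"
    break
  fi
  echo "no labeled infra node yet, retrying in 10s"
  sleep 10
done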

Comment 1 Johnny Liu 2018-07-26 18:19:41 UTC
Created attachment 1470865 [details]
installation log with inventory file embedded

Comment 2 Johnny Liu 2018-07-26 18:40:54 UTC
This issue is also reproduced in another of my clusters.

# oc get node
NAME                                STATUS    ROLES     AGE       VERSION
qe-jialiu310stg-master-etcd-nfs-1   Ready     master    12m       v1.10.0+b81c8f8
qe-jialiu310stg-node-1              Ready     compute   4m        v1.10.0+b81c8f8
qe-jialiu310stg-node-infra-1        Ready     <none>    4m        v1.10.0+b81c8f8

The infra node is still not labeled with "infra" after 4m.

# oc describe po sync-s9rbm -n openshift-node
Name:           sync-s9rbm
Namespace:      openshift-node
Node:           qe-jialiu310stg-node-infra-1/172.16.120.43
Start Time:     Thu, 26 Jul 2018 14:27:21 -0400
Labels:         app=sync
                component=network
                controller-revision-hash=3178364908
                openshift.io/component=sync
                pod-template-generation=2
                type=infra
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Pending
IP:             172.16.120.43
Controlled By:  DaemonSet/sync
Containers:
  sync:
    Container ID:  
    Image:         registry.access.stage.redhat.com/openshift3/ose-node:v3.10
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
<--snip-->
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type    Reason   Age   From                                   Message
  ----    ------   ----  ----                                   -------
  Normal  Pulling  2m    kubelet, qe-jialiu310stg-node-infra-1  pulling image "registry.access.stage.redhat.com/openshift3/ose-node:v3.10"


Maybe this issue is related to performance or registry bandwidth, or the ose-node image is simply too big.

After 11m, the node is labeled correctly.
# oc get node
NAME                                STATUS    ROLES     AGE       VERSION
qe-jialiu310stg-master-etcd-nfs-1   Ready     master    19m       v1.10.0+b81c8f8
qe-jialiu310stg-node-1              Ready     compute   11m       v1.10.0+b81c8f8
qe-jialiu310stg-node-infra-1        Ready     infra     11m       v1.10.0+b81c8f8

# crictl images
registry.access.stage.redhat.com/openshift3/ose-node                     v3.10               e3822f4f47155       1.28GB

Comment 3 Scott Dodson 2018-07-26 19:24:52 UTC
The sync pod is responsible for labeling the node, so it makes sense that the node would remain unlabeled until the sync pod becomes available.
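
A quick way to confirm this from a running cluster (a diagnostic sketch only, not part of the fix) is to check whether the sync daemonset is ready on all of its target nodes before expecting the labels to appear:

# oc get daemonset sync -n openshift-node -o jsonpath='{.status.numberReady}/{.status.desiredNumberScheduled}{"\n"}'
# oc get pods -n openshift-node -l app=sync -o wide

If numberReady is lower than desiredNumberScheduled, some nodes have not been relabeled yet.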

Comment 4 Paul Gier 2018-08-17 20:10:49 UTC
PR to ignore the node check if the sync pods are not ready yet.
https://github.com/openshift/openshift-ansible/pull/9661

Comment 5 Paul Gier 2018-08-24 15:09:37 UTC
Waiting until bug 1617990 is verified for master (3.11); the fix will then be backported to 3.10.

Comment 6 Paul Gier 2018-08-29 14:54:20 UTC
PR for 3.10: https://github.com/openshift/openshift-ansible/pull/9819

Comment 7 Paul Gier 2018-08-30 13:43:11 UTC
Fix has been merged to the 3.10 branch.

Comment 8 Johnny Liu 2018-08-31 03:25:44 UTC
The latest installer rpm is openshift-ansible-3.10.41-1.git.0.fd15dd7.el7.noarch; the PR has not landed in that rpm yet, so waiting for a newer build.

Comment 9 Johnny Liu 2018-09-06 07:11:48 UTC
Re-tested this bug with openshift-ansible-3.10.43-1.git.0.4794155.el7.noarch; it is still reproduced, even though the PR has already landed in the rpm and taken effect.

TASK [openshift_node_group : Wait for the sync daemonset to become ready and available] ***
Thursday 06 September 2018  14:44:12 +0800 (0:00:00.176)       0:21:27.759 **** 
FAILED - RETRYING: Wait for the sync daemonset to become ready and available (60 retries left).
FAILED - RETRYING: Wait for the sync daemonset to become ready and available (59 retries left).
FAILED - RETRYING: Wait for the sync daemonset to become ready and available (58 retries left).
ok: [host-8-249-125.host.centralci.eng.rdu2.redhat.com] => {"attempts": 4, "changed": false, "failed": false, "failed_when_result": false, "results": {"cmd": "/usr/bin/oc get daemonset sync -o json -n openshift-node", "results": [{"apiVersion": "extensions/v1beta1", "kind": "DaemonSet", "metadata": {"annotations": {"image.openshift.io/triggers": "[\n  {\"from\":{\"kind\":\"ImageStreamTag\",\"name\":\"node:v3.10\"},\"fieldPath\":\"spec.template.spec.containers[?(@.name==\\\"sync\\\")].image\"}\n]\n", "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"apps/v1\",\"kind\":\"DaemonSet\",\"metadata\":{\"annotations\":{\"image.openshift.io/triggers\":\"[\\n  {\\\"from\\\":{\\\"kind\\\":\\\"ImageStreamTag\\\",\\\"name\\\":\\\"node:v3.10\\\"},\\\"fieldPath\\\":\\\"spec.template.spec.containers[?(@.name==\\\\\\\"sync\\\\\\\")].image\\\"}\\n]\\n\",\"kubernetes.io/description\":\"This daemon set provides dynamic configuration of nodes and relabels nodes as appropriate.\\n\"},\"name\":\"sync\",\"namespace\":\"openshift-node\"},\"spec\":{\"selector\":{\"matchLabels\":{\"app\":\"sync\"}},\"template\":{\"metadata\":{\"annotations\":{\"scheduler.alpha.kubernetes.io/critical-pod\":\"\"},\"labels\":{\"app\":\"sync\",\"component\":\"network\",\"openshift.io/component\":\"sync\",\"type\":\"infra\"}},\"spec\":{\"containers\":[{\"command\":[\"/bin/bash\",\"-c\",\"#!/bin/bash\\nset -euo pipefail\\n\\n# set by the node image\\nunset KUBECONFIG\\n\\ntrap 'kill $(jobs -p); exit 0' TERM\\n\\n# track the current state of the config\\nif [[ -f /etc/origin/node/node-config.yaml ]]; then\\n  md5sum /etc/origin/node/node-config.yaml \\u003e /tmp/.old\\nelse\\n  touch /tmp/.old\\nfi\\n\\n# loop until BOOTSTRAP_CONFIG_NAME is set\\nwhile true; do\\n  file=/etc/sysconfig/origin-node\\n  if [[ -f /etc/sysconfig/atomic-openshift-node ]]; then\\n    file=/etc/sysconfig/atomic-openshift-node\\n  elif [[ -f /etc/sysconfig/origin-node ]]; then\\n    file=/etc/sysconfig/origin-node\\n  else\\n    echo \\\"info: Waiting for the node sysconfig file to be created\\\" 2\\u003e\\u00261\\n    sleep 15 \\u0026 wait\\n    continue\\n  fi\\n  name=\\\"$(sed -nE 's|^BOOTSTRAP_CONFIG_NAME=([^#].+)|\\\\1|p' \\\"${file}\\\" | head -1)\\\"\\n  if [[ -z \\\"${name}\\\" ]]; then\\n    echo \\\"info: Waiting for BOOTSTRAP_CONFIG_NAME to be set\\\" 2\\u003e\\u00261\\n    sleep 15 \\u0026 wait\\n    continue\\n  fi\\n  # in the background check to see if the value changes and exit if so\\n  pid=$BASHPID\\n  (\\n    while true; do\\n      if ! updated=\\\"$(sed -nE 's|^BOOTSTRAP_CONFIG_NAME=([^#].+)|\\\\1|p' \\\"${file}\\\" | head -1)\\\"; then\\n        echo \\\"error: Unable to check for bootstrap config, exiting\\\" 2\\u003e\\u00261\\n        kill $pid\\n        exit 1\\n      fi\\n      if [[ \\\"${updated}\\\" != \\\"${name}\\\" ]]; then\\n        echo \\\"info: Bootstrap configuration profile name changed, exiting\\\" 2\\u003e\\u00261\\n        kill $pid\\n        exit 0\\n      fi\\n      sleep 15\\n    done\\n  ) \\u0026\\n  break\\ndone\\n\\n# periodically refresh both node-config.yaml and relabel the node\\nwhile true; do\\n  if ! 
oc extract \\\"configmaps/${name}\\\" -n openshift-node --to=/etc/origin/node --confirm --request-timeout=10s --config /etc/origin/node/node.kubeconfig \\\"--token=$( cat /var/run/secrets/kubernetes.io/serviceaccount/token )\\\"  \\u003e /dev/null; then\\n    echo \\\"error: Unable to retrieve latest config for node\\\" 2\\u003e\\u00261\\n    sleep 15 \\u0026\\n    wait $!\\n    continue\\n  fi\\n  # detect whether the node-config.yaml has changed, and if so trigger a restart of the kubelet.\\n  md5sum /etc/origin/node/node-config.yaml \\u003e /tmp/.new\\n  if [[ \\\"$( cat /tmp/.old )\\\" != \\\"$( cat /tmp/.new )\\\" ]]; then\\n    echo \\\"info: Configuration changed, restarting kubelet\\\" 2\\u003e\\u00261\\n    # TODO: kubelet doesn't relabel nodes, best effort for now\\n    # https://github.com/kubernetes/kubernetes/issues/59314\\n    if args=\\\"$(openshift start node --write-flags --config /etc/origin/node/node-config.yaml)\\\"; then\\n      labels=$(tr ' ' '\\\\n' \\u003c\\u003c\\u003c$args | sed -ne '/^--node-labels=/ { s/^--node-labels=//; p; }' | tr ',\\\\n' ' ')\\n      if [[ -n \\\"${labels}\\\" ]]; then\\n        echo \\\"info: Applying node labels $labels\\\" 2\\u003e\\u00261\\n        if ! oc label --config=/etc/origin/node/node.kubeconfig \\\"node/${NODE_NAME}\\\" ${labels} --overwrite; then\\n          echo \\\"error: Unable to apply labels, will retry in 10\\\" 2\\u003e\\u00261\\n          sleep 10 \\u0026\\n          wait $!\\n          continue\\n        fi\\n      fi\\n    else\\n      echo \\\"error: The downloaded node configuration is invalid, retrying later\\\" 2\\u003e\\u00261\\n      sleep 10 \\u0026\\n      wait $!\\n      continue\\n    fi\\n    if ! pkill -U 0 -f '(^|/)hyperkube kubelet '; then\\n      echo \\\"error: Unable to restart Kubelet\\\" 2\\u003e\\u00261\\n      sleep 10 \\u0026\\n      wait $!\\n      continue\\n    fi\\n  fi\\n  cp -f /tmp/.new /tmp/.old\\n  sleep 180 \\u0026\\n  wait $!\\ndone\\n\"],\"env\":[{\"name\":\"NODE_NAME\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"spec.nodeName\"}}}],\"image\":\" \",\"name\":\"sync\",\"securityContext\":{\"privileged\":true,\"runAsUser\":0},\"volumeMounts\":[{\"mountPath\":\"/etc/origin/node/\",\"name\":\"host-config\"},{\"mountPath\":\"/etc/sysconfig\",\"name\":\"host-sysconfig-node\",\"readOnly\":true}]}],\"hostNetwork\":true,\"hostPID\":true,\"serviceAccountName\":\"sync\",\"terminationGracePeriodSeconds\":1,\"volumes\":[{\"hostPath\":{\"path\":\"/etc/origin/node\"},\"name\":\"host-config\"},{\"hostPath\":{\"path\":\"/etc/sysconfig\"},\"name\":\"host-sysconfig-node\"}]}},\"updateStrategy\":{\"rollingUpdate\":{\"maxUnavailable\":\"50%\"},\"type\":\"RollingUpdate\"}}}\n", "kubernetes.io/description": "This daemon set provides dynamic configuration of nodes and relabels nodes as appropriate.\n"}, "creationTimestamp": "2018-09-06T06:44:26Z", "generation": 2, "labels": {"app": "sync", "component": "network", "openshift.io/component": "sync", "type": "infra"}, "name": "sync", "namespace": "openshift-node", "resourceVersion": "1148", "selfLink": "/apis/extensions/v1beta1/namespaces/openshift-node/daemonsets/sync", "uid": "4c0f62ed-b1a0-11e8-8411-fa163ea893ad"}, "spec": {"revisionHistoryLimit": 10, "selector": {"matchLabels": {"app": "sync"}}, "template": {"metadata": {"annotations": {"scheduler.alpha.kubernetes.io/critical-pod": ""}, "creationTimestamp": null, "labels": {"app": "sync", "component": "network", "openshift.io/component": "sync", "type": "infra"}}, "spec": {"containers": [{"command": 
["/bin/bash", "-c", "#!/bin/bash\nset -euo pipefail\n\n# set by the node image\nunset KUBECONFIG\n\ntrap 'kill $(jobs -p); exit 0' TERM\n\n# track the current state of the config\nif [[ -f /etc/origin/node/node-config.yaml ]]; then\n  md5sum /etc/origin/node/node-config.yaml > /tmp/.old\nelse\n  touch /tmp/.old\nfi\n\n# loop until BOOTSTRAP_CONFIG_NAME is set\nwhile true; do\n  file=/etc/sysconfig/origin-node\n  if [[ -f /etc/sysconfig/atomic-openshift-node ]]; then\n    file=/etc/sysconfig/atomic-openshift-node\n  elif [[ -f /etc/sysconfig/origin-node ]]; then\n    file=/etc/sysconfig/origin-node\n  else\n    echo \"info: Waiting for the node sysconfig file to be created\" 2>&1\n    sleep 15 & wait\n    continue\n  fi\n  name=\"$(sed -nE 's|^BOOTSTRAP_CONFIG_NAME=([^#].+)|\\1|p' \"${file}\" | head -1)\"\n  if [[ -z \"${name}\" ]]; then\n    echo \"info: Waiting for BOOTSTRAP_CONFIG_NAME to be set\" 2>&1\n    sleep 15 & wait\n    continue\n  fi\n  # in the background check to see if the value changes and exit if so\n  pid=$BASHPID\n  (\n    while true; do\n      if ! updated=\"$(sed -nE 's|^BOOTSTRAP_CONFIG_NAME=([^#].+)|\\1|p' \"${file}\" | head -1)\"; then\n        echo \"error: Unable to check for bootstrap config, exiting\" 2>&1\n        kill $pid\n        exit 1\n      fi\n      if [[ \"${updated}\" != \"${name}\" ]]; then\n        echo \"info: Bootstrap configuration profile name changed, exiting\" 2>&1\n        kill $pid\n        exit 0\n      fi\n      sleep 15\n    done\n  ) &\n  break\ndone\n\n# periodically refresh both node-config.yaml and relabel the node\nwhile true; do\n  if ! oc extract \"configmaps/${name}\" -n openshift-node --to=/etc/origin/node --confirm --request-timeout=10s --config /etc/origin/node/node.kubeconfig \"--token=$( cat /var/run/secrets/kubernetes.io/serviceaccount/token )\"  > /dev/null; then\n    echo \"error: Unable to retrieve latest config for node\" 2>&1\n    sleep 15 &\n    wait $!\n    continue\n  fi\n  # detect whether the node-config.yaml has changed, and if so trigger a restart of the kubelet.\n  md5sum /etc/origin/node/node-config.yaml > /tmp/.new\n  if [[ \"$( cat /tmp/.old )\" != \"$( cat /tmp/.new )\" ]]; then\n    echo \"info: Configuration changed, restarting kubelet\" 2>&1\n    # TODO: kubelet doesn't relabel nodes, best effort for now\n    # https://github.com/kubernetes/kubernetes/issues/59314\n    if args=\"$(openshift start node --write-flags --config /etc/origin/node/node-config.yaml)\"; then\n      labels=$(tr ' ' '\\n' <<<$args | sed -ne '/^--node-labels=/ { s/^--node-labels=//; p; }' | tr ',\\n' ' ')\n      if [[ -n \"${labels}\" ]]; then\n        echo \"info: Applying node labels $labels\" 2>&1\n        if ! oc label --config=/etc/origin/node/node.kubeconfig \"node/${NODE_NAME}\" ${labels} --overwrite; then\n          echo \"error: Unable to apply labels, will retry in 10\" 2>&1\n          sleep 10 &\n          wait $!\n          continue\n        fi\n      fi\n    else\n      echo \"error: The downloaded node configuration is invalid, retrying later\" 2>&1\n      sleep 10 &\n      wait $!\n      continue\n    fi\n    if ! 
pkill -U 0 -f '(^|/)hyperkube kubelet '; then\n      echo \"error: Unable to restart Kubelet\" 2>&1\n      sleep 10 &\n      wait $!\n      continue\n    fi\n  fi\n  cp -f /tmp/.new /tmp/.old\n  sleep 180 &\n  wait $!\ndone\n"], "env": [{"name": "NODE_NAME", "valueFrom": {"fieldRef": {"apiVersion": "v1", "fieldPath": "spec.nodeName"}}}], "image": "registry.access.stage.redhat.com/openshift3/ose-node:v3.10", "imagePullPolicy": "IfNotPresent", "name": "sync", "resources": {}, "securityContext": {"privileged": true, "runAsUser": 0}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/etc/origin/node/", "name": "host-config"}, {"mountPath": "/etc/sysconfig", "name": "host-sysconfig-node", "readOnly": true}]}], "dnsPolicy": "ClusterFirst", "hostNetwork": true, "hostPID": true, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "sync", "serviceAccountName": "sync", "terminationGracePeriodSeconds": 1, "volumes": [{"hostPath": {"path": "/etc/origin/node", "type": ""}, "name": "host-config"}, {"hostPath": {"path": "/etc/sysconfig", "type": ""}, "name": "host-sysconfig-node"}]}}, "templateGeneration": 2, "updateStrategy": {"rollingUpdate": {"maxUnavailable": "50%"}, "type": "RollingUpdate"}}, "status": {"currentNumberScheduled": 1, "desiredNumberScheduled": 1, "numberAvailable": 1, "numberMisscheduled": 0, "numberReady": 1, "observedGeneration": 2, "updatedNumberScheduled": 1}}], "returncode": 0}, "state": "list"}
<--snip-->

TASK [openshift_control_plane : Retrieve list of schedulable nodes matching selector] ***
Thursday 06 September 2018  14:48:58 +0800 (0:00:00.181)       0:26:13.734 **** 
ok: [host-8-249-125.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "failed": false, "results": {"cmd": "/usr/bin/oc get node --selector=node-role.kubernetes.io/infra=true --field-selector=spec.unschedulable!=true -o json -n default", "results": [{"apiVersion": "v1", "items": [], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}

TASK [openshift_control_plane : Ensure that Prometheus has nodes to run on] ****
Thursday 06 September 2018  14:48:58 +0800 (0:00:00.481)       0:26:14.216 **** 
fatal: [host-8-249-125.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "assertion": false, 
    "changed": false, 
    "evaluated_to": false, 
    "failed": true, 
    "msg": "No schedulable nodes found matching node selector for Prometheus - 'node-role.kubernetes.io/infra=true'"
}


After the failure, log into the cluster and run some checks:
[root@qe-jialiu310-master-etcd-nfs-1 ~]# oc get node
NAME                             STATUS    ROLES     AGE       VERSION
qe-jialiu310-master-etcd-nfs-1   Ready     master    9m        v1.10.0+b81c8f8
qe-jialiu310-node-1              Ready     compute   5m        v1.10.0+b81c8f8
qe-jialiu310-node-infra-1        Ready     <none>    5m        v1.10.0+b81c8f8


Only the infra node is not labeled correctly; checking again after several minutes, the labels are set correctly.

[root@qe-jialiu310-master-etcd-nfs-1 ~]# oc get node
NAME                             STATUS    ROLES     AGE       VERSION
qe-jialiu310-master-etcd-nfs-1   Ready     master    23m       v1.10.0+b81c8f8
qe-jialiu310-node-1              Ready     compute   19m       v1.10.0+b81c8f8
qe-jialiu310-node-infra-1        Ready     infra     19m       v1.10.0+b81c8f8

Comment 10 Paul Gier 2018-09-06 14:31:02 UTC
Ok, from the log it looks like the sync check passed, but at that point in the install it only checks whether the sync pod is running on 1 node (the master), because the other nodes haven't been added yet. So I think what we need to do is check that all the sync pods are available either immediately before the Prometheus install, or at least after the nodes have been added to the cluster.
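
A rough shell equivalent of that check (illustration only; the actual fix is implemented as Ansible tasks in the linked PRs) is to wait until the sync daemonset reports as many available pods as nodes it should run on:

# Wait until every node that should run a sync pod has one available.
while true; do
  desired="$(oc get ds sync -n openshift-node -o jsonpath='{.status.desiredNumberScheduled}')"
  available="$(oc get ds sync -n openshift-node -o jsonpath='{.status.numberAvailable}')"
  if [ "${desired:-0}" -gt 0 ] && [ "${available:-0}" -eq "$desired" ]; then
    echo "sync daemonset available on all $desired nodes"
    break
  fi
  sleep 10
done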

Comment 11 Paul Gier 2018-09-07 22:02:32 UTC
New PR to move the location of the sync pod check: https://github.com/openshift/openshift-ansible/pull/9965

Johnny, are you able to test if this patch resolves the issue?

Comment 12 Marthen Luther 2018-09-10 13:53:16 UTC
I also encountered exactly the same issue while installing 3.10.

This is the log [1].

Is there any update on this issue?


[1] http://pastebin.test.redhat.com/643583

Comment 13 Vadim Rutkovsky 2018-09-10 14:22:51 UTC
Checking that sync pods have started on all nodes is not sufficient. The plan to fix this:

* Update sync script to annotate the node with md5 of the applied config
* Check that sync DS has started on all nodes
* Check that all nodes have the hash annotation (a rough shell sketch of this check follows below)
  Once this annotation appears, the node has the new config applied and has restarted
* Wait for api server to be back
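
A minimal shell sketch of the "all nodes have the hash annotation" step, assuming a placeholder annotation key (the real key is whatever the updated sync script sets; it is not shown in this bug):

# ANNOTATION is an assumed key, for illustration only.
ANNOTATION="node.openshift.io/md5sum"
total=0; annotated=0
for node in $(oc get nodes -o name); do
  total=$((total + 1))
  # crude existence check: look for the key in the node's YAML
  if oc get "$node" -o yaml | grep -q "${ANNOTATION}:"; then
    annotated=$((annotated + 1))
  fi
done
echo "${annotated}/${total} nodes carry the ${ANNOTATION} annotation"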

Comment 14 Vadim Rutkovsky 2018-09-13 07:39:23 UTC
PR to 3.10 https://github.com/openshift/openshift-ansible/pull/10045

Holding the PR to make sure it works in 3.11 first (see bug #1628208).

Comment 15 Vadim Rutkovsky 2018-09-21 09:45:40 UTC
3.10 backport PR - https://github.com/openshift/openshift-ansible/pull/10100

Comment 17 Vadim Rutkovsky 2018-10-05 08:46:52 UTC
Fix is available in openshift-ansible-3.10.52-1

Comment 18 Johnny Liu 2018-10-12 11:28:09 UTC
While testing this bug with openshift-ansible-3.10.53-1.git.0.ba2c2ec.el7.noarch, I also hit https://bugzilla.redhat.com/show_bug.cgi?id=1636914, reproduced almost 100% of the time against the stage registry; I am almost certain this is related to stage registry performance.
# for i in `seq 1 30`; do oc get po -n openshift-node; sleep 60; done
NAME         READY     STATUS              RESTARTS   AGE
sync-kn94l   0/1       ContainerCreating   0          29m
sync-tvlfm   1/1       Running             0          29m
sync-tzj99   1/1       Running             0          33m

# oc describe po sync-kn94l -n openshift-node
Events:
  Type    Reason   Age   From                          Message
  ----    ------   ----  ----                          -------
  Normal  Pulling  13m   kubelet, qe-jialiu310-node-1  pulling image "registry.access.stage.redhat.com/openshift3/ose-node:v3.10"

# for i in `seq 1 30`; do oc get po -n openshift-node; sleep 60; done
sync-kn94l   1/1       Running   0          30m
sync-tvlfm   1/1       Running   0          30m
sync-tzj99   1/1       Running   0          34m

The size of the ose-node image is about 1.3GB! It is too big, and this can easily cause problems.
# crictl images 
W1012 07:22:54.727619   29319 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
IMAGE                                                           TAG                 IMAGE ID            SIZE
registry.access.stage.redhat.com/openshift3/ose-control-plane   v3.10               c7ed74a5e981a       804MB
registry.access.stage.redhat.com/openshift3/ose-node            v3.10               2128ae84c408a       1.32GB


Personally I think there are 3 ways to resolve this issue:
1. improve stage registry performance
2. implement some logic in openshift-ansible installer to pre-pull ose-node image based on different runtime.
3. increase timeout just like https://github.com/openshift/openshift-ansible/pull/10363

#2 is the best option; if this is only an edge case, #3 is also acceptable for me, so please backport the 3.11 fix for BZ#1636914 to 3.10 too.
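
For reference, a runtime-aware pre-pull along the lines of option #2 could be as simple as running crictl on each crio node before the sync daemonset starts (an illustrative command only, using the image from this environment):

# crictl pull registry.access.stage.redhat.com/openshift3/ose-node:v3.10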

BTW, stage testing is very important for QE; if this issue is not resolved we have no good workaround, and that would delay the advisory release.

Comment 19 Vadim Rutkovsky 2018-10-12 11:56:13 UTC
(In reply to Johnny Liu from comment #18)
> 2. implement some logic in openshift-ansible installer to pre-pull ose-node
> image based on different runtime.

Node images are being prepulled; ContainerCreating means the image is available and the runtime is creating a new container.

> 3. increase timeout just like
> https://github.com/openshift/openshift-ansible/pull/10363
> 
> #2 is the best option; if this is only an edge case, #3 is also
> acceptable for me, so please backport the 3.11 fix for BZ#1636914 to 3.10 too.

Created https://github.com/openshift/openshift-ansible/pull/10393 to cherrypick the fix to 3.10 branch

Comment 20 Johnny Liu 2018-10-15 03:38:23 UTC
(In reply to Vadim Rutkovsky from comment #19)
> (In reply to Johnny Liu from comment #18)
> > 2. implement some logic in openshift-ansible installer to pre-pull ose-node
> > image based on different runtime.
> 
> Node images are being prepulled, ContainerCreating means the image is
> available and runtime is creating a new container
"Pre-pull node image" task is prepulling a docker image, do not take runtime into consideration. In my testing, I am using crio runtime, that means when sync pod is creating, it have to pull the image, it will take a long time.

And I do not think "ContainerCreating" means the image is available: in comment 18, running "oc describe" against the pod stuck in "ContainerCreating" shows an event indicating it is still pulling the image.
# oc describe po sync-kn94l -n openshift-node
Events:
  Type    Reason   Age   From                          Message
  ----    ------   ----  ----                          -------
  Normal  Pulling  13m   kubelet, qe-jialiu310-node-1  pulling image "registry.access.stage.redhat.com/openshift3/ose-node:v3.10"

Comment 21 Vadim Rutkovsky 2018-10-15 09:36:56 UTC
(In reply to Johnny Liu from comment #20)
> (In reply to Vadim Rutkovsky from comment #19)
> > (In reply to Johnny Liu from comment #18)
> > > 2. implement some logic in openshift-ansible installer to pre-pull ose-node
> > > image based on different runtime.
> > 
> > Node images are being prepulled, ContainerCreating means the image is
> > available and runtime is creating a new container
> "Pre-pull node image" task is prepulling a docker image, do not take runtime
> into consideration. In my testing, I am using crio runtime, that means when
> sync pod is creating, it have to pull the image, it will take a long time.

Ah, correct. I filed bug 1639201 to track that.

Keeping this bug in POST, as more PRs would be required to fully fix this issue

Comment 22 Scott Dodson 2018-10-17 21:18:06 UTC
It was discussed during our team meeting that https://github.com/openshift/openshift-ansible/pull/10393 was sufficient to resolve this bug.

Comment 23 Johnny Liu 2018-10-18 08:35:37 UTC
Because enabling the crio runtime would hit https://bugzilla.redhat.com/show_bug.cgi?id=1639623, which causes the master api readiness check to time out against the stage registry, this bug's verification uses the docker runtime for testing.

Verified this bug with openshift-ansible-3.10.58-1.git.0.5753b33.el7.noarch, and PASS.

TASK [openshift_manage_node : Wait for sync DS to set annotations on all nodes] ***
Thursday 18 October 2018  12:25:09 +0800 (0:00:00.476)       0:26:06.549 ****** 

FAILED - RETRYING: Wait for sync DS to set annotations on all nodes (180 retries left).
<--snip-->
FAILED - RETRYING: Wait for sync DS to set annotations on all nodes (134 retries left).
ok: [host-8-249-20.host.centralci.eng.rdu2.redhat.com -> host-8-249-20.host.centralci.eng.rdu2.redhat.com]
<--snip-->
TASK [openshift_control_plane : Ensure that Prometheus has nodes to run on] ****
Thursday 18 October 2018  12:34:52 +0800 (0:00:00.365)       0:35:49.541 ****** 

ok: [host-8-249-20.host.centralci.eng.rdu2.redhat.com] => {
    "changed": false, 
    "failed": false, 
    "msg": "All assertions passed"
}

Comment 24 Johnny Liu 2018-10-22 02:38:18 UTC
Per comment 23, moving this bug to VERIFIED.

Comment 26 errata-xmlrpc 2018-11-11 16:39:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2709

