Bug 1493182 - NodeNotReady scaling up to 11.4K pods in 950 projects - NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Summary: NodeNotReady scaling up to 11.4K pods in 950 projects - NetworkPluginNotReady...
Keywords:
Status: CLOSED DUPLICATE of bug 1451902
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Mike Fiedler
URL:
Whiteboard: aos-scalability-37
Depends On:
Blocks:
 
Reported: 2017-09-19 14:27 UTC by Mike Fiedler
Modified: 2017-10-24 17:40 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-24 17:40:04 UTC
Target Upstream Version:
Embargoed:


Attachments
syslog at loglevel 5 for node that went NotReady (9.04 MB, application/x-gzip)
2017-09-19 14:27 UTC, Mike Fiedler
Node log/system log (loglevel 2) (1.73 MB, application/x-gzip)
2017-09-27 20:36 UTC, Mike Fiedler

Description Mike Fiedler 2017-09-19 14:27:41 UTC
Created attachment 1327993 [details]
syslog at loglevel 5 for node that went NotReady

Description of problem:

Scaling a 50 x m4.xlarge compute-node cluster to a density of 240 pause pods per node (achievable in 3.6) across 1000 namespaces.  After 950 namespaces and 11,400 pods are created, nodes start going NotReady and pods get stuck in Pending.  The logs contain the message below as the NotReady reason - NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Full logs for a failed node at loglevel 5 are attached


Sep 19 12:21:28 ip-172-31-50-91.us-west-2.compute.internal atomic-openshift-node[8577]: I0919 12:21:28.593007    8577 kubelet_node_status.go:433] Recording NodeNotReady event message for node ip-172-31-50-91.us-west-2.compute.internal
Sep 19 12:21:28 ip-172-31-50-91.us-west-2.compute.internal atomic-openshift-node[8577]: I0919 12:21:28.593036    8577 kubelet_node_status.go:757] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2017-09-19 12:21:28.592998669 +0000 UTC LastTransitionTime:2017-09-19 12:21:28.592998669 +0000 UTC Reason:KubeletNotReady Message:runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized}
Sep 19 12:21:28 ip-172-31-50-91.us-west-2.compute.internal atomic-openshift-node[8577]: I0919 12:21:28.598079    8577 sdn_controller.go:205] AddHostSubnetRules for ip-172-31-61-231.us-west-2.compute.internal (host: "ip-172-31-61-231.us-west-2.compute.internal", ip: "172.31.61.231", subnet: "172.20.84.0/24")
Sep 19 12:21:28 ip-172-31-50-91.us-west-2.compute.internal atomic-openshift-node[8577]: I0919 12:21:28.613218    8577 sdn_controller.go:205] AddHostSubnetRules for ip-172-31-61-32.us-west-2.compute.internal (host: "ip-172-31-61-32.us-west-2.compute.internal", ip: "172.31.61.32", subnet: "172.20.65.0/24")
Sep 19 12:21:28 ip-172-31-50-91.us-west-2.compute.internal atomic-openshift-node[8577]: W0919 12:21:28.615392    8577 proxier.go:488] clusterCIDR not specified, unable to distinguish between internal and external traffic
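
For anyone else hitting this, a quick way to confirm the NotReady reason on a suspect node (a sketch using the standard oc client and the node-service unit name from the log excerpt above; the node name is just the one from this run):

# show node readiness at a glance
oc get nodes
# show the Ready condition with its reason/message for one node
oc describe node ip-172-31-50-91.us-west-2.compute.internal | grep -A 4 "Ready"
# find the readiness transitions in the node service log
journalctl -u atomic-openshift-node | grep -E "NodeNotReady|NetworkPluginNotReady"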


Version-Release number of selected component (if applicable):

openshift v3.7.0-0.126.4          
kubernetes v1.7.0+80709908fd 


How reproducible: Always when scaling to this level


Steps to Reproduce:
1.  On AWS, install a cluster with 1 master/etcd node, 3 infra nodes, and 50 compute nodes (m4.xlarge)
2.  Run cluster-loader (https://github.com/openshift/svt/tree/master/openshift_scalability) with the configuration file below to create 1000 projects with 12,000 pods

Actual results:

After 950 projects are created and 11,398 pods are running, nodes start going into NotReady and additional pods fail to start.

Expected results:

All pods start successfully 

Additional info:

python cluster-loader.py -f config.yaml -p 5

cluster-loader config:

projects:
  - num: 1000
    basename: svt-1-
    ifexists: delete
    tuning: default
    pods:
      - total: 12
      - num: 100
        image: gcr.io/google_containers/pause-amd64:3.0
        basename: pausepods
        file: content/pod-pause.json
        storage:
          - type: none
          
quotas:
  - name: default
    file: default

tuningsets:
  - name: default
    pods:
      stepping:
        stepsize: 50
        pause: 60 s
      rate_limit:
        delay: 0 ms
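
Not part of the original run, but a minimal watch loop that can be left running next to cluster-loader to catch the transition (a sketch; the 60-second interval and grep patterns are assumptions):

# count Running pods and NotReady nodes once a minute during the load
while true; do
  date
  echo -n "running pods: ";   oc get pods --all-namespaces --no-headers | grep -c " Running"
  echo -n "notready nodes: "; oc get nodes --no-headers | grep -c "NotReady"
  sleep 60
done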

Comment 1 Dan Williams 2017-09-21 18:34:50 UTC
Do you have full node logs from the affected nodes?

openshift-sdn only writes out the config file to enable NodeReady once it's done initial setup.  So it appears that it's taking a while to get everything set up, and we should figure out how long things really are taking.

Also, what's the CPU load on the affected nodes during setup?
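
A quick way to check whether the SDN has finished that initial setup on a given node is to look for the CNI config file it writes (a sketch; /etc/cni/net.d is the usual kubelet CNI config directory and is an assumption for this build):

# the kubelet reports "cni config uninitialized" until openshift-sdn drops a config file here
ls -l /etc/cni/net.d/
# correlate with SDN setup progress in the node service log
journalctl -u atomic-openshift-node | grep -iE "sdn|cni"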

Comment 2 Mike Fiedler 2017-09-21 19:23:42 UTC
The attached file is the full node log, captured with journalctl > <output>

I'll grab pbench data the next time I set this up.   Possibly tomorrow.

Comment 3 Dan Williams 2017-09-23 19:27:49 UTC
There are a lot of weird log messages there.  It looks like atomic-openshift-node gets restarted quite a few times by something; I'm not sure whether that's ansible or systemd or something else.

At the 12:21:28 mark, the node has just restarted, and the SDN isn't up and running yet.  That takes a bit of time.  It's still initializing at 12:22:56.

Then, for some reason, openshift-node gets killed by ansible at 12:23:01 and restarted.  After that, it looks like everything is mostly ready at 12:23:11?

So I'm not sure this is a network problem.  It looks like something is slowing the node down (in the first service run in the logs, up until 12:20-ish), and then something is restarting openshift-node.  Any idea what that is?
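
One way to narrow down what is bouncing the service is to pull the unit's start/stop events for that window from the journal (a sketch; the timestamps just bracket the excerpt discussed above):

# systemd's own Starting/Started/Stopping/Stopped records for the unit
journalctl -u atomic-openshift-node --since "2017-09-19 12:20:00" --until "2017-09-19 12:25:00" | grep -iE "Starting|Started|Stopping|Stopped"
# current state and most recent start time of the unit
systemctl status atomic-openshift-node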

Comment 4 Mike Fiedler 2017-09-27 20:35:11 UTC
Reproduced this today and captured system data right after a node went NotReady:   http://perf-infra.ec2.breakage.org/pbench/results/ip-172-31-17-233/20170927_from890_a/tools-default/ip-172-31-44-27.us-west-2.compute.internal/


sar shows the system (4 vCPU, 16 GB) getting busy in peaks - it has 244 pods running on it, with 2.5 cores used in spikes and 5 GB of memory in use.  I grabbed the logs - there are earlier instances, but if you search up from the bottom for NotReady you will find the instance captured here.  Note that the node service restarted immediately afterward.  Maybe this bug should belong to the Pod component?

Will try to repro tomorrow with loglevel 5 - this run was only at loglevel 2.
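
For anyone reproducing without pbench, the sar views behind those numbers are roughly the following (a sketch; the 5-second interval and one-minute window are arbitrary):

sar -u 5 12    # CPU utilization in 5-second samples for one minute
sar -r 5 12    # memory utilization over the same window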

Comment 5 Mike Fiedler 2017-09-27 20:36:00 UTC
Created attachment 1331567 [details]
Node log/system log (loglevel 2)

Comment 10 Ben Bennett 2017-10-18 19:37:06 UTC
This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1451902

Comment 11 Ben Bennett 2017-10-24 17:40:04 UTC

*** This bug has been marked as a duplicate of bug 1451902 ***

