Description of problem:
Seen when running the same type of test as described in https://bugzilla.redhat.com/show_bug.cgi?id=1451110. However, this message appears very frequently even when pods are not having problems. While creating 125 pods on a node, this message occurred 901 times:

May 16 13:29:00 svt-n-2-67 atomic-openshift-node: W0516 13:29:00.278604 76000 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "deploymentconfig2v6-1-796rm_svt-2-5": Unexpected command output nsenter: cannot open : No such file or directory

Is this log spam or a real problem?

Version-Release number of selected component (if applicable):
3.6.74

How reproducible:
Always when creating a large number of pods

Steps to Reproduce:
1. Run cluster-loader (https://github.com/openshift/svt/tree/master/openshift_scalability) with the configuration below on a cluster with 2 schedulable nodes with max-pods set to at least 125.

Actual results:
Many occurrences of:

May 16 13:29:00 svt-n-2-67 atomic-openshift-node: W0516 13:29:00.278604 76000 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "deploymentconfig2v6-1-796rm_svt-2-5": Unexpected command output nsenter: cannot open : No such file or directory

Expected results:

Additional info:

projects:
  - num: 10
    basename: svt-2-
    tuning: default
    templates:
      - num: 1
        file: ./content/build-config-template.json
      - num: 1
        file: ./content/build-template.json
      - num: 1
        file: ./content/image-stream-template.json
      - num: 5
        file: ./content/deployment-config-1rep-pause-template.json
        parameters:
          - ENV_VALUE: "asodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij12"
      - num: 10
        file: ./content/deployment-config-2rep-pause-template.json
        parameters:
          - ENV_VALUE: "asodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij0emc2oed2ed2ed2e2easodfn209e8j0eij12"
      - num: 2
        file: ./content/ssh-secret-template.json
      - num: 1
        file: ./content/route-template.json
      # rcs and services are implemented in deployments.

tuningsets:
  - name: default
    templates:
      stepping:
        stepsize: 3
        pause: 3 s
      rate_limit:
        delay: 250 ms

quotas:
  - name: default
Is there any chance we can reproduce this with the nodes running at --loglevel=5? The default loglevel of 2 just doesn't give much network-related info. I can see some veths being created, but nothing interesting after that because of the limited logging.
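When reproducing, it may help to tally the warning per pod, so that the count from the description (901 occurrences while creating 125 pods) can be compared between runs. The following is only a sketch: the function name count_status_hook_warnings is hypothetical, and the pattern is derived solely from the one log line quoted in this report, so it may need adjusting if the message format differs on other builds.

```python
import re
from collections import Counter

# Pattern taken from the warning quoted in this report; the named group
# captures the "<pod>_<namespace>" string between the double quotes.
STATUS_HOOK_RE = re.compile(
    r'NetworkPlugin cni failed on the status hook for pod "(?P<pod>[^"]+)"'
)


def count_status_hook_warnings(lines):
    """Return a Counter mapping pod name to number of matching warnings.

    `lines` can be any iterable of strings, e.g. an open log file.
    """
    counts = Counter()
    for line in lines:
        match = STATUS_HOOK_RE.search(line)
        if match:
            counts[match.group("pod")] += 1
    return counts


# Small demonstration using the log line from this report.
sample = [
    'May 16 13:29:00 svt-n-2-67 atomic-openshift-node: W0516 13:29:00.278604 '
    '76000 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook '
    'for pod "deploymentconfig2v6-1-796rm_svt-2-5": Unexpected command output '
    'nsenter: cannot open : No such file or directory',
    'some unrelated node log line',
]
demo_counts = count_status_hook_warnings(sample)
print("total warnings:", sum(demo_counts.values()))
for pod, n in demo_counts.most_common():
    print(n, pod)
```

To use it on a real node, pass an open file containing the saved node log (for example, output captured from the journal) instead of the sample list. A high count spread evenly across all pods would point toward log spam, while a count concentrated on a few pods might indicate a real problem with those pods.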
This is a logging error only; we'll try to get to this next sprint rather than perturbing the code now.
Duping to bug 1434950, as they have the same upstream fix and are basically the same problem.

*** This bug has been marked as a duplicate of bug 1434950 ***