Bug 1584909

Summary: oc cluster up does not work on docker-2:1.13.1-56.git6c336e4.fc28.x86_64
Product: Fedora
Component: docker
Version: 28
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Jason Montleon <jmontleo>
Assignee: Daniel Walsh <dwalsh>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: adimania, admiller, amurdaca, dustymabe, dwalsh, filbranden, fkluknav, ichavero, jcajka, jpazdziora, jwhiting, lsm5, marianne, nalin, rh-bugzilla, santiago, tom81094, tomek, ttomecek, twaugh, vbatts
Fixed In Version: docker-1.13.1-59.gitaf6b32b.fc28, docker-1.13.1-59.gitaf6b32b.fc27
Type: Bug
Last Closed: 2018-06-13 15:18:25 UTC

Description Jason Montleon 2018-05-31 22:23:54 UTC
Description of problem:
oc cluster up fails using docker-2:1.13.1-56.git6c336e4.fc28.x86_64

Version-Release number of selected component (if applicable):
docker-2:1.13.1-56.git6c336e4.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run oc cluster up on Fedora 28 with docker-2:1.13.1-56.git6c336e4.fc28.x86_64

Actual results:
The one container that does come up shows failed actions in its logs.


Expected results:
oc cluster up works.

Additional info:
Running dnf downgrade docker brings the system down to docker-2:1.13.1-51.git4032bd5.fc28.x86_64. After restarting the service, oc cluster up works properly.
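
For reference, a sketch of that downgrade workaround (the package version comes from above; the restart and query commands are the standard dnf/systemd/rpm ones, assumed rather than copied from a terminal):

sudo dnf downgrade docker
sudo systemctl restart docker
rpm -q docker   # should now report 2:1.13.1-51.git4032bd5.fc28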

oc version
oc v3.10.0-alpha.0+a861408-1354
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Using an /etc/systemd/system/docker.service.d/override.conf to change the cgroup driver to cgroupfs gets the newest version working:

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd-current \
          --add-runtime oci=/usr/libexec/docker/docker-runc-current \
          --default-runtime=oci \
          --authorization-plugin=rhel-push-plugin \
          --containerd /run/containerd.sock \
          --exec-opt native.cgroupdriver=cgroupfs \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          --init-path=/usr/libexec/docker/docker-init-current \
          --seccomp-profile=/etc/docker/seccomp.json \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES
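
To make the override take effect, a minimal sketch (the daemon-reload/restart sequence is the usual systemd procedure and is assumed here, and docker info --format should report the active driver on this docker build):

sudo systemctl daemon-reload
sudo systemctl restart docker
# Confirm the daemon switched from the systemd driver to cgroupfs
docker info --format '{{.CgroupDriver}}'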


Without the override, you'll eventually see the first container start repeating:
E0531 22:02:00.112824   20802 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""

Logs from the container:
docker logs -f 4c7e8cc33aef
Flag --address has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --allow-privileged has been deprecated, will be removed in a future version
Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --authentication-token-webhook has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --authentication-token-webhook-cache-ttl has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --authorization-mode has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --authorization-webhook-cache-authorized-ttl has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --authorization-webhook-cache-unauthorized-ttl has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cadvisor-port has been deprecated, The default will change to 0 (disabled) in 1.12, and the cadvisor port will be removed entirely in 1.13
Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --file-check-frequency has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --healthz-bind-address has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --healthz-port has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --host-ipc-sources has been deprecated, will be removed in a future version
Flag --host-ipc-sources has been deprecated, will be removed in a future version
Flag --host-network-sources has been deprecated, will be removed in a future version
Flag --host-network-sources has been deprecated, will be removed in a future version
Flag --host-pid-sources has been deprecated, will be removed in a future version
Flag --host-pid-sources has been deprecated, will be removed in a future version
Flag --http-check-frequency has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --iptables-masquerade-bit has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --max-pods has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --port has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --read-only-port has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cert-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-cipher-suites has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-min-version has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --tls-private-key-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
I0531 22:01:51.495324   20802 feature_gate.go:226] feature gates: &{{} map[]}
W0531 22:01:51.509511   20802 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
I0531 22:01:51.515382   20802 server.go:383] Version: v1.10.0+b81c8f8
I0531 22:01:51.515451   20802 feature_gate.go:226] feature gates: &{{} map[]}
I0531 22:01:51.515603   20802 plugins.go:89] No cloud provider specified.
I0531 22:01:52.166465   20802 server.go:621] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
I0531 22:01:52.167596   20802 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
I0531 22:01:52.167623   20802 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/tmp/openshift.local.clusterup/openshift.local.volumes ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true}
I0531 22:01:52.167843   20802 container_manager_linux.go:266] Creating device plugin manager: true
I0531 22:01:52.168178   20802 state_mem.go:36] [cpumanager] initializing new in-memory state store
I0531 22:01:52.168956   20802 state_file.go:82] [cpumanager] state file: created new state file "/tmp/openshift.local.clusterup/openshift.local.volumes/cpu_manager_state"
I0531 22:01:52.169207   20802 kubelet.go:273] Adding pod path: /var/lib/origin/pod-manifests
I0531 22:01:52.169256   20802 kubelet.go:298] Watching apiserver
E0531 22:01:52.171967   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:52.173509   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:52.173548   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
W0531 22:01:52.175867   20802 kubelet_network.go:139] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
I0531 22:01:52.175913   20802 kubelet.go:565] Hairpin mode set to "hairpin-veth"
I0531 22:01:52.177877   20802 client.go:75] Connecting to docker on unix:///var/run/docker.sock
I0531 22:01:52.177917   20802 client.go:104] Start docker client with request timeout=2m0s
W0531 22:01:52.182236   20802 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
I0531 22:01:52.187186   20802 docker_service.go:244] Docker cri networking managed by kubernetes.io/no-op
I0531 22:01:52.492151   20802 docker_service.go:249] Docker Info: &{ID:4S3O:NG4F:OEN3:CNV7:IOVE:AYAD:DY45:RTMT:AN6J:VJQY:OBRW:5R4O Containers:172 ContainersRunning:1 ContainersPaused:0 ContainersStopped:171 Images:1173 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type false] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[rhel-push-plugin] Log:[]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:31 OomKillDisable:true NGoroutines:31 SystemTime:2018-05-31T18:01:52.481010732-04:00 LoggingDriver:journald CgroupDriver:systemd NEventsListener:0 KernelVersion:4.16.12-300.fc28.x86_64 OperatingSystem:Fedora 28 (Twenty Eight) OSType:linux Architecture:x86_64 IndexServerAddress:https://registry.fedoraproject.org/v1/ RegistryConfig:0xc42152e850 NCPU:16 MemTotal:135117225984 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:jmontleo.usersys.redhat.com Labels:[] ExperimentalBuild:false ServerVersion:1.13.1 ClusterStore: ClusterAdvertise: Runtimes:map[oci:{Path:/usr/libexec/docker/docker-runc-current Args:[]} runc:{Path:docker-runc Args:[]}] DefaultRuntime:oci Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:0xc420985540} LiveRestoreEnabled:true Isolation: InitBinary:/usr/libexec/docker/docker-init-current ContainerdCommit:{ID:c301b045f9faddcf7693229601303639af6b0885 Expected:aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1} RuncCommit:{ID:1ab62f1ec5429ccf03e8dfcc2110bc665ec9e308-dirty Expected:9df8b306d01f59d3a8029be411de015b7304dd8f} InitCommit:{ID:N/A Expected:949e6facb77383876aeff8a6944dde66b3089574} SecurityOptions:[name=seccomp,profile=/etc/docker/seccomp.json name=selinux]}
I0531 22:01:52.492316   20802 docker_service.go:262] Setting cgroupDriver to systemd
I0531 22:01:52.794304   20802 remote_runtime.go:43] Connecting to runtime service unix:///var/run/dockershim.sock
I0531 22:01:52.797431   20802 kuberuntime_manager.go:186] Container runtime docker initialized, version: 1.13.1, apiVersion: 1.26.0
W0531 22:01:52.799268   20802 probe.go:215] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
I0531 22:01:52.800143   20802 csi_plugin.go:61] kubernetes.io/csi: plugin initializing...
E0531 22:01:52.802293   20802 kubelet.go:1299] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container /
I0531 22:01:52.802295   20802 server.go:129] Starting to listen on 0.0.0.0:10250
I0531 22:01:52.802389   20802 server.go:952] Started kubelet
E0531 22:01:52.803362   20802 event.go:209] Unable to write event: 'Post https://localhost:8443/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8443: getsockopt: connection refused' (may retry after sleeping)
I0531 22:01:52.803699   20802 server.go:303] Adding debug handlers to kubelet server.
I0531 22:01:52.804019   20802 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
I0531 22:01:52.804075   20802 status_manager.go:140] Starting to sync pod status with apiserver
I0531 22:01:52.804097   20802 volume_manager.go:247] Starting Kubelet Volume Manager
I0531 22:01:52.804117   20802 kubelet.go:1799] Starting kubelet main sync loop.
I0531 22:01:52.804207   20802 desired_state_of_world_populator.go:129] Desired state populator starts to run
I0531 22:01:52.804167   20802 kubelet.go:1816] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s]
I0531 22:01:52.904326   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:52.904356   20802 kubelet.go:1816] skipping pod synchronization - [container runtime is down]
I0531 22:01:52.910487   20802 kubelet_node_status.go:82] Attempting to register node localhost
E0531 22:01:52.911502   20802 kubelet_node_status.go:106] Unable to register node "localhost" with API server: Post https://localhost:8443/api/v1/nodes: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:53.104499   20802 kubelet.go:1816] skipping pod synchronization - [container runtime is down]
I0531 22:01:53.111720   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:53.122791   20802 kubelet_node_status.go:82] Attempting to register node localhost
E0531 22:01:53.123590   20802 kubelet_node_status.go:106] Unable to register node "localhost" with API server: Post https://localhost:8443/api/v1/nodes: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:53.173167   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:53.174609   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:53.175612   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:53.504664   20802 kubelet.go:1816] skipping pod synchronization - [container runtime is down]
I0531 22:01:53.523839   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:53.531345   20802 kubelet_node_status.go:82] Attempting to register node localhost
E0531 22:01:53.532278   20802 kubelet_node_status.go:106] Unable to register node "localhost" with API server: Post https://localhost:8443/api/v1/nodes: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:54.174504   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:54.176349   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:54.176654   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:54.304816   20802 kubelet.go:1816] skipping pod synchronization - [container runtime is down]
I0531 22:01:54.332526   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:54.339529   20802 kubelet_node_status.go:82] Attempting to register node localhost
E0531 22:01:54.340458   20802 kubelet_node_status.go:106] Unable to register node "localhost" with API server: Post https://localhost:8443/api/v1/nodes: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:55.175704   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:55.177501   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:55.178495   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:55.905084   20802 kubelet.go:1816] skipping pod synchronization - [container runtime is down]
I0531 22:01:55.940661   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:55.948096   20802 kubelet_node_status.go:82] Attempting to register node localhost
E0531 22:01:55.949007   20802 kubelet_node_status.go:106] Unable to register node "localhost" with API server: Post https://localhost:8443/api/v1/nodes: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:56.114536   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:56.119335   20802 cpu_manager.go:155] [cpumanager] starting with none policy
I0531 22:01:56.119358   20802 cpu_manager.go:156] [cpumanager] reconciling every 10s
I0531 22:01:56.119373   20802 policy_none.go:42] [cpumanager] none policy: Start
E0531 22:01:56.119417   20802 container_manager_linux.go:544] failed to get rootfs info,  cannot set ephemeral storage capacity: failed to get device for dir "/tmp/openshift.local.clusterup/openshift.local.volumes": could not find device with major: 0, minor: 45 in cached partitions map
Starting Device Plugin manager
E0531 22:01:56.176908   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:56.178576   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:56.179418   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:56.327769   20802 event.go:209] Unable to write event: 'Post https://localhost:8443/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8443: getsockopt: connection refused' (may retry after sleeping)
E0531 22:01:57.178448   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:57.179605   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:57.180773   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:58.180001   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://localhost:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:58.180332   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:461: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%3Dlocalhost&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:01:58.181724   20802 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:59.105503   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:59.127892   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:59.128051   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
W0531 22:01:59.135632   20802 status_manager.go:461] Failed to get status for pod "kube-scheduler-localhost_kube-system(4699e4e1c8a7146d6acd92baf39234ef)": Get https://localhost:8443/api/v1/namespaces/kube-system/pods/kube-scheduler-localhost: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:59.141711   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:59.141844   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
W0531 22:01:59.148217   20802 status_manager.go:461] Failed to get status for pod "master-api-localhost_kube-system(8364885b3a3efddca13f5a6ada480812)": Get https://localhost:8443/api/v1/namespaces/kube-system/pods/master-api-localhost: dial tcp 127.0.0.1:8443: getsockopt: connection refused
I0531 22:01:59.149165   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:59.154798   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
I0531 22:01:59.154924   20802 kubelet_node_status.go:82] Attempting to register node localhost
I0531 22:01:59.154959   20802 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
E0531 22:01:59.157234   20802 kubelet_node_status.go:106] Unable to register node "localhost" with API server: Post https://localhost:8443/api/v1/nodes: dial tcp 127.0.0.1:8443: getsockopt: connection refused
W0531 22:01:59.164447   20802 status_manager.go:461] Failed to get status for pod "master-etcd-localhost_kube-system(4bb0d3d3b26a7aa3dc2bd08a4b4326a5)": Get https://localhost:8443/api/v1/namespaces/kube-system/pods/master-etcd-localhost: dial tcp 127.0.0.1:8443: getsockopt: connection refused
E0531 22:02:09.284089   20802 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-scheduler-localhost_kube-system(4699e4e1c8a7146d6acd92baf39234ef)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""
E0531 22:02:09.284126   20802 kuberuntime_manager.go:646] createPodSandbox for pod "kube-scheduler-localhost_kube-system(4699e4e1c8a7146d6acd92baf39234ef)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-scheduler-localhost": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""
E0531 22:02:09.284228   20802 pod_workers.go:186] Error syncing pod 4699e4e1c8a7146d6acd92baf39234ef ("kube-scheduler-localhost_kube-system(4699e4e1c8a7146d6acd92baf39234ef)"), skipping: failed to "CreatePodSandbox" for "kube-scheduler-localhost_kube-system(4699e4e1c8a7146d6acd92baf39234ef)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-scheduler-localhost_kube-system(4699e4e1c8a7146d6acd92baf39234ef)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-scheduler-localhost\": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"No such device or address\\\"\""

Comment 1 Tom Nguyen 2018-06-02 19:18:02 UTC
The bug usually shows up at the point where origin needs to pull down the openshift/origin-web-console image:

Events:
  Type     Reason                  Age                From                Message
  ----     ------                  ----               ----                -------
  Normal   Scheduled               1m                 default-scheduler   Successfully assigned webconsole-7dfbffd44d-bz44s to localhost
  Normal   SuccessfulMountVolume   1m                 kubelet, localhost  MountVolume.SetUp succeeded for volume "webconsole-config"
  Normal   SuccessfulMountVolume   1m                 kubelet, localhost  MountVolume.SetUp succeeded for volume "serving-cert"
  Normal   SuccessfulMountVolume   1m                 kubelet, localhost  MountVolume.SetUp succeeded for volume "webconsole-token-zxjsp"
  Warning  FailedCreatePodSandBox  13s (x2 over 36s)  kubelet, localhost  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "webconsole-7dfbffd44d-bz44s": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""
  Normal   SandboxChanged          10s (x2 over 36s)  kubelet, localhost  Pod sandbox changed, it will be killed and re-created.

https://gitlab.com/tom81094/bugs/raw/master/f28/docker-2:1.13.1-56.git6c336e4/openshift-web-console-logs
https://gitlab.com/tom81094/bugs/raw/master/f28/docker-2:1.13.1-56.git6c336e4/oc-cluster-logs

Comment 2 Enrico Scholz 2018-06-04 11:34:07 UTC
Ditto here; it can be reproduced with:

docker run --rm --cpu-shares=128 fedora:28 bash

Comment 3 Filipe Brandenburger 2018-06-04 18:01:10 UTC
Please see:

https://github.com/projectatomic/runc/pull/10

This PR fixes the problem.

(NOTE: While backporting the single commit/PR seems to be enough, it's probably best to look at backporting more, since there were other changes around that code. Perhaps a whole refresh from upstream "runc" would be good there.)

Cheers,
Filipe

Comment 4 Dusty Mabe 2018-06-11 20:56:21 UTC
I'm seeing this when installing/running openshift origin 3.9.0 on the Fedora Atomic Host release candidate. This is blocking future releases of FAH.

I tracked the problem down to this change:

```
# rpm-ostree db diff a5f1234a302fb064f67f09afe8ddd9cbac524a406a257a562fd18000dac99ba8 cefc79e6ea4d7e5eec51a32c00e1ecd6ca678d322406fecd347bc9c49e5d5255 
ostree diff commit old: a5f1234a302fb064f67f09afe8ddd9cbac524a406a257a562fd18000dac99ba8
ostree diff commit new: cefc79e6ea4d7e5eec51a32c00e1ecd6ca678d322406fecd347bc9c49e5d5255
Upgraded:
  docker 2:1.13.1-51.git4032bd5.fc28 -> 2:1.13.1-56.git6c336e4.fc28
  docker-common 2:1.13.1-51.git4032bd5.fc28 -> 2:1.13.1-56.git6c336e4.fc28
  docker-rhel-push-plugin 2:1.13.1-51.git4032bd5.fc28 -> 2:1.13.1-56.git6c336e4.fc28
  quota 1:4.04-5.fc28 -> 1:4.04-6.fc28
  quota-nls 1:4.04-5.fc28 -> 1:4.04-6.fc28
  selinux-policy 3.14.1-29.fc28 -> 3.14.1-30.fc28
  selinux-policy-targeted 3.14.1-29.fc28 -> 3.14.1-30.fc28
Removed:
  oci-register-machine-0-6.1.git66fa845.fc28.x86_64
  systemd-container-238-8.git0e0aa59.fc28.x86_64
```
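
As a sketch of a stopgap, one can roll Atomic Host back to the previous, working deployment; the commands below are the standard rpm-ostree/systemd ones and are an assumption on my part, not something captured from this host:

```
# Sketch: boot back into the previous (docker-1.13.1-51) deployment
rpm-ostree rollback
systemctl reboot
```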

An example of a container not getting started is one of the glusterfs daemonset containers. Here is a snippet from oc describe:

```
  Warning  FailedCreatePodSandBox  7m (x16287 over 5h)  kubelet, 10.0.12.155  Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "glusterfs-storage-mlpdl": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""
  Normal   SandboxChanged          2m (x16532 over 5h)  kubelet, 10.0.12.155  Pod sandbox changed, it will be killed and re-created.
```

Comment 5 Daniel Walsh 2018-06-12 12:57:40 UTC
Any chance this is SELinux related?

Comment 6 Antonio Murdaca 2018-06-12 12:59:05 UTC
Mrunal, this is another bz about the cgroup fix that went into runc :/ https://github.com/projectatomic/runc/commit/99a2d0844a013541744154a07380422a073c4926

Comment 7 Fedora Update System 2018-06-12 19:26:05 UTC
docker-1.13.1-59.gitaf6b32b.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-c2e93d5623

Comment 8 Fedora Update System 2018-06-12 20:16:44 UTC
docker-1.13.1-59.gitaf6b32b.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-993659ebfd

Comment 9 Dusty Mabe 2018-06-12 21:31:21 UTC
Running an openshift cluster on top of docker-1.13.1-59.gitaf6b32b.fc28, using ostree ref `fedora/28/x86_64/atomic-host` from repo `https://dustymabe.fedorapeople.org/repo/`, fixes it for me.
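
For anyone who wants to try the same thing, a sketch of how such a rebase can be done (the remote name "dusty" is arbitrary and --no-gpg-verify is an assumption for a scratch repo):

```
ostree remote add --no-gpg-verify dusty https://dustymabe.fedorapeople.org/repo/
rpm-ostree rebase dusty:fedora/28/x86_64/atomic-host
systemctl reboot
```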

Comment 10 Fedora Update System 2018-06-13 04:31:54 UTC
docker-1.13.1-59.gitaf6b32b.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-c2e93d5623
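
(For testers, a sketch of pulling this build from updates-testing; the --advisory flag and the advisory ID taken from the Bodhi URL are assumptions about standard dnf usage, not part of this notification:)

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2018-c2e93d5623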

Comment 11 Fedora Update System 2018-06-13 15:18:25 UTC
docker-1.13.1-59.gitaf6b32b.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2018-06-14 13:47:26 UTC
docker-1.13.1-59.gitaf6b32b.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-993659ebfd

Comment 13 Fedora Update System 2018-06-17 19:44:16 UTC
docker-1.13.1-59.gitaf6b32b.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.