Description of problem:
After upgrading to docker-1.13.1-56.git6c336e4.fc28.x86_64, I am unable to run `oc cluster up` and have a complete openshift cluster.

Version-Release number of selected component (if applicable):
docker-1.13.1-56.git6c336e4.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install docker-1.13.1-56.git6c336e4.fc28.x86_64 and oc-3.7.46
2. Run `oc cluster up`
3. Run `docker ps -a`

Actual results:
Tons of containers in Created state, but failing. Logs show:
container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""

Expected results:
A running origin cluster

Additional info:
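For anyone reproducing this, one quick way to list the stuck containers and surface the OCI runtime error they failed with (illustrative commands; <container-id> is a placeholder for one of the Created containers):

$ docker ps -a --filter status=created
$ docker inspect --format '{{.State.Error}}' <container-id>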
Any chance this is an SELinux issue?
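If SELinux were the culprit, there would normally be matching AVC denials in the audit log. A quick way to check (assuming auditd is running, which is the Fedora default):

$ getenforce
$ sudo ausearch -m AVC,USER_AVC -ts recent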
While it doesn't appear to be an SELinux error, I'm getting additional output. See below for my output with setenforce=1 and setenforce=0:

$ oc cluster up --image=registry.access.redhat.com/openshift3/ose --version=v3.7 --host-data-dir=/home/blentz/.oc/openshift.local.data --use-existing-config=true
Starting OpenShift using registry.access.redhat.com/openshift3/ose:v3.7 ...
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for registry.access.redhat.com/openshift3/ose:v3.7 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ...
   WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ...
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 127.0.0.1 as the server IP
-- Starting OpenShift container ...
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
FAIL
   Error: cannot access master readiness URL https://127.0.0.1:8443/healthz/ready
   Details:
     Last 10 lines of "origin" container log:
     E0604 21:09:44.095751 19781 reflector.go:216] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:72: Failed to list *rbac.ClusterRoleBinding: no kind "ClusterRoleBinding" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:09:44.109782 19781 status.go:62] apiserver received an error that is not an metav1.Status: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:09:44.111169 19781 reflector.go:216] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:72: Failed to list *rbac.ClusterRole: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:09:44.646567 19781 controllers.go:118] Server isn't healthy yet. Waiting a little while.
     E0604 21:09:44.651282 19781 reflector.go:216] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:72: Failed to list *api.Endpoints: User "system:node:localhost" cannot list endpoints at the cluster scope: User "system:node:localhost" cannot list all endpoints in the cluster (get endpoints)
     E0604 21:09:44.861087 19781 status.go:62] apiserver received an error that is not an metav1.Status: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:09:44.861719 19781 storage_rbac.go:166] unable to initialize clusterroles: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:09:44.864708 19781 status.go:62] apiserver received an error that is not an metav1.Status: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:09:44.865419 19781 storage_rbac.go:166] unable to initialize clusterroles: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     F0604 21:09:44.865824 19781 hooks.go:133] PostStartHook "authorization.openshift.io-bootstrapclusterroles" failed: unable to initialize roles: timed out waiting for the condition
   Caused By:
     Error: Get https://127.0.0.1:8443/healthz/ready: dial tcp 127.0.0.1:8443: getsockopt: connection refused

$ oc cluster down
$ sudo setenforce 0
$ oc cluster up --image=registry.access.redhat.com/openshift3/ose --version=v3.7 --host-data-dir=/home/blentz/.oc/openshift.local.data --use-existing-config=true
Starting OpenShift using registry.access.redhat.com/openshift3/ose:v3.7 ...
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for registry.access.redhat.com/openshift3/ose:v3.7 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ...
   WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ...
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
   Using 127.0.0.1 as the server IP
-- Starting OpenShift container ...
   Starting OpenShift using container 'origin'
   Waiting for API server to start listening
FAIL
   Error: cannot access master readiness URL https://127.0.0.1:8443/healthz/ready
   Details:
     Last 10 lines of "origin" container log:
     E0604 21:10:49.341243 25230 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "persistent-volume-setup-vpxs6_default(f2d29cfc-6801-11e8-a25f-c85b76f3eace)" failed: rpc error: code = 2 desc = failed to start sandbox container for pod "persistent-volume-setup-vpxs6": Error response from daemon: {"message":"oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"No such device or address\\\"\"\n"}
     E0604 21:10:49.341267 25230 kuberuntime_manager.go:622] createPodSandbox for pod "persistent-volume-setup-vpxs6_default(f2d29cfc-6801-11e8-a25f-c85b76f3eace)" failed: rpc error: code = 2 desc = failed to start sandbox container for pod "persistent-volume-setup-vpxs6": Error response from daemon: {"message":"oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"No such device or address\\\"\"\n"}
     E0604 21:10:49.341364 25230 pod_workers.go:186] Error syncing pod f2d29cfc-6801-11e8-a25f-c85b76f3eace ("persistent-volume-setup-vpxs6_default(f2d29cfc-6801-11e8-a25f-c85b76f3eace)"), skipping: failed to "CreatePodSandbox" for "persistent-volume-setup-vpxs6_default(f2d29cfc-6801-11e8-a25f-c85b76f3eace)" with CreatePodSandboxError: "CreatePodSandbox for pod \"persistent-volume-setup-vpxs6_default(f2d29cfc-6801-11e8-a25f-c85b76f3eace)\" failed: rpc error: code = 2 desc = failed to start sandbox container for pod \"persistent-volume-setup-vpxs6\": Error response from daemon: {\"message\":\"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:258: applying cgroup configuration for process caused \\\\\\\"No such device or address\\\\\\\"\\\"\\n\"}"
     W0604 21:10:49.392559 25230 pod_container_deletor.go:77] Container "c9db9acf9ed59d4c6839864468622e32c3984a9be848fe3f84123503c016ba6c" not found in pod's containers
     E0604 21:10:49.662676 25230 reflector.go:216] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:72: Failed to list *api.Endpoints: User "system:node:localhost" cannot list endpoints at the cluster scope: User "system:node:localhost" cannot list all endpoints in the cluster (get endpoints)
     E0604 21:10:49.838991 25230 status.go:62] apiserver received an error that is not an metav1.Status: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:10:49.839534 25230 storage_rbac.go:166] unable to initialize clusterroles: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:10:49.842465 25230 status.go:62] apiserver received an error that is not an metav1.Status: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     E0604 21:10:49.843252 25230 storage_rbac.go:166] unable to initialize clusterroles: no kind "ClusterRole" is registered for version "rbac.authorization.k8s.io/v1"
     F0604 21:10:49.843559 25230 hooks.go:133] PostStartHook "authorization.openshift.io-bootstrapclusterroles" failed: unable to initialize roles: timed out waiting for the condition
   Caused By:
     Error: Get https://127.0.0.1:8443/healthz/ready: dial tcp 127.0.0.1:8443: getsockopt: connection refused
Antonio, another healthcheck error? Also related to runc?
(In reply to Daniel Walsh from comment #3)
> Antonio, another healthcheck error? Also related to runc?

I believe so. Mrunal?
For the record, I have seen a similar issue on f26 while trying to run osbs-box with origin 3.9 (docker-1.13.1-44.git584d391.fc26.x86_64). I tried disabling SELinux (set to disabled in /etc/selinux/config; even in permissive mode the error appeared) and that worked around the issue for me.

Also, after hitting this issue the system's SELinux seemed to be in an inconsistent state, i.e. audit2allow failed with really weird errors. I will try to get hold of it again and reproduce with a more recent version of Fedora.

Does disabling SELinux work around the issue for you?
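For reference, what I did amounts roughly to the following (a sketch assuming the stock /etc/selinux/config layout; disabling via the config file only takes effect after a reboot):

$ sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
$ sudo reboot
(after the reboot)
$ getenforce
Disabled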
(In reply to Jakub Čajka from comment #5)
> For the record, I have seen a similar issue on f26 while trying to run
> osbs-box with origin 3.9 (docker-1.13.1-44.git584d391.fc26.x86_64). I
> tried disabling SELinux (set to disabled in /etc/selinux/config; even in
> permissive mode the error appeared) and that worked around the issue for
> me.
>
> Also, after hitting this issue the system's SELinux seemed to be in an
> inconsistent state, i.e. audit2allow failed with really weird errors. I
> will try to get hold of it again and reproduce with a more recent version
> of Fedora.
>
> Does disabling SELinux work around the issue for you?

Retracting my previous statement. It also affects f27 with docker-1.13.1-54.git6c336e4.fc27.x86_64, and it seems to be unrelated to whether SELinux is enforcing or not. I see these popping up in the log:

Jun 05 17:08:53 osbs dockerd-current[808]: E0605 15:08:53.644311 1895 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "router-1-deploy_default(b3486eea-68ca-11e8-9c55-5254002bdf1c)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "router-1-deploy": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""
Jun 05 17:08:53 osbs dockerd-current[808]: E0605 15:08:53.644323 1895 kuberuntime_manager.go:647] createPodSandbox for pod "router-1-deploy_default(b3486eea-68ca-11e8-9c55-5254002bdf1c)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "router-1-deploy": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\""
Jun 05 17:08:53 osbs dockerd-current[808]: E0605 15:08:53.644376 1895 pod_workers.go:186] Error syncing pod b3486eea-68ca-11e8-9c55-5254002bdf1c ("router-1-deploy_default(b3486eea-68ca-11e8-9c55-5254002bdf1c)"), skipping: failed to "CreatePodSandbox" for "router-1-deploy_default(b3486eea-68ca-11e8-9c55-5254002bdf1c)" with CreatePodSandboxError: "CreatePodSandbox for pod \"router-1-deploy_default(b3486eea-68ca-11e8-9c55-5254002bdf1c)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"router-1-deploy\": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"No such device or address\\\"\""
BTW, BZ1586107 and BZ1584909 seem to be duplicates. My workaround for getting `oc cluster up` working is to downgrade docker to docker-1.13.1-51.git4032bd5.fc28.x86_64.
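In case it helps anyone else, the downgrade is just (assuming the -51 build is still available in the Fedora 28 repos):

$ sudo dnf downgrade docker-1.13.1-51.git4032bd5.fc28

and then holding docker back from updates until a fixed build lands.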
For the record, this is not architecture dependent. I'm hitting the same issue on ppc64le, and docker-1.13.1-51.git4032bd5.fc27.x86_64/ppc64le is also a workaround for the issue for me.
Seems to be caused by https://github.com/projectatomic/runc/pull/8
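The runc change in question ships as part of the docker build here, so checking the installed docker version is enough to tell whether you are on an affected build (-54/-56 per the reports above) or a working one (-51). Illustrative, with the affected version from this bug shown as sample output:

$ rpm -q docker
docker-1.13.1-56.git6c336e4.fc28.x86_64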
Lokesh, can you please share some details about https://github.com/projectatomic/runc/pull/8#issuecomment-381171089 ?
docker-1.13.1-58.git6c336e4.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-bace62295c
docker-1.13.1-58.git6c336e4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b6ba25b167
docker-1.13.1-59.gitaf6b32b.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-c2e93d5623
docker-1.13.1-59.gitaf6b32b.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-993659ebfd
docker-1.13.1-59.gitaf6b32b.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-c2e93d5623
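For anyone wanting to test before this goes stable, something along these lines should pull in exactly this advisory from updates-testing (illustrative; uses the stock Fedora updates-testing repo definition):

$ sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2018-c2e93d5623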
docker-1.13.1-59.gitaf6b32b.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
docker-1.13.1-59.gitaf6b32b.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-993659ebfd
docker-1.13.1-59.gitaf6b32b.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.