Description of problem:
In current Fedora 28, kubelet.service fails to start:

Started Kubernetes Kubelet Server.
I0320 04:25:35.639143 3613 server.go:182] Version: v1.9.3
I0320 04:25:35.639739 3613 feature_gate.go:226] feature gates: &{map[]}
W0320 04:25:35.656218 3613 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
I0320 04:25:35.663914 3613 plugins.go:101] No cloud provider specified.
I0320 04:25:35.695457 3613 server.go:428] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
I0320 04:25:35.696133 3613 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
I0320 04:25:35.696228 3613 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsNa>
I0320 04:25:35.696425 3613 container_manager_linux.go:266] Creating device plugin manager: false
I0320 04:25:35.696563 3613 kubelet.go:313] Watching apiserver
W0320 04:25:35.708014 3613 kubelet_network.go:139] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back>
I0320 04:25:35.709303 3613 kubelet.go:571] Hairpin mode set to "hairpin-veth"
I0320 04:25:35.711523 3613 client.go:80] Connecting to docker on unix:///var/run/docker.sock
I0320 04:25:35.711655 3613 client.go:109] Start docker client with request timeout=2m0s
W0320 04:25:35.716276 3613 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
I0320 04:25:35.727211 3613 docker_service.go:232] Docker cri networking managed by kubernetes.io/no-op
I0320 04:25:35.739755 3613 docker_service.go:237] Docker Info: &{ID:OX6T:X64L:HMXL:4B7X:NMCA:T6M3:AXIS:FWIV:WKIS:UGF5:BA7L:QSZQ Cont>
I0320 04:25:35.740025 3613 docker_service.go:250] Setting cgroupDriver to systemd
I0320 04:25:35.785358 3613 remote_runtime.go:43] Connecting to runtime service unix:///var/run/dockershim.sock
I0320 04:25:35.810820 3613 kuberuntime_manager.go:186] Container runtime docker initialized, version: 1.13.1, apiVersion: 1.26.0
I0320 04:25:35.834825 3613 server.go:755] Started kubelet
E0320 04:25:35.837863 3613 kubelet.go:1275] Image garbage collection failed once. Stats initialization may not have completed yet: f>
I0320 04:25:35.838939 3613 kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
I0320 04:25:35.841976 3613 server.go:129] Starting to listen on 127.0.0.1:10250
I0320 04:25:35.844064 3613 server.go:299] Adding debug handlers to kubelet server.
E0320 04:25:35.887124 3613 node_container_manager.go:51] Failed to create "/kubepods" cgroup
F0320 04:25:35.887275 3613 kubelet.go:1364] Failed to start ContainerManager Delegation not available for unit type
kubelet.service: Main process exited, code=exited, status=255/n/a

Version-Release number of selected component (if applicable):
kubernetes-node-1.9.3-1.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install kubernetes on current Fedora 28: dnf install kubernetes
2. Set up Kubernetes; in the Cockpit test VMs we use this script: https://github.com/cockpit-project/cockpit/blob/master/bots/images/scripts/lib/kubernetes.setup
3. systemctl start kubelet.service
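Other kubelet startup problems can look similar. To check that a failing start is this particular one, you can search the kubelet log for the delegation message quoted above. The following is a minimal sketch. The function name kubelet_hit_delegation_bug is hypothetical, and it assumes the log has already been saved to a file, for example with journalctl.

```shell
#!/bin/sh
# Sketch: report whether saved kubelet log output contains the cgroup
# delegation failure from this bug. The function name is hypothetical; the
# matched text is taken from the kubelet F-level line quoted in this report.
kubelet_hit_delegation_bug() {
    # $1: a file containing kubelet log output
    grep -q 'Delegation not available for unit type' "$1"
}
```

For example, as root: journalctl -u kubelet.service -b > /tmp/kubelet.log && kubelet_hit_delegation_bug /tmp/kubelet.log && echo "cgroup delegation failure"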
This seems to be related: https://github.com/kubernetes/kubernetes/issues/61474
I commented on the upstream issue:

It looks like ControllerManager is a slice, and slices can no longer Delegate. Here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cgroup_manager_linux.go#L43-L50

I added .can_delegate = true, at https://github.com/systemd/systemd/blob/master/src/core/slice.c#L376 and rebuilt and reinstalled the systemd 238 packages. After that I was able to run oc cluster up successfully.

This may be caused by the following systemd commit, although the comment on it makes me doubt that: https://github.com/systemd/systemd/commit/1d9cc8768f173b25757c01aa0d4c7be7cd7116bc
This also appears to break the ability to do 'oc cluster up' on Fedora 28.

One can work around it by changing the 'cgroupdriver' that 'docker' uses (hat tip to Jason Brooks):

# cp /usr/lib/systemd/system/docker.service /etc/systemd/system/
# sed -i 's/cgroupdriver=systemd/cgroupdriver=cgroupfs/' /etc/systemd/system/docker.service
# systemctl daemon-reload
# systemctl restart docker
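You can wrap the copy-and-edit workaround in a function so that it fails loudly if the substitution did not happen. The following is a sketch that assumes GNU sed, as shipped on Fedora. The function name switch_docker_to_cgroupfs is hypothetical. The default paths are the ones from the commands above, and you can pass other paths in order to try it on copies first.

```shell
#!/bin/sh
# Sketch of the copy-and-edit workaround. The function name is hypothetical;
# the default paths are the ones used in the commands above.
switch_docker_to_cgroupfs() {
    src=${1:-/usr/lib/systemd/system/docker.service}
    dst=${2:-/etc/systemd/system/docker.service}
    # A unit file in /etc/systemd/system takes precedence over the packaged one
    # in /usr/lib/systemd/system, so the packaged file stays untouched.
    cp "$src" "$dst" || return 1
    sed -i 's/cgroupdriver=systemd/cgroupdriver=cgroupfs/' "$dst" || return 1
    # Fail if the copy does not request cgroupfs afterwards.
    grep -q 'cgroupdriver=cgroupfs' "$dst"
}
```

Run it as root, then follow with systemctl daemon-reload and systemctl restart docker as above. Because the copy shadows the packaged unit, any changes that later docker updates make to the unit are ignored until you remove the copy.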
Proposed as a Freeze Exception for 28-final by Fedora user miabbott using the blocker tracking app because: This bug is blocking the ability for users to run Kubernetes on Fedora 28. This affects users that are spinning up a Kubernetes cluster manually, using the 'openshift-ansible' playbook to spin up an OpenShift cluster, or using the 'oc cluster up' method for launching an OpenShift cluster.
should this bug be moved to the runc component?
Dusty, it seems so, as the required patch (https://github.com/opencontainers/runc/pull/1776) is against runc.
so it turns out that we need to fix this in *both* runc *and* docker because docker has its own vendored version of runc as well. So updating runc by itself won't fix it for most people since most people are still using docker. We'll need them both updated. I'm going to change the component to docker, but we need runc as well I think.
This gets fixed in libcontainer by this PR: https://github.com/opencontainers/runc/pull/1776 I'll import that into Kubernetes vendored libcontainer once it's merged into runc. Cheers! Filipe
(In reply to Micah Abbott from comment #3)
> This also appears to break the ability to do 'oc cluster up' on Fedora 28.
>
> One can workaround it by changing the 'cgroupdriver' that 'docker' uses (hat
> tip to Jason Brooks):
>
> # cp /usr/lib/systemd/system/docker.service /etc/systemd/system/
> # sed -i 's/cgroupdriver=systemd/cgroupdriver=cgroupfs/'
> /etc/systemd/system/docker.service
> # systemctl daemon-reload
> # systemctl restart docker

This can also be done by overriding the docker unit like this:

# systemctl edit docker.service

In the editor, just paste this:

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd-current \
          --add-runtime oci=/usr/libexec/docker/docker-runc-current \
          --default-runtime=oci \
          --authorization-plugin=rhel-push-plugin \
          --containerd /run/containerd.sock \
          --exec-opt native.cgroupdriver=cgroupfs \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          --init-path=/usr/libexec/docker/docker-init-current \
          --seccomp-profile=/etc/docker/seccomp.json \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES

To roll back, delete the file /etc/systemd/system/docker.service.d/override.conf, then run systemctl daemon-reload and restart docker.
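The pasted ExecStart has to match whatever the installed docker package ships. Instead of copying it by hand, you could generate the same kind of drop-in from the packaged unit. The following is a sketch only. It assumes the packaged unit defines a single ExecStart that may continue over lines ending in a backslash. The function name write_cgroupfs_override is hypothetical.

```shell
#!/bin/sh
# Sketch: generate the docker.service drop-in from the packaged unit, switching
# only the cgroup driver. Assumes one ExecStart, possibly continued with
# trailing backslashes. The function name is hypothetical.
write_cgroupfs_override() {
    unit=${1:-/usr/lib/systemd/system/docker.service}
    dir=${2:-/etc/systemd/system/docker.service.d}
    [ -r "$unit" ] || return 1
    mkdir -p "$dir" || return 1
    {
        # The empty ExecStart= clears the packaged command before redefining it.
        printf '[Service]\nExecStart=\n'
        # Copy the ExecStart line and its continuation lines verbatim, stopping
        # after the first line that does not end in a backslash.
        awk '/^ExecStart=/ { p = 1 } p { print } p && !/\\$/ { exit }' "$unit" |
            sed 's/cgroupdriver=systemd/cgroupdriver=cgroupfs/'
    } > "$dir/override.conf"
    # Fail if the generated override does not request cgroupfs.
    grep -q 'cgroupdriver=cgroupfs' "$dir/override.conf"
}
```

To apply it, run the following as root: write_cgroupfs_override && systemctl daemon-reload && systemctl restart docker. Afterwards, systemctl cat docker.service shows the packaged unit followed by the drop-in. The generated override only reflects the ExecStart that was installed when you ran the function, so regenerate it if a docker update changes the unit.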
To fix this, besides the runc/docker patch, we also need a kube fix (see https://github.com/opencontainers/runc/pull/1776#issuecomment-380571191), so it's not going to be fixed by runc/docker backports alone.
(In reply to Micah Abbott from comment #3)
> This also appears to break the ability to do 'oc cluster up' on Fedora 28.
>
> One can workaround it by changing the 'cgroupdriver' that 'docker' uses (hat
> tip to Jason Brooks):
>
> # cp /usr/lib/systemd/system/docker.service /etc/systemd/system/
> # sed -i 's/cgroupdriver=systemd/cgroupdriver=cgroupfs/'
> /etc/systemd/system/docker.service
> # systemctl daemon-reload
> # systemctl restart docker

In my testing (RHEL 7), changing Docker's cgroup driver broke "oci-register-machine", so it also required changing /etc/oci-register-machine.conf to set "disable : true". Personally, I'd avoid changing the cgroup driver, though.

(In reply to Antonio Murdaca from comment #10)
> to fix this, other than the runc/docker patch, we also need a kube fix (see
> https://github.com/opencontainers/runc/pull/1776#issuecomment-380571191)
>
> so it's not gonna be fixed by just runc/docker back ports

Yes. opencontainers/runc#1776 has just been merged, so I'm going to push a sync of it into the Kubernetes code base. I'm planning to update master and release-1.10, but I could go back to 1.9 if you need it (and it looks like you do, so I'll do that too...)

Cheers,
Filipe
https://github.com/kubernetes/kubernetes/pull/61926 has been updated to bring all the relevant PRs into the Kubernetes vendored libcontainer. Once that one is merged, I'll prepare cherry-picks into the 1.10 and 1.9 branches. Cheers, Filipe
docker-1.13.1-52.git89b0e65.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-3c62c7e959
docker-1.13.1-52.git89b0e65.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-3c62c7e959
Discussed during the 2018-04-16 blocker review meeting: [1] The decision to punt was made: "we don't want to delay *too* long on this, but it's a fairly complex area and it doesn't feel like folks have all the consequences of this entirely worked out yet, so we would like to wait a few days to see if a clearer picture emerges and then perhaps vote async (in bugzilla comments) on this one" [1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-04-16/f28-blocker-review.2018-04-16-16.00.log.txt
https://bugzilla.redhat.com/show_bug.cgi?id=1568594 is the alternative to this: instead of trying to fix everything for the systemd change in a hurry, that proposes reverting the systemd change while we fix stuff carefully. I much prefer that alternative, so I am now -1 FE to this one.
Agreed with Adam, -1 FE in favor of the systemd revert.
I prefer the alternative rather than reverting the systemd change -1 FE
(In reply to Mohan Boddu from comment #18)
> I prefer the alternative rather than reverting the systemd change
>
> -1 FE

Sorry, wrong wording: I prefer the alternative, which reverts the systemd change.

-1 FE
That's -3, setting rejected.
Looks like I'm late to the vote, but I'm also -1 to an FE, as I prefer the alternative (reverting the systemd change).
I was struck by this again while trying to do `oc cluster up` on my rawhide system:

openshift/origin:v3.9.0
docker-1.13.1-59.gitaf6b32b.fc29.x86_64
systemd-239-1.fc29.x86_64

W0704 13:09:29.210188 17481 factory.go:1180] Request for pod default/persistent-volume-setup-jcfv5 already in flight, abandoning
I0704 13:09:29.610405 17481 kubelet.go:1779] skipping pod synchronization - [Failed to start ContainerManager Delegation not available for unit type]

Did something regress? I can also confirm that changing the cgroup driver to `cgroupdriver=cgroupfs` fixed the problem.
I got the same problem on Fedora 29 with "oc cluster up"; changing the cgroup driver to cgroupfs fixed it. Hasn't this fix been pushed to the f29 branch? Do you need a new bug to be opened for that?
(In reply to Elad Alfassa from comment #23)
> I got the same problem on Fedora 29 with "oc cluster up", changing the
> cgroup driver to cgroupfs fixed it.
>
> Is this fix not pushed for the f29 branch? Do you need a new bug to be
> opened for that?

This issue happens because the runc vendored in OpenShift is not up to date enough. The runc in Fedora and the runc vendored in Fedora's docker are both new enough. I did some research and have a conversation going on about it here: https://pagure.io/atomic-wg/issue/510#comment-531340. Feel free to pick up the conversation there.

Dusty
Resetting the status. The bodhi update got rejected, and this is still unfixed.
Does anyone know a current workaround for this? I tried to switch docker to cgroupfs by adding "--exec-opt native.cgroupdriver=cgroupfs" to OPTIONS in /etc/sysconfig/docker, and "--cgroup-driver=cgroupfs" to KUBELET_ARGS in /etc/kubernetes/kubelet, but kubelet.service does not accept this:

failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
Never mind, comment #9 works. Apparently specifying "--exec-opt native.cgroupdriver" twice doesn't work: the second occurrence (from OPTIONS) doesn't override the first one from /usr/lib/systemd/system/docker.service.
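The "misconfiguration" error above occurs when kubelet and docker disagree about the cgroup driver. Specifying the docker flag twice also silently leaves the first value in effect. Both problems can be caught before restarting anything. The following is a textual sketch only. It recognizes settings spelled --cgroup-driver=X (kubelet) and native.cgroupdriver=X (docker), and it treats more than one docker setting as an error rather than guessing which one wins. The function name check_cgroup_drivers is hypothetical.

```shell
#!/bin/sh
# Sketch: check that kubelet and docker request the same cgroup driver.
# Usage: check_cgroup_drivers KUBELET_CONFIG DOCKER_FILE...
# Prints the common driver on success. The function name is hypothetical.
check_cgroup_drivers() {
    [ "$#" -ge 2 ] || { echo "usage: check_cgroup_drivers KUBELET_CONFIG DOCKER_FILE..." >&2; return 1; }
    kubelet_conf=$1
    shift
    # First uncommented --cgroup-driver=X in the kubelet config.
    kubelet_driver=$(grep -v '^[[:space:]]*#' "$kubelet_conf" | grep -o -- '--cgroup-driver=[a-z]*' | head -n 1 | cut -d= -f2)
    # Every uncommented native.cgroupdriver=X across the docker files; as seen
    # in this bug, a second --exec-opt does not override the first.
    docker_settings=$(cat "$@" 2>/dev/null | grep -v '^[[:space:]]*#' | grep -o 'native\.cgroupdriver=[a-z]*' || true)
    count=$(printf '%s\n' "$docker_settings" | grep -c . || true)
    if [ "$count" -gt 1 ]; then
        echo "docker cgroup driver is set $count times; keep only one" >&2
        return 1
    fi
    docker_driver=${docker_settings#*=}
    if [ -z "$kubelet_driver" ] || [ -z "$docker_driver" ]; then
        echo "could not find both cgroup driver settings" >&2
        return 1
    fi
    if [ "$kubelet_driver" != "$docker_driver" ]; then
        echo "mismatch: kubelet=$kubelet_driver docker=$docker_driver" >&2
        return 1
    fi
    echo "$kubelet_driver"
}
```

For example: check_cgroup_drivers /etc/kubernetes/kubelet /usr/lib/systemd/system/docker.service /etc/sysconfig/docker. Pass only the docker files that actually take effect. With a systemctl edit override in place, that means passing override.conf instead of the packaged unit. The driver that the running daemon actually chose is shown on the "Cgroup Driver" line of docker info.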
This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This bug is still alive and current on Fedora 29 and 30. We've had this workaround in our Cockpit tests for over a year now:

sed -i 's/--cgroup-driver=systemd/--cgroup-driver=cgroupfs/' /etc/kubernetes/kubelet
sed -i 's/native.cgroupdriver=systemd/native.cgroupdriver=cgroupfs/' /usr/lib/systemd/system/docker.service
systemctl daemon-reload
systemctl try-restart docker
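The same two edits can be packaged as a reusable function. It changes kubelet and docker together and then confirms that both ended up on cgroupfs. The following is a sketch that assumes GNU sed. The function name apply_cgroupfs_workaround is hypothetical, and the default paths are the ones in the commands above. Running it twice is harmless, because the second run finds nothing left to substitute.

```shell
#!/bin/sh
# Sketch of the Cockpit workaround as a function. The function name is
# hypothetical; the default paths are the ones used in the commands above.
apply_cgroupfs_workaround() {
    kubelet_conf=${1:-/etc/kubernetes/kubelet}
    docker_unit=${2:-/usr/lib/systemd/system/docker.service}
    sed -i 's/--cgroup-driver=systemd/--cgroup-driver=cgroupfs/' "$kubelet_conf" || return 1
    sed -i 's/native.cgroupdriver=systemd/native.cgroupdriver=cgroupfs/' "$docker_unit" || return 1
    # Both sides must now request cgroupfs; otherwise kubelet refuses to start
    # with the "kubelet cgroup driver ... is different from docker" error.
    grep -q -- '--cgroup-driver=cgroupfs' "$kubelet_conf" &&
        grep -q 'native.cgroupdriver=cgroupfs' "$docker_unit"
}
```

To apply it, run the following as root: apply_cgroupfs_workaround && systemctl daemon-reload && systemctl try-restart docker. Editing the packaged unit under /usr/lib in place is fine for throwaway test VMs, but the next docker package update will overwrite it.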
I don't think docker is ever going to get updated. Do you want to try with either cri-o or moby-engine (new docker, different package name) instead?
Docker has been removed from Fedora 31 entirely. I think it is best to switch to moby-engine for Cockpit.
FTR, this isn't about cockpit, it's about the kubernetes package.
(In reply to Martin Pitt from comment #32)
> FTR, this isn't about cockpit, it's about the kubernetes package.

Should we open a bug somewhere else?
> This bug is still alive and current on Fedora 29 and 30

Correction: I don't know the status on Fedora 30, as we dropped cockpit-kubernetes and thus don't try to install or run kubernetes on F30. So if we are the only ones complaining, I'm fine with leaving this closed. Thanks!