1569318 – cri-o updated to 1.10, but kubelet still on 1.9, rendering Kubernetes unusable after dnf update

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kubernetes
Sub Component:
Version:	27
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jan Chaloupka
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-19 03:41 UTC by Mike Cronce
Modified:	2018-05-16 13:05 UTC
CC List:	14 users
Fixed In Version:	kubernetes-1.10.1-0.fc27 kubernetes-1.10.1-0.fc28
Clone Of:
Environment:
Last Closed:	2018-05-11 01:49:07 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Mike Cronce 2018-04-19 03:41:51 UTC

Description of problem:
After a `dnf update` tonight, Kubernetes became unusable: the kubelet printed a few messages like the following before exiting:
    E0223 06:55:38.333733   24666 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unimplemented desc = unknown service runtime.RuntimeService

After a quick Google search, I found a GitHub issue (https://github.com/containerd/cri/issues/619) which states that cri-o 1.10 is not backward-compatible.

Version-Release number of selected component (if applicable):
1.10.0

How reproducible:
Very.

Steps to Reproduce:
1. Install and configure Kubernetes with cri-o as the container runtime; a single node is fine
2. Run `dnf update` to ensure the latest packages are installed
3. Observe that the kubelet does not start

Actual results:
Kubelet, when configured to use cri-o, does not start after a system update.

Expected results:
Kubelet starts after a system update.

Additional info:
Either updating the kubernetes-node package to contain kubelet 1.10 or downgrading cri-o to 1.9.x is required.
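A likely explanation for the "unknown service runtime.RuntimeService" error, as far as I understand it, is that Kubernetes 1.10 moved the CRI gRPC API to v1alpha2, so a 1.9 kubelet asks for a service name that a 1.10 runtime no longer serves. A rough way to catch this kind of skew is to compare the major.minor versions of the installed cri-o and kubelet packages. This Python sketch assumes the rule "cri-o and the kubelet must share major.minor", which is inferred from this report rather than from a documented policy; `major_minor` and `crio_matches_kubelet` are hypothetical helpers, and the inputs are expected to look like plain versions or `rpm -q` output (e.g. `cri-o-1.10.0-4.git623b502.fc27.x86_64`).

```python
import re
from typing import Tuple

# Hypothetical rule inferred from this bug: cri-o and the kubelet must share major.minor.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)")


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from the first N.N sequence in a version or rpm -q string,
    e.g. 'cri-o-1.10.0-4.git623b502.fc27.x86_64' -> (1, 10)."""
    match = _VERSION_RE.search(version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return int(match.group(1)), int(match.group(2))


def crio_matches_kubelet(crio_version: str, kubelet_version: str) -> bool:
    """True when cri-o and the kubelet are on the same major.minor release."""
    return major_minor(crio_version) == major_minor(kubelet_version)
```

For the package pair reported in comment 15, `crio_matches_kubelet("cri-o-1.10.0-4.git623b502.fc27.x86_64", "kubernetes-node-1.10.1-0.fc27.x86_64")` returns True, while a 1.10 cri-o against a 1.9 kubelet returns False. A packaging-level equivalent would be a Conflicts on mismatched cri-o versions, as suggested in comment 2.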

I did not set the Severity field as I don't know what the guidelines are, but I would consider this pretty critical.

Comment 2 Jan Pazdziora (Red Hat) 2018-04-19 05:44:00 UTC

Adding the Kubernetes maintainer to Cc, in case the correct course of action is either to upgrade Kubernetes to 1.10 to match the CRI-O version, or at least to add Conflicts against CRI-O versions that are too old or too new.

Comment 3 Daniel Walsh 2018-04-19 10:22:15 UTC

Well, I would prefer to update Kubernetes. But I guess we could be convinced to downgrade CRI-O.

Comment 4 Mike Cronce 2018-04-19 14:15:35 UTC

For what it's worth, I agree - definitely prefer updating to 1.10, but downgrading cri-o to 1.9.x is a fine consolation prize :)

Comment 5 Jan Chaloupka 2018-04-19 14:17:52 UTC

I am fine with bumping to 1.10. What about docker? Is it going to be fine with k8s 1.10?

Comment 6 Daniel Walsh 2018-04-20 10:47:25 UTC

It should be. We are not changing the version of docker, and Kubernetes supports docker-1.12 and docker-1.13.

Comment 7 Jan Chaloupka 2018-04-26 23:13:21 UTC

k8s f27 update to 1.10.1: https://bodhi.fedoraproject.org/updates/FEDORA-2018-16c8fdf9b8

Comment 8 Mike Cronce 2018-04-26 23:15:29 UTC

Taking a look at the bug linked in that update (https://bugzilla.redhat.com/show_bug.cgi?id=1572389), I think it's safe to close this as a duplicate of #1572389.

Not knowing if there's any process required around that, though, I'll leave it to the experts ;)

Comment 9 Jan Chaloupka 2018-04-26 23:16:32 UTC

Feel free to test f28 update as well: https://bodhi.fedoraproject.org/updates/FEDORA-2018-9b965c4eed

Comment 10 Jan Chaloupka 2018-04-26 23:23:14 UTC

Let's make this a Kubernetes bug so it gets resolved as part of the update.

Comment 11 Fedora Update System 2018-04-26 23:24:22 UTC

kubernetes-1.10.1-0.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-16c8fdf9b8

Comment 12 Fedora Update System 2018-04-26 23:24:36 UTC

kubernetes-1.10.1-0.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-9b965c4eed

Comment 13 Fedora Update System 2018-04-28 01:53:21 UTC

kubernetes-1.10.1-0.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-16c8fdf9b8

Comment 14 Fedora Update System 2018-04-28 04:07:10 UTC

kubernetes-1.10.1-0.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-9b965c4eed

Comment 15 Jan Pazdziora (Red Hat) 2018-04-30 10:46:27 UTC

The services startup now passes, with cri-o-1.10.0-4.git623b502.fc27.x86_64 and kubernetes-master-1.10.1-0.fc27.x86_64 and kubernetes-node-1.10.1-0.fc27.x86_64.

However, an attempt to create a simple pod like

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80

fails with

  Warning  FailedCreatePodSandBox  4s (x2 over 17s)  kubelet, 127.0.0.1  Failed create pod sandbox: rpc error: code = Unknown desc = cri-o configured with cgroupfs cgroup manager, but received systemd slice as parent: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podea659ebb_4c5c_11e8_9cbb_00215e258b54.slice

even though /etc/kubernetes/kubelet sets --cgroup-driver=systemd in

KUBELET_ARGS="--cgroup-driver=systemd --fail-swap-on=false --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-request-timeout=5m"

This setup used to work with 1.9. A setup with docker obviously does not have this problem.

Also, I've noticed that merely commenting out KUBE_ADMISSION_CONTROL in /etc/kubernetes/apiserver no longer works. In 1.9 that seemed to make admission control default to something fairly permissive (AlwaysAdmit, per https://kubernetes.io/docs/reference/generated/kube-apiserver/ ?); with 1.10, creating the pod fails with

Error from server (ServerTimeout): error when creating "test-nginx-pod.json": No API token found for service account "default", retry after the token is automatically created and added to the service account

Explicitly setting

KUBE_ADMISSION_CONTROL=--admission-control=AlwaysAdmit

in /etc/kubernetes/apiserver works. This problem is obviously present with docker-based setups as well.

I'm not sure if these issues are specific to the Fedora packaging of Kubernetes 1.10 and should be discussed here or perhaps in separate bugzilla, or whether it's something that ought to be brought to Kubernetes upstream.

Comment 16 Jan Chaloupka 2018-04-30 11:07:04 UTC

> cri-o configured with cgroupfs cgroup manager, but received systemd slice as parent: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podea659ebb_4c5c_11e8_9cbb_00215e258b54.slice

Hard to tell, but it looks like a cri-o bug to me rather than a Kubernetes bug.

> even if /etc/kubernetes/kubelet includes --cgroup-driver=systemd configuration in KUBELET_ARGS="--cgroup-driver=systemd --fail-swap-on=false --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-request-timeout=5m"

kubeadm sets `--cgroup-driver=systemd` by default. Anyway, this is an issue with the default kubelet configuration rather than a Kubernetes bug.

> Also, I've noticed that merely commenting out KUBE_ADMISSION_CONTROL in /etc/kubernetes/apiserver which in 1.9 seemed ...

Currently, it's impossible to deploy Kubernetes just by installing the Kubernetes RPMs and starting the services (not least because the kubelet no longer registers itself without a kubeconfig file). Additional configuration is needed, so this is probably more of a documentation issue.

> I'm not sure if these issues are specific to the Fedora packaging of Kubernetes 1.10 and should be discussed here or perhaps in separate bugzilla, or whether it's something that ought to be brought to Kubernetes upstream.

First, we should verify whether or not the issue is caused by cri-o/docker.

Comment 17 Mike Cronce 2018-04-30 13:08:58 UTC

If I remember correctly from setting up my home cluster, CRI-O defaults to cgroupfs for its cgroup driver, while the kubelet defaults to systemd. One or the other has to be switched. I don't remember whether this was documented anywhere, though; I might have just made the switch after getting an error like that.
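To spot this mismatch before a pod sandbox fails, one could compare the kubelet's `--cgroup-driver` flag with cri-o's cgroup manager setting. This Python sketch assumes the kubelet flags are given as the quoted value of KUBELET_ARGS from /etc/kubernetes/kubelet, and that crio.conf sets the cgroup manager with a line of the form `cgroup_manager = "..."` (an assumption about the crio.conf key). It uses a regex rather than a full TOML parser, and the helper names are hypothetical. A flag or key that is not set explicitly is reported as unknown (None), because the effective default depends on the packaging.

```python
import re
import shlex
from typing import Optional


def kubelet_cgroup_driver(kubelet_args: str) -> Optional[str]:
    """Return the --cgroup-driver value from a KUBELET_ARGS string, or None if absent.

    Accepts both "--cgroup-driver=systemd" and "--cgroup-driver systemd".
    """
    tokens = shlex.split(kubelet_args)
    for i, token in enumerate(tokens):
        if token.startswith("--cgroup-driver="):
            return token.split("=", 1)[1]
        if token == "--cgroup-driver" and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


# Matches an uncommented line such as: cgroup_manager = "cgroupfs"
# (a regex stand-in for a real TOML parser; the key name is an assumption)
_CGROUP_MANAGER_RE = re.compile(r'^[ \t]*cgroup_manager[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE)


def crio_cgroup_manager(crio_conf: str) -> Optional[str]:
    """Return the cgroup_manager value set in crio.conf text, or None if it is not set."""
    match = _CGROUP_MANAGER_RE.search(crio_conf)
    return match.group(1) if match else None


def cgroup_settings_match(kubelet_args: str, crio_conf: str) -> Optional[bool]:
    """True if both sides name the same cgroup driver, False if they differ,
    None if either side does not set it explicitly."""
    driver = kubelet_cgroup_driver(kubelet_args)
    manager = crio_cgroup_manager(crio_conf)
    if driver is None or manager is None:
        return None
    return driver == manager
```

With the KUBELET_ARGS from comment 15 and a crio.conf containing `cgroup_manager = "cgroupfs"`, `cgroup_settings_match` returns False, which corresponds to the FailedCreatePodSandBox error; switching either side so both say `systemd` (or both say `cgroupfs`) makes it return True.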

Comment 18 Jan Pazdziora (Red Hat) 2018-04-30 13:49:39 UTC

The cgroupfs setting is a change in the crio.conf defaults since cri-o 1.9; I believe Dan W. is fixing it in the new cri-o build.

As for deploying Kubernetes, yes, it would be good to have updated documentation and expectations. I have been using a kubeconfig in my testing since 1.9, as otherwise I was hitting bug 1549151.

Comment 19 Fedora Update System 2018-05-11 01:49:07 UTC

kubernetes-1.10.1-0.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2018-05-16 13:05:53 UTC

kubernetes-1.10.1-0.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
