Bug 1943700 - Rancher [rke] clusters unable to start due to frozen container runtime in docker-1.13.1-203 and above
Keywords:
Status: CLOSED DUPLICATE of bug 1896883
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Daniel Walsh
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-26 20:36 UTC by Robb Manes
Modified: 2024-06-14 01:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-26 22:18:14 UTC
Target Upstream Version:
Embargoed:



Description Robb Manes 2021-03-26 20:36:49 UTC
Description of problem:
=======================
We're seeing reports of Rancher installs failing with updated Red Hat packages of Docker.  It appears to be stuck launching the kubelet on the node, and I can easily reproduce it.

This is mentioned here, as well:

https://github.com/rancher/rke/issues/2401

I'm debugging it myself currently as well.

Version-Release number of selected component (if applicable):
=============================================================
# uname -a
Linux rhel7 3.10.0-1160.el7.x86_64 #1 SMP Tue Aug 18 14:50:17 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

# rpm -q docker
docker-1.13.1-204.git0be3e21.el7_9.x86_64

(I also confirmed on docker-1.13.1-203)

How reproducible:
=================
Every time.

Steps to Reproduce:
===================
Download rke v1.2.4-rc9:

# wget https://github.com/rancher/rke/releases/download/v1.2.4-rc9/rke_linux-amd64 -O /bin/rke

# chmod +x /bin/rke

# rke --version
rke version v1.2.4-rc9

Create an "rke" user, add them to the docker group, and set a password.

# useradd rke

# usermod -a -G docker rke

# passwd rke

Become the "rke" user and ensure you can run Docker commands.

# sudo su - rke

$ docker ps -a

Generate SSH keys for the "rke" user and copy them:

$ ssh-keygen -t rsa

$ ssh-copy-id rke@localhost

Run `rke config`. I used the values below, and it automatically creates a `cluster.yml` file:

$ rke config
WARN[0000] This is not an officially supported version (v1.2.4-rc9) of RKE. Please download the latest official release at https://github.com/rancher/rke/releases
[+] Cluster Level SSH Private Key Path [~/.ssh/id_rsa]:
[+] Number of Hosts [1]:
[+] SSH Address of host (1) [none]: localhost
[+] SSH Port of host (1) [22]:
[+] SSH Private Key Path of host (localhost) [none]:
[-] You have entered empty SSH key path, trying fetch from SSH key parameter
[+] SSH Private Key of host (localhost) [none]:
[-] You have entered empty SSH key, defaulting to cluster level SSH key: ~/.ssh/id_rsa
[+] SSH User of host (localhost) [ubuntu]: rke
[+] Is host (localhost) a Control Plane host (y/n)? [y]:
[+] Is host (localhost) a Worker host (y/n)? [n]: y
[+] Is host (localhost) an etcd host (y/n)? [n]: y
[+] Override Hostname of host (localhost) [none]:
[+] Internal IP of host (localhost) [none]:
[+] Docker socket path on host (localhost) [/var/run/docker.sock]:
[+] Network Plugin Type (flannel, calico, weave, canal, aci) [canal]: flannel
[+] Authentication Strategy [x509]:
[+] Authorization Mode (rbac, none) [rbac]: none
[+] Kubernetes Docker image [rancher/hyperkube:v1.19.6-rancher1]:
[+] Cluster domain [cluster.local]:
[+] Service Cluster IP Range [10.43.0.0/16]:
[+] Enable PodSecurityPolicy [n]:
[+] Cluster Network CIDR [10.42.0.0/16]:
[+] Cluster DNS Service IP [10.43.0.10]:
[+] Add addon manifest URLs or YAML files [no]:

Bring the cluster online:

$ rke up
WARN[0000] This is not an officially supported version (v1.2.4-rc9) of RKE. Please download the latest official release at https://github.com/rancher/rke/releases
INFO[0000] Running RKE version: v1.2.4-rc9
INFO[0000] Initiating Kubernetes cluster
- - - - 8< - - - -
INFO[0140] Image [rancher/hyperkube:v1.19.6-rancher1] exists on host [localhost]
INFO[0140] Starting container [kubelet] on host [localhost], try #1
WARN[0190] Can't start Docker container [kubelet] on host [localhost]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0190] Starting container [kubelet] on host [localhost], try #2
WARN[0240] Can't start Docker container [kubelet] on host [localhost]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
INFO[0240] Starting container [kubelet] on host [localhost], try #3

Observe that docker is hung during this step:

[root@rhel7 ~]# docker ps -a
(hung forever)
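
To check for the hang without tying up a terminal, something like the following may help. This is only a sketch: it assumes GNU coreutils `timeout` is installed (it exits with status 124 when the time limit is reached), and the helper name `probe_cmd` and the 10-second limit in the example are my own choices, not anything from rke or docker.

```shell
# Hypothetical helper: run a command under a time limit and report
# whether it finished, failed, or had to be killed for hanging.
# Assumes GNU coreutils `timeout`, which exits 124 on timeout.
probe_cmd() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"      # time limit reached, command was terminated
    elif [ "$rc" -eq 0 ]; then
        echo "ok"        # command returned normally
    else
        echo "failed"    # command returned an error on its own
    fi
}

# Example: probe_cmd 10 docker ps -a
```

On an affected host I would expect the example call to report "hung" while the kubelet container is being started, but I have not captured that output here.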

Actual results:
Docker freezes while the Rancher hyperkube `kubelet` container is coming online.

Expected results:
Docker should not freeze/hang.

Additional info:
Downgrading to docker-1.13.1-162.git64e9980.el7_8 resolves the issue.
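
For checking a host against the builds named in this report, a rough sketch follows. It only parses the release number out of the `rpm -q docker` string and compares it with 203. The helper name `docker_build_suspect` and its output labels are hypothetical. Builds between -162 and -203 were not tested here, and a later build that carries a fix would still be flagged, so treat the result as a hint only.

```shell
# Hypothetical helper: given a package string such as
# "docker-1.13.1-204.git0be3e21.el7_9.x86_64", flag docker 1.13.1
# builds whose release number is 203 or higher.
docker_build_suspect() {
    nvr="$1"
    case "$nvr" in
        docker-1.13.1-*) ;;
        *) echo "unknown"; return ;;
    esac
    rel="${nvr#docker-1.13.1-}"   # e.g. "204.git0be3e21.el7_9.x86_64"
    rel="${rel%%.*}"              # e.g. "204"
    case "$rel" in
        ''|*[!0-9]*) echo "unknown"; return ;;   # release is not a plain number
    esac
    if [ "$rel" -ge 203 ]; then
        echo "suspect"
    else
        echo "not-listed"
    fi
}

# Example: docker_build_suspect "$(rpm -q docker)"
```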

Comment 2 Robb Manes 2021-03-26 22:18:14 UTC
I can confirm the above issue is resolved by the patch presented in https://bugzilla.redhat.com/show_bug.cgi?id=1896883#c17, and am closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1896883 ***

