+++ This bug was initially created as a clone of Bug #1827569 +++

Originally reported by Shekhar Berry during OCS performance analysis on Azure. The bug description of the cloned bug has been polished to communicate the issue wrt OCP. See original BZ 1827569 for full details and history.

Description of problem
======================

When one changes the cache configuration of an Azure disk attached to an Azure virtual machine which is hosting an OCP worker node, pods using this Azure disk lose access to the disk.

There are a couple of "known" Kubernetes issues about this:
https://github.com/kubernetes/kubernetes/issues/52345, and a KEP was opened but discontinued:
https://github.com/kubernetes/enhancements/issues/871.

Version-Release number of selected component
============================================

- OCP 4.5.0-0.nightly-2020-08-15-052753
- OCS 4.4.2

- OCP 4.5.0-0.nightly-2020-08-20-051434
- OCS 4.5.0-54.ci

How reproducible
================

100%

Steps to Reproduce
==================

1. Install an OCP cluster on Azure (with at least 3 worker nodes)
2. Install OCS (one OSD Azure disk will be attached to each worker)
3. Check that OCS is running fine (status is ok, all OCS pods are running)
4. In the Azure web console, locate the OSD Azure disk for each worker VM and set its **Host caching** from **Read-only** to **None**
5. Check the status of the OSD pods again

Actual results
==============

Two OSD pods out of 3 get stuck in CrashLoopBackOff state:

```
rook-ceph-osd-0-67db8b7b97-x6vlk    0/1   CrashLoopBackOff   6   23h
rook-ceph-osd-1-6cfd5dbfb6-wdpn8    1/1   Running            0   23h
rook-ceph-osd-2-7f78cc585c-4wvgg    0/1   CrashLoopBackOff   6   23h
```

To recover, a manual intervention is necessary.

Expected results
================

OSD pods are able to recover from a change of disk caching in Azure without getting stuck in CrashLoopBackOff state.

Additional info
===============

As analyzed by leseb in comment https://bugzilla.redhat.com/show_bug.cgi?id=1827569#c25:

Ok so the issue is the following:

1. disk /dev/sdd is used by the OSD and identified by major and minor "8, 48"
2. rook, in its init containers, copies the PVC device node onto the OSD location /var/lib/ceph/osd/ceph-0/block (so still identified as "8, 48")
3. the cache is changed
4. a new disk appears! basically the copied disk identifier "8, 48" does not exist anymore
5. now the disk is /dev/sde and is obviously different

Unfortunately, Kubernetes never re-runs the entire deployment, it only restarts the main container called "osd". So the OSD keeps trying to read /var/lib/ceph/osd/ceph-0/block, which points to nothing (an orphan fd, basically) and horribly fails forever. The problem is that Kubernetes never runs the full deployment; if it did, we would go through the init container sequence again.

I've found a couple of "known" Kubernetes issues about this:
https://github.com/kubernetes/kubernetes/issues/52345, and a KEP was opened but discontinued:
https://github.com/kubernetes/enhancements/issues/871.

So it looks like we don't have a good way to fix this now (from the OCS perspective).
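To illustrate the mismatch described above, a rough diagnostic sketch (assuming the default openshift-storage namespace and reusing the pod names from the output above; note that `oc rsh` into a crash-looping pod only works while the container happens to be up, and `<worker-node>` is a placeholder):

```
# Inside the failing OSD pod: the device node copied by the init container
# still reports the old major:minor (e.g. 8, 48 for the former /dev/sdd).
oc -n openshift-storage rsh rook-ceph-osd-0-67db8b7b97-x6vlk \
    ls -l /var/lib/ceph/osd/ceph-0/block

# On the worker node: after the cache change the disk comes back under a new
# name and major:minor (e.g. /dev/sde), so the copied node points at nothing.
oc debug node/<worker-node> -- chroot /host lsblk -o NAME,MAJ:MIN,SIZE
```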
Triaging this for now, but this is not a bug; it is a feature request.
This is working as designed. Closing... Init containers can be run again on failures and need to be idempotent. I do not believe this behavior will change.
Ryan, did you mean "init containers can NOT be run again on failures"? I don't understand why re-running them on failure wouldn't make them idempotent. Why can't we change this behavior? Could you please clarify? Thanks!
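For reference, the manual intervention mentioned in the description amounts to forcing the Deployments to recreate the stuck pods so the init container sequence runs again against the new device node. A hedged sketch only: the pod names and namespace are the examples from this bug, and the `app=rook-ceph-osd` label is the usual rook-ceph selector, which may differ on other clusters.

```
# Delete the stuck OSD pods; their Deployments recreate them and the init
# containers run again, picking up the disk under its new major:minor.
oc -n openshift-storage delete pod \
    rook-ceph-osd-0-67db8b7b97-x6vlk \
    rook-ceph-osd-2-7f78cc585c-4wvgg

# Confirm the OSDs come back up.
oc -n openshift-storage get pods -l app=rook-ceph-osd
```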
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days