+++ This bug was initially created as a clone of Bug #1827569 +++

Originally reported by Shekhar Berry during OCS performance analysis on Azure. The bug description of the cloned bug has been polished to communicate the issue wrt OCP. See original BZ 1827569 for full details and history.

Description of problem
======================

When one changes the cache configuration of an Azure disk attached to an Azure virtual machine which is hosting an OCP worker node, pods using this Azure disk lose access to the disk.

There are a couple of "known" Kubernetes issues about this:
https://github.com/kubernetes/kubernetes/issues/52345, and a KEP was opened but discontinued:
https://github.com/kubernetes/enhancements/issues/871.

Version-Release number of selected component
============================================

- OCP 4.5.0-0.nightly-2020-08-15-052753
- OCS 4.4.2

- OCP 4.5.0-0.nightly-2020-08-20-051434
- OCS 4.5.0-54.ci

How reproducible
================

100%

Steps to Reproduce
==================

1. Install an OCP cluster on Azure (with at least 3 worker nodes)
2. Install OCS (one OSD Azure disk will be attached to each worker)
3. Check that OCS is running fine (status is ok, all OCS pods are running)
4. In the Azure web console, locate the OSD Azure disk for each worker VM and set its **Host caching** from **Read-only** to **None**
5. Check the status of the OSD pods again

Actual results
==============

Two OSD pods out of 3 get stuck in CrashLoopBackOff state:

```
rook-ceph-osd-0-67db8b7b97-x6vlk    0/1   CrashLoopBackOff   6   23h
rook-ceph-osd-1-6cfd5dbfb6-wdpn8    1/1   Running            0   23h
rook-ceph-osd-2-7f78cc585c-4wvgg    0/1   CrashLoopBackOff   6   23h
```

To recover, a manual intervention is necessary.

Expected results
================

OSD pods are able to recover from a change of disk caching in Azure without getting stuck in CrashLoopBackOff state.

Additional info
===============

As analyzed by leseb in comment https://bugzilla.redhat.com/show_bug.cgi?id=1827569#c25:

Ok so the issue is the following:

1. disk /dev/sdd is used by the OSD and identified by major and minor "8, 48"
2. rook, in its init containers, copies the PVC device node onto the OSD location /var/lib/ceph/osd/ceph-0/block (so still identified as "8, 48")
3. the cache is changed
4. a new disk appears! basically the copied disk identifier "8, 48" does not exist anymore
5. now the disk is /dev/sde and is obviously different

Unfortunately, Kubernetes never re-runs the entire deployment, it only restarts the main container called "osd". So the OSD keeps trying to read /var/lib/ceph/osd/ceph-0/block, which points to nothing (an orphan fd, basically) and horribly fails forever. The problem is that Kubernetes never runs the full deployment; if it did, we would go through the init container sequence again.

I've found a couple of "known" Kubernetes issues about this:
https://github.com/kubernetes/kubernetes/issues/52345, and a KEP was opened but discontinued:
https://github.com/kubernetes/enhancements/issues/871.

So it looks like we don't have a good way to fix this now (from the OCS perspective).
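To illustrate the mismatch described above, a rough diagnostic sketch (assuming the default openshift-storage namespace and reusing the pod names from the output above; note that `oc rsh` into a crash-looping pod only works while the container happens to be up, and `<worker-node>` is a placeholder):

```
# Inside the failing OSD pod: the device node copied by the init container
# still reports the old major:minor (e.g. 8, 48 for the former /dev/sdd).
oc -n openshift-storage rsh rook-ceph-osd-0-67db8b7b97-x6vlk \
    ls -l /var/lib/ceph/osd/ceph-0/block

# On the worker node: after the cache change the disk comes back under a new
# name and major:minor (e.g. /dev/sde), so the copied node points at nothing.
oc debug node/<worker-node> -- chroot /host lsblk -o NAME,MAJ:MIN,SIZE
```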
Triaging this for now, but this is not a bug; it is a feature request.
This is working as designed. Closing... Init containers can be run again on failures and need to be idempotent. I do not believe this behavior will change.
Ryan, did you mean "init containers can NOT be run again on failures"? I don't understand why re-running them on failure wouldn't make them idempotent. Why can't we change this behavior? Could you please clarify? Thanks!
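For reference, the manual intervention mentioned in the description amounts to forcing the Deployments to recreate the stuck pods so the init container sequence runs again against the new device node. A hedged sketch only: the pod names and namespace are the examples from this bug, and the `app=rook-ceph-osd` label is the usual rook-ceph selector, which may differ on other clusters.

```
# Delete the stuck OSD pods; their Deployments recreate them and the init
# containers run again, picking up the disk under its new major:minor.
oc -n openshift-storage delete pod \
    rook-ceph-osd-0-67db8b7b97-x6vlk \
    rook-ceph-osd-2-7f78cc585c-4wvgg

# Confirm the OSDs come back up.
oc -n openshift-storage get pods -l app=rook-ceph-osd
```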
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days