Bug 1934867

Summary: Static pod logs (and others?) are not being separated by container restarts
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Node
Assignee: Ryan Phillips <rphillips>
Node sub component: Kubelet
QA Contact: Weinan Liu <weinliu>
Status: CLOSED DUPLICATE
Severity: high
Priority: high
CC: aos-bugs, nagrawal, rphillips, tsweeney, vjaypurk
Version: 4.8
Last Closed: 2021-06-01 16:55:48 UTC
Type: Bug

Description Clayton Coleman 2021-03-03 22:03:49 UTC
In an e2e test I noticed that the MCD (machine-config daemon) caused a reboot, and after the reboot the new container (a new start) shared the same log directory. That is a bug: whenever the kubelet starts a new container, its logs should be distinct. That applies to static, DaemonSet, and regular pods alike, and to reboots, crashed containers, and deliberate container stops by an admin killing the process. The container ID must change every time a container is started; this is part of Kube conformance. For example, if PID 1 dies, the value in status.containerID (e.g. cri-o://922f57399fddb33cbeeec03203fa508f00553587a8ea711a3c0bd9b879ff1057) must be different the next time. Every time a new process is started (which means the previous process died), we must get a new container ID and a new set of logs.
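For reference, the status.containerID value quoted above has the form `<runtime>://<id>`. A minimal sketch, assuming that format, of splitting the value and checking that a new start reported a new ID; `parse_container_id` and `is_new_container` are hypothetical helpers for illustration, not Kubernetes APIs.

```python
from typing import Tuple


def parse_container_id(container_id: str) -> Tuple[str, str]:
    """Split a status.containerID value such as 'cri-o://922f...' into (runtime, id)."""
    runtime, sep, cid = container_id.partition("://")
    if not sep or not runtime or not cid:
        raise ValueError(f"unexpected containerID format: {container_id!r}")
    return runtime, cid


def is_new_container(previous: str, current: str) -> bool:
    """True if the current start reports a different container ID than the previous one."""
    return parse_container_id(previous)[1] != parse_container_id(current)[1]


example = "cri-o://922f57399fddb33cbeeec03203fa508f00553587a8ea711a3c0bd9b879ff1057"
runtime, cid = parse_container_id(example)
```

Under the requirement above, `is_new_container(old, new)` should be true for every pair of consecutive starts of the same container, regardless of why the previous process died.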

This is high severity because it means some pods are not behaving the same as others, and we are not conformant. We need an e2e test in Kube that ensures that if the container process of a static pod is killed, the next time it starts it has a different containerID and its logs start out empty, and we need CRI-O to behave that way. I would expect to make this a conformance test.
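The check such an e2e test would make can be sketched as a pure function over two observations of the same static-pod container: one taken before the process is killed and one after it restarts. This is an illustration only, assuming each observation records the containerID and the text of the log file for that start; `ContainerObservation` and `restart_violations` are hypothetical names, not part of the Kubernetes e2e framework. Rather than requiring a literally empty log (which races with the new process's own output), the sketch treats a new log that begins with the previous start's output as evidence that the old file was reused.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContainerObservation:
    """Hypothetical snapshot of one container start, as an e2e test might record it."""
    container_id: str  # value of status.containerID, e.g. "cri-o://<id>"
    log_text: str      # contents of the log file for this start


def restart_violations(before: ContainerObservation,
                       after: ContainerObservation) -> List[str]:
    """Return the ways a restart breaks the expected behavior (empty list means OK)."""
    problems = []
    if after.container_id == before.container_id:
        problems.append("containerID did not change after the process was killed")
    # If the same log file was reused, the new start's log begins with the old content.
    if before.log_text and after.log_text.startswith(before.log_text):
        problems.append("log file still contains output from the previous start")
    return problems
```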

The feature behavior (this is nice!) can be split out and discussed separately.

Comment 1 Ryan Phillips 2021-03-04 01:04:36 UTC
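I tested this a number of times with local-up-cluster. Rotation to a new log file on each container start works correctly (with and without the UID) with the example pod below, and moving the static pod manifest in and out of the static pod directory also works correctly. The log file name is based on the restartCount (0.log, 1.log, ...). The problem stems from crio-wipe being run and wiping the restartCount, so after a reboot the count starts over and a new container can be given a log file name that was already used.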

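apiVersion: v1
kind: Pod
metadata:
  name: test-pod-1
  namespace: default
  # Optional: the test was run both with and without an explicit UID.
  uid: 8673c22a-27d6-4a2e-aedd-bc791bf665cf
spec:
  containers:
  - name: test-hello
    image: fedora
    # Print the date every 5 seconds so each container start produces log output.
    command:
    - /bin/sh
    - -c
    - |
      while true ; do
        date
        sleep 5
      done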
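To make the restartCount-based naming from Comment 1 concrete: the kubelet writes each container start to a file named after its restart count, under a per-pod directory (on a node, typically `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<restartCount>.log`). A minimal sketch, assuming that layout, of why resetting the restart count (as happens here when crio-wipe runs) makes a new start land on a file name that was already used. `log_path` and `simulate_starts` are hypothetical helpers, and the names and UID are taken from the example pod above.

```python
import posixpath

POD_LOGS_ROOT = "/var/log/pods"  # assumed kubelet default log root


def log_path(namespace: str, pod: str, uid: str, container: str, restart_count: int) -> str:
    """Hypothetical helper mirroring the per-start log file layout described above."""
    pod_dir = "_".join([namespace, pod, uid])
    return posixpath.join(POD_LOGS_ROOT, pod_dir, container, f"{restart_count}.log")


def simulate_starts(restart_counts):
    """Return the log path used by each start; the counts stand in for what the runtime reports."""
    uid = "8673c22a-27d6-4a2e-aedd-bc791bf665cf"
    return [log_path("default", "test-pod-1", uid, "test-hello", n) for n in restart_counts]


# Normal restarts: the count increments, so every start gets its own file.
normal = simulate_starts([0, 1, 2])
# A reboot where the runtime's container history is wiped: the count starts over at 0.
wiped = simulate_starts([0, 1, 0])

print(len(set(normal)) == len(normal))  # True: no collisions
print(len(set(wiped)) == len(wiped))    # False: the third start reuses 0.log
```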

Comment 2 Ryan Phillips 2021-03-04 02:52:59 UTC
Potential PR: https://github.com/kubernetes/kubernetes/pull/99748

Comment 7 Ryan Phillips 2021-06-01 16:55:48 UTC
Fixed and merged via a separate BZ.

*** This bug has been marked as a duplicate of bug 1956898 ***

Comment 9 Red Hat Bugzilla 2023-09-15 01:02:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days