Bug 1823406 - /var/log/containers log symlinks are missing for some pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1814859
Depends On:
Blocks: 1872726 1878795
Reported: 2020-04-13 15:47 UTC by hgomes
Modified: 2023-12-15 17:40 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1872726 1878795
Environment:
Last Closed: 2020-07-13 17:27:22 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 24926 | 0 | None | closed | Bug 1823406: Upstream: 89160: Remove potentially unhealthy symlink only for dead containers | 2021-02-08 09:01:47 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:27:45 UTC

Description hgomes 2020-04-13 15:47:59 UTC
Description of problem:
/var/log/containers should contain a symlink for every pod and container log found in /var/log/pods.

Because the symlink is missing, fluentd does not tail the affected log files or send them to Kibana. The logs are still visible in the web console and through the crictl logs command.

There is a Kubernetes issue for a similar problem. Its status is closed, but many comments say it can still be reproduced, and we are seeing this behavior:
https://github.com/kubernetes/kubernetes/issues/52172

Recreating the symlinks causes fluentd to send the logs currently in the /var/log/pods/ path, but any logs lost after log rotation completes remain missing.
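
To find which logs are affected, something like the following sketch may help. It lists every *.log file under a pods directory that no symlink in a containers directory resolves to. The function name find_unlinked_logs and its two arguments are made up for illustration, and it assumes GNU find and readlink -f are available on the node. It only reports; it does not recreate anything.

```shell
#!/bin/bash
# find_unlinked_logs PODS_DIR CONTAINERS_DIR  (hypothetical helper, not part of any product)
# Prints each *.log under PODS_DIR that no symlink in CONTAINERS_DIR resolves to.
find_unlinked_logs() {
  pods_dir=$1
  containers_dir=$2

  # Resolve every symlink in the containers directory to its final target.
  linked=$(find "$containers_dir" -maxdepth 1 -type l -exec readlink -f {} \; | sort -u)

  # Resolve each pod log the same way (with Docker, the pod log is itself a symlink)
  # and report it when no containers symlink ends at the same file.
  find "$pods_dir" -name '*.log' \( -type f -o -type l \) | sort | while read -r log
  do
    target=$(readlink -f "$log")
    if ! printf '%s\n' "$linked" | grep -Fxq -- "$target"
    then
      printf '%s\n' "$log"
    fi
  done
}
```

On a node this would be called as: find_unlinked_logs /var/log/pods /var/log/containers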

Version-Release number of selected component (if applicable):
atomic-openshift-node-3.11.161-1.git.0.4ccbe25.el7.x86_64
atomic-openshift-clients-3.11.161-1.git.0.4ccbe25.el7.x86_64
cri-o-1.11.16-0.5.dev.rhaos3.11.git3f89eba.el7.x86_64


Actual results:

The same happens with both Docker and CRI-O.

Expected results:
The symlinks should be created automatically and should not be lost.

Additional info:

Comment 6 Ted Yu 2020-05-15 20:43:58 UTC
Issue #52172 was reopened.
The race condition existed in 3.11

PR #89160 has been merged upstream.

My backport PR:
https://github.com/openshift/origin/pull/24926

Comment 9 Ryan Phillips 2020-05-26 14:25:43 UTC
*** Bug 1814859 has been marked as a duplicate of this bug. ***

Comment 11 MinLi 2020-05-28 08:53:31 UTC
Verified with version 4.5.0-0.nightly-2020-05-27-010813.

Ran the following script as a test and found no containers with a missing symlink:

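#!/bin/bash
# For every running container, check that /var/log/containers has a matching symlink.
crictl ps -q | while read -r line
do
  # Count entries in /var/log/containers whose name contains the container ID.
  if [ "$(find /var/log/containers -name "*$line*" | wc -l)" -eq 0 ]
  then
    # First 5 characters of the value on the cri-o.Name line.
    name=$(crictl inspect "$line" | grep "cri-o.Name" | cut -d ":" -f 2 | head -c 5)
    if [ "$name" != '"k8s_' ]
    then
      # Report the container that has no symlink.
      crictl inspect "$line" | grep "cri-o.Name"
    fi
  fi
done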

Comment 12 errata-xmlrpc 2020-07-13 17:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 13 Oscar Casal Sanchez 2020-08-05 11:44:47 UTC
Hello,

I can see that this Bugzilla was opened for OCP 3.11 and that it is fixed for OCP 4.5. Is this not going to be backported to OCP 3.11, where the bug was reported?

Regards,
Oscar

Comment 25 Seth Jennings 2020-09-30 20:56:22 UTC
The 4.4.z backport is here: https://bugzilla.redhat.com/show_bug.cgi?id=1878795

Comment 31 Red Hat Bugzilla 2023-09-15 00:30:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

