Bug 1664234 - /var/log monitoring for node disk pressure when mounted with individual mount option.
Keywords:
Status: CLOSED DUPLICATE of bug 1574866
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Assignee: Seth Jennings
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-08 07:23 UTC by Sudarshan Chaudhari
Modified: 2019-07-22 20:15 UTC
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-22 20:15:16 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3428961 0 None None None 2019-01-08 07:28:44 UTC

Description Sudarshan Chaudhari 2019-01-08 07:23:23 UTC
Description of problem:

Pods scheduled on nodes that have /var/log mounted on a separate disk fail in the "ContainerCreating" state because /var/log is 100% full.

##############################################################################################################
-sh-4.2$ oc get pods
NAME                       READY     STATUS              RESTARTS   AGE
docker-registry-19-csclh   0/1       ContainerCreating   0          1h
registry-console-4-sfrc8   1/1       Running             0          1h
router-2-n3d4d             0/1       ContainerCreating   0          1h
router-2-ppcjm             1/1       Running             0          1h

-sh-4.2$ oc describe pod router-2-n3d4d
Events:
  FirstSeen     LastSeen        Count   From                                    SubObjectPath   Type            Reason          Message
  ---------     --------        -----   ----                                    -------------   --------        ------          -------
  1m            1m              1       default-scheduler                                       Normal          Scheduled       Successfully assigned router-2-n3d4d to infra1.example.com
  1m            5s              9       kubelet, infra1.example.com                     Warning         FailedSync      Error syncing pod

-sh-4.2$ oc describe node -l region=infra
Name:                   infra1.example.com
. . .
Phase:
Conditions:
  Type                  Status  LastHeartbeatTime                       LastTransitionTime                      Reason                          Message
  ----                  ------  -----------------                       ------------------                      ------                          -------
  OutOfDisk             False   Tue, 20 Mar 2018 10:44:23 -0300         Mon, 19 Mar 2018 11:20:42 -0300         KubeletHasSufficientDisk        kubelet has sufficient disk space available   <-----------[1]
  MemoryPressure        False   Tue, 20 Mar 2018 10:44:23 -0300         Mon, 19 Mar 2018 11:20:42 -0300         KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure          False   Tue, 20 Mar 2018 10:44:23 -0300         Mon, 19 Mar 2018 11:20:42 -0300         KubeletHasNoDiskPressure        kubelet has no disk pressure
  Ready                 True    Tue, 20 Mar 2018 10:44:23 -0300         Mon, 19 Mar 2018 11:20:52 -0300         KubeletReady                    kubelet is posting ready status

Looking at the logs from /sos-command/logs/journalctl_--no-pager_--all_--boot_--output_verbose:
~~~SNIP~~~
 MESSAGE=E0321 11:59:06.703800   91791 kuberuntime_manager.go:619] createPodSandbox for pod "router-2-n3d4d_default(fdd4f587-2d13-11e8-9f0e-00505682f47f)" failed: mkdir /var/log/pods/fdd4f587-2d13-11e8-9f0e-00505682f47f: no space left on device
 MESSAGE=E0321 11:59:06.703825   91791 pod_workers.go:182] Error syncing pod fdd4f587-2d13-11e8-9f0e-00505682f47f ("router-2-n3d4d_default(fdd4f587-2d13-11e8-9f0e-00505682f47f)"), skipping: failed to "CreatePodSandbox" for "router-2-n3d4d_default(fdd4f587-2d13-11e8-9f0e-00505682f47f)" with CreatePodSandboxError: "Create pod log directory for pod \"router-2-n3d4d_default(fdd4f587-2d13-11e8-9f0e-00505682f47f)\" failed: mkdir /var/log/pods/fdd4f587-2d13-11e8-9f0e-00505682f47f: no space left on device"
~~~SNIP~~~

The output of the df command from /sos-command/filesys/df_-al also shows:
~~~
Filesystem                     1K-blocks    Used Available Use% Mounted on
rootfs                                 -       -         -    - /
.
.
/dev/mapper/rhel-tmp             9754624   35104   9719520   1% /tmp
/dev/mapper/rhel-var            19523584 1148144  18375440   6% /var
/dev/mapper/rhel-var_log         6825984 6825964        20 100% /var/log        <-------------------------- /var/log (which holds /var/log/pods) has no space
~~~
##############################################################################################################

Even though the individually mounted /var/log is 100% full, the node still reports the healthy conditions shown at [1] (OutOfDisk=False, DiskPressure=False).
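One likely explanation for this behavior: the kubelet's disk-pressure signals (nodefs/imagefs) are computed from the filesystem backing the kubelet root directory and the container runtime storage, so a separately mounted /var/log falls outside what the kubelet watches. Until that changes, such a mount has to be monitored out-of-band; a minimal sketch (the 90% threshold is an arbitrary example, not a recommended value):

```shell
# Out-of-band check, since the kubelet does not watch a dedicated
# /var/log mount: print any mount point at or above 90% usage.
df -P | awk 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= 90) print $6, use "%" }'
```

Run from cron or a monitoring agent, this would have flagged /var/log at 100% well before pod sandbox creation started failing.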


How reproducible:

Often

Steps to Reproduce:
1. mount /var/log with a separate partition
2. fill the /var/log mount point
3. try to schedule a pod on the node

Actual results:
Pods fail to run and remain stuck in ContainerCreating.

Expected results:
The node should have reported disk pressure (DiskPressure=True) when /var/log filled up, so the scheduler would avoid it.
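For reference, eviction thresholds in OCP 3.x are set via kubeletArguments in node-config.yaml, and the available signals (memory.available, nodefs.*, imagefs.*) offer no way to name an arbitrary mount such as /var/log, which is the gap this report describes. An illustrative stanza (values are examples, not recommendations):

```yaml
kubeletArguments:
  eviction-hard:
  - "memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%"
```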

Comment 2 Seth Jennings 2019-07-22 20:15:16 UTC

*** This bug has been marked as a duplicate of bug 1574866 ***

