Bug 1422511 - [3.4] Eviction manager fails to list docker images
Summary: [3.4] Eviction manager fails to list docker images
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 3.4.z
Assignee: Jhon Honce
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks: OSOPS_V3
 
Reported: 2017-02-15 12:51 UTC by Marek Mahut
Modified: 2023-09-14 03:53 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-22 19:08:04 UTC
Target Upstream Version:
Embargoed:


Attachments
Script to dump memory profiles (1.33 KB, text/plain), 2017-03-20 19:08 UTC, Jhon Honce

Description Marek Mahut 2017-02-15 12:51:32 UTC
Description of problem:

We have implemented eviction on one of our clusters, but it seems to fail to list the docker images when it should kick in. Could the load already be too high for docker to list images, causing the eviction manager to fail?

Logs and configuration files are attached.


Version-Release number of selected component (if applicable):

oc v3.4.1.2
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Comment 4 Marek Mahut 2017-02-15 13:26:45 UTC
The error in question is:

 Feb 15 11:49:47 ip-172-31-8-225.ec2.internal atomic-openshift-node[945]: E0215 11:49:47.151424     945 eviction_manager.go:162] eviction manager: unexpected err: failed ImageStats: failed to list docker images - operation timeout: context deadline exceeded

Comment 5 Derek Carr 2017-02-15 22:30:31 UTC
For context, the general flow is as follows:

1. The eviction manager runs every 10s to verify the state of the node.
2. At each interval, it invokes the kubelet /stats/summary API endpoint to collect information on the signals it tracks.
3. To satisfy that request, the kubelet asks the container runtime to ListImages to determine the total storage in bytes used by images.
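
As a rough command-line approximation of step 3 (the kubelet calls the docker API directly rather than shelling out, so this only approximates the cost of listing images):

# time an image list on the node itself to see whether the runtime is slow to respond
time docker images --quiet > /dev/null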

The request to the underlying container runtime to list images is made via a client whose timeout is configured by the kubelet argument --runtime-request-timeout. It should default to 2m0s, and the supplied configuration does not appear to override it.
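
For reference, a minimal sketch of overriding that timeout via node-config.yaml, assuming the stock OCP 3.x kubeletArguments format (the 4m value is only an example for diagnosis, not a recommendation):

kubeletArguments:
  runtime-request-timeout:
  - "4m"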

From the log output, it appears that this timeout was reached 4 times over the course of 8 hrs. If the log verbosity were set to "-v 3", I would be able to tell whether the operation ever succeeded.

For the node in question, are you able to do the following:

$ oc get --raw /api/v1/nodes/<nodename>/proxy/stats/summary

And not hit the timeout?
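
If the call does succeed, the image filesystem figures can be pulled out of the response; a sketch assuming jq is available and that this kubelet's summary schema exposes runtime.imageFs (field names may differ on 1.4-era kubelets):

# extract the image filesystem stats the eviction manager depends on
oc get --raw /api/v1/nodes/<nodename>/proxy/stats/summary | jq '.node.runtime.imageFs'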

Comment 6 Derek Carr 2017-02-15 22:44:01 UTC
It may also be possible that the call to list images is being impacted by excessive invocations of thin_ls, which was fixed in https://github.com/openshift/ose/pull/604; the fix is not available in v3.4.1.2 but does appear to be available in v3.4.1.7.

Is it possible to try and reproduce with the later version so we can rule out thin_ls causing the delay?

Comment 7 Marek Mahut 2017-02-20 17:25:33 UTC
Derek, attaching the increased-verbosity logs from two nodes that have been running over memory during the weekend. The upgrade in question is planned, but we do not have a date yet.

Comment 11 Marek Mahut 2017-03-07 08:50:31 UTC
We have upgraded to v3.4.1.8, but we still see the problem. Logs to follow.

Comment 20 Derek Carr 2017-03-09 23:26:24 UTC
For awareness, cAdvisor/kubelet does the following every 10s to calculate memory.available for the node.

#!/usr/bin/env bash

# This script reproduces what kubelet/cAdvisor does
# to calculate memory.available relative to root cgroup.
# The major change is that it excludes total_inactive_file memory.

# current memory usage
memory_capacity_in_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(awk '/total_inactive_file/ {print $2}' /sys/fs/cgroup/memory/memory.stat)
memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))

echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"

I would like to do the following:

1. know the eviction configuration
2. set a --hard-threshold that will trigger before your alerts
3. update your alerting to use the memory.available_in_bytes as defined above
4. note that eviction runs every ~10s
5. note that skew with cAdvisor's stats collection could leave eviction's view stale by up to an additional ~10s
6. ideally, get new rpms that contain https://github.com/openshift/ose/pull/648 to help us know whether eviction is missing alerting windows, based on its logged observations of the system.
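
For item 3, a minimal sketch of extracting the value for an alerting agent, assuming the script above is saved as /usr/local/bin/memory-available.sh (a hypothetical path; a Zabbix item could wrap this pipeline):

# print only the raw byte count from the script's labeled output
/usr/local/bin/memory-available.sh | awk '$1 == "memory.available_in_bytes" {print $2}'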

Comment 21 Derek Carr 2017-03-09 23:47:58 UTC
Updated script to handle the case where usage is less than total_inactive_file (otherwise the working set would go negative).

#!/usr/bin/env bash

# This script reproduces what kubelet/cAdvisor does
# to calculate memory.available relative to root cgroup.
# The major change is that it excludes total_inactive_file memory.

# current memory usage
memory_capacity_in_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(awk '/total_inactive_file/ {print $2}' /sys/fs/cgroup/memory/memory.stat)

# clamp the working set at zero when usage is below total_inactive_file
if [ "$memory_usage_in_bytes" -lt "$memory_total_inactive_file" ]; then
    memory_working_set=0
else
    memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
fi

memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
memory_available_in_kb=$((memory_available_in_bytes / 1024))
memory_available_in_mb=$((memory_available_in_kb / 1024))

echo "memory.capacity_in_bytes $memory_capacity_in_bytes"
echo "memory.usage_in_bytes $memory_usage_in_bytes"
echo "memory.total_inactive_file $memory_total_inactive_file"
echo "memory.working_set $memory_working_set"
echo "memory.available_in_bytes $memory_available_in_bytes"
echo "memory.available_in_kb $memory_available_in_kb"
echo "memory.available_in_mb $memory_available_in_mb"

Comment 22 Marek Mahut 2017-03-10 15:18:54 UTC
(In reply to Derek Carr from comment #20)
(...)
> I would like to do the following:
> 
> 1. know the eviction configuration

  system-reserved:
  - cpu=500m,memory=1Gi
  eviction-soft:
  - memory.available<1Gi
  eviction-soft-grace-period:
  - memory.available=1m

> 2. set a --hard-threshold that will trigger before your alerts

I can set the hard threshold to 1G and soft to 2G. Ok?
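
For reference, a sketch of how that proposal would look under kubeletArguments in node-config.yaml, in the same format as the snippet above (values taken from the proposal, not a confirmed recommendation):

  eviction-hard:
  - memory.available<1Gi
  eviction-soft:
  - memory.available<2Gi
  eviction-soft-grace-period:
  - memory.available=1m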

> 3. update your alerting to use the memory.available_in_bytes as defined above

Will investigate how to do this, as it looks like Zabbix only reports memory figures that include inactive file pages.

I ran your script on one of the nodes with low memory and high load (load average 25+), and it reported memory.available_in_mb as 1452, but we still saw a performance impact.

Comment 37 Jhon Honce 2017-03-20 19:08:15 UTC
Created attachment 1264831 [details]
Script to dump memory profiles

Requires socat and curl to be installed.

Optionally uses: jq, the Go tools.
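
The attachment itself is not reproduced here; as a hedged sketch of the general technique it likely uses (assuming the daemon runs with debug enabled so its API exposes pprof, and that the daemon binary is /usr/bin/dockerd):

# bridge the docker unix socket to a local TCP port
socat TCP-LISTEN:2375,reuseaddr,fork UNIX-CONNECT:/var/run/docker.sock &
# fetch a heap profile from the daemon's pprof endpoint
curl -s http://127.0.0.1:2375/debug/pprof/heap -o heap.pprof
# inspect it with the Go tools, using the daemon binary for symbols
go tool pprof /usr/bin/dockerd heap.pprof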

Comment 40 Jhon Honce 2017-06-14 15:38:42 UTC
Can this be reproduced with the json-file log driver instead of journald?

Comment 41 Eric Paris 2017-06-14 15:51:15 UTC
Production clusters are unable to change the log driver. json-file is Operations' log driver of choice for the foreseeable future. Any problems found with the json-file logger must be resolved.

Comment 42 Jhon Honce 2017-06-14 16:20:41 UTC
Please provide a core file from the docker daemon during an event. Thanks.

Comment 45 Lisa Seelye 2019-01-16 16:34:57 UTC
Hi folks,

This has come up in a ticket my team is working on, and we'd like to ask about the status of this issue. Is it still outstanding?

-lisa

Comment 49 Red Hat Bugzilla 2023-09-14 03:53:41 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

