Bug 1980437
| Summary: | High memory usage by crio.service in worker nodes | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexey Kazakov <alkazako> |
| Component: | Release | Assignee: | Peter Hunt <pehunt> |
| Status: | CLOSED DUPLICATE | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.7 | CC: | aos-bugs, dawilson, dmunneor, dofinn, jokerman, kgordeev, mharri, mhofmann, mpatel, mrobson, pehunt, sgrunert, ssonigra, wking |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-01 16:18:26 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description (Alexey Kazakov, 2021-07-08 16:31:53 UTC)
I am also interested in the metrics for the number of bytes pulled during image pulls as time goes on. Specifically, it would be interesting to see how `container_memory_working_set_bytes` correlates with the `crio_image_pulls_by_name` metrics. Are you able to get that information for me?

Created attachment 1799755 [details]
container_memory_working_set_bytes{id=~"/system.slice/.*",node="ip-10-0-228-21.us-east-2.compute.internal"}
I've added the screenshot from Prometheus for:
container_memory_working_set_bytes{id=~"/system.slice/.*",node="ip-10-0-228-21.us-east-2.compute.internal"}
I don't see any crio_image_pulls_by_name metric. I must be doing something wrong. Can you assist me with this, please?
Ah, the metric is actually named `container_runtime_crio_image_pulls_by_name`. We may not be scraping it with Prometheus, as I cannot see it in the dashboard of a node I have up. If not, you can ssh to the node and grab the current value with `curl localhost:9537/metrics`.

I think it is pretty likely it's due to multiple concurrent pulls bumping up cri-o's rss. Note the following script:

```
grep '_digest{' /tmp/mozilla_pehunt0/crio-metrics.out | grep -o 'size=".*"' | sed 's/"/ /g' | awk '{ printf $2"\n" }' > /tmp/bytes
paste -sd+ /tmp/bytes | bc
```

I get 385648780939 (385 GB). This number is the sum of the sizes of all images that node has pulled. We have a couple of options:
- periodically force a Go GC (which may be inefficient)
- set the GC threshold lower so Go GCs sooner
- cap the number of concurrent pulls to limit how much memory cri-o needs at once
- bump the system reserved for nodes where we expect tons of image pulls

The last option, "bump the system reserved for nodes", is what we could do right now to stabilize our nodes, right? We have a ticket for the OSD SRE folks for that: https://issues.redhat.com/browse/OHSS-5120. Do you think setting it to 4 GB would be reasonable (instead of the current 1 GB)? Our nodes are open to our users and we have tons of users starting different pods. So, yes, there can be a big number of concurrent pulls.

It's hard to tell what's "reasonable", as it's hard to tell the max number of concurrent pulls (and how much memory they'd consume). I would hope we'd stay under it, but it is hard to tell given how much it's ballooned already. I do believe we'll want some sort of cri-o fix as well.

Created attachment 1799782 [details]
System memory usage for the last two weeks
I'm attaching the system memory usage for this node for the last two weeks. You can see how it was growing.
And sure. We would love to get something fixed on the cri-o side, but meanwhile we need to stabilize our nodes ASAP, since they seem to have been choking for a week already.
So, I'm thinking about increasing the reservation and maybe restarting the nodes, so it provides some temporary relief while we wait for a more robust solution on the cri-o/OCP end.
Any thoughts?
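For reference, raising the system reservation on OpenShift worker nodes is typically expressed as a KubeletConfig custom resource. This is only a sketch: the resource name and the `custom-kubelet` label (which must also be applied to the worker MachineConfigPool) are assumptions, not taken from this bug.

```yaml
# Hypothetical KubeletConfig raising systemReserved memory to 4Gi.
# Assumes the worker MachineConfigPool has been labeled
# custom-kubelet: set-system-reserved.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-system-reserved
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: set-system-reserved
  kubeletConfig:
    systemReserved:
      memory: 4Gi
```

Applying a KubeletConfig rolls out a new rendered MachineConfig, so nodes in the pool restart as the change lands.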
That sounds great. If you could also periodically check the total image pull size (as shown above) along with the cri-o RSS to get a correlation, I'd love to have that information. FYI, you should be able to just restart cri-o and it'll relinquish the hoarded RSS; a full node reboot shouldn't be necessary.

How do I restart crio? Just "systemctl start crio"? Or should I take care of something else?

"systemctl restart crio", but yes, essentially :)

Created attachment 1799788 [details]
crio working set after restart
Created attachment 1799789 [details]
crio rss after restart
Just for the record: the system memory usage dropped significantly after restarting crio (see the attached screenshots above), but we still see network issues, at least in pods on this node :(((

Another thing to try is forcing a golang GC by sending SIGUSR2 to the crio process. I am interested to see how much that helps. Check RSS before and after:

```
grep -i rss /proc/$(pidof crio)/status
kill -USR2 $(pidof crio)
grep -i rss /proc/$(pidof crio)/status
```

We just updated our clusters to 4.7.19, so we will have to wait for a while until the crio memory usage builds up again. Then we will try the SIGUSR2 signal.

> I think it is pretty likely it's due to multiple concurrent pulls bumping up cri-o's rss.
Does CRI-O hold image layers in memory? I would have expected it to stream them from the network right onto the disk, while teeing the bytes into a hasher to confirm that we got the expected digest.
I think we can consider this a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2000092; feel free to reopen if the issue is not resolved.

*** This bug has been marked as a duplicate of bug 2000092 ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days