+++ This bug was initially created as a clone of Bug #1894146 +++

Description of problem:

Memory consumption by metal3 is pretty high, at least when we compare it to some other things in the deployment. The metal3 pod as a whole uses 880+MiB, of which the ironic-api container is using 530MiB. Inside the container, I see

# ps -o pid,user,%mem,rss,command -ax
  PID USER     %MEM   RSS COMMAND
    1 root      0.5 86988 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   40 root      0.5 92804 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   41 root      0.5 91944 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   42 root      0.5 92768 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   43 root      0.5 92072 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   44 root      0.5 96096 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   45 root      0.5 92764 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   46 root      0.5 92244 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   47 root      0.5 91452 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   61 root      0.0  3716 bash
  109 root      0.0  3784 ps -o pid,user,%mem,rss,command -ax

So we have 9 API server processes (1 main process and 8 copies managed by oslo.service, as configured in our ironic.conf), each of which is around 90MiB. If we think about what's going to be talking to the server, it would be at most 3 calls at a time from the baremetal-operator (one per reconcile thread) and then the agents during provisioning.

Version-Release number of selected component (if applicable):
4.6+

How reproducible:
Always

Steps to Reproduce:
1. oc exec -it ${name-of-metal3-pod} -- bash
2. ps -o pid,user,%mem,rss,command -ax

Actual results:
9 processes

Expected results:
Fewer than 9 processes

Additional info:

--- Additional comment from Derek Higgins on 2020-11-03 17:22:39 UTC ---

(In reply to Doug Hellmann from comment #0)
> So we have 9 API server processes (1 main process and 8 copies managed by
> oslo.service, as configured in our ironic.conf) each of which is around
> 90MiB.

Looks like this is set based on the number of processors:

configure-ironic.sh:export NUMWORKERS=$(( NUMPROC < 12 ? NUMPROC : 12 ))
ironic.conf.j2:api_workers = {{ env.NUMWORKERS }}

Whether that is the best behaviour or not is another matter.

--- Additional comment from Zane Bitter on 2020-11-03 18:06:31 UTC ---

Each API worker has a thread pool that can process up to 100 requests simultaneously (though not necessarily performantly!), plus it will queue up to 128 further requests before accept()ing them. AFAIK even on OpenStack underclouds we only configure half as many worker threads as CPUs, and the only reason we have so many workers in general is to make sure that CPU-bound parts of the installation that depend on a single service don't get slowed down.

For metal³, the bottleneck is ironic-conductor - we only have one of those. Work is in progress to limit it to provisioning 20 nodes at a time by default. ironic-api responds to requests from both the baremetal-operator (max 3 at one time) and from IPA running on any non-provisioned nodes. Nonetheless, I'd be surprised if we needed more than one worker to avoid ironic-api being a bottleneck at current scales.

Knowing when and how to scale in future (e.g. when the ironic-conductor is deployed and scaled separately) is more challenging. ironic-api listens on a fixed IP (the provisioning VIP) and port using host networking, and has to work before Ingresses are available, so we can't scale out in a traditional k8s way.
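For context on where the worker count comes from, here is a minimal sketch of the derivation quoted above, plus what a lower fixed cap could look like. It assumes NUMPROC is set from the host CPU count (e.g. via nproc); the cap of 4 is purely illustrative and not necessarily the value any eventual fix uses.

    # Sketch of the current derivation in configure-ironic.sh (assuming NUMPROC
    # comes from the host CPU count, e.g. via nproc):
    NUMPROC=$(nproc)
    export NUMWORKERS=$(( NUMPROC < 12 ? NUMPROC : 12 ))  # 8 on the dev-scripts host above, up to 12 on real hardware

    # One possible way to decouple the worker count from the CPU count: apply a
    # small fixed cap (the value 4 here is illustrative, not the confirmed fix).
    export NUMWORKERS=$(( NUMPROC < 4 ? NUMPROC : 4 ))

    # ironic.conf.j2 then renders the value into ironic.conf:
    # api_workers = {{ env.NUMWORKERS }}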
--- Additional comment from Doug Hellmann on 2020-11-03 22:10:42 UTC ---

(In reply to Derek Higgins from comment #1)
> (In reply to Doug Hellmann from comment #0)
> > So we have 9 API server processes (1 main process and 8 copies managed by
> > oslo.service, as configured in our ironic.conf) each of which is around
> > 90MiB.
>
> Looks like this is set based on the number of processors
>
> configure-ironic.sh:export NUMWORKERS=$(( NUMPROC < 12 ? NUMPROC : 12 ))
> ironic.conf.j2:api_workers = {{ env.NUMWORKERS }}
>
> if that is the best behaviour or not is another matter

Yeah, good point. I checked this on a dev-scripts deployment. I suppose that means on real hardware we're likely to be running at 12 instead of 8, so consumption would be even worse.

--- Additional comment from Dmitry Tantsur on 2020-11-12 16:39:24 UTC ---

Do we need to clone this for 4.6? I assume yes.

--- Additional comment from Doug Hellmann on 2020-11-12 16:54:28 UTC ---

(In reply to Dmitry Tantsur from comment #4)
> Do we need to clone this for 4.6? I assume yes.

Yes, let's do that.
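To put a single number on the memory being discussed, the per-process RSS values can be summed for the ironic-api container. A rough sketch, reusing the pod-name placeholder from the steps to reproduce and the container name used during verification below; note that summing RSS double-counts pages shared between the forked workers, so the total is an upper bound rather than the cgroup-level container figure.

    # Sum RSS (KiB) of the ironic-api processes and print MiB. The pod name is
    # the placeholder from the steps to reproduce; awk runs outside the pod.
    oc exec ${name-of-metal3-pod} -c metal3-ironic-api -- ps -o rss=,command= -ax \
      | awk '/ironic-api/ {sum += $1} END {printf "%.0f MiB\n", sum/1024}'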
Verified on 4.6.0-0.nightly-2021-01-28-083619

[kni@provisionhost-0-0 ~]$ oc exec -it metal3-86c9d47458-55tpg -c metal3-ironic-api -- bash
[root@master-0-2 /]# ps -o pid,user,%mem,rss,command -ax
  PID USER     %MEM   RSS COMMAND
    1 root      0.2 90956 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   31 root      0.2 91196 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   32 root      0.2 91208 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   33 root      0.2 93308 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   34 root      0.2 89480 /usr/bin/python3 /usr/bin/ironic-api --config-file /usr/share/ironic/ironic-dist.conf --config-file /etc/ironic/ironic.conf
   35 root      0.0  3772 bash
   56 root      0.0  3800 ps -o pid,user,%mem,rss,command -ax
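A quick way to confirm the reduced count without eyeballing the full ps output (a sketch; the pod name is the one from this verification run and will differ per cluster):

    # Count ironic-api processes (1 parent + api_workers children); for the
    # output above this is expected to print 5.
    oc exec metal3-86c9d47458-55tpg -c metal3-ironic-api -- ps -o command= -ax \
      | grep -c ironic-api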
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308