Bug 2118774

Summary: quadratic startup behavior with many mounts in /var/lib/containers
Product: Red Hat Enterprise Linux 8
Reporter: Colin Walters <walters>
Component: rpm-ostree
Assignee: Colin Walters <walters>
Status: CLOSED ERRATA
QA Contact: HuijingHei <hhei>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.7
CC: aaradhak, hhei, mnguyen, rhcos-sst, travier, wking
Target Milestone: rc
Keywords: Regression, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: rpm-ostree-2022.10.90.g4abaf4b4-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-08 09:49:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Colin Walters 2022-08-16 18:04:52 UTC
This tracks applying https://github.com/coreos/rpm-ostree/pull/3941

xref https://bugzilla.redhat.com/show_bug.cgi?id=2111817

The way systemd implements InaccessiblePaths involves
recursively traversing all mounted paths in the target root, and
unmounting them; this takes about a tenth of a second in the kernel.
Further, after each unmount, it also re-parses
/proc/self/mountinfo. The combination of these two things
makes handling mounts there O(N²) in userspace CPU time.
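The quadratic behavior follows from the two costs together: for N mounts, systemd performs N unmounts, and after each one it re-parses /proc/self/mountinfo (itself proportional to the number of remaining mounts), for roughly N + (N-1) + ... + 1 ≈ N²/2 parse passes. An illustrative sketch (not part of the fix) that counts the mount entries systemd would have to traverse:

```shell
# Count mount entries under /var/lib/containers -- each one is a path that
# systemd's InaccessiblePaths= handling must unmount, re-parsing
# /proc/self/mountinfo after every unmount.
n=$(grep -c '/var/lib/containers' /proc/self/mountinfo || true)
n=${n:-0}
echo "mounts under /var/lib/containers: $n"
# Approximate number of mountinfo parse passes: n*(n+1)/2
echo "approx parse passes: $(( n * (n + 1) / 2 ))"
```

On a host running the 200-container reproducer below, n is in the hundreds, so the parse-pass count climbs into the tens of thousands.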

This impacts OCP upgrades.

Comment 1 Colin Walters 2022-08-16 19:57:12 UTC
To verify: 

$ systemctl show rpm-ostreed | grep -i inaccessible

should show no results.

But a stronger verification is to set up a ton of containers:

for x in $(seq 200); do podman run -d --rm registry.access.redhat.com/ubi8/ubi:latest sleep infinity; done

Then compare the service restart time, xref https://github.com/openshift/machine-config-operator/pull/3291#issuecomment-1216984687

Comment 2 HuijingHei 2022-08-17 03:41:45 UTC
Verification passed with rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64.

Tested with rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64 (after `systemctl set-property rpm-ostreed.service CPUAccounting=yes`): with 200 containers set up, restarting rpm-ostreed.service shows `Consumed 348ms CPU time`, much faster than the old rpm-ostree's `Consumed 5.208s CPU time`.
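For reference, the improvement can be quantified from the two figures above. A small sketch that extracts the CPU time from a systemd `Consumed` log line (the sample lines are copied from the transcripts in this bug) and computes the ratio:

```shell
# Extract the CPU time from systemd "Consumed ... CPU time" log lines and
# compare old vs. fixed rpm-ostree (values taken from the transcripts below).
old_line='rpm-ostreed.service: Consumed 5.208s CPU time'
new_line='rpm-ostreed.service: Consumed 348ms CPU time'
old=$(printf '%s\n' "$old_line" | sed -n 's/.*Consumed \(.*\) CPU time.*/\1/p')
new=$(printf '%s\n' "$new_line" | sed -n 's/.*Consumed \(.*\) CPU time.*/\1/p')
echo "old=$old new=$new"
# 5.208s vs 0.348s is roughly a 15x reduction in userspace CPU time
awk 'BEGIN { printf "speedup: %.1fx\n", 5.208 / 0.348 }'
```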

Steps:
1) Start rhcos-412.86.202208161753-0-qemu.x86_64.qcow2 with 4G memory and test with the old rpm-ostree: with 200 containers set up, restarting rpm-ostreed.service shows `Consumed 5.208s CPU time`

$ cosa run --qemu-image rhcos-412.86.202208161753-0-qemu.x86_64.qcow2 -m 4096
# rpm -q rpm-ostree
rpm-ostree-2022.10.86.gd8f0c67a-3.el8.x86_64

# systemctl show rpm-ostreed | grep -i inaccessible
InaccessiblePaths=-/var/lib/containers

# systemctl set-property rpm-ostreed.service CPUAccounting=yes
# systemctl show rpm-ostreed | grep CPUAccounting=
CPUAccounting=yes
# systemctl daemon-reload

[root@cosa-devsh ~]# systemctl restart rpm-ostreed.service && systemctl stop rpm-ostreed && systemctl status rpm-ostreed
Aug 17 02:52:47 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 66ms CPU time

# for x in $(seq 200); do podman run -d --rm registry.access.redhat.com/ubi8/ubi:latest sleep infinity; done


[root@cosa-devsh ~]# systemctl restart rpm-ostreed.service && systemctl stop rpm-ostreed && systemctl status rpm-ostreed
Aug 17 03:10:57 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 5.208s CPU time

=====================================================================================
2) Upgrade rpm-ostree to rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64: with 200 containers set up, restarting rpm-ostreed.service shows `Consumed 348ms CPU time`

$ cosa run --qemu-image rhcos-412.86.202208161753-0-qemu.x86_64.qcow2 -m 4096
# rpm-ostree override replace http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/rpm-ostree/2022.10.90.g4abaf4b4/4.el8/x86_64/rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64.rpm http://download.eng.bos.redhat.com/brewroot/vol/rhel-8/packages/rpm-ostree/2022.10.90.g4abaf4b4/4.el8/x86_64/rpm-ostree-libs-2022.10.90.g4abaf4b4-4.el8.x86_64.rpm
# systemctl reboot

[root@cosa-devsh ~]# rpm -q rpm-ostree
rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64
[root@cosa-devsh ~]# systemctl show rpm-ostreed | grep -i inaccessible

# systemctl set-property rpm-ostreed.service CPUAccounting=yes
# systemctl show rpm-ostreed | grep CPUAccounting=
CPUAccounting=yes
# systemctl daemon-reload

[root@cosa-devsh ~]# systemctl restart rpm-ostreed.service && systemctl stop rpm-ostreed && systemctl status rpm-ostreed
Aug 17 03:22:09 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 63ms CPU time

# for x in $(seq 200); do podman run -d --rm registry.access.redhat.com/ubi8/ubi:latest sleep infinity; done

[root@cosa-devsh ~]# systemctl restart rpm-ostreed.service && systemctl stop rpm-ostreed && systemctl status rpm-ostreed
Aug 17 03:25:46 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 348ms CPU time

Comment 6 HuijingHei 2022-09-01 01:51:40 UTC
Verification passed with rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64.

Upgraded rpm-ostree to the fixed version and set up 200 containers; restarting rpm-ostreed.service shows `Consumed 394ms CPU time`, which is expected.

$ cosa run --qemu-image rhcos-412.86.202208311537-0-qemu.x86_64.qcow2 -m 4096
# rpm-ostree override replace rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64.rpm rpm-ostree-libs-2022.10.90.g4abaf4b4-4.el8.x86_64.rpm
# systemctl reboot

[root@cosa-devsh ~]# rpm -q rpm-ostree
rpm-ostree-2022.10.90.g4abaf4b4-4.el8.x86_64
[root@cosa-devsh ~]# systemctl show rpm-ostreed | grep -i inaccessible

[core@cosa-devsh ~]# systemctl show rpm-ostreed | grep CPUAccounting=
CPUAccounting=no
# systemctl set-property rpm-ostreed.service CPUAccounting=yes
# systemctl show rpm-ostreed | grep CPUAccounting=
CPUAccounting=yes
# systemctl daemon-reload

[root@cosa-devsh ~]# systemctl restart rpm-ostreed.service && systemctl stop rpm-ostreed && systemctl status rpm-ostreed
Sep 01 01:41:54 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 69ms CPU time

# for x in $(seq 200); do podman run -d --rm registry.access.redhat.com/ubi8/ubi:latest sleep infinity; done

[root@cosa-devsh ~]# systemctl restart rpm-ostreed.service && systemctl stop rpm-ostreed && systemctl status rpm-ostreed
Sep 01 01:44:41 cosa-devsh systemd[1]: rpm-ostreed.service: Consumed 394ms CPU time

Comment 8 Colin Walters 2022-09-01 14:16:25 UTC
Wait do we really need an exception for this?  Isn't it already queued to ship?  We just missed adding the BZ association in the errata (because we forgot to manually flip the bug into modified).

Comment 9 HuijingHei 2022-09-02 02:57:36 UTC
(In reply to Colin Walters from comment #8)
> Wait do we really need an exception for this?  Isn't it already queued to
> ship?  We just missed adding the BZ association in the errata (because we
> forgot to manually flip the bug into modified).

Not quite sure about this, @Michael, could you help to confirm?

Comment 10 Michael Nguyen 2022-09-02 12:53:18 UTC
It missed ITM 26. Anything ITM 27+ needs to go through the exception process.

======= Impact Statement ======= 

What is the benefit of making this change after the deadline? What is the impact on customer satisfaction, and on the business?

     This will improve the performance of updates for OCP customers.

What is the risk to the release schedule, quality, and the impact of diverting resources from other efforts? Will there be enough time to do the necessary large-scale, cross-function, regression, stress, or fault-insertion testing that may be required to verify this change? 

     The risk is low. The fix is already in the build and has been pre-verified and verified by QE. The BZ was not moved to MODIFIED, so it was not picked up by the errata automation.
============================= 

In addition to the impact statement above, check the BZ to confirm the following: 

1. The Internal Target Release (ITR) field reflects the correct release.
2. Set the Internal Target Milestone (ITM) field to indicate by when the work will be done. The approval for this exception will expire on that milestone. 
3. Ensure qa_ack+ and devel_ack+ are set, and that those acks are not leftover from before the exception request.
4. Prepare a RHEL rpm scratch build and have this change validated on the latest RHEL milestone compose by someone other than the developer. A comment must be added to the Bugzilla indicating the validation is successful and there were no observed regressions.

5. Provide reasoning for why this request is being resolved after the regular DTD cycle. This will help us assess and improve the exception process.

     The BZ was not moved to MODIFIED to get picked up by the errata automation.

All of these steps must be complete before this change is reviewed as an exception.

Comment 16 errata-xmlrpc 2022-11-08 09:49:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rpm-ostree bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7612