Bug 1718230

Summary: Severe performance problem of podman commands under I/O load
Product: Red Hat Enterprise Linux 8
Component: podman
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: unspecified
Status: CLOSED DUPLICATE
Reporter: Damien Ciabrini <dciabrin>
Assignee: Brent Baude <bbaude>
QA Contact: atomic-bugs <atomic-bugs>
CC: bbaude, dornelas, dwalsh, jeckersb, jligon, jnovy, lmiccini, lsm5, mcornea, mheon, michele, mlammon, mschuppe, sasha
Target Milestone: rc
Target Release: 8.0
Type: Bug
Last Closed: 2019-07-01 16:10:50 UTC

Description Damien Ciabrini 2019-06-07 10:52:22 UTC
Description of problem:
Environment: reproduced on a VM, on bare metal with a regular 7200 RPM SATA disk, and on bare metal with SSD drives

When we run a podman inspect command on an environment under I/O load, the podman command does not finish in the usual couple of seconds; instead it can take an enormous amount of time, sometimes minutes (see the data and the reproducer at the end of this BZ). In comparison, docker maintained reasonable performance under I/O load.

This is affecting OSP particularly because the entire architecture is based on:
 . a) paunch being able to read containers' labels via the inspect command; this is used to track service deployment and service restarts on update.
 . b) some container healthchecks currently relying on inspect, because podman exec currently cannot return a specific error code to distinguish a stopped container from a non-existent one (see the sketch below).
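
For illustration, a minimal sketch of the inspect-based checks described in a) and b). The container name "mycontainer" and the label key "config_id" are placeholders for this sketch, not taken from the reproducer:

# Read a label from a container (the kind of lookup paunch does to track deployments):
podman inspect --format '{{ index .Config.Labels "config_id" }}' mycontainer

# Distinguish a stopped container from a non-existent one, which
# "podman exec" alone cannot signal through its exit code:
podman inspect --format '{{ .State.Status }}' mycontainer 2>/dev/null || echo "no such container"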

Full reproducer script can be found here: 
https://gitlab.cee.redhat.com/osp15/rhel8podmanperf/blob/master/repro.sh
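
Since that script lives on an internal GitLab, here is a rough approximation of what such a reproducer might look like (image name, container name, dd target and sizes are placeholders, not the actual script):

#!/bin/bash
# Start a long-running container
podman run -d --name perftest registry.access.redhat.com/ubi8/ubi sleep infinity

# Generate background I/O load with dd
dd if=/dev/zero of=/var/tmp/ddload bs=1M count=20000 oflag=direct &
DD_PID=$!

# Time 100 consecutive inspect calls against the container
time for i in $(seq 1 100); do
    podman inspect perftest > /dev/null
done

# Clean up
kill $DD_PID
rm -f /var/tmp/ddload
podman rm -f perftest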

Perf data (all tests done on RHEL 8 GA from freshly installed machines; numbers seem to be fairly repeatable so far):
1) HP Workstation [OCZ-VERTEX4 ssd disk]
1.a) No I/O load
docker 100 exec: 15 s
podman 100 exec: 11 s
docker 100 inspect: 7 s
podman 100 inspect: 12 s

1.b) I/O Load with a dd in the background
docker 100 exec: 15 s
podman 100 exec: 35 s
docker 100 inspect: 7 s
podman 100 inspect: 408 s

2) VM (virtio disks) running on top of rhel7 hypervisor (w/sata disks)
2.a) No I/O load
docker 100 exec: 12 s
podman 100 exec: 7 s
docker 100 inspect: 5 s
podman 100 inspect: 10 s

2.b) I/O Load with a dd in the background
docker 100 exec: 49 s
podman 100 exec: 14 s
docker 100 inspect: 8 s
podman 100 inspect: 1658 s

3) RHEL8 Baremetal Dell w/ SATA disks
3.a) No I/O Load
docker 100 exec: 14 s
podman 100 exec: 14 s
docker 100 inspect: 3 s
podman 100 inspect: 16 s

3.b) I/O Load with a dd in the background
docker 100 exec: 16 s
podman 100 exec: 42 s
docker 100 inspect: 4 s
podman 100 inspect: 7828 s

Version-Release number of selected component (if applicable):
podman-1.0.0-2.git921f98f.module+el8+2785+ff8a053f.x86_64

How reproducible:
Always

Steps to Reproduce:
1. start a podman container
2. run an I/O load, e.g. what's specified in https://gitlab.cee.redhat.com/osp15/rhel8podmanperf/blob/master/repro.sh
3. run podman inspect on the container

Actual results:
The podman inspect command takes minutes to finish when there is I/O load
The podman exec command also regresses in performance when there is I/O load

Expected results:
The podman inspect command finishes in a couple of seconds
The podman exec command does not regress in performance (NB: case 2.b seems to be an exception here, but it is likely driven by the unsafe virtio cache mode of the VM; we never saw that behaviour on real bare-metal hardware)

Additional info:

Comment 6 Derrick Ornelas 2019-07-01 16:10:50 UTC

*** This bug has been marked as a duplicate of bug 1723879 ***