Description (Damien Ciabrini)
2019-06-07 10:52:22 UTC
Description of problem:
Environment: reproduced on a VM, on bare metal with a regular 7200 RPM SATA disk, and on bare metal with SSD drives.
When we run a podman inspect command on a host that is under I/O load, the command does not finish in its usual short time; instead it can take an enormous amount of time, sometimes minutes (see the data and the reproducer at the end of this BZ). In comparison, docker maintained reasonable performance under I/O load.
This affects OSP in particular because its entire architecture relies on:
a) paunch being able to read container labels via the inspect command. This is used to track service deployment and service restart on update.
b) some container healthchecks, which currently rely on inspect because podman exec cannot return a specific error code to distinguish a stopped container from a container that doesn't exist.
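For illustration, the two uses of inspect described above might look roughly like the shell sketch below. The function names, the `{{.Config.Labels}}` and `{{.State.Running}}` format strings, and the reliance on a non-zero exit status for a missing container are assumptions made for this example; they are not taken from paunch or from the OSP healthcheck scripts.

```shell
#!/bin/sh
# Hypothetical helpers; names and format strings are assumptions.

# Print a container's labels (the kind of data a tool like paunch reads).
container_labels() {
    podman inspect --format '{{.Config.Labels}}' "$1"
}

# Classify a container as running, stopped, or missing.
# Assumes inspect exits non-zero when the container does not exist.
container_state() {
    if ! running=$(podman inspect --format '{{.State.Running}}' "$1" 2>/dev/null); then
        echo missing
    elif [ "$running" = "true" ]; then
        echo running
    else
        echo stopped
    fi
}
```

Every call to either helper pays the full cost of one podman inspect, which is why the slowdown reported here hits both code paths.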
Full reproducer script can be found here:
https://gitlab.cee.redhat.com/osp15/rhel8podmanperf/blob/master/repro.sh
Perf data (all tests run on RHEL 8 GA, on freshly installed machines; numbers seem to be fairly repeatable so far):
1) HP Workstation [OCZ-VERTEX4 ssd disk]
1.a) No I/O load
docker 100 exec: 15 s
podman 100 exec: 11 s
docker 100 inspect: 7 s
podman 100 inspect: 12 s
1.b) I/O Load with a dd in the background
docker 100 exec: 15 s
podman 100 exec: 35 s
docker 100 inspect: 7 s
podman 100 inspect: 408 s
2) VM (virtio disks) running on top of rhel7 hypervisor (w/sata disks)
2.a) No I/O load
docker 100 exec: 12 s
podman 100 exec: 7 s
docker 100 inspect: 5 s
podman 100 inspect: 10 s
2.b) I/O Load with a dd in the background
docker 100 exec: 49 s
podman 100 exec: 14 s
docker 100 inspect: 8 s
podman 100 inspect: 1658 s
3) RHEL8 Baremetal Dell w/ SATA disks
3.a) No I/O Load
docker 100 exec: 14 s
podman 100 exec: 14 s
docker 100 inspect: 3 s
podman 100 inspect: 16 s
3.b) I/O Load with a dd in the background
docker 100 exec: 16 s
podman 100 exec: 42 s
docker 100 inspect: 4 s
podman 100 inspect: 7828 s
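To put the inspect figures in proportion, the slowdown under load can be computed directly from the numbers above. This is only arithmetic on the reported timings; the `slowdown` helper is a name invented for this sketch, and it uses integer division, so results are rounded down.

```shell
#!/bin/sh
# Slowdown of "podman 100 inspect" under I/O load, from the figures above.
# $1 = seconds under load, $2 = seconds without load (integer division).
slowdown() {
    echo $(( $1 / $2 ))
}

echo "SSD workstation: $(slowdown 408 12)x"
echo "VM:              $(slowdown 1658 10)x"
echo "SATA baremetal:  $(slowdown 7828 16)x"
```

This gives factors of 34, 165 and 489 for podman inspect, whereas docker inspect never goes beyond 8 s against a 5 s baseline.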
Version-Release number of selected component (if applicable):
podman-1.0.0-2.git921f98f.module+el8+2785+ff8a053f.x86_64
How reproducible:
Always
Steps to Reproduce:
1. Start a podman container
2. Run an I/O load, e.g. as specified in https://gitlab.cee.redhat.com/osp15/rhel8podmanperf/blob/master/repro.sh
3. Run podman inspect on the container
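The steps above can be sketched as a small shell script. This is not the linked repro.sh, only a minimal approximation of it: the image, the container name `perf-test`, the dd parameters, the scratch file path and the `time_runs` helper are all placeholders chosen for this example.

```shell
#!/bin/sh
# Sketch of the measurement, not the actual repro.sh.

# Run a command N times and print the elapsed wall-clock seconds.
time_runs() {
    count=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Only touch podman when explicitly asked: sh sketch.sh run
if [ "${1:-}" = "run" ]; then
    # Image, container name and dd parameters are placeholders.
    podman run -d --name perf-test registry.access.redhat.com/ubi8/ubi sleep 3600
    dd if=/dev/zero of=/var/tmp/ioload bs=1M count=20000 &
    dd_pid=$!
    echo "podman 100 inspect: $(time_runs 100 podman inspect perf-test) s"
    echo "podman 100 exec: $(time_runs 100 podman exec perf-test true) s"
    kill "$dd_pid" 2>/dev/null
    rm -f /var/tmp/ioload
    podman rm -f perf-test
fi
```

Running the timing lines once without the dd and once with it gives the "No I/O load" and "I/O Load" pairs in the same shape as the perf data.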
Actual results:
The podman inspect command takes minutes to finish when the host is under I/O load
The podman exec command regresses in performance when the host is under I/O load
Expected results:
The podman inspect command finishes in a couple of seconds
The podman exec command does not regress in performance (NB: result 2.b seems to be an exception here, but it is likely driven by the nosafe virtio cache of the VM. We never saw that behaviour on real bare-metal hardware.)
Additional info: