Description of problem:

This has shown itself in OpenShift V3. We (OpenShift Operations) are running docker on RHEL 7 hosts inside of AWS. We have multiple clusters, and this problem has been showing up over the last few weeks, maybe even months.

Docker has extremely poor performance, to the point of not responding to simple commands, even "docker info". We have two regular (gp2, general purpose) EBS volumes configured on the hosts: one for the root file system, one for docker to use as direct LVM. We configure /etc/sysconfig/docker-storage-setup and then run docker-storage-setup to set up the EBS volume for use with docker.

We notice that docker goes completely unresponsive at times. When this happens, all docker commands stop working: we can't pull, run containers, or even issue a "docker info" command. What is strange is that it is possible to come back to the machine after an unknown amount of time (sometimes an hour, sometimes 4 hours) and find it back in a usable state. While the machine is having these issues, restarting docker and even rebooting the system does not tend to fix the problem.

One thing we have noticed is that if we change the volume type to AWS EBS Provisioned IOPS, we generally get much better performance; provisioned IOPS are guaranteed by AWS to provide much better IO performance. This does, however, come at a much greater cost. We hesitate to move to these because:
1. it doesn't seem like we should *have* to use provisioned IOPS in AWS to use docker
2. it's far more expensive
3. if it's a bug, we should uncover it

Version-Release number of selected component (if applicable):
docker-1.7.1-108.el7.x86_64

How reproducible:
This happens on many of the nodes within our clusters. We see issues, and then things resolve themselves.

Steps to Reproduce:
1. Install OpenShift V3
2. Roll out applications
3. Wait for an undetermined amount of time

Actual results:
Docker goes unresponsive

Expected results:
Docker remains responsive
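For reference, a minimal sketch of the docker-storage-setup configuration described above. The device path and volume group name here are illustrative placeholders, not our actual values:

```shell
# /etc/sysconfig/docker-storage-setup
# Point docker-storage-setup at the dedicated EBS volume so it builds a
# direct-lvm thin pool there instead of falling back to loopback devices.
DEVS=/dev/xvdb    # hypothetical device name for the second EBS volume
VG=docker-vg      # hypothetical volume group name
```

After writing the file, run docker-storage-setup and restart docker to apply it.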
An update to this ticket and what we believe is happening. We are running OpenShift V3 on these nodes with 32 GB EBS volumes from AWS. These are the GP2 (General Purpose class) EBS volumes. AWS EBS GP2 volumes work on a credit system that governs how much IO can be used on the volumes. More can be found here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

For the 32 GB drives, we have (3 x 32) 96 IOPS baseline for these volumes. What we have seen is that OpenShift + Docker is exhausting these resources. With Vivek's help, we have seen that the throughput of these volumes is really low (in the ~10-400 KB/s range) when we are having these issues.

We are testing with bigger drives currently to see if we can find a size where we don't exhaust the volumes' credits. We have also had better performance with the provisioned IOPS volumes (guaranteed IO).
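The gp2 credit arithmetic can be sketched as follows. The 3 IOPS/GB baseline matches the figure above; the 3000 IOPS burst ceiling and 5.4 million initial I/O credits come from the AWS EBS documentation linked above, and the drain estimate assumes the bucket refills at the baseline rate while being consumed at the burst rate:

```shell
#!/bin/sh
# gp2 baseline: 3 IOPS per provisioned GB
VOLUME_GB=32
BASELINE_IOPS=$((3 * VOLUME_GB))
echo "baseline IOPS: $BASELINE_IOPS"

# gp2 can burst to 3000 IOPS out of an initial bucket of 5.4M I/O credits;
# once the bucket is empty, the volume drops back to the baseline rate.
BURST_IOPS=3000
CREDITS=5400000
# Credits drain at (burst rate - baseline refill rate), so a sustained
# full-rate burst lasts roughly:
BURST_SECONDS=$((CREDITS / (BURST_IOPS - BASELINE_IOPS)))
echo "sustained burst: ~$BURST_SECONDS s (~$((BURST_SECONDS / 60)) min)"
```

So a 32 GB gp2 volume under sustained heavy IO burns through its whole credit bucket in roughly half an hour, after which it is pinned at 96 IOPS — consistent with the throughput collapse we measured.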
I also did some AWS work in correlation with docker. I can confirm the above behaviour for the case when docker is installed and started (on t*.micro instances it will use loop-lvm by default, as there is no storage device present by default that docker-storage-setup can use). If I then run an intensive I/O test (fio) inside a docker container, docker hangs; it eventually comes back after some time, but it is really slow.

While loop-lvm is in use (and the I/O test is running inside the container), simple commands such as:

  docker images
  docker ps
  docker ps -a

take a long time and respond very slowly.

With loop-lvm (which is what docker will get on t*.micro EC2 instances) it takes 21 minutes to save a 1.4 GB docker image:

  # time docker save r7perf > r7perfedited.tar
  real    21m15.156s
  user    0m0.257s
  sys     0m1.748s

If I instead attach a general purpose SSD disk to the instance and use that disk for the docker storage backend (configuring /etc/sysconfig/docker-storage-setup to use this device), the above issues with docker command response are generally not visible. I get better response times, docker is faster, and docker ps|images does not hang. docker save for the same size image in this case is:

  # time docker save r7perf > r7perfmodified.tar
  real    10m8.090s
  user    0m0.277s
  sys     0m1.247s
Runcom, have you been able to look at this?
I'm still investigating it. I'm also seeing better performance on SSD drives and EBS with provisioned IOPS, but I'm still struggling to find how to speed this up (because AWS is also involved, in the sense that it limits IOPS for their normal EBS drives).
This seems like it's basically expected... the AWS guidelines say GP EBS is for "development/test" and Provisioned EBS is for "Critical business applications that require sustained IOPS performance". It's not just that Provisioned has "better" performance; it's that the performance is *consistent* rather than burst. Wouldn't this have also affected Online v2?

This gets at an important point: what we really want for more than 2-3 machines is clustered images. It doesn't make sense to fully store a complete copy of each image unpacked on every node. Think something like overlayfs on top of NFS/CephFS/GlusterFS.
It's definitely expected... Thomas is trying to take advantage of the burst performance though, since it's much better than provisioned IOPS. The "problem" is that once you exhaust credits, docker goes sideways. Thomas mentioned trying to work out a way to monitor credits and shift work around to stay within their credit limits.

Colin, overlay itself does significantly less IO than device mapper. It would be awesome if that were somehow usable for the Ops guys; I think we're blocked on SELinux integration there? In terms of image placement, agreed -- Glance's caching behavior and policies need to be duplicated in the Kube/Docker world.
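The credit-monitoring idea could be sketched roughly as below. AWS does expose the remaining gp2 credit bucket as the CloudWatch "BurstBalance" metric (AWS/EBS namespace, reported as a percentage); the function name, the threshold, and the cordon policy here are all hypothetical illustrations, not an existing tool:

```shell
#!/bin/sh
# Fetching the balance would look roughly like (elided arguments omitted):
#   aws cloudwatch get-metric-statistics --namespace AWS/EBS \
#     --metric-name BurstBalance --dimensions Name=VolumeId,Value=vol-... \
#     --statistics Average --period 300 ...

# Hypothetical policy: stop placing new pods on a node whose docker
# volume has nearly drained its credit bucket.
schedulable() {
    balance=$1      # BurstBalance percentage, 0-100
    threshold=20    # arbitrary example cutoff, not a recommendation
    if [ "$balance" -lt "$threshold" ]; then
        echo "cordon"   # shift work elsewhere until credits refill
    else
        echo "ok"
    fi
}

schedulable 85   # -> ok
schedulable 10   # -> cordon
```

Something like this could feed a node cordon/uncordon loop, letting the cluster ride the burst window without falling off the cliff.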
Doing less IO with overlayfs just delays the performance cliff; it wouldn't actually solve this, right? I could imagine one way to use GP volumes here would be to reschedule idle pods to nodes with low credits. Also possibly trashing volumes when they run low on credits and provisioning new ones? The observation here is that we don't actually care about persistence for the image volumes; the cost would just be redoing the I/O to write the images again. Which gets back to not using EBS for this, but doing something more NFS-like, which I think would work much better in a cluster in general.
Yes, delays the inevitable :/ From an efficiency standpoint, which is high on the openshift.com hit-list, overlay has that plus page-cache sharing as its advantages.
Is there an action item here, other than "if you have this problem, investigate overlayfs"? We are working to make improvements in standard file systems, but they are a ways away.
Overlayfs support is getting closer. Now available on Rawhide.
Overlay is being backported to RHEL 7 and should be in the RHEL 7.4 release. Also, major work is moving forward on read-only container support for devicemapper.
If I am not mistaken, this is still an issue with docker 1.12: https://github.com/docker/docker/issues/28183 https://github.com/docker/docker/issues/25993
I am going to mark this as fixed in RHEL7.4.
As overlay/overlay2 is supported with SELinux in docker-1.12.6-48.git0fdc778.el7.x86_64, moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2344