Bug 1264971

Summary: docker hangs/poor performance on AWS direct LVM
Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.1
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: rc
Target Release: ---
Keywords: Extras
Reporter: Matt Woodson <mwoodson>
Assignee: Vivek Goyal <vgoyal>
QA Contact: atomic-bugs <atomic-bugs>
CC: amurdaca, dwalsh, ekuric, erich, jeder, jgoulding, lsm5, lsu, twiest, walters
Doc Type: Bug Fix
Type: Bug
Regression: ---
Last Closed: 2017-08-02 00:11:21 UTC
Bug Blocks: 1303130

Description Matt Woodson 2015-09-21 18:56:36 UTC
Description of problem:

This has shown itself in Openshift V3.

We (Openshift Operations) are running docker on RHEL 7 hosts inside of AWS. We have multiple clusters, and this problem has been showing up over the last few weeks, maybe even months. Docker has extremely poor performance, to the point of not responding to simple docker commands, even "docker info".

We have two regular (gp2, general purpose) EBS volumes configured on the hosts: one for the root file system, one for docker to use as direct LVM. We configure /etc/sysconfig/docker-storage-setup and then run docker-storage-setup to set up the EBS volume for use with docker.
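
For reference, a minimal direct-LVM setup along those lines looks roughly like the following (the device and volume group names here are placeholders, not the exact values from our hosts):

# cat /etc/sysconfig/docker-storage-setup
DEVS=/dev/xvdb
VG=docker-vg
# docker-storage-setup
# systemctl restart docker

docker-storage-setup creates the LVM thin pool on the listed device and writes the matching options into /etc/sysconfig/docker-storage.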

We notice that docker goes completely unresponsive at times. When this happens, all docker commands stop working: we can't pull, run containers, or even do a "docker info" command. What is strange is that, after an unknown amount of time (sometimes an hour, sometimes 4 hours), the machine will be back in a usable state.

While the machine is having these issues, restarting docker and even a reboot of the system does not tend to fix the problem.

One thing we have noticed is that if we change the volume type to AWS EBS Provisioned IOPS, we generally get much better performance; provisioned IOPS are guaranteed by AWS, so the I/O performance is much more consistent. This does, however, come at a much greater cost. We hesitate to move to these because:

1. it doesn't seem like we should *have* to use provisioned IOPS in AWS to use docker
2. it's way more expensive
3. if it's a bug, we should uncover it.


Version-Release number of selected component (if applicable):

docker-1.7.1-108.el7.x86_64

How reproducible:

This happens on many of the nodes within our clusters.  We see issues, and then things will resolve themselves.

Steps to Reproduce:
1.  Install Openshift V3
2.  Roll out applications
3.  Wait for an undetermined amount of time
 
Actual results:

Docker will go unresponsive

Expected results:

Docker to be responsive

Comment 2 Matt Woodson 2015-09-29 14:21:22 UTC
An update to this ticket and what we believe is happening.

We are running Openshift V3 on these nodes with 32GB EBS volumes from AWS.  These are the GP2 (General Purpose class) EBS volumes.

AWS EBS GP2 volumes work on a credit system that governs how much I/O can be done on a volume. More can be found here:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

For the 32 GB drives, we have a baseline of (3 x 32) 96 IOPS for these volumes.

What we have seen is that Openshift + Docker is exhausting these credits. With Vivek's help, we have seen that the throughput of these volumes is really low (in the ~10-400k/s range) when we are having these issues.

We are currently testing with bigger drives to see if we can find a size where we don't exhaust the volumes' credits. We have also had better performance with the provisioned IOPS volumes (guaranteed I/O).
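
For what it's worth, one way to watch for credit exhaustion (a sketch; the volume ID and time window are placeholders) is the EBS BurstBalance metric in CloudWatch, which drops toward 0% as the I/O credits are spent:

aws cloudwatch get-metric-statistics --namespace AWS/EBS \
  --metric-name BurstBalance \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time 2015-09-29T00:00:00Z --end-time 2015-09-29T12:00:00Z \
  --period 300 --statistics Average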

Comment 3 Elvir Kuric 2015-10-09 14:50:55 UTC
I also did some AWS testing in correlation with docker. I can confirm the above behaviour for the case where docker is installed and started (on t*.micro instances it will use loop-lvm by default, as there is no storage device available for docker-storage-setup to use) and an intensive I/O test (fio) is run inside a docker container: docker will hang, and it will eventually come back after some time, but it is really slow.
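
For reference, the kind of in-container I/O load used here was roughly the following (a sketch; the image name and fio parameters are illustrative, not the exact test that was run):

docker run --rm r7perf fio --name=randwrite --ioengine=libaio --direct=1 \
  --rw=randwrite --bs=4k --size=1g --numjobs=4 --runtime=300 --group_reporting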

Simple commands, for the case when loop-lvm is used (while running the I/O test inside the container), such as

docker images
docker ps
docker ps -a

take a long time and respond very slowly.

When loop-lvm is used (this is what docker will get on t*.micro EC2 instances), it takes 21 minutes to save a 1.4 GB docker image:

# time docker save r7perf > r7perfedited.tar 
real	21m15.156s
user	0m0.257s
sys	0m1.748s


If I attach a general purpose SSD disk to the instance and then use that disk as the docker storage backend (configuring /etc/sysconfig/docker-storage-setup to use this device), the above issues with docker command response are generally not visible. I get better response times, docker is faster, docker ps|images does not hang, etc.

docker save for the same image in this case is:

time docker save r7perf > r7perfmodified.tar 

real	10m8.090s
user	0m0.277s
sys	0m1.247s

Comment 5 Daniel Walsh 2015-12-01 19:33:52 UTC
Runcom, have  you been able to look at this?

Comment 6 Antonio Murdaca 2015-12-01 19:55:28 UTC
I'm still investigating it. I'm also seeing better performance on SSD drives and EBS with provisioned IOPS, but I'm still struggling to find how to speed this up (because AWS is also involved, in the sense that it limits IOPS for their normal EBS drives).

Comment 7 Colin Walters 2015-12-01 20:39:30 UTC
This seems like it's basically expected...the AWS guidelines say GP EBS is for "development/test" and Provisioned EBS is for "Critical business applications that require sustained IOPS performance".

It's not just that Provisioned is "better" performance - it's that it's *consistent*, rather than burst-based.

Wouldn't this have also affected Online v2?

This gets into an important point: what we really want for more than 2-3 machines is clustered images. It doesn't make sense to store a complete unpacked copy of each image on every node. Think something like overlayfs on top of NFS/CephFS/GlusterFS.

Comment 8 Jeremy Eder 2015-12-02 17:33:35 UTC
It's definitely expected...Thomas is trying to take advantage of the burst performance though, since it's much better than provisioned IOPS. The "problem" is that once you exhaust the credits, docker goes sideways.

Thomas mentioned trying to work out a way to monitor credits and shift work around to try and stay within the credit limits.

Colin,

Overlay itself does significantly less IO than device mapper.  It would be awesome if that were somehow usable for the Ops guys.  I think we're blocked on SELinux integration there?

In terms of image placement, agreed -- Glance's caching behavior and policies need to be duplicated in Kube/Docker world.

Comment 9 Colin Walters 2015-12-02 18:23:24 UTC
Doing less I/O with overlayfs just delays the performance cliff; it wouldn't actually solve this, right?

I could imagine one way to use GP volumes here would be to reschedule idle pods to nodes with low credits.

Also possibly trashing volumes when they run low on credits and provisioning new ones?  The observation here is we don't actually care about persistence for the image volumes.  The cost then would be redoing the I/O to write the images again.
Which gets back to not using EBS for this, but doing something more NFS-like, which I think would work much better in a cluster in general.

Comment 10 Jeremy Eder 2015-12-02 18:27:00 UTC
Yes, delays the inevitable :/

From an efficiency standpoint, which is high on the openshift.com hit-list, overlay has that and page cache sharing as its advantages.

Comment 11 Daniel Walsh 2016-02-22 19:02:00 UTC
Is there an action item here, other than "if you have this problem, investigate overlayfs"? We are working to make improvements in standard file systems, but they are a ways away.

Comment 14 Daniel Walsh 2016-08-19 20:18:02 UTC
Overlayfs support is getting closer. Now available on Rawhide.

Comment 15 Daniel Walsh 2016-10-18 13:24:41 UTC
Overlay is being backported to RHEL 7. It should be in the RHEL 7.4 release. Major work is also moving forward on read-only container support for devicemapper.
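
For anyone trying the backport once it lands, switching to overlay2 via docker-storage-setup should look roughly like this (a sketch; it assumes starting from an empty /var/lib/docker, since images and containers are not migrated between storage drivers):

# cat /etc/sysconfig/docker-storage-setup
STORAGE_DRIVER=overlay2
# docker-storage-setup
# systemctl restart docker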

Comment 16 Eric Rich 2017-02-21 20:44:42 UTC
If I am not mistaken, this is still an issue with docker 1.12:

https://github.com/docker/docker/issues/28183
https://github.com/docker/docker/issues/25993

Comment 17 Daniel Walsh 2017-06-30 14:49:45 UTC
I am going to mark this as fixed in RHEL7.4.

Comment 19 Luwen Su 2017-07-25 03:46:03 UTC
As overlay/overlay2 is supported with SELinux in docker-1.12.6-48.git0fdc778.el7.x86_64, moving to verified.
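
For reference, a quick way to confirm the storage driver and SELinux state on a host (sketch; exact output formatting varies by docker version):

# docker info 2>/dev/null | grep -i 'storage driver'
Storage Driver: overlay2
# getenforce
Enforcing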

Comment 21 errata-xmlrpc 2017-08-02 00:11:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2344