Bug 1376822

Summary: Write performance of hawkular services running in docker container is 2 times worse compared to VM
Product: [JBoss] Middleware Manager Reporter: Filip Brychta <fbrychta>
Component: Other Assignee: Paul Gier <pgier>
Status: CLOSED WONTFIX QA Contact: Filip Brychta <fbrychta>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0.0 TP2 CC: dajohnso, hrupp, jfrey, jhardy, mfoley, mmahoney, obarenbo, pgier
Target Milestone: --- Keywords: Documentation, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: hawkular
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-01-04 15:36:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Filip Brychta 2016-09-16 13:55:29 UTC
Description of problem:
Setup 1:
VM with 2 CPUs and 4 GB of memory running cassandra and hawkular services directly

Setup 2:
VM with 2 CPUs and 4 GB of memory running cassandra and hawkular services in docker containers

Write performance of setup 1 is 2 times better than setup 2.


Version-Release number of selected component (if applicable):
Hawkular services DR01

How reproducible:
Always

Steps to Reproduce:
1. Create 2 identical VMs
2. Start cassandra and hawkular services on VM1 (directly, using the upstream zip build)
3. Start the cassandra and hawkular services docker containers on VM2
4. Run write performance tests against each VM

Actual results:
Throughput on VM1 is 2 times better than on VM2

Expected results:
Docker containers are expected to add some overhead, but it should be investigated whether such a large difference is acceptable.

Additional info:
iostat shows that both VMs are using close to 100% of CPU, but VM2 is spending more time on system operations (%system 21.97 vs. 8.25 on VM1)

VM1:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          81.84    7.95    8.25    1.25    0.15    0.55

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.90    0.00    5.59     0.00  2080.72   743.86     2.85  508.71    0.00  508.71  23.48  13.14
dm-0              0.00     0.00    0.00    6.19     0.00  2080.72   671.87     2.96  477.47    0.00  477.47  21.21  13.14
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00



VM2:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          68.93    6.12   21.97    0.24    2.50    0.24

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda              44.80     4.90    6.50    6.40   205.20  1378.80   245.58     0.82   63.54    4.51  123.50   9.43  12.17
dm-0              0.00     0.00    0.10   10.50     0.40  1378.80   260.23     0.94   88.28  169.00   87.51  10.37  10.99
dm-1              0.00     0.00   51.20    0.00   204.80     0.00     8.00     0.10    1.91    1.91    0.00   0.23   1.18
dm-2              0.00     0.00    0.30    0.10     1.20     0.40     8.00     0.02   42.25   56.33    0.00  42.25   1.69
dm-3              0.00     0.00    0.30    0.10     1.20     0.40     8.00     0.02   44.00   58.67    0.00  44.00   1.76
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
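
For reference, the %system comparison can be pulled out of the two samples above with a short script. This is only a sketch: the helper name system_pct is made up for illustration, and it assumes the avg-cpu layout shown above, with %system as the third column.

```shell
#!/bin/sh
# Hypothetical helper: print the %system column (3rd value) from the line
# that follows an "avg-cpu:" header in iostat output read from stdin.
system_pct() {
    awk '/^avg-cpu:/ { getline; print $3; exit }'
}

# The two avg-cpu samples from this report, fed in as literal text.
vm1=$(printf '%s\n' \
    'avg-cpu:  %user   %nice %system %iowait  %steal   %idle' \
    '          81.84    7.95    8.25    1.25    0.15    0.55' | system_pct)
vm2=$(printf '%s\n' \
    'avg-cpu:  %user   %nice %system %iowait  %steal   %idle' \
    '          68.93    6.12   21.97    0.24    2.50    0.24' | system_pct)

# Ratio of kernel-side CPU time, VM2 (containers) over VM1 (direct install).
awk -v a="$vm1" -v b="$vm2" \
    'BEGIN { printf "VM1 %%system=%s VM2 %%system=%s ratio=%.2f\n", a, b, b / a }'
```

On a live system the same function could be fed from iostat directly, for example `iostat -x 5 2 | system_pct`. That prints only the first report, which covers the time since boot.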

Comment 3 Paul Gier 2016-09-29 20:38:29 UTC
I came across some info about poor performance with the default docker network config, which uses a virtual bridge (docker0) to route requests between containers (https://github.com/docker/docker/issues/7857).

When starting the containers, can you try setting them to connect directly to the host network (--net=host) instead of the default settings?

docker run --name hawkular-cassandra --net=host -d -e CASSANDRA_START_RPC=true brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/jboss/cassandra:latest

docker run -d --net=host -e CASSANDRA_NODES=localhost -e HAWKULAR_BACKEND=remote -e DB_TIMEOUT=20 -p 8080:8080 -p 8443:8443 -p 9990:9990 brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/hawkular/hawkular-services:latest

Also note that when using the host network directly, docker will not automatically update firewall config, so you may need to manually open port 8080 to connect to hawkular-services from a remote host.

firewall-cmd --zone=public --add-port=8080/tcp

If necessary, you can later undo the firewall change using this:

firewall-cmd --zone=public --remove-port=8080/tcp

Comment 4 Filip Brychta 2016-10-03 16:40:39 UTC
Retested with --net=host without any significant performance improvement.

Also retested on more powerful VMs with 4 CPUs.
The performance difference (container vs. direct zip installation) in this case was only 25%.

Comment 5 Mike Foley 2016-10-06 15:30:22 UTC
Next action item is for QE to baseline disk write I/O on the VM and in a container, to determine whether disk I/O not involving Hawkular Services is fundamentally different in and out of a container.

Results will be documented here and discussed at the next performance call, where the next action item / iteration will be determined.
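
A baseline of this kind could be taken with plain dd, run once on the VM and once inside a container. The following is a minimal sketch, not the procedure QE actually used: the function name baseline_write, the file name, and the 256 MB default are all assumptions, and it relies on GNU dd options (bs=1M, conv=fsync).

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB of zeros into DIR and print dd's
# summary line (bytes copied, elapsed time, throughput).
# conv=fsync flushes the data to disk before dd exits, so the page cache
# does not hide the real write cost.
baseline_write() {
    dir=$1
    size_mb=${2:-256}
    file="$dir/io-baseline.$$"
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fsync 2>&1 | tail -n 1
    rm -f "$file"
}

# On the VM directly:
#   baseline_write /var/tmp
# Inside a running container (container name is a placeholder), writing to
# the container's own filesystem rather than a mounted volume:
#   docker exec -i <container> sh -s < this-script-plus-a-call.sh
```

Running it several times on each side and comparing the reported throughput gives a rough indication of whether the container's storage layer alone accounts for the gap.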

Comment 6 Mike Foley 2016-10-10 15:01:44 UTC
Adding the "documentation" keyword. The outcome of this bugzilla will likely need to be communicated in some customer-facing documentation or guidance.

Comment 7 Heiko W. Rupp 2016-10-10 15:18:28 UTC
"
It's using the loop device. Please use the docker thin pool according to these docs:
https://access.redhat.com/documentation/en/red-hat-enterprise-linux-atomic-host/7/paged/getting-started-with-containers/chapter-7-managing-storage-with-docker-formatted-containers

There's even a warning in the docker info output.
"
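
To check a host for this condition, the docker info output can be scanned for loopback storage. The sketch below assumes the devicemapper driver reports its loopback backing files on lines containing "loop file" (for example "Data loop file: ..."). The exact wording may differ between docker versions, and both function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` output on stdin and succeed if
# it mentions loopback backing files (assumed marker text: "loop file").
uses_loop_device() {
    grep -qi 'loop file'
}

# Hypothetical helper: print a one-line verdict for the text on stdin.
report_storage() {
    if uses_loop_device; then
        echo "loopback storage detected: consider a thin pool"
    else
        echo "no loopback data file reported"
    fi
}

# Usage on the docker host:
#   docker info 2>&1 | report_storage
```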