Bug 1432722

Summary: Improved container support required for OSP
Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.3
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Reporter: Andrew Beekhof <abeekhof>
Assignee: Andrew Beekhof <abeekhof>
QA Contact: Udi Shkalim <ushkalim>
CC: abeekhof, cfeist, cluster-maint, dciabrin, fdinitto, kgaillot, michele, mnovacek, ushkalim
Target Milestone: rc
Target Release: 7.4
Keywords: TechPreview
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.16-10.el7
Doc Type: No Doc Update
Doc Text: Documenting the pcs syntax for this new feature (Bug 1433016) will be sufficient.
Clones: 1433016 (view as bug list)
Last Closed: 2017-08-01 17:54:39 UTC
Type: Bug
Bug Blocks: 1433016, 1435481

Description Andrew Beekhof 2017-03-16 02:45:03 UTC
Description of problem:

The improved support for containers in upstream pacemaker is needed for OSP12 in order to manage galera and rabbitmq as containerized services.

Comment 2 Ken Gaillot 2017-03-23 22:34:35 UTC
Support for this feature has been merged upstream

Comment 4 Ken Gaillot 2017-03-28 15:19:48 UTC
QA: No special testing will be needed for this BZ, as it will be implied by testing the new pcs syntax in Bug 1433016.

Comment 5 Ken Gaillot 2017-03-30 23:49:10 UTC
QA: For completeness, I will outline a test procedure that does not use pcs, though testing the pcs syntax will be sufficient to cover this bz. A test build is not yet available.

Given the current focus on containers everywhere, I'm sure QA has already started discussions about which host OS + container OS combinations to test generally. For this bz, the latest RHEL 7.4 nightly in both is probably a good choice. In my testing, I've been using a RHEL 7.3 host and CentOS 7.3 containers (with the 7.4 pacemaker build in both).

1. Configure a Pacemaker cluster of at least two cluster nodes (and no Pacemaker Remote nodes). You'll need about 450MB free disk space on each node. 

2. On every node:
2a. Install docker. QA should use whatever RH ships. In my personal testing, I've been using the upstream repo:

# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# yum install docker-ce
# systemctl enable --now docker

2b. Pull a base image to work with. In my personal testing, I've been using CentOS 7:

# docker pull centos:centos7

2c. Create some infrastructure for the tests:

# mkdir -p /root/bz1432722 \
  /var/log/pacemaker/bundles/httpd-bundle-{0,1,2} \
  /var/local/containers/httpd-bundle-{0,1,2}
# for i in 0 1 2; do cat >/var/local/containers/httpd-bundle-$i/index.html <<EOF
<html>
<head><title>Bundle test</title></head>
<body>
<h1>httpd-bundle-$i @ $(hostname)</h1>
</body>
</html>
EOF
done
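The loop in 2c relies on the heredoc delimiter being unquoted, so `$i` and `$(hostname)` are expanded when the files are written, not later. A self-contained sketch of the same loop against a scratch directory (temporary paths only; nothing here touches /var/local) shows each replica directory getting an index.html that names that replica:

```shell
# Recreate the 2c layout under a scratch directory; the unquoted EOF
# means $i and $(hostname) expand at file-creation time.
tmp=$(mktemp -d)
for i in 0 1 2; do
  mkdir -p "$tmp/httpd-bundle-$i"
  cat >"$tmp/httpd-bundle-$i/index.html" <<EOF
<html>
<head><title>Bundle test</title></head>
<body>
<h1>httpd-bundle-$i @ $(hostname)</h1>
</body>
</html>
EOF
done
# Verify each replica's page mentions its own bundle instance.
ok_count=0
for i in 0 1 2; do
  grep -q "httpd-bundle-$i @" "$tmp/httpd-bundle-$i/index.html" && ok_count=$((ok_count + 1))
done
echo "$ok_count replicas ok"
rm -rf "$tmp"
```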

2d. Put a copy of the pacemaker-cli, pacemaker-libs, pacemaker, pacemaker-cluster-libs, and pacemaker-remote RPMs for this BZ into /root/bz1432722. (This will more or less simulate what users who pull a 7.4 GA base image will get. However this is not strictly necessary, and any base image with a version of pacemaker that supports Pacemaker Remote should work.)

2e. Create a Dockerfile for testing (replace centos:centos7 with your base image). Here I'm using apache as an example of a service to be containerized:

# cat >/root/bz1432722/Dockerfile <<EOF
FROM centos:centos7

COPY pacemaker*.rpm ./
RUN yum update -y
RUN yum install -y httpd bind-utils curl lsof wget which
RUN yum install -y ./pacemaker*.rpm resource-agents
RUN rm -f pacemaker*.rpm
EOF

3. On every node, build a custom image. Repeat this step if you need to swap out the pacemaker packages or change the Dockerfile during testing:

3a. Build the image:
# cd /root/bz1432722
# docker rmi pcmktest:http
# docker build -t pcmktest:http .

3b. If desired, verify that the image was created:

# docker images # output should look something like:
REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
pcmktest            http                aab04ad64ab0        About a minute ago   412 MB
centos              centos7             98d35105a391        12 days ago          192 MB

3c. At least in my testing, building triggers a docker "waiting for lo to become free" bug. Reboot the node to avoid this possibility.

4. From any one node, start the cluster, and configure a bundle using the test image. Replace the IP address with something appropriate (three sequential IPs need to be available):

# pcs cluster start --all --wait
# cibadmin --modify --allow-create --scope resources -X '<bundle id="httpd-bundle">
  <docker image="pcmktest:http" replicas="3" options="--log-driver=journald" />
  <network ip-range-start="192.168.122.131" host-interface="eth0" host-netmask="24">
    <port-mapping id="httpd-port" port="80"/>
  </network>
  <storage>
    <storage-mapping id="httpd-root"
      source-dir-root="/var/local/containers"
      target-dir="/var/www/html"
      options="rw"/>
    <storage-mapping id="httpd-logs"
      source-dir-root="/var/log/pacemaker/bundles"
      target-dir="/etc/httpd/logs"
      options="rw"/>
  </storage>
  <primitive class="ocf" id="httpd" provider="heartbeat" type="apache"/>
</bundle>'
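As I understand the `source-dir-root` behavior, pacemaker mounts `<source-dir-root>/<bundle-id>-<replica-number>` at `target-dir` inside each replica's container, which is why step 2c created the per-replica httpd-bundle-{0,1,2} directories. A minimal sketch of how the two mappings above resolve per replica (derivation only; no docker or cluster required):

```shell
# Each replica N of httpd-bundle gets its own host directory
# <source-dir-root>/httpd-bundle-N mounted at the target-dir.
bundle=httpd-bundle
replicas=3
mappings=""
for i in $(seq 0 $((replicas - 1))); do
  mappings="$mappings
replica $i: /var/local/containers/$bundle-$i -> /var/www/html
replica $i: /var/log/pacemaker/bundles/$bundle-$i -> /etc/httpd/logs"
done
echo "$mappings"
```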

5. Test away. Three containers should come up, and apache should be reachable at the specified IPs. This feature is tech preview, and not everything you'd expect from a regular resource is implemented for bundles yet, but it will be worthwhile to test as much as possible and list what works and what doesn't. You can also modify the bundle configuration to try different values. Follow Bug 1435481 for the documentation; upstream documentation will be available soon as well. The docker instances will be named like httpd-bundle-docker-0, so you can use standard docker commands with them (e.g. docker inspect or docker exec).
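With replicas="3" and ip-range-start="192.168.122.131", the three replica addresses are simply the three sequential IPs from the start address. A sketch that derives them (the curl/docker commands only make sense on a live cluster, so they are shown as comments):

```shell
# Derive the replica IPs implied by ip-range-start and replicas=3.
start=192.168.122.131
replicas=3
prefix=${start%.*}
last=${start##*.}
urls=""
for i in $(seq 0 $((replicas - 1))); do
  ip="$prefix.$((last + i))"
  urls="$urls http://$ip/"
  # On a live cluster, each replica should serve its own page:
  #   curl -s "http://$ip/"   # expect "httpd-bundle-$i @ <hostname>"
  #   docker exec httpd-bundle-docker-$i lsof -i :80
done
echo "$urls"
```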

Comment 7 Ken Gaillot 2017-05-04 14:38:48 UTC
Known issues to be addressed separately:
- Bug 1447903
- Bug 1447916
- Bug 1447918
- Bug 1447951

Comment 8 Damien Ciabrini 2017-06-21 13:58:48 UTC
Additionally, Michele Baldessari and I have been testing the new features from this build for a month now, so I can say that it's working as expected for us.

We're following a different set of instructions [1] to deploy a cluster with containerized OCF resources, and this is the result:

[root@rhelz ~]# crm_mon -1
Stack: corosync
Current DC: rhelz (version 1.1.16-11.el7-94ff4df) - partition with quorum
Last updated: Wed Jun 21 09:48:16 2017
Last change: Wed Jun 21 09:29:42 2017 by root via cibadmin on rhelz

4 nodes configured
16 resources configured

Online: [ rhelz ]
GuestOnline: [ galera-bundle-0@rhelz rabbitmq-bundle-0@rhelz redis-bundle-0@rhelz ]

Active resources:

 Docker container: rabbitmq-bundle [192.168.24.1:8787/rhosp12/openstack-rabbitmq-docker:2017-06-19.1]
   rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started rhelz
 Docker container: galera-bundle [192.168.24.1:8787/rhosp12/openstack-mariadb-docker:2017-06-19.1]
   galera-bundle-0      (ocf::heartbeat:galera):        Master rhelz
 Docker container: redis-bundle [192.168.24.1:8787/rhosp12/openstack-redis-docker:2017-06-19.1]
   redis-bundle-0       (ocf::heartbeat:redis): Master rhelz
 ip-192.168.122.254     (ocf::heartbeat:IPaddr2):       Started rhelz
 ip-192.168.122.250     (ocf::heartbeat:IPaddr2):       Started rhelz
 ip-192.168.122.249     (ocf::heartbeat:IPaddr2):       Started rhelz
 ip-192.168.122.253     (ocf::heartbeat:IPaddr2):       Started rhelz
 ip-192.168.122.247     (ocf::heartbeat:IPaddr2):       Started rhelz
 ip-192.168.122.248     (ocf::heartbeat:IPaddr2):       Started rhelz
 Docker container: haproxy-bundle [192.168.24.1:8787/rhosp12/openstack-haproxy-docker:2017-06-19.1]
   haproxy-bundle-docker-0      (ocf::heartbeat:docker):        Started rhelz


The resources marked as "Docker container" are containerized OCF resources managed by pacemaker.


We also verified on multi-node OpenStack overclouds that the feature works as expected on multi-node clusters.

[1] https://github.com/dciabrin/undercloud_ha_containers

Comment 10 errata-xmlrpc 2017-08-01 17:54:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1862