Bug 1490905 - openshift-ansible CNS deployment fails when Docker storage is overlay2
Summary: openshift-ansible CNS deployment fails when Docker storage is overlay2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: Scott Dodson
QA Contact: Wenkai Shi
URL:
Whiteboard: aos-scalability-37
Duplicates: 1493705
Depends On:
Blocks:
 
Reported: 2017-09-12 12:54 UTC by Tero Ahonen
Modified: 2017-11-28 22:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If overlay2 storage was used, the device mapper kernel modules might not have been loaded on a host, which prevented the GlusterFS storage system from working properly. The installer now ensures that the dm_thin_pool, dm_snapshot, and dm_mirror modules are loaded when GlusterFS is used.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:10:32 UTC
Target Upstream Version:
Embargoed:


Attachments
Files created by cns deployer (4.85 KB, application/zip)
2017-09-12 12:54 UTC, Tero Ahonen


Links
System ID: Red Hat Product Errata RHSA-2017:3188
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-29 02:34:54 UTC

Description Tero Ahonen 2017-09-12 12:54:44 UTC
Created attachment 1324893
Files created by cns deployer

Description of problem:
Installing OCP 3.6 with a GlusterFS backend for the registry fails.


Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.6.173.0.21-2.git.0.44a4038.el7.noarch
Kernel: Linux node01 3.10.0-693.2.1.el7.x86_64 #1 SMP Fri Aug 11 04:58:43 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Docker storage is overlay2, not device mapper.

All VMs targeted to run GlusterFS have the brick device /dev/sdc available.


How reproducible:

The inventory file contains the following GlusterFS-related sections:

[OSEv3:children]
masters
nodes
etcd
glusterfs
glusterfs_registry
...
openshift_hosted_registry_selector='region=infra'
openshift_hosted_registry_replicas=3
openshift_hosted_registry_storage_kind=glusterfs
...
[glusterfs]
svlnxocpa11 glusterfs_ip=172.30.238.124 glusterfs_devices='[ "/dev/sdc"]'
svlnxocpa12 glusterfs_ip=172.30.238.125 glusterfs_devices='[ "/dev/sdc"]'
svlnxocpa13 glusterfs_ip=172.30.238.126 glusterfs_devices='[ "/dev/sdc"]'

[glusterfs_registry]
svlnxocpi11 glusterfs_ip=172.30.238.121 glusterfs_devices='[ "/dev/sdc"]'
svlnxocpi12 glusterfs_ip=172.30.238.122 glusterfs_devices='[ "/dev/sdc"]'
svlnxocpi13 glusterfs_ip=172.30.238.123 glusterfs_devices='[ "/dev/sdc"]'


Steps to Reproduce:
1. Run the openshift-ansible installer with an inventory containing the GlusterFS configuration shown above.

Actual results:
TASK [openshift_storage_glusterfs : Load heketi topology] *****************************************
changed: [svlnxocpm11]
 
TASK [openshift_storage_glusterfs : Create heketi DB volume] **************************************
fatal: [svlnxocpm11]: FAILED! => {
    "changed": true,
    "cmd": [
        "oc",
        "rsh",
        "--namespace=glusterfs",
        "deploy-heketi-storage-1-6r1mb",
        "heketi-cli",
        "-s",
        "http://localhost:8080",
        "--user",
        "admin",
        "--secret",
        "kEccFnOh+tkyM/zKcDSNynXleSUqycmZsX2tcQ4uNnM=",
        "setup-openshift-heketi-storage",
        "--listfile",
        "/tmp/heketi-storage.json"
    ],
    "delta": "0:00:10.669560",
    "end": "2017-09-12 12:04:19.411283",
    "failed": true,
    "rc": 255,
    "start": "2017-09-12 12:04:08.741723"
}
 
STDERR:
 
Error: Unable to execute command on glusterfs-storage-lbwrr:   /usr/sbin/modprobe failed: 1
  thin: Required device-mapper target(s) not detected in your kernel.
  Run `lvcreate --help' for more information.
command terminated with exit code 255
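
The "Required device-mapper target(s) not detected" message points at missing kernel modules. A minimal sketch for checking a node follows, assuming the three modules named in this bug's Doc Text are the relevant ones and that they are built as loadable modules (built-in drivers do not appear in /proc/modules). The function name and the optional file argument are illustrative, not part of any installer.

```shell
#!/bin/sh
# Modules named in this bug's Doc Text (assumed to be the full required set).
REQUIRED_MODULES="dm_thin_pool dm_snapshot dm_mirror"

# Print each required module that is absent from a /proc/modules-style file.
# The optional path argument exists only to make the check easy to try on a
# saved copy; it defaults to the live /proc/modules.
missing_dm_modules() {
    modules_file="${1:-/proc/modules}"
    for mod in $REQUIRED_MODULES; do
        # The first field of each /proc/modules line is the module name.
        # awk exits non-zero when no line matches (or the file is unreadable).
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}
```

Calling missing_dm_modules with no argument on a GlusterFS node prints one missing module per line and prints nothing when all three are loaded.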



Expected results:
GlusterFS is installed, and the registry is installed using CNS as persistent storage.

Additional info:

The attached files were created on masters[0] in /tmp/openshift-glusterfs-ansible-hUg7cw.

The /tmp/heketi-storage.json file was not present on the deployer container, on masters[0], or on the bastion host from which the installation was run.

Installing OCP without CNS works fine.

Comment 3 Tero Ahonen 2017-09-13 09:14:26 UTC
I ran cns-deploy manually and hit the same problem.

I managed to work around it by running

modprobe dm_thin_pool

on all nodes running GlusterFS pods. After that, CNS deployed and works fine.
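
A modprobe call only lasts until the next reboot. The sketch below loads the three modules named in the Doc Text and writes them to a modules-load.d file so that systemd-modules-load picks them up at boot. The function names and the glusterfs.conf file name are assumptions for illustration; this is not necessarily how the installer fix is implemented.

```shell
#!/bin/sh
# Modules named in this bug's Doc Text (assumed to be the full required set).
REQUIRED_MODULES="dm_thin_pool dm_snapshot dm_mirror"

# Load each module immediately (requires root); stop at the first failure.
load_dm_modules() {
    for mod in $REQUIRED_MODULES; do
        modprobe "$mod" || return 1
    done
}

# Write one module name per line, the format systemd-modules-load reads
# from /etc/modules-load.d/*.conf. The default file name is hypothetical.
persist_dm_modules() {
    conf_file="${1:-/etc/modules-load.d/glusterfs.conf}"
    for mod in $REQUIRED_MODULES; do
        echo "$mod"
    done > "$conf_file"
}
```

Running load_dm_modules followed by persist_dm_modules as root on each node that hosts GlusterFS pods covers both the running system and later reboots.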

Comment 4 Jose A. Rivera 2017-09-13 16:29:18 UTC
Okay, that's what I anticipated.

At this time there is not much that we can do to resolve this in cns-deploy. We have provided an advisory message at the beginning of the deployment stating that all nodes running GlusterFS need to have a certain set of kernel modules loaded: https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L535-L538

When cns-deploy is deprecated in favor of the openshift-ansible installer, this will be addressed.

Comment 5 Scott Dodson 2017-09-25 15:15:33 UTC
*** Bug 1493705 has been marked as a duplicate of this bug. ***

Comment 8 Wenkai Shi 2017-11-02 05:37:48 UTC
Verified with version openshift-ansible-3.7.0-0.189.0.git.0.d497c5e.el7: CNS deployment succeeds when Docker storage is overlay2.

Comment 12 errata-xmlrpc 2017-11-28 22:10:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188