Bug 1504535 - Deploy cfme failed when using external NFS [NEEDINFO]
Summary: Deploy cfme failed when using external NFS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assignee: Tim Bielawa
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-10-20 06:32 UTC by Gaoyun Pei
Modified: 2017-11-28 22:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An undefined variable was used in a task.
Consequence: The undefined variable caused a jinja template evaluation error which would crash the installation.
Fix: The undefined variable has been removed and replaced with more informative error text.
Result: The playbook does not error out for external nfs storage class installations.
Clone Of:
Environment:
Last Closed: 2017-11-28 22:18:08 UTC
Target Upstream Version:
fsimonce: needinfo? (scweiss)


Links
System ID Priority Status Summary Last Updated
Github openshift-ansible/pull/5974 None None None 2017-11-01 14:41:13 UTC
Red Hat Product Errata RHSA-2017:3188 normal SHIPPED_LIVE Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update 2017-11-29 02:34:54 UTC

Description Gaoyun Pei 2017-10-20 06:32:34 UTC
Description of problem:
The installer failed while deploying CFME with an external NFS server. The failure occurred in the task "Ensure the CFME App PV is created", with the error: 'openshift_management_nfs_server' is undefined


Version-Release number of the following components:
openshift-ansible-3.7.0-0.167.0.git.0.0e34535.el7.noarch.rpm

How reproducible:

Steps to Reproduce:
1. Add the following options to the Ansible inventory file, then run playbooks/byo/config.yml:
openshift_management_install_management=true
openshift_management_app_template=miq-template
openshift_management_storage_class=nfs_external
openshift_management_storage_nfs_external_hostname=openshift-144.x.com
openshift_management_storage_nfs_base_dir=/nfsshare/cfme-test
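
The inventory options above can be sanity-checked before a run. The following is a minimal sketch, not part of openshift-ansible: the helper names are hypothetical, and the list of options expected for nfs_external is an assumption taken from the inventory in this report rather than from the role's own validation tasks.

```python
# Hypothetical pre-flight check; not part of openshift-ansible.

# Assumption: these are the options this report sets for nfs_external.
# The role's own validation may require a different set.
EXPECTED_FOR_NFS_EXTERNAL = [
    "openshift_management_storage_nfs_external_hostname",
    "openshift_management_storage_nfs_base_dir",
]


def parse_inventory_vars(text):
    """Collect simple key=value lines, skipping blanks, comments and section headers."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


def missing_nfs_external_options(options):
    """Return the expected options that are unset when the storage class is nfs_external."""
    if options.get("openshift_management_storage_class") != "nfs_external":
        return []
    return [name for name in EXPECTED_FOR_NFS_EXTERNAL if not options.get(name)]


INVENTORY = """\
openshift_management_install_management=true
openshift_management_app_template=miq-template
openshift_management_storage_class=nfs_external
openshift_management_storage_nfs_external_hostname=openshift-144.x.com
openshift_management_storage_nfs_base_dir=/nfsshare/cfme-test
"""

options = parse_inventory_vars(INVENTORY)
missing = missing_nfs_external_options(options)
print("missing options:", missing)
```

For the inventory in this report the check finds nothing missing, which is consistent with the failure coming from the role itself rather than from the inventory.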



Actual results:
TASK [openshift_management : Check if the CFME DB PV has been created] *********
Friday 20 October 2017  06:17:20 +0000 (0:00:02.023)       1:05:31.660 ******** 
ok: [openshift-128.lab.sjc.redhat.com] => {"changed": false, "failed": false, "results": {"cmd": "/usr/bin/oc get pv miq-db -o json -n openshift-management", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): persistentvolumes \"miq-db\" not found\n", "stdout": ""}, "state": "list"}

TASK [openshift_management : Ensure the CFME App PV is created] ****************
Friday 20 October 2017  06:17:22 +0000 (0:00:02.040)       1:05:33.700 ******** 
fatal: [openshift-128.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'openshift_management_nfs_server' is undefined\n\nThe error appears to have been in '/home/slave5/workspace/Launch-Environment-Flexy/private-openshift-ansible/roles/openshift_management/tasks/storage/create_nfs_pvs.yml': line 47, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure the CFME App PV is created\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'openshift_management_nfs_server' is undefined"}
	to retry, use: --limit @/home/slave5/workspace/Launch-Environment-Flexy/private-openshift-ansible/playbooks/byo/config.retry
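
In the output above, the "Check if the CFME DB PV has been created" task reports ok even though stderr says the PV was not found. A minimal sketch of how such a result could be interpreted follows; the helper name pv_exists is hypothetical, and treating an empty object in the inner "results" list as "PV absent" is an assumption based only on the log line shown here.

```python
import json


def pv_exists(task_result):
    """Return True if the lookup returned at least one non-empty PV object.

    Assumption: an empty dict in the inner "results" list means the PV is absent,
    as in the log line from this report.
    """
    inner = task_result.get("results", {})
    items = inner.get("results", [])
    return any(bool(item) for item in items)


# Task result copied from the log above (raw string keeps the JSON escapes intact).
RAW = r'''{"changed": false, "failed": false, "results": {"cmd": "/usr/bin/oc get pv miq-db -o json -n openshift-management", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): persistentvolumes \"miq-db\" not found\n", "stdout": ""}, "state": "list"}'''

result = json.loads(RAW)
print("miq-db PV exists:", pv_exists(result))
```

Under this reading, the lookup task succeeding with an empty result is expected on a fresh install. The failure only appears in the next task, which tries to create the PV.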



Expected results:

Additional info:

Comment 5 Tim Bielawa 2017-10-20 16:00:17 UTC
That's strange that you got that far without it erroring. The validation steps should have noticed something was wrong.

I just reproduced this error on my local workstation. Should be a quick fix.

Comment 6 Tim Bielawa 2017-10-20 16:02:05 UTC
Confirmed. I have identified the issue and the fix is coming shortly.

Comment 7 Tim Bielawa 2017-10-20 20:35:03 UTC
Fix submitted in my other cleanup PR for CFME

https://github.com/openshift/openshift-ansible/pull/5793

Comment 9 Gaoyun Pei 2017-10-25 10:05:06 UTC
Still reproducible with openshift-ansible-3.7.0-0.178.0.git.0.27a1039.el7.noarch.rpm.

PR https://github.com/openshift/openshift-ansible/pull/5793 not merged yet.

Comment 10 Tim Bielawa 2017-10-31 18:31:59 UTC
PR is merged. Please try again.

Comment 11 Scott Dodson 2017-10-31 19:15:47 UTC
$ git tag --contains ac62ea0066934877f94e99bda6ec53a9c03ababb
openshift-ansible-3.7.0-0.178.2
openshift-ansible-3.7.0-0.182.0
openshift-ansible-3.7.0-0.183.0
openshift-ansible-3.7.0-0.184.0
openshift-ansible-3.7.0-0.185.0
openshift-ansible-3.7.0-0.186.0
openshift-ansible-3.7.0-0.187.0
openshift-ansible-3.7.0-0.188.0
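
The tag list above can be used to check whether a given build contains the queried commit. This is a minimal sketch with hypothetical helper names. It assumes that the leading name-version-release part of an RPM file name corresponds directly to a git tag, which matches the names in this report but is not confirmed by it.

```python
import re

# Tags reported by `git tag --contains ac62ea0066934877f94e99bda6ec53a9c03ababb` above.
TAGS_WITH_COMMIT = [
    "openshift-ansible-3.7.0-0.178.2",
    "openshift-ansible-3.7.0-0.182.0",
    "openshift-ansible-3.7.0-0.183.0",
    "openshift-ansible-3.7.0-0.184.0",
    "openshift-ansible-3.7.0-0.185.0",
    "openshift-ansible-3.7.0-0.186.0",
    "openshift-ansible-3.7.0-0.187.0",
    "openshift-ansible-3.7.0-0.188.0",
]


def tag_from_rpm(rpm_name):
    """Reduce an RPM file name to its assumed tag form, e.g.
    openshift-ansible-3.7.0-0.178.0.git.0.27a1039.el7.noarch.rpm
    becomes openshift-ansible-3.7.0-0.178.0."""
    match = re.match(r"(openshift-ansible-\d+\.\d+\.\d+-\d+\.\d+\.\d+)", rpm_name)
    if match is None:
        raise ValueError("unrecognized package name: " + rpm_name)
    return match.group(1)


def contains_commit(rpm_name, tags=TAGS_WITH_COMMIT):
    """Return True if the build's tag is in the list of tags containing the commit."""
    return tag_from_rpm(rpm_name) in tags


for rpm in (
    "openshift-ansible-3.7.0-0.178.0.git.0.27a1039.el7.noarch.rpm",
    "openshift-ansible-3.7.0-0.188.0.git.0.aebb674.el7.noarch.rpm",
):
    print(rpm, "->", contains_commit(rpm))
```

The 0.178.0 build tested in Comment 9 is not in the list, while 0.178.2 and later builds are.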

Comment 12 Gaoyun Pei 2017-11-01 08:46:10 UTC
Tried with openshift-ansible-3.7.0-0.188.0.git.0.aebb674.el7.noarch.rpm, still fails as below:

...

TASK [openshift_management : Ensure we save the external NFS server] ***********
Wednesday 01 November 2017  04:13:17 +0000 (0:00:00.033)       1:05:17.982 **** 
ok: [openshift-119.lab.sjc.redhat.com] => {"ansible_facts": {"openshift_management_nfs_server": "openshift-144.lab.sjc.redhat.com"}, "changed": false, "failed": false}

TASK [openshift_management : Failed NFS server detection] **********************
Wednesday 01 November 2017  04:13:17 +0000 (0:00:00.049)       1:05:18.031 **** 
fatal: [openshift-119.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'oo_nfs_to_config'\n\nThe error appears to have been in '/home/slave6/workspace/Launch-Environment-Flexy/private-openshift-ansible/roles/openshift_management/tasks/storage/nfs_server.yml': line 23, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Failed NFS server detection\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'oo_nfs_to_config'"}

Comment 13 Tim Bielawa 2017-11-01 12:48:55 UTC
Reproduced this successfully. The error is now caused by me using an undefined variable in the error message. It's funny, because that task wouldn't even have run.

Comment 14 Tim Bielawa 2017-11-01 14:41:13 UTC
I have pushed a fix for this bug

https://github.com/openshift/openshift-ansible/pull/5974

Comment 15 Gaoyun Pei 2017-11-07 11:06:12 UTC
Tried this with openshift-ansible-3.7.0-0.196.0.git.0.27cd7ec.el7.noarch, which includes the merged PR #5974.

Deploying CFME with an external NFS server through the installation playbook now works. I will move this bug to verified once its status changes to ON_QA. Thanks!


With the following parameters set in ansible inventory file:
openshift_management_install_beta=true
openshift_management_app_template=miq-template
openshift_management_storage_class=nfs_external
openshift_management_storage_nfs_external_hostname=openshift-x.x.com
openshift_management_storage_nfs_base_dir=/nfsshare/cfme-test

Run cfme deployment playbook:
ansible-playbook -i host -v /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/config.yml

After playbook finished, check cfme pod status:
[root@host-192-168-2-144 ~]# oc get pod
NAME                 READY     STATUS    RESTARTS   AGE
httpd-1-pd4g9        1/1       Running   0          17m
manageiq-0           1/1       Running   0          17m
memcached-1-q26sh    1/1       Running   0          17m
postgresql-1-xw59h   1/1       Running   0          17m

Check the PVs used in the miq and psql pods and the data created in the NFS directory:
[root@host-192-168-2-144 ~]# oc rsh manageiq-0
sh-4.2# df -h
Filesystem                                                    Size  Used Avail Use% Mounted on
overlay                                                        59G  7.0G   52G  12% /
tmpfs                                                         3.9G     0  3.9G   0% /dev
tmpfs                                                         3.9G     0  3.9G   0% /sys/fs/cgroup
openshift-x.x.com:/nfsshare/cfme-test/miq-app                 8.8G  1.5G  7.4G  17% /persistent
/dev/mapper/rhel-root                                          59G  7.0G   52G  12% /etc/hosts
shm                                                            64M     0   64M   0% /dev/shm
tmpfs                                                         3.9G   16K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
sh-4.2# ls /persistent/server-d
server-data/   server-deploy/ 
sh-4.2# ls /persistent/*
/persistent/server-data:
var

/persistent/server-deploy:
backup	log


[root@host-192-168-2-144 ~]# oc rsh postgresql-1-xw59h
sh-4.2$ df -h
Filesystem                                                   Size  Used Avail Use% Mounted on
overlay                                                       59G  6.9G   52G  12% /
tmpfs                                                        3.9G     0  3.9G   0% /dev
tmpfs                                                        3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                         59G  6.9G   52G  12% /etc/hosts
shm                                                           64M  4.0K   64M   1% /dev/shm
openshift-144.lab.sjc.redhat.com:/nfsshare/cfme-test/miq-db  8.8G  1.5G  7.4G  17% /var/lib/pgsql/data
tmpfs                                                        3.9G   16K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
sh-4.2$ ls /var/lib/pgsql/data/*
PG_VERSION  pg_clog	  pg_hba.conf	 pg_logical    pg_replslot   pg_stat	  pg_tblspc    postgresql.auto.conf  postmaster.pid
base	    pg_commit_ts  pg_ident.conf  pg_multixact  pg_serial     pg_stat_tmp  pg_twophase  postgresql.conf
global	    pg_dynshmem   pg_log	 pg_notify     pg_snapshots  pg_subtrans  pg_xlog      postmaster.opts
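
The manual df check above could be scripted. The following is a minimal sketch with hypothetical helper names; it parses the `df -h` output from the postgresql pod shown above and checks that the data directory is mounted from the external NFS server under the configured base directory. It assumes that no mount source or mount point contains whitespace.

```python
def mount_sources(df_output):
    """Map each mount point to its source, given `df -h` output with a header line."""
    sources = {}
    for line in df_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        sources[fields[5]] = fields[0]
    return sources


def is_nfs_export_of(source, host, base_dir):
    """Return True if source looks like host:/base_dir/<something>."""
    server, sep, path = source.partition(":")
    return sep == ":" and server == host and path.startswith(base_dir.rstrip("/") + "/")


# Output copied from the postgresql pod session above.
DF_OUTPUT = """\
Filesystem                                                   Size  Used Avail Use% Mounted on
overlay                                                       59G  6.9G   52G  12% /
tmpfs                                                        3.9G     0  3.9G   0% /dev
tmpfs                                                        3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                         59G  6.9G   52G  12% /etc/hosts
shm                                                           64M  4.0K   64M   1% /dev/shm
openshift-144.lab.sjc.redhat.com:/nfsshare/cfme-test/miq-db  8.8G  1.5G  7.4G  17% /var/lib/pgsql/data
tmpfs                                                        3.9G   16K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
"""

sources = mount_sources(DF_OUTPUT)
db_source = sources["/var/lib/pgsql/data"]
on_nfs = is_nfs_export_of(db_source, "openshift-144.lab.sjc.redhat.com", "/nfsshare/cfme-test")
print(db_source, "on external NFS:", on_nfs)
```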

Comment 16 Scott Dodson 2017-11-07 13:50:56 UTC
Gaoyun, thanks for verifying it!

Comment 17 Tim Bielawa 2017-11-07 14:18:38 UTC
Woo Hoo!

Comment 18 Gaoyun Pei 2017-11-08 01:58:46 UTC
Marking this bug as verified per Comment 15.

Comment 21 errata-xmlrpc 2017-11-28 22:18:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

