Bug 1504535

Summary: Deploy cfme failed when using external NFS
Product: OpenShift Container Platform
Reporter: Gaoyun Pei <gpei>
Component: Installer
Assignee: Tim Bielawa <tbielawa>
Status: CLOSED ERRATA
QA Contact: Gaoyun Pei <gpei>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.7.0
CC: aos-bugs, bazulay, fsimonce, jokerman, mmccomas, scweiss, sdodson
Target Milestone: ---
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An undefined variable was used in a task. Consequence: The undefined variable caused a Jinja template evaluation error that crashed the installation. Fix: The undefined variable has been removed and replaced with more informative error text. Result: The playbook no longer errors out for installations that use the external NFS storage class.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 22:18:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Gaoyun Pei 2017-10-20 06:32:34 UTC
Description of problem:
The installer failed while deploying CFME with an external NFS server. It stopped at the task "Ensure the CFME App PV is created" with the error: 'openshift_management_nfs_server' is undefined


Version-Release number of the following components:
openshift-ansible-3.7.0-0.167.0.git.0.0e34535.el7.noarch.rpm

How reproducible:

Steps to Reproduce:
1. Add the following options to the Ansible inventory file, then run playbooks/byo/config.yml:
openshift_management_install_management=true
openshift_management_app_template=miq-template
openshift_management_storage_class=nfs_external
openshift_management_storage_nfs_external_hostname=openshift-144.x.com
openshift_management_storage_nfs_base_dir=/nfsshare/cfme-test



Actual results:
TASK [openshift_management : Check if the CFME DB PV has been created] *********
Friday 20 October 2017  06:17:20 +0000 (0:00:02.023)       1:05:31.660 ******** 
ok: [openshift-128.lab.sjc.redhat.com] => {"changed": false, "failed": false, "results": {"cmd": "/usr/bin/oc get pv miq-db -o json -n openshift-management", "results": [{}], "returncode": 0, "stderr": "Error from server (NotFound): persistentvolumes \"miq-db\" not found\n", "stdout": ""}, "state": "list"}

TASK [openshift_management : Ensure the CFME App PV is created] ****************
Friday 20 October 2017  06:17:22 +0000 (0:00:02.040)       1:05:33.700 ******** 
fatal: [openshift-128.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'openshift_management_nfs_server' is undefined\n\nThe error appears to have been in '/home/slave5/workspace/Launch-Environment-Flexy/private-openshift-ansible/roles/openshift_management/tasks/storage/create_nfs_pvs.yml': line 47, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure the CFME App PV is created\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'openshift_management_nfs_server' is undefined"}
	to retry, use: --limit @/home/slave5/workspace/Launch-Environment-Flexy/private-openshift-ansible/playbooks/byo/config.retry



Expected results:

Additional info:

Comment 5 Tim Bielawa 2017-10-20 16:00:17 UTC
It's strange that you got that far without an error. The validation steps should have noticed something was wrong.

I just reproduced this error on my local workstation. Should be a quick fix.
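
For illustration only, here is a minimal sketch of the kind of pre-flight validation mentioned above. It is not the role's actual validation code: the function name, the mapping, and the assumption that nfs_external strictly requires only the hostname variable are all hypothetical.

```python
# Hypothetical pre-flight check; NOT the openshift_management role's real validation.
# Assumption: the nfs_external storage class strictly needs only the hostname variable.
REQUIRED_BY_STORAGE_CLASS = {
    "nfs_external": ("openshift_management_storage_nfs_external_hostname",),
}


def missing_storage_vars(inventory_vars):
    """Return required variables that are absent or empty for the chosen storage class."""
    storage_class = inventory_vars.get("openshift_management_storage_class")
    required = REQUIRED_BY_STORAGE_CLASS.get(storage_class, ())
    return [name for name in required if not inventory_vars.get(name)]


if __name__ == "__main__":
    # Inventory options copied from the bug description.
    reported = {
        "openshift_management_install_management": "true",
        "openshift_management_app_template": "miq-template",
        "openshift_management_storage_class": "nfs_external",
        "openshift_management_storage_nfs_external_hostname": "openshift-144.x.com",
        "openshift_management_storage_nfs_base_dir": "/nfsshare/cfme-test",
    }
    print(missing_storage_vars(reported))
```

Under these assumptions the reported inventory yields an empty list, which would be consistent with the problem lying in a task rather than in the inventory options.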

Comment 6 Tim Bielawa 2017-10-20 16:02:05 UTC
Confirmed. I have identified the issue and a fix is coming shortly.

Comment 7 Tim Bielawa 2017-10-20 20:35:03 UTC
The fix is submitted in my other cleanup PR for CFME:

https://github.com/openshift/openshift-ansible/pull/5793

Comment 9 Gaoyun Pei 2017-10-25 10:05:06 UTC
Still reproducible with openshift-ansible-3.7.0-0.178.0.git.0.27a1039.el7.noarch.rpm.

PR https://github.com/openshift/openshift-ansible/pull/5793 has not been merged yet.

Comment 10 Tim Bielawa 2017-10-31 18:31:59 UTC
PR is merged. Please try again.

Comment 11 Scott Dodson 2017-10-31 19:15:47 UTC
$ git tag --contains ac62ea0066934877f94e99bda6ec53a9c03ababb
openshift-ansible-3.7.0-0.178.2
openshift-ansible-3.7.0-0.182.0
openshift-ansible-3.7.0-0.183.0
openshift-ansible-3.7.0-0.184.0
openshift-ansible-3.7.0-0.185.0
openshift-ansible-3.7.0-0.186.0
openshift-ansible-3.7.0-0.187.0
openshift-ansible-3.7.0-0.188.0
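
As a rough sketch, the tag list above can be used to check whether a given RPM build is new enough to carry the commit. The helper names below are hypothetical, and the comparison assumes tags on this branch are linear, so any build at or after the earliest listed tag contains the commit.

```python
import re

# Earliest tag printed by `git tag --contains` in this comment.
FIRST_TAG_WITH_COMMIT = "openshift-ansible-3.7.0-0.178.2"

# Matches the numeric version and release, stopping before suffixes like ".git.0.<sha>".
_NVR = re.compile(r"openshift-ansible-(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)")


def parse_build(name):
    """Turn a tag or RPM file name into a tuple of integers for comparison."""
    match = _NVR.match(name)
    if match is None:
        raise ValueError(f"unrecognized build name: {name!r}")
    version, release = match.groups()
    return tuple(int(part) for part in (version + "." + release).split("."))


def build_has_commit(name, first_tag=FIRST_TAG_WITH_COMMIT):
    """Assumes linear tagging: builds at or after first_tag carry the commit."""
    return parse_build(name) >= parse_build(first_tag)
```

With this sketch, `build_has_commit("openshift-ansible-3.7.0-0.178.0.git.0.27a1039.el7.noarch.rpm")` is false, while the same call for a 0.188.0 build is true. It only says whether the commit is included, not whether the bug is fully fixed.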

Comment 12 Gaoyun Pei 2017-11-01 08:46:10 UTC
Tried with openshift-ansible-3.7.0-0.188.0.git.0.aebb674.el7.noarch.rpm; it still fails, as shown below:

...

TASK [openshift_management : Ensure we save the external NFS server] ***********
Wednesday 01 November 2017  04:13:17 +0000 (0:00:00.033)       1:05:17.982 **** 
ok: [openshift-119.lab.sjc.redhat.com] => {"ansible_facts": {"openshift_management_nfs_server": "openshift-144.lab.sjc.redhat.com"}, "changed": false, "failed": false}

TASK [openshift_management : Failed NFS server detection] **********************
Wednesday 01 November 2017  04:13:17 +0000 (0:00:00.049)       1:05:18.031 **** 
fatal: [openshift-119.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'oo_nfs_to_config'\n\nThe error appears to have been in '/home/slave6/workspace/Launch-Environment-Flexy/private-openshift-ansible/roles/openshift_management/tasks/storage/nfs_server.yml': line 23, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Failed NFS server detection\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'oo_nfs_to_config'"}

Comment 13 Tim Bielawa 2017-11-01 12:48:55 UTC
Reproduced this successfully. The error is now caused by me using an undefined variable in the error message. It's funny, because that task wouldn't even have run.

Comment 14 Tim Bielawa 2017-11-01 14:41:13 UTC
I have pushed a fix for this bug:

https://github.com/openshift/openshift-ansible/pull/5974

Comment 15 Gaoyun Pei 2017-11-07 11:06:12 UTC
Tried this with openshift-ansible-3.7.0-0.196.0.git.0.27cd7ec.el7.noarch, now that PR #5974 has been merged.

Deploying CFME with an external NFS server through the installation playbook now works. I will move this bug to VERIFIED once it is changed to ON_QA. Thanks!


With the following parameters set in the Ansible inventory file:
openshift_management_install_beta=true
openshift_management_app_template=miq-template
openshift_management_storage_class=nfs_external
openshift_management_storage_nfs_external_hostname=openshift-x.x.com
openshift_management_storage_nfs_base_dir=/nfsshare/cfme-test

Run the CFME deployment playbook:
ansible-playbook -i host -v /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/config.yml

After the playbook finishes, check the CFME pod status:
[root@host-192-168-2-144 ~]# oc get pod
NAME                 READY     STATUS    RESTARTS   AGE
httpd-1-pd4g9        1/1       Running   0          17m
manageiq-0           1/1       Running   0          17m
memcached-1-q26sh    1/1       Running   0          17m
postgresql-1-xw59h   1/1       Running   0          17m
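
The same check can be scripted. Below is a minimal sketch that parses the plain-text table printed by `oc get pod` and reports pods that are not fully up; the function name is hypothetical, and it assumes the NAME/READY/STATUS column order shown above.

```python
def pods_not_ready(oc_output):
    """Return names of pods whose STATUS is not Running or whose READY count is short."""
    lines = [line for line in oc_output.splitlines() if line.strip()]
    problems = []
    for line in lines[1:]:  # skip the NAME/READY/STATUS header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Output copied from the verification run above.
    sample = """\
NAME                 READY     STATUS    RESTARTS   AGE
httpd-1-pd4g9        1/1       Running   0          17m
manageiq-0           1/1       Running   0          17m
memcached-1-q26sh    1/1       Running   0          17m
postgresql-1-xw59h   1/1       Running   0          17m
"""
    print(pods_not_ready(sample))
```

Note that this simple rule would also flag pods in a Completed state, so it suits only long-running pods like the four listed here.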

Check the PVs used by the miq and psql pods and the data created in the NFS directory:
[root@host-192-168-2-144 ~]# oc rsh manageiq-0
sh-4.2# df -h
Filesystem                                                    Size  Used Avail Use% Mounted on
overlay                                                        59G  7.0G   52G  12% /
tmpfs                                                         3.9G     0  3.9G   0% /dev
tmpfs                                                         3.9G     0  3.9G   0% /sys/fs/cgroup
openshift-x.x.com:/nfsshare/cfme-test/miq-app    8.8G  1.5G  7.4G  17% /persistent
/dev/mapper/rhel-root                                          59G  7.0G   52G  12% /etc/hosts
shm                                                            64M     0   64M   0% /dev/shm
tmpfs                                                         3.9G   16K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
sh-4.2# ls /persistent/server-d
server-data/   server-deploy/ 
sh-4.2# ls /persistent/*
/persistent/server-data:
var

/persistent/server-deploy:
backup	log


[root@host-192-168-2-144 ~]# oc rsh postgresql-1-xw59h
sh-4.2$ df -h
Filesystem                                                   Size  Used Avail Use% Mounted on
overlay                                                       59G  6.9G   52G  12% /
tmpfs                                                        3.9G     0  3.9G   0% /dev
tmpfs                                                        3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/rhel-root                                         59G  6.9G   52G  12% /etc/hosts
shm                                                           64M  4.0K   64M   1% /dev/shm
openshift-144.lab.sjc.redhat.com:/nfsshare/cfme-test/miq-db  8.8G  1.5G  7.4G  17% /var/lib/pgsql/data
tmpfs                                                        3.9G   16K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
sh-4.2$ ls /var/lib/pgsql/data/*
PG_VERSION  pg_clog	  pg_hba.conf	 pg_logical    pg_replslot   pg_stat	  pg_tblspc    postgresql.auto.conf  postmaster.pid
base	    pg_commit_ts  pg_ident.conf  pg_multixact  pg_serial     pg_stat_tmp  pg_twophase  postgresql.conf
global	    pg_dynshmem   pg_log	 pg_notify     pg_snapshots  pg_subtrans  pg_xlog      postmaster.opts

Comment 16 Scott Dodson 2017-11-07 13:50:56 UTC
Gaoyun, thanks for verifying it!

Comment 17 Tim Bielawa 2017-11-07 14:18:38 UTC
Woo Hoo!

Comment 18 Gaoyun Pei 2017-11-08 01:58:46 UTC
Marking this bug as verified per Comment 15.

Comment 21 errata-xmlrpc 2017-11-28 22:18:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

Comment 22 Red Hat Bugzilla 2023-09-14 04:10:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days