Bug 1572552

Summary: CNS as docker registry backend storage installation failed with 'dict object' has no attribute 'fsGroup' error
Product: OpenShift Container Platform Reporter: Wenkai Shi <weshi>
Component: Installer Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED CURRENTRELEASE QA Contact: Wenkai Shi <weshi>
Severity: medium Docs Contact:
Priority: high    
Version: 3.10.0 CC: agladkov, antonio, aos-bugs, bleanhar, jokerman, mmccomas, sdodson, wmeng
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-18 20:42:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                                      Flags
oc get pods                                                      none
oc get -o yaml pod docker-registry-1-5dfcb                       none
oc describe pod docker-registry-1-5dfcb                          none
ansible-playbook -i inventory.ini playbooks/deploy_cluster.yml   none
inventory.ini                                                    none

Description Wenkai Shi 2018-04-27 10:16:48 UTC
Description of problem:
Installation with CNS as the docker registry backend storage fails with the error: 'dict object' has no attribute 'fsGroup'.

Version-Release number of the following components:
openshift-ansible-3.10.0-0.30.0.git.0.4f02952.el7

How reproducible:
100%

Steps to Reproduce:
1. Install OCP with CNS as the docker-registry backend storage.
# cat hosts
...
openshift_hosted_registry_storage_kind=glusterfs
...
[glusterfs_registry]
glusterfs1.example.com glusterfs_devices="['/dev/vsda']"
glusterfs2.example.com glusterfs_devices="['/dev/vsda']"
glusterfs3.example.com glusterfs_devices="['/dev/vsda']"
...

Actual results:
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
...
# ansible-playbook -i hosts -v /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
...
TASK [openshift_hosted : Determine registry fsGroup] ***************************
Friday 27 April 2018  05:53:37 -0400 (0:00:32.866)       0:19:49.765 ********** 
fatal: [master.example.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'fsGroup'\n\nThe error appears to have been in '/home/slave3/workspace/Launch Environment Flexy/private-openshift-ansible/roles/openshift_hosted/tasks/storage/glusterfs.yml': line 24, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Determine registry fsGroup\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'fsGroup'"}
...
Failure summary:


  1. Hosts:    master.example.com
     Play:     Poll for hosted pod deployments
     Task:     Determine registry fsGroup
     Message:  The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'fsGroup'
               
               The error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_hosted/tasks/storage/glusterfs.yml': line 24, column 3, but may
               be elsewhere in the file depending on the exact syntax problem.
               
               The offending line appears to be:
               
               
               - name: Determine registry fsGroup
                 ^ here
               
               exception type: <class 'ansible.errors.AnsibleUndefinedVariable'>
               exception: 'dict object' has no attribute 'fsGroup'

...
Expected results:
The "Determine registry fsGroup" task should pass.

Additional info:

Comment 3 Jose A. Rivera 2018-04-27 13:45:43 UTC
Could you provide your full inventory file as well as the output of "oc get -o yaml" and "oc describe" for one of the registry pods?

Comment 6 Jose A. Rivera 2018-04-30 11:55:56 UTC
Finally had a good look at this... is this consistently reproducible? Does it reproduce in 3.9 or 3.7?

Comment 7 Wenkai Shi 2018-05-02 04:50:17 UTC
(In reply to Jose A. Rivera from comment #6)
> Finally had a good look at this... is this consistently reproducible? Does
> it reproduce in 3.9 or 3.7?

Tried with openshift-ansible-3.7.44-1.git.0.dbb912c.el7 and openshift-ansible-3.9.27-1.git.0.52e35b5.el7; it doesn't reproduce.

Took another shot with openshift-ansible-3.10.0-0.31.0.git.0.9f771c7.el7; it reproduces.

Comment 8 Antonio Guillen 2018-05-02 21:27:27 UTC
Created attachment 1430335 [details]
oc get pods

Comment 9 Antonio Guillen 2018-05-02 21:29:04 UTC
Created attachment 1430336 [details]
oc get -o yaml pod docker-registry-1-5dfcb

Comment 10 Antonio Guillen 2018-05-02 21:30:29 UTC
Created attachment 1430337 [details]
oc describe pod docker-registry-1-5dfcb

Comment 11 Antonio Guillen 2018-05-02 21:32:51 UTC
Created attachment 1430338 [details]
ansible-playbook -i inventory.ini playbooks/deploy_cluster.yml

Comment 12 Antonio Guillen 2018-05-02 21:39:50 UTC
Created attachment 1430341 [details]
inventory.ini

Comment 13 Antonio Guillen 2018-05-02 21:45:36 UTC
I hit the same problem. See attachments.

# oc version
oc v3.10.0-alpha.0+428152d-936
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift.guillen.io:8443
openshift v3.10.0-alpha.0+428152d-936
kubernetes v1.10.0+b81c8f8

Comment 15 Jose A. Rivera 2018-05-16 12:48:57 UTC
Sorry for the delay, I've been traveling a lot lately.

Similarly sorry to ask for more info, but could you try again on 3.9 (which should succeed) and grab the output of "oc get -o yaml" and "oc describe" for one of the registry pods? I currently don't have immediate access to an OCP environment to test myself. Mostly I'm looking to see if 3.9 reports an "fsGroup" field.

Scott, any immediate ideas if something has changed in the hosted registry from 3.9 to 3.10 that might explain this?

Comment 17 Jose A. Rivera 2018-05-17 13:10:51 UTC
I was afraid of this. In the 3.9 output, the registry pod has:

  securityContext:
    fsGroup: 1000000000

But in the 3.10 output the securityContext field is blank.

I'll try to look into this; hopefully someone else can chime in with more wisdom in the meantime.
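
To illustrate the difference between the two outputs, here is a minimal Python sketch. It is not the openshift-ansible code: the two dictionaries are hypothetical, trimmed-down stand-ins for the 3.9 and 3.10 pod output, and the helper name registry_fsgroup is made up for this example. It only shows that a lookup of fsGroup has to tolerate an empty securityContext, which is the situation the failing task appears to run into.

```python
# Hypothetical, trimmed-down pod objects modeled on the two outputs
# compared above; real "oc get -o yaml" output has many more fields.
pod_39 = {"spec": {"securityContext": {"fsGroup": 1000000000}}}
pod_310 = {"spec": {"securityContext": {}}}


def registry_fsgroup(pod, default=None):
    """Return spec.securityContext.fsGroup, or `default` if it is absent."""
    # `or {}` also covers a securityContext that is present but null.
    security_context = pod.get("spec", {}).get("securityContext") or {}
    return security_context.get("fsGroup", default)


print(registry_fsgroup(pod_39))   # the 3.9-style pod carries an fsGroup
print(registry_fsgroup(pod_310))  # the 3.10-style pod does not
```

With the 3.10-style input the helper returns the default instead of raising, which makes the missing field visible without aborting.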

Comment 18 Scott Dodson 2018-05-18 02:13:31 UTC
Is the following configured in 3.9 but not 3.10 on the nodes in /etc/origin/node/node-config.yaml?

volumeConfig:
  localQuota:
    perFSGroup

I'm not aware of any changes in the actual registry deployment.
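
One way to compare nodes for this stanza is sketched below in Python. It assumes the node-config.yaml file has already been loaded into a dictionary by some YAML parser (not shown here); the function name has_per_fsgroup and both sample dictionaries are hypothetical, and the perFSGroup value is left empty as in the snippet above.

```python
def has_per_fsgroup(node_config):
    """Return True if volumeConfig.localQuota contains a perFSGroup key."""
    # Each level may be missing or null, so fall back to an empty dict.
    volume_config = node_config.get("volumeConfig") or {}
    local_quota = volume_config.get("localQuota") or {}
    return "perFSGroup" in local_quota


# Hypothetical parsed configs: one with the stanza, one without it.
with_stanza = {"volumeConfig": {"localQuota": {"perFSGroup": None}}}
without_stanza = {"volumeConfig": {}}

print(has_per_fsgroup(with_stanza))
print(has_per_fsgroup(without_stanza))
```

The check looks only at whether the key is present, not at its value, since the question here is whether the stanza exists at all.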

Comment 19 Scott Dodson 2018-05-18 02:14:21 UTC
Alexey,

Any ideas on why the registry would have fsGroup set in 3.9 but not 3.10?

Comment 20 Alexey Gladkov 2018-05-21 13:08:23 UTC
I don't know. We didn't change it.

Comment 23 Jose A. Rivera 2018-05-30 18:49:10 UTC
Scott, I checked the node-config templates and the volumeConfig stanza is the same between 3.9 and 3.10. Any other ideas?

Comment 24 Scott Dodson 2018-05-30 19:52:17 UTC
There were some volumeConfig changes that were recently added to 3.10 to account for changes in Origin. Let's retest with the latest openshift-ansible.

https://github.com/openshift/openshift-ansible/pull/8450

Which is included in openshift-ansible-3.10.0-0.51.0 and later.
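
For anyone checking whether an installed build is at or past that version, a rough Python sketch follows. The function name includes_fix and the regular expression are assumptions made for this example; it only understands the 3.10.0-0.N.0 style package names quoted in this report and raises on anything else.

```python
import re

# First build stated above to include the change: 3.10.0-0.51.0
FIXED_IN = (3, 10, 0, 0, 51, 0)

# Matches names such as openshift-ansible-3.10.0-0.54.0.git.0.537c485.el7
_VERSION_RE = re.compile(
    r"openshift-ansible-(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)\.(\d+)"
)


def includes_fix(package):
    """Return True if `package` is 3.10.0-0.51.0 or later."""
    match = _VERSION_RE.match(package)
    if match is None:
        raise ValueError("unrecognized package name: " + package)
    # Compare the six numeric fields as a tuple, field by field.
    return tuple(int(part) for part in match.groups()) >= FIXED_IN


print(includes_fix("openshift-ansible-3.10.0-0.30.0.git.0.4f02952.el7"))
print(includes_fix("openshift-ansible-3.10.0-0.54.0.git.0.537c485.el7"))
```

Comparing the numeric fields as a tuple avoids the string-comparison trap where "0.9.0" would sort after "0.51.0".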

Comment 25 Wenkai Shi 2018-06-04 02:37:08 UTC
Verified with version openshift-ansible-3.10.0-0.54.0.git.0.537c485.el7; the error no longer appears.

Comment 26 Antonio Guillen 2018-06-04 17:40:30 UTC
I also tried it, and it works as expected with version openshift-ansible-3.10.0-0.58.0-2-g73079a70f.