Bug 1462087

Summary: Unable to mask service etcd_container
Product: OpenShift Container Platform
Component: Installer
Version: 3.6.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Gaoyun Pei <gpei>
Assignee: Giuseppe Scrivano <gscrivan>
QA Contact: Gaoyun Pei <gpei>
CC: aos-bugs, jokerman, mmccomas
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2017-08-10 05:28:09 UTC
Bug Depends On: 1461662

Description Gaoyun Pei 2017-06-16 07:28:39 UTC
Description of problem:
When migrating from a containerized etcd to an etcd system container, the installer fails while trying to mask the etcd_container service.

For a containerized etcd installation, the etcd container service file is created as /etc/systemd/system/etcd_container.service:
https://github.com/openshift/openshift-ansible/blob/openshift-ansible-3.6.112-1/roles/etcd/tasks/main.yml#L21
The etcd role's system_container.yml then tries to mask the etcd_container service:
https://github.com/openshift/openshift-ansible/blob/openshift-ansible-3.6.112-1/roles/etcd/tasks/system_container.yml#L39

[root@ip-172-18-4-95 ~]# systemctl status etcd_container
● etcd_container.service - The Etcd Server container
   Loaded: loaded (/etc/systemd/system/etcd_container.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-06-15 23:24:10 EDT; 3h 12min ago
 Main PID: 14689 (docker-current)
   Memory: 5.9M
   CGroup: /system.slice/etcd_container.service
           └─14689 /usr/bin/docker-current run --name etcd_container --rm -v /var/lib/etcd/:/var/lib/etcd/:z -v /etc/etcd:/etc/etcd:ro --env-file=/etc/etcd/etcd.conf --ne...

[root@ip-172-18-4-95 ~]# ls -al /etc/systemd/system/etcd_container.service
-rw-r--r--. 1 root root 576 Jun 15 22:58 /etc/systemd/system/etcd_container.service

[root@ip-172-18-4-95 ~]# ls -al /usr/lib/systemd/system/etcd_container.service
ls: cannot access /usr/lib/systemd/system/etcd_container.service: No such file or directory

[root@ip-172-18-4-95 ~]# systemctl stop etcd_container

[root@ip-172-18-4-95 ~]# systemctl disable etcd_container
Removed symlink /etc/systemd/system/docker.service.wants/etcd_container.service.

[root@ip-172-18-4-95 ~]# systemctl mask etcd_container
Failed to execute operation: Invalid argument
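The `ls` checks above suggest why mask fails: `systemctl mask` works by linking the unit name in /etc/systemd/system to /dev/null, and here that path is already taken by a regular unit file. The Python 3 sketch below models that collision on a scratch directory. It assumes that mask refuses to overwrite an existing entry unless it is already a /dev/null symlink, and that it inspects the entry with readlink(2), which fails with EINVAL ("Invalid argument") on a regular file. `mask_unit` is a hypothetical helper, not systemd's code.

```python
import errno
import os
import tempfile

DEV_NULL = "/dev/null"


def mask_unit(unit_dir, unit_name):
    """Hypothetical model of `systemctl mask`, not systemd's implementation.

    Masking means making <unit_dir>/<unit_name> a symlink to /dev/null.
    """
    path = os.path.join(unit_dir, unit_name)
    try:
        os.symlink(DEV_NULL, path)
    except FileExistsError:
        # The path is already taken. Assumed behaviour: inspect it with
        # readlink(2), which raises OSError(EINVAL) for a regular file.
        if os.readlink(path) != DEV_NULL:
            raise  # symlink to something else: refuse to overwrite it
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as unit_dir:
        # Containerized layout: a regular unit file in the admin directory.
        unit = os.path.join(unit_dir, "etcd_container.service")
        with open(unit, "w") as f:
            f.write("[Unit]\nDescription=The Etcd Server container\n")
        try:
            mask_unit(unit_dir, "etcd_container.service")
        except OSError as exc:
            print("mask failed:", errno.errorcode[exc.errno], os.strerror(exc.errno))
```

Masking an unused name succeeds and is idempotent; only a pre-existing regular file makes the symlink step fail.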


Version-Release number of selected component (if applicable):
openshift-ansible-3.6.112-1.git.0.1ce58b5.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set up a containerized OCP 3.6 cluster; the etcd Docker container and the etcd_container service are running.

2. Add use_etcd_system_container=true to the Ansible inventory file and run the installation playbook again.

Actual results:
TASK [etcd : Disable etcd_container] *******************************************
fatal: [ec2-52-206-163-36.compute-1.amazonaws.com]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "failed_when_result": true
}

MSG:

Unable to mask service etcd_container: Failed to execute operation: Invalid argument


Comment 1 Giuseppe Scrivano 2017-06-20 08:50:23 UTC
I've created a PR here:

https://github.com/openshift/openshift-ansible/pull/4503

Comment 3 Gaoyun Pei 2017-06-26 04:17:04 UTC
Hit a failure while installing the etcd system container; it is the same error as https://bugzilla.redhat.com/show_bug.cgi?id=1461662#c6:

TASK [etcd : Install or Update Etcd system container package] ******************
fatal: [qe-gpei-etcd-sc-etcd-1.0626-35y.qe.rhcloud.com]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "module_stderr": "Shared connection to qe-gpei-etcd-sc-etcd-1.0626-35y.qe.rhcloud.com closed.\r\n", 
    "module_stdout": "Traceback (most recent call last):\r\n  File \"/tmp/ansible_NykF4B/ansible_module_oc_atomic_container.py\", line 214, in <module>\r\n    main()\r\n  File \"/tmp/ansible_NykF4B/ansible_module_oc_atomic_container.py\", line 202, in main\r\n    if atomic_version < StrictVersion('1.17.2'):\r\n  File \"/usr/lib64/python2.7/distutils/version.py\", line 140, in __cmp__\r\n    compare = cmp(self.version, other.version)\r\nAttributeError: StrictVersion instance has no attribute 'version'\r\n"
}

MSG:

MODULE FAILURE
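In Python 2.7's distutils, `StrictVersion(vstring)` calls `parse()` only when `vstring` is non-empty, so a `StrictVersion` built from an empty string has no `version` attribute and fails exactly like this when compared. That the module received an empty atomic version string is an assumption. A defensive Python 3 sketch with hypothetical names (`parse_version`, `atomic_is_new_enough`) that turns an undetected version into a clear error before comparing against 1.17.2:

```python
import re

# Threshold taken from the failing line in the traceback above.
MIN_ATOMIC_VERSION = (1, 17, 2)

_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(vstring):
    """Parse 'X.Y' or 'X.Y.Z' into a tuple of ints, failing loudly on bad input."""
    text = (vstring or "").strip()
    if not text:
        raise ValueError("could not determine the installed atomic version")
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised atomic version string: %r" % text)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def atomic_is_new_enough(vstring, minimum=MIN_ATOMIC_VERSION):
    """Hypothetical inverse of the failing `atomic_version < StrictVersion('1.17.2')`."""
    return parse_version(vstring) >= minimum
```

Comparing integer tuples keeps 1.9.9 below 1.17.2, which a plain string comparison would get wrong.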

Comment 4 Gaoyun Pei 2017-06-29 07:58:22 UTC
Verified this bug with openshift-ansible-3.6.126.1-1.git.0.41d2313.el7.noarch.

The installer now removes the etcd_container service file directly instead of trying to mask the etcd_container service:

TASK [etcd : Check etcd system container package] ******************************
changed: [qe-gpei-36-con-rhel-2-etcd-1.0629-xhf.qe.rhcloud.com]

TASK [etcd : Unmask etcd service] **********************************************
ok: [qe-gpei-36-con-rhel-2-etcd-1.0629-xhf.qe.rhcloud.com]

TASK [etcd : Disable etcd_container] *******************************************
changed: [qe-gpei-36-con-rhel-2-etcd-1.0629-xhf.qe.rhcloud.com]

TASK [etcd : Remove etcd_container.service] ************************************
changed: [qe-gpei-36-con-rhel-2-etcd-1.0629-xhf.qe.rhcloud.com]
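The verified fix avoids mask altogether. The Python 3 sketch below models the two file-level steps from the task list, unmasking the etcd service (assumed here to be etcd.service) and deleting the regular etcd_container.service file, on a scratch directory. The helper names are hypothetical, the real playbook tasks may differ, and stopping and disabling etcd_container is left to systemctl. On a real host, `systemctl daemon-reload` is normally run after deleting a unit file so systemd drops its cached copy.

```python
import os
import tempfile

DEV_NULL = "/dev/null"


def unmask_unit(unit_dir, unit_name):
    """Hypothetical helper: remove a /dev/null mask symlink, if one exists."""
    path = os.path.join(unit_dir, unit_name)
    if os.path.islink(path) and os.readlink(path) == DEV_NULL:
        os.unlink(path)
        return True
    return False


def remove_unit_file(unit_dir, unit_name):
    """Hypothetical helper: delete a regular (non-symlink) unit file instead of masking it."""
    path = os.path.join(unit_dir, unit_name)
    if os.path.isfile(path) and not os.path.islink(path):
        os.remove(path)
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as unit_dir:
        # Unit names are assumptions based on the task titles in the log.
        os.symlink(DEV_NULL, os.path.join(unit_dir, "etcd.service"))
        with open(os.path.join(unit_dir, "etcd_container.service"), "w") as f:
            f.write("[Unit]\n")
        print("unmasked etcd.service:", unmask_unit(unit_dir, "etcd.service"))
        print("removed etcd_container.service:",
              remove_unit_file(unit_dir, "etcd_container.service"))
```

Both helpers return False when there is nothing to do, so rerunning the playbook against an already-migrated host is harmless in this model.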

Comment 6 errata-xmlrpc 2017-08-10 05:28:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716