
Bug 1404193

Summary: excluded docker packages should not be removed when update atomic-docker-excluder
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Troy Dawson <tdawson>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.4.0
CC: aos-bugs, jiajliu, jokerman, mmccomas
Target Milestone: ---
Target Release: 3.4.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
An error in the atomic-openshift-docker-excluder package caused packages to be removed from the exclusion list when the package was upgraded. This error has been resolved, ensuring that the proper packages remain excluded from yum operations.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-31 20:19:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description liujia 2016-12-13 10:01:11 UTC
Description of problem:
After upgrading atomic-openshift-docker-excluder from 3.3 to 3.4 with excluded packages listed in yum.conf, the packages that are in the new docker-excluder version's package_list have been removed from the exclude list.

before update:
# cat /etc/yum.conf | grep exclude
exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*  docker*1.13* 

after update:
# cat /etc/yum.conf | grep exclude
exclude=


Version-Release number of selected component (if applicable):
atomic-openshift-docker-excluder-3.4.0.35-1.git.0.86b11df.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. install ocp3.3 and docker-excluder-3.3
2. upgrade ocp3.3 to 3.4
3. update docker-excluder-3.3 to 3.4
#yum update atomic-openshift-docker-excluder
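
To make the before/after comparison in this report repeatable, a small helper along these lines could be used. This is only a sketch: the function name `exclude_line` is hypothetical and not part of the excluder package, and it assumes the exclude entries sit on a line starting with `exclude`, as in the output quoted above.

```shell
# Hypothetical helper: print the exclude line(s) of a yum configuration file.
# Defaults to /etc/yum.conf; pass another path to inspect a copy instead.
exclude_line() {
    grep '^exclude' "${1:-/etc/yum.conf}"
}

# Intended use around the update (run as root):
#   before=$(exclude_line)
#   yum -y update atomic-openshift-docker-excluder
#   after=$(exclude_line)
#   echo "before: $before"
#   echo "after:  $after"
```

Printing both lines rather than only comparing them keeps the check useful even when the new version legitimately changes the list.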

Actual results:
All docker-related packages that were excluded before the update have been removed from the exclude list.

Expected results:
If the user excluded docker packages before the update, the excluded packages should be updated to match what the new version's excluder requires.
If the user unexcluded docker packages before the update, the original config should be kept.

Additional info:
The same problem happens with other versions (3.3, 3.2, etc.).

Comment 1 Tim Bielawa 2016-12-13 21:27:00 UTC
I can not reproduce this error using the "Steps to Reproduce" in your report. After step 1 your "before update:" results are different than mine.

- Fresh EC2 instance registered with subscription manager

> # oc version
> oc v3.3.1.7
> ...
> 
> # yum -y install atomic-openshift-docker-excluder atomic-openshift-excluder
> ...
> 
> # rpm -q atomic-openshift-docker-excluder atomic-openshift-excluder
> atomic-openshift-docker-excluder-3.3.1.7-1.git.0.0988966.el7.noarch
> atomic-openshift-excluder-3.3.1.7-1.git.0.0988966.el7.noarch
> 
> # grep exclude /etc/yum.conf 
> excludes=
> excludes=

I do not have `openshift_additional_repos` set to any value in my inventory. I am not sure how to reproduce the error you described. Please advise.

Comment 2 liujia 2016-12-14 02:21:28 UTC
Hi Tim

Your "before update:" output is different from mine because of a bug in 3.3 (1403696) that has been fixed in 3.4. You can work around it as follows:
1) Install atomic-openshift-docker-excluder (do not install atomic-openshift-excluder first, because of bug 1403655)
2) Edit /usr/sbin/atomic-openshift-docker-excluder (refer to https://github.com/openshift/ose/pull/499)
-    sed -i 's|\(obsoletes=.*\)|\1\nexcludes=|' "${CONF_FILE}"
+    sed -i 's|\(installonly_limit=.*\)|\1\nexclude=|' "${CONF_FILE}"
3) Run "atomic-openshift-docker-excluder exclude"

Then you can follow the remaining steps to reproduce.

Comment 3 Tim Bielawa 2016-12-14 17:39:03 UTC
liujia, great. Following those instructions worked for me

> [root@m01 ~]# atomic-openshift-docker-excluder exclude
> Adding exclude= to /etc/yum.conf
> [root@m01 ~]# grep exclude /etc/yum.conf 
> exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*  docker*1.13*

Working on reproducing the rest of the issue now.

Comment 4 Tim Bielawa 2016-12-14 18:08:33 UTC
This does indeed reproduce

PRE UPGRADE:

> [root@m01 ~]# grep exclude /etc/yum.conf
> exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*  docker*1.13*

> [root@m01 ~]# oc version
> oc v3.3.1.7

> [root@m01 ~]# date
> Wed Dec 14 12:52:19 EST 2016

Then apply upgrade playbook

> [root@m01 ~]# oc version && date
> oc v3.4.0.36+ca20a16
> Wed Dec 14 13:05:33 EST 2016

Current docker excluder:

> [root@m01 ~]# grep exclude /etc/yum.conf
> exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*  docker*1.13*

> [root@m01 ~]# rpm -q atomic-openshift-docker-excluder.noarch
> atomic-openshift-docker-excluder-3.3.1.7-1.git.0.0988966.el7.noarch

Upgrade docker excluder....

> [root@m01 ~]# rpm -q atomic-openshift-docker-excluder.noarch
> atomic-openshift-docker-excluder-3.4.0.36-1.git.0.ca20a16.el7.noarch
> [root@m01 ~]# grep exclude /etc/yum.conf
> exclude=

Comment 5 Tim Bielawa 2016-12-14 20:57:01 UTC
This behavior is caused by the order in which `rpm` evaluates package scripts during upgrades. Package scripts are first run **FOR THE NEW (UPGRADE) PACKAGE** and then **For the Old Package**:

1. Run the %pre section of the RPM being installed.
2. Install the files that the RPM provides.
3. Run the %post section of the RPM.
4. Run the %preun of the old package.
5. Delete any old files not overwritten by the newer version. (This step deletes files that the new package does not require.)
6. Run the %postun hook of the old package.

* Source [1]. Verify yourself with "rpm -U -vv atomic-openshift-docker-excluder-3.4.0.36-1.git.0.ca20a16.el7.noarch.rpm" and read the upgrade process step by step: https://gist.github.com/tbielawa/f74b34103dec4195c8c484c735c0ab6f

----

1: N/A: The new excluder package does not have a %pre (install) section

2. The updated excluder files (/usr/sbin/atomic-openshift-docker-excluder) are installed on the filesystem

3: The %post (install) section of the new rpm is run
> atomic-openshift-docker-excluder exclude

4: The %preun script of the **old** package is run
> /usr/sbin/atomic-openshift-docker-excluder unexclude

5: N/A

6: N/A: Old package has no %postun script

---- 

What does this mean to us?

3: The new rpm's post-installation script is run. This happens when you install a package for the first time, or upgrade an existing package. This means that the line

> exclude= docker*1.20 docker*1.19* ....

is added to yum.conf (in reality, since this is an upgrade and presumably you have already run the 'exclude' command, that line **is already present**).

4: The OLD rpm's pre-uninstall script is run. This action runs the excluder's **unexclude** command (all those packages we just added to the exclude list are removed).

---

In effect RPM is:

- Ensuring that the packages are present in the "exclude=" list.

- Running the old rpm's uninstall process, which results in the excluder's unexclude command REMOVING all of the packages we wanted listed in the exclude list.


We need to patch the order in which these are run or add some logic to the excluder script.
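
The net effect of steps 3 and 4 can be reproduced on a scratch file without touching the real /etc/yum.conf. The following is a minimal sketch: `excluder_exclude` and `excluder_unexclude` are hypothetical stand-ins for the excluder's `exclude` and `unexclude` commands, the pattern list is shortened, and GNU `sed -i` is assumed.

```shell
#!/bin/sh
# Scratch copy standing in for /etc/yum.conf.
CONF_FILE=$(mktemp)
PATTERNS='docker*1.20* docker*1.19*'
printf '[main]\nexclude= %s\n' "$PATTERNS" > "$CONF_FILE"

# Hypothetical stand-in for "atomic-openshift-docker-excluder exclude".
excluder_exclude() {
    sed -i "s|^exclude=.*|exclude= ${PATTERNS}|" "$CONF_FILE"
}

# Hypothetical stand-in for "atomic-openshift-docker-excluder unexclude".
excluder_unexclude() {
    sed -i 's|^exclude=.*|exclude=|' "$CONF_FILE"
}

excluder_exclude     # step 3: %post of the NEW package
excluder_unexclude   # step 4: %preun of the OLD package

# The unexclude ran last, so the list ends up empty.
grep '^exclude' "$CONF_FILE"
```

Because the old package's scriptlet runs after the new package's `%post`, the last writer wins and the exclude line is left empty, matching the reported behavior.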


* [1] http://www.ibm.com/developerworks/library/l-rpm2/

Comment 6 Scott Dodson 2016-12-16 16:45:55 UTC
Troy,

We came to the conclusion that we should probably make the excluder script ensure that the excludes for docker are correct on install regardless of current state and then only fire the postun scriptlet when $1 == 0.
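
The `$1 == 0` check relies on rpm passing each scriptlet the number of instances of the package that will remain after the operation: 0 on a real erase, 1 during an upgrade. A rough sketch of such a guard follows; the function name `uninstall_scriptlet` is hypothetical, and in a real spec file the `if` body would sit directly under the scriptlet section rather than in a function.

```shell
# Hypothetical wrapper around an uninstall scriptlet body.
# $1 is the count rpm passes in: 0 = package is being erased, 1 = upgrade.
uninstall_scriptlet() {
    if [ "$1" -eq 0 ]; then
        # Real removal: safe to drop the excludes.
        echo "unexclude"
    else
        # Upgrade: leave the exclude list alone.
        echo "skip"
    fi
}

uninstall_scriptlet 1   # upgrade
uninstall_scriptlet 0   # erase
```

Note that during an upgrade it is the old package's scriptlet that runs, so a guard like this would only take effect when upgrading away from a version that already contains it.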

Comment 7 Troy Dawson 2016-12-16 18:03:46 UTC
Scott,
The problem with your statement is that the incoming docker-excluder script has no way of knowing what the previous docker-excluder script was excluding.  Adding that type of smarts into the scripts will most likely create more bugs.

Comment #5 left out %posttrans, which gets run after everything.

I propose we leave postun as it is, and change %post to %posttrans

I ran some quick tests and it looks like this does the job.
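
As an illustration of why moving the call to %posttrans helps, the scriptlet order can be simulated on a scratch file. This is a sketch only, not the change in the actual spec file: `excluder_exclude` and `excluder_unexclude` are hypothetical stand-ins for the excluder commands, the pattern list is shortened, and GNU `sed -i` is assumed.

```shell
#!/bin/sh
# Scratch copy standing in for /etc/yum.conf.
CONF_FILE=$(mktemp)
PATTERNS='docker*1.20* docker*1.19*'
printf '[main]\nexclude= %s\n' "$PATTERNS" > "$CONF_FILE"

# Hypothetical stand-ins for the excluder's exclude/unexclude commands.
excluder_exclude() {
    sed -i "s|^exclude=.*|exclude= ${PATTERNS}|" "$CONF_FILE"
}
excluder_unexclude() {
    sed -i 's|^exclude=.*|exclude=|' "$CONF_FILE"
}

# New package: %post no longer calls exclude (nothing to run here).
excluder_unexclude   # old package: uninstall scriptlet empties the list
excluder_exclude     # new package: %posttrans runs last and restores it

grep '^exclude' "$CONF_FILE"
```

With the exclude step running at the end of the transaction, the old package's unexclude no longer has the final word.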

Comment 8 Troy Dawson 2017-01-04 15:06:06 UTC
For OCP (for this bug) this pull request will fix the problem.
https://github.com/openshift/ose/pull/525

Will get the fix upstream.

Comment 10 liujia 2017-01-22 07:52:20 UTC
Version:
atomic-openshift-utils-3.4.55-1.git.0.9cb1f40.el7.noarch

Steps:
1. Install atomic-openshift-docker-excluder-3.3.1.9-1.git.0.a7f5265.el7.noarch
2. Update docker-excluder to 3.4
#yum update atomic-openshift-docker-excluder

Result:
->before update
exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*  docker*1.13*
->after update
exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14*  docker*1.13*

Verified the bug; changing the status.

Comment 12 errata-xmlrpc 2017-01-31 20:19:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0218