Bug 1827000

Summary: CSV stuck in installing phase
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: OLMAssignee: Alexander Greene <agreene>
OLM sub component: OLM QA Contact: kuiwang
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: dsover, krizza, nhale, vdinh
Version: 4.4Keywords: UpcomingSprint
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: OLM allows users to specify volumes and volumeMounts via the subscriptionConfig field within a Subscription. Using this feature updates the deployment defined in the CSV. Occasionally, OLM would not have the subscription created for a CSV in its cache, and the CSV would be placed in the "installing phase" without creating the deployment with the volumes / volumeMounts defined in the Subscription. OLM would then be unable to move the CSV into the "Succeeded phase" because the calculated deployment hash would not equal the actual deployment hash on the deployment. This error would not be resolved because OLM does not update/recreate the deployment in the "installing phase", and the issue would persist until 5 minutes passed when OLM would re-sync CSVs. Consequence: OLM would occasionally be delayed while installing CSVs. Fix: If OLM encounters a deployment hash error when installing a CSV, OLM will recreate the deployment. Result: OLM is no longer delayed by an incorrect deployment hash.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-21 10:31:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1826443    
Bug Blocks:    

Description Jian Zhang 2020-04-23 01:24:27 UTC
-------This bug is a clone of bug 1826443-----------

Description of problem:
If OLM doesn't have the subscription in its cache when it initially creates a deployment, the CSV will hang on "Installing" until it reaches the five-minute timeout, then succeed on a retry. OLM should be updated to roll out the deployment upon noticing the issue.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
Frequently - not always.

Steps to Reproduce:
The easiest way to reproduce this issue is to run the TestCreateNewSubscriptionWithPodConfig e2e-test.

Actual results:
The CSV is stuck in the `Installing` phase for 5 minutes.

Expected results:
The CSV is reaches the `Succeeded` phase quickly.


Additional info:

Comment 4 Vu Dinh 2020-05-28 17:37:35 UTC
Attempted to auto cherry-pick but it failed due to conflicts. This requires manually cherry-pick for backport PR.

Comment 10 kuiwang 2020-07-13 06:56:44 UTC
verify it on 4.4. LGTM

---
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % git remote -v             
origin	git:operator-framework/operator-lifecycle-manager.git (fetch)
origin	git:operator-framework/operator-lifecycle-manager.git (push)
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % git fetch origin release-4.4:release-4.4
remote: Enumerating objects: 138, done.
remote: Counting objects: 100% (138/138), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 232 (delta 135), reused 135 (delta 135), pack-reused 94
Receiving objects: 100% (232/232), 51.15 KiB | 158.00 KiB/s, done.
Resolving deltas: 100% (157/157), completed with 56 local objects.
From github.com:operator-framework/operator-lifecycle-manager
 * [new branch]        release-4.4 -> release-4.4
 * [new tag]           0.15.0      -> 0.15.0
 * [new tag]           0.15.1      -> 0.15.1
   3f7675c6..0326066b  release-4.4 -> origin/release-4.4
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % git checkout release-4.4
Switched to branch 'release-4.4'
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % git branch                
  master
* release-4.4
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % git log -n 1
commit 0326066b8b2f87a1feec4c2bd9fdc8622ef5e1a0 (HEAD -> release-4.4, origin/release-4.4)
Merge: 7df9d667 cce4a36f
Author: OpenShift Merge Robot <openshift-merge-robot.github.com>
Date:   Sat Jul 11 03:34:46 2020 +0200

    Merge pull request #1566 from awgreene/bz-bp
    
    Bug 1827000: Pod Config Deployment Hash Error
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-07-12-055624   True        False         8m16s   Cluster version is 4.4.0-0.nightly-2020-07-12-055624
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % oc get pod -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-76cc448b4d-lcc4n   1/1     Running   0          30m
olm-operator-5b49f587b9-5hhmv       1/1     Running   0          30m
packageserver-95d967c-5kww7         1/1     Running   0          24m
packageserver-95d967c-jvfwb         1/1     Running   0          23m
kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % oc exec olm-operator-5b49f587b9-5hhmv -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.14.2
git commit: 0326066b8b2f87a1feec4c2bd9fdc8622ef5e1a0


kuiwang@Kuis-MacBook-Pro operator-lifecycle-manager % for i in `seq 10`; do  go test -mod=vendor -count=1 -failfast -tags=local -covermode=count -coverpkg ./pkg/controller/...  -test.v -test.timeout 30m  ./test/e2e/... -test.run='TestCreateNewSubscriptionWithPodConfig' -kubeconfig=${KUBECONFIG:-~/.kube/config} -namespace=openshift-operator-lifecycle-manager -olmNamespace=openshift-operator-lifecycle-manager; sleep 10 ;done
time="2020-07-13T14:40:42+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
=== RUN   TestCreateNewSubscriptionWithPodConfig
time="2020-07-13T14:40:42+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
time="2020-07-13T14:40:42+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
waiting for catalog pod podconfig-catsrc-hgrjn to be available (for sync)
catalog podconfig-catsrc-hgrjn pod with address podconfig-catsrc-hgrjn.openshift-operator-lifecycle-manager.svc:50051
time="2020-07-13T14:41:07+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
time="2020-07-13T14:41:07+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
--- PASS: TestCreateNewSubscriptionWithPodConfig (31.82s)
    util_test.go:489: Creating catalog source podconfig-catsrc-hgrjn in namespace openshift-operator-lifecycle-manager...
    util_test.go:494: Catalog source podconfig-catsrc-hgrjn created
    subscription_e2e_test.go:352: 14:40:47.6599:  (): nil
    subscription_e2e_test.go:352: 14:40:48.6577:  (): nil
    subscription_e2e_test.go:352: 14:40:49.6543:  (): nil
    subscription_e2e_test.go:352: 14:40:50.6555:  (): nil
    subscription_e2e_test.go:352: 14:40:51.6515:  (): nil
    subscription_e2e_test.go:352: 14:40:52.6597:  (): nil
    subscription_e2e_test.go:352: 14:40:53.6567:  (): nil
    subscription_e2e_test.go:352: 14:40:54.6591:  (): nil
    subscription_e2e_test.go:352: 14:40:55.6598:  (): nil
    subscription_e2e_test.go:352: 14:40:56.6564: UpgradePending (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-zmxb9,UID:917d5972-0abd-4b80-8d7e-a18c3251efb4,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:29914,FieldPath:,}
    subscription_e2e_test.go:352: 14:40:57.683: UpgradePending (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-zmxb9,UID:917d5972-0abd-4b80-8d7e-a18c3251efb4,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:29914,FieldPath:,}
    subscription_e2e_test.go:352: 14:40:58.6725: UpgradePending (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-zmxb9,UID:917d5972-0abd-4b80-8d7e-a18c3251efb4,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:29914,FieldPath:,}
    subscription_e2e_test.go:352: 14:40:59.6998: AtLatestKnown (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-zmxb9,UID:917d5972-0abd-4b80-8d7e-a18c3251efb4,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:29914,FieldPath:,}
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-b6jmp to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-b6jmp to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-b6jmp to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-b6jmp to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-b6jmp to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
    csv_e2e_test.go:257: Succeeded (InstallSucceeded): install strategy completed with no errors
    util_test.go:457: Deleting config map podconfig-catsrc-hgrjn-configmap...
    util_test.go:450: Deleting catalog source podconfig-catsrc-hgrjn...
    util_test.go:406: cleaning up any remaining non persistent resources...
    util_test.go:421: 0 csvs remaining
    util_test.go:428: 0 installplans remaining
    util_test.go:435: 0 subs remaining
    util_test.go:442: 0 catalogs remaining
PASS
coverage: 0.1% of statements in ./pkg/controller/...
ok  	github.com/operator-framework/operator-lifecycle-manager/test/e2e	33.482s	coverage: 0.1% of statements in ./pkg/controller/...
?   	github.com/operator-framework/operator-lifecycle-manager/test/e2e/data/kiali-image-registry	[no test files]
?   	github.com/operator-framework/operator-lifecycle-manager/test/e2e/wait	[no test files]
time="2020-07-13T14:41:28+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
=== RUN   TestCreateNewSubscriptionWithPodConfig
time="2020-07-13T14:41:29+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
time="2020-07-13T14:41:29+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
catalog podconfig-catsrc-7rjwj pod with address podconfig-catsrc-7rjwj.openshift-operator-lifecycle-manager.svc:50051
time="2020-07-13T14:41:56+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
time="2020-07-13T14:41:56+08:00" level=info msg="Loading kube client config from path \"/Users/kuiwang/work/bin/jb/kubeconf/kubeconfig\""
--- PASS: TestCreateNewSubscriptionWithPodConfig (34.25s)
...
--- PASS: TestCreateNewSubscriptionWithPodConfig (39.59s)
    util_test.go:489: Creating catalog source podconfig-catsrc-n8tlt in namespace openshift-operator-lifecycle-manager...
    util_test.go:494: Catalog source podconfig-catsrc-n8tlt created
    subscription_e2e_test.go:352: 14:48:40.5442:  (): nil
    subscription_e2e_test.go:352: 14:48:41.5722:  (): nil
    subscription_e2e_test.go:352: 14:48:42.551:  (): nil
    subscription_e2e_test.go:352: 14:48:43.5865:  (): nil
    subscription_e2e_test.go:352: 14:48:44.6414:  (): nil
    subscription_e2e_test.go:352: 14:48:45.5639:  (): nil
    subscription_e2e_test.go:352: 14:48:46.5332:  (): nil
    subscription_e2e_test.go:352: 14:48:47.554:  (): nil
    subscription_e2e_test.go:352: 14:48:48.5606:  (): nil
    subscription_e2e_test.go:352: 14:48:49.5412:  (): nil
    subscription_e2e_test.go:352: 14:48:50.5758:  (): nil
    subscription_e2e_test.go:352: 14:48:51.9233:  (): nil
    subscription_e2e_test.go:352: 14:48:52.5702:  (): nil
    subscription_e2e_test.go:352: 14:48:53.5504: UpgradePending (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-bdgr7,UID:3293ad0d-798c-4bb1-bd80-7192fe74cb76,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:35241,FieldPath:,}
    subscription_e2e_test.go:352: 14:48:55.2881: UpgradePending (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-bdgr7,UID:3293ad0d-798c-4bb1-bd80-7192fe74cb76,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:35241,FieldPath:,}
    subscription_e2e_test.go:352: 14:48:56.4987: AtLatestKnown (nginx-b): &ObjectReference{Kind:InstallPlan,Namespace:openshift-operator-lifecycle-manager,Name:install-bdgr7,UID:3293ad0d-798c-4bb1-bd80-7192fe74cb76,APIVersion:operators.coreos.com/v1alpha1,ResourceVersion:35241,FieldPath:,}
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-w8tr6 to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
    csv_e2e_test.go:257: Installing (InstallSucceeded): waiting for install components to report healthy
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-w8tr6 to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-w8tr6 to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
    csv_e2e_test.go:257: Installing (InstallWaiting): installing: waiting for deployment podconfig-dep-w8tr6 to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
    csv_e2e_test.go:257: Succeeded (InstallSucceeded): install strategy completed with no errors
    util_test.go:457: Deleting config map podconfig-catsrc-n8tlt-configmap...
    util_test.go:450: Deleting catalog source podconfig-catsrc-n8tlt...
    util_test.go:406: cleaning up any remaining non persistent resources...
    util_test.go:421: 0 csvs remaining
    util_test.go:428: 0 installplans remaining
    util_test.go:435: 0 subs remaining
    util_test.go:442: 0 catalogs remaining
PASS
coverage: 0.1% of statements in ./pkg/controller/...
ok  	github.com/operator-framework/operator-lifecycle-manager/test/e2e	41.591s	coverage: 0.1% of statements in ./pkg/controller/...
?   	github.com/operator-framework/operator-lifecycle-manager/test/e2e/data/kiali-image-registry	[no test files]
?   	github.com/operator-framework/operator-lifecycle-manager/test/e2e/wait	[no test files]


---

Comment 12 errata-xmlrpc 2020-07-21 10:31:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2913