Bug 1608837 - import-vm-apb keeps creating deprovision namespaces & pods
Summary: import-vm-apb keeps creating deprovision namespaces & pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 1.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 1.3
Assignee: Ryan Hallisey
QA Contact: Qixuan Wang
URL:
Whiteboard:
Depends On: 1637010
Blocks:
 
Reported: 2018-07-26 11:23 UTC by Vatsal Parekh
Modified: 2018-12-05 18:57 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-05 18:56:56 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:3776 0 None None None 2018-12-05 18:57:10 UTC

Description Vatsal Parekh 2018-07-26 11:23:12 UTC
Description of problem:
If an import-vm-apb run failed, it kept creating deprovision namespaces and pods; since those pods also seemed to be erroring, it kept creating more deprovision pods,
resulting in 2000+ projects with errors.

Version-Release number of selected component (if applicable):
CNV 1.1 (KubeVirt 0.6.2)
import-vm-apb (0.6.2-1)

How reproducible:


Steps to Reproduce:
1. Try to import a VM with a name that already exists, so the provision fails (a rough svcat sketch follows this list).
2.
3.
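
For reference, one hedged way to drive step 1 through the Service Catalog CLI. The class, plan, and parameter names below are illustrative only and depend on your registry prefix and the APB's actual spec:

# provision once with a given VM name, then provision again with the same name so the second run fails
svcat provision vm-import-1 --class dh-import-vm-apb --plan default --param vm_name=myvm --namespace qwang-1
svcat provision vm-import-2 --class dh-import-vm-apb --plan default --param vm_name=myvm --namespace qwang-1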

Actual results:
The APB keeps creating many deprovision pods and projects.

Expected results:
A failed import should not trigger an endless chain of deprovision namespaces and pods.

Additional info:

Comment 1 Qixuan Wang 2018-07-26 11:27:51 UTC
Here is a similar bug: https://bugzilla.redhat.com/show_bug.cgi?id=1570603. The APBs were fixed there by reducing the deprovision frequency.

Comment 2 Piotr Kliczewski 2018-07-30 08:34:26 UTC
Vatsal, please retest once BZ #1570603 is released.

Comment 3 Nelly Credi 2018-07-31 12:55:22 UTC
@Vatsal, let's take the latest OCP d/s build and verify.

Comment 4 Vatsal Parekh 2018-08-02 12:37:16 UTC
(In reply to Nelly Credi from comment #3)
> @Vatsal, let's take the latest OCP d/s build and verify.

Tested this again on `oc v3.10.27`.

I still see the deprovisioning chain happening.
The error in the provisioning APB is due to https://bugzilla.redhat.com/show_bug.cgi?id=1608842.

Comment 5 Qixuan Wang 2018-08-21 11:55:51 UTC
Version:
openshift v3.10.27
kubevirt 1.1

Now if an import-vm-apb provision fails, it keeps creating provisioning (not deprovision) namespaces and CDI pods. The interval is about 10 minutes. Once the new CNV is released I will test it again.

BTW, could you please link the PR that fixes this? Thanks.


Here is the test result:

[root@cnv-executor-qwang-814-ds-master1 ~]# oc get namespace
NAME                                     STATUS    AGE
default                                  Active    7d
glusterfs                                Active    7d
golden-images                            Active    7d
kube-public                              Active    7d
kube-service-catalog                     Active    7d
kube-system                              Active    7d
localregistry-import-vm-apb-prov-4wslw   Active    1h
localregistry-import-vm-apb-prov-7kcdt   Active    2h
localregistry-import-vm-apb-prov-9hp7x   Active    1h
localregistry-import-vm-apb-prov-9qj5m   Active    1h
localregistry-import-vm-apb-prov-dpr7k   Active    1h
localregistry-import-vm-apb-prov-fsxkk   Active    2h
localregistry-import-vm-apb-prov-hlf5t   Active    2h
localregistry-import-vm-apb-prov-hrj57   Active    2h
localregistry-import-vm-apb-prov-m9vtp   Active    1h
localregistry-import-vm-apb-prov-z4mtj   Active    1h
localregistry-import-vm-apb-prov-zk2kl   Active    1h
management-infra                         Active    7d
openshift                                Active    7d
openshift-ansible-service-broker         Active    7d
openshift-infra                          Active    7d
openshift-logging                        Active    7d
openshift-node                           Active    7d
openshift-sdn                            Active    7d
openshift-template-service-broker        Active    7d
openshift-web-console                    Active    7d
qwang-1                                  Active    2h

[root@cnv-executor-qwang-814-ds-master1 ~]# oc get pod -n qwang-1
NAME                                   READY     STATUS    RESTARTS   AGE
importer-vm-qwang-1-vm-disk-01-29dbb   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-d6c94   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-hjqvv   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-ksld8   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-lgngp   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-mhhvw   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-t8pst   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-tx6k2   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-wbq94   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-xtdwl   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-zv6fc   0/1       Pending   0          1h

Comment 6 Qixuan Wang 2018-09-03 10:38:25 UTC
Version:
openshift v3.10.27
kubevirt 1.2

[cloud-user@cnv-executor-qwang-93-master1 ~]$ oc get namespace
NAME                                STATUS    AGE
default                             Active    3h
dh-import-vm-apb-depr-5fghl         Active    2h
dh-import-vm-apb-depr-5kw7j         Active    2h
dh-import-vm-apb-depr-6tcmz         Active    2h
dh-import-vm-apb-depr-7wfpc         Active    2h
dh-import-vm-apb-depr-88jzw         Active    2h
dh-import-vm-apb-depr-8kff4         Active    2h
dh-import-vm-apb-depr-92w8d         Active    2h
dh-import-vm-apb-depr-9sv42         Active    2h
dh-import-vm-apb-depr-dlc9w         Active    2h
dh-import-vm-apb-depr-dvpn8         Active    2h
dh-import-vm-apb-depr-fn84v         Active    2h
dh-import-vm-apb-depr-gf88q         Active    2h
dh-import-vm-apb-depr-jc2cd         Active    2h
dh-import-vm-apb-depr-k6ghf         Active    2h
dh-import-vm-apb-depr-pwfp7         Active    2h
dh-import-vm-apb-depr-r76kx         Active    2h
dh-import-vm-apb-depr-r77wg         Active    2h
dh-import-vm-apb-depr-tjxqr         Active    2h
dh-import-vm-apb-depr-ttpfj         Active    2h
dh-import-vm-apb-depr-vrvjr         Active    5s
dh-import-vm-apb-depr-xchxg         Active    2h
dh-import-vm-apb-depr-zksnt         Active    2h
dh-import-vm-apb-depr-zs94p         Active    2h
dh-import-vm-apb-prov-jwltj         Active    2h
dh-import-vm-apb-prov-qtmk9         Active    11s

Comment 7 Qixuan Wang 2018-09-03 10:51:11 UTC
Waiting for OpenShift v3.11+ to test it.

Comment 8 Qixuan Wang 2018-09-18 11:48:26 UTC
Same phenomenon on OCP 3.11, CNV 1.2, with the upstream import-vm-apb. Need to test the d/s import-vm-apb-1.2-3.

Comment 10 Qixuan Wang 2018-10-08 10:49:42 UTC
Tested on OCP 3.11, CNV 1.2, import-vm-apb-1.2-3


If I set the following values to false, this annoying provision/deprovision chain no longer keeps appearing. The retry processes run in one or two temporary namespaces that are deleted after a while. Although these namespaces are not visible with "oc get namespace", their status can still be traced with "oc get namespace -w". Since these flags can be switched by an admin, I can verify the bug in this case. However, imagine users trying to look into these pods: the provision/deprovision pod lifecycle is short, so is there any way to provide a more stable debugging environment?


# oc get cm broker-config -n openshift-ansible-service-broker -o yaml
keep_namespace: false
keep_namespace_on_error: false
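
A minimal sketch of flipping these flags and reloading the broker, assuming the ConfigMap layout above (the keys sit in the broker's openshift section) and the usual broker object name, asb, which may differ in your deployment:

# edit the broker config and set keep_namespace / keep_namespace_on_error to false
oc edit cm broker-config -n openshift-ansible-service-broker
# restart the broker so it picks up the changed config
oc rollout latest dc/asb -n openshift-ansible-service-broker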


[root@cnv-executor-qwang-108-master1 ~]# oc get namespace | grep brew
[root@cnv-executor-qwang-108-master1 ~]#
[root@cnv-executor-qwang-108-master1 ~]# oc get namespace -w
NAME                                STATUS    AGE
brew-import-vm-apb-prov-rhg9n   Terminating   6s
brew-import-vm-apb-depr-sm5zf   Active    0s
brew-import-vm-apb-depr-sm5zf   Active    0s
brew-import-vm-apb-prov-rhg9n   Terminating   12s
brew-import-vm-apb-prov-rhg9n   Terminating   12s
brew-import-vm-apb-depr-sm5zf   Terminating   7s
brew-import-vm-apb-depr-sm5zf   Terminating   13s
brew-import-vm-apb-depr-sm5zf   Terminating   13s
brew-import-vm-apb-prov-g89mq   Active    0s
brew-import-vm-apb-prov-g89mq   Active    0s
brew-import-vm-apb-prov-g89mq   Terminating   8s
brew-import-vm-apb-prov-g89mq   Terminating   14s
brew-import-vm-apb-prov-g89mq   Terminating   14s
brew-import-vm-apb-depr-qnm5l   Active    0s
brew-import-vm-apb-depr-qnm5l   Active    0s
brew-import-vm-apb-depr-qnm5l   Terminating   7s
brew-import-vm-apb-depr-qnm5l   Terminating   13s
brew-import-vm-apb-depr-qnm5l   Terminating   13s
brew-import-vm-apb-prov-vgzs7   Active    0s
brew-import-vm-apb-prov-vgzs7   Active    0s
brew-import-vm-apb-prov-vgzs7   Terminating   8s
brew-import-vm-apb-prov-vgzs7   Terminating   14s
brew-import-vm-apb-prov-vgzs7   Terminating   14s
brew-import-vm-apb-depr-cp7fm   Active    0s
brew-import-vm-apb-depr-cp7fm   Active    0s
brew-import-vm-apb-depr-cp7fm   Terminating   6s
brew-import-vm-apb-depr-cp7fm   Terminating   12s
brew-import-vm-apb-depr-cp7fm   Terminating   12s
brew-import-vm-apb-prov-jxrmz   Active    0s
brew-import-vm-apb-prov-jxrmz   Active    0s
brew-import-vm-apb-prov-jxrmz   Terminating   8s
brew-import-vm-apb-prov-jxrmz   Terminating   13s
brew-import-vm-apb-prov-jxrmz   Terminating   13s
brew-import-vm-apb-depr-w7p6s   Active    0s
brew-import-vm-apb-depr-w7p6s   Active    0s
brew-import-vm-apb-depr-w7p6s   Terminating   8s
brew-import-vm-apb-depr-w7p6s   Terminating   13s
brew-import-vm-apb-depr-w7p6s   Terminating   13s
brew-import-vm-apb-prov-n2l25   Active    0s
brew-import-vm-apb-prov-n2l25   Active    1s
brew-import-vm-apb-prov-n2l25   Terminating   8s
brew-import-vm-apb-prov-n2l25   Terminating   13s
brew-import-vm-apb-prov-n2l25   Terminating   13s
brew-import-vm-apb-depr-vnkdn   Active    0s
brew-import-vm-apb-depr-vnkdn   Active    0s
brew-import-vm-apb-depr-vnkdn   Terminating   7s
brew-import-vm-apb-depr-vnkdn   Terminating   12s
brew-import-vm-apb-depr-vnkdn   Terminating   12

Comment 11 Piotr Kliczewski 2018-10-08 13:20:17 UTC
I created BZ #1637010 to track this issue.

Comment 18 Federico Simoncelli 2018-10-15 20:51:20 UTC
Piotr please fill in the "Fixed In Version" field.

Comment 24 Piotr Kliczewski 2018-10-22 13:01:18 UTC
Based on the information provided in BZ #1637010, it looks like we need to change the setting in our deployment.

Comment 25 Ryan Hallisey 2018-11-06 13:52:02 UTC
How is downstream deploying? keep_namespace_on_error needs to be set to false.

Comment 26 Ryan Hallisey 2018-11-07 16:24:38 UTC
I don't think this is a bug. By default, the broker should deploy with keep_namespace_on_error: false.
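
One quick way to check what the running broker actually has (using the same ConfigMap shown in comment 10):

# oc get cm broker-config -n openshift-ansible-service-broker -o yaml | grep keep_namespace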

Comment 30 Nelly Credi 2018-11-22 15:51:41 UTC
@Qixuan, can you please verify?

Comment 31 Qixuan Wang 2018-11-26 14:59:45 UTC
openshift v3.11.43
brew-pulp: cnv-tech-preview/import-vm-apb v1.3.0
Ansible Service Broker Version: 1.4.1


Take provision failure as an example. 

With keep_namespace_on_error: false (the default deployment), only one ephemeral provision namespace is visible. The thing is, its lifetime is too short to let me get into the namespace and pod to see what happened during the Ansible steps.

With keep_namespace_on_error: true, each retry's provision namespace is kept, so at least I can check the Ansible steps in any provision pod. The retry interval seems to be around 1 minute, better than before (about 10 seconds). Do we have a timeout or frequency limit for the retry? Is any compromise needed here?
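
As a stopgap for debugging with the flag at false, one rough way to catch the short-lived APB pods is to watch for them and stream their logs as soon as they appear; the namespace and pod names below are placeholders:

# watch for the transient provision/deprovision namespaces and pods
oc get pods --all-namespaces -w | grep import-vm-apb
# in another terminal, grab the APB pod logs before the namespace is reaped
oc logs -f -n <prov-namespace> <apb-pod-name>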

@Ryan, (In reply to Ryan Hallisey from comment #28)
> If the flag is set and we're still seeing namespaces, then the
> bug should be tagged for the broker.

With keep_namespace_on_error set to true or false, which one is "the flag is set"?

Comment 32 Ryan Hallisey 2018-11-27 13:43:28 UTC
"flag is set" would be keep_namespaces_on_error = false

Comment 33 Ryan Hallisey 2018-11-27 13:45:10 UTC
> With keep_namespace_on_error: false (the default deployment), only one ephemeral provision namespace is visible. The thing is, its lifetime is too short to let me get into the namespace and pod to see what happened during the Ansible steps.

This is the expected behavior.  I think we can close this.  If there's an issue with the import-vm-apb, a new bug should be filed.

Comment 35 errata-xmlrpc 2018-12-05 18:56:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3776

