Description of problem:
If an import-vm-apb run fails, the broker keeps creating deprovision namespaces and pods. Since those pods also fail, it keeps creating more deprovision pods, eventually resulting in 2000+ projects with errors.

Version-Release number of selected component (if applicable):
CNV 1.1 (KubeVirt 0.6.2)
import-vm-apb (0.6.2-1)

How reproducible:

Steps to Reproduce:
1. Try to import a VM with a name that already exists (so the import fails).
2.
3.

Actual results:
The broker keeps creating deprovision pods and projects.

Expected results:

Additional info:
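For reference, a rough CLI reproduction of step 1 (a sketch only: the class, plan, and parameter names below are assumptions; the real ones come from "svcat get classes" / "svcat describe class"):

# list the APB class exposed by the ansible-service-broker (the name prefix is an assumption)
svcat get classes | grep import-vm-apb
# provision it with a VM name that already exists, so the APB fails
# (--class/--plan/--param values below are hypothetical placeholders)
svcat provision my-imported-vm \
  --namespace <target-project> \
  --class dh-import-vm-apb \
  --plan default \
  --param vm_name=existing-vm-name

After the provision fails, the runaway namespaces can be observed with "oc get namespace -w".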
Here is a similar bug, where the APBs were fixed by reducing the deprovision frequency: https://bugzilla.redhat.com/show_bug.cgi?id=1570603
Vatsal, please retest once BZ #1570603 is released.
@Vatsal, let's take the latest OCP d/s and verify.
(In reply to Nelly Credi from comment #3)
> @Vatsal, let's take the latest OCP d/s and verify.

Tested this again on `oc v3.10.27` and I still see the deprovisioning chain happening.

The error in the provisioning APB is due to https://bugzilla.redhat.com/show_bug.cgi?id=1608842
Version:
openshift v3.10.27
kubevirt 1.1

Now if an import-vm-apb provision fails, it keeps creating provisioning (not deprovision) namespaces and CDI pods. The interval is about 10 minutes. Once a new CNV is released I will test it again. BTW, could you please link the PR that fixes this? Thanks.

Here is the test result:

[root@cnv-executor-qwang-814-ds-master1 ~]# oc get namespace
NAME                                     STATUS    AGE
default                                  Active    7d
glusterfs                                Active    7d
golden-images                            Active    7d
kube-public                              Active    7d
kube-service-catalog                     Active    7d
kube-system                              Active    7d
localregistry-import-vm-apb-prov-4wslw   Active    1h
localregistry-import-vm-apb-prov-7kcdt   Active    2h
localregistry-import-vm-apb-prov-9hp7x   Active    1h
localregistry-import-vm-apb-prov-9qj5m   Active    1h
localregistry-import-vm-apb-prov-dpr7k   Active    1h
localregistry-import-vm-apb-prov-fsxkk   Active    2h
localregistry-import-vm-apb-prov-hlf5t   Active    2h
localregistry-import-vm-apb-prov-hrj57   Active    2h
localregistry-import-vm-apb-prov-m9vtp   Active    1h
localregistry-import-vm-apb-prov-z4mtj   Active    1h
localregistry-import-vm-apb-prov-zk2kl   Active    1h
management-infra                         Active    7d
openshift                                Active    7d
openshift-ansible-service-broker         Active    7d
openshift-infra                          Active    7d
openshift-logging                        Active    7d
openshift-node                           Active    7d
openshift-sdn                            Active    7d
openshift-template-service-broker        Active    7d
openshift-web-console                    Active    7d
qwang-1                                  Active    2h

[root@cnv-executor-qwang-814-ds-master1 ~]# oc get pod -n qwang-1
NAME                                   READY     STATUS    RESTARTS   AGE
importer-vm-qwang-1-vm-disk-01-29dbb   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-d6c94   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-hjqvv   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-ksld8   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-lgngp   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-mhhvw   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-t8pst   0/1       Pending   0          2h
importer-vm-qwang-1-vm-disk-01-tx6k2   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-wbq94   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-xtdwl   0/1       Pending   0          1h
importer-vm-qwang-1-vm-disk-01-zv6fc   0/1       Pending   0          1h
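Side note: while reproducing, the pile of leftover provision namespaces can be cleaned up in one go with something like the following (a sketch; adjust the grep pattern to whatever registry prefix the broker actually uses):

# delete every leftover import-vm-apb provision namespace in bulk
# (pattern "import-vm-apb-prov" is an assumption, match it to "oc get namespace" output)
oc get namespace -o name | grep import-vm-apb-prov | xargs oc delete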
Version:
openshift v3.10.27
kubevirt 1.2

[cloud-user@cnv-executor-qwang-93-master1 ~]$ oc get namespace
NAME                          STATUS    AGE
default                       Active    3h
dh-import-vm-apb-depr-5fghl   Active    2h
dh-import-vm-apb-depr-5kw7j   Active    2h
dh-import-vm-apb-depr-6tcmz   Active    2h
dh-import-vm-apb-depr-7wfpc   Active    2h
dh-import-vm-apb-depr-88jzw   Active    2h
dh-import-vm-apb-depr-8kff4   Active    2h
dh-import-vm-apb-depr-92w8d   Active    2h
dh-import-vm-apb-depr-9sv42   Active    2h
dh-import-vm-apb-depr-dlc9w   Active    2h
dh-import-vm-apb-depr-dvpn8   Active    2h
dh-import-vm-apb-depr-fn84v   Active    2h
dh-import-vm-apb-depr-gf88q   Active    2h
dh-import-vm-apb-depr-jc2cd   Active    2h
dh-import-vm-apb-depr-k6ghf   Active    2h
dh-import-vm-apb-depr-pwfp7   Active    2h
dh-import-vm-apb-depr-r76kx   Active    2h
dh-import-vm-apb-depr-r77wg   Active    2h
dh-import-vm-apb-depr-tjxqr   Active    2h
dh-import-vm-apb-depr-ttpfj   Active    2h
dh-import-vm-apb-depr-vrvjr   Active    5s
dh-import-vm-apb-depr-xchxg   Active    2h
dh-import-vm-apb-depr-zksnt   Active    2h
dh-import-vm-apb-depr-zs94p   Active    2h
dh-import-vm-apb-prov-jwltj   Active    2h
dh-import-vm-apb-prov-qtmk9   Active    11s
Will wait for OpenShift v3.11+ to test this.
Same phenomenon on OCP 3.11, CNV 1.2, upstream import-vm-apb. Need to test d/s import-vm-apb-1.2-3.
Tested on OCP 3.11, CNV 1.2, import-vm-apb-1.2-3

If I set the following values to false, the annoying provision/deprovision chain no longer keeps showing up. The retry process runs in one or two temporary namespaces, which are deleted after a while. Although we can't see these namespaces with "oc get namespace", their status can still be traced with "oc get namespace -w". Since these flags can be switched by an admin, I can verify the bug in this case. However, imagine users trying to look into their pods: the lifecycle of the provision/deprovision pods is very short. Is there any way to provide a more stable debugging environment?

# oc get cm broker-config -n openshift-ansible-service-broker -o yaml
    keep_namespace: false
    keep_namespace_on_error: false

[root@cnv-executor-qwang-108-master1 ~]# oc get namespace | grep brew
[root@cnv-executor-qwang-108-master1 ~]#
[root@cnv-executor-qwang-108-master1 ~]# oc get namespace -w
NAME                            STATUS        AGE
brew-import-vm-apb-prov-rhg9n   Terminating   6s
brew-import-vm-apb-depr-sm5zf   Active        0s
brew-import-vm-apb-depr-sm5zf   Active        0s
brew-import-vm-apb-prov-rhg9n   Terminating   12s
brew-import-vm-apb-prov-rhg9n   Terminating   12s
brew-import-vm-apb-depr-sm5zf   Terminating   7s
brew-import-vm-apb-depr-sm5zf   Terminating   13s
brew-import-vm-apb-depr-sm5zf   Terminating   13s
brew-import-vm-apb-prov-g89mq   Active        0s
brew-import-vm-apb-prov-g89mq   Active        0s
brew-import-vm-apb-prov-g89mq   Terminating   8s
brew-import-vm-apb-prov-g89mq   Terminating   14s
brew-import-vm-apb-prov-g89mq   Terminating   14s
brew-import-vm-apb-depr-qnm5l   Active        0s
brew-import-vm-apb-depr-qnm5l   Active        0s
brew-import-vm-apb-depr-qnm5l   Terminating   7s
brew-import-vm-apb-depr-qnm5l   Terminating   13s
brew-import-vm-apb-depr-qnm5l   Terminating   13s
brew-import-vm-apb-prov-vgzs7   Active        0s
brew-import-vm-apb-prov-vgzs7   Active        0s
brew-import-vm-apb-prov-vgzs7   Terminating   8s
brew-import-vm-apb-prov-vgzs7   Terminating   14s
brew-import-vm-apb-prov-vgzs7   Terminating   14s
brew-import-vm-apb-depr-cp7fm   Active        0s
brew-import-vm-apb-depr-cp7fm   Active        0s
brew-import-vm-apb-depr-cp7fm   Terminating   6s
brew-import-vm-apb-depr-cp7fm   Terminating   12s
brew-import-vm-apb-depr-cp7fm   Terminating   12s
brew-import-vm-apb-prov-jxrmz   Active        0s
brew-import-vm-apb-prov-jxrmz   Active        0s
brew-import-vm-apb-prov-jxrmz   Terminating   8s
brew-import-vm-apb-prov-jxrmz   Terminating   13s
brew-import-vm-apb-prov-jxrmz   Terminating   13s
brew-import-vm-apb-depr-w7p6s   Active        0s
brew-import-vm-apb-depr-w7p6s   Active        0s
brew-import-vm-apb-depr-w7p6s   Terminating   8s
brew-import-vm-apb-depr-w7p6s   Terminating   13s
brew-import-vm-apb-depr-w7p6s   Terminating   13s
brew-import-vm-apb-prov-n2l25   Active        0s
brew-import-vm-apb-prov-n2l25   Active        1s
brew-import-vm-apb-prov-n2l25   Terminating   8s
brew-import-vm-apb-prov-n2l25   Terminating   13s
brew-import-vm-apb-prov-n2l25   Terminating   13s
brew-import-vm-apb-depr-vnkdn   Active        0s
brew-import-vm-apb-depr-vnkdn   Active        0s
brew-import-vm-apb-depr-vnkdn   Terminating   7s
brew-import-vm-apb-depr-vnkdn   Terminating   12s
brew-import-vm-apb-depr-vnkdn   Terminating   12
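For anyone else debugging this, a sketch of how those flags can be flipped (the YAML nesting and the broker DeploymentConfig name are assumptions; check the actual config and "oc get dc -n openshift-ansible-service-broker"):

# edit the broker config; the two keys usually sit under the "openshift:" section
# (the nesting is an assumption - search the YAML for keep_namespace)
oc edit cm broker-config -n openshift-ansible-service-broker
#   keep_namespace: false
#   keep_namespace_on_error: true   # true keeps failed APB sandbox namespaces around for debugging
# restart the broker so it picks up the change ("dc/asb" is an assumption)
oc rollout latest dc/asb -n openshift-ansible-service-broker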
I created BZ #1637010 to track this issue.
Piotr please fill in the "Fixed In Version" field.
Based on the information provided in BZ #1637010, it looks like we need to change the setting in our deployment.
How is downstream deploying the broker? keep_namespace_on_error needs to be set to false.
I don't think this is a bug. By default, the broker should deploy with keep_namespace_on_error: false.
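A quick way to confirm what a given deployment actually uses (a sketch; it just pulls the two keys out of the broker config shown in the earlier comment):

# check the effective broker settings on the cluster
oc get cm broker-config -n openshift-ansible-service-broker -o yaml | grep keep_namespace
# expected for a default deployment:
#   keep_namespace: false
#   keep_namespace_on_error: false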
@Qixuan, can you please verify?
openshift v3.11.43
brew-pulp: cnv-tech-preview/import-vm-apb v1.3.0
Ansible Service Broker Version: 1.4.1

Take a provision failure as an example.

If keep_namespace_on_error: false (default deployment): only one ephemeral provision namespace is visible. The problem is that its lifetime is too short for me to get into the namespace and pod to see what happened during the ansible steps.

If keep_namespace_on_error: true: each retry provision namespace is kept, so at least I can check the ansible steps in any provision pod. The retry interval seems to be around 1 minute, better than before (about 10 seconds). Do we have a timeout or frequency limit for the retries? Is any compromise needed here?

@Ryan, (In reply to Ryan Hallisey from comment #28)
> If the flag is set and we're still seeing namespaces, then the
> bug should be tagged for the broker.

With keep_namespace_on_error set to true or false, which one is "the flag is set"?
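Regarding the short-lived pods: instead of trying to exec into the sandbox, the APB pod logs can be streamed as soon as the provision namespace appears. A rough sketch (the "apb-prov" pattern is an assumption; adjust it to the actual prefix, and expect an occasional race with the broker deleting the namespace):

# poll for an APB provision sandbox namespace and follow its pod logs before it is reaped
while true; do
    ns=$(oc get namespace -o name | grep apb-prov | head -1 | cut -d/ -f2)
    if [ -n "$ns" ]; then
        pod=$(oc get pods -n "$ns" -o name | head -1)
        [ -n "$pod" ] && oc logs -n "$ns" -f "$pod"
    fi
    sleep 2
done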
"flag is set" would be keep_namespaces_on_error = false
> If keep_namespace_on_error: false (default deployment): only one ephemeral provision namespace is visible. The problem is that its lifetime is too short for me to get into the namespace and pod to see what happened during the ansible steps.

This is the expected behavior. I think we can close this. If there's an issue with the import-vm-apb, a new bug should be filed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3776