Bug 1768535

Summary: Builds stalled after migration
Product: Migration Toolkit for Containers
Reporter: Sergio <sregidor>
Component: General
Assignee: Dylan Murray <dymurray>
Status: CLOSED WONTFIX
QA Contact: Sergio <sregidor>
Severity: low
Docs Contact: Avital Pinnick <apinnick>
Priority: low
Version: 1.3.0
CC: chezhang, jmatthew, rpattath, xjiang
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Duplicates: 1779690 1831615
Environment:
Last Closed: 2022-10-14 19:34:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1779690, 1831615
Bug Blocks:

Description Sergio 2019-11-04 16:31:05 UTC
Description of problem:
When an application built with an OpenShift BuildConfig is migrated, a build is triggered after the migration and gets stuck on the target cluster.

Version-Release number of selected component (if applicable):
Target cluster:
OCP 4.2
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-11-01-115323   True        False         12h     Cluster version is 4.2.0-0.nightly-2019-11-01-115323

Source cluster:
OCP 4.1
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-11-01-212340   True        False         12h     Cluster version is 4.1.0-0.nightly-2019-11-01-212340



Controller:
      image: registry.redhat.io/rhcam-1-0/openshift-migration-controller-rhel8:v1.0
      imageID: registry.redhat.io/rhcam-1-0/openshift-migration-controller-rhel8@sha256:fe81b226dd2f79541fac9f2ba8766086d2d93fcd2f8ca0f7efbb86e3ff1f1f42

Velero:
      image: registry.redhat.io/rhcam-1-0/openshift-migration-velero-rhel8:v1.0
      imageID: registry.redhat.io/rhcam-1-0/openshift-migration-velero-rhel8@sha256:e4afec9bf56e75fc7fda793a31a7ed21fa87babf1abd779ef2865085c6cc3449
      image: registry.redhat.io/rhcam-1-0/openshift-migration-plugin-rhel8:v1.0
      imageID: registry.redhat.io/rhcam-1-0/openshift-migration-plugin-rhel8@sha256:f8c18177972624a209cd20277f844d885abd243161b50f96f7a37636b7d2f042


How reproducible:
Always

Steps to Reproduce:
1. $ oc new-project cakephp
2. $ oc new-app cakephp-mysql-persistent
3. Migrate the app

Actual results:
$ oc get pods
NAME                                  READY   STATUS      RESTARTS   AGE
cakephp-mysql-persistent-1-build      0/1     Init:0/2    0          2m16s
cakephp-mysql-persistent-1-deploy     0/1     Completed   0          2m16s
cakephp-mysql-persistent-1-hook-pre   0/1     Completed   0          2m7s
cakephp-mysql-persistent-1-lqp8p      1/1     Running     0          88s
mysql-1-88bl9                         1/1     Running     0          2m11s
mysql-1-deploy                        0/1     Completed   0          2m28s

$ oc describe pod cakephp-mysql-persistent-1-build | grep Warning
  Warning  FailedMount  98s                  kubelet, ip-10-0-62-108.us-east-2.compute.internal  Unable to mount volumes for pod "cakephp-mysql-persistent-1-build_cakephp(fa364038-ff18-11e9-a1d6-020df951b8fc)": timeout expired waiting for volumes to attach or mount for pod "cakephp"/"cakephp-mysql-persistent-1-build". list of unmounted volumes=[build-proxy-ca-bundles]. list of unattached volumes=[buildcachedir buildworkdir builder-dockercfg-hg79t-push builder-dockercfg-hg79t-pull build-system-configs build-ca-bundles build-proxy-ca-bundles container-storage-root build-blob-cache builder-token-qjxhz]
  Warning  FailedMount  93s (x9 over 3m41s)  kubelet, ip-10-0-62-108.us-east-2.compute.internal  MountVolume.SetUp failed for volume "build-proxy-ca-bundles" : configmaps "cakephp-mysql-persistent-1-global-ca" not found

$ oc get cm
NAME                                    DATA   AGE
cakephp-mysql-persistent-1-ca           1      4m
cakephp-mysql-persistent-1-sys-config   0      4m

There is no "cakephp-mysql-persistent-1-global-ca" config map.
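To confirm which config maps a stuck build pod is waiting for, one could compare the config maps referenced by the pod's volumes with the ones that actually exist in the namespace. This is a minimal sketch assuming a POSIX shell; `missing_configmaps` is a hypothetical helper, not part of `oc` or MTC. It reads the existing names on stdin and prints each name passed as an argument that is not among them.

```shell
# missing_configmaps (hypothetical helper): print every ConfigMap name given
# as an argument that does not appear in the list of existing names on stdin
# (one name per line).
missing_configmaps() {
  existing=$(cat)
  for cm in "$@"; do
    # -x: whole-line match, -F: treat the name as a fixed string
    if ! printf '%s\n' "$existing" | grep -qxF -- "$cm"; then
      printf '%s\n' "$cm"
    fi
  done
}

# Possible usage against the stuck build pod from this report:
#   pod=cakephp-mysql-persistent-1-build
#   oc get cm -o name | sed 's|^configmap/||' \
#     | missing_configmaps $(oc get pod "$pod" \
#         -o jsonpath='{.spec.volumes[*].configMap.name}')
```

Given the `oc get cm` output above, this would presumably print only `cakephp-mysql-persistent-1-global-ca`, the config map behind the `build-proxy-ca-bundles` volume.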


Expected results:
The application should be deployed normally, and no build should be stuck.


Additional info:

It seems that in 4.2 a "global-ca" config map is now created when a build is executed. Earlier versions do not create it, so the config map does not exist after the migration and the build is stuck waiting to mount it.

It seems related to: https://bugzilla.redhat.com/show_bug.cgi?id=1745192

Comment 1 Sergio 2019-11-19 15:13:01 UTC
When migrating from one 4.2 cluster to another 4.2 cluster, this bug instead causes the build to fail because of wrong certificates. It seems that we are migrating the config map that holds the source cluster's certificates.

$ oc get pods
NAME                             READY   STATUS      RESTARTS   AGE
jenkins-1-5pzgd                  1/1     Running     0          35m
jenkins-1-deploy                 0/1     Completed   0          35m
mongodb-1-deploy                 0/1     Completed   0          35m
mongodb-1-s74jb                  1/1     Running     0          34m
nodejs-mongodb-example-1-build   0/1     Error       0          34m
$ oc logs nodejs-mongodb-example-1-build
Caching blobs under "/var/cache/blobs".
Warning: Pull failed, retrying in 5s ...
Warning: Pull failed, retrying in 5s ...
Warning: Pull failed, retrying in 5s ...
error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://image-registry.openshift-image-registry.svc:5000/openshift/nodejs" as "image-registry.openshift-image-registry.svc:5000/openshift/nodejs": Error initializing source docker://image-registry.openshift-image-registry.svc:5000/openshift/nodejs:latest: pinging docker registry returned: Get https://image-registry.openshift-image-registry.svc:5000/v2/: x509: certificate signed by unknown authority
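If the per-build config maps are migrated along with the application and carry the source cluster's certificates, one conceivable mitigation (not something this report or MTC confirms) would be to identify them so they can be excluded from the migration or deleted on the target. This sketch assumes the naming pattern `<buildconfig>-<build number>-{ca,sys-config,global-ca}` seen in the config map names above; that pattern is inferred from this report and is not a documented contract. `build_generated_configmaps` is a hypothetical helper.

```shell
# build_generated_configmaps (hypothetical helper): read ConfigMap names on
# stdin (one per line) and print only those that look like per-build
# ConfigMaps, i.e. ending in -<number>-ca, -<number>-sys-config or
# -<number>-global-ca.
# ASSUMPTION: this naming pattern is a heuristic inferred from this bug report.
build_generated_configmaps() {
  # "--" stops option parsing because the pattern starts with "-"
  grep -E -- '-[0-9]+-(ca|sys-config|global-ca)$'
}

# Possible usage in the migrated namespace:
#   oc get cm -o name | sed 's|^configmap/||' | build_generated_configmaps
```

Requiring the `-<number>-` segment keeps unrelated names such as `trusted-ca` from matching.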

Comment 3 Dylan Murray 2019-12-16 17:07:34 UTC
I can confirm that I have reproduced this bug. After migration, the build fails with:

error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://image-registry.openshift-image-registry.svc:5000/openshift/php" as "image-registry.openshift-image-registry.svc:5000/openshift/php": Error initializing source docker://image-registry.openshift-image-registry.svc:5000/openshift/php:latest: pinging docker registry returned: Get https://image-registry.openshift-image-registry.svc:5000/v2/: x509: certificate signed by unknown authority

Comment 4 John Matthews 2019-12-16 17:26:50 UTC
This didn't make it in time for our 1.0.1 z-stream release. Aligning it to 4.3.0 so that it can go out in our next release, 1.1.0, around the end of January.

Comment 5 John Matthews 2019-12-16 17:27:36 UTC
*** Bug 1779690 has been marked as a duplicate of this bug. ***

Comment 6 John Matthews 2020-07-27 16:05:43 UTC
*** Bug 1831615 has been marked as a duplicate of this bug. ***

Comment 7 Erik Nelson 2021-04-07 20:58:59 UTC
Sergio, could you confirm this remains an issue with our current release?

Comment 9 John Matthews 2022-10-14 19:34:59 UTC
Please feel free to reopen this bug if you believe it is still relevant.

Comment 10 Red Hat Bugzilla 2023-09-18 00:18:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days