1801095 – Failed to force the deployment to use local name lookup

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	openshift-controller-manager
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.4.0
Assignee:	Maciej Szulik
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:	workloads
Depends On:	1805155
Blocks:

Reported:	2020-02-10 09:16 UTC by zhou ying
Modified:	2020-05-13 21:57 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-05-13 21:57:16 UTC
Target Upstream Version:
Embargoed:

Attachments
deployment and rs (1.14 KB, application/gzip) 2020-02-11 08:08 UTC, zhou ying

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift apiserver-library-go pull 23	None	closed	Bug 1801095: add all supported versions for pod mutators	2020-10-12 12:30:14 UTC
Github	openshift apiserver-library-go pull 24	None	closed	Bug 1801095: Fix imagepolicyresolve plugin to resolve when enabled on an existing object	2020-10-12 12:30:14 UTC
Github	openshift origin pull 24530	None	closed	Bug 1805155: Fix image resolve plugin on updates and add tests	2020-10-12 12:30:14 UTC
Github	openshift origin pull 24571	None	closed	[release-4.4] Bug 1801095: Fix image resolve plugin on updates and add tests	2020-10-12 12:30:15 UTC
Red Hat Product Errata	RHBA-2020:0581	None	None	None	2020-05-13 21:57:17 UTC

Description zhou ying 2020-02-10 09:16:48 UTC

Description of problem:
Failed to force the deployment to use local name lookup

Version-Release number of selected component (if applicable):
[root@dhcp-140-138 ~]# oc version 
Client Version: 4.4.0-0.nightly-2020-02-10-035806
Kubernetes Version: v1.17.1

How reproducible:
always

Steps to Reproduce:
1. Create ImageStream:
  `oc tag openshift/deployment-example:v1 --source=docker app:v1`
2. Create a deployment that uses the ImageStream:
  `oc create deployment app --image=app:v1`
3. Set the deployment to use local image lookup:
  `oc set image-lookup deployment/app`
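
The steps above can be scripted so that the revision churn is easy to observe. The following is a minimal sketch, not part of the original report: it assumes `oc` is on the PATH and logged in to a test project, and the helper names (`REPRO_COMMANDS`, `revision_from_json`, `watch_revision`) are hypothetical. It reads the `deployment.kubernetes.io/revision` annotation from `oc get deployment app -o json`.

```python
import json
import subprocess
import time

# The three reproduction commands, exactly as listed in the steps above.
REPRO_COMMANDS = [
    ["oc", "tag", "openshift/deployment-example:v1", "--source=docker", "app:v1"],
    ["oc", "create", "deployment", "app", "--image=app:v1"],
    ["oc", "set", "image-lookup", "deployment/app"],
]


def revision_from_json(deployment_json):
    """Extract the deployment revision from `oc get deployment -o json` output."""
    annotations = json.loads(deployment_json)["metadata"]["annotations"]
    return int(annotations["deployment.kubernetes.io/revision"])


def run_repro():
    """Run the reproduction steps in order (requires a logged-in `oc`)."""
    for command in REPRO_COMMANDS:
        subprocess.run(command, check=True)


def watch_revision(samples=5, delay=2.0):
    """Sample the revision a few times; a steadily growing value suggests a loop."""
    revisions = []
    for _ in range(samples):
        result = subprocess.run(
            ["oc", "get", "deployment", "app", "-o", "json"],
            check=True, capture_output=True, text=True,
        )
        revisions.append(revision_from_json(result.stdout))
        time.sleep(delay)
    return revisions
```

Calling `run_repro()` followed by `watch_revision()` on an affected build would be expected to return increasing revision numbers; on a fixed build the value should stay put once the rollout settles.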


Actual results:
3. The deployment failed with this error:
[root@dhcp-140-138 ~]#  oc describe deployment/app
Name:                   app
Namespace:              zhouy
CreationTimestamp:      Mon, 10 Feb 2020 16:46:13 +0800
Labels:                 app=app
Annotations:            deployment.kubernetes.io/revision: 2879
Selector:               app=app
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=app
  Annotations:  alpha.image.policy.openshift.io/resolve-names: *
  Containers:
   app:
    Image:        app:v1
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    True    NewReplicaSetCreated
OldReplicaSets:  app-858f464854 (1/1 replicas created), app-b797c6d6d (1/1 replicas created)

`oc get event`
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_app-db87ffc99-cnf8j_zhouy_f02be028-d947-4306-81d3-e9a39ea32faa_0(4da67f3b9df5232326a7df46ddabfc5aba08241e2c0b70d9c76d6ab87ec2cd8e): Multus: error adding pod to network "openshift-sdn": delegateAdd: cannot set "openshift-sdn" interface name to "eth0": validateIfName: no net namespace /proc/725663/ns/net found: failed to Statfs "/proc/725663/ns/net": no such file or directory

Expected results:
3. The pod is running.


Additional info:

Comment 1 Tomáš Nožička 2020-02-10 17:14:46 UTC

Can you get the YAML for the deployment, the ReplicaSet, and the pod?

Comment 2 zhou ying 2020-02-11 08:08:59 UTC

Created attachment 1662384
deployment and rs

Comment 3 zhou ying 2020-02-11 08:09:48 UTC

The deployment keeps rolling out and the old pods get deleted, so it is hard for me to capture the pod's YAML.

Comment 4 Tomáš Nožička 2020-02-11 09:49:23 UTC

Is it creating RSs in a loop? It looks that way to me, judging by deployment.kubernetes.io/revision: 2879 and by the dump.

Also, the RS has an injected image that is already resolved to:
image: openshift/deployment-example@sha256:c505b916f7e5143a356ff961f2c21aee40fbd2cd906c1e3feeb8d5e978da284b

I thought the image plugin was supposed to resolve at the Pod level...

I wonder whether it is broken on previous releases too.

Comment 5 Tomáš Nožička 2020-02-12 16:31:41 UTC

The image resolve admission plugin works (I opened https://github.com/openshift/origin/pull/24530 to prove it), but `oc set image-lookup deploy/app` sets the annotation only on the template, not on the object itself, which is what triggers the DeepEqual hot loop.

Comment 6 Ricardo Maraschini 2020-02-13 12:08:44 UTC

The oc client has set the annotation only on the template for at least 3 years:

https://github.com/openshift/oc/blame/master/pkg/cli/set/imagelookup.go#L236-L251

I wonder what else might have changed to make this stop working. I will check on 4.2 to see whether it works there.

Comment 7 Ricardo Maraschini 2020-02-13 13:14:49 UTC

1. We don't have the Deployment recreation loop on 4.2.
2. On 4.2 the annotation is created only on the template (see comment 5), the same as on 4.4.
3. There seems to be no difference between the Deployments created for different revisions during the loop on 4.4 (which is odd).
4. The Deployment's `status.collisionCount` increases constantly during the loop on 4.4.
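
One way to read the symptoms collected so far (the already-resolved image on the RS in comment 4, the DeepEqual hot loop in comment 5, and the rising `status.collisionCount` above) is that the stored ReplicaSet never matches the Deployment's template, so the controller keeps trying to create a new one. The toy model below illustrates that reading only. It is not the controller's code: the hash function, the `sync` and `admit` helpers, and the data layout are all hypothetical simplifications.

```python
import hashlib
import json

RESOLVED_IMAGE = (
    "openshift/deployment-example@sha256:"
    "c505b916f7e5143a356ff961f2c21aee40fbd2cd906c1e3feeb8d5e978da284b"
)


def template_hash(template, collision_count):
    """Stand-in for the pod-template hash (NOT the real algorithm)."""
    payload = json.dumps(template, sort_keys=True) + str(collision_count)
    return hashlib.sha256(payload.encode()).hexdigest()[:10]


def admit(template, mutate_on_create):
    """Model admission on ReplicaSet create: optionally rewrite the image."""
    stored = dict(template)
    if mutate_on_create:
        stored["image"] = RESOLVED_IMAGE
    return stored


def sync(deployment, replicasets, mutate_on_create):
    """One controller pass: reuse a matching ReplicaSet or try to create one."""
    template = deployment["template"]
    if any(rs == template for rs in replicasets.values()):
        return "steady"
    name = "app-" + template_hash(template, deployment["collisionCount"])
    if name in replicasets:
        # Same name, different content: bump the counter and retry next pass.
        deployment["collisionCount"] += 1
        return "collision"
    replicasets[name] = admit(template, mutate_on_create)
    return "created"


def simulate(mutate_on_create, passes=6):
    """Run several sync passes and report what happened."""
    deployment = {"template": {"image": "app:v1"}, "collisionCount": 0}
    replicasets = {}
    results = [sync(deployment, replicasets, mutate_on_create) for _ in range(passes)]
    return results, deployment["collisionCount"], len(replicasets)


print(simulate(mutate_on_create=False))
print(simulate(mutate_on_create=True))
```

In this model the unmutated case settles after a single creation, while the mutated case alternates between creating a ReplicaSet and hitting a name collision, so both the ReplicaSet count and the collision counter keep growing.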

Comment 8 Ricardo Maraschini 2020-02-13 13:16:18 UTC

I patched oc to add the annotation to the deployment itself as well as to the template. That did not solve the problem.

Comment 9 Ricardo Maraschini 2020-02-13 13:27:22 UTC

oc patch deploy/redis -p '{"spec":{"template":{"metadata":{"annotations":{"alpha.image.policy.openshift.io/resolve-names":"*"}}}}}' --type=merge

or 

oc patch deploy/redis -p '{"metadata":{"annotations":{"alpha.image.policy.openshift.io/resolve-names":"*"}},"spec":{"template":{"metadata":{"annotations":{"alpha.image.policy.openshift.io/resolve-names":"*"}}}}}' --type=merge

both cause the same problem.
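
The two merge patches above differ only in whether the annotation is also set on the Deployment's own metadata. As a small sketch of that difference, the helper below (hypothetical name `resolve_names_patch`) builds both bodies from the same annotation key, so the strings do not have to be hand-written.

```python
import json

RESOLVE_ANNOTATION = "alpha.image.policy.openshift.io/resolve-names"


def resolve_names_patch(include_object_annotation):
    """Build the merge-patch body that sets the resolve-names annotation.

    The template annotation is always set; the object-level annotation is
    added only when include_object_annotation is True.
    """
    annotations = {RESOLVE_ANNOTATION: "*"}
    patch = {}
    if include_object_annotation:
        patch["metadata"] = {"annotations": dict(annotations)}
    patch["spec"] = {"template": {"metadata": {"annotations": dict(annotations)}}}
    # Compact separators reproduce the whitespace-free JSON used in the commands.
    return json.dumps(patch, separators=(",", ":"))


print(resolve_names_patch(include_object_annotation=False))
print(resolve_names_patch(include_object_annotation=True))
```

Either string can be passed to `oc patch deploy/redis -p '<body>' --type=merge`, matching the two commands above.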

Comment 11 Maciej Szulik 2020-02-13 14:20:39 UTC

This is a problem with mutators in ImagePolicy admission, working on a fix atm.

Comment 13 Tomáš Nožička 2020-02-20 08:46:41 UTC

FYI, I found that it doesn't matter whether the annotation is on the object itself or on the template, as long as the object is registered for resolution. The real issue is that admission was incorrectly skipping updates that enabled resolution. More details are in the referenced PRs. CronJobs also weren't registered, which we fixed as well.
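
To illustrate the kind of skip described here, the sketch below contrasts two simplified update handlers. It is an assumption-laden model, not the plugin's code: this report does not show the exact skip condition, so the model assumes the pre-fix handler ignored updates whose image reference was unchanged. The function names and the flat object layout are hypothetical.

```python
RESOLVE_ANNOTATION = "alpha.image.policy.openshift.io/resolve-names"


def resolve_enabled(obj):
    """True when the object carries the resolve-names annotation set to '*'."""
    return obj.get("annotations", {}).get(RESOLVE_ANNOTATION) == "*"


def resolve(image, image_streams):
    """Look the image up in a local name table; unknown names pass through."""
    return image_streams.get(image, image)


def admit_update_before_fix(old, new, image_streams):
    """ASSUMED pre-fix behaviour: unchanged images are never resolved."""
    if not resolve_enabled(new):
        return new
    if new["image"] == old["image"]:
        return new  # skipped, even though the annotation was just added
    return {**new, "image": resolve(new["image"], image_streams)}


def admit_update_after_fix(old, new, image_streams):
    """ASSUMED post-fix behaviour: also resolve when the update enables it."""
    if not resolve_enabled(new):
        return new
    newly_enabled = not resolve_enabled(old)
    if new["image"] == old["image"] and not newly_enabled:
        return new
    return {**new, "image": resolve(new["image"], image_streams)}


streams = {"app:v1": "openshift/deployment-example@sha256:c505b916..."}  # digest shortened
old = {"image": "app:v1", "annotations": {}}
new = {"image": "app:v1", "annotations": {RESOLVE_ANNOTATION: "*"}}

print(admit_update_before_fix(old, new, streams)["image"])
print(admit_update_after_fix(old, new, streams)["image"])
```

Under these assumptions, the first handler leaves `app:v1` untouched when the annotation is added to an existing object, while the second resolves it in the same update.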

Comment 15 zhou ying 2020-02-25 05:27:49 UTC

Confirmed with the latest payload, 4.4.0-0.nightly-2020-02-24-105333: the issue can no longer be reproduced.

Comment 17 errata-xmlrpc 2020-05-13 21:57:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
