Bug 1777337

Summary: Build pod label truncation can cause build failure
Product: OpenShift Container Platform
Reporter: Mark McLoughlin <markmc>
Component: Build
Assignee: Gabe Montero <gmontero>
Status: CLOSED ERRATA
QA Contact: wewang <wewang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.11.0
CC: aos-bugs, gmontero, obulatov, wzheng
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Build label generation/validation was not fully conforming to k8s expectations.
Consequence: Builds could fail with invalid label errors for certain build config names.
Fix: The build controller and build apiserver now use the complete k8s validation routines to ensure any added build labels meet the k8s label criteria.
Result: Builds with any valid build config name will not fail because of invalid build label values.
Story Points: ---
Clone Of:
: 1804934
Environment:
Last Closed: 2020-05-04 11:17:52 UTC
Type: Bug
Bug Blocks: 1804934    

Description Mark McLoughlin 2019-11-27 12:53:57 UTC
Description of problem:

Trying to do a build named cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-001 is failing with this error:

Error creating: Pod "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build" is invalid: metadata.labels: Invalid value: "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
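
The rule quoted in the error message can be reproduced with a small check. This is a minimal sketch, not the Kubernetes validation code: isValidLabelValue and maxLabelValueLength are hypothetical names, the pattern is the one from the error message anchored at both ends, and the 63-character limit is assumed to match validation.DNS1123LabelMaxLength.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxLabelValueLength is assumed to equal validation.DNS1123LabelMaxLength (63).
const maxLabelValueLength = 63

// labelValueRegexp is the pattern quoted in the error message, anchored so
// the whole value has to match.
var labelValueRegexp = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue is a hypothetical helper for illustration only.
func isValidLabelValue(value string) bool {
	return len(value) <= maxLabelValueLength && labelValueRegexp.MatchString(value)
}

func main() {
	name := "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build"

	// Plain truncation to 63 characters leaves a trailing '-'.
	truncated := name[:maxLabelValueLength]
	fmt.Println(truncated)

	// The truncated value is rejected because it ends with '-'.
	fmt.Println(isValidLabelValue(truncated))

	// A value that ends with an alphanumeric character is accepted.
	fmt.Println(isValidLabelValue("cluster-kube-controller-manager-operator-4"))
}
```

The truncated value is within the length limit, so it is the trailing '-' alone that makes it invalid.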


Version-Release number of selected component (if applicable):

This is on the ci.openshift.org cluster:

$ oc version
Client Version: openshift-clients-4.2.0-201910041700
Kubernetes Version: v1.11.0+d4cacc0


Expected results:

The label truncation happens here:

// LabelValue returns a string to use as a value for the Build
// label in a pod. If the length of the string parameter exceeds
// the maximum label length, the value will be truncated.
func LabelValue(name string) string {
        if len(name) <= validation.DNS1123LabelMaxLength {
                return name
        }
        return name[:validation.DNS1123LabelMaxLength]
}



If a label must end with an alphanumeric character, then a simple improvement to the truncation could be to walk back from index 63 to the nearest alphanumeric character.
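
The walk-back idea could look roughly like the following. This is only a sketch of the suggestion above, not the fix that was eventually merged: truncateLabelValue, isAlphanumeric and maxLabelLength are hypothetical names, and maxLabelLength is assumed to equal validation.DNS1123LabelMaxLength (63).

```go
package main

import "fmt"

// maxLabelLength is assumed to equal validation.DNS1123LabelMaxLength (63).
const maxLabelLength = 63

// isAlphanumeric reports whether c is an ASCII letter or digit.
func isAlphanumeric(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

// truncateLabelValue is a hypothetical variant of LabelValue: it cuts the
// name to the maximum length and then drops trailing characters until the
// value ends with an alphanumeric character (or is empty).
func truncateLabelValue(name string) string {
	if len(name) > maxLabelLength {
		name = name[:maxLabelLength]
	}
	for len(name) > 0 && !isAlphanumeric(name[len(name)-1]) {
		name = name[:len(name)-1]
	}
	return name
}

func main() {
	fmt.Println(truncateLabelValue("cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build"))
}
```

For the build name in this report, the sketch returns the 62-character value cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27, which ends with a digit. It only repairs the end of the value; it assumes the rest of the name already uses characters that are valid in a label.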

Comment 3 wewang 2020-02-12 12:16:38 UTC
The test case now has expectedOutput: "cluster-kube-controller-manager-operator-4". I counted it: it has 42 characters, not 63. Also, when I set expectedOutput: "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build", the unit test still passes, but it should fail, right? If I am wrong, please correct me. Thanks.

Comment 4 wewang 2020-02-12 12:21:43 UTC
Tested in version 4.4.0-0.nightly-2020-02-11-200858; the error still exists.

Steps:
1. Using bc:
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-001
spec:
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    type: Git
    git:
      uri: https://github.com/openshift/ruby-hello-world.git
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: ruby:2.5
        namespace: openshift
    type: Source

2. [root@wangwen ~]# oc start-build cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-001
The Build "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-001-1" is invalid: metadata.labels: Invalid value: "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')

Comment 5 Gabe Montero 2020-02-12 15:21:39 UTC
RE #Comment 3 @Wen

When I made the change you described, the unit test failed for me:

=== RUN   TestLabelValue
--- FAIL: TestLabelValue (0.00s)
    util_test.go:32: tc do-not-end-with-hyphen got cluster-kube-controller-manager-operator-4 for cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build instead of cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build
FAIL


That was after changing the test case in util_test.go to:

		{
			name:           "do-not-end-with-hyphen",
			input:          "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build",
			expectedOutput: "cluster-kube-controller-manager-operator-4.3.0.ipv6-2019-11-27-0001-build",
		},

Also, on the expected correct value of cluster-kube-controller-manager-operator-4 remember that the length is not the only factor.
As the error message in #Comment 4 notes, you cannot have "-", "_", nor "." in addition to the length.

So that eliminates the entire section ".3.0.ipv6-2019-11-27-0001-build" from the original string.

I am waiting for my dev cluster to come up and will try your BC from #Comment 4 when it is ready.

Comment 6 Gabe Montero 2020-02-12 15:53:58 UTC
Yep, it looks like we'll need analogous changes in openshift-apiserver as well.

I suspect our CI did not hit this because it instantiates build objects directly rather than hitting the buildconfigs instantiate endpoint.

Moving back to POST to work on that scenario.

Comment 9 wewang 2020-02-21 08:07:54 UTC
There is no valid 4.5 nightly build payload yet; I will wait for one.

Comment 10 wewang 2020-03-09 03:26:17 UTC
Verified in version:
4.5.0-0.nightly-2020-03-06-190457

Steps are the same as for the related 4.4 bug.

Comment 12 errata-xmlrpc 2020-05-04 11:17:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581