Bug 1928805

Summary: subctl e2e fails on first test, but message is misleading
Product: Red Hat Advanced Cluster Management for Kubernetes
Reporter: Noam Manos <nmanos>
Component: Submariner                     Assignee: tpanteli
Status: CLOSED ERRATA                     QA Contact: Noam Manos <nmanos>
Severity: low                             Docs Contact: Christopher Dawson <cdawson>
Priority: low
Version: rhacm-2.2                        CC: mkolesni, smattar, tfreger, tpanteli
Target Milestone: ---                     Flags: smattar: rhacm-2.2.z+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:                         Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:                                 Environment:
Last Closed: 2021-05-04 19:31:22 UTC      Type: Bug
Regression: ---                           Mount Type: ---
Documentation: ---                        CRM:
Verified Versions:                        Category: ---
oVirt Team: ---                           RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---                      Target Upstream Version:
Embargoed:

Description Noam Manos 2021-02-15 15:28:33 UTC
Description of problem:
subctl e2e assumes that the kubeconfig context name is the same as the cluster ID. The two may differ, and when they do, e2e fails.

Version-Release number of selected component (if applicable):
Submariner 0.8.1

How reproducible:
Always

Steps to Reproduce:

https://qe-jenkins-csb-skynet.cloud.paas.psi.redhat.com/job/Maintenance/job/debug_job/1136/Test-Report/


Actual results:

$ oc  config view

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://api.pkomarov-cluster-a.devcluster.openshift.com:6443
  name: api-pkomarov-cluster-a-devcluster-openshift-com:6443
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://api.pkomarov-cluster-a.devcluster.openshift.com:6443
  name: pkomarov-cluster-a
contexts:
- context:
    cluster: api-pkomarov-cluster-a-devcluster-openshift-com:6443
    namespace: default
    user: master/api-pkomarov-cluster-a-devcluster-openshift-com:6443
  name: default/api-pkomarov-cluster-a-devcluster-openshift-com:6443/master
- context:
    cluster: pkomarov-cluster-a
    namespace: default
    user: admin
  name: pkomarov-cluster-a
current-context: pkomarov-cluster-a
kind: Config
preferences: {}
users:
- name: admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
- name: master/api-pkomarov-cluster-a-devcluster-openshift-com:6443
  user:
    token: REDACTED

$ oc  config view

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://api.default-cl2.devcluster.openshift.com:6443
  name: api-default-cl2-devcluster-openshift-com:6443
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://api.default-cl2.devcluster.openshift.com:6443
  name: default-cl2
contexts:
- context:
    cluster: api-default-cl2-devcluster-openshift-com:6443
    namespace: default
    user: master/api-default-cl2-devcluster-openshift-com:6443
  name: default/api-default-cl2-devcluster-openshift-com:6443/master
- context:
    cluster: default-cl2
    namespace: default
    user: admin
  name: pkomarov-cluster-b
current-context: pkomarov-cluster-b
kind: Config
preferences: {}
users:
- name: admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
- name: master/api-default-cl2-devcluster-openshift-com:6443
  user:
    token: REDACTED
- name: ocp_usr/api-default-cl2-devcluster-openshift-com:6443
  user: {}


$ subctl verify --only service-discovery,connectivity --verbose /mnt/skynet-data/pkomarov-env/pkomarov-cluster-a/auth/kubeconfig /mnt/skynet-data/pkomarov-env/ocpup/.config/cl2/auth/kubeconfig
 
Performing the following verifications: service-discovery, connectivity
Running Suite: Submariner E2E suite
===================================
Random Seed: 1613399570
Will run 32 of 34 specs

STEP: Creating kubernetes clients
STEP: Setting cluster ID "pkomarov-cluster-a" for kube context name "pkomarov-cluster-a"
STEP: Setting cluster ID "default-cl2" for kube context name "default-cl2"
STEP: Creating lighthouse clients
[] E2E failed

subctl version: v0.8.1

Failure [245.390 seconds]
[BeforeSuite] BeforeSuite 
/go/src/github.com/submariner-io/submariner-operator/vendor/github.com/submariner-io/shipyard/test/e2e/e2e.go:24

  Failed to find Clusters to detect if Globalnet is enabled. No Cluster found
  Unexpected error:
      <*errors.errorString | 0xc000404200>: {
          s: "timed out waiting for the condition",
      }
      timed out waiting for the condition
  occurred

  /go/src/github.com/submariner-io/submariner-operator/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:458
------------------------------

Ran 32 of 0 Specs in 245.390 seconds
FAIL! -- 0 Passed | 32 Failed | 0 Pending | 0 Skipped


Expected results:
The subctl e2e framework (Shipyard) should use the correct kubeconfig context name
(for cluster B, the context name is "pkomarov-cluster-b", not "default-cl2").

Comment 1 tpanteli 2021-02-15 19:18:04 UTC
The error "Failed to find Clusters to detect if Globalnet is enabled. No Cluster found" means that no Cluster resource was found in cluster A (the first cluster passed in, which I assume was "pkomarov-cluster-a"). This is unrelated to the assertion in the issue title ("assumes kubeconfig context name is the same as cluster ID"). In fact, e2e no longer makes that assumption: it obtains the cluster ID from the SUBMARINER_CLUSTERID environment variable in the gateway DaemonSet spec. This is reflected in the message "STEP: Setting cluster ID "default-cl2" for kube context name "default-cl2"", which indicates that the kube context name and the obtained cluster ID are one and the same for cluster B. However, the cluster ID/name is used only for display in messages, except in one case in the Lighthouse (LH) E2E tests, where it is used to obtain the health check IP.
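A minimal sketch of the cluster ID lookup described above, under stated assumptions: `EnvVar` is a simplified stand-in for the Kubernetes `corev1.EnvVar` type (only `Name` and `Value`), and `clusterIDFromEnv` is a hypothetical helper, not the actual Shipyard code. It scans a gateway container's environment for `SUBMARINER_CLUSTERID` and falls back to the name the caller already has when the variable is absent.

```go
package main

import "fmt"

// EnvVar is a simplified stand-in for k8s.io/api/core/v1.EnvVar,
// keeping only the fields this sketch needs.
type EnvVar struct {
	Name  string
	Value string
}

// clusterIDFromEnv is a hypothetical helper: it returns the value of
// SUBMARINER_CLUSTERID from a container's env list, or fallback if the
// variable is not set.
func clusterIDFromEnv(env []EnvVar, fallback string) string {
	for _, e := range env {
		if e.Name == "SUBMARINER_CLUSTERID" {
			return e.Value
		}
	}
	return fallback
}

func main() {
	// Illustrative env list; EXAMPLE_VAR is a placeholder, not a real setting.
	env := []EnvVar{
		{Name: "EXAMPLE_VAR", Value: "x"},
		{Name: "SUBMARINER_CLUSTERID", Value: "default-cl2"},
	}
	fmt.Println(clusterIDFromEnv(env, "pkomarov-cluster-b")) // prints default-cl2
}
```

Falling back to the caller's value keeps the display name usable when the variable cannot be read, which matches the comment's point that the ID is mostly used for output messages.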

Comment 2 Noam Manos 2021-02-16 12:46:27 UTC
But "STEP: Setting cluster ID "default-cl2" for kube context name "default-cl2"" indicates that e2e evaluates the wrong data.
There is no context named "default-cl2"; "default-cl2" is only the cluster ID:

- context:
    cluster: default-cl2
    namespace: default
    user: admin
  name: pkomarov-cluster-b

Comment 3 tpanteli 2021-02-16 12:54:02 UTC
The context name comes from what you pass in on the command line, presumably /mnt/skynet-data/pkomarov-env/ocpup/.config/cl2/auth/kubeconfig.

Comment 4 tpanteli 2021-02-16 13:18:01 UTC
Actually, it is correct: it extracts the Cluster field from the current context in the config file. TestContext.ClusterIDs is intended to hold the cluster ID/name, not the context name, because it is used for display in output messages. When e2e is run from the make target, TestContext.ClusterIDs is initialized to the context name passed in.
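To make the distinction concrete, here is a sketch of "extract the Cluster field from the current context" applied to cluster B's kubeconfig from the description. The `Config` and `Context` types are simplified stand-ins assumed for this example (real Go code would typically parse the file with client-go's `clientcmd` package); only the fields needed here are modeled.

```go
package main

import "fmt"

// Context mirrors only the "cluster" field of a kubeconfig contexts entry.
type Context struct {
	Cluster string
}

// Config mirrors only the parts of a parsed kubeconfig used here.
type Config struct {
	CurrentContext string
	Contexts       map[string]Context
}

// clusterOfCurrentContext returns the cluster referenced by the current
// context, and false if the current context is not defined.
func clusterOfCurrentContext(cfg Config) (string, bool) {
	ctx, ok := cfg.Contexts[cfg.CurrentContext]
	if !ok {
		return "", false
	}
	return ctx.Cluster, true
}

func main() {
	// Cluster B's kubeconfig from the description, reduced to its contexts.
	cfg := Config{
		CurrentContext: "pkomarov-cluster-b",
		Contexts: map[string]Context{
			"default/api-default-cl2-devcluster-openshift-com:6443/master": {
				Cluster: "api-default-cl2-devcluster-openshift-com:6443",
			},
			"pkomarov-cluster-b": {Cluster: "default-cl2"},
		},
	}
	cluster, _ := clusterOfCurrentContext(cfg)
	fmt.Println(cluster) // prints default-cl2
}
```

The context name ("pkomarov-cluster-b") and the cluster name it points to ("default-cl2") are separate kubeconfig keys, which is why the message showed "default-cl2" even though no context has that name.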

Comment 5 tpanteli 2021-02-16 13:29:47 UTC
(In reply to tpanteli from comment #4)

To clarify, the functionality is correct (TestContext.ClusterIDs is set correctly), but the message is misleading. The format arguments for the message are reversed, although in this case it makes no difference:

    By(fmt.Sprintf("Setting cluster ID %q for kube context name %q", TestContext.ClusterIDs[i], envVar.Value))

Also, we shouldn't print the message if both values are the same.
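A sketch of the two fixes described in this comment, assuming the cluster ID is the value read from SUBMARINER_CLUSTERID (`envVar.Value` in the quoted line) and the context name is the name passed in. `clusterIDStep` is a hypothetical helper, not the code from the linked pull request; in the suite, the returned message would be passed to Ginkgo's `By` only when `ok` is true.

```go
package main

import "fmt"

// clusterIDStep builds the progress message with the arguments in the
// order the format string names them. It returns ok=false when the
// cluster ID and context name are equal, so the caller can skip printing.
func clusterIDStep(clusterID, contextName string) (msg string, ok bool) {
	if clusterID == contextName {
		return "", false
	}
	return fmt.Sprintf("Setting cluster ID %q for kube context name %q", clusterID, contextName), true
}

func main() {
	// Values differ: the message is produced with the ID first.
	if msg, ok := clusterIDStep("default-cl2", "pkomarov-cluster-b"); ok {
		fmt.Println(msg)
	}
	// Values are equal: the step is skipped.
	if _, ok := clusterIDStep("default-cl2", "default-cl2"); !ok {
		fmt.Println("skipped")
	}
}
```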

Comment 6 tpanteli 2021-02-16 16:24:33 UTC
(In reply to tpanteli from comment #5)

Submitted https://github.com/submariner-io/shipyard/pull/456

Comment 7 Noam Manos 2021-02-21 10:09:26 UTC
The root cause of the failure was a Submariner installation failure, which led to the E2E tests failing on the first test.
The printed message was misleading; there should probably be a Ginkgo "BeforeTest" step that checks whether Submariner is installed
(e.g. just as subctl show all returns "Submariner is not installed", e2e should do the same).

Comment 8 Noam Manos 2021-02-21 10:11:31 UTC
* step that verifies that submariner is not installed
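Comments 7 and 8 suggest failing fast with a clear message when Submariner is not installed. A minimal sketch of that idea, under stated assumptions: `verifyInstalled` is a hypothetical helper, and the per-cluster counts stand in for listing Submariner Cluster resources through the Kubernetes API. In a Ginkgo suite, such a check would most naturally run in `BeforeSuite`, ahead of the Globalnet detection that timed out in the description.

```go
package main

import "fmt"

// verifyInstalled is a hypothetical pre-suite check. clusters lists the
// cluster names in the order they were passed to the tests; found maps
// each name to the number of Submariner Cluster resources observed on it.
// It returns an error naming the first cluster with none.
func verifyInstalled(clusters []string, found map[string]int) error {
	for _, name := range clusters {
		if found[name] == 0 {
			return fmt.Errorf("Submariner is not installed on cluster %q", name)
		}
	}
	return nil
}

func main() {
	// Illustrative counts: nothing found on the first cluster.
	clusters := []string{"pkomarov-cluster-a", "pkomarov-cluster-b"}
	found := map[string]int{"pkomarov-cluster-b": 2}
	if err := verifyInstalled(clusters, found); err != nil {
		fmt.Println(err)
	}
}
```

Reporting the missing installation directly replaces a generic "timed out waiting for the condition" with a message that points at the actual cause.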

Comment 15 errata-xmlrpc 2021-05-04 19:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHEA: Submariner 0.8 - bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1500