Bug 1685338 - Distinguish between "could not talk to Cincinnati" and "Cincinnati did not find my channel/version in the graph"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 4.3.0
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-05 01:39 UTC by jooho lee
Modified: 2020-01-23 11:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-23 11:03:45 UTC
Target Upstream Version:


Attachments (Terms of Use)
error snapshot. (55.38 KB, image/png)
2019-03-05 01:39 UTC, jooho lee


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 268 'None' closed Bug 1685338: pkg/cvo: Reason granularity for RemoteFailed 2020-06-29 21:22:46 UTC
Red Hat Product Errata RHBA-2020:0062 None None None 2020-01-23 11:03:59 UTC

Description jooho lee 2019-03-05 01:39:30 UTC
Created attachment 1540796 [details]
error snapshot.

Description of problem:

I built the OpenShift installer from the latest source and deployed a cluster successfully.

However, the Cluster Settings page in the web console shows the error message "Could not retrieve updates. Unable to retrieve available updates: unknown version 4.0.0-0.alpha-2019-03-04-190542"

Version-Release number of selected component (if applicable):

$ ./openshift-install-latest version

./openshift-install-latest unreleased-master-490-g0ddac41932688237c599cc8a7231d624a08dfc29



How reproducible:
always

Steps to Reproduce:
1. openshift-install create cluster
2.
3.

Actual results:
 Error Retrieving

Expected results:
No errors

Additional info:

Comment 1 Yadan Pei 2019-03-06 09:16:13 UTC
Hi,

Which channel is your cluster configured with?

$ oc get clusterversion -o yaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: 2019-03-05T02:19:56Z
    generation: 4
    name: version
    resourceVersion: "1316722"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 2b13f0aa-3eed-11e9-a87d-067a76ada122
  spec:
    channel: stable-4.0
    clusterID: 2702d9e2-43aa-417e-9fd9-f8dcbc801644
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph

I think that if the channel you configured is not reachable, then the error message is expected.

Comment 2 jooho lee 2019-03-06 21:49:39 UTC
Hi,

Here is mine:

~~~

apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: 2019-03-04T23:01:51Z
    generation: 1
    name: version
    namespace: ""
    resourceVersion: "1683918"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 7f410645-3ed1-11e9-8fe5-02c285573cb0
  spec:
    channel: stable-4.0
    clusterID: df5adfd6-fedf-49a3-a171-61c1ac0c142a
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: 2019-03-04T23:28:03Z
      message: Done applying 4.0.0-0.alpha-2019-03-04-190542
      status: "True"
      type: Available
    - lastTransitionTime: 2019-03-06T15:17:33Z
      status: "False"
      type: Failing
    - lastTransitionTime: 2019-03-04T23:28:03Z
      message: Cluster version is 4.0.0-0.alpha-2019-03-04-190542
      status: "False"
      type: Progressing
    - lastTransitionTime: 2019-03-04T23:02:33Z
      message: 'Unable to retrieve available updates: unknown version 4.0.0-0.alpha-2019-03-04-190542'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    desired:
      image: registry.svc.ci.openshift.org/openshift/origin-release@sha256:933df182b35b1dc179bd0cfba6d3c0e0a15451989e52e950368c92cbd9e38cf2
      version: 4.0.0-0.alpha-2019-03-04-190542
    history:
    - completionTime: 2019-03-04T23:28:03Z
      image: registry.svc.ci.openshift.org/openshift/origin-release@sha256:933df182b35b1dc179bd0cfba6d3c0e0a15451989e52e950368c92cbd9e38cf2
      startedTime: 2019-03-04T23:02:33Z
      state: Completed
      version: 4.0.0-0.alpha-2019-03-04-190542
    observedGeneration: 1
    versionHash: SFCVTlnmGf0=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
~~~

Comment 3 Samuel Padgett 2019-03-06 23:40:55 UTC
The web console is simply showing the error that's in the ClusterVersion resource. Assigning to the Upgrade component to investigate if the error is expected or not.

Comment 4 Yadan Pei 2019-03-07 10:06:04 UTC
I think this is expected since we can't retrieve updates from server https://api.openshift.com/api/upgrades_info/v1/graph

$ curl https://api.openshift.com/api/upgrades_info/v1/graph   // returns nothing 


If you change the upstream server to something like this, the errors should go away.

# oc get clusterversion -o json|jq ".items[0].spec"
{
  "channel": "fast",
  "clusterID": "53f97452-9956-4a3c-8260-00c1de2668a1",
  "upstream": "https://openshift-release.svc.ci.openshift.org/graph"    // curl https://openshift-release.svc.ci.openshift.org/graph, where you can get available updates there
}

Comment 5 liujia 2019-03-08 01:50:40 UTC
Correct. 

But https://api.openshift.com/api/upgrades_info/v1/graph does return data; you need to add --header 'Accept:application/json' to your curl command.

The upstream URL should point to a Cincinnati server whose graph includes a node for the current version.

As for your cluster, the desired version "4.0.0-0.alpha-2019-03-04-190542" is not included in the graph served by your default upstream URL https://api.openshift.com/api/upgrades_info/v1/graph. So this message is expected.

BTW, even with https://openshift-release.svc.ci.openshift.org/graph you still cannot get available updates for your cluster, because you seem to be using an origin version. I'm not sure whether there is a server that origin clients can use for upgrades.
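
The check described above (does the graph served by the upstream contain a node for the cluster's current version?) can be sketched roughly as follows. This is only an illustration, not the operator's code. It assumes the graph JSON has a `nodes` list of objects with a `version` field and an `edges` list of `[from, to]` index pairs; the sample graph, the example.com payload names, and the `find_updates` helper are all made up for this sketch.

```python
import json

# Hypothetical sample of a graph response. A real response would come from
# the upstream URL, requested with the header 'Accept: application/json'.
SAMPLE_GRAPH = json.loads("""
{
  "nodes": [
    {"version": "4.0.0-5", "payload": "example.com/release:4.0.0-5"},
    {"version": "4.0.0-6", "payload": "example.com/release:4.0.0-6"},
    {"version": "4.0.0-7", "payload": "example.com/release:4.0.0-7"}
  ],
  "edges": [[0, 1], [0, 2], [1, 2]]
}
""")


def find_updates(graph, current_version):
    """Return the versions reachable in one hop from current_version.

    Returns None when current_version is not a node in the graph, which
    is the "unknown version" case discussed in this bug.
    """
    versions = [node["version"] for node in graph["nodes"]]
    if current_version not in versions:
        return None
    current_index = versions.index(current_version)
    # Each edge is a pair of indexes into the nodes list: [from, to].
    return [versions[to] for frm, to in graph["edges"] if frm == current_index]


if __name__ == "__main__":
    print(find_updates(SAMPLE_GRAPH, "4.0.0-5"))
    print(find_updates(SAMPLE_GRAPH, "4.0.0-0.alpha-2019-03-04-190542"))
```

Returning None for "not in the graph" and an empty list for "in the graph, but no updates" keeps the two cases apart, which is the distinction the reporter's cluster runs into.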

Comment 6 Clayton Coleman 2019-03-14 18:54:07 UTC
We probably need to improve two things:

1. The error message
2. How it is presented in the console


The message shown to users should be improved to something like "the specified version is not recognized" (subtly communicating that they aren't running "supported" software).

The presentation in the console should more accurately identify this specific error as "you can't upgrade because your current version isn't recognized by the server", and should probably be less red and more yellow.

Comment 7 Alex Crawford 2019-04-03 23:25:34 UTC
Moving to 4.2. This sort of UI improvement isn't strictly necessary for 4.1 to ship.

Comment 8 Abhinav Dahiya 2019-08-16 16:29:42 UTC
Talked with Crawford. This is not required for 4.2.

Comment 11 W. Trevor King 2019-11-09 00:12:02 UTC
[1], from way back in April (so in 4.1.0 and everything since) gives us error messages like:

  currently installed version 4.0.0-3 not found in the "test-channel" channel

I think that checks the "improve the error message" bit from comment 6.  We still set the reason to RemoteFailed [2], though, and we need to be more granular than that if the console folks are going to distinguish between "version/channel pair not known to Cincy" and "network down, could not talk to Cincy at all" and such.

[1]: https://github.com/openshift/cluster-version-operator/pull/162
[2]: https://github.com/openshift/cluster-version-operator/blob/59601a89bb7a65cf2fbaa4697d5142091f68534f/pkg/cvo/availableupdates.go#L164

Comment 13 W. Trevor King 2019-11-13 22:18:33 UTC
Verification probably looks like:

* With the default https://api.openshift.com/api/upgrades_info/v1/graph upstream, confirm that the RetrievedUpdates condition has status False and reason VersionNotFound (because the nightly under test is not known to the production Cincinnati instance).
* Set upstream to https://does-not-exist.example.com/ .  Confirm that the RetrievedUpdates condition has status False and reason RemoteFailed.
* Explore as many of the other RetrievedUpdates failure reasons as you like ;).
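
The verification steps above could be scripted along these lines. This is a rough sketch, assuming the input is a single ClusterVersion object as JSON (for example, saved from `oc get clusterversion version -o json`); the helper names and the trimmed sample object are hypothetical.

```python
import json

# Hypothetical, trimmed ClusterVersion object for illustration only.
SAMPLE_CLUSTER_VERSION = json.loads("""
{
  "kind": "ClusterVersion",
  "status": {
    "conditions": [
      {"type": "Available", "status": "True"},
      {"type": "RetrievedUpdates", "status": "False", "reason": "VersionNotFound"}
    ]
  }
}
""")


def retrieved_updates_condition(cluster_version):
    """Return the RetrievedUpdates condition, or None if it is absent."""
    conditions = cluster_version.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == "RetrievedUpdates":
            return condition
    return None


def condition_matches(cluster_version, expected_status, expected_reason):
    """Check the RetrievedUpdates status and reason against expectations.

    Pass expected_reason=None for a condition that carries no reason.
    """
    condition = retrieved_updates_condition(cluster_version)
    if condition is None:
        return False
    return (condition.get("status") == expected_status
            and condition.get("reason") == expected_reason)


if __name__ == "__main__":
    print(condition_matches(SAMPLE_CLUSTER_VERSION, "False", "VersionNotFound"))
```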

Comment 14 liujia 2019-11-19 07:46:44 UTC
1) with default upstream for a nightly build
{
    "lastTransitionTime": "2019-11-19T06:51:02Z",
    "message": "Unable to retrieve available updates: currently installed version 4.3.0-0.nightly-2019-11-18-201425 not found in the \"stable-4.3\" channel",
    "reason": "VersionNotFound",
    "status": "False",
    "type": "RetrievedUpdates"
}

2) with correct upstream for a nightly build
{
    "lastTransitionTime": "2019-11-19T07:08:38Z",
    "status": "True",
    "type": "RetrievedUpdates"
}

3) with notfound upstream
{
    "lastTransitionTime": "2019-11-19T07:21:12Z",
    "message": "Unable to retrieve available updates: unexpected HTTP status: 404 Not Found",
    "reason": "ResponseFailed",
    "status": "False",
    "type": "RetrievedUpdates"
}

4) with unavailable upstream
{
    "lastTransitionTime": "2019-11-19T07:21:12Z",
    "message": "Unable to retrieve available updates: Get https://https//does-not-exist.example.com/?arch=amd64&channel=stable-4.3&id=f6159818-42f4-4341-b783-71f732238c66&version=4.3.0-0.nightly-2019-11-18-201425: dial tcp: lookup https on 10.0.0.2:53: no such host",
    "reason": "RemoteFailed",
    "status": "False",
    "type": "RetrievedUpdates"
}
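
For readers following along, the way the three failure reasons observed above relate to each other can be sketched like this. It is a simplified illustration in Python, not the operator's actual Go code: the `classify_retrieval` function and the `fetch` callable are hypothetical, the graph layout (a `nodes` list with `version` fields) is assumed, and the real operator has further reasons not covered here.

```python
import json


def classify_retrieval(fetch, current_version):
    """Return (status, reason) for a RetrievedUpdates-style condition.

    fetch is a hypothetical callable that returns (http_status, body)
    or raises OSError when the upstream cannot be reached at all.
    """
    try:
        http_status, body = fetch()
    except OSError:
        # Could not talk to the upstream (DNS failure, refused connection).
        return ("False", "RemoteFailed")
    if http_status != 200:
        # The upstream answered, but not with a usable response.
        return ("False", "ResponseFailed")
    graph = json.loads(body)
    versions = {node["version"] for node in graph.get("nodes", [])}
    if current_version not in versions:
        # The upstream answered, but does not know this version.
        return ("False", "VersionNotFound")
    return ("True", None)


# Fake fetchers standing in for the four cases tested above.
def fetch_unreachable():
    raise OSError("no such host")


def fetch_not_found():
    return (404, "")


def fetch_graph():
    return (200, json.dumps({"nodes": [{"version": "4.3.0"}], "edges": []}))


if __name__ == "__main__":
    print(classify_retrieval(fetch_unreachable, "4.3.0"))
    print(classify_retrieval(fetch_not_found, "4.3.0"))
    print(classify_retrieval(fetch_graph, "4.3.0-0.nightly"))
    print(classify_retrieval(fetch_graph, "4.3.0"))
```

The ordering matters: transport errors are checked first, then the HTTP status, and only then the graph contents, so a console can tell "could not reach the server" apart from "the server does not know this version".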

Comment 15 liujia 2019-11-19 07:47:48 UTC
Verified on 4.3.0-0.nightly-2019-11-18-201425

Comment 17 errata-xmlrpc 2020-01-23 11:03:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

