Bug 1820580

Summary:

[4.4] some targets are down with Kuryr network

Product:

OpenShift Container Platform

Reporter:

Junqi Zhao <juzhao>

Component:

Networking

Assignee:

Michał Dulko <mdulko>

Networking sub component:

kuryr

QA Contact:

GenadiC <gcheresh>

Status:

CLOSED ERRATA

Severity:

urgent

Priority:

urgent

CC:

bbennett, ltomasbo, mdulko, wjiang

Version:

4.4

Keywords:

TestBlocker

Target Release:

4.4.0

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

No Doc Update

Clones:

1822861

Last Closed:

2020-05-04 11:48:02 UTC

Type:

Bug

Bug Depends On:

1822861

Attachments:

Description	Flags
504 error for prometheus API in console	none
all targets are down on Prometheus UI	none

Description Junqi Zhao 2020-04-03 11:48:04 UTC

Created attachment 1675995
504 error for prometheus API in console

Description of problem:
The console returns a 504 error for the Prometheus API, and all Prometheus targets are down on a cluster using the Kuryr network type (see the attached screenshots). The error does not occur with other network types.
# oc get network/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-04-03T02:32:18Z"
  generation: 2
  name: cluster
  resourceVersion: "2607"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: c12d517e-585d-45bc-bef1-2a566aec0acd
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: Kuryr
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: Kuryr
  serviceNetwork:
  - 172.30.0.0/16

All endpoints are down, as shown in the screenshot. The same failure is visible from the CLI, for example:
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://192.168.0.26:9100/metrics'
curl: (7) Failed connect to 192.168.0.26:9100; Connection timed out

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Log in to the admin console as a cluster admin and open the "Home -> Overview" page.
2. Check the targets in the Prometheus UI.

Actual results:
The console returns a 504 error for the Prometheus API, and all targets are down with the Kuryr network type.

Expected results:
No 504 error in the console, and all targets are up.

Additional info:

Comment 1 Junqi Zhao 2020-04-03 11:49:36 UTC

Created attachment 1675996
all targets are down on Prometheus UI

Comment 14 Junqi Zhao 2020-04-20 01:17:16 UTC

Will verify this after Bug 1825215 is fixed.

Comment 15 Junqi Zhao 2020-04-23 07:26:29 UTC

Tested with 4.4.0-0.nightly-2020-04-21-210658; all targets are UP.

Comment 17 errata-xmlrpc 2020-05-04 11:48:02 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581