Bug 1820580

Summary: [4.4] some targets are down with Kuryr network
Product: OpenShift Container Platform Reporter: Junqi Zhao <juzhao>
Component: Networking Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: bbennett, ltomasbo, mdulko, wjiang
Version: 4.4 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1822861 (view as bug list) Environment:
Last Closed: 2020-05-04 11:48:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1822861    
Bug Blocks:    
Attachments:
Description Flags
504 error for prometheus API in console none
all targets are down on Prometheus UI none

Description Junqi Zhao 2020-04-03 11:48:04 UTC
Created attachment 1675995 [details]
504 error for prometheus API in console

Description of problem:
The console shows a 504 error for the Prometheus API, and all Prometheus targets are down when the cluster uses the Kuryr network type; see the attached picture. The error does not occur with other network types.
# oc get network/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2020-04-03T02:32:18Z"
  generation: 2
  name: cluster
  resourceVersion: "2607"
  selfLink: /apis/config.openshift.io/v1/networks/cluster
  uid: c12d517e-585d-45bc-bef1-2a566aec0acd
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: Kuryr
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: Kuryr
  serviceNetwork:
  - 172.30.0.0/16

All endpoints are down; see the picture, or check from the CLI, for example:
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://192.168.0.26:9100/metrics'
curl: (7) Failed connect to 192.168.0.26:9100; Connection timed out
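
To repeat the check above for several scrape endpoints, something like the following could be used. This is only a sketch: `check_target` is a hypothetical helper name, the `TOKEN` variable is assumed to hold the prometheus-k8s service account token obtained as shown above, and the 5-second `--max-time` limit is an arbitrary choice. It only tests whether the Prometheus pod can connect to the endpoint, not whether the scrape itself would succeed.

```shell
# Hypothetical helper, not part of oc: reports whether the prometheus-k8s-0
# pod can open an HTTPS connection to the given host:port.
# Assumes TOKEN holds the prometheus-k8s service account token.
check_target() {
  target="$1"   # host:port of a scrape endpoint, e.g. 192.168.0.26:9100
  if oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
       curl -k -s -o /dev/null --max-time 5 \
       -H "Authorization: Bearer ${TOKEN:-}" "https://$target/metrics"; then
    echo "$target reachable"
  else
    # Any non-zero exit (for example a connection failure) lands here.
    echo "$target unreachable"
  fi
}
```

Usage would be along the lines of `TOKEN=$(oc sa get-token prometheus-k8s -n openshift-monitoring)` followed by `check_target 192.168.0.26:9100` for each endpoint of interest.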

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Log in to the admin console as cluster admin and check the "Home -> Overview" page
2. Check the targets in the Prometheus UI

Actual results:
The console shows a 504 error for the Prometheus API, and all targets are down with the Kuryr network type

Expected results:
no error

Additional info:

Comment 1 Junqi Zhao 2020-04-03 11:49:36 UTC
Created attachment 1675996 [details]
all targets are down on Prometheus UI

Comment 14 Junqi Zhao 2020-04-20 01:17:16 UTC
Will verify it after Bug 1825215 is fixed

Comment 15 Junqi Zhao 2020-04-23 07:26:29 UTC
Tested with 4.4.0-0.nightly-2020-04-21-210658, all targets are UP

Comment 17 errata-xmlrpc 2020-05-04 11:48:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581