Bug 1797894 - [4.5]machineNetwork in noProxy list is flushed by Network-Operator
Summary: [4.5]machineNetwork in noProxy list is flushed by Network-Operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1806403
Reported: 2020-02-04 07:39 UTC by weiwei jiang
Modified: 2020-07-13 17:14 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When the installer introduced machineNetwork, the cluster network operator was not modified to add it to proxy.status.noProxy. Consequence: proxy.status.noProxy was missing the machineNetwork CIDRs. Fix: Add the machineNetwork CIDRs to proxy.status.noProxy. Result: noProxy contains the expected entries.
Clone Of:
Clones: 1805726
Environment:
Last Closed: 2020-07-13 17:13:54 UTC
Target Upstream Version:



Links
System ID Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 490 None closed Bug 1797894: Add MachineNetworks to proxy.status.noProxy 2020-08-06 05:01:43 UTC
Github openshift cluster-network-operator pull 503 None closed Bug 1797894: Don't add empty MachineCIDR to noProxy 2020-08-06 05:01:43 UTC
Red Hat Product Errata RHBA-2020:2409 None None None 2020-07-13 17:14:09 UTC

Description weiwei jiang 2020-02-04 07:39:53 UTC
Description of problem: 
When setting up a cluster with a proxy, the installation failed because the machine-config operator never became ready.
Commands that need the API server to connect to a node through the proxy (logs, exec, port-forward) also failed.
After investigating, we found that https://github.com/openshift/installer/pull/2829 was recently merged, but the Network Operator is not aware of this change,
so it overwrites noProxy with a list that does not include machineNetwork:
https://github.com/openshift/cluster-network-operator/blob/master/pkg/util/proxyconfig/no_proxy.go#L29
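For illustration, the merge that proxy.status.noProxy needs can be sketched in Go (the operator's language). This is not the code in no_proxy.go: `clusterNetworking`, `buildNoProxy` and the defaults list are hypothetical, with the defaults taken from the noProxy values shown in this report. The line that adds MachineNetwork is the step that was missing; skipping empty entries mirrors the intent of PR 503 ("Don't add empty MachineCIDR to noProxy").

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// clusterNetworking is a hypothetical, trimmed-down view of the
// install-config networking fields involved in this bug.
type clusterNetworking struct {
	ClusterNetwork []string // e.g. 10.128.0.0/14
	ServiceNetwork []string // e.g. 172.30.0.0/16
	MachineNetwork []string // e.g. 10.0.0.0/16, the entries that were dropped
}

// buildNoProxy (hypothetical) merges default entries, the cluster CIDRs and
// the user's spec.noProxy into one sorted, de-duplicated, comma-separated list.
func buildNoProxy(defaults []string, nw clusterNetworking, userNoProxy string) string {
	set := map[string]struct{}{}
	add := func(entries ...string) {
		for _, e := range entries {
			e = strings.TrimSpace(e)
			if e == "" { // skip empty values, e.g. an unset machine CIDR
				continue
			}
			set[e] = struct{}{}
		}
	}
	add(defaults...)
	add(nw.ClusterNetwork...)
	add(nw.ServiceNetwork...)
	add(nw.MachineNetwork...) // the step that was missing before the fix
	add(strings.Split(userNoProxy, ",")...)

	out := make([]string, 0, len(set))
	for e := range set {
		out = append(out, e)
	}
	sort.Strings(out)
	return strings.Join(out, ",")
}

func main() {
	nw := clusterNetworking{
		ClusterNetwork: []string{"10.128.0.0/14"},
		ServiceNetwork: []string{"172.30.0.0/16"},
		MachineNetwork: []string{"10.0.0.0/16"},
	}
	defaults := []string{"127.0.0.1", "localhost", ".svc", ".cluster.local", "169.254.169.254"}
	// prints: .cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,localhost,test.no-proxy.com
	fmt.Println(buildNoProxy(defaults, nw, "test.no-proxy.com"))
}
```

Sorting the merged set keeps the output stable across reconciles, which matches the sorted order seen in proxy.status.noProxy later in this report.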
 

Known impact: all API-to-node traffic is blocked, including:
1. Installation never succeeds.
2. All commands that need the API server to connect to nodes through the proxy fail, such as oc logs and oc exec.
3. Some node metrics targets are shown as down (red).
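To show why a missing machineNetwork entry breaks API-to-node traffic, here is a simplified Go sketch of a noProxy match. It is an assumption-laden approximation, not the matcher the API server or Go's proxy support actually uses: `bypassesProxy` is hypothetical, CIDR entries match IPs in range, entries starting with "." match domain suffixes, and anything else must match exactly. The node IP 10.0.61.87 is read off the node name ip-10-0-61-87 in comment 1.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// bypassesProxy (hypothetical, simplified) reports whether host (an IP or a
// DNS name, without port) matches an entry in a comma-separated noProxy list.
func bypassesProxy(host, noProxy string) bool {
	ip := net.ParseIP(host) // nil when host is a DNS name
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if _, cidr, err := net.ParseCIDR(entry); err == nil {
			if ip != nil && cidr.Contains(ip) {
				return true
			}
			continue
		}
		if strings.HasPrefix(entry, ".") && strings.HasSuffix(host, entry) {
			return true // domain-suffix entry such as .svc
		}
		if host == entry {
			return true // exact host or IP entry such as 127.0.0.1
		}
	}
	return false
}

func main() {
	before := ".cluster.local,.svc,10.128.0.0/14,127.0.0.1,172.30.0.0/16,localhost"
	after := before + ",10.0.0.0/16"
	node := "10.0.61.87"
	fmt.Println(bypassesProxy(node, before)) // false: connection would go via the proxy
	fmt.Println(bypassesProxy(node, after))  // true: direct connection to the node
}
```

With only the pod (10.128.0.0/14) and service (172.30.0.0/16) CIDRs listed, nothing covers the node addresses, so a connection to the kubelet on port 10250 would be sent to the proxy, consistent with the "proxyconnect tcp" error below.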
 
Version-Release number of the following components: 
./openshift-install 4.4.0-0.nightly-2020-02-03-224632 
built from commit 725b71dce1d41c98e368ad9277e14c7ce9a9cb25 
release image registry.svc.ci.openshift.org/ocp/release@sha256:5a51afee81638f559a92a7a1d910c24af8c4f458ea5baf8075fc3d81cf35f6fe 
 
How reproducible: 
Always 
 
Steps to Reproduce: 
1. Set up an IPI cluster with a proxy configured in install-config.yaml.
2. Run oc logs.
 
Actual results: 
$ oc -n openshift-machine-config-operator logs -f machine-config-controller-6965dbc744-bpt98
Error from server: Get https://192.168.0.20:10250/containerLogs/openshift-machine-config-operator/machine-config-controller-6965dbc744-bpt98/machine-config-controller?follow=true: proxyconnect tcp: x509: certificate signed by unknown authority 
 
Expected results: 
The command should succeed without this error.
 
Additional info:

Comment 1 Johnny Liu 2020-02-04 08:11:35 UTC
Also hit this issue in a UPI on AWS install with proxy enabled.

Proxy and networking fields in install-config.yaml (note the `machineNetwork` field):
proxy:
  httpProxy: http://proxy-user1:xxx@QE_PROXY_PLACEHOLDER:3128
  httpsProxy: http://proxy-user1:xxx@QE_PROXY_PLACEHOLDER:3128
  noProxy: test.no-proxy.com
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 10.0.0.0/16
Triggered the installation; it failed.
$ ./openshift-install wait-for install-complete --dir '/home/installer3/workspace/Launch Environment Flexy/workdir/install-dir'
level=info msg="Waiting up to 30m0s for the cluster at https://api.jialiu-25822.qe.devcluster.openshift.com:6443 to initialize..."
level=info msg="Cluster operator insights Disabled is False with : "
level=info msg="Cluster operator machine-config Available is False with : Cluster not available for 4.4.0-0.nightly-2020-02-03-081920"
level=error msg="Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Failed to resync 4.4.0-0.nightly-2020-02-03-081920 because: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with \"3 nodes are reporting degraded status on sync\": \"Node ip-10-0-61-87.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-fdb913d94892563827998728eb2d3557\\\\\\\" not found\\\", Node ip-10-0-59-238.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-fdb913d94892563827998728eb2d3557\\\\\\\" not found\\\", Node ip-10-0-70-4.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-fdb913d94892563827998728eb2d3557\\\\\\\" not found\\\"\", retrying"
level=fatal msg="failed to initialize the cluster: Cluster operator machine-config is reporting a failure: Failed to resync 4.4.0-0.nightly-2020-02-03-081920 because: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: configuration status for pool master is empty: pool is degraded because nodes fail with \"3 nodes are reporting degraded status on sync\": \"Node ip-10-0-61-87.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-fdb913d94892563827998728eb2d3557\\\\\\\" not found\\\", Node ip-10-0-59-238.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-fdb913d94892563827998728eb2d3557\\\\\\\" not found\\\", Node ip-10-0-70-4.us-east-2.compute.internal is reporting: \\\"machineconfig.machineconfiguration.openshift.io \\\\\\\"rendered-master-fdb913d94892563827998728eb2d3557\\\\\\\" not found\\\"\", retrying"

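After the installation failure, I compared the noProxy list on the bootstrap node with the one in the cluster and found a difference: 10.0.0.0/16 (the machineNetwork CIDR) is missing from the cluster's list.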
# sdiff b.log c.log 
.cluster.local							.cluster.local
.svc								.svc
.us-east-2.compute.internal					.us-east-2.compute.internal
10.0.0.0/16						      <
10.128.0.0/14							10.128.0.0/14
127.0.0.1							127.0.0.1
169.254.169.254							169.254.169.254
172.30.0.0/16							172.30.0.0/16
api-int.jialiu-25822.qe.devcluster.openshift.com		api-int.jialiu-25822.qe.devcluster.openshift.com
etcd-0.jialiu-25822.qe.devcluster.openshift.com			etcd-0.jialiu-25822.qe.devcluster.openshift.com
etcd-1.jialiu-25822.qe.devcluster.openshift.com			etcd-1.jialiu-25822.qe.devcluster.openshift.com
etcd-2.jialiu-25822.qe.devcluster.openshift.com			etcd-2.jialiu-25822.qe.devcluster.openshift.com
localhost							localhost
test.no-proxy.com						test.no-proxy.com


b.log is the noProxy list captured on the bootstrap node by running `env | grep -i proxy`; c.log is the noProxy list captured by running `oc get proxy cluster -o yaml`.
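The sdiff comparison above can also be scripted. A minimal Go sketch, assuming both lists have already been extracted into comma-separated strings (pulling them out of `env` or `oc` output is not covered); `missingEntries` is a hypothetical helper that compares entries as exact strings.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingEntries (hypothetical) returns the entries present in want but
// absent from got, sorted and without duplicates. Comparison is exact
// string equality after trimming spaces.
func missingEntries(want, got []string) []string {
	have := make(map[string]bool, len(got))
	for _, e := range got {
		have[strings.TrimSpace(e)] = true
	}
	var missing []string
	for _, e := range want {
		e = strings.TrimSpace(e)
		if e != "" && !have[e] {
			missing = append(missing, e)
			have[e] = true // report each missing entry once
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	bootstrap := strings.Split(".cluster.local,.svc,10.0.0.0/16,10.128.0.0/14,172.30.0.0/16,test.no-proxy.com", ",")
	cluster := strings.Split(".cluster.local,.svc,10.128.0.0/14,172.30.0.0/16,test.no-proxy.com", ",")
	fmt.Println(missingEntries(bootstrap, cluster)) // [10.0.0.0/16]
}
```

Exact string comparison is enough here because both sides are generated from the same install-config values; it would not notice one CIDR being covered by a broader one.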

Comment 11 zhaozhanqi 2020-03-09 07:56:10 UTC
Verified this bug on 4.5.0-0.nightly-2020-03-06-190457

# oc get cm cluster-config-v1 -n kube-system -o yaml | grep cidr -A 2
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      machineNetwork:
      - cidr: 10.0.0.0/16
      networkType: OpenShiftSDN
      serviceNetwork:


# oc get proxy cluster -o yaml 
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2020-03-09T06:40:15Z"
  generation: 1
  name: cluster
  resourceVersion: "680"
  selfLink: /apis/config.openshift.io/v1/proxies/cluster
  uid: 6d53c4fd-ddc3-4ad1-a6d0-3b3f4f83d5fc
spec:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-12-160-4.us-east-2.compute.amazonaws.com:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-12-160-4.us-east-2.compute.amazonaws.com:3128
  noProxy: test.no-proxy.com
  trustedCA:
    name: ""
status:
  httpProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-12-160-4.us-east-2.compute.amazonaws.com:3128
  httpsProxy: http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@ec2-3-12-160-4.us-east-2.compute.amazonaws.com:3128
  noProxy: .cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,api-int.zzhao45.qe.devcluster.openshift.com,etcd-0.zzhao45.qe.devcluster.openshift.com,etcd-1.zzhao45.qe.devcluster.openshift.com,etcd-2.zzhao45.qe.devcluster.o
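A verification like the one above can be automated by checking that each machineNetwork CIDR is covered by some CIDR in status.noProxy, either the same range or a broader one. A Go sketch under stated assumptions: `cidrCovered` is hypothetical, non-CIDR entries (domains, plain IPs) are ignored, and the sample string uses only the untruncated start of the status.noProxy value shown above.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// cidrCovered (hypothetical) reports whether the CIDR inner is fully contained
// in any CIDR entry of the comma-separated noProxy list.
func cidrCovered(inner, noProxy string) (bool, error) {
	_, in, err := net.ParseCIDR(inner)
	if err != nil {
		return false, err
	}
	inOnes, inBits := in.Mask.Size()
	for _, e := range strings.Split(noProxy, ",") {
		_, out, err := net.ParseCIDR(strings.TrimSpace(e))
		if err != nil {
			continue // not a CIDR: domain suffix or plain IP
		}
		outOnes, outBits := out.Mask.Size()
		// Same address family, equal or shorter prefix, and containing the
		// inner network address means the whole inner range is covered.
		if outBits == inBits && outOnes <= inOnes && out.Contains(in.IP) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	status := ".cluster.local,.svc,.us-east-2.compute.internal,10.0.0.0/16,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16"
	ok, err := cidrCovered("10.0.0.0/16", status)
	fmt.Println(ok, err) // true <nil>
}
```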

Comment 13 errata-xmlrpc 2020-07-13 17:13:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

