Version: 4.8.0-fc.7

Steps to reproduce:

1. Deploy an IPv6 cluster with a proxy.

[root@f29-h06-000-r640 ~]# oc get proxy cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2021-06-02T19:45:33Z"
  generation: 1
  name: cluster
  resourceVersion: "671"
  uid: d0c6c2b5-fb10-4864-975f-3c45afb5ab5f
spec:
  httpProxy: http://[1000::beef]:3128
  httpsProxy: http://[1000::beef]:3128
  noProxy: 1000::/64,.vlan614.rdu2.scalelab.redhat.com,fd01::/48,fd02::/112
  trustedCA:
    name: ""
status:
  httpProxy: http://[1000::beef]:3128
  httpsProxy: http://[1000::beef]:3128
  noProxy: .cluster.local,.svc,.vlan614.rdu2.scalelab.redhat.com,1000::/64,127.0.0.1,api-int.vlan614.rdu2.scalelab.redhat.com,fd01::/48,fd02::/112,localhost

[root@f29-h06-000-r640 ~]# oc get network cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2021-06-02T19:45:33Z"
  generation: 2
  name: cluster
  resourceVersion: "4062"
  uid: 1227b6fd-29c4-45d6-ade7-d154260b3b4a
spec:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
status:
  clusterNetwork:
  - cidr: fd01::/48
    hostPrefix: 64
  clusterNetworkMTU: 1400
  networkType: OVNKubernetes
  serviceNetwork:
  - fd02::/112
[root@f29-h06-000-r640 ~]#

2. Install assisted-service.

3. Create the agentclusterinstall.

[root@f29-h06-000-r640 ~]# oc get agentclusterinstall sno-agent-cluster-install -o json|jq ".spec"
{
  "clusterDeploymentRef": {
    "name": "sno-cluster-deployment"
  },
  "imageSetRef": {
    "name": "4.8"
  },
  "networking": {
    "clusterNetwork": [
      {
        "cidr": "fd01::/48",
        "hostPrefix": 64
      }
    ],
    "machineNetwork": [
      {
        "cidr": "1000::/64"
      }
    ],
    "serviceNetwork": [
      "fd02::/112"
    ]
  },
  "provisionRequirements": {
    "controlPlaneAgents": 1
  },
  "sshPublicKey": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCzwAz3fnZcrca7mY/kVFpQGS2yI1uGd/+t3PMJn/C7Ppj1uIG32ufHkTq+SXh8Zg3xcy9v/Uome1mo3FP7PoGsWms5B9wzbooGhbA3rdph0/NxSzrHO3qcudcJsBM4GVJhcbFfbkzJVCPZQ94O/Y17oKjKuaBz69clPD29BlzKF4xCWzzbJW5Q8Y9tvWvDpCdVBM7VorpAn3MaA95xL6e15douWwwlhdI4dIOk/+8HcfgJnZGyOeLTnLVpjxQaFzTj3ScEud/5yd5wHcICrHH8Fbq419nN7VWjxbMNWUn182mcCCs0RXx2eyYq27yJvgkJS86n09SyLynX6ySqkFXN"
}

4. Create the clusterdeployment.

[root@f29-h06-000-r640 ~]# oc get cd single-node -o json|jq ".spec"
{
  "baseDomain": "qe.lab.redhat.com",
  "clusterInstallRef": {
    "group": "extensions.hive.openshift.io",
    "kind": "AgentClusterInstall",
    "name": "sno-agent-cluster-install",
    "version": "v1beta1"
  },
  "clusterName": "elvis",
  "controlPlaneConfig": {
    "servingCertificates": {}
  },
  "installed": false,
  "platform": {
    "agentBareMetal": {
      "agentSelector": {
        "matchLabels": {
          "bla": "aaa"
        }
      }
    }
  },
  "pullSecretRef": {
    "name": "pull-secret"
  }
}

5. Check the status of the agentclusterinstall.

oc get agentclusterinstall sno-agent-cluster-install -o json|jq ".status.conditions[0]"
{
  "lastProbeTime": "2021-06-04T14:13:50Z",
  "lastTransitionTime": "2021-06-04T14:13:50Z",
  "message": "The Spec could not be synced due to backend error: command oc adm release info -o template --template '{{.metadata.version}}' --insecure=false quay.io/openshift-release-dev/ocp-release:4.8.0-fc.7-x86_64 exited with non-zero exit code 1: \nerror: unable to read image quay.io/openshift-release-dev/ocp-release:4.8.0-fc.7-x86_64: Get \"https://quay.io/v2/\": dial tcp 3.213.173.170:443: connect: network is unreachable\n",
  "reason": "BackendError",
  "status": "False",
  "type": "SpecSynced"
}

Expected result:
Since the cluster uses a proxy, the release image was expected to be pulled successfully through it.
I have no problem running pods with images from quay on that cluster, without mirroring.
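A quick way to sanity-check the proxy path from a node (just a sketch; it assumes curl is available on the host and reuses the proxy address from the cluster Proxy object above):

curl -x http://[1000::beef]:3128 -sI https://quay.io/v2/

If the proxy is doing its job, this should come back with an HTTP response from quay.io rather than "network is unreachable".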
@sasha How do you install assisted service in step #2? It looks like the assisted service deployment does not use the proxy for some reason.
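One thing worth checking (a sketch; the deployment name assisted-service and the namespace assisted-installer are assumptions based on the operator defaults):

oc -n assisted-installer set env deploy/assisted-service --list | grep -i proxy

If that prints nothing, the service pods never received HTTP_PROXY/HTTPS_PROXY/NO_PROXY, and any direct call to quay.io would bypass the proxy.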
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: assisted-service-operator
  namespace: "assisted-installer"
spec:
  channel: "alpha"
  config:
    env:
    - name: IPV6_SUPPORT
      value: "true"
    - name: OPENSHIFT_VERSIONS
      value: '{"4.6":{"display_name":"4.6.16","release_version":"4.6.16","release_image":"quay.io/openshift-release-dev/ocp-release:4.6.16-x86_64","rhcos_image":"https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/4.6.8/rhcos-4.6.8-x86_64-live.x86_64.iso","rhcos_rootfs":"https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.6/4.6.8/rhcos-live-rootfs.x86_64.img","rhcos_version":"46.82.202012051820-0","support_level":"production"},"4.7":{"display_name":"4.7.9","release_version":"4.7.9","release_image":"quay.io/openshift-release-dev/ocp-release:4.7.9-x86_64","rhcos_image":"https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-4.7.7-x86_64-live.x86_64.iso","rhcos_rootfs":"https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-live-rootfs.x86_64.img","rhcos_version":"47.83.202103251640-0","support_level":"production","default":true},"4.8":{"display_name":"4.8.0-fc.7","release_version":"4.8.0-fc.7","release_image":"quay.io/openshift-release-dev/ocp-release:4.8.0-fc.7-x86_64","rhcos_image":"https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/4.8.0-fc.7/rhcos-4.8.0-fc.7-x86_64-live.x86_64.iso","rhcos_rootfs":"https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/4.8.0-fc.7/rhcos-live-rootfs.x86_64.img","rhcos_version":"48.84.202105281935-0","support_level":"beta"}}'
  installPlanApproval: Automatic
  name: assisted-service-operator
  source: assisted-installer-index
  sourceNamespace: openshift-marketplace
  startingCSV: assisted-service-operator.v0.0.4
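As a side note, OLM also allows forcing proxy settings onto the operator pod itself through the Subscription's spec.config.env. A minimal sketch (the proxy address and noProxy list are copied from the cluster Proxy object above):

spec:
  config:
    env:
    - name: HTTP_PROXY
      value: http://[1000::beef]:3128
    - name: HTTPS_PROXY
      value: http://[1000::beef]:3128
    - name: NO_PROXY
      value: 1000::/64,.vlan614.rdu2.scalelab.redhat.com,fd01::/48,fd02::/112

Note that this only affects the operator pod; it does not change the pods the operator creates.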
@sasha It's not a disconnected scenario, is it?
@alazar Do we support an IPv6 hub cluster in non-disconnected (connected?) environments?
We need to support IPv6, non-disconnected. However, the user will need to provide a registry mirror, since quay.io does not support IPv6.
@alazar but do we expect it to work with a proxy as in the bug's scenario? That is, a hub cluster talking to quay.io via an HTTP proxy.
If the hub cluster is configured with a proxy that can translate between IPv6 and IPv4, then it should work. Maybe the configuration here is wrong.
(In reply to vemporop from comment #3)
> @sasha It's not a disconnected scenario, is it?
> @alazar Do we support IPv6 hub cluster in non-disconnected (connected?) environments?

The cluster is disconnected.
This is the configuration we're using in our automation:

'proxy': {'httpProxy': 'http://[1001:db8::1]:3128',
          'httpsProxy': 'http://[1001:db8::1]:3128',
          'noProxy': '1001:db8::/120,2003:db8::/112,2002:db8::/53,.test-infra-cluster-assisted-installer.redhat.com'},

It does seem to be the same, so that's really weird. Test-infra runs this test on every PR with no issue whatsoever.
@sasha as it's a disconnected cluster, you should have set up image mirroring during the deployment of Assisted Installer. Trying to pull an image from quay.io doesn't make sense in this case.
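For example, a minimal ImageContentSourcePolicy redirecting the quay.io/ocpmetal images to a local mirror might look roughly like the sketch below (the mirror host mirror.example.com:5000 is a placeholder, and keep in mind that ICSP mirrors only apply to pulls by digest):

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: assisted-mirror
spec:
  repositoryDigestMirrors:
  - source: quay.io/ocpmetal
    mirrors:
    - mirror.example.com:5000/ocpmetal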
@sasha How did you manage to deploy the operator on the spoke cluster? I can't even reach a running assisted service when trying to use IPv6 with a proxy. It looks like opm has the same problem of not respecting the cluster-wide proxy.

# oc logs quay-io-ocpmetal-assisted-service-operator-bundle-latest -n assisted-installer
time="2021-06-15T13:42:04Z" level=info msg="adding to the registry" bundles="[quay.io/ocpmetal/assisted-service-operator-bundle:latest]"
time="2021-06-15T13:42:04Z" level=error msg="permissive mode disabled" bundles="[quay.io/ocpmetal/assisted-service-operator-bundle:latest]" error="[error resolving name : failed to do request: Head \"https://quay.io/v2/ocpmetal/assisted-service-operator-bundle/manifests/latest\": dial tcp 34.224.196.162:443: connect: network is unreachable, image \"quay.io/ocpmetal/assisted-service-operator-bundle:latest\": not found]"
Error: [error resolving name : failed to do request: Head "https://quay.io/v2/ocpmetal/assisted-service-operator-bundle/manifests/latest": dial tcp 34.224.196.162:443: connect: network is unreachable, image "quay.io/ocpmetal/assisted-service-operator-bundle:latest": not found]
Usage:
  opm registry add [flags]

Flags:
  -b, --bundle-images strings   comma separated list of links to bundle image
      --ca-file string          the root certificates to use when --container-tool=none; see docker/podman docs for certificate loading instructions
  -c, --container-tool string   tool to interact with container images (save, build, etc.). One of: [none, docker, podman] (default "none")
  -d, --database string         relative path to database file (default "bundles.db")
      --debug                   enable debug logging
  -h, --help                    help for add
      --mode string             graph update mode that defines how channel graphs are updated. One of: [replaces, semver, semver-skippatch] (default "replaces")
      --permissive              allow registry load errors

Global Flags:
      --skip-tls   skip TLS certificate verification for container image registries while pulling bundles or index
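If reaching quay.io from opm isn't an option, the usual workaround is to mirror the index image and point the CatalogSource at the mirror. A rough sketch, assuming the index image is quay.io/ocpmetal/assisted-service-index:latest (not confirmed in this bug) and the mirror registry host <FQDN>:5000 is a placeholder:

oc adm catalog mirror quay.io/ocpmetal/assisted-service-index:latest <FQDN>:5000
# then point the assisted-installer-index CatalogSource at the mirrored index image
# and apply the imageContentSourcePolicy.yaml manifest the command generates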
In the hub cluster, I appended the following entries to the spec of the assisted-service-operator.v0.0.4 CSV:

env:
- name: IPV6_SUPPORT
  value: "true"
- name: SERVICE_IMAGE
  value: <FQDN>:5000/ocpmetal/assisted-service:latest
- name: DATABASE_IMAGE
  value: <FQDN>:5000/ocpmetal/postgresql-12-centos7:latest
- name: AGENT_IMAGE
  value: <FQDN>:5000/ocpmetal/assisted-installer-agent:latest
- name: CONTROLLER_IMAGE
  value: <FQDN>:5000/ocpmetal/assisted-installer-controller:latest
- name: INSTALLER_IMAGE
  value: <FQDN>:5000/ocpmetal/assisted-installer:latest
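To confirm the override actually landed in the rendered operator deployment, something like this can be used (a sketch; the jsonpath assumes the operator deployment is the first entry in the CSV's install strategy):

oc -n assisted-installer get csv assisted-service-operator.v0.0.4 \
  -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].env}'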
Ok, let me sum up.

1. You have a cluster running over IPv6.
2. An IPv6 cluster cannot directly access quay.io (or any website, for that matter, because in our lab we don't have IPv6 routing outside).
3. Therefore, you installed the assisted operator by using a mirror registry.
4. You also have a cluster-wide proxy configured on the cluster.
5. When using the operator to install clusters (spokes), you wanted it to use the proxy instead of the mirror registry.
6. Assisted service (run by the operator) failed to access quay.io, which it needs in order to extract some info from the OCP release image.
7. Apparently, the service was not using the proxy.

According to my investigation, this particular bug may be caused by the fact that we don't copy the proxy definitions from the operator to the service pods. The operator gets these definitions automatically from OLM, but does not propagate them. See https://docs.openshift.com/container-platform/4.7/operators/admin/olm-configuring-proxy-support.html:

> Operators must handle setting environment variables for proxy settings in the pods for any managed Operands.

There also seems to be a broader problem of components not using the cluster-wide proxy configuration, which you won't encounter when using a mirror registry.
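For illustration, this is roughly the env block the operator would need to render into the assisted-service Deployment it manages, copying the values it itself receives from OLM (a sketch only; the container name and exact variable casing are assumptions, and the values are taken from the cluster Proxy object in the report):

spec:
  template:
    spec:
      containers:
      - name: assisted-service   # container name assumed
        env:
        - name: HTTP_PROXY
          value: http://[1000::beef]:3128
        - name: HTTPS_PROXY
          value: http://[1000::beef]:3128
        - name: NO_PROXY
          value: .cluster.local,.svc,.vlan614.rdu2.scalelab.redhat.com,1000::/64,127.0.0.1,api-int.vlan614.rdu2.scalelab.redhat.com,fd01::/48,fd02::/112,localhost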
Related bug https://bugzilla.redhat.com/show_bug.cgi?id=1972417
Submitted a fix. I was able to install the operator and apply CRs on an IPv6-only cluster using a proxy - without a mirror registry. I'm not sure why operator installation on IPv6 without a mirror failed before that, but it's OK now and I could test the fix.
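For anyone verifying, the same check as in the original report can be used; on a fixed build the SpecSynced condition should report status True instead of BackendError:

oc get agentclusterinstall sno-agent-cluster-install -o json | jq '.status.conditions[] | select(.type=="SpecSynced")'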
@alazar Don't we want this one in 4.8 also?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759