We've seen multiple issues with components either failing to work through a corporate proxy that terminates and re-signs TLS connections, or ignoring proxy settings and trying to access the internet directly. I suspect there just isn't a CI job that tests the whole installation / upgrade process for an installation that's typical of some corporate environments: that is, a cluster that's connected through a proxy that uses a custom CA root for traffic inspection.

Examples include:
- 4.2 installer not pulling images through such a proxy (case 02503426; abandoned in favor of 4.1 install)
- CVO not using proxy settings (https://bugzilla.redhat.com/show_bug.cgi?id=1766907)
- Image pull during deployment fails with MITM proxy (https://bugzilla.redhat.com/show_bug.cgi?id=1784201)
- NTP server settings in a restricted network for OpenShift 4 (https://bugzilla.redhat.com/show_bug.cgi?id=1767669)

A test case that installed and upgraded an isolated cluster, connected only via a proxy and using a custom CA root, should help to shake these issues out before release.
Setting target release to 4.4 to perform investigation on the active development branch (will be re-set/cloned where fixes & backports, if any, are required).
Xiaoli has explained that this set of test cases already exists among the tests QE runs. Do we have a compelling reason to duplicate it?
It seems the major gap is in upgrading from 4.1 -> 4.2 and adding proxy configuration.
My assumption is that if these issues *were* caught by existing test cases, they wouldn't have made it into the released product. Do you disagree?

To clarify, there are at least two gaps:
- Proxied operations must be tested on a disconnected cluster. For example, prior to 4.2.13 the cluster version operator just ignored proxy settings and accessed the internet directly. If upgrade-via-proxy was tested, there's no way it was tested from a disconnected cluster because it wouldn't have worked.
- Upgrade from 4.1 -> 4.2 and enable proxy settings.
  - First, this caused the network operator to crashloop. This is resolved in 4.2.13.
  - Second, the machine config operator isn't configuring the hosts with the bundle listed in the proxy's trustedCA setting, so kubelet isn't able to pull images through a proxy that uses a corporate certificate authority.
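
To illustrate what "honoring the proxy and the custom CA" means for a client, here is a minimal Python sketch. It is not how any OpenShift component is implemented; the function name, the proxy URL, and the CA bundle path are all hypothetical. It builds an HTTP(S) client that sends traffic only through an explicitly configured proxy and additionally trusts a corporate CA bundle, which is roughly the behavior the two gaps above are about.

```python
import ssl
import urllib.request


def build_proxied_opener(proxy_url, ca_bundle_path=None):
    """Return an opener that routes HTTP and HTTPS through proxy_url.

    proxy_url and ca_bundle_path are illustrative parameters, not
    settings read from any real cluster configuration.
    """
    # Start from the system trust store, then add the corporate CA
    # bundle (if given) so certificates re-signed by the proxy verify.
    context = ssl.create_default_context()
    if ca_bundle_path is not None:
        context.load_verify_locations(cafile=ca_bundle_path)

    # An explicit proxy mapping replaces the default handler that
    # would otherwise read proxy settings from the environment.
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)
```

On a disconnected cluster, a client built this way either reaches the outside world through the proxy or fails, whereas a client that ignores proxy settings fails outright. That difference is what a disconnected CI job would surface.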
Do we have an owner for this? The Test Infrastructure component is not a good fit for it.
Minus upgrade coverage, https://github.com/openshift/release/pull/5308 is attempting to provide the necessary CI coverage.

Eric, can you provide an update on the status of PR 5308? Aside from upgrades, can you verify your PR addresses these use cases? Does a Jira card exist for creating an upgrade test for a proxied environment?
Eric, we need information on this one.
Moving to 4.4z as this is certainly not blocking the release today or tomorrow.
> Can you provide an update on the status of PR 5308?

I had been focusing on Logging work to get that out before the feature freeze, so I haven't looked in a while. Last I saw, the AWS rehearsal job was still failing due to (I believe) gaps in other teams' tests, since this job removes direct egress access. I had opened a bz to track this with the oauth team, but I'm not sure when it will be addressed. I'll rerun the rehearsal job and see where things are. It looked like the non-AWS platform rehearsals failed due to different issues, as the proxy work is restricted to AWS only.

> Aside from upgrades, can you verify your PR addresses these use cases?
> ignoring proxy settings and trying to access the internet directly

https://github.com/openshift/release/pull/5308 covers this use case.

> Does a Jira card exist for creating an upgrade test for a proxied environment?

Not that I'm aware of. The only Jira card was to create the initial blackhole'd VPC job: https://issues.redhat.com/browse/DPTP-591

I feel that if there is going to be an upgrade test, it would fall on the testplatform team to build upon release/5308.
ewolinet, I created separate bugs to address proxy CI coverage for a) each supported provider, b) an upgrade job, and c) a day-2 config job.
Setting the target to 4.7.