2024773 – [sig-devex][Feature:ImageEcosystem][python][Slow] hot deploy for openshift python image Django example should work with hot deploy

Bug 2024773 - [sig-devex][Feature:ImageEcosystem][python][Slow] hot deploy for openshift python image Django example should work with hot deploy

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Samples
Sub Component:
Version:	4.10
Hardware:	s390x
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.9.z
Assignee:	David Peraza
QA Contact:	Jitendar Singh
Docs Contact:
URL:
Whiteboard:
Depends On:	2023238
Blocks:	2028815

Reported:	2021-11-18 23:18 UTC by OpenShift BugZilla Robot
Modified:	2021-12-13 12:06 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-12-13 12:06:24 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 26622	0	None	open	[release-4.9] Bug 2024773: Skipping Django Test until bug is fixed	2021-11-19 13:56:08 UTC
Red Hat Product Errata	RHBA-2021:5003	0	None	None	None	2021-12-13 12:06:38 UTC

Description OpenShift BugZilla Robot 2021-11-18 23:18:52 UTC

+++ This bug was initially created as a clone of Bug #2023238 +++

Description of problem: The image-ecosystem job is failing on the test "[sig-devex][Feature:ImageEcosystem][python][Slow] hot deploy for openshift python image  Django example should work with hot deploy" in all versions.


Version-Release number of selected component (if applicable): 4.10 to 4.6

This failure also shows up in CI for the image-ecosystem job on other platforms: https://search.ci.openshift.org/?search=%5C%5Bsig-devex%5C%5D%5C%5BFeature%3AImageEcosystem%5C%5D%5C%5Bpython%5C%5D%5C%5BSlow%5C%5D+hot+deploy+for+openshift+python+image++Django+example+should+work+with+hot+deploy&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
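
For anyone who wants to repeat or adjust that CI search, here is a minimal Python sketch that rebuilds the query string of the search URL above. It assumes the search field is treated as a regular expression in which only the square brackets of the test name need a backslash; that is inferred from the encoded URL itself, not from any documentation of the search service. The helper name build_search_url is made up for illustration.

```python
from urllib.parse import urlencode

# Test name exactly as reported, including the double space before "Django".
TEST_NAME = (
    "[sig-devex][Feature:ImageEcosystem][python][Slow] hot deploy for "
    "openshift python image  Django example should work with hot deploy"
)


def build_search_url(test_name, max_age="48h"):
    """Hypothetical helper: rebuild the search.ci.openshift.org query URL."""
    # Assumption: only "[" and "]" are backslash-escaped, as in the original URL.
    pattern = test_name.replace("[", r"\[").replace("]", r"\]")
    params = {
        "search": pattern,
        "maxAge": max_age,
        "context": "1",
        "type": "bug+junit",
        "name": "",
        "excludeName": "",
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
    }
    # urlencode percent-encodes the backslashes, brackets and colon,
    # and turns spaces into "+".
    return "https://search.ci.openshift.org/?" + urlencode(params)


url = build_search_url(TEST_NAME)
print(url)
```

Changing max_age widens or narrows the search window; the remaining parameters are copied unchanged from the URL in the report.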


How reproducible:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-image-ecosystem-remote-libvirt-s390x/1460125433574985728

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-image-ecosystem-remote-libvirt-s390x/1460049922429554688

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.8-ocp-image-ecosystem-remote-libvirt-s390x/1460004617323548672


Actual results:
Test Fails

Expected results:
Test should pass

--- Additional comment from gmontero on 2021-11-16 13:35:12 UTC ---

So this seems to be specific to the multi arch / non x86 version of the python image and its "hot deploy" feature.  The rollout following the enablement of hot deploy at https://github.com/openshift/origin/blob/master/test/extended/image_ecosystem/s2i_python.go#L103 is failing.

Looking at https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-image-ecosystem-remote-libvirt-s390x/1460125433574985728:


Nov 15 06:51:01.001: INFO: Log for pod "django-psql-example-2-deploy"/"deployment"
---->
--> Scaling django-psql-example-1 down to zero
--> Scaling django-psql-example-2 to 1 before performing acceptance check
error: update acceptor rejected django-psql-example-2: pods for rc 'e2e-test-s2i-python-rsrdp/django-psql-example-2' took longer than 600 seconds to become available
<----end of log for "django-psql-example-2-deploy"/"deployment"

and the container in the pod associated with that deployment failed with only the following in the pod status:

    containerStatuses:
    - containerID: cri-o://b8556b1da26966ad0307d57635e29b0dfdbae7c240a3ca4da95921330e275b1d
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dddb0884de3a16f4066f6764ba636355b2a1ece7c323ec7b18c4fd87ca087c41
      imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dddb0884de3a16f4066f6764ba636355b2a1ece7c323ec7b18c4fd87ca087c41
      lastState: {}
      name: deployment
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: cri-o://b8556b1da26966ad0307d57635e29b0dfdbae7c240a3ca4da95921330e275b1d
          exitCode: 1
          finishedAt: "2021-11-15T06:50:49Z"
          reason: Error
          startedAt: "2021-11-15T06:40:46Z"


The test captures a lot of debug data on failure, but I cannot tell why that container failed.

Sending over to multi arch team.  If Yaakov or someone there can glean further information from the logs, great.  Otherwise, I suspect they will need to manually try the scenario performed at https://github.com/openshift/origin/blob/master/test/extended/image_ecosystem/s2i_python.go#L58-L106
on the same type of multi arch cluster used in that periodic, and see what has gone amiss with the python image.

--- Additional comment from gmontero on 2021-11-16 13:36:26 UTC ---

According to Stephen Benjamin, these periodics were passing and then started failing last week, around Nov 10, I believe.

--- Additional comment from gmontero on 2021-11-16 16:27:37 UTC ---

The last change to the image was 14 days ago with https://github.com/sclorg/s2i-python-container/pull/480

Assuming a few days for it to get pushed to registry.redhat.io, that could be a culprit.

If so, bet it fails on x86 as well.

--- Additional comment from dmistry on 2021-11-16 22:25:25 UTC ---

Yes, it is failing on x86. It doesn't look to be arch specific.

https://prow.ci.openshift.org/job-history/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-samples-operator-release-4.9-e2e-aws-image-ecosystem

--- Additional comment from gmontero on 2021-11-18 13:20:16 UTC ---

moving this to samples since it is platform agnostic

the new samples owner (i.e. not me) is engaging with the RHEL SCL team to get them to fix their regression,
and will probably need to skip this test in 4.10/4.9/4.8 if their regression cannot be fixed quickly

--- Additional comment from dperaza on 2021-11-18 14:43:40 UTC ---

OK, will skip the test for now until we figure out why this is failing with the python image.

Comment 6 errata-xmlrpc 2021-12-13 12:06:24 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5003
