Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2023238

Summary: [sig-devex][Feature:ImageEcosystem][python][Slow] hot deploy for openshift python image Django example should work with hot deploy
Product: OpenShift Container Platform
Reporter: Surender Yadav <suryadav>
Component: Samples
Assignee: David Peraza <dperaza>
Status: CLOSED ERRATA
QA Contact: Jitendar Singh <jitsingh>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.10
CC: aos-bugs, dmistry, gmontero, lakshmi.ravichandran1, pbhattac, spandura, stbenjam
Target Milestone: ---
Target Release: 4.10.0
Hardware: s390x
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment: job=periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-image-ecosystem-aws-arm64=all job=periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-image-ecosystem-remote-libvirt-s390x=all job=periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-image-ecosystem-remote-libvirt-ppc64le=all
Last Closed: 2022-03-10 16:27:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2024773

Comment 1 Gabe Montero 2021-11-16 13:35:12 UTC
So this seems to be specific to the multi arch / non x86 version of the python image and its "hot deploy" feature.  The rollout following the enablement of hot deploy at https://github.com/openshift/origin/blob/master/test/extended/image_ecosystem/s2i_python.go#L103 is failing.

Looking at https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.10-ocp-image-ecosystem-remote-libvirt-s390x/1460125433574985728:


Nov 15 06:51:01.001: INFO: Log for pod "django-psql-example-2-deploy"/"deployment"
---->
--> Scaling django-psql-example-1 down to zero
--> Scaling django-psql-example-2 to 1 before performing acceptance check
error: update acceptor rejected django-psql-example-2: pods for rc 'e2e-test-s2i-python-rsrdp/django-psql-example-2' took longer than 600 seconds to become available
<----end of log for "django-psql-example-2-deploy"/"deployment"

and the container in the pod associated with that deployment failed with only the following in the pod status:

    containerStatuses:
    - containerID: cri-o://b8556b1da26966ad0307d57635e29b0dfdbae7c240a3ca4da95921330e275b1d
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dddb0884de3a16f4066f6764ba636355b2a1ece7c323ec7b18c4fd87ca087c41
      imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dddb0884de3a16f4066f6764ba636355b2a1ece7c323ec7b18c4fd87ca087c41
      lastState: {}
      name: deployment
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: cri-o://b8556b1da26966ad0307d57635e29b0dfdbae7c240a3ca4da95921330e275b1d
          exitCode: 1
          finishedAt: "2021-11-15T06:50:49Z"
          reason: Error
          startedAt: "2021-11-15T06:40:46Z"


The test captures a lot of debug data on failure, but I cannot tell why that container failed.
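
One thing the captured data does allow is a quick comparison of the deployer container's runtime against the acceptance timeout quoted in its log. The sketch below is only an illustration: the log line and the two timestamps are copied from the output above, while the helper names (`parse_k8s_time`, `acceptance_timeout_seconds`) are hypothetical and not part of any OpenShift tooling.

```python
import re
from datetime import datetime

# Error line copied from the django-psql-example-2-deploy log above.
LOG_LINE = (
    "error: update acceptor rejected django-psql-example-2: pods for rc "
    "'e2e-test-s2i-python-rsrdp/django-psql-example-2' took longer than "
    "600 seconds to become available"
)


def parse_k8s_time(ts):
    """Parse a UTC timestamp in the form used by the pod status (hypothetical helper)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def acceptance_timeout_seconds(log_line):
    """Extract the timeout reported by the deployer, or None if the line does not match."""
    match = re.search(r"took longer than (\d+) seconds", log_line)
    return int(match.group(1)) if match else None


timeout = acceptance_timeout_seconds(LOG_LINE)

# startedAt / finishedAt copied from state.terminated in the pod status above.
started = parse_k8s_time("2021-11-15T06:40:46Z")
finished = parse_k8s_time("2021-11-15T06:50:49Z")
runtime = (finished - started).total_seconds()

print(f"deployer ran {runtime:.0f}s, acceptance timeout {timeout}s")
```

The runtime comes out a few seconds over the timeout, which is consistent with the deployer container exiting because the acceptance check timed out rather than crashing on its own. That is an inference from the timestamps only; it does not explain why the application pod never became available.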

Sending over to multi arch team.  If Yaakov or someone there can glean further information from the logs, great.  Otherwise, I suspect they will need to manually try the scenario performed at https://github.com/openshift/origin/blob/master/test/extended/image_ecosystem/s2i_python.go#L58-L106
on the same type of multi arch cluster used in that periodic, and see what has gone amiss with the python image.

Comment 2 Gabe Montero 2021-11-16 13:36:26 UTC
According to Stephen Benjamin, these periodics were passing, and then started failing last week around Nov 10 I believe.

Comment 3 Gabe Montero 2021-11-16 16:27:37 UTC
The last change to the image was 14 days ago with https://github.com/sclorg/s2i-python-container/pull/480

Assuming a few days for it to get pushed to registry.redhat.io, that could be a culprit.

If so, bet it fails on x86 as well.

Comment 5 Gabe Montero 2021-11-18 13:20:16 UTC
moving this to samples since it is platform agnostic

the new samples owner (i.e. not me) is engaging with the RHEL SCL team to get them to fix their regression,
and probably will need to skip this test in 4.10/4.9/4.8 if their regression cannot be fixed quickly

Comment 6 David Peraza 2021-11-18 14:43:40 UTC
OK, will skip the test for now until we figure out why this is failing with the python image.

Comment 8 Gabe Montero 2021-11-19 13:55:31 UTC
e2e passed / test skipped ... if need be we'll open a bz on rhel scl python if they take too long to fix

marking verified to facilitate cherrypick process

Comment 9 Lakshmi Ravichandran 2021-11-29 14:27:06 UTC
thank you for the fix.
but we observed the test to fail in older versions of OCP (4.6 - 4.9) in case of both P & Z. could you please have a look?
https://search.ci.openshift.org/?search=Django+example+should+work+with+hot+deploy&maxAge=48h&context=1&type=build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
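
For anyone who wants to repeat this search for a different test name or time window, the query string can be rebuilt from its parts. This is a minimal sketch using only the parameters visible in the link above; the function name `ci_search_url` is hypothetical, and nothing here is an official client for the search service.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://search.ci.openshift.org/"


def ci_search_url(search, max_age="48h"):
    """Build a search URL with the same parameters as the link above (hypothetical helper)."""
    params = {
        "search": search,
        "maxAge": max_age,
        "context": "1",
        "type": "build-log",
        "name": "",
        "excludeName": "",
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
    }
    # urlencode encodes spaces as '+', matching the form of the original link.
    return SEARCH_BASE + "?" + urlencode(params)


url = ci_search_url("Django example should work with hot deploy")
print(url)
```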

Comment 10 Gabe Montero 2021-11-29 15:03:32 UTC
(In reply to Lakshmi Ravichandran from comment #9)
> thank you for the fix.
> but we observed the test to fail in older versions of OCP (4.6 - 4.9) in case
> of both P & Z. could you please have a look?
> https://search.ci.openshift.org/
> ?search=Django+example+should+work+with+hot+deploy&maxAge=48h&context=1&type=
> build-log&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

yes we know

backporting is in progress for the earlier releases

per the bug dependencies noted above, next on the list is https://bugzilla.redhat.com/show_bug.cgi?id=2024773

Comment 13 errata-xmlrpc 2022-03-10 16:27:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056