Bug 2011654
| Summary: | OC deploys causing error: "Invalid or corrupt jarfile" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vinya Nema <vnema> |
| Component: | Build | Assignee: | Adam Kaplan <adam.kaplan> |
| Status: | CLOSED DUPLICATE | QA Contact: | Jitendar Singh <jitsingh> |
| Severity: | medium | Priority: | unspecified |
| Version: | 4.6 | CC: | aos-bugs, dwalsh, gmontero, jokerman, pbhattac, spandura, tsweeney |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2021-10-27 13:00:19 UTC | Type: | Bug |
Description
Vinya Nema
2021-10-07 01:39:14 UTC
OK, it took a decent amount of digging in the customer case, but I was able to cobble together a few things:
1) This is a binary build situation:
spec:
  failedBuildsHistoryLimit: 5
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: prod-mk-proposalgenerator-messaging:latest
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    binary: {}
    type: Binary
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: openjdk18-openshift:fedprod
    type: Source
  successfulBuildsHistoryLimit: 5
So an upload of contents from the local file system, presumably including the jar file in question, seems probable,
especially given the name 'dev1-fedonedocgen-currentview-messaging.jar'.
I *doubt* that comes from the openjdk18-openshift imagestream, but it seems like it could come from the prod-mk-proposalgenerator-messaging imagestream.
2) So, a quick refresher on how binary builds work:
a) 'oc start-build' "transfers" the data to the API server, which then streams it to the build pod
b) exactly how this is done depends very much on the arguments supplied to 'oc start-build'
c) for example, 'oc start-build' varies the upload mechanism depending on whether --from-file, --from-dir, --from-archive, or --from-repo is supplied
d) the golang HTTP client and whatever git binary is installed on the local host are two of the possible mechanisms
3) I see mention of running oc start-build with trace in the customer case, but I see no evidence that the trace was provided; is that correct? If we don't have that trace, we really need it. And again, aside from the trace itself, we need the precise list of arguments supplied to the 'oc start-build' invocation.
4) I also need clarification on "cu is able to take the same jar files and use them at other places without any problems". Does that mean they can run 'oc start-build' on some hosts and everything works fine, but on other hosts the image produced by 'oc start-build' results in the corrupt jar message?
Or do they just mean they copy that jar file and use it successfully in some way other than running the prod-mk-proposalgenerator-messaging:latest imagestreamtag in a Pod?
If images produced by 'oc start-build' work from some local systems but not others, that is a clue that one of the dependencies could be off on the failing systems.
Or, if the Java version used when the jar file works differs from the Java version used when it does not, that could be a clue.
5) I did look at the must-gather initially provided, but saw no mention of the prod-mk-proposalgenerator-messaging build config in the controller manager or API server logs. That said, both are involved in the binary build process. In fact, 'oc start-build' transfers the local data to the API server, which in turn transfers it to the build pod, which then produces the output image. So there are plenty of opportunities for something unexpected to happen. In particular, we have seen the 'tar' command that OpenShift picks up from RHEL cause build issues, since the API server uses tar to stream data over the socket to the build pod. So, go ahead and collect a must-gather along with the 'oc start-build' information I asked for in 3.
6) Lastly, I see they mentioned old versions of the deployment. I take it that means deployments that use older versions of the imagestreamtag prod-mk-proposalgenerator-messaging:latest, based on the dc yaml they provided.
If so, have them use 'oc debug' to create test pods from the version of the imagestreamtag that works and from the version that fails. In each of those debug pods, run cksum and ls -la on the jar file in question, so we can compare the results with each other, as well as with the copy of the jar file currently being picked up by 'oc start-build' for the failing imagestreamtags.
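Once copies of the jar have been pulled out of the working and failing debug pods, the cksum / ls -la comparison from item 6 can be taken one step further. The hypothetical helper below prints each copy's size and digest and reports the first byte offset where the two differ. It is a sketch, not an existing tool. It swaps SHA-256 in for cksum, so its digests will not match cksum output, and the script name and file paths are placeholders.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large jars need not fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a == b.

    If one input is a strict prefix of the other, the offset returned is
    the length of the shorter input (where the extra bytes begin).
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_jars(good: Path, bad: Path) -> Optional[int]:
    """Print size and SHA-256 for both copies, then report the first mismatch."""
    for label, path in (("good", good), ("bad", bad)):
        print(f"{label}: {path} size={path.stat().st_size} sha256={sha256_file(path)}")
    offset = first_difference(good.read_bytes(), bad.read_bytes())
    print("identical" if offset is None else f"first differing byte at offset {offset}")
    return offset


if __name__ == "__main__":
    # Hypothetical usage: python compare_jars.py good.jar bad.jar
    if len(sys.argv) == 3:
        compare_jars(Path(sys.argv[1]), Path(sys.argv[2]))
    else:
        print("usage: compare_jars.py GOOD_JAR BAD_JAR", file=sys.stderr)
```

Two equal-size files that differ at a single offset would fit the byte-level corruption described for bug 1952929 later in this thread. Files of different sizes would suggest that a different file is being picked up altogether.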
OK, lots of data to capture. Good luck.
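Item 5 points at the tar streaming step between the API server and the build pod. One rough way to reason about that step in isolation is to pack the jar into a tar stream, unpack it, and compare digests. The sketch below uses Python's tarfile module, not the host `tar` binary or the code path `oc` and the build pod actually use. It therefore only illustrates the shape of such a check and cannot reproduce a mismatch between two different tar implementations. The function names and the stand-in bytes are hypothetical.

```python
import hashlib
import io
import tarfile


def tar_round_trip(name: str, payload: bytes) -> bytes:
    """Pack one file into an in-memory tar stream, then unpack it again.

    Uses Python's tarfile module purely for illustration; it is not the
    host `tar` binary, and not the code path `oc` or the build pod uses.
    """
    stream = io.BytesIO()
    with tarfile.open(fileobj=stream, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    stream.seek(0)  # the "receiver" reads the stream from the beginning
    with tarfile.open(fileobj=stream, mode="r") as tar:
        member = tar.extractfile(name)
        if member is None:
            raise ValueError(f"{name} is not a regular file in the archive")
        return member.read()


def survives_round_trip(name: str, payload: bytes) -> bool:
    """True when the unpacked bytes hash identically to the original bytes."""
    before = hashlib.sha256(payload).hexdigest()
    after = hashlib.sha256(tar_round_trip(name, payload)).hexdigest()
    return before == after


if __name__ == "__main__":
    # Stand-in bytes; a real check would read the jar, e.g. Path("app.jar").read_bytes().
    fake_jar = b"PK\x03\x04" + bytes(range(256)) * 64
    print(survives_round_trip("app.jar", fake_jar))  # True
```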
Is this a potential duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1952929 ? We believed the root cause there was a RHEL 7 (or equivalent) client system using binary builds to stream content to OCP 4.6+, which is based on RHEL 8. However, we didn't explore the issue further since the affected image had reached the end of its support lifecycle.
Conceivably. But assuming we are not dealing with EOL versions this time, are there any suggestions on what from https://bugzilla.redhat.com/show_bug.cgi?id=1952929 we can use to move things along here?
Closing this as a duplicate of 1952929. In that issue, an `oc` client on a RHEL 7-based system occasionally corrupts a byte of information when streaming contents to an OpenShift cluster running v4.6 or higher. The root cause is that the `oc` client uses the host system's `tar` utility to stream contents to OpenShift, while OpenShift 4.6 and higher uses a newer version of tar to unpack the streamed contents. Given that RHEL 7 is currently in the "Maintenance Support 2" phase of its lifecycle and this bug does not have "Urgent" priority, this issue will likely not be addressed in a future RHEL 7 release and may not be fixed in RHEL 8. Using `oc start-build --from-dir` instead of `--from-file` appears to work around this issue.
*** This bug has been marked as a duplicate of bug 1952929 ***
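The duplicate bug describes a single corrupted byte in the streamed content. As a rough illustration of why one byte is enough to damage a jar, the sketch below builds a tiny uncompressed jar-like zip in memory, inverts one byte of an entry's data, and lets Python's zipfile CRC-32 check flag the entry. The entry names and class bytes are made up. Depending on where the flipped byte lands in a real jar, the JVM may surface it as "Invalid or corrupt jarfile" or as a different error later. The sketch only shows that a zip integrity check detects the change.

```python
import io
import zipfile
from typing import Optional


def build_jar(class_bytes: bytes) -> bytes:
    """Build a tiny jar-like zip with entries stored uncompressed (ZIP_STORED)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        zf.writestr("com/example/App.class", class_bytes)
    return buf.getvalue()


def flip_one_byte(archive: bytes, class_bytes: bytes) -> bytes:
    """Invert the first byte of the stored class data, leaving headers intact."""
    start = archive.find(class_bytes)
    if start < 0:
        raise ValueError("class data not found in archive")
    corrupted = bytearray(archive)
    corrupted[start] ^= 0xFF
    return bytes(corrupted)


def first_bad_entry(archive: bytes) -> Optional[str]:
    """Name of the first entry failing its CRC-32 check, or None if all pass."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.testzip()


if __name__ == "__main__":
    class_bytes = b"\xca\xfe\xba\xbe" + bytes(range(64))  # hypothetical class body
    jar = build_jar(class_bytes)
    print(first_bad_entry(jar))                              # None
    print(first_bad_entry(flip_one_byte(jar, class_bytes)))  # com/example/App.class
```

Entries are stored uncompressed so that the flipped byte surfaces as a CRC mismatch rather than a decompression error, which keeps the demonstration focused on the integrity check.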