Bug 2138897
Summary: | Significant performance degradation after step over during debugging | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Simeon Andreev <simeon.andreev> |
Component: | java-11-openjdk | Assignee: | Andrew John Hughes <ahughes> |
Status: | CLOSED MIGRATED | QA Contact: | OpenJDK QA <java-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 9.0 | CC: | adinn, loskutov, mmillson, neugens, sgehwolf |
Target Milestone: | rc | Keywords: | MigratedToJIRA |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-09-12 23:31:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Simeon Andreev
2022-10-31 15:43:25 UTC
From the serviceability-dev mailing list: > Hi guys, > > I looked into into this one more time and agree with Chris. > I believe JDK-8187289 is correct and it fixed NotifyFramePop bug (the > fix includes regtest which shows the problem). > In the scenario described after calling NotifyFramePop we have to remain > in interp_only mode until the frame is popped from the stack to properly > clear the request. > Described performance issue is exactly what is JDK-8229012 about. > I suppose it's possible to fix it in the debugger level by eliminating > NotifyFramePop call. > > regards, > Alex So I assume to fix this bug, https://bugs.openjdk.org/browse/JDK-8229012 must be fixed. > So I assume to fix this bug, https://bugs.openjdk.org/browse/JDK-8229012 must be fixed.
Probably. I have been following this upstream and am sure Chris knows what he is doing. I will update here as Chris makes further progress.
Issue JDK-8229012 is not currently progressing upstream, nor is the related issue JDK-6960970 whose fix Chris Plummer expected would also resolve the former bug. There is perhaps still some prospect for an independent resolution of this specific bug, as opposed to a more general one for JDK-6960970. Chris mentioned using a different mechanism than NotifyFramePop to track completion of the call. However, that would require non-trivial changes which i expect it would be hard to justify backporting to jdk17, especially if it involved adding compiler support and return barrier support as mentioned in JDK-6960970. I'm glad to hear that applying the event controller patch you propose above which discounts some frame pop event enabled settings is allowing your test case to complete without causing any follow-on problems. However, I don't see how this is guaranteed to be a correct solution for all cases. Do you have any reason to believe it is correct to other than the heuristic that it fixes this specific test case? (In reply to Andrew Dinn from comment #5) > I'm glad to hear that applying the event controller patch you propose above > which discounts some frame pop event enabled settings is allowing your test > case to complete without causing any follow-on problems. However, I don't > see how this is guaranteed to be a correct solution for all cases. Do you > have any reason to believe it is correct to other than the heuristic that it > fixes this specific test case? I assume you mean this change: diff --git a/src/hotspot/share/prims/jvmtiEventController.cpp b/src/hotspot/share/prims/jvmtiEventController.cpp index 38de6383cc..22d0f40c97 100644 --- a/src/hotspot/share/prims/jvmtiEventController.cpp +++ b/src/hotspot/share/prims/jvmtiEventController.cpp @@ -523,7 +523,7 @@ JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *state) { JvmtiEnvThreadStateIterator it(state); for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) { any_env_enabled |= recompute_env_thread_enabled(ets, state); - has_frame_pops |= ets->has_frame_pops(); + //has_frame_pops |= ets->has_frame_pops(); } } We have decided to live with whatever bugs are fixed in https://bugs.openjdk.org/browse/JDK-8187289 as Java 7 and Java 8 had them. The performance degradation from the bug fix is too big for our application. Hopefully a fix is done in the JDK so we don't have to patch out this change. (In reply to Andrew Dinn from comment #5) > I'm glad to hear that applying the event controller patch you propose above > which discounts some frame pop event enabled settings is allowing your test > case to complete without causing any follow-on problems. However, I don't > see how this is guaranteed to be a correct solution for all cases. Do you > have any reason to believe it is correct to other than the heuristic that it > fixes this specific test case? To be clear, we do NOT have any patch, we've just reverted fix from https://bugs.openjdk.org/browse/JDK-8187289 for our JDK build. Whatever was fixed there seem to be of lower impact to us as the fix/performance regression caused by the fix of JDK-8187289. So we still have no real solution for the performance degradation and still wait for a proper fix of https://bugs.openjdk.org/browse/JDK-8229012. (In reply to Andrey Loskutov from comment #7) > To be clear, we do NOT have any patch, we've just reverted fix from > https://bugs.openjdk.org/browse/JDK-8187289 for our JDK build. To be more precise, you cherry-picked one line from that patch to revert. That's not the same as reverting the whole patch and, indeed, is arguably much more risky since it creates a situation that is neither the status quo before the related bug was patched nor the subsequent status determined by the developer of the patch to be an improvement on that original status quo. I would not recommend taking such a step without being able to articulate with, at the least, a fair degree of clarity and certainty how this might i) limit the corrective measures the patch was intended to produce ii) reinstate the problem the patch was meant to fix and iii) risk the introduction of other problems. > Whatever was fixed there seem to be of lower impact to us as the > fix/performance regression caused by the fix of JDK-8187289. How do you know that? The patch explicitly enables propagation of pop frame events through to the logic which decides whether to enter interpreter mode. Disabling propagation might well fit this case. What cases might it not fit? How can you be sure this cannot backfire or, if it can, that the outcome will be of lower impact? > So we still have no real solution for the performance degradation and still > wait for a proper fix of https://bugs.openjdk.org/browse/JDK-8229012. I am afraid so. It does appear from the explanation provided by Chris Plummer in JDK-8229012 that root problem here, i.e. request for a pop frame notify in the initState() call, is almost always redundant and certainly is so in this case. That is very unfortunate. However, there is a case where it is not redundant (single stepping at a return) and I assume that is why it is still present. > To be more precise, you cherry-picked one line from that patch to revert. No, we have reverted the entire commit: Revert "8187289: NotifyFramePop request is not cleared if JVMTI_EVENT_FRAME_POP is disabled" This reverts commit dbb9eb5b9c3ddcbcfaae8ac4ca21760c9c99a15c. > How do you know that? The performance loss of entering interpreter only mode was deemed not acceptable by product owners of relevant areas in our product. Its great that a bug was fixed in the debugger. But the user cannot resort to debugging due to their code running for 10+ minutes instead of 1 minute. It was decided to live with debugging and maybe running into trouble, instead of not being able to debug. (In reply to Simeon Andreev from comment #10) > > To be more precise, you cherry-picked one line from that patch to revert. > > No, we have reverted the entire commit: > > Revert "8187289: NotifyFramePop request is not cleared if > JVMTI_EVENT_FRAME_POP is disabled" > > This reverts commit dbb9eb5b9c3ddcbcfaae8ac4ca21760c9c99a15c. My apologies for misunderstanding the comment in the initial report which only mentioned this one line. (In reply to Andrew Dinn from comment #5) > Issue JDK-8229012 is not currently progressing upstream, nor is the related > issue JDK-6960970 whose fix Chris Plummer expected would also resolve the > former bug. > > There is perhaps still some prospect for an independent resolution of this > specific bug, as opposed to a more general one for JDK-6960970. Chris > mentioned using a different mechanism than NotifyFramePop to track > completion of the call. You mean this comment? https://bugs.openjdk.org/browse/JDK-8229012?focusedCommentId=14474180&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14474180 > However, that would require non-trivial changes > which i expect it would be hard to justify backporting to jdk17, especially > if it involved adding compiler support and return barrier support as > mentioned in JDK-6960970. Let ask it differently: do we have a chance to get working patch for *any* future Java version, and what is the estimated time frame for that? We are kind of forced to update to new Java 17 version to get a crash fix for https://bugzilla.redhat.com/show_bug.cgi?id=2162365, but that would mean that we will not be able to use "official" RH build, and patch the JDK manually again to get the offending commit dbb9eb5b9c3ddcbcfaae8ac4ca21760c9c99a15c out of "our" JDK 17 build. Are there any chances to get *some* solution in a Java 17 (which is still latest LTS release as of today)? (In reply to Andrey Loskutov from comment #12) > Let ask it differently: do we have a chance to get working patch for *any* > future Java version, and what is the estimated time frame for that? > > We are kind of forced to update to new Java 17 version to get a crash fix > for https://bugzilla.redhat.com/show_bug.cgi?id=2162365, but that would mean > that we will not be able to use "official" RH build, and patch the JDK > manually again to get the offending commit > dbb9eb5b9c3ddcbcfaae8ac4ca21760c9c99a15c out of "our" JDK 17 build. > > Are there any chances to get *some* solution in a Java 17 (which is still > latest LTS release as of today)? I am afraid there is no clear-cut answer to that: 1) There is a high chance that one of the upstream Oracle devs responsible for this part of OpenJDK will provide a fix for this problem in the dev branch. Red Hat does not have staff who are actively developing this part of the code base so we are unlikely to come up with a patch without the co-operation of Oracle devs. We cannot specify a time frame for arrival of any patch as those same devs are very busy with many other tasks and this has not been ranked as a high priority issue (P3 -- lowest is P4). So, it might arrive in jdk21 but it might arrive much later. 2) If a patch were to be provided in jdk21 (or even the non-LTS releases jdk22-24) then it might well be possible to backport it to jdk17u. There would certainly be a much stronger case for a backport to 17u than to 11u or 8u as the case for stabilising a backport branch grows significantly as further LTS releases intervene between it and head. There is really no way to quantify the likelihood in the absence of a specific patch. Any decision would require an assessment of the benefit the patch might provide to one or more users traded off against the actual possibility or potential risk of it negatively affecting existing deployments. Clearly, if a patch causes known changes in behaviour other than just to fix the reported problem than that could well count against a backport (n.b. there have been rare cases where the broad usefulness of a backport has dominated). On the other hand, even where there is no evidence of any behavioural change a patch might be rejected because it involves non-trivial modifications to the code or the code design, with a too high consequent risk of accidentally injecting new defects. Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug. This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information. To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567 In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information. |