It looks like the TLB flush couldn't be delivered; this looks quite unrelated to khugepaged. Could you verify that the NMI watchdog is enabled by checking that the counts shown by `grep NMI /proc/interrupts` are increasing?
If something is keeping IRQs disabled, I guess similar issues would happen if the system was swapping and an IPI had to be delivered for other reasons. Did you try a swapping workload, and does it hang or not?
Can we check the source of ghgfs for IPI delivery or for paths that keep IRQs disabled?
The only bug that could lead to high khugepaged utilization was a compaction bug that was fixed in kernel-2.6.32-169.el6, but I don't see compaction in the above stack traces. It may still be worth trying a more recent RHEL 6.2 kernel just in case it's related to that, but it doesn't look like it.
The output of 'grep NMI /proc/interrupts' on r03n32:
NMI: 2816 2812 2811 2811 2811 2811 2810 2810 2811 2812 2810 2810 2811 2811 2810 2810 Non-maskable interrupts
and climbing when there is load on the compute node.
There is no swap space defined on r03n32.
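For anyone repeating the NMI check requested above, here is a minimal sketch in Python that samples the `NMI:` line of `/proc/interrupts` twice and reports, per CPU, whether the counter grew. The function names and the 10-second interval are illustrative assumptions, not part of any existing tool. Since the counters on r03n32 were seen climbing under load, the samples are best taken while the node is busy.

```python
import time


def parse_nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description text
                counts.append(int(field))
            return counts
    raise ValueError("no NMI line found")


def nmi_increasing(before, after):
    """For each CPU, True if its NMI counter grew between the two samples."""
    return [b > a for a, b in zip(before, after)]


def read_interrupts(path="/proc/interrupts"):
    """Read the current interrupt counters (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    interval_seconds = 10  # assumed sampling interval, adjust as needed
    first = parse_nmi_counts(read_interrupts())
    time.sleep(interval_seconds)
    second = parse_nmi_counts(read_interrupts())
    for cpu, grew in enumerate(nmi_increasing(first, second)):
        print(f"CPU{cpu}: {'increasing' if grew else 'NOT increasing'}")
```

A CPU reported as not increasing is only a hint, not proof that the watchdog is off; the counter may simply not have advanced during the sampling window.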
Comment 5 RHEL Program Management
2012-05-03 05:28:19 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.
Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Jes,
the problem is still there, although it does not occur every day.
We have not yet upgraded to another kernel version. Which version would you recommend?
Best regards
Dieter
I think this bugzilla can be closed. I can't provide the required information, as I simply don't have it.
I would close it myself, but the system so far does not allow me to do so.