Bug 2097871 - irqbalance crashes with error "double free or corruption (!prev)"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: irqbalance
Version: CentOS Stream
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: ltao
QA Contact: Jiri Dluhos
URL:
Whiteboard:
Depends On: 2098635
Blocks:
Reported: 2022-06-16 19:12 UTC by Andrew Schorr
Modified: 2022-11-15 13:10 UTC (History)

Fixed In Version: irqbalance-1.9.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-15 11:18:28 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-125526 0 None None None 2022-06-16 19:20:43 UTC
Red Hat Product Errata RHBA-2022:8328 0 None None None 2022-11-15 11:18:43 UTC

Description Andrew Schorr 2022-06-16 19:12:22 UTC
Description of problem:
The irqbalance service is crashing with an error message
saying "double free or corruption (!prev)"

Version-Release number of selected component (if applicable):
irqbalance-1.8.0-5.el9.x86_64


How reproducible:
It's fairly consistent at the moment.

Steps to Reproduce:
1. systemctl restart irqbalance
2.
3.

Actual results:
It crashes with the error "double free or corruption (!prev)" and dumps core.

Expected results:
It should not crash.

Additional info:
I am using a policyscript. So in /etc/sysconfig/irqbalance, I have:
IRQBALANCE_ARGS=--policyscript=/usr/local/etc/irqbalance_policyscript.sh
The script outputs "ban=true" for network interrupts where I'd like to
manage the cpu affinity manually.
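
For context, irqbalance runs the policy script once for each discovered IRQ and passes the sysfs device path and the IRQ number as arguments; the script answers with key=value pairs such as ban=true on standard output. A minimal sketch of a script of this kind follows. It is not the script from this report: the function name policy_for_irq is hypothetical, and treating a device as a network device when its sysfs directory contains a net/ subdirectory is an assumption made for illustration.

```sh
#!/bin/sh
# Hypothetical policy script sketch; not the script used in this report.
# irqbalance invokes the policy script once per IRQ with:
#   $1 = sysfs device path, $2 = IRQ number

# Print "ban=true" when the device directory contains a net/ subdirectory
# (assumption: that marks a network device). Print nothing otherwise, so
# irqbalance keeps balancing the IRQ as usual.
policy_for_irq() {
    devpath=${1:-}
    if [ -n "$devpath" ] && [ -d "$devpath/net" ]; then
        echo "ban=true"
    fi
}

policy_for_irq "$@"
```

A script like this would be installed at the path given to --policyscript and made executable.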

I don't know whether that's causing the problem. Here's the gdb backtrace from the core dump:

Reading symbols from /usr/sbin/irqbalance...
Reading symbols from /root/.cache/debuginfod_client/0b54d8d3261a382f9501ab1cd5528948ca591faf/debuginfo...
[New LWP 131033]
[New LWP 141488]

warning: Section `.reg-xstate/131033' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/irqbalance --foreground --policyscript=/usr/local/etc/irqbalance_poli'.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/131033' in core file too small.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7f42bceb3780 (LWP 131033))]
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f42bcfd25e3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f42bcf85d56 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f42bcf58833 in __GI_abort () at abort.c:79
#4  0x00007f42bcfc65b7 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f42bd0ea59a "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x00007f42bcfdc63c in malloc_printerr (str=str@entry=0x7f42bd0ed160 "double free or corruption (!prev)") at malloc.c:5536
#6  0x00007f42bcfde37c in _int_free (av=0x7f42bd128c80 <main_arena>, p=0x55a4d83b75c0, have_lock=<optimized out>) at malloc.c:4479
#7  0x00007f42bcfe09c5 in __GI___libc_free (mem=<optimized out>) at malloc.c:3279
#8  0x000055a4d7345f01 in for_each_irq (data=0x0, cb=0x55a4d7342b20 <remove_no_existing_irq>, list=<optimized out>)
    at /usr/src/debug/irqbalance-1.8.0-5.el9.x86_64/classify.c:798
#9  clear_no_existing_irqs () at /usr/src/debug/irqbalance-1.8.0-5.el9.x86_64/classify.c:893
#10 parse_proc_interrupts () at /usr/src/debug/irqbalance-1.8.0-5.el9.x86_64/procinterrupts.c:358
#11 0x000055a4d734733b in scan (data=data@entry=0x0) at /usr/src/debug/irqbalance-1.8.0-5.el9.x86_64/irqbalance.c:316
#12 0x00007f42bd2755a1 in g_timeout_dispatch (source=0x55a4d83c5350, callback=0x55a4d73472c0 <scan>, user_data=0x0) at ../glib/gmain.c:4889
#13 0x00007f42bd274d4f in g_main_dispatch (context=0x55a4d833c230) at ../glib/gmain.c:3337
#14 g_main_context_dispatch (context=0x55a4d833c230) at ../glib/gmain.c:4055
#15 0x00007f42bd2c9608 in g_main_context_iterate.constprop.0 (context=0x55a4d833c230, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../glib/gmain.c:4131
#16 0x00007f42bd274463 in g_main_loop_run (loop=0x55a4d8339710) at ../glib/gmain.c:4329
#17 0x000055a4d7340f0c in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/irqbalance-1.8.0-5.el9.x86_64/irqbalance.c:706
(gdb)

Comment 1 Andrew Schorr 2022-06-16 19:54:34 UTC
From valgrind /usr/sbin/irqbalance --foreground $IRQBALANCE_ARGS:

==142014== Memcheck, a memory error detector
==142014== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==142014== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==142014== Command: /usr/sbin/irqbalance --foreground --policyscript=/usr/local/etc/irqbalance_policyscript.sh
==142014== 
==142014== Invalid read of size 4
==142014==    at 0x10B3A4: compare_ints (classify.c:256)
==142014==    by 0x48C5950: g_list_find_custom (glist.c:927)
==142014==    by 0x10B749: get_irq_info (classify.c:812)
==142014==    by 0x10FD79: parse_proc_interrupts (procinterrupts.c:302)
==142014==    by 0x11133A: scan (irqbalance.c:316)
==142014==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==142014==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==142014==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==142014==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==142014==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==142014==    by 0x10AF0B: main (irqbalance.c:706)
==142014==  Address 0x57fa2e0 is 0 bytes inside a block of size 592 free'd
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x10AED0: main (irqbalance.c:694)
==142014==  Block was alloc'd at
==142014==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==142014==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==142014==    by 0x10B99E: __add_banned_irq (classify.c:259)
==142014==    by 0x10F6BE: add_new_irq (classify.c:615)
==142014==    by 0x10F842: build_one_dev_entry (classify.c:654)
==142014==    by 0x10FA75: build_dev_irqs (classify.c:743)
==142014==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==142014==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==142014==    by 0x10ADD8: main (irqbalance.c:664)
==142014== 
==142014== Invalid read of size 4
==142014==    at 0x10CB2E: remove_no_existing_irq (classify.c:865)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x11133A: scan (irqbalance.c:316)
==142014==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==142014==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==142014==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==142014==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==142014==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==142014==    by 0x10AF0B: main (irqbalance.c:706)
==142014==  Address 0x57fa51c is 572 bytes inside a block of size 592 free'd
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x10AED0: main (irqbalance.c:694)
==142014==  Block was alloc'd at
==142014==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==142014==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==142014==    by 0x10B99E: __add_banned_irq (classify.c:259)
==142014==    by 0x10F6BE: add_new_irq (classify.c:615)
==142014==    by 0x10F842: build_one_dev_entry (classify.c:654)
==142014==    by 0x10FA75: build_dev_irqs (classify.c:743)
==142014==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==142014==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==142014==    by 0x10ADD8: main (irqbalance.c:664)
==142014== 
==142014== Invalid read of size 4
==142014==    at 0x10B3A6: compare_ints (classify.c:256)
==142014==    by 0x48C5950: g_list_find_custom (glist.c:927)
==142014==    by 0x10CB68: UnknownInlinedFun (classify.c:871)
==142014==    by 0x10CB68: remove_no_existing_irq (classify.c:861)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x11133A: scan (irqbalance.c:316)
==142014==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==142014==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==142014==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==142014==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==142014==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==142014==    by 0x10AF0B: main (irqbalance.c:706)
==142014==  Address 0x57fa2e0 is 0 bytes inside a block of size 592 free'd
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x10AED0: main (irqbalance.c:694)
==142014==  Block was alloc'd at
==142014==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==142014==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==142014==    by 0x10B99E: __add_banned_irq (classify.c:259)
==142014==    by 0x10F6BE: add_new_irq (classify.c:615)
==142014==    by 0x10F842: build_one_dev_entry (classify.c:654)
==142014==    by 0x10FA75: build_dev_irqs (classify.c:743)
==142014==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==142014==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==142014==    by 0x10ADD8: main (irqbalance.c:664)
==142014== 
==142014== Invalid read of size 4
==142014==    at 0x10B3A6: compare_ints (classify.c:256)
==142014==    by 0x48C5950: g_list_find_custom (glist.c:927)
==142014==    by 0x10CB95: UnknownInlinedFun (classify.c:875)
==142014==    by 0x10CB95: remove_no_existing_irq (classify.c:861)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x11133A: scan (irqbalance.c:316)
==142014==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==142014==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==142014==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==142014==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==142014==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==142014==    by 0x10AF0B: main (irqbalance.c:706)
==142014==  Address 0x57fa2e0 is 0 bytes inside a block of size 592 free'd
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x10AED0: main (irqbalance.c:694)
==142014==  Block was alloc'd at
==142014==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==142014==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==142014==    by 0x10B99E: __add_banned_irq (classify.c:259)
==142014==    by 0x10F6BE: add_new_irq (classify.c:615)
==142014==    by 0x10F842: build_one_dev_entry (classify.c:654)
==142014==    by 0x10FA75: build_dev_irqs (classify.c:743)
==142014==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==142014==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==142014==    by 0x10ADD8: main (irqbalance.c:664)
==142014== 
==142014== Invalid read of size 8
==142014==    at 0x10CBB1: UnknownInlinedFun (classify.c:879)
==142014==    by 0x10CBB1: remove_no_existing_irq (classify.c:861)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x11133A: scan (irqbalance.c:316)
==142014==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==142014==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==142014==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==142014==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==142014==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==142014==    by 0x10AF0B: main (irqbalance.c:706)
==142014==  Address 0x57fa520 is 576 bytes inside a block of size 592 free'd
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x10AED0: main (irqbalance.c:694)
==142014==  Block was alloc'd at
==142014==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==142014==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==142014==    by 0x10B99E: __add_banned_irq (classify.c:259)
==142014==    by 0x10F6BE: add_new_irq (classify.c:615)
==142014==    by 0x10F842: build_one_dev_entry (classify.c:654)
==142014==    by 0x10FA75: build_dev_irqs (classify.c:743)
==142014==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==142014==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==142014==    by 0x10ADD8: main (irqbalance.c:664)
==142014== 
==142014== Invalid free() / delete / delete[] / realloc()
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x11133A: scan (irqbalance.c:316)
==142014==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==142014==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==142014==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==142014==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==142014==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==142014==    by 0x10AF0B: main (irqbalance.c:706)
==142014==  Address 0x57fa2e0 is 0 bytes inside a block of size 592 free'd
==142014==    at 0x48470E4: free (vg_replace_malloc.c:872)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==142014==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==142014==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==142014==    by 0x10AED0: main (irqbalance.c:694)
==142014==  Block was alloc'd at
==142014==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==142014==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==142014==    by 0x10B99E: __add_banned_irq (classify.c:259)
==142014==    by 0x10F6BE: add_new_irq (classify.c:615)
==142014==    by 0x10F842: build_one_dev_entry (classify.c:654)
==142014==    by 0x10FA75: build_dev_irqs (classify.c:743)
==142014==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==142014==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==142014==    by 0x10ADD8: main (irqbalance.c:664)
==142014== 
==142014== 
==142014== HEAP SUMMARY:
==142014==     in use at exit: 21,769 bytes in 68 blocks
==142014==   total heap usage: 15,118 allocs, 15,386 frees, 18,402,934 bytes allocated
==142014== 
==142014== LEAK SUMMARY:
==142014==    definitely lost: 0 bytes in 0 blocks
==142014==    indirectly lost: 0 bytes in 0 blocks
==142014==      possibly lost: 304 bytes in 1 blocks
==142014==    still reachable: 21,465 bytes in 67 blocks
==142014==         suppressed: 0 bytes in 0 blocks
==142014== Rerun with --leak-check=full to see details of leaked memory
==142014== 
==142014== For lists of detected and suppressed errors, rerun with: -s
==142014== ERROR SUMMARY: 79368 errors from 7 contexts (suppressed: 0 from 0)

(I ctrl-c'ed it to get it to exit, as it did not crash when running under valgrind).

Comment 2 Andrew Schorr 2022-06-16 19:59:21 UTC
Without the --policyscript argument, I don't see any valgrind errors.
I created the stupidest of policy scripts:

sh-5.1$ cat /tmp/policy.sh
#!/bin/sh

echo ban=true
sh-5.1$ 

Then I ran:
valgrind /usr/sbin/irqbalance --foreground --policyscript=/tmp/policy.sh

And this is what I got:

==152519== Memcheck, a memory error detector
==152519== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==152519== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==152519== Command: /usr/sbin/irqbalance --foreground --policyscript=/tmp/policy.sh
==152519== 
==152519== Invalid read of size 4
==152519==    at 0x10B3A4: compare_ints (classify.c:256)
==152519==    by 0x48C5950: g_list_find_custom (glist.c:927)
==152519==    by 0x10B749: get_irq_info (classify.c:812)
==152519==    by 0x10FD79: parse_proc_interrupts (procinterrupts.c:302)
==152519==    by 0x11133A: scan (irqbalance.c:316)
==152519==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==152519==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==152519==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==152519==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==152519==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==152519==    by 0x10AF0B: main (irqbalance.c:706)
==152519==  Address 0x504b620 is 0 bytes inside a block of size 592 free'd
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x10AED0: main (irqbalance.c:694)
==152519==  Block was alloc'd at
==152519==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==152519==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==152519==    by 0x10B99E: __add_banned_irq (classify.c:259)
==152519==    by 0x10F6BE: add_new_irq (classify.c:615)
==152519==    by 0x10FA17: build_one_dev_entry (classify.c:682)
==152519==    by 0x10FA75: build_dev_irqs (classify.c:743)
==152519==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==152519==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==152519==    by 0x10ADD8: main (irqbalance.c:664)
==152519== 
==152519== Invalid read of size 4
==152519==    at 0x10CB2E: remove_no_existing_irq (classify.c:865)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x11133A: scan (irqbalance.c:316)
==152519==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==152519==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==152519==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==152519==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==152519==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==152519==    by 0x10AF0B: main (irqbalance.c:706)
==152519==  Address 0x504b85c is 572 bytes inside a block of size 592 free'd
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x10AED0: main (irqbalance.c:694)
==152519==  Block was alloc'd at
==152519==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==152519==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==152519==    by 0x10B99E: __add_banned_irq (classify.c:259)
==152519==    by 0x10F6BE: add_new_irq (classify.c:615)
==152519==    by 0x10FA17: build_one_dev_entry (classify.c:682)
==152519==    by 0x10FA75: build_dev_irqs (classify.c:743)
==152519==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==152519==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==152519==    by 0x10ADD8: main (irqbalance.c:664)
==152519== 
==152519== Invalid read of size 8
==152519==    at 0x10CBB1: UnknownInlinedFun (classify.c:879)
==152519==    by 0x10CBB1: remove_no_existing_irq (classify.c:861)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x11133A: scan (irqbalance.c:316)
==152519==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==152519==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==152519==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==152519==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==152519==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==152519==    by 0x10AF0B: main (irqbalance.c:706)
==152519==  Address 0x504b860 is 576 bytes inside a block of size 592 free'd
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x10AED0: main (irqbalance.c:694)
==152519==  Block was alloc'd at
==152519==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==152519==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==152519==    by 0x10B99E: __add_banned_irq (classify.c:259)
==152519==    by 0x10F6BE: add_new_irq (classify.c:615)
==152519==    by 0x10FA17: build_one_dev_entry (classify.c:682)
==152519==    by 0x10FA75: build_dev_irqs (classify.c:743)
==152519==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==152519==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==152519==    by 0x10ADD8: main (irqbalance.c:664)
==152519== 
==152519== Invalid free() / delete / delete[] / realloc()
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x11133A: scan (irqbalance.c:316)
==152519==    by 0x48CB5A0: g_timeout_dispatch (gmain.c:4889)
==152519==    by 0x48CAD4E: UnknownInlinedFun (gmain.c:3337)
==152519==    by 0x48CAD4E: g_main_context_dispatch (gmain.c:4055)
==152519==    by 0x491F607: g_main_context_iterate.constprop.0 (gmain.c:4131)
==152519==    by 0x48CA462: g_main_loop_run (gmain.c:4329)
==152519==    by 0x10AF0B: main (irqbalance.c:706)
==152519==  Address 0x504b620 is 0 bytes inside a block of size 592 free'd
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x10AED0: main (irqbalance.c:694)
==152519==  Block was alloc'd at
==152519==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==152519==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==152519==    by 0x10B99E: __add_banned_irq (classify.c:259)
==152519==    by 0x10F6BE: add_new_irq (classify.c:615)
==152519==    by 0x10FA17: build_one_dev_entry (classify.c:682)
==152519==    by 0x10FA75: build_dev_irqs (classify.c:743)
==152519==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==152519==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==152519==    by 0x10ADD8: main (irqbalance.c:664)
==152519== 
^C==152519== Invalid free() / delete / delete[] / realloc()
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10C73E: UnknownInlinedFun (classify.c:694)
==152519==    by 0x10C73E: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10C73E: free_irq_db (classify.c:702)
==152519==    by 0x10AF3F: UnknownInlinedFun (irqbalance.c:249)
==152519==    by 0x10AF3F: main (irqbalance.c:711)
==152519==  Address 0x504b620 is 0 bytes inside a block of size 592 free'd
==152519==    at 0x48470E4: free (vg_replace_malloc.c:872)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:798)
==152519==    by 0x10FF00: UnknownInlinedFun (classify.c:893)
==152519==    by 0x10FF00: parse_proc_interrupts (procinterrupts.c:358)
==152519==    by 0x10AED0: main (irqbalance.c:694)
==152519==  Block was alloc'd at
==152519==    at 0x4849464: calloc (vg_replace_malloc.c:1328)
==152519==    by 0x10B99E: UnknownInlinedFun (classify.c:269)
==152519==    by 0x10B99E: __add_banned_irq (classify.c:259)
==152519==    by 0x10F6BE: add_new_irq (classify.c:615)
==152519==    by 0x10FA17: build_one_dev_entry (classify.c:682)
==152519==    by 0x10FA75: build_dev_irqs (classify.c:743)
==152519==    by 0x10ADD8: UnknownInlinedFun (classify.c:783)
==152519==    by 0x10ADD8: UnknownInlinedFun (irqbalance.c:242)
==152519==    by 0x10ADD8: main (irqbalance.c:664)
==152519== 
==152519== 
==152519== HEAP SUMMARY:
==152519==     in use at exit: 21,769 bytes in 68 blocks
==152519==   total heap usage: 8,131 allocs, 8,644 frees, 11,912,097 bytes allocated
==152519== 
==152519== LEAK SUMMARY:
==152519==    definitely lost: 0 bytes in 0 blocks
==152519==    indirectly lost: 0 bytes in 0 blocks
==152519==      possibly lost: 304 bytes in 1 blocks
==152519==    still reachable: 21,465 bytes in 67 blocks
==152519==         suppressed: 0 bytes in 0 blocks
==152519== Rerun with --leak-check=full to see details of leaked memory
==152519== 
==152519== For lists of detected and suppressed errors, rerun with: -s
==152519== ERROR SUMMARY: 62969 errors from 5 contexts (suppressed: 0 from 0)

As above, I had to Ctrl-C to kill it, since it didn't seem to crash.

It seems clear that there's a bug related to policy scripts that emit ban=true.

Regards,
Andy

Comment 3 Andrew Schorr 2022-06-16 20:24:00 UTC
But that being said, I was unable to reproduce this in a qemu vm with 4 cpus and that same ban=true policy script, so maybe it depends on my hardware. I've got an AMD 7443P cpu.

Comment 4 ltao 2022-06-17 12:55:11 UTC
(In reply to Andrew Schorr from comment #3)
> But that being said, I was unable to reproduce this in a qemu vm with 4 cpus
> and that same ban=true policy script, so maybe it depends on my hardware.
> I've got an AMD 7443P cpu.

Hi Andrew,

Thanks for reporting the issue. However, I couldn't reproduce it on my machine with either irqbalance-1.8.0-5.el9.x86_64 or upstream irqbalance, using the command line and policy script you provided:

[root@amd-ethanol-01 tmp]# valgrind /usr/sbin/irqbalance --foreground --policyscript=/tmp/policy.sh
==31315== Memcheck, a memory error detector
==31315== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==31315== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==31315== Command: /usr/sbin/irqbalance --foreground --policyscript=/tmp/policy.sh
==31315== 
^C==31315== 
==31315== HEAP SUMMARY:
==31315==     in use at exit: 21,769 bytes in 68 blocks
==31315==   total heap usage: 6,366 allocs, 6,298 frees, 13,048,565 bytes allocated
==31315== 
==31315== LEAK SUMMARY:
==31315==    definitely lost: 0 bytes in 0 blocks
==31315==    indirectly lost: 0 bytes in 0 blocks
==31315==      possibly lost: 304 bytes in 1 blocks
==31315==    still reachable: 21,465 bytes in 67 blocks
==31315==         suppressed: 0 bytes in 0 blocks
==31315== Rerun with --leak-check=full to see details of leaked memory
==31315== 
==31315== For lists of detected and suppressed errors, rerun with: -s
==31315== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I don't know whether it is related to the CPU; the machine I used for testing has an AMD EPYC 7601 32-Core Processor.

Here are my suggestions:

1) Try the upstream irqbalance (https://github.com/Irqbalance/irqbalance.git) to see whether the problem is reproducible on your machine. If it is, you can open an issue there.
2) Since it has only happened on one machine, you can try other AMD physical machines, if you have any, to see whether it is reproducible there.

If you have further findings, we can continue our discussion here. 

Thanks,
Tao Liu

Comment 5 Andrew Schorr 2022-06-17 13:31:14 UTC
Thanks for doing some investigation. I did notice that after a system reboot, irqbalance started successfully
without crashing. I then stopped it and had to run it a few times under valgrind before I started getting
the error. So I think it's somehow dependent on the state of the system. When I get a chance, I'll try
to duplicate with upstream irqbalance.

As for one machine: at the moment, I have only one running CentOS Stream 9, and I was restarting
irqbalance repeatedly to test some changes to my policy script. That's why I discovered the issue. I have
no idea whether it would happen on other systems, but I do think it somehow depends on restarting
irqbalance.
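
Because the errors only show up on some runs, a small retry loop can automate the repeated valgrind runs described above. The sketch below is an assumption-laden illustration, not something that was run for this report: the helper name repeat_until_status, the 30-second run time, and the exit code 42 are all made up for the example. It relies on valgrind's --error-exitcode option and on timeout's --preserve-status option from GNU coreutils.

```sh
#!/bin/sh
# Sketch: rerun a command until it exits with a given status.
# $1 = exit status to wait for, $2 = maximum number of attempts,
# remaining arguments = the command to run.
# Prints the attempt number on which the status was seen and returns 0;
# returns 1 if the status never appeared.
repeat_until_status() {
    want=$1
    max=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq "$want" ]; then
            echo "$attempt"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example invocation (commented out; the values are illustrative):
# valgrind exits with 42 if it reported any errors, and timeout sends
# SIGINT after 30 seconds because irqbalance keeps running in the foreground.
# repeat_until_status 42 10 \
#     timeout -s INT --preserve-status 30 \
#     valgrind --error-exitcode=42 \
#     /usr/sbin/irqbalance --foreground --policyscript=/tmp/policy.sh
```

Checking for one specific exit status, rather than any non-zero status, avoids counting an ordinary interrupted run as a reproduction.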

Thanks,
Andy

Comment 6 Andrew Schorr 2022-06-19 04:00:38 UTC
Current git master works fine. I ran git bisect and found the patch that fixes the problem.
The bug is fixed by this patch:

commit 066499ad5231a8a8d37f08a3af5dd6c38431ce6f
Author: liuchao173 <55137861+liuchao173.github.com>
Date:   Fri May 7 20:48:32 2021 +0800

    remove no existing irq in banned_irqs
    
    when a banned irq doesn't exist, it won't be removed from banned_irqs

 classify.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
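
Searching for the commit that fixes a bug, rather than the one that introduced it, can be done with git bisect's custom terms. The following is a sketch of that approach and not a record of the exact commands used here; the helper name bisect_for_fix, the revision names in the example, and the check script are hypothetical.

```sh
#!/bin/sh
# Sketch: find the first commit in which a bug is fixed.
# $1 = revision known to be broken, $2 = revision known to be fixed,
# remaining arguments = a command that exits 0 while the bug is still
# present and 1 once it is gone. git bisect run maps exit status 0 to
# the "old" term (here: broken) and 1..127, except 125, to the "new"
# term (here: fixed).
bisect_for_fix() {
    broken_rev=$1
    fixed_rev=$2
    shift 2
    git bisect start --term-old=broken --term-new=fixed &&
    git bisect broken "$broken_rev" &&
    git bisect fixed "$fixed_rev" &&
    git bisect run "$@"
}

# Example invocation (commented out; the tag name and check script are
# assumptions): the script would build irqbalance and exit 0 if valgrind
# still reports errors.
# bisect_for_fix v1.8.0 master ./check-still-broken.sh
```

When the run finishes, git prints the first "fixed" commit; git bisect reset returns the work tree to where it started.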

Comment 7 ltao 2022-06-20 08:58:51 UTC
(In reply to Andrew Schorr from comment #6)
> Current git master works fine. I ran git bisect and found the patch that
> fixes the problem.
> The bug is fixed by this patch:
> 
> commit 066499ad5231a8a8d37f08a3af5dd6c38431ce6f
> Author: liuchao173 <55137861+liuchao173.github.com>
> Date:   Fri May 7 20:48:32 2021 +0800
> 
>     remove no existing irq in banned_irqs
>     
>     when a banned irq doesn't exist, it won't be removed from banned_irqs
> 
>  classify.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Hi Andrew,

Thanks for your great work! I created another bug for rebasing irqbalance to v1.9.0, which will have this patch integrated automatically. So this one can be fixed when the rebasing finishes.

Thanks,
Tao Liu

Comment 8 ltao 2022-07-01 05:30:58 UTC
This bz is fixed automatically by the rebase to v1.9.0, because that release already contains the patch mentioned in comment 6.

Comment 9 ltao 2022-07-19 10:46:21 UTC
Hi Jiri,

I think ITM is also needed for this bz to get the release+ flag...

Thanks,
Tao Liu

Comment 10 Jiri Dluhos 2022-07-28 17:48:33 UTC
After some valgrinding, I would say that this bug is indeed fixed.

Comment 13 errata-xmlrpc 2022-11-15 11:18:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (irqbalance bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8328

