Bug 1951292

Summary: irqbalance: FTBFS in runoneshot.sh (probably under load)
Product: Red Hat Enterprise Linux 9
Reporter: Mohan Boddu <mboddu>
Component: irqbalance
Assignee: Nobody <nobody>
Status: CLOSED CURRENTRELEASE
QA Contact: Jiri Dluhos <jdluhos>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: CentOS Stream
CC: bstinson, carl, fweimer, jeder, jshortt, jwboyer, ruyang, rvr
Target Milestone: beta
Keywords: Reopened, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: irqbalance-1.8.0-2.el9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-12-07 21:55:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1951115, 1951392

Description Mohan Boddu 2021-04-19 21:24:36 UTC
irqbalance failed to build from source in Red Hat Enterprise Linux 9 CentOS Stream

https://kojihub.stream.rdu2.redhat.com//koji/taskinfo?taskID=247641


For details on the mass rebuild see:

Please fix irqbalance at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it.

Comment 1 Kairui Song 2021-04-26 07:40:38 UTC
Strangely, I can't reproduce this issue when I rebuild the package. I tried several times:

https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=252569
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=252575
https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=252581

Maybe it was just a server-side error? Can we simply rebuild it and close the bug?

Comment 2 Kairui Song 2021-04-27 19:10:02 UTC
I simply did a rebuild and it just worked. I guess it was a temporary server error; closing the bug.

Comment 3 Florian Weimer 2021-07-10 08:21:52 UTC
I see failures when the builder is under load. Just doing multiple scratch builds of the package in parallel seems to be enough to trigger it.

make[3]: Entering directory '/builddir/build/BUILD/irqbalance-1.7.0/tests'
FAIL: runoneshot.sh
============================================================================
Testsuite summary for irqbalance 1.7.0
============================================================================
# TOTAL: 1
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
============================================================================
make[3]: *** [Makefile:521: test-suite.log] Error 1
make[3]: Leaving directory '/builddir/build/BUILD/irqbalance-1.7.0/tests'

I saw failures only on x86_64 and i686, so it may be that the abstract socket fallback code breaks under parallel execution:

        /*
         * First try to create a file-based socket in tmpfs.  If that doesn't
         * succeed, fall back to an abstract socket (non file-based).
         */
        addr.sun_family = AF_UNIX;
        snprintf(socket_name, 64, "%s/%s%d.sock", SOCKET_TMPFS, SOCKET_PATH, getpid());
        strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path));
        if (bind(socket_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                log(TO_ALL, LOG_WARNING, "Daemon couldn't be bound to the file-based socket.\n");

                /* Try binding to abstract */
                memset(&addr, 0, sizeof(struct sockaddr_un));
                addr.sun_family = AF_UNIX;
                if (bind(socket_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                        log(TO_ALL, LOG_WARNING, "Daemon couldn't be bound to the abstract socket, bailing out.\n");
                        return 1;
                }
        }

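The abstract namespace is shared between chroots (it is scoped to the network namespace, not to the filesystem root), and after the memset() every instance that reaches the fallback binds the same all-zero abstract name.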
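The collision can be reproduced outside irqbalance. The sketch below assumes Linux (where abstract AF_UNIX sockets exist), and its helper names are hypothetical, not taken from irqbalance. It performs the fallback's bind twice in one process: because the address is zeroed and only sun_family is set, both sockets request the identical abstract name, so the second bind is expected to fail the way a second concurrent build's daemon would.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a new AF_UNIX stream socket the same way the quoted fallback
 * does: zero the whole address, set only sun_family, pass sizeof(addr).
 * On Linux, sun_path[0] == '\0' selects the abstract namespace, so the
 * resulting name is the remaining all-zero bytes of sun_path.
 * Returns the bound fd, or -errno on failure. */
static int bind_zeroed_abstract(void)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/* Simulate two daemons in the same network namespace (for example, two
 * builds in separate chroots) both reaching the fallback.
 * Returns 0 if the second bind succeeded, the errno it failed with
 * otherwise, or -1 if even the first bind failed. */
static int second_bind_errno(void)
{
    int first = bind_zeroed_abstract();
    int second, result;

    if (first < 0)
        return -1;
    second = bind_zeroed_abstract();
    if (second >= 0) {
        close(second);
        result = 0;
    } else {
        result = -second;
    }
    close(first);
    return result;
}
```

On Linux, second_bind_errno() is expected to return EADDRINUSE. Since the name is the same for every process, only one irqbalance per network namespace can hold it at a time, which fits failures that appear only when several builds run in parallel.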

Comment 5 Florian Weimer 2021-07-29 14:32:17 UTC
I still think this is a real package bug (not just a build issue), but the build failure is sufficiently rare that I do not think we need to track it for the upcoming mass rebuild.

Comment 7 Kairui Song 2021-08-02 06:50:22 UTC
As far as I know, the socket was introduced to support the irqbalance UI, and only the irqbalance UI uses it.
However, the UI component was never installed in RHEL / Fedora, and there is no SELinux rule allowing any other service to use that socket. I think we can simply disable the socket when the UI is not used.
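The proposal above amounts to not creating the socket at all unless the UI needs it. A minimal sketch, assuming a runtime ui_enabled flag and a caller-supplied directory (both hypothetical; the actual patch may gate this differently, for example at build time), could look like this. The file name pattern mirrors the one visible in the netstat output below (irqbalance<PID>.sock).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper, not irqbalance's actual code: create the UI
 * socket only when the UI is in use. Returns the listening fd, -1 when
 * the UI is disabled (no socket is created at all), or -2 on error. */
static int maybe_init_ui_socket(bool ui_enabled, const char *dir)
{
    struct sockaddr_un addr;
    int fd, n;

    if (!ui_enabled)
        return -1;  /* nothing is bound, so nothing can collide */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* File-based name including the PID; no abstract fallback. */
    n = snprintf(addr.sun_path, sizeof(addr.sun_path),
                 "%s/irqbalance%d.sock", dir, (int)getpid());
    if (n < 0 || (size_t)n >= sizeof(addr.sun_path))
        return -2;  /* path would be truncated */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -2;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -2;
    }
    return fd;
}
```

With the UI disabled, the daemon never touches either the filesystem or the abstract namespace, which removes the parallel-build collision entirely rather than working around it.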

Comment 8 Jiri Dluhos 2021-08-24 21:22:26 UTC
I checked that the irqbalance package builds from the source RPM without problems and that the patch is applied. From the netstat output, I conclude that the socket is indeed no longer opened:

Test with the older irqbalance (on a Fedora machine):

$ rpm -qi irqbalance | head -n 5
Name        : irqbalance
Epoch       : 2
Version     : 1.7.0
Release     : 5.fc34
Architecture: x86_64

$ sudo netstat -xpan | grep irqbalance
unix  2      [ ACC ]     STREAM     LISTENING     18948    1213/irqbalance      /run/irqbalance/irqbalance1213.sock
unix  2      [ ]         DGRAM                    18947    1213/irqbalance      
unix  3      [ ]         STREAM     CONNECTED     21887    1213/irqbalance      

With the patched irqbalance:

# rpm -qi irqbalance | head -n 5
Name        : irqbalance
Epoch       : 2
Version     : 1.8.0
Release     : 3.el9
Architecture: x86_64

# sudo netstat -xpan | grep irqbalance
unix  3      [ ]         STREAM     CONNECTED     29835    881/irqbalance       
unix  2      [ ]         DGRAM                    28957    881/irqbalance

Setting Verified:Tested.

Comment 11 Jiri Dluhos 2021-08-25 18:36:17 UTC
As the new irqbalance is already in the RHEL 9 repo (more precisely, irqbalance-1.8.0-3 or later), we can set VERIFIED. Thanks to everyone involved!

Comment 15 Red Hat Bugzilla 2023-09-15 01:05:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days