Bug 1046594 - Libvirtd crashed when loading/unloading ixgbe(82599) module repeatedly
Summary: Libvirtd crashed when loading/unloading ixgbe(82599) module repeatedly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: netcf
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.0
Assignee: Laine Stump
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-26 09:07 UTC by Hu Jianwei
Modified: 2014-06-18 08:35 UTC
CC List: 17 users

Fixed In Version: netcf-0.2.3-6.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-13 11:13:40 UTC
Target Upstream Version:
Embargoed:


Attachments
all_thread_backtrace (9.63 KB, text/plain), 2013-12-26 09:07 UTC, Hu Jianwei
valgrind_libvirtd (5.44 KB, text/plain), 2013-12-26 09:08 UTC, Hu Jianwei
kernel message log (61.89 KB, text/x-log), 2013-12-26 09:08 UTC, Hu Jianwei
libvirtd log (600.18 KB, text/plain), 2013-12-26 09:09 UTC, Hu Jianwei

Description Hu Jianwei 2013-12-26 09:07:49 UTC
Created attachment 841790 [details]
all_thread_backtrace

Description of problem:
Libvirtd crashed when loading/unloading ixgbe(82599) module repeatedly

Version-Release number of selected component (if applicable):
libvirt-1.1.1-16.el7.x86_64
qemu-kvm-1.5.3-30.el7.x86_64
kernel-3.10.0-64.el7.x86_64

How reproducible:
90%

Steps to Reproduce:
1. Load and unload the ixgbe module repeatedly.
[root@ibm-x3850x5-05 ~]# for i in {1..10}; do echo $i; modprobe -r ixgbe; sleep 1; modprobe ixgbe max_vfs=63; sleep 1; done

2. Check the libvirtd status
[root@ibm-x3850x5-05 ~]# systemctl status libvirtd
libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: failed (Result: signal) since Thu 2013-12-26 01:15:51 EST; 1min 38s ago
  Process: 15267 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=killed, signal=ABRT)
 Main PID: 15267 (code=killed, signal=ABRT)

Dec 26 01:15:51 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com libvirtd[15267]: 7f0e34000000-7f0e34623000 rw-p 00000000 00:00 0
Dec 26 01:15:51 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com libvirtd[15267]: 7f0e34623000-7f0e38000000 ---p 00000000 00:00 0
Dec 26 01:15:51 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com libvirtd[15267]: 7f0e38000000-7f0e38036000 rw-p 00000000 00:00 0
Dec 26 01:15:51 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com systemd[1]: libvirtd.service: main process exited, code=killed, status=6/ABRT
Dec 26 01:15:51 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com systemd[1]: Unit libvirtd.service entered failed state.
Dec 26 01:16:44 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com dnsmasq[16463]: reading /etc/resolv.conf
Dec 26 01:16:44 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com dnsmasq[16463]: using nameserver 10.66.127.10#53
Dec 26 01:16:44 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com dnsmasq[16463]: using nameserver 10.66.78.111#53
Dec 26 01:16:44 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com dnsmasq[16463]: using nameserver 10.66.78.117#53
Dec 26 01:16:44 ibm-x3850x5-05.qe.lab.eng.nay.redhat.com dnsmasq[16463]: using local addresses only for unqualified names

Actual results:
As shown in the steps above: libvirtd is killed with SIGABRT. See the attachments for detailed logs.

Expected results:
Reloading the ixgbe module should not affect libvirtd's status.

Comment 1 Hu Jianwei 2013-12-26 09:08:18 UTC
Created attachment 841791 [details]
valgrind_libvirtd

Comment 2 Hu Jianwei 2013-12-26 09:08:57 UTC
Created attachment 841792 [details]
kernel message log

Comment 3 Hu Jianwei 2013-12-26 09:09:31 UTC
Created attachment 841793 [details]
libvirtd log

Comment 5 Laine Stump 2014-01-07 15:19:21 UTC
I was able to reproduce this on Fedora 20 (which has the same netcf version). The problem is the function aug_get_mac() - it doesn't initialize the local char *path to NULL, but then unconditionally frees it on error; the problem is that one error condition can be encountered prior to path getting set.

I'm changing the component to netcf and will be posting a patch upstream shortly.

Comment 6 Laine Stump 2014-01-07 15:22:09 UTC
Note that reproducing the bug will be quicker if you run this loop concurrently with the loop that is loading/unloading the network card driver (btw, on my system I had an 82576 card / igb driver, so you don't necessarily need an 82599):

  while true; do virsh iface-list; done

Comment 7 Laine Stump 2014-01-08 11:12:54 UTC
A patch has been posted to the upstream netcf mailing list:

https://lists.fedorahosted.org/pipermail/netcf-devel/2014-January/000853.html

I tested this patch by attaching gdb to the libvirtd process (so I would easily see when it crashed) and running these two shell scripts simultaneously in different windows:

---

  while true; do virsh iface-list; done
---

  i=1
  while true; do
    echo $i; i=$(expr $i + 1)
    modprobe -r igb
    usleep 1
    modprobe igb max_vfs=7
    usleep 1
  done

Without the patched netcf, libvirtd would crash within 30 seconds. With the patch, I ran the test for several minutes with no crashes. (The virsh iface-list will periodically fail due to interface config being in an inconsistent state through multiple calls to the virInterface API, but that is an unsolvable, and acceptable, problem.)

Comment 8 Laine Stump 2014-01-08 14:27:15 UTC
Pushed this upstream:

commit 8ed36d22fbc792474ca9c3b06c8a326b1fb5af08
Author: Laine Stump <laine>
Date:   Tue Jan 7 20:12:06 2014 +0200

    eliminate use of uninitialized data when getting mac address
    
    If the call to get_augeas() at the top of aug_get_mac() failed, we
    would goto error and FREE(path), which would not have been
    initialized. And if by some magic of fate we happened to get past
    that, we would return garbage for the return code, since r was also
    not initialized. This patch initializes both path and r to fix the
    crash documented in Bug 1046594.
    
    Although it doesn't directly impact the referenced bug, a quick audit
    of other functions in the same file showed that defnode() had the same
    problem with uninitialized "r". Beyond that, I also defensively
    initialized the pointer to mac address to NULL both in aug_get_mac()
    as well as two of its callers, to make future audits of the code
    easier, and to shut up both valgrind and whatever static analyzers
    might be run on the code.
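
For readers who want to see the pattern the commit message describes, here is a minimal sketch in C. It is not the netcf source: get_mac_sketch(), fake_get_augeas(), the "/files/" path prefix, and the placeholder MAC string are all hypothetical names and values invented for illustration. It only shows why initializing the pointer and the return code makes the shared "goto error" cleanup safe when the very first call fails.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for get_augeas(): returns NULL on failure. */
static const char *fake_get_augeas(int available)
{
    return available ? "augeas-handle" : NULL;
}

/*
 * Sketch of the corrected pattern (not the real aug_get_mac()).
 * Returns 0 and stores a heap-allocated string in *mac_out on success;
 * returns -1 and leaves *mac_out NULL on failure.
 */
int get_mac_sketch(int augeas_available, const char *ifname, char **mac_out)
{
    static const char placeholder[] = "00:11:22:33:44:55"; /* made-up value */
    char *path = NULL; /* initialized, so free(path) is a no-op if unset */
    char *mac = NULL;  /* defensively initialized as well */
    int r = -1;        /* initialized, so error paths return a defined value */
    size_t len;

    assert(ifname != NULL && mac_out != NULL);
    *mac_out = NULL;

    /* Earliest failure point: path has not been allocated yet. With an
     * uninitialized path, the free() below would act on garbage. */
    if (fake_get_augeas(augeas_available) == NULL)
        goto error;

    len = strlen("/files/") + strlen(ifname) + 1;
    path = malloc(len);
    if (path == NULL)
        goto error;
    snprintf(path, len, "/files/%s", ifname);

    mac = malloc(sizeof placeholder);
    if (mac == NULL)
        goto error;
    memcpy(mac, placeholder, sizeof placeholder);

    *mac_out = mac; /* ownership passes to the caller */
    mac = NULL;
    r = 0;

error:
    free(mac);
    free(path);
    return r;
}
```

Calling get_mac_sketch(0, "eth0", &mac) exercises the early-failure path from the bug: the function reaches the cleanup label before path is assigned, frees NULL harmlessly, and returns -1. On success the caller owns the returned string and must free() it.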

Comment 9 Laine Stump 2014-01-21 13:16:05 UTC
Hanns-Joachim Uhl: Why do you consider this BZ to block an update to the ixgbe driver? Did you encounter this crash while testing?

The problem is actually driver-agnostic (so it is just as likely to happen with the old ixgbe as the new ixgbe). It can be completely avoided when testing by simply not running virt-manager (which happens to frequently call netcf) during the testing of the new driver (or at least during parts that involve unloading/loading a netdev driver). Depending on your tests, that may be less than ideal, but likely not a large problem.

(That said, a netcf build is likely coming this week, but lack of a build probably shouldn't prevent testing of the ixgbe driver)

Comment 10 Laine Stump 2014-01-21 13:20:09 UTC
To test that the fix is working properly, you will need to restart libvirtd.service after updating the netcf package (there is no method of automatically triggering that other than mandating offline updates). So:


   yum update netcf-*.x86_64.rpm
   systemctl restart libvirtd.service
   (now run the test scripts in Comment 7)

Comment 14 John Ronciak 2014-01-22 22:05:32 UTC
So how is this then blocking BZ "Bug 726818 - [Intel 7.0 FEAT] Update ixgbe driver to latest upstream."?  It should not be blocking it any more, correct?

Comment 15 Hanns-Joachim Uhl 2014-01-23 09:06:54 UTC
(In reply to John Ronciak from comment #14)
> So how is this then blocking BZ "Bug 726818 - [Intel 7.0 FEAT] Update ixgbe
> driver to latest upstream."?  It should not be blocking it any more, correct?
.
... correct ...

Comment 16 Laine Stump 2014-01-23 10:21:21 UTC
Since Bug 965845 was also in the "blocks" list, and it is just a duplicate of Bug 726818, I've also removed it from the blocks list.

Comment 17 Jincheng Miao 2014-01-24 08:38:04 UTC
With the latest netcf-0.2.3-6.el7.x86_64, running the two scripts from Comment 7 produced no crash.
So I am changing the status to VERIFIED.

Comment 18 Ludek Smid 2014-06-13 11:13:40 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

