Bug 1414955 - Virsh hangs when running cyclictests for 24 hours
Summary: Virsh hangs when running cyclictests for 24 hours
Keywords:
Status: CLOSED DUPLICATE of bug 1403265
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Clark Williams
QA Contact: Jiri Kastner
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-19 19:48 UTC by Greeshma Gopinath
Modified: 2019-05-16 18:22 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-27 22:10:37 UTC
Target Upstream Version:


Attachments
dmesg log file (11.77 KB, text/plain)
2017-01-19 19:48 UTC, Greeshma Gopinath

Description Greeshma Gopinath 2017-01-19 19:48:46 UTC
Created attachment 1242573
dmesg log file

Description of problem:

Virsh hangs when running cyclictest on a multi-vCPU machine.
The test was run for 24 hours.

Other issues found:
A number of processes, such as tuned, were stopped.


Version-Release number of selected component (if applicable):
kernel-rt-3.10.0-542.rt56.449.el7.x86_64


How reproducible:
If this is related to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1403265,
then fairly frequently.


Steps to Reproduce:
1. Create an RT environment
2. Create an RT guest
3. Run cyclictest

Actual results:
Virsh hangs




Additional info:
May be related to this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1403265

Comment 1 Greeshma Gopinath 2017-01-19 19:52:17 UTC
By multi-vCPU, I meant that the machine has a guest with several real-time CPUs.

Comment 4 Luiz Capitulino 2017-01-20 13:58:33 UTC
*Very initial* debugging seems to show that systemd executes the following code path in the kernel:

cgroup_rmdir()
  cgroup_destroy_locked()
    mem_cgroup_css_offline()
      mem_cgroup_reparent_charges()
        mem_cgroup_start_move()
          synchronize_rcu()
            wait_rcu_gp()
              wait_for_completion()

Then, two things happen:

1. It blocks for good in wait_for_completion()

2. As it took cgroups global lock before blocking, everyone acquiring that lock will block too. That's why we have a bunch of processes blocked (kworkers, libvirtd, systemd-journal etc)

I'll keep investigating...
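
The blocking pattern described in this comment can be illustrated with a small user-space sketch. This is only an analogy, not kernel code: `global_lock` stands in for the cgroups global lock, `grace_period_done` stands in for the completion that wait_for_completion() waits on, and the `simulate`, `holder`, and `waiter` names are hypothetical. Timeouts are used so the demonstration terminates; the kernel path above has none.

```python
import threading


def simulate(timeout=0.2):
    """Model a task that blocks on a completion while holding a global lock."""
    global_lock = threading.Lock()          # stand-in for the cgroups global lock
    grace_period_done = threading.Event()   # stand-in for the completion; never set
    holder_has_lock = threading.Event()     # lets the waiter start after the lock is taken
    result = {}

    def holder():
        # Analogue of the systemd path: take the lock, then wait for the completion.
        with global_lock:
            holder_has_lock.set()
            # Bounded wait so the demo ends; returns False because nobody sets the event.
            result["holder_completed"] = grace_period_done.wait(timeout * 3)

    def waiter():
        # Analogue of libvirtd, kworkers, etc.: anything needing the same lock stalls.
        holder_has_lock.wait()
        got = global_lock.acquire(timeout=timeout)
        result["waiter_got_lock"] = got
        if got:
            global_lock.release()

    threads = [threading.Thread(target=holder), threading.Thread(target=waiter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

In this sketch the waiter gives up while the holder is still waiting, which mirrors why unrelated processes appear hung once the first task is stuck with the lock held.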

Comment 5 Luiz Capitulino 2017-01-23 19:23:22 UTC
It seems that this issue happens as a result of bug 1403265 triggering first. In that case this BZ might be a duplicate.

I'll focus on getting bug 1403265 fixed first and will get back to this afterwards.

Comment 6 Luiz Capitulino 2017-01-27 22:10:37 UTC
I've confirmed that this issue only happens as a result of bug 1403265 triggering first. Closing as a dupe.

*** This bug has been marked as a duplicate of bug 1403265 ***

Comment 7 wbs9399 2019-05-16 12:25:57 UTC
I have no permission on 'bug 1403265'. Which version of redhat kernel fixed this issue? @Luiz Capitulino

Comment 8 Beth Uptagrafft 2019-05-16 14:30:52 UTC
(In reply to wbs9399 from comment #7)
> I have no permission on 'bug 1403265'. Which version of redhat kernel fixed
> this issue? @Luiz Capitulino

This was fixed a while back, in August 2017, in kernel-rt-3.10.0-693.rt56.617.el7. If you install the most recent kernel-rt release you will get this fix plus all the bug and security fixes released since then.
-Beth

Comment 9 Clark Williams 2019-05-16 18:22:32 UTC
https://access.redhat.com/errata/RHSA-2017:2077

kernel-rt-3.10.0-693.rt56.617.el7
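
As a rough way to check whether a running kernel-rt is at or beyond the fixed build named above, the release strings can be compared numerically. This is a minimal sketch, not RPM's own version comparison: it assumes both strings follow the `3.10.0-<build>.rt<N>.<M>.el7` pattern seen in this report, and the `release_key` and `has_fix` helpers are hypothetical names.

```python
import platform
import re

# Fixed build as stated in this bug (comments 8 and 9).
FIXED_IN = "3.10.0-693.rt56.617.el7"


def release_key(version):
    """Return the integer fields that precede the '.el' dist tag, in order."""
    base = version.split(".el")[0]  # drops '.el7' and any arch suffix such as '.x86_64'
    return tuple(int(n) for n in re.findall(r"\d+", base))


def has_fix(running, fixed_in=FIXED_IN):
    """True if 'running' compares at or above 'fixed_in' field by field."""
    return release_key(running) >= release_key(fixed_in)


if __name__ == "__main__":
    running = platform.release()  # same string that 'uname -r' reports
    print(running, "has fix:", has_fix(running))
```

For example, the kernel from the original report, `3.10.0-542.rt56.449.el7.x86_64`, compares below the fixed build. Results are only meaningful for kernel-rt builds in the same el7 stream.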

