Bug 1135491 - <iothread> cpuset CPU binding support
Summary: <iothread> cpuset CPU binding support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: John Ferlan
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1101574
Blocks: 1161617
Reported: 2014-08-29 12:56 UTC by Stefan Hajnoczi
Modified: 2015-11-19 05:48 UTC (History)

Fixed In Version: libvirt-1.2.14-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 05:48:04 UTC
Target Upstream Version:



Links
System                  ID              Private  Priority  Status        Summary                                 Last Updated
Red Hat Product Errata  RHBA-2015:2202  0        normal    SHIPPED_LIVE  libvirt bug fix and enhancement update  2015-11-19 08:17:58 UTC

Description Stefan Hajnoczi 2014-08-29 12:56:56 UTC
Description of problem:

libvirt 1.2.8 adds the <iothread> tag for defining QEMU IOThreads which can perform device emulation.

It is desirable to bind IOThreads to host CPUs just like vcpu threads.  Please add a cpuset attribute to the <iothread> tag.

A new virsh command may also be necessary just like vcpupin/emulatorpin.  Perhaps the emulatorpin command can be extended to take an <iothread> ID so pinning is applied to a particular IOThread instead of the main QEMU thread.

Comment 2 Stefan Hajnoczi 2014-08-29 12:58:32 UTC
John, I have assigned it to you since you are currently handling the <iothread> implementation.  Feel free to assign to someone else if you will be unable to work on this.

Comment 3 John Ferlan 2014-08-29 18:42:39 UTC
Since 1.2.8 was the last release to allow API changes into RHEL7.1 - I'll use this BZ as the means/marker for the API changes into RHEL7.2. I changed the release flag too.

There are probably still some changes that can go under the other BZ, including adding the underpinnings necessary to get the thread IDs via QMP and attach-disk/update-device type changes to add an --iothread parameter.

Extending emulatorpin is not the right answer. Sure, it "allows" one to cheat short term, but once it is there it's tough to remove. I'm going to look into perhaps adding a --cpuset argument that would define the set of CPUs the IOThreads would use. Then in qemuProcessStart, add some code that will get the thread IDs and assign them after the qemu process has started... No guarantees, but it might suffice at least short term...


Longer term, the following is what I think will suffice - if other thoughts come up - then we can work those out...

A new 'virsh' command that will "manage" IOThreads.  This will use new libvirt APIs (and also have libvirt-python APIs).  They'll be named something like 'virDomainIothread{G|S}et()' - I think that's all I'd need.

The virsh command would be:

virsh domiothreads domain 

   [--list]
   [[--config]  [--live]  |  [--current]]
   [--pin thread_id]
   [--cpuset "string"]

where [--list] would be the default if nothing is provided and would list:

IOThread Name    Thread Id   CPU Id   Resource(s)
-------------    ---------   ------   -----------

Where "IOThread Name" and "Thread Id" come from
(QEMU) query-iothreads
{u'return': [{u'id': u'jaftest1', u'thread-id': 30992}, {u'id': u'jaftest2', u'thread-id': 30993}]}
(QEMU) 

"CPU Id" would be a get of which CPU the current Thread Id is running on

"Resource(s)" would be empty or "list" of resources (currently only disks) using the thread (possibly/hopefully).

[--config] would allow modifying the existing config to add/remove, but would not affect the running system

[--live] would add iothread objects (object-add in some manner)

[--current] would be the domain's current state (but exclusive of live/config - following other examples)

[--pin thread_id] would be a way to pin an iothread to a specific CPU set (or a single CPU). It's only valid for a live domain, of course.

[--cpuset "string"] to manage having IOThreads assigned to specific set at startup.  The [--pin] would conceivably override.
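
For illustration only, here is a minimal Python sketch of how a --cpuset "string" value could be expanded into individual host CPU numbers. The function name parse_cpuset is hypothetical, and the syntax handled (comma-separated numbers, N-M ranges, and ^N exclusions) is assumed to follow the cpuset notation used elsewhere in libvirt domain XML; this is not libvirt's own parser.

```python
def parse_cpuset(spec):
    """Expand a cpuset string such as "0-2,5" into a sorted list of CPU numbers.

    Hypothetical helper: handles single numbers, N-M ranges and ^N exclusions.
    """
    include, exclude = set(), set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        target = include
        if item.startswith("^"):
            # A leading caret removes the CPU(s) from the final set.
            target = exclude
            item = item[1:]
        if "-" in item:
            lo, hi = (int(part) for part in item.split("-", 1))
            if lo > hi:
                raise ValueError("descending range: %r" % item)
            target.update(range(lo, hi + 1))
        else:
            target.add(int(item))
    return sorted(include - exclude)


print(parse_cpuset("0-2"))     # [0, 1, 2]
print(parse_cpuset("0-3,^2"))  # [0, 1, 3]
```

Returning a sorted list keeps the result easy to compare against what taskset or /proc report for a thread.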


That's enough thinking for now!

Comment 4 Stefan Hajnoczi 2014-09-02 15:52:54 UTC
(In reply to John Ferlan from comment #3)
> The virsh command would be:
> 
> virsh domiothreads domain 
> 
>    [--list]
>    [[--config]  [--live]  |  [--current]]
>    [--pin thread_id]
>    [--cpuset "string"]
> 
> where [--list] would be the default if nothing is provided and would list:
> 
> IOThread Name    Thread Id   CPU Id   Resource(s)
> -------------    ---------   ------   -----------
> 
> Where "IOThread Name" and "Thread Id" come from
> (QEMU) query-iothreads
> {u'return': [{u'id': u'jaftest1', u'thread-id': 30992}, {u'id': u'jaftest2',
> u'thread-id': 30993}]}
> (QEMU) 
> 
> "CPU Id" would be a get of which CPU the current Thread Id is running on
> 
> "Resource(s)" would be empty or "list" of resources (currently only disks)
> using the thread (possibly/hopefully).
> 
> [--config] would allow modifying the existing config to add/remove, but
> would not affect the running system
> 
> [--live] would add iothread objects (object-add in some manner)
> 
> [--current] would be the domain's current state (but exclusive of
> live/config - following other examples)
> 
> [--pin thread_id] would be a way to pin an iothread to a specific CPU set
> (or a single CPU). It's only valid for a live domain, of course.
> 
> [--cpuset "string"] to manage having IOThreads assigned to specific set at
> startup.  The [--pin] would conceivably override.

Sounds good to me.

Comment 6 John Ferlan 2015-03-11 16:50:17 UTC
After a few review cycles and separate commits for the info vs. change functionality, the code has been pushed upstream.

git describe 1cfc0a9990866b423e1110997f9a06f1d6d869c9
v1.2.13-142-g1cfc0a9


NB: Displaying the "Resource(s)" column was rejected upstream since it's part of the XML.

commit 1cfc0a9990866b423e1110997f9a06f1d6d869c9
Author: John Ferlan <jferlan@redhat.com>
Date:   Thu Mar 5 19:08:04 2015 -0500

    virsh: Add iothreadpin command
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1135491
    
    $ virsh iothreadpin --help
    
      NAME
        iothreadpin - control domain IOThread affinity
    
      SYNOPSIS
        iothreadpin <domain> <iothread> <cpulist> [--config] [--live] [--current]
    
      DESCRIPTION
        Pin domain IOThreads to host physical CPUs.
    
      OPTIONS
        [--domain] <string>  domain name, id or uuid
        [--iothread] <number>  IOThread ID number
        [--cpulist] <string>  host cpu number(s) to set
        --config         affect next boot
        --live           affect running domain
        --current        affect current domain
    
    Using the output from iothreadsinfo, allow changing the pinned CPUs for
    a single IOThread.
    
    $ virsh iothreadsinfo $dom
     IOThread ID    CPU Affinity
    ---------------------------------------------------
     1               2
     2               3
     3               0-1
    
    $ virsh iothreadpin $dom 3 0-2

    Then view the change
    
    $ virsh iothreadsinfo $dom
     IOThread ID    CPU Affinity
    ---------------------------------------------------
     1               2
     2               3
     3               0-2
    
    If an invalid value is supplied or a required option is missing,
    then an error will be displayed:
    
    $ virsh iothreadpin $dom 4 3
    error: invalid argument: iothread value out of range 4 > 3
    
    $ virsh iothreadpin $dom 3
    error: command 'iothreadpin' requires <cpulist> option
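
As a rough sketch of the same operation through the Python bindings, the snippet below builds a CPU map and calls pinIOThread() and ioThreadInfo() on a domain object. The helper names build_cpumap and pin_iothread, and the use of getInfo()[2] as the host CPU count, are assumptions made for this example; it requires libvirt-python and a running domain, so treat it as an outline, not a tested recipe.

```python
def build_cpumap(cpus, host_cpu_count):
    """Return a tuple of booleans, one per host CPU, True for each allowed CPU."""
    wanted = set(cpus)
    out_of_range = sorted(c for c in wanted if c < 0 or c >= host_cpu_count)
    if out_of_range:
        raise ValueError("CPU(s) out of range: %s" % out_of_range)
    return tuple(i in wanted for i in range(host_cpu_count))


def pin_iothread(uri, domain_name, iothread_id, cpus):
    """Pin one IOThread of a running domain and return the resulting affinity info.

    Hypothetical wrapper; needs libvirt-python and a live hypervisor connection.
    """
    import libvirt  # imported lazily so build_cpumap() works without the bindings

    conn = libvirt.open(uri)
    try:
        dom = conn.lookupByName(domain_name)
        # Assumption: the active CPU count from node info is enough for the map.
        host_cpus = conn.getInfo()[2]
        dom.pinIOThread(iothread_id, build_cpumap(cpus, host_cpus),
                        libvirt.VIR_DOMAIN_AFFECT_LIVE)
        return dom.ioThreadInfo(libvirt.VIR_DOMAIN_AFFECT_LIVE)
    finally:
        conn.close()


# Equivalent of "0-2" on a four-CPU host:
print(build_cpumap([0, 1, 2], 4))  # (True, True, True, False)
```

A call such as pin_iothread("qemu:///system", "mydomain", 3, [0, 1, 2]) would correspond to "virsh iothreadpin mydomain 3 0-2"; the URI and domain name here are placeholders.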

Comment 8 Luyao Huang 2015-07-21 07:33:47 UTC
Verified this bug with libvirt-1.2.17-2.el7.x86_64 and qemu-kvm-rhev-2.3.0-12.el7.x86_64:

1. Prepare a running guest with an iothread:

# virsh iothreadinfo rhel7.0-rhel
 IOThread ID     CPU Affinity   
---------------------------------------------------
 1               1

2. check the cgroup settings:

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/iothread1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/iothread1:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 1


3. bind iothread to another cpu:

# virsh iothreadpin rhel7.0-rhel  1 3


4. recheck the cgroup and taskset:

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/iothread1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/iothread1:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 3

# virsh qemu-monitor-command rhel7.0-rhel '{"execute": "query-iothreads"}' --pretty
{
    "return": [
        {
            "thread-id": 30228,
            "id": "iothread1"
        }
    ],
    "id": "libvirt-14"
}

# taskset -p 30228
pid 30228's current affinity mask: 8
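
The check in step 4 can be reproduced offline with a short Python sketch: it pulls the thread IDs out of a query-iothreads reply and decodes the hexadecimal mask that taskset -p prints. The function names iothread_tids and cpus_in_mask are made up for this example.

```python
import json


def iothread_tids(reply_text):
    """Map each IOThread id to its host thread ID from a query-iothreads reply."""
    reply = json.loads(reply_text)
    return {entry["id"]: entry["thread-id"] for entry in reply["return"]}


def cpus_in_mask(hex_mask):
    """Decode a hexadecimal affinity mask (as printed by taskset -p) into CPU numbers."""
    mask = int(hex_mask, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


reply = '{"return": [{"thread-id": 30228, "id": "iothread1"}], "id": "libvirt-14"}'
print(iothread_tids(reply))  # {'iothread1': 30228}
print(cpus_in_mask("8"))     # [3]
```

Mask 8 has only bit 3 set, which matches the cpuset.cpus value of 3 shown by cgget.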

5. check the xml 

# virsh dumpxml rhel7.0-rhel |grep iothreadpin
    <iothreadpin iothread='1' cpuset='3'/>
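
Instead of grepping, the pinning can also be read from the domain XML with a parser. The sketch below uses Python's standard ElementTree; the function name iothread_pins is hypothetical, and the sample document is a trimmed, hand-written stand-in for real dumpxml output, with the element placed under <cputune> as in libvirt's domain XML format.

```python
import xml.etree.ElementTree as ET


def iothread_pins(domain_xml):
    """Return {iothread id: cpuset string} for every <iothreadpin> element."""
    root = ET.fromstring(domain_xml.strip())
    return {int(pin.get("iothread")): pin.get("cpuset")
            for pin in root.iter("iothreadpin")}


# Trimmed, hand-written sample; real dumpxml output contains much more.
sample = """
<domain type='kvm'>
  <name>rhel7.0-rhel</name>
  <iothreads>1</iothreads>
  <cputune>
    <iothreadpin iothread='1' cpuset='3'/>
  </cputune>
</domain>
"""

print(iothread_pins(sample))  # {1: '3'}
```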

6. restart libvirtd and recheck:

# service libvirtd restart
Redirecting to /bin/systemctl restart  libvirtd.service
# virsh dumpxml rhel7.0-rhel |grep iothreadpin
    <iothreadpin iothread='1' cpuset='3'/>

7. managedsave and restart, recheck:

# virsh managedsave rhel7.0-rhel

Domain rhel7.0-rhel state saved by libvirt

# virsh start rhel7.0-rhel
Domain rhel7.0-rhel started

# virsh dumpxml rhel7.0-rhel |grep iothreadpin
    <iothreadpin iothread='1' cpuset='3'/>

# cgget -g cpuset /machine.slice/machine-qemu\\x2drhel7.0\\x2drhel.scope/iothread1
/machine.slice/machine-qemu\x2drhel7.0\x2drhel.scope/iothread1:
cpuset.memory_spread_slab: 0
cpuset.memory_spread_page: 0
cpuset.memory_pressure: 0
cpuset.memory_migrate: 1
cpuset.sched_relax_domain_level: -1
cpuset.sched_load_balance: 1
cpuset.mem_hardwall: 0
cpuset.mem_exclusive: 0
cpuset.cpu_exclusive: 0
cpuset.mems: 0
cpuset.cpus: 3

# virsh qemu-monitor-command rhel7.0-rhel '{"execute": "query-iothreads"}' --pretty
{
    "return": [
        {
            "thread-id": 1300,
            "id": "iothread1"
        }
    ],
    "id": "libvirt-103"
}


# taskset -p 1300
pid 1300's current affinity mask: 8



Also tested with qemu-kvm-1.5.3-97.el7.x86_64 on a guest with an iothread; there seems to be an issue here:

1. Check the iothreads setting in the XML:
# virsh dumpxml r7 |grep iothreads
  <iothreads>1</iothreads>

2. bind iothread1 to a cpu:
# virsh iothreadpin r7 1 3

3.
# virsh dumpxml r7 |grep iothread
  <iothreads>1</iothreads>
  <iothreadids>
    <iothread id='1'/>
  </iothreadids>
    <iothreadpin iothread='1' cpuset='3'/>

4. The XML and the command report success, but the pinning actually failed:

# lscgroup
cpuset:/machine.slice/machine-qemu\x2dr7.scope/iothread1

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dr7.scope/iothread1/tasks
(no tasks here)

# cat /proc/22868/task/22940/status | grep Cpus
Cpus_allowed:	e
Cpus_allowed_list:	1-3
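
The mismatch above can be expressed as a small check. This Python sketch reads the Cpus_allowed_list field from the text of a /proc/<pid>/task/<tid>/status file and compares it with the requested cpuset; the function name allowed_cpu_list is an assumption, and the sample text is copied from the output shown here.

```python
def allowed_cpu_list(status_text):
    """Return the Cpus_allowed_list value from /proc status text, or None if absent."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Cpus_allowed_list":
            return value.strip()
    return None


status = "Cpus_allowed:\te\nCpus_allowed_list:\t1-3\n"
requested = "3"  # cpuset given to "virsh iothreadpin r7 1 3"
actual = allowed_cpu_list(status)
print(actual)               # 1-3
print(actual == requested)  # False
```

A plain string comparison is enough for this case; comparing expanded CPU sets would be more robust when the same set can be written in different ways.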

Comment 9 Luyao Huang 2015-07-21 07:37:56 UTC
Hi John,

Would you please help check comment 8? I think this API is broken when used with qemu-kvm. Can I reopen this bug for the issue, or do I need to open a new bug?

thanks a lot for your reply

Luyao

Comment 10 John Ferlan 2015-07-23 16:19:04 UTC
I have absolutely no idea what you're testing and what the question/issue is.  The first sequence of commands does one thing and the last one does something else.  The first sequence seems to be using qemu 2.3 and the second using 1.5? Is one ok? Is one broken?  What about the taskset output for the second sequence? Perhaps use '--cpu-list' switch in order to see the list rather than the mask so it's easier/clearer to see. Mixing in cgroups and cpuset is confusing. 

It's just not clear what's being tested.

What does a similar vcpupin command sequence show? That is if you pinned the cpuset would you get a similar result? Both use the same sequence to do the job, so I would think both would have similar results.

What are you expecting?

Comment 11 Luyao Huang 2015-07-27 03:41:00 UTC
(In reply to John Ferlan from comment #10)
> I have absolutely no idea what you're testing and what the question/issue
> is.  The first sequence of commands does one thing and the last one does
> something else.  The first sequence seems to be using qemu 2.3 and the
> second using 1.5? Is one ok? Is one broken?  What about the taskset output

Sorry, my comment was not clear; I should have given more explanation.

I found that it works well with qemu 2.3 (qemu-kvm-rhev), but it is broken with qemu
1.5 (qemu-kvm). The affinity looks like this:

# cat /proc/22868/task/22940/status | grep Cpus
Cpus_allowed:	e                               <------ the mask taskset would print
Cpus_allowed_list:	1-3                     <------ the more readable list (taskset with --cpu-list)

> for the second sequence? Perhaps use '--cpu-list' switch in order to see the
> list rather than the mask so it's easier/clearer to see. Mixing in cgroups
> and cpuset is confusing. 
> 

Okay, good idea.

> It's just not clear what's being tested.
> 

The test checks that libvirt really binds the right pid (the iothread's pid) to the right host CPUs, and that libvirt sets the right cpuset in the cgroup for the iothread if the cpuset cgroup is available.

> What does a similar vcpupin command sequence show? That is if you pinned the
> cpuset would you get a similar result? Both use the same sequence to do the
> job, so I would think both would have similar results.
> 

The vcpupin command works well with qemu 1.5. The test result:

1. libvirt output of vcpupin:
# virsh vcpupin test4
VCPU: CPU Affinity
----------------------------------
   0: 1
   1: 0-3

2. check libvirt set cpuset for vcpu0 in cgroup:
# virsh qemu-monitor-command test4 --hmp info cpus
* CPU #0: pc=0x00000000000f7fa8 (halted) thread_id=9589
  CPU #1: pc=0x00000000000f7bef (halted) thread_id=9591

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dtest4.scope/vcpu0/tasks 
9589
(This step is very important: we need to make sure the right task is set in the cgroup.)

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dtest4.scope/vcpu0/cpuset.cpus
1
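
The cross-check in step 2 (is the vcpu thread really the task in the cgroup?) can be sketched in Python as follows. The function name vcpu_thread_ids is hypothetical, and the regular expression assumes the HMP "info cpus" line format shown above.

```python
import re


def vcpu_thread_ids(info_cpus_text):
    """Map vcpu number to host thread ID from HMP 'info cpus' output."""
    pattern = re.compile(r"CPU #(\d+):.*?thread_id=(\d+)")
    return {int(cpu): int(tid) for cpu, tid in pattern.findall(info_cpus_text)}


info_cpus = """\
* CPU #0: pc=0x00000000000f7fa8 (halted) thread_id=9589
  CPU #1: pc=0x00000000000f7bef (halted) thread_id=9591
"""
tasks = "9589\n"  # content of the vcpu0 cgroup "tasks" file

tids = vcpu_thread_ids(info_cpus)
in_cgroup = {int(t) for t in tasks.split()}
print(tids)                  # {0: 9589, 1: 9591}
print(tids[0] in in_cgroup)  # True
```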

3. check the taskset or status in /proc

# ps aux|grep qemu
qemu      9580  1.5  0.4 990736 35420 ? ...

# ll /proc/9580/task/
total 0
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:31 9580    <----emulator
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:32 9589    <----vcpu0
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:32 9591    <----vcpu1
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:32 9600    <----???

# taskset --cpu-list -p 9589
pid 9589's current affinity list: 1

So vcpupin works as expected.

> What are you expecting?

Hmm... I am not sure my idea is right; maybe I missed something important about iothreads.

I think libvirt should forbid setting iothreadpin with an old qemu (or refuse to start a guest that has iothreads with an old qemu).

There are also some strange things when I test it again (with qemu 1.5):

1. check the libvirt iothreadpin:

# virsh dumpxml test4 |grep iothread
  <iothreads>1</iothreads>
  <iothreadids>
    <iothread id='1'/>
  </iothreadids>
    <iothreadpin iothread='1' cpuset='2'/>


2. Check what libvirt set in the cgroup (this step gives a very strange result: why did libvirt apply the pin to one of the libvirtd threads?)

# ll /proc/9580/task/
total 0
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:31 9580    <----emulator
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:32 9589    <----vcpu0
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:32 9591    <----vcpu1
dr-xr-xr-x. 6 qemu qemu 0 Jul 27 10:32 9600    <----???

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dtest4.scope/iothread1/cpuset.cpus
2

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dtest4.scope/iothread1/tasks 
4402

# ps -eLf |grep 4402
root      4391     1  4402  0   16 10:28 ?        00:00:00 /usr/sbin/libvirtd
root     16347 19012 16347  0    1 11:15 pts/0    00:00:00 grep --color=auto 4402

# taskset --cpu-list -p 4402
pid 4402's current affinity list: 2

Then try to change the iothreadpin and recheck the result:

# virsh iothreadpin test4 1 3 

# taskset --cpu-list -p 4402
pid 4402's current affinity list: 3

# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dtest4.scope/iothread1/cpuset.cpus
3

I also notice there is no "-object iothread,id=iothread1" on the qemu command line. I have another question: does iothread really work on qemu 1.5.3?

Comment 14 Luyao Huang 2015-08-04 10:02:38 UTC
Since qemu-kvm-1.5.3-97.el7.x86_64 does not support iothread, I have opened a new bug 1249981 to track the remaining issue, and verified this bug per comment 8.

Thanks,
Luyao

Comment 16 errata-xmlrpc 2015-11-19 05:48:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html

