Bug 1013157 - backport block-layer dataplane implementation
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Stefan Hajnoczi
QA Contact: Virtualization Bugs
Keywords: FutureFeature
Duplicates: 1101572
Depends On:
Blocks: 824644 1101569 824648 824649 824650 1001564 1029596 1030582 1086307 1101574 1101577 1252481
 
Reported: 2013-09-27 19:09 EDT by Ademar Reis
Modified: 2015-08-11 10:15 EDT
CC List: 8 users

See Also:
Fixed In Version: qemu-2.1
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 04:42:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Ademar Reis 2013-09-27 19:09:00 EDT
Stefan is working on the proper data-plane implementation that covers all of the block layer, and this BZ tracks the backport of that work. The current implementation duplicates block-layer code and is very limited, serving only as a tech-preview experiment.

Below are some data-plane features which are missing in the current version of qemu-kvm.

 * Image formats (qcow2)
   Dataplane currently only supports raw image files.  It should be
   possible to use image formats like qcow2.

 * Protocols (iSCSI, GlusterFS, Ceph, NBD)
   Dataplane currently only supports local files using Linux AIO.
   Network protocols like NBD, iSCSI, GlusterFS, and Ceph should be
   supported.

 * I/O throttling
   I/O throttling requires the new AioContext timer support which is
   currently being merged upstream.

 * Hot unplug
   Due to the way bdrv_in_use() is currently used to prevent hot unplug,
   the virtio-blk-pci adapter cannot be hot unplugged.  The hot unplug
   command will fail with EBUSY.

 * iothreads as objects
   The user should be able to define the number of iothreads and bind
   devices to specific threads.  iothreads should be discoverable using
   a QMP query-iothreads command, which includes thread IDs suitable for
   CPU affinity setting.

 * Runtime NBD exports
   The runtime NBD server should not interfere with dataplane.  This
   will allow image fleecing and other NBD export users to coexist with
   dataplane.

 * Block jobs
   Block jobs should not interfere with dataplane.  Currently block jobs
   cannot be started (EBUSY).

This BZ will be closed once the first patch series implementing the bulk of the work is backported and the new implementation is ready for wider testing. If necessary, extra BZs will be opened at that point to track any remaining missing functionality.
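For reference, the two command-line forms exercised in the verification comments below can be sketched as follows (image paths and IDs are placeholders; `x-data-plane=on` is the legacy tech-preview per-device toggle that the explicit iothread object replaces):

```shell
# Legacy tech-preview syntax: per-device dataplane toggle.
qemu-kvm ... \
    -drive file=disk.qcow2,if=none,id=drive0,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0,x-data-plane=on

# New syntax: an explicit iothread object that the device binds to.
qemu-kvm ... \
    -object iothread,id=iothread0 \
    -drive file=disk.qcow2,if=none,id=drive0,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0
```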
Comment 1 Ademar Reis 2013-12-06 18:00:55 EST
*** Bug 824647 has been marked as a duplicate of this bug. ***
Comment 2 Stefan Hajnoczi 2014-07-28 05:12:01 EDT
We will get this for free when RHEV switches to QEMU 2.1.0.
Comment 4 Sibiao Luo 2014-09-01 01:34:49 EDT
(In reply to Ademar Reis from comment #0)
> Stefan is working on the proper data-plane implementation that covers all of
> the block layer and this BZ is for the backport of such work. The current
> implementation duplicates blocklayer code and is very limited, serving only
> as a tech-preview experiment.

Verified this issue with the following supported configurations.
host info:
# uname -r && rpm -q qemu-kvm-rhev && rpm -q seabios
3.10.0-145.el7.x86_64
qemu-kvm-rhev-2.1.0-2.el7.x86_64
seabios-1.7.5-4.el7.x86_64
guest info:
# uname -r
3.10.0-145.el7.x86_64

> Below are some data-plane features which are missing in the current version
> of qemu-kvm.
> 
>  * Image formats (qcow2)
>    Dataplane currently only supports raw image files.  It should be
>    possible to use image formats like qcow2.
> 
1.create a qcow2 data disk image.
# qemu-img create -f qcow2 /home/my-data-disk.qcow2 10G
Formatting '/home/my-data-disk.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 lazy_refcounts=off
2.launch the qcow2 data disk image to KVM guest.
e.g:...-drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,config-wce=off,x-data-plane=on,bus=pci.0,addr=0x7
3.verify that the disk is detected in the guest and works with mkfs/dd.
# fdisk -l                                      <------------detect it in guest
# mkfs.ext4 /dev/vda                            <------------successfully
# dd if=/dev/zero of=/dev/vda bs=1M count=1000  <------------successfully

>  * Protocols (iSCSI, GlusterFS, Ceph, NBD)
>    Dataplane currently only supports local files using Linux AIO.
>    Network protocols like NBD, iSCSI, GlusterFS, and Ceph should be
>    supported.
> 
- iSCSI:
e.g.:...-drive file=/dev/disk/by-path/ip-10.66.33.253:3260-iscsi-iqn.2014.sluo.com:iscsi.storage.1-lun-1,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,config-wce=off,x-data-plane=on,bus=pci.0,addr=0x7

Verify that the disk is detected in the guest and works with mkfs/dd.
# fdisk -l                                     <------------detect it in guest
# mkfs.ext4 /dev/vda                           <------------successfully
# dd if=/dev/zero of=/dev/vda bs=1M count=1000 <------------successfully

- Libiscsi:
e.g:...-drive file=iscsi://10.66.33.253:3260/iqn.2014.sluo.com:iscsi.storage.1/1,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,config-wce=off,x-data-plane=on,bus=pci.0,addr=0x7

Verify that the disk is detected in the guest and works with mkfs/dd.
# fdisk -l                                     <------------detect it in guest
# mkfs.ext4 /dev/vda                           <------------successfully
# dd if=/dev/zero of=/dev/vda bs=1M count=1000 <------------successfully

- GlusterFS:
# qemu-img create -f qcow2 gluster://10.66.106.22/sluo_volume/my-data-disk.qcow2 10G
Formatting 'gluster://10.66.106.22/sluo_volume/my-data-disk.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 

# qemu-img info gluster://10.66.106.22/sluo_volume/my-data-disk.qcow2
image: gluster://10.66.106.22/sluo_volume/my-data-disk.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 140K
cluster_size: 65536
Format specific information:
    compat: 0.10

e.g:...-drive file=gluster://10.66.106.22/sluo_volume/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,config-wce=off,x-data-plane=on,bus=pci.0,addr=0x7

Verify that the disk is detected in the guest and works with mkfs/dd.
# fdisk -l                                     <------------detect it in guest
# mkfs.ext4 /dev/vda                           <------------successfully
# dd if=/dev/zero of=/dev/vda bs=1M count=1000 <------------successfully

- NBD:
# nbd-server 12345 /home/my-data-disk.qcow2

** (process:28046): WARNING **: Specifying an export on the command line is deprecated.

** (process:28046): WARNING **: Please use a configuration file instead.

e.g:...-drive file=nbd:10.66.11.154:12345,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,config-wce=off,x-data-plane=on,bus=pci.0,addr=0x7

Verify that the disk is detected in the guest and works with mkfs/dd.
# fdisk -l                                     <------------detect it in guest
# mkfs.ext4 /dev/vda                           <------------successfully
# dd if=/dev/zero of=/dev/vda bs=1M count=1000 <------------successfully

- Ceph:

I have not used Ceph before, so I am leaving it aside for now; I will investigate it and try it later.

>  * I/O throttling
>    I/O throttling requires the new AioContext timer support which is
>    currently being merged upstream.
> 

>  * Hot unplug
>    Due to the way bdrv_in_use() is currently used to prevent hot unplug,
>    the virtio-blk-pci adapter cannot be hot unplugged.  The hot unplug
>    command will fail with EBUSY.
> 
e.g:...-drive file=/dev/disk/by-path/ip-10.66.33.253:3260-iscsi-iqn.2014.sluo.com:iscsi.storage.1-lun-1,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,config-wce=off,x-data-plane=on,bus=pci.0,addr=0x7
(qemu) device_del data-disk
(qemu) info block    <-------hotunplug successfully.
guest ]# fdisk -l    <-------disappear from guest.
guest ]# dmesg       <-------not see any error

>  * iothreads as objects
>    The user should be able to define the number of iothreads and bind
>    devices to specific threads.  iothreads should be discoverable using
>    a QMP query-iothreads command, which includes thread IDs suitable for
>    CPU affinity setting.
> 
e.g:...-object iothread,id=iothread0 -drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,iothread=iothread0,bus=pci.0,addr=0x7

discoverable iothreads via QMP:
{"execute":"query-iothreads"}
{"return": [{"thread-id": 6072, "id": "iothread0"}]}
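The thread IDs returned by query-iothreads can be fed to standard affinity tools. A minimal sketch (6072 is the thread ID from the output above; pinning to host CPU 2 is an arbitrary choice, and the thread ID is only valid while this QEMU instance is running):

```shell
# Pin the iothread to host CPU 2 using its thread ID from query-iothreads.
taskset -pc 2 6072
# Confirm the new affinity mask.
taskset -p 6072
```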

>  * Runtime NBD exports
>    The runtime NBD server should not interfere with dataplane.  This
>    will allow image fleecing and other NBD export users to coexist with
>    dataplane.
> 
1).launch a KVM guest with data-plane.
e.g:...-object iothread,id=iothread0 -drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
2).start the runtime NBD server via QMP (HMP equivalent: nbd_server_start [-a] [-w] host:port -- serve block devices on the given host and port) and export a block device via NBD.
{"execute":"qmp_capabilities"}
{"return": {}}
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.66.11.154", "port": "1234" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive-data-disk0", "writable": true } }
{"return": {}}
3).read/write to the nbd target.
# qemu-img info nbd://10.66.11.154:1234/drive-data-disk0
image: 
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: unavailable
# qemu-io -c 'read 512 1024' nbd://10.66.11.154:1234/drive-data-disk0
read 1024/1024 bytes at offset 512
1 KiB, 1 ops; 0.0003 sec (2.713 MiB/sec and 2777.7778 ops/sec)
# qemu-io -c 'write 512 1024' nbd://10.66.11.154:1234/drive-data-disk0
wrote 1024/1024 bytes at offset 512
1 KiB, 1 ops; 0.0005 sec (1.922 MiB/sec and 1968.5039 ops/sec)

>  * Block jobs
>    Block jobs should not interfere with dataplane.  Currently block jobs
>    cannot be started (EBUSY).
> 
e.g.:...-object iothread,id=iothread0 -drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop -device virtio-blk-pci,drive=drive-data-disk,id=data-disk,iothread=iothread0,bus=pci.0,addr=0x7

- do block mirror:
{ "execute": "drive-mirror", "arguments": { "device": "drive-data-disk", "target": "/root/sn1", "format": "qcow2", "mode": "absolute-paths", "sync": "full", "speed": 1000000000, "on-source-error": "stop", "on-target-error": "stop" } }
{"error": {"class": "GenericError", "desc": "Device 'drive-data-disk' is busy: block device is in use by data plane"}}

- do live snapshot:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "drive-data-disk","snapshot-file": "/home/snap", "format": "qcow2" } }
{"error": {"class": "GenericError", "desc": "Device 'drive-data-disk' is busy: block device is in use by data plane"}}

> This BZ will be closed once the first patch-series implementing the bulk of
> the work is backported and the new implementation is ready for wider
> testing. If necessary, extra BZs will be open to track missing
> functionalities by then.

Best Regards,
sluo
Comment 5 Sibiao Luo 2014-09-01 02:20:01 EDT
(In reply to Sibiao Luo from comment #4)
> (In reply to Ademar Reis from comment #0)
> > Stefan is working on the proper data-plane implementation that covers all of
> > the block layer and this BZ is for the backport of such work. The current
> > implementation duplicates blocklayer code and is very limited, serving only
> > as a tech-preview experiment.
> 
> Verify this issue with the following supported parts.
> host info:
> # uname -r && rpm -q qemu-kvm-rhev && rpm -q seabios
> 3.10.0-145.el7.x86_64
> qemu-kvm-rhev-2.1.0-2.el7.x86_64
> seabios-1.7.5-4.el7.x86_64
> guest info:
> # uname -r
> 3.10.0-145.el7.x86_64
> 
> 
> >  * I/O throttling
> >    I/O throttling requires the new AioContext timer support which is
> >    currently being merged upstream.
> > 
> 
1.launch a KVM guest with data-plane.
e.g:...-object iothread,id=iothread0 -drive file=/home/my-data-disk.qcow2,if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,bps=1024000,bps_rd=0,bps_wr=0,iops=1024000,iops_rd=0,iops_wr=0 -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
(qemu) info block
drive-system-disk: /home/RHEL-7.0-20140507.0-Server-x86_64.qcow2 (qcow2)

drive-data-disk0: /home/my-data-disk.qcow2 (qcow2)
    I/O throttling:   bps=1024000 bps_rd=0 bps_wr=0 bps_max=102400 bps_rd_max=0 bps_wr_max=0 iops=1024000 iops_rd=0 iops_wr=0 iops_max=102400 iops_rd_max=0 iops_wr_max=0 iops_size=0
...
2).query the block info via HMP/QMP monitor.
{"execute":"query-block"}
{"return": [...{"io-status": "ok", "device": "drive-data-disk0", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 10737418240, "filename": "/home/my-data-disk.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 140320768, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "bps_max": 102400, "iops": 1024000, "bps_wr": 0, "encrypted": false, "bps": 1024000, "bps_rd": 0, "iops_max": 102400, "file": "/home/my-data-disk.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
3).do fio to the disk in guest.
# fio --filename=/dev/vda --direct=1 --rw=randrw --bs=100K --size=10M --name=test --iodepth=100 --ioengine=libaio
test: (g=0): rw=randrw, bs=100K-100K/100K-100K/100K-100K, ioengine=libaio, iodepth=100
fio-2.1.10
Starting 1 process
Jobs: 1 (f=1): [m] [3.2% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 05m:31s]
test: (groupid=0, jobs=1): err= 0: pid=4230: Mon Sep  1 02:17:19 2014
  read : io=4900.0KB, bw=501659B/s, iops=4, runt= 10002msec
    slat (usec): min=6, max=34, avg=10.53, stdev= 4.11
    clat (msec): min=1, max=10001, avg=9390.61, stdev=2413.35
     lat (msec): min=1, max=10001, avg=9390.62, stdev=2413.35
    clat percentiles (usec):
     |  1.00th=[ 1352],  5.00th=[99840], 10.00th=[10027008], 20.00th=[10027008],
     | 30.00th=[10027008], 40.00th=[10027008], 50.00th=[10027008], 60.00th=[10027008],
     | 70.00th=[10027008], 80.00th=[10027008], 90.00th=[10027008], 95.00th=[10027008],
     | 99.00th=[10027008], 99.50th=[10027008], 99.90th=[10027008], 99.95th=[10027008],
     | 99.99th=[10027008]
    bw (KB  /s): min=   39, max=   39, per=7.98%, avg=39.00, stdev= 0.00
  write: io=5300.0KB, bw=542611B/s, iops=5, runt= 10002msec
    slat (usec): min=8, max=22, avg=15.28, stdev= 3.32
    clat (usec): min=10000K, max=10001K, avg=10000757.38, stdev=421.06
     lat (usec): min=10000K, max=10001K, avg=10000772.89, stdev=418.90
    clat percentiles (msec):
     |  1.00th=[10028],  5.00th=[10028], 10.00th=[10028], 20.00th=[10028],
     | 30.00th=[10028], 40.00th=[10028], 50.00th=[10028], 60.00th=[10028],
     | 70.00th=[10028], 80.00th=[10028], 90.00th=[10028], 95.00th=[10028],
     | 99.00th=[10028], 99.50th=[10028], 99.90th=[10028], 99.95th=[10028],
     | 99.99th=[10028]
    lat (msec) : 2=0.98%, 20=0.98%, 250=0.98%, >=2000=97.06%
  cpu          : usr=0.00%, sys=0.02%, ctx=93, majf=0, minf=29
  IO depths    : 1=1.0%, 2=2.0%, 4=3.9%, 8=7.8%, 16=15.7%, 32=31.4%, >=64=38.2%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=75.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=25.0%
     issued    : total=r=49/w=53/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=100

Run status group 0 (all jobs):
   READ: io=4900KB, aggrb=489KB/s, minb=489KB/s, maxb=489KB/s, mint=10002msec, maxt=10002msec
  WRITE: io=5300KB, aggrb=529KB/s, minb=529KB/s, maxb=529KB/s, mint=10002msec, maxt=10002msec

Disk stats (read/write):
  vda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
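As a sanity check on the throttle, the aggregate fio bandwidth can be compared against the bps cap (a sketch; this assumes fio's KB/s figures are 1000-byte units):

```shell
read_bps=489000    # READ aggrb=489KB/s from the run status group
write_bps=529000   # WRITE aggrb=529KB/s from the run status group
cap=1024000        # bps= value set on the -drive option
total=$((read_bps + write_bps))
echo "total=${total} cap=${cap}"
if [ "$total" -le "$cap" ]; then
    echo "aggregate bandwidth within the throttle limit"
fi
```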

Best Regards,
sluo
Comment 9 Stefan Hajnoczi 2014-11-18 10:54:36 EST
*** Bug 1101572 has been marked as a duplicate of this bug. ***
Comment 10 Stefan Hajnoczi 2014-11-18 10:55:28 EST
Current dataplane status.

Supported:
 * Image formats
 * I/O throttling
 * Block jobs
 * GlusterFS, RBD, iSCSI, NBD

Unsupported:
 * External and internal snapshots
 * QMP 'transaction' command
 * Eject
 * qcow2 encryption
Comment 12 errata-xmlrpc 2015-03-05 04:42:39 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html
