Bug 1450759

Summary: Creating fallocated image using qemu-img using gfapi fails
Product: Red Hat Enterprise Linux 7
Reporter: SATHEESARAN <sasundar>
Component: qemu-kvm-rhev
Assignee: Jeff Cody <jcody>
Status: CLOSED ERRATA
QA Contact: Ping Li <pingl>
Severity: urgent
Docs Contact:
Priority: high
Version: 7.3
CC: aliang, coli, knoel, ndevos, ngu, rhs-bugs, sabose, sasundar, storage-qa-internal, virt-maint
Target Milestone: pre-dev-freeze
Keywords: Patch
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-8.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1450903
Environment: Hyperconverged Infra
Last Closed: 2017-08-02 04:38:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1411323, 1450903, 1485863
Attachments:
Description: gluster: add support for PREALLOC_MODE_FALLOC
Flags: none

Description SATHEESARAN 2017-05-15 06:52:17 UTC
Description of problem:
-----------------------
Unable to create VM image files using qemu-img (which uses gfapi) with the preallocation mode set to 'falloc'.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHGS 3.2.0 ( glusterfs-3.8.4-18.el7rhgs )
RHGS 3.3.0 interim build ( glusterfs-3.8.4-24.el7rhgs )

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. Create a VM image file using the 'qemu-img' command (using gfapi) with preallocation set to 'falloc'

Actual results:
---------------
Unable to create preallocated VM image file

Expected results:
-----------------
Able to create preallocated VM image file

Comment 1 SATHEESARAN 2017-05-15 06:54:23 UTC
The error message says that GlusterFS doesn't support the zerofill API:

# qemu-img create -f qcow2 -o preallocation=falloc gluster://host1.example.com/vol/vm1.img 1G
....
....
....
....
qemu-img: gluster://dhcp37-191.lab.eng.blr.redhat.com/voltemp/vm1.img: Invalid preallocation mode: 'falloc' or GlusterFS doesn't support zerofill API

Comment 3 Niels de Vos 2017-05-15 08:14:59 UTC
What type of volume are you using? Not all Gluster xlators implement the fallocate() FOP. We'll need to examine the (client+server) graph to see where this is missing.

Comment 4 SATHEESARAN 2017-05-15 09:13:20 UTC
(In reply to Niels de Vos from comment #3)
> What type of volume are you using? Not all Gluster xlators implement the
> fallocate() FOP. We'll need to examine the (client+server) graph to see
> where this is missing.

Hi Niels, 

I was using a replica 3 sharded volume.

Here is the volume info:
gluster volume info
 
Volume Name: vmstore
Type: Replicate
Volume ID: da887213-8669-479c-88c0-f63554507528
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1.lab.eng.blr.redhat.com:/gluster/brick1/b1
Brick2: host2.lab.eng.blr.redhat.com:/gluster/brick1/b1
Brick3: host3.lab.eng.blr.redhat.com:/gluster/brick1/b1
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: disable
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

I can provide access to the live setup for further analysis.

Comment 5 Niels de Vos 2017-05-15 09:15:07 UTC
QEMU only supports 'falloc' when glfs_zerofill() is available at build time (from 'configure' in the QEMU sources):

##########################################
# glusterfs probe
if test "$glusterfs" != "no" ; then
  if $pkg_config --atleast-version=3 glusterfs-api; then
    glusterfs="yes"
    glusterfs_cflags=$($pkg_config --cflags glusterfs-api)
    glusterfs_libs=$($pkg_config --libs glusterfs-api)
    if $pkg_config --atleast-version=4 glusterfs-api; then
      glusterfs_xlator_opt="yes"
    fi
    if $pkg_config --atleast-version=5 glusterfs-api; then
      glusterfs_discard="yes"
    fi
    if $pkg_config --atleast-version=6 glusterfs-api; then
      glusterfs_zerofill="yes"
    fi
  else
    if test "$glusterfs" = "yes" ; then
      feature_not_found "GlusterFS backend support" \
          "Install glusterfs-api devel >= 3"
    fi
    glusterfs="no"
  fi
fi


QEMU version: qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64

This was built with glusterfs-api-devel-3.7.9-12.el7.x86_64.rpm, which contains usr/lib64/pkgconfig/glusterfs-api.pc:
...
Name: glusterfs-api
Description: GlusterFS API
/* This is the API version, NOT package version */
Version: 7.3.7.9
...


This shows that the version is high enough, and that the QEMU build should have enabled support for 'falloc'.

A little more inspection is needed...
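
As a rough illustration of the probe above, the following sketch maps the API version from glusterfs-api.pc to the feature flags that configure would enable. The function name gluster_features is hypothetical, and only the major version component is compared, which is a simplification of what pkg-config --atleast-version does.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): given the "Version:" value from
# glusterfs-api.pc, print the features the configure probe would turn on.
# Assumption: comparing only the major component is close enough to
# `pkg-config --atleast-version=N` for the integer thresholds used here.
gluster_features() {
    major=${1%%.*}                      # "7.3.7.9" -> "7"
    if [ "$major" -lt 3 ]; then
        echo "none"                     # probe fails, glusterfs="no"
        return 0
    fi
    features="glusterfs"
    [ "$major" -ge 4 ] && features="$features xlator_opt"
    [ "$major" -ge 5 ] && features="$features discard"
    [ "$major" -ge 6 ] && features="$features zerofill"
    echo "$features"
}
```

Calling gluster_features 7.3.7.9 lists all four features, which matches the conclusion that the build should have had zerofill enabled.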

Comment 6 Niels de Vos 2017-05-15 09:21:46 UTC
Confirmation that QEMU was built to use the glfs_zerofill() functions:

$ cd qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64.d
$ ldd usr/libexec/qemu-kvm | grep gfapi
	libgfapi.so.0 => /lib64/libgfapi.so.0 (0x00007f7078f53000)
$ objdump -T usr/libexec/qemu-kvm | grep glfs_zerofill
0000000000000000      DF *UND*	0000000000000000  GFAPI_3.5.0 glfs_zerofill_async
0000000000000000      DF *UND*	0000000000000000  GFAPI_3.5.0 glfs_zerofill


Will need to find out why QEMU reports "GlusterFS doesn't support zerofill API".
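
The grep above matches both glfs_zerofill and glfs_zerofill_async. A slightly stricter check could look like the sketch below, which reads `objdump -T` output on stdin and matches the symbol name exactly. The function name imports_symbol is hypothetical, and the sketch assumes the symbol name is the last column, as in the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `objdump -T` output on stdin lists
# the given symbol as an undefined (imported) dynamic symbol.
# Assumption: the symbol name is the last whitespace-separated field.
imports_symbol() {
    awk -v sym="$1" '
        $0 ~ /\*UND\*/ && $NF == sym { found = 1 }
        END { exit !found }
    '
}
```

Possible usage: `objdump -T usr/libexec/qemu-kvm | imports_symbol glfs_zerofill && echo "zerofill is linked in"`.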

Comment 7 Niels de Vos 2017-05-15 09:48:26 UTC
Hmm, it seems that 'falloc' is not a valid pre-allocation option for the block/gluster driver in qemu-kvm-rhev-2.6.0-28.el7_3.9:

[block/gluster.c:qemu_gluster_create()]
    tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
    if (!tmp || !strcmp(tmp, "off")) {
        prealloc = 0;
    } else if (!strcmp(tmp, "full") && gluster_supports_zerofill()) {
        prealloc = 1;
    } else {
        error_setg(errp, "Invalid preallocation mode: '%s'"
                         " or GlusterFS doesn't support zerofill API", tmp);
        ret = -EINVAL;
        goto out;
    }

The only accepted pre-allocation options are "off" and "full"; anything else, including "falloc", ends up in the error branch. The implementation in the Gluster driver in QEMU is somewhat simpler than the one in the raw-posix driver that is used for filesystem mounts (FUSE).

Compare this to the fuller implementation in block/raw-posix.c:raw_create()

    buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
    prealloc = qapi_enum_parse(PreallocMode_lookup, buf,
                               PREALLOC_MODE__MAX, PREALLOC_MODE_OFF,
                               &local_err);
....
    switch (prealloc) {
#ifdef CONFIG_POSIX_FALLOCATE
    case PREALLOC_MODE_FALLOC:
        /* posix_fallocate() doesn't set errno. */
        result = -posix_fallocate(fd, 0, total_size);
        if (result != 0) {
            error_setg_errno(errp, -result,
                             "Could not preallocate data for the new file");
        }
        break;
#endif
    case PREALLOC_MODE_FULL:
    {
        int64_t num = 0, left = total_size;
        buf = g_malloc0(65536);

        while (left > 0) {
            num = MIN(left, 65536);
            result = write(fd, buf, num);
            if (result < 0) {
                result = -errno;
                error_setg_errno(errp, -result,
                                 "Could not write to the new file");
                break;
            }
            left -= result;
        }
        if (result >= 0) {
            result = fsync(fd);
            if (result < 0) {
                result = -errno;
                error_setg_errno(errp, -result,
                                 "Could not flush new file to disk");
            }
        }
        g_free(buf);
        break;
    }
    case PREALLOC_MODE_OFF:
        break;
    default:
        result = -EINVAL;
        error_setg(errp, "Unsupported preallocation mode: %s",
                   PreallocMode_lookup[prealloc]);
        break;
    }


So, in order to have QEMU support preallocation=falloc, the block/gluster.c sources need to be updated.
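
To make the difference between the two excerpts above easier to see, here is a small model of which preallocation values each driver accepts. This is only a sketch of the control flow quoted in this comment, not QEMU code; both function names are hypothetical, and the raw-posix model assumes CONFIG_POSIX_FALLOCATE is defined.

```shell
#!/bin/sh
# Model of qemu_gluster_create() in qemu-kvm-rhev-2.6.0 as quoted above.
# $1 = preallocation value (empty means the option was not given)
# $2 = "yes" if gluster_supports_zerofill() would return true
gluster_accepts_prealloc() {
    case "$1" in
        ""|off) return 0 ;;
        full)   [ "$2" = "yes" ] ;;    # "full" needs the zerofill API
        *)      return 1 ;;            # "falloc" and everything else: -EINVAL
    esac
}

# Model of raw_create() in block/raw-posix.c as quoted above,
# assuming CONFIG_POSIX_FALLOCATE is defined.
raw_posix_accepts_prealloc() {
    case "$1" in
        ""|off|falloc|full) return 0 ;;
        *)                  return 1 ;;   # e.g. "metadata" hits the default case
    esac
}
```

With this model, `gluster_accepts_prealloc falloc yes` fails even when zerofill is available, while `raw_posix_accepts_prealloc falloc` succeeds, which is the behaviour reported in this bug.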

Comment 9 Niels de Vos 2017-05-15 12:48:14 UTC
Created attachment 1278970 [details]
gluster: add support for PREALLOC_MODE_FALLOC

Initial patch, completely untested. I'll get some testing done before sending it upstream for review and inclusion.

Updated versions will become available on https://github.com/nixpanic/qemu/tree/gfapi/fallocate/rhbz1450759 as well.

Comment 10 Niels de Vos 2017-05-15 19:50:17 UTC
Posted for review at http://lists.nongnu.org/archive/html/qemu-block/2017-05/msg00667.html

Comment 11 Niels de Vos 2017-05-16 12:59:12 UTC
Results of a test-build with the posted patch (testing done before posting the
patch, of course):

1. verification of the error message with a relatively current QEMU

[root@vm013 ~]# rpm -q qemu-img
qemu-img-2.7.1-6.fc25.x86_64
[root@vm013 ~]# qemu-img create -f qcow2 -o preallocation=falloc gluster://vm015.example.com/one-brick/bz1450759.falloc.img 20M
Formatting 'gluster://vm015.example.com/one-brick/bz1450759.falloc.img', fmt=qcow2 size=20971520 encryption=off cluster_size=65536 preallocation=falloc lazy_refcounts=off refcount_bits=16
qemu-img: gluster://vm015.example.com/one-brick/bz1450759.falloc.img: Invalid preallocation mode: 'falloc' or GlusterFS doesn't support zerofill API


2. verification of the fix with the test-build:

[root@vm013 ~]# rpm -q qemu-img
qemu-img-2.9.0-1.fc25.0.1bz1450759.x86_64
[root@vm013 ~]# qemu-img create -f qcow2 -o preallocation=falloc gluster://vm015.example.com/one-brick/bz1450759.falloc.img 20M
Formatting 'gluster://vm015.example.com/one-brick/bz1450759.falloc.img', fmt=qcow2 size=20971520 encryption=off cluster_size=65536 preallocation=falloc lazy_refcounts=off refcount_bits=16

[root@vm013 ~]# qemu-img create -f qcow2 -o preallocation=full gluster://vm015.example.com/one-brick/bz1450759.full.img 20M
Formatting 'gluster://vm015.example.com/one-brick/bz1450759.full.img', fmt=qcow2 size=20971520 encryption=off cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16

[root@vm013 ~]# qemu-img create -f qcow2 -o preallocation=off gluster://vm015.example.com/one-brick/bz1450759.off.img 20M
Formatting 'gluster://vm015.example.com/one-brick/bz1450759.off.img', fmt=qcow2 size=20971520 encryption=off cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16

Comment 12 Ademar Reis 2017-06-05 23:51:08 UTC
Requesting exception flag because this is important for Gluster support (layered products). The change is low risk and doesn't affect RHEL users, as this is for qemu-kvm-rhev.

Comment 13 Miroslav Rezanina 2017-06-06 08:54:25 UTC
Fix included in qemu-kvm-rhev-2.9.0-8.el7

Comment 15 Ping Li 2017-06-11 14:22:21 UTC
Reproduced the bug with qemu-kvm-rhev-2.9.0-7.el7:

1. For qcow2 image
1.1 off mode ------> pass
# qemu-img create -f qcow2 -o preallocation=off gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

1.2 metadata mode ------> pass
# qemu-img create -f qcow2 -o preallocation=metadata gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 516K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

1.3 falloc mode ------> fail
# qemu-img create -f qcow2 -o preallocation=falloc gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
qemu-img: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img: Invalid preallocation mode: 'falloc' or GlusterFS doesn't support zerofill API

1.4 full mode ------> pass
# qemu-img create -f qcow2 -o preallocation=full gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

2. For raw image
2.1 off mode ------> pass
# qemu-img create -f raw -o preallocation=off gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=raw size=1073741824 preallocation=off
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 0

2.2 falloc mode ------> fail
# qemu-img create -f raw -o preallocation=falloc gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=raw size=1073741824 preallocation=falloc
qemu-img: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img: Invalid preallocation mode: 'falloc' or GlusterFS doesn't support zerofill API

2.3 full mode ------> pass
# qemu-img create -f raw -o preallocation=full gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=raw size=1073741824 preallocation=full
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G



Verified the fix with qemu-kvm-rhev-2.9.0-9.el7:

1. For qcow2 image
1.1 off mode ------> pass
# qemu-img create -f qcow2 -o preallocation=off gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 193K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

1.2 metadata mode ------> pass
# qemu-img create -f qcow2 -o preallocation=metadata gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 516K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

1.3 falloc mode ------> pass
# qemu-img create -f qcow2 -o preallocation=falloc gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=falloc lazy_refcounts=off refcount_bits=16
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

1.4 full mode ------> pass
# qemu-img create -f qcow2 -o preallocation=full gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 preallocation=full lazy_refcounts=off refcount_bits=16
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

2. For raw image
2.1 off mode ------> pass
# qemu-img create -f raw -o preallocation=off gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=raw size=1073741824 preallocation=off
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 0

2.2 falloc mode ------> pass
# qemu-img create -f raw -o preallocation=falloc gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=raw size=1073741824 preallocation=falloc
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G

2.3 full mode ------> pass
# qemu-img create -f raw -o preallocation=full gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img 1G
Formatting 'gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img', fmt=raw size=1073741824 preallocation=full
# qemu-img info gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
image: gluster://bootp-73-199-197.lab.eng.pek2.redhat.com/gv0/vm1.img
file format: raw
virtual size: 1.0G (1073741824 bytes)
disk size: 1.0G
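
The pass/fail judgement in the runs above relies on comparing "disk size" with "virtual size" in the qemu-img info output. A crude way to automate that comparison is sketched below. The function name looks_preallocated is hypothetical, and the check only compares the human-readable values (for example "1.0G"), so it is a heuristic that assumes both values round to the same string when the image is fully allocated.

```shell
#!/bin/sh
# Hypothetical helper: read `qemu-img info` output on stdin and succeed
# if the human-readable disk size equals the human-readable virtual size.
# Assumption: lines look like "virtual size: 1.0G (...)" and "disk size: 1.0G".
looks_preallocated() {
    awk '
        /^virtual size: / { virt = $3 }
        /^disk size: /    { disk = $3 }
        END { exit !(virt != "" && virt == disk) }
    '
}
```

Possible usage: `qemu-img info gluster://<host>/<volume>/vm1.img | looks_preallocated && echo "image is preallocated"`, where the host and volume are placeholders.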

Comment 17 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392