Bug 1751934 - Guest installation fails when XFS is the host filesystem
Summary: Guest installation fails when XFS is the host filesystem
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: All
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 8.1
Assignee: Hanna Czenczek
QA Contact: qing.wang
URL:
Whiteboard:
Depends On:
Blocks: 1624641 1711971 1743176 1758964 1772941
 
Reported: 2019-09-13 06:40 UTC by Lukáš Doktor
Modified: 2021-12-17 17:20 UTC
CC: 23 users

Fixed In Version: qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1772941
Environment:
Last Closed: 2019-11-06 12:58:42 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Avocado-vt results with Anaconda logs from guest attached as well as system information. (431.16 KB, application/x-bzip)
2019-09-13 06:40 UTC, Lukáš Doktor
Execute FALLOC_FL_ZERO_RANGE and AIO pwrite in parallel (1.66 KB, text/plain)
2019-10-24 08:57 UTC, Hanna Czenczek
debug info (1.56 MB, application/gzip)
2019-10-29 01:46 UTC, qing.wang
Reply to comment 29 with fio raw output (20.21 KB, application/octet-stream)
2019-11-06 20:20 UTC, IBM Bug Proxy


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 181787 0 None None None 2019-10-05 07:54:26 UTC
Red Hat Issue Tracker RHELPLAN-28684 0 None None None 2021-12-17 17:20:43 UTC
Red Hat Product Errata RHBA-2019:3732 0 None None None 2019-11-06 12:58:55 UTC

Description Lukáš Doktor 2019-09-13 06:40:10 UTC
Created attachment 1614708 [details]
Avocado-vt results with Anaconda logs from guest attached as well as system information.

Description of problem:
Installation of RHEL-8.1.0-Snapshot-3.0 in a qemu-kvm guest from URL is failing on our power8 machine with "DNF error: Error in POSTIN scriptlet in rpm package graphviz".

Version-Release number of selected component (if applicable):
qemu-kvm-4.1.0-8.module+el8.1.0+4199+446e40fc.ppc64le
kernel-4.18.0-141.el8.ppc64le

How reproducible:
3/3

Steps to Reproduce:
1. avocado --show=test run unattended_install.url.extra_cdrom_ks.default_install.aio_native --vt-type qemu --vt-guest-os Linux.RHEL.7.devel.ppc64le --vt-machine-type pseries --vt-extra-params url=http://download.devel.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.1.0-Snapshot-3.0/compose/BaseOS/ppc64le/os/ unattended_file=unattended/RHEL-8-new.ks kernel_params='ks=cdrom nic delay=60 console=hvc0 inst.repo=http://download.devel.redhat.com/rhel-8/rel-eng/RHEL-8/RHEL-8.1.0-Snapshot-3.0/compose/BaseOS/ppc64le/os/ inst.sshd' vga=none install_timeout=1600

Actual results:
On the serial console we can see the mentioned error, and the installation stalls.

Expected results:
The installation should proceed without interruption.

Additional info:
Attaching Avocado-vt results that contain all the /tmp/*.log files (job-2019-09-13T02.04-ca42d91/test-results/1-io-github-autotest-qemu.unattended_install.url.extra_cdrom_ks.default_install.aio_native/data/avocado-vt-vm1/0/tmp) of the guest, as well as extensive information about the environment.

Note that we are using a slightly modified kickstart to enable the AppStream repos.

Comment 1 Lukáš Doktor 2019-09-13 06:50:17 UTC
Note: I remember a recent issue with similar symptoms that was fixed by http://patchwork.ozlabs.org/patch/1161598/. This system is using XFS, but as far as I know the downstream qemu-kvm does not enable xfsctl. Anyway, I'm mentioning it here in case the Anaconda logs don't provide any useful info.

Comment 2 Cleber Rosa 2019-09-13 16:08:08 UTC
Worth noting that it doesn't seem limited to graphviz.

From the attached job-2019-09-13T02.04-ca42d91/test-results/1-io-github-autotest-qemu.unattended_install.url.extra_cdrom_ks.default_install.aio_native/data/avocado-vt-vm1/0/tmp/packaging.log:

06:15:11,075 ERR dnf.rpm: Error in POSTIN scriptlet in rpm package graphviz
06:19:24,203 ERR dnf.rpm: Error in POSTTRANS scriptlet in rpm package kernel-core
06:19:56,890 ERR dnf.rpm: Error in <unknown> scriptlet in rpm package gtk2

Comment 3 Wainer dos Santos Moschetta 2019-09-13 21:23:29 UTC
In addition to comment #2: if you run the test repeatedly, it fails on a different package each time.

Also, I found something interesting. I was connected to the VM over VNC to watch the installation. When it failed, I typed ctrl+b+2 to get into Anaconda's shell. In the shell I looked at some of the files (i.e., did nothing special), then rebooted the VM with `shutdown -r now`. At this point the installation process started again -- apparently from the very beginning -- and this time it finished successfully.

Comment 7 Gu Nini 2019-09-24 08:50:53 UTC
Hi Lukas, does the problem also occur on the x86 platform?

Comment 8 Wainer dos Santos Moschetta 2019-09-24 12:38:05 UTC
Hi Gu,

Yes, the problem occurs on both x86_64 and ppc64le.

Comment 10 Gu Nini 2019-09-25 02:02:28 UTC
(In reply to Wainer dos Santos Moschetta from comment #8)
> Hi Gu,
> 
> Yes, the problem occurs on both x86_64 and ppc64le.

Thanks. Changing the QA Contact accordingly.

Comment 13 Hanna Czenczek 2019-09-25 17:16:19 UTC
Yes, I wrote that fix because Lukáš reported to me that installing RHEL failed with XFS on the host.  So yes, the patch should in theory fix it.

I did not backport that fix to RHEL because it was my impression that we use --disable-xfsctl for qemu-kvm on RHEL 8, both AV and non-AV (according to qemu-kvm.spec.template).  --disable-xfsctl has basically the same effect as my patch.

So did I misunderstand something about how qemu-kvm is configured downstream?  Do we not use the ./configure line from qemu-kvm.spec.template?

Max

Comment 14 Ademar Reis 2019-09-25 17:53:35 UTC
I see --disable-xfsctl being passed to configure in the specfile, so this shouldn't be an issue downstream.

Lukas: have you reported a problem using a different build? (see comment #13)

Comment 15 Lukáš Doktor 2019-09-26 07:06:05 UTC
Hello Ademar, it was tested on kmezick's system; I'm not sure about the setup. I can only say for sure that it was qemu 4.1.0 (qemu-kvm-4.1.0-8.module+el8.1.0+4199+446e40fc). It was installed either from brew or the AV repo and was definitely not self-compiled. According to mrezanin, it should come with "--disable-xfsctl".

Maybe a stupid question: is it possible that XFS now supports some new features that are available even without xfsctl support and perhaps aren't tested with qemu? IIRC the upstream fix simply removed the handlers and relies on the kernel, right? (At least that's what I understood.)

Comment 16 Hanna Czenczek 2019-09-26 08:11:05 UTC
The upstream bug wasn’t due to xfsctl() itself.  The upstream bug was purely in qemu, because of 50ba5b2d994853b38fed10e0841b119da0f8b8e5: when writing zeroes past the EOF, we expect the file to grow, but xfsctl’s XFS_IOC_ZERO_RANGE does not do that.  Enter 50ba5b2d994853b38fed10e0841b119da0f8b8e5, which ftruncate()d the file before using XFS_IOC_ZERO_RANGE so that it would be long enough.

The problem is that this ftruncate() may interfere with concurrent write requests past the old and new EOF, and thus may discard data that is concurrently written past the new EOF.

By using fallocate() instead of xfsctl() + XFS_IOC_ZERO_RANGE, we don’t have the problem at all, because fallocate() will increase the file length as we want it to.  So there is no ftruncate() in that code path.
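
To make this concrete, here is a minimal sketch of the two code paths (illustrative only, not qemu’s actual code; the EOF check before the ftruncate() is an assumption based on the description above):

```
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <unistd.h>
#include <xfs/xfs.h>   /* xfsctl(), XFS_IOC_ZERO_RANGE, struct xfs_flock64 */

/* Old path (50ba5b2d): XFS_IOC_ZERO_RANGE does not extend the file, so
 * zeroing past EOF requires growing the file first -- and this
 * ftruncate() is what can race with concurrent writes past the new EOF. */
static int zero_via_xfsctl(int fd, off_t off, off_t n)
{
    struct xfs_flock64 fl = {
        .l_whence = SEEK_SET, .l_start = off, .l_len = n,
    };
    if (off + n > lseek(fd, 0, SEEK_END) &&
        ftruncate(fd, off + n) < 0) {    /* the racy length update */
        return -1;
    }
    return xfsctl(NULL, fd, XFS_IOC_ZERO_RANGE, &fl);
}

/* New path (b2c6f23f): FALLOC_FL_ZERO_RANGE without FALLOC_FL_KEEP_SIZE
 * extends the file by itself, so no separate ftruncate() is needed. */
static int zero_via_fallocate(int fd, off_t off, off_t n)
{
    return fallocate(fd, FALLOC_FL_ZERO_RANGE, off, n);
}
```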

With --disable-xfsctl, 50ba5b2d994853b38fed10e0841b119da0f8b8e5 just changed dead code, and as such did not introduce a bug.  Similarly, b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2 only removed (by then) dead code.

Max

Comment 17 CongLi 2019-09-26 09:25:08 UTC
Hi Max,

QE met the following error when trying to reproduce this bug; it also happened while installing packages. Could you please check whether it has the same cause?

2019-09-23 04:50:32: [  187.883985] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
2019-09-23 04:50:32: [  187.883985]
2019-09-23 04:50:32: [  187.885692] CPU: 0 PID: 1 Comm: systemd Not tainted 4.18.0-144.el8.x86_64 #1
2019-09-23 04:50:32: [  187.886893] Hardware name: Red Hat compose, BIOS 1.12.0-5.module+el8.1.0+4022+29a53beb 04/01/2014
2019-09-23 04:50:32: [  187.888665] Call Trace:
2019-09-23 04:50:32: [  187.889193]  dump_stack+0x5c/0x80
2019-09-23 04:50:32: [  187.889880]  panic+0xe7/0x247
2019-09-23 04:50:32: [  187.890490]  do_exit.cold.22+0x59/0x81
2019-09-23 04:50:32: [  187.891292]  do_group_exit+0x3a/0xa0
2019-09-23 04:50:32: [  187.892024]  get_signal+0x159/0x850
2019-09-23 04:50:32: [  187.892744]  ? async_page_fault+0x8/0x30
2019-09-23 04:50:32: [  187.893541]  do_signal+0x36/0x610
2019-09-23 04:50:32: [  187.894212]  ? async_page_fault+0x8/0x30
2019-09-23 04:50:32: [  187.895008]  exit_to_usermode_loop+0x89/0xf0
2019-09-23 04:50:32: [  187.895872]  prepare_exit_to_usermode+0x90/0xc0
2019-09-23 04:50:32: [  187.896798]  retint_user+0x8/0x8
2019-09-23 04:50:32: [  187.897559] RIP: 0033:0x55f5d41f1910
2019-09-23 04:50:32: [  187.898314] Code: Bad RIP value.
2019-09-23 04:50:32: [  187.899127] RSP: 002b:00007ffef7231378 EFLAGS: 00010202
2019-09-23 04:50:32: [  187.899880] RAX: 0000000000000000 RBX: 000055f5d503dec0 RCX: 000055f5d44ec050
2019-09-23 04:50:32: [  187.900890] RDX: 00007ffef7231380 RSI: 00007ffef72314b0 RDI: 0000000000000007
2019-09-23 04:50:32: [  187.902229] RBP: 00007ffef7a2fea0 R08: 00007ffef7a2fe10 R09: 614e794272657355
2019-09-23 04:50:32: [  187.903691] R10: 000055f5d50d2530 R11: 802ee188c6640f5e R12: 00007ffef7a2fe30
2019-09-23 04:50:32: [  187.904984] R13: 000055f5d50d2650 R14: 000055f5d5118b38 R15: 000055f5d50f57a0

Thanks.

Comment 18 Hanna Czenczek 2019-09-26 10:11:23 UTC
I cannot, because there is nothing specific about the guest error, so I have no way to tell what causes it on the host.

But could you try just running the test that accompanies the upstream fix?

$ rm -f test.qcow2
$ qemu-img create -f qcow2 -o cluster_size=4096 test.qcow2 1M
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16

$ (for ((kb = 1024 - 4; kb >= 0; kb -= 4)); do \
    echo "aio_write -P 42 $((kb + 1))k 2k"; \
done) \
| qemu-io test.qcow2 > /dev/null

$ (for ((kb = 0; kb < 1024; kb += 4)); do \
    echo "read -P 0 ${kb}k 1k"; \
    echo "read -P 42 $((kb + 1))k 2k"; \
    echo "read -P 0 $((kb + 3))k 1k"; \
done) \
| qemu-io test.qcow2 | grep verification

If you run that test a couple of times (on XFS) and you see “Pattern verification failed” messages, then that would hint at it being the same bug.


Or if you run this on XFS:

$ qemu-img create -f raw test.raw 1M
$ strace -f -e trace=ioctl qemu-io -f raw -c 'write -z 0M 1M' test.raw 2>&1 \
      | grep '_IOC(_IOC_WRITE, 0x58, 0x39'

If the grep matches something, then we’d know that the qemu on your system does not have xfsctl disabled (and uses it when writing zeroes).


Max

Comment 19 Lukáš Doktor 2019-09-30 15:18:07 UTC
I briefly tried this on my arm machine and it didn't reproduce. Karen, can you please elaborate on your setup and ideally try the above commands?

Btw, I tried the strace check:

```
[root@gigabyte-r120-15 test]# rm test.raw
rm: cannot remove 'test.raw': No such file or directory
[root@gigabyte-r120-15 test]# qemu-img create -f raw test.raw 1M
Formatting 'test.raw', fmt=raw size=1048576
[root@gigabyte-r120-15 test]# strace -f -e trace=ioctl qemu-io -f raw -c 'write -z 0M 1M' test.raw 2>&1 \
>       | grep '_IOC(_IOC_WRITE, 0x58, 0x39'
```

As well as the verification checks (not all iterations are shown in this output):

```
[root@gigabyte-r120-15 test]# while :; do sh a.sh; done
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
^C
[root@gigabyte-r120-15 test]# cat a.sh
rm -f test.qcow2
qemu-img create -f qcow2 -o cluster_size=4096 test.qcow2 1M
(for ((kb = 1024 - 4; kb >= 0; kb -= 4)); do \
    echo "aio_write -P 42 $((kb + 1))k 2k"; \
done) \
| qemu-io test.qcow2 > /dev/null
(for ((kb = 0; kb < 1024; kb += 4)); do \
    echo "read -P 0 ${kb}k 1k"; \
    echo "read -P 42 $((kb + 1))k 2k"; \
    echo "read -P 0 $((kb + 3))k 1k"; \
done) \
| qemu-io test.qcow2 | grep verification
```

Setup:
[root@gigabyte-r120-15 test]# uname -a
Linux gigabyte-r120-15.khw4.lab.eng.bos.redhat.com 4.18.0-146.el8.aarch64 #1 SMP Tue Sep 24 21:24:39 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
[root@gigabyte-r120-15 test]# rpm -q qemu-kvm
qemu-kvm-4.1.0-12.module+el8.1.0+4292+1293ae0c.aarch64
[root@gigabyte-r120-15 test]# pwd
/home/test
[root@gigabyte-r120-15 test]# mount | grep /home
/dev/mapper/rhel_gigabyte--r120--15-home on /home type xfs (rw,relatime,seclabel,attr2,inode64,noquota)

Comment 20 SATOH Fumiyasu 2019-10-02 04:50:09 UTC
(In reply to Lukas Doktor from comment #19)

I can reproduce a similar (same?) problem on Debian sid (x86_64).

```
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux bullseye/sid"
NAME="Debian GNU/Linux"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux sugar 5.2.0-2-amd64 #1 SMP Debian 5.2.9-2 (2019-08-21) x86_64 GNU/Linux
$ mount |grep /home
/dev/mapper/sugar.2018.12.nvme1-home on /home type xfs (rw,relatime,attr2,inode64,noquota)
$ xfs_info /home
meta-data=/dev/mapper/sugar.2018.12.nvme1-home isize=512    agcount=4, agsize=39321600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=157286400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=76800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ dpkg -l |grep qemu
ii  ipxe-qemu					  1.0.0+git-20190125.36a4c85-1	      all	   PXE boot firmware - ROM images for qemu
ii  qemu-system-common				  1:4.1-1+b2			      amd64	   QEMU full system emulation binaries (common files)
ii  qemu-system-data				  1:4.1-1			      all	   QEMU full system emulation (data files)
ii  qemu-system-gui				  1:4.1-1+b2			      amd64	   QEMU full system emulation binaries (user interface and audio support)
ii  qemu-system-x86				  1:4.1-1+b2			      amd64	   QEMU full system emulation binaries (x86)
ii  qemu-utils					  1:4.1-1+b2			      amd64	   QEMU utilities
$ sh a.sh
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
qemu-io> Pattern verification failed at offset 128000, 2048 bytes
$ while :; do sh a.sh;done
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
qemu-io> Pattern verification failed at offset 50176, 2048 bytes
qemu-io> Pattern verification failed at offset 54272, 2048 bytes
qemu-io> Pattern verification failed at offset 70656, 2048 bytes
qemu-io> Pattern verification failed at offset 74752, 2048 bytes
qemu-io> Pattern verification failed at offset 78848, 2048 bytes
qemu-io> Pattern verification failed at offset 82944, 2048 bytes
qemu-io> Pattern verification failed at offset 103424, 2048 bytes
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
...snipped...
^C
```

FYI, I've created an $LD_PRELOAD hack as a workaround: https://github.com/fumiyas/linux-statfs-faker

```
$ statfs-faker -t 0xEF53 sh -c 'while :; do sh a.sh; done'
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
Formatting 'test.qcow2', fmt=qcow2 size=1048576 cluster_size=4096 lazy_refcounts=off refcount_bits=16
^C
```
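
The core idea of such an $LD_PRELOAD shim is to intercept fstatfs() and report a fake filesystem type (0xEF53 is ext4’s magic), so that callers special-casing XFS take the generic path instead. Here is a minimal sketch of the idea, assuming a hypothetical FAKE_FS_TYPE environment variable (the real tool wraps the command as `statfs-faker -t 0xEF53 ...` and likely intercepts statfs() and friends as well):

```
/* Build: gcc -shared -fPIC fake-fstatfs.c -o fake-fstatfs.so -ldl
 * Use:   LD_PRELOAD=./fake-fstatfs.so FAKE_FS_TYPE=0xEF53 qemu-io ...   */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <sys/statfs.h>

int fstatfs(int fd, struct statfs *buf)
{
    static int (*real_fstatfs)(int, struct statfs *);
    if (!real_fstatfs)
        real_fstatfs = dlsym(RTLD_NEXT, "fstatfs");

    int ret = real_fstatfs(fd, buf);
    const char *fake = getenv("FAKE_FS_TYPE");
    if (ret == 0 && fake)
        buf->f_type = strtol(fake, NULL, 0);  /* e.g. 0xEF53 for ext4 */
    return ret;
}
```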

Comment 21 SATOH Fumiyasu 2019-10-02 04:58:28 UTC
(In reply to SATOH Fumiyasu from comment #20)
> I can reproduce a similar (same?) problem on Debian sid (x86_64).

I'm using CentOS-8-x86_64-1905-boot.iso (x86_64) on QEMU, not ppc64le.

Comment 22 Hanna Czenczek 2019-10-02 10:42:37 UTC
If you’re using upstream qemu, that is no surprise, unfortunately.  (Distributions could work around it by providing --disable-xfsctl to configure, like RHEL does – or at least it was my impression that RHEL does this, and that’s what the discussion in this BZ is about.)

The bug has been fixed upstream (https://git.qemu.org/?p=qemu.git;a=commitdiff;h=b2c6f23f4a9f6d8f1b648705cd46d3713b78d6a2) and we’ll now have to wait for 4.1.1 or 4.2 so the fix is released.

Max

Comment 23 Karen Mezick 2019-10-02 20:08:09 UTC
Hi Lukas

>> I briefly tried this on my arm machine and it hasn't reproduced. 
>> Karen, can you please elaborate on your setup and ideally try the above commands?

The setup for the machine was:

Hostname:ibm-p9wr-12.pnr.lab.eng.bos.redhat.com
Arch: ppc64le
Avocado Version: 69.0
Red Hat Enterprise Linux release 8.1 Beta (Ootpa)
ipxe-roms-qemu-20181214-3.git133f4c47.el8.noarch
qemu-kvm-4.1.0-9.module+el8.1.0+4210+23b2046a.ppc64le
libvirt-5.6.0-4.module+el8.1.0+4160+b50057dc.ppc64le
kernel-4.18.0-141.el8.ppc64le

The machine is no longer available to run the commands, but I have attached a tarball with
the logs.

We have implemented the temporary workaround you suggested for our CI, making Beaker format
the root partition ('/') with ext4 instead of XFS; this way qemu-kvm won't need to interface
with XFS on the host.

If you still need me to try to reproduce the problem and run the commands in Comment 18, 
please let me know. 

Thanks,
Karen

Comment 26 Hanna Czenczek 2019-10-04 11:07:42 UTC
If qemu-io doesn’t use xfsctl() for writing zeroes (as indicated by comment 25), then the issue is a different one from that fixed by b2c6f23f4a9f6d8f1b648.

Max

Comment 27 Hanna Czenczek 2019-10-04 13:14:36 UTC
Just a note: If the problem only appears on ppc64, then it may or may not be related to BZ 1745823.

Max

Comment 28 Hanna Czenczek 2019-10-04 19:23:12 UTC
The problem is that I don’t have any lead.  I could find the root of the previous bug only because I had a hunch that maybe some xfsctl() usage was the problem.  But apparently we don’t use xfsctl(), so there shouldn’t be any difference in behavior between XFS and ext4...

The only thing I know of that looks related is said BZ 1745823.  In that BZ, Laurent bisected the problematic commit to be c8bb23cbdbe3 (which is a qcow2 optimization).  So I’ll try to brew an RPM next week with that commit reverted; it would be great if any of you could test it.


Max

Comment 29 Hanna Czenczek 2019-10-07 10:37:43 UTC
Laurent has confirmed that reverting c8bb23cbdbe3 solves the performance issue.  I’ve created a scratch build with that commit reverted, the ppc64le RPMs are here:
http://brew-task-repos.usersys.redhat.com/repos/scratch/mreitz/qemu-kvm/4.1.0/13.el8.maxx201910071208/ppc64le/

Could you please test whether the issue persists with that version?

Max

Comment 31 Hanna Czenczek 2019-10-15 16:04:23 UTC
I’ve had problems over the last week getting ppc64 machines from the virt pool (or running the Avocado RHEL 8 installation on any ppc64 machine, for that matter); on the upside, I think I’ve reproduced the problem on x86_64.

I’ve bisected it (which took two days) and have indeed arrived at c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a, which at least causes the installation to hang.  I see installation failures on master; I haven’t seen them on c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a itself yet, only timeouts.  OTOH, I have seen the installation failures very shortly after c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a, and there are no commits in between that look like they might cause failures.

Max

Comment 32 Eric Sandeen 2019-10-22 14:10:31 UTC
As a side note, please cc: filesystem folks on bugs like this so we can help as needed.

When the problematic qcow2 commit c8bb23c is in place, does it go through the xfs_write_zeroes() function that was removed in https://patchwork.ozlabs.org/patch/1161598/?  I guess I should look at all that a bit more closely to try to understand what is going wrong.  My understanding of this bug is that either with https://patchwork.ozlabs.org/patch/1161598/ applied, or with c8bb23c reverted, the problem goes away -- is that correct?

Comment 33 Hanna Czenczek 2019-10-22 14:46:16 UTC
No, it doesn’t go through that function.  Downstream, that function was always configured out anyway, and upstream I see the problem even with that patch (1161598) applied.

So the problem exists with 1161598 (commit b2c6f23f4a9f6d8) applied, as long as c8bb23c is present.  Only with c8bb23c reverted does the problem go away.

The upstream mail thread is here: https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg00560.html

(CentOS-based non-avocado reproducer is in https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg01282.html)


So what I can see right now is that the problem persists even if c8bb23c is partially reverted: that commit replaces an explicit write of zeroes at the beginning and end of a qcow2 cluster (64 kB in total by default) with a FALLOC_FL_ZERO_RANGE of the whole cluster.  But the problem is still there even when FALLOC_FL_ZERO_RANGE is done only on the part of the cluster that will later be overwritten by guest data, and the explicit zero writing is done as before the commit.

It only appears with the aio=native qemu option.

Max

Comment 34 Hanna Czenczek 2019-10-23 07:49:41 UTC
Unfortunately, applying Eric’s “xfs: don't fragment files with ZERO_RANGE calls” series to the kernel hasn’t changed the result.

Max

Comment 35 Hanna Czenczek 2019-10-24 08:57:02 UTC
It looks like performing FALLOC_FL_ZERO_RANGE and pwrite() (through the AIO interface) in parallel effectively discards the pwrite() if:
- Both are beyond the EOF, so both need to increase the file length,
- The pwrite() concerns an area after the zeroed range,
- The fallocate() settles after the pwrite().

Notably, the two requests do not overlap.

Afterwards, the file length only includes the zeroed range, but not the data written (so the data isn’t there).  It looks like the fallocate() truncates the file to its target length after the write has settled, on the assumption that this would increase the file’s length -- but by that point it actually doesn’t.  (It would have increased the length when the fallocate() was started.)


I’ve added a test case (parallel-falloc-and-pwrite.c) that does the following to reproduce the problem:
(1) Open a scratch file with O_CREAT | O_TRUNC | O_RDWR | O_DIRECT,
(2) Launch a thread that performs FALLOC_FL_ZERO_RANGE on the first 4 kB,
(3) Submit an AIO pwrite request on the second 4 kB,
(4) Wait for the pwrite, join the fallocate thread,
(5) Run hexdump -C on the file.

Whenever the fallocate() and pwrite() are both started before the other has finished, and the fallocate() is done before the pwrite(), the hexdump shows the file to contain only 4 kB of zeroes.  (This happens pretty much 100 % of the time on my system.)

You can uncomment IN_ORDER to make the requests happen sequentially, which shows how the file should really look.

On ext4 and btrfs, the result is as expected (4 kB of zeroes, then 4 kB of data).

So it looks to me like a bug in the XFS driver.
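
For reference, here is a minimal reconstruction of those five steps (the authoritative version is the attached parallel-falloc-and-pwrite.c; the names, build line, and error handling here are my own):

```
/* Build: gcc repro.c -pthread -laio -Wall -o repro
 * Run:   ./repro tmp-file      (on an XFS mount)                        */
#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <linux/falloc.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096

static int fd;

/* (2) Thread performing FALLOC_FL_ZERO_RANGE on the first 4 kB */
static void *zero_thread(void *arg)
{
    (void)arg;
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, 0, BLK) < 0)
        perror("fallocate");
    return NULL;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <scratch-file>\n", argv[0]);
        return 1;
    }

    /* (1) Open the scratch file */
    fd = open(argv[1], O_CREAT | O_TRUNC | O_RDWR | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;                           /* O_DIRECT needs alignment */
    if (posix_memalign(&buf, BLK, BLK)) return 1;
    memset(buf, 0x2a, BLK);              /* arbitrary non-zero pattern */

    io_context_t ctx = 0;
    if (io_setup(1, &ctx) < 0) { perror("io_setup"); return 1; }

    pthread_t thr;
    pthread_create(&thr, NULL, zero_thread, NULL);

    /* (3) AIO pwrite of data to the second 4 kB */
    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pwrite(&cb, fd, buf, BLK, BLK);
    if (io_submit(ctx, 1, cbs) != 1) { perror("io_submit"); return 1; }

    /* (4) Wait for the pwrite, join the fallocate thread */
    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);
    pthread_join(thr, NULL);

    /* (5) Inspect the result (the bug leaves the file only 4 kB long,
     *     i.e. the written data is gone) */
    printf("file length: %lld (expected %d)\n",
           (long long)lseek(fd, 0, SEEK_END), 2 * BLK);

    io_destroy(ctx);
    close(fd);
    return 0;
}
```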

Max

Comment 36 Hanna Czenczek 2019-10-24 08:57:58 UTC
Created attachment 1628790 [details]
Execute FALLOC_FL_ZERO_RANGE and AIO pwrite in parallel

Comment 37 Hanna Czenczek 2019-10-24 09:02:19 UTC
Comment on attachment 1628790 [details]
Execute FALLOC_FL_ZERO_RANGE and AIO pwrite in parallel

Compile this with:

$ gcc parallel-falloc-and-pwrite.c -pthread -laio -Wall -Wextra -pedantic -std=c11

And run like this:

$ ./a.out tmp-file

Comment 38 Hanna Czenczek 2019-10-24 11:48:01 UTC
Requesting blocker:

This bug prevents installation of RHEL 8 on XFS when using qcow2 (most of the time).  It may also cause guest corruption in general.  As such, it is something we should really fix.

I think we’ve found the root cause.  However, I don’t think we can address this in time for 8.1.  What we can do is revert the upstream commit that made this problem apparent (namely c8bb23cbdbe3).
For qcow2, this should fix the problem, as there shouldn’t be any other zero-writes that it performs on the protocol layer.
For raw, there is no problem, because the bug only becomes apparent for post-EOF writes (which cannot happen for raw images, as they have a fixed size).

This would also fix BZ 1745823 for the time being.

(Upstream, we should also work on a solution to work around the problem while it isn’t fixed in XFS.  But that will take more time than we have for 8.1.)

Max

Comment 43 Eric Sandeen 2019-10-25 02:48:20 UTC
(In reply to Max Reitz from comment #36)
> Created attachment 1628790 [details]
> Execute FALLOC_FL_ZERO_RANGE and AIO pwrite in parallel

thanks for this, we'll take a look.

Comment 52 Hanna Czenczek 2019-10-25 11:50:47 UTC
I’ve created BZ 1765547 to track the XFS driver part of the problem.

Max

Comment 56 qing.wang 2019-10-29 01:46:05 UTC
Created attachment 1629939 [details]
debug info

Comment 59 qing.wang 2019-10-30 02:12:38 UTC
Hi Lukas, could you please help to verify this issue with https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=998947?
This issue is hard to reproduce in my testing.

Hi Max, could you please help to check attachment https://bugzilla.redhat.com/attachment.cgi?id=1629939? The installation seems blocked at the beginning.
Do you know the reason? Is it related to this issue?

Comment 61 Hanna Czenczek 2019-10-30 08:36:18 UTC
Hi Qing,

(In reply to qing.wang from comment #59)
> Hi Max, could you please help to check attachment
> https://bugzilla.redhat.com/attachment.cgi?id=1629939? The installation
> seems blocked at the beginning.
> Do you know the reason? Is it related to this issue?

It looks like Anaconda didn’t really start in the 39-* test, right?  I don’t think that’s related to this issue, because there shouldn’t have been any accesses to the installation disk yet.  (Well, at least I can’t see any installation progress in any of the logs or screen dumps.)

As for 57-*, from what I can see (from debug.log) it looks like a network timeout.  (So that failure is unrelated, too.)

Were those test runs on qemu-kvm 4.1.0-13?

Max

Comment 62 Hanna Czenczek 2019-10-30 08:37:36 UTC
*** Bug 1743176 has been marked as a duplicate of this bug. ***

Comment 68 Laurent Vivier 2019-11-06 09:16:56 UTC
*** Bug 1745823 has been marked as a duplicate of this bug. ***

Comment 69 IBM Bug Proxy 2019-11-06 09:27:06 UTC
------- Comment From muriloo.com 2019-09-06 11:33 EDT-------
I observed different performance on my P9 DD2.3 Witherspoon (running the
latest GA firmware OP930.00). None of my tests showed differences as
significant as 60%.

On my runs:
- virt:rhel seems ~7% better than virt:8.1 for "rw=write bs=4k iodepth=8" fio case.
- virt:8.1 seems ~4% better than virt:rhel for "rw=write bs=16k iodepth=1" fio case.

#1 - virt:rhel
Host:
kernel-4.18.0-135.el8.ppc64le
qemu-kvm-2.12.0-85.module+el8.1.0+4010+d6842f29.ppc64le
-machine pseries (alias of pseries-rhel7.6.0)

Guest:
kernel-4.18.0-135.el8.ppc64le

rw=write bs=4k iodepth=8
WRITE: bw=43.1MiB/s (45.2MB/s), 43.1MiB/s-43.1MiB/s (45.2MB/s-45.2MB/s), io=2586MiB (2711MB), run=60009-60009msec

rw=write bs=16k iodepth=1
WRITE: bw=62.9MiB/s (65.9MB/s), 62.9MiB/s-62.9MiB/s (65.9MB/s-65.9MB/s), io=3772MiB (3955MB), run=60003-60003msec

#2 - virt:8.1
Host:
kernel-4.18.0-135.el8.ppc64le
qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.ppc64le
-machine pseries (alias of pseries-rhel8.1.0)

Guest:
kernel-4.18.0-135.el8.ppc64le

rw=write bs=4k iodepth=8
WRITE: bw=40.4MiB/s (42.3MB/s), 40.4MiB/s-40.4MiB/s (42.3MB/s-42.3MB/s), io=2423MiB (2540MB), run=60038-60038msec

rw=write bs=16k iodepth=1
WRITE: bw=65.3MiB/s (68.5MB/s), 65.3MiB/s-65.3MiB/s (68.5MB/s-68.5MB/s), io=3921MiB (4111MB), run=60004-60004msec

QEMU command line I used:

$ MALLOC_PERTURB_=1 numactl -m 8 \
/usr/libexec/qemu-kvm \
-name 'muriloo-vm1' \
-machine pseries \
-nodefaults \
-nographic \
-serial mon:stdio \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=$HOME/rhel-guest-image-8.1-167.ppc64le.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-drive id=drive_disk1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=$HOME/storage2.qcow2 \
-device scsi-hd,id=disk1,drive=drive_disk1 \
-netdev user,id=n0,hostfwd=tcp::5555-:22 \
-device virtio-net-pci,netdev=n0 \
-m 4096 \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
-rtc base=utc,clock=host \
-enable-kvm

------- Comment From muriloo.com 2019-10-08 14:32 EDT-------
(In reply to comment #55)
> IBM KVM team - any thoughts on this?  Seems to be Power specific...

This issue is not seen on our P9 Boston and Witherspoon machines (boston16 and ltcgen11),
and it's also not seen on P8, as Laurent Vivier reported.
So far it has only been seen on Red Hat's Boston machines.

Even though the issue is not seen on our machines, I also confirmed that reverting commit c8bb23cbdb improved the performance on the boston16 machine.

The next step here would be investigating why commit c8bb23cbdb causes this performance regression.

------- Comment From fnovak.com 2019-10-08 16:17 EDT-------
(In reply to comment #57)
> (In reply to comment #55)
> > IBM KVM team - any thoughts on this?  Seems to be Power specific...
> This issue is not seen on our P9 Boston and Witherspoon machines: boston16
> and ltcgen11.
> And it's also not seen on P8, as Laurent Vivier reported.
> It was seen on Red Hat Boston machines so far.
>
> Even though the issue is not seen on our machines, I also confirmed that
> reverting the commit c8bb23cbdb improved the performance on boston16 machine.
>
> The next step here would be investigating why commit c8bb23cbdb causes the
> performance harm.
YEAH!  And it seems this is Power-unique, or at least Power is really impacted.
Who's on point on the Power side to work with RH?

------- Comment From fnovak.com 2019-10-16 19:28 EDT-------
chandan, Aneesh - any thoughts on why falloc of a block of zeros is significantly slower than writing out zeros? It's a problem on Power but not x86. Some alignment problem, or ???

Comment 71 errata-xmlrpc 2019-11-06 12:58:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3732

Comment 72 IBM Bug Proxy 2019-11-06 20:20:29 UTC
Created attachment 1633449 [details]
Reply to comment 29 with fio raw output

Comment 74 IBM Bug Proxy 2020-04-29 12:22:23 UTC
------- Comment From hannsj_uhl.com 2020-04-29 08:14 EDT-------
..

