Bug 1922248 - TCG change breaks running "mount" binary on s390x
Summary: TCG change breaks running "mount" binary on s390x
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.4
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.5
Assignee: David Hildenbrand
QA Contact: virt-qe-z
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-29 14:17 UTC by Miroslav Rezanina
Modified: 2021-02-24 12:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 12:58:09 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Miroslav Rezanina 2021-01-29 14:17:22 UTC
The module build for the weekly rebase build fails on libguestfs [1][2].

This build is part of module build 9654 [3].

I am not sure of the exact cause of the failure, but a probable candidate is this line:

   guestfsd: failed to initialize device name translation cache


[1] http://download.eng.bos.redhat.com/brewroot/work/tasks/8808/34588808/build.log
[2] http://batcave.lab.eng.brq.redhat.com/www/log_bck/libguestfs-s390x-wrb-build.log
[3] https://mbs.engineering.redhat.com/module-build-service/2/module-builds/9654

Comment 1 Miroslav Rezanina 2021-01-29 14:20:27 UTC
The fix is targeted at 8.5.0, where the rebase should land. However, we will need a private branch based on 8.4 with this patch so that the weekly rebase build can run.

Comment 2 Richard W.M. Jones 2021-02-01 09:13:26 UTC
I didn't realise this was failing on s390 :-(  I tried it on x86-64
locally and of course there was no problem.  Unfortunately none of
the usual s390 servers in Boston are reachable at the moment.

However looking closely at the log I can see there is something
really very broken:

(1) The libguestfs /init script fails to mount anything:

mount: /proc: wrong fs type, bad option, bad superblock on /proc, missing codepage or helper program, or other error.
/init: line 38: /proc/cmdline: No such file or directory
mount: /sys: wrong fs type, bad option, bad superblock on /sys, missing codepage or helper program, or other error.
mount: /run: wrong fs type, bad option, bad superblock on tmpfs, missing codepage or helper program, or other error.
mount: /dev: wrong fs type, bad option, bad superblock on /dev, missing codepage or helper program, or other error.
mount: /dev/pts: wrong fs type, bad option, bad superblock on /dev/pts, missing codepage or helper program, or other error.
mount: /dev/shm: wrong fs type, bad option, bad superblock on shmfs, missing codepage or helper program, or other error.

(2) As a result of this, /dev is not populated.  Because we do not
manage to mount the devtmpfs over /dev, /dev starts with only a few files
(ones that happen to be present in RPMs).  There are therefore lots of errors
such as:

  dd: failed to open '/dev/urandom': No such file or directory
  Failed to redirect standard streams to /dev/null: No such file or directory

indicating that /dev is completely broken.  I guess udevd is also not running
or is running but not able to create device nodes.

(3) Lots of things break because of lack of /dev, and the final guestfsd
error message is just a side-effect of this.

So yeah .. why the heck doesn't that mount command work?

  https://github.com/libguestfs/libguestfs/blob/30f74f38bd6e42e783ba80895f4d6826abddd417/appliance/init#L35
  mount -t proc /proc /proc
  mount -t sysfs /sys /sys
  mount -t tmpfs -o "nosuid,size=20%,mode=0755" tmpfs /run

My feeling is the kernel is broken or lacking a kernel module.
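
The split described above between the root cause (mount itself failing)
and the downstream symptoms (missing files under /proc and /dev) can be
checked mechanically on a saved log.  This is only a minimal sketch:
triage_log is a hypothetical helper, not part of libguestfs, and the two
patterns are taken from the error lines quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper, not part of libguestfs: summarise an appliance
# boot log read from stdin.
#   root_cause = lines where mount itself failed
#   symptoms   = other "No such file or directory" lines
triage_log() {
    awk '
        /mount: .*wrong fs type/    { root++; next }
        /No such file or directory/ { symptoms++ }
        END { printf "root_cause=%d symptoms=%d\n", root, symptoms }
    '
}
```

Usage would be something like "triage_log < build.log".  A non-zero
root_cause count suggests the later errors are side effects of the
failed mounts rather than separate bugs.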

Comment 3 Richard W.M. Jones 2021-02-01 19:00:47 UTC
I was able to reproduce this with upstream qemu.  It only occurs
under TCG (not KVM), so I guess it's an emulation bug.  Next I will
try to bisect to find the commit which causes the problem.

Comment 4 Richard W.M. Jones 2021-02-01 20:43:34 UTC
8fe35e0444be88de4e3ab80a2a0e210a1f6d663d is the first bad commit
commit 8fe35e0444be88de4e3ab80a2a0e210a1f6d663d
Author: Richard Henderson <richard.henderson>
Date:   Mon Mar 30 20:42:43 2020 -0700

    tcg/optimize: Use tcg_constant_internal with constant folding
    
    Signed-off-by: Richard Henderson <richard.henderson>

 tcg/optimize.c | 108 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 49 insertions(+), 59 deletions(-)
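
For anyone repeating the bisection, a test script for "git bisect run"
could look roughly like the sketch below.  It is not the script that was
actually used here.  QEMU_BUILD is a hypothetical variable naming an
already configured qemu build directory, the binary location under it is
an assumption that may differ between qemu versions, and the script
assumes the "Starting /init script" line is also printed on good commits.

```shell
#!/bin/sh
# Sketch of a test script for `git bisect run`.  Exit codes follow git's
# convention: 0 = good, 1 = bad, 125 = cannot test this commit (skip),
# anything above 127 aborts the bisection.

# Map a libguestfs-test-tool log (file path in $1) to a bisect verdict.
verdict_from_log() {
    if grep -q 'mount: /proc: wrong fs type' "$1"; then
        return 1      # mount failed inside the guest: bad commit
    elif grep -q 'Starting /init script' "$1"; then
        return 0      # /init ran and mount did not complain: good commit
    else
        return 125    # the appliance never reached /init: skip
    fi
}

# Rebuild qemu in $QEMU_BUILD (hypothetical variable), boot the appliance
# under TCG and classify the resulting log.
bisect_step() {
    if [ -z "${QEMU_BUILD:-}" ]; then
        echo "set QEMU_BUILD to the qemu build directory" >&2
        return 255    # abort the bisection instead of guessing
    fi
    log=$(mktemp) || return 125
    if ! make -C "$QEMU_BUILD" -j"$(nproc)" >/dev/null 2>&1; then
        rm -f "$log"
        return 125    # this commit does not build: skip it
    fi
    LIBGUESTFS_BACKEND=direct \
    LIBGUESTFS_BACKEND_SETTINGS=force_tcg \
    LIBGUESTFS_HV="$QEMU_BUILD/s390x-softmmu/qemu-system-s390x" \
        libguestfs-test-tool >"$log" 2>&1
    verdict_from_log "$log"
    rc=$?
    rm -f "$log"
    return "$rc"
}

# Only run the step when invoked as: sh bisect-step.sh run
if [ "${1:-}" = "run" ]; then
    bisect_step
    exit $?
fi
```

After "git bisect start" and marking one known-good and one known-bad
commit, this would be driven with something like
"QEMU_BUILD=/path/to/qemu/build git bisect run sh /path/to/bisect-step.sh run".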

Comment 6 Richard W.M. Jones 2021-02-02 08:29:10 UTC
How to reproduce:

(1) Install /usr/bin/libguestfs-test-tool from RHEL on s390x.

(2) Compile qemu from git (I am using commit 74208cd252c5).

(3) Run the following command, all on one line.  Note you have to
adjust the path to your local copy of qemu as appropriate:

LIBGUESTFS_BACKEND=direct LIBGUESTFS_BACKEND_SETTINGS=force_tcg LIBGUESTFS_HV=/path/to/qemu/build/s390x-softmmu/qemu-system-s390x libguestfs-test-tool

(4) If you have reproduced the bug you will see the following in the output
(and there will be a cascade of failures following this):

  Starting /init script ...
  mount: /proc: wrong fs type, bad option, bad superblock on /proc, missing codepage or helper program, or other error.
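
Steps (3) and (4) can be combined into a small wrapper so the check does
not depend on reading the output by eye.  This is a sketch only:
run_repro and first_mount_failure are hypothetical names, and the qemu
path passed in is whatever your local build produced.

```shell
#!/bin/sh
# Print the first mount failure seen after /init starts.  Exit status is
# 0 if one is found (bug reproduced) and 1 otherwise.
first_mount_failure() {
    awk '
        /Starting \/init script/ { in_init = 1; next }
        in_init && /mount: .*wrong fs type/ { print; found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical wrapper around step (3).  $1 is the path to the locally
# built qemu-system-s390x binary.
run_repro() {
    LIBGUESTFS_BACKEND=direct \
    LIBGUESTFS_BACKEND_SETTINGS=force_tcg \
    LIBGUESTFS_HV="$1" \
        libguestfs-test-tool 2>&1 | first_mount_failure
}
```

For example, "run_repro /path/to/qemu/build/s390x-softmmu/qemu-system-s390x"
prints the offending mount line and returns 0 when the bug is present.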

Comment 7 Thomas Huth 2021-02-02 08:39:17 UTC
(In reply to Richard W.M. Jones from comment #5)
> See also: https://bugs.launchpad.net/qemu/+bug/1912065

The fix for that bug has been merged here:

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=ae30e86661b0f48562cd95

So if you are still seeing problems, I guess it's a different bug instead?

Comment 8 Richard W.M. Jones 2021-02-02 09:59:50 UTC
Yes it's something different.  I don't see any warning, assertion or crash.
What happens is a failure in guest emulation affecting the mount command
(possibly other commands too, but that's the first command we run).  However
I bisected it yesterday to the same commit.

Comment 9 smitterl 2021-02-02 10:43:45 UTC
qa-ack'ing scope discussed on IRC with Miroslav: regression only

Comment 10 Richard W.M. Jones 2021-02-04 16:52:07 UTC
Philippe posted this patch which worked for me:
https://lists.nongnu.org/archive/html/qemu-devel/2021-02/msg01640.html

Comment 11 Thomas Huth 2021-02-10 12:55:33 UTC
If I've got that right, the fix has been merged here:
https://gitlab.com/qemu-project/qemu/-/commit/8e43c5a1f289ce002e9d2610
... could you please check whether this final version fixes the issue for you?

Comment 12 Richard W.M. Jones 2021-02-10 13:03:01 UTC
The hardware has gone back to Beaker now, but that is the exact version
I tested on s390x last week.  Now that it's upstream, my understanding is
that it will be picked up in the next WRB build (today?).

Comment 13 Philippe Mathieu-Daudé 2021-02-10 21:47:25 UTC
(In reply to Thomas Huth from comment #11)
> If I've got that right, the fix has been merged here:
> https://gitlab.com/qemu-project/qemu/-/commit/8e43c5a1f289ce002e9d2610
> ... could you please check whether this final version fixes the issue for
> you?

Yes, this is the correct (merged) commit.

Comment 14 Thomas Huth 2021-02-24 11:56:02 UTC
Mirek, is the latest rebase build working again?

Comment 15 Miroslav Rezanina 2021-02-24 12:58:09 UTC
(In reply to Thomas Huth from comment #14)
> Mirek, is the latest rebase build working again?

Yes, it is working.

