Bug 2303267 - Rare hangs while starting the appliance, at 'echo noop' into /sys/block/{h,s,ub,v}d*/queue/scheduler
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libguestfs
Version: rawhide
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact: Fedora Extras Quality Assurance
URL: https://kojipkgs.fedoraproject.org//w...
Whiteboard:
Depends On:
Blocks:
Reported: 2024-08-06 20:38 UTC by Richard W.M. Jones
Modified: 2024-08-16 09:49 UTC
CC List: 11 users

Fixed In Version: libguestfs-1.53.5-4.fc42 libguestfs-1.53.5-4.fc41
Clone Of:
Environment:
Last Closed: 2024-08-16 09:49:24 UTC
Type: ---
Embargoed:


Attachments
guests-all-good.xml (7.01 KB, text/plain)
2024-08-06 20:39 UTC, Richard W.M. Jones


Links
System ID Private Priority Status Summary Last Updated
Linux Kernel 219166 0 P3 NEW ext4 hang when setting echo noop > /sys/block/sda/queue/scheduler 2024-08-16 07:48:38 UTC

Description Richard W.M. Jones 2024-08-06 20:38:00 UTC
guestfs-tools runs a test which parses an XML file using the libvirt test driver.  Recently this has started to fail in Rawhide with these errors:

I/O warning : failed to load external entity "/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml"
libvirt: Test Driver error : XML error: failed to parse xml document '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'
virt-alignment-scan: could not connect to libvirt (code 27, domain 12): XML error: failed to parse xml document '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'

The errors seem to happen only in Koji (where there is no network connection), and only after a very long timeout (perhaps 60 minutes).

This is reproducible by building guestfs-tools in Koji.

Reproducible: Always

Steps to Reproduce:
1. Build guestfs-tools in Fedora Rawhide.

Comment 1 Richard W.M. Jones 2024-08-06 20:39:52 UTC
Created attachment 2043603
guests-all-good.xml

This is the XML file being parsed.

The exact command being run is:

$ virt-alignment-scan -c test://$PWD/test-data/phony-guests/guests-all-good.xml

However, when run locally (presumably because there is a network connection) it does not fail.

Comment 2 Richard W.M. Jones 2024-08-06 20:41:38 UTC
libvirt 10.5.0-2.fc41
libxml2 2.12.8-2.fc41

Other versions as here: https://kojipkgs.fedoraproject.org//work/tasks/2119/121562119/root.log

Comment 3 Andrea Bolognani 2024-08-07 13:20:36 UTC
The network connection bit is probably a red herring. I've just tried
building the package using "fedpkg local" inside a Rawhide container
and it also failed, with many other errors being reported in addition
to this specific one.

From inside the build directory, I can run the command manually and
get the same error:

  # ./align/virt-alignment-scan -c test://$PWD/test-data/phony-guests/guests-all-good.xml
  I/O warning : failed to load external entity "/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml"
  libvirt: Test Driver error : XML error: failed to parse xml document '/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'
  ./align/virt-alignment-scan: could not connect to libvirt (code 27, domain 12): XML error: failed to parse xml document '/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'

virsh is similarly unhappy when asked to connect to that test:// URI.
The reason is ultimately very straightforward:

  # ls $PWD/test-data/phony-guests/guests-all-good.xml
  ls: cannot access '/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml': No such file or directory

Looking at the build log[1] we can find this:

  /builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/run: command timed out after 4h
  make[3]: *** [Makefile:992: windows.img] Error 124
  make[3]: Leaving directory '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests'
  make[3]: Target 'guests-all-good.xml' not remade because of errors.

So I don't think there's anything wrong in libvirt specifically, it's
just that it's been asked to load a non-existing file and it can't
really do anything sensible when faced with that request.

You could argue that "no such file or directory" would be a much
better error message than "failed to load external entity", and I
would tend to agree. Unfortunately it looks like that error message
is coming from libxml2, not libvirt, so I don't think there's much we
can do there either.


[1] https://kojipkgs.fedoraproject.org//work/tasks/2119/121562119/build.log
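
Since the parse error really means "the file behind the test:// URI is missing", a test wrapper could check for that before calling virt-alignment-scan. The following is only a minimal sketch: check_test_uri is a hypothetical helper, not something that exists in guestfs-tools or libvirt, and it assumes the URI has the plain test://PATH form used in this bug.

```shell
# Hypothetical helper (not part of guestfs-tools or libvirt):
# verify that the file behind a test://PATH URI exists.
check_test_uri() {
    case $1 in
        test:///default) return 0 ;;   # libvirt's built-in test config, no file
        test://*) ;;
        *) echo "error: not a test:// URI: $1" >&2; return 2 ;;
    esac
    # Strip the scheme; what remains is the path to the XML file.
    set -- "${1#test://}"
    if [ ! -e "$1" ]; then
        echo "error: $1: No such file or directory" >&2
        return 1
    fi
}
```

A test script could then run something like check_test_uri "test://$PWD/test-data/phony-guests/guests-all-good.xml" before the real command, so that a missing file is reported as such rather than as an XML error.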

Comment 4 Daniel Berrangé 2024-08-07 13:30:25 UTC
> You could argue that "no such file or directory" would be a much
> better error message than "failed to load external entity", and I
> would tend to agree. Unfortunately it looks like that error message
> is coming from libxml2, not libvirt, so I don't think there's much we
> can do there either.

I presume we're in the virXMLParseHelper method at this stage where we do


    if (filename) {
        xml = xmlCtxtReadFile(pctxt, filename, NULL, parseFlags);
    } else {
        xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags);
    }

If so, we could take libxml out of the equation here by calling 'virFileReadAll(filename)', where we have useful error reporting; then we would only ever have to use xmlCtxtReadDoc() in libxml.
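
The same split (do the file I/O yourself, with your own error messages, and let the parser see only in-memory data) can be illustrated in shell. This is an analogy, not libvirt code: read_then_parse is a hypothetical name, and the return values 2 and 13 are merely chosen to echo the usual ENOENT and EACCES errno numbers.

```shell
# Hypothetical helper: open FILE ourselves so that I/O failures are
# reported in our own words, then run the parser with FILE on stdin.
# Usage: read_then_parse FILE PARSER [ARGS...]
read_then_parse() {
    if [ ! -e "$1" ]; then
        echo "error: $1: No such file or directory" >&2
        return 2
    fi
    if [ ! -r "$1" ]; then
        echo "error: $1: Permission denied" >&2
        return 13
    fi
    _rtp_file=$1
    shift
    # The parser never sees the file name, only the data stream.
    "$@" < "$_rtp_file"
}
```

For example, read_then_parse guests-all-good.xml xmllint --noout - would leave xmllint to report only genuine XML problems, assuming xmllint is installed.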

Comment 5 Richard W.M. Jones 2024-08-07 13:33:38 UTC
Thanks for the analysis.  Now I see that the error is in fact caused by
an earlier build failure:

  /builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/run: command timed out after 4h
  make[3]: *** [Makefile:992: windows.img] Error 124
  make[3]: Leaving directory '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests'
  make[3]: Target 'guests-all-good.xml' not remade because of errors.

preventing the guests-all-good.xml file from being generated.
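
Because the visible failure came long after the real one, it can help to pull the first make error and the skipped targets out of build.log before reading the test output. A minimal sketch follows; first_make_error and skipped_targets are hypothetical helper names, and the patterns assume GNU make's English messages as quoted above. The "Error 124" in that log is consistent with the exit status GNU timeout uses when a command times out.

```shell
# Hypothetical helpers for triaging a build log produced by GNU make.

# Print the first failed recipe, prefixed with its line number in the log.
first_make_error() {
    grep -n -E '^make(\[[0-9]+\])?: \*\*\* ' "$1" | head -n 1
}

# Print the targets that make skipped because an earlier recipe failed.
skipped_targets() {
    sed -n "s/.*Target '\([^']*\)' not remade because of errors.*/\1/p" "$1"
}
```

Running both against the Koji build.log would point at windows.img as the first failure and list guests-all-good.xml among the targets that were never built.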

Comment 6 Richard W.M. Jones 2024-08-12 14:50:34 UTC
Finally I managed to reproduce this using:

$ LIBGUESTFS_BACKEND_SETTINGS=force_tcg fedpkg mockbuild

The hang happens in appliance/init when we are altering the
hard disk schedulers to be noop, and just before we set up
the network, in this code:

for f in /sys/block/{h,s,ub,v}d*/queue/scheduler; do echo noop > $f; done  <-- hangs here
shopt -u nullglob

# Set up the network.
ip addr add 127.0.0.1/8 brd + dev lo scope host
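
For reference, a more defensive form of that loop would write a scheduler name only when the kernel actually lists it in the sysfs file. This is a sketch only: set_scheduler_if_offered is a hypothetical helper, it is not the fix that went into libguestfs (which removed the lines), and it is not known to avoid the kernel hang. As far as I know, multi-queue kernels spell the no-op choice "none" rather than "noop".

```shell
# Hypothetical helper: write NAME to a queue/scheduler FILE only if
# the kernel lists NAME as an available choice.
# Usage: set_scheduler_if_offered FILE NAME
set_scheduler_if_offered() {
    # The file holds one line such as "mq-deadline kyber [none]";
    # the brackets mark the active scheduler, so strip them first.
    _offered=$(sed 's/[][]//g' "$1") || return 2
    case " $_offered " in
        *" $2 "*) printf '%s\n' "$2" > "$1" ;;
        *) return 1 ;;   # not offered: leave the file untouched
    esac
}

# Possible use in an init script (deliberately not executed here):
#   for f in /sys/block/{h,s,ub,v}d*/queue/scheduler; do
#       set_scheduler_if_offered "$f" none || true
#   done
```

The helper returns 1 without writing anything when the requested name is not offered, so a stale name such as noop is skipped rather than written.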

Comment 7 Richard W.M. Jones 2024-08-16 07:36:06 UTC
This is actually a new kernel bug, but instead of trying to track it down, I
removed the lines from libguestfs appliance/init.  Patch coming up ...

Comment 8 Richard W.M. Jones 2024-08-16 07:48:38 UTC
Upstream bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=219166

Comment 10 Fedora Update System 2024-08-16 09:20:57 UTC
FEDORA-2024-40b4f44bd1 (libguestfs-1.53.5-4.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-40b4f44bd1

Comment 11 Fedora Update System 2024-08-16 09:44:13 UTC
FEDORA-2024-5e40799695 (libguestfs-1.53.5-4.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-5e40799695

Comment 12 Fedora Update System 2024-08-16 09:49:24 UTC
FEDORA-2024-40b4f44bd1 (libguestfs-1.53.5-4.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 13 Fedora Update System 2024-08-16 09:49:29 UTC
FEDORA-2024-5e40799695 (libguestfs-1.53.5-4.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make a note of it in this bug report.

