Bug 2303267

Summary: Rare hangs while starting the appliance, at 'echo noop' into /sys/block/{h,s,ub,v}d*/queue/scheduler
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: libguestfs
Assignee: Richard W.M. Jones <rjones>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rawhide
CC: abologna, berrange, clalancette, crobinso, jforbes, jiyin, laine, libvirt-maint, rjones, suraj.ghimire7, virt-maint
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Linux
URL: https://kojipkgs.fedoraproject.org//work/tasks/2119/121562119/build.log
Whiteboard:
Fixed In Version: libguestfs-1.53.5-4.fc42 libguestfs-1.53.5-4.fc41
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-08-16 09:49:24 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  guests-all-good.xml (flags: none)

Description Richard W.M. Jones 2024-08-06 20:38:00 UTC
guestfs-tools runs a test which parses an XML file using the libvirt test driver.  Recently this has started to fail in Rawhide with these errors:

I/O warning : failed to load external entity "/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml"
libvirt: Test Driver error : XML error: failed to parse xml document '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'
virt-alignment-scan: could not connect to libvirt (code 27, domain 12): XML error: failed to parse xml document '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'

The errors only seem to happen in Koji (where there is no network connection) and seem to happen after a very long timeout (perhaps 60 minutes).

This is reproducible by building guestfs-tools in Koji.

Reproducible: Always

Steps to Reproduce:
1. Build guestfs-tools in Fedora Rawhide.

Comment 1 Richard W.M. Jones 2024-08-06 20:39:52 UTC
Created attachment 2043603 [details]
guests-all-good.xml

This is the XML file being parsed.

The exact command being run is:

$ virt-alignment-scan -c test://$PWD/test-data/phony-guests/guests-all-good.xml

However, when run locally it does not fail (presumably because a network connection is available).

Comment 2 Richard W.M. Jones 2024-08-06 20:41:38 UTC
libvirt 10.5.0-2.fc41
libxml2 2.12.8-2.fc41

Other versions as here: https://kojipkgs.fedoraproject.org//work/tasks/2119/121562119/root.log

Comment 3 Andrea Bolognani 2024-08-07 13:20:36 UTC
The network connection bit is probably a red herring. I've just tried
building the package using "fedpkg local" inside a Rawhide container
and it also failed, with many other errors being reported in addition
to this specific one.

From inside the build directory, I can run the command manually and
get the same error:

  # ./align/virt-alignment-scan -c test://$PWD/test-data/phony-guests/guests-all-good.xml
  I/O warning : failed to load external entity "/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml"
  libvirt: Test Driver error : XML error: failed to parse xml document '/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'
  ./align/virt-alignment-scan: could not connect to libvirt (code 27, domain 12): XML error: failed to parse xml document '/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml'

virsh is similarly unhappy when asked to connect to that test:// URI.
The reason is ultimately very straightforward:

  # ls $PWD/test-data/phony-guests/guests-all-good.xml
  ls: cannot access '/root/guestfs-tools/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests/guests-all-good.xml': No such file or directory

Looking at the build log[1] we can find this:

  /builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/run: command timed out after 4h
  make[3]: *** [Makefile:992: windows.img] Error 124
  make[3]: Leaving directory '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests'
  make[3]: Target 'guests-all-good.xml' not remade because of errors.

So I don't think there's anything wrong in libvirt specifically, it's
just that it's been asked to load a non-existing file and it can't
really do anything sensible when faced with that request.

You could argue that "no such file or directory" would be a much
better error message than "failed to load external entity", and I
would tend to agree. Unfortunately it looks like that error message
is coming from libxml2, not libvirt, so I don't think there's much we
can do there either.


[1] https://kojipkgs.fedoraproject.org//work/tasks/2119/121562119/build.log
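
The failure mode above (a test:// URI pointing at a file that was never
generated) can be caught before libvirt is involved. A minimal sketch,
assuming a hypothetical helper named check_test_uri_file that is not part
of guestfs-tools or libvirt; it only reports the missing file in plainer
terms than the libxml2 message:

```shell
#!/bin/sh
# Hypothetical helper (not part of guestfs-tools or libvirt): confirm that
# the file behind a test:// URI exists and is readable before using it.
check_test_uri_file() {
    if [ ! -e "$1" ]; then
        echo "$1: No such file or directory" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "$1: not readable" >&2
        return 1
    fi
    return 0
}
```

It could be used in front of the command from the test, for example:

  xml=$PWD/test-data/phony-guests/guests-all-good.xml
  check_test_uri_file "$xml" && ./align/virt-alignment-scan -c "test://$xml"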

Comment 4 Daniel Berrangé 2024-08-07 13:30:25 UTC
> You could argue that "no such file or directory" would be a much
> better error message than "failed to load external entity", and I
> would tend to agree. Unfortunately it looks like that error message
> is coming from libxml2, not libvirt, so I don't think there's much we
> can do there either.

I presume we're in the virXMLParseHelper function at this stage, where we do:

    if (filename) {
        xml = xmlCtxtReadFile(pctxt, filename, NULL, parseFlags);
    } else {
        xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL, parseFlags);
    }

If so, we could take libxml out of the equation here by calling 'virFileReadAll(filename)', where we have useful error reporting, and then we would only ever have to use xmlCtxtReadDoc() in libxml.

Comment 5 Richard W.M. Jones 2024-08-07 13:33:38 UTC
Thanks for the analysis.  Now I see that the error is in fact caused by
an earlier build failure:

  /builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/run: command timed out after 4h
  make[3]: *** [Makefile:992: windows.img] Error 124
  make[3]: Leaving directory '/builddir/build/BUILD/guestfs-tools-1.53.2-build/guestfs-tools-1.53.2/test-data/phony-guests'
  make[3]: Target 'guests-all-good.xml' not remade because of errors.

preventing the guests-all-good.xml file from being generated.
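
The "Error 124" in the make output is consistent with the exit status that
GNU coreutils timeout(1) returns when the time limit expires. A minimal
sketch, assuming the run script wraps its command in timeout (the log
message suggests this, but the script is not shown here); run_with_limit
is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical wrapper illustrating how an expired time limit turns into
# exit status 124. Assumes GNU coreutils timeout(1) is installed.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "command timed out after $limit" >&2
    fi
    return "$status"
}
```

For example, 'run_with_limit 4h some-command' returns 124 if some-command
is still running after four hours. make then treats the non-zero status as
a failed recipe and does not remake targets that depend on it.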

Comment 6 Richard W.M. Jones 2024-08-12 14:50:34 UTC
Finally I managed to reproduce this using:

$ LIBGUESTFS_BACKEND_SETTINGS=force_tcg fedpkg mockbuild

The hang happens in appliance/init when we are altering the
hard disk schedulers to be noop, and just before we set up
the network, in this code:

for f in /sys/block/{h,s,ub,v}d*/queue/scheduler; do echo noop > $f; done  <-- hangs here
shopt -u nullglob

# Set up the network.
ip addr add 127.0.0.1/8 brd + dev lo scope host
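
For reference, the scheduler loop above can be sketched in a form that only
writes to queues that actually offer the requested scheduler. This is an
illustration, not the code in appliance/init and not a fix for the hang:
set_scheduler and its sysfs-root parameter are hypothetical, and whether
skipping such writes would avoid the hang is unknown. It is written in
portable sh, so nullglob and brace expansion are replaced by explicit
patterns and an existence check.

```shell
#!/bin/sh
# Hypothetical helper, not the code in appliance/init.
# $1: sysfs root (normally /sys; a parameter so it can point at a test tree)
# $2: name of the scheduler to select
set_scheduler() {
    sysroot=$1
    wanted=$2
    for f in "$sysroot"/block/hd*/queue/scheduler \
             "$sysroot"/block/sd*/queue/scheduler \
             "$sysroot"/block/ubd*/queue/scheduler \
             "$sysroot"/block/vd*/queue/scheduler; do
        # An unmatched pattern is left as a literal string; skip it.
        [ -e "$f" ] || continue
        # The file lists the offered schedulers with the active one in
        # brackets, for example "[mq-deadline] kyber none".
        if tr -d '[]' < "$f" | tr ' ' '\n' | grep -qxF "$wanted"; then
            echo "$wanted" > "$f"
        else
            echo "skipping $f: $wanted not offered" >&2
        fi
    done
}
```

Called as 'set_scheduler /sys noop', this would attempt the same writes as
the loop in appliance/init, except on queues where noop is not listed.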

Comment 7 Richard W.M. Jones 2024-08-16 07:36:06 UTC
This is actually a new kernel bug, but instead of trying to track it down, I
removed the lines from libguestfs appliance/init.  Patch coming up ...

Comment 8 Richard W.M. Jones 2024-08-16 07:48:38 UTC
Upstream bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=219166

Comment 10 Fedora Update System 2024-08-16 09:20:57 UTC
FEDORA-2024-40b4f44bd1 (libguestfs-1.53.5-4.fc42) has been submitted as an update to Fedora 42.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-40b4f44bd1

Comment 11 Fedora Update System 2024-08-16 09:44:13 UTC
FEDORA-2024-5e40799695 (libguestfs-1.53.5-4.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-5e40799695

Comment 12 Fedora Update System 2024-08-16 09:49:24 UTC
FEDORA-2024-40b4f44bd1 (libguestfs-1.53.5-4.fc42) has been pushed to the Fedora 42 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 13 Fedora Update System 2024-08-16 09:49:29 UTC
FEDORA-2024-5e40799695 (libguestfs-1.53.5-4.fc41) has been pushed to the Fedora 41 stable repository.
If the problem still persists, please make a note of it in this bug report.