Bug 2241293
| Summary: | virt-df hangs using 100% of one core | | |
|---|---|---|---|
| Product: | [Community] Virtualization Tools | Reporter: | Martin J. <martinfjohansen> |
| Component: | libguestfs | Assignee: | Richard W.M. Jones <rjones> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | unspecified | CC: | mhicks, ptoscano |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-10-12 19:23:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Martin J.
2023-09-29 07:10:04 UTC
Can you run 'libguestfs-test-tool' and attach the complete, unedited output?

Created attachment 1991048
libguestfs-test-tool output

OK, now run:

LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 virt-df [etc]

(for whatever virt-df command hangs) and attach that complete output.

We are investigating the exact command being run and will come back with this information as soon as we can.

The exact command being run is "virt-df --csv". I have attached the stdout and stderr of running "LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 virt-df --csv". This run did not fail. I am trying to reproduce a failing run, but the hang happens seldom.

Created attachment 1991988
err.log of success-run

Created attachment 1991989
out.log of success-run

We finally managed to reproduce a failing run. See err5.log for stderr and out5.log for stdout. This is the command we ran:

LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 virt-df --csv 2> err5.log > out5.log

It hung on line 47993, "libguestfs: command: run: \ -rf /tmp/libguestfsfFUBad", using 100% CPU for a long time until I killed it with sudo kill.

Created attachment 1993586
err5.log - stderr of failing run

Created attachment 1993587
out5.log -- stdout of failing run

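Because the hang is rare and has to be killed by hand, the reproduction can be automated. The following is a minimal sketch, not something from libguestfs: the function name `repeat_until_hang`, its arguments, and the choice of time limit are all assumptions for illustration. It relies on GNU coreutils `timeout`, which exits with status 124 when the limit is reached (137 if SIGKILL was needed).

```shell
# Hypothetical helper (not part of libguestfs): repeat a command until one
# run exceeds a time limit, keeping the logs of the last run.
# Usage: repeat_until_hang LIMIT_SECONDS MAX_RUNS LOG_PREFIX COMMAND [ARGS...]
# Returns 0 if every run passed, 1 on a suspected hang, 2 on another failure.
repeat_until_hang() {
    local limit max_runs prefix i status
    limit=$1
    max_runs=$2
    prefix=$3
    shift 3
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # GNU timeout sends SIGTERM after $limit seconds, then SIGKILL 5 seconds
        # later if the command is still alive. Each run overwrites the logs.
        timeout -k 5 "$limit" "$@" >"$prefix.out" 2>"$prefix.err"
        status=$?
        if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
            echo "run $i exceeded ${limit}s; logs kept in $prefix.out and $prefix.err"
            return 1
        elif [ "$status" -ne 0 ]; then
            echo "run $i failed with status $status; see $prefix.err"
            return 2
        fi
        i=$((i + 1))
    done
    echo "all $max_runs runs finished within ${limit}s"
    return 0
}
```

With the command from this report it could be called as `repeat_until_hang 600 1000 /tmp/virt-df env LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 virt-df --csv`. The 600-second limit and the 1000 runs are guesses and should be set well above the duration of a normal run.
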
There's something up with qemu and/or the kernel which causes intermittent hangs, so this isn't really a virt-df or libguestfs issue. I would try something like this:

while guestfish -vx -a /dev/null run >& /tmp/log1 ; do echo -n . ; done

and leave that to run for a long time. If it hangs, examine /tmp/log1. If it appears to run forever then try running several of those loops in parallel (with differently named log files).

Another tool to look at is: https://people.redhat.com/~rjones/qemu-sanity-check/

> There's something up with qemu and/or the kernel which causes intermittent hangs, so this isn't really a virt-df or libguestfs issue.

Thanks a lot for investigating this for us. A quick follow-up: what in the logs points towards your conclusion?

> I would try something like this:

Thanks! We will try!

> libguestfs: error: guestfs_launch failed, see earlier error messages

However because all of the messages from all the appliances are mixed together, it's hard to see exactly which appliance failed to start or why. Therefore having separated logs as suggested in comment 11 will help to diagnose exactly where the kernel boot is hanging.

BTW if this is TCG then you may be hitting the infamous https://rwmj.wordpress.com/2023/06/18/follow-up-to-i-booted-linux-292612-times/ (However that only affects TCG, and is fixed in recent kernels.)

I ran the command you suggested several thousand times, and it hit the hanging bug.

guestfish -vx -a /dev/null run >& /tmp/log2099

I attached a successful run and a failing/hanging run. Might this be the infamous bug you are referring to? If so, could a kernel update fix it?

Created attachment 1993649
failing run on guestfish

Created attachment 1993650
successful run of guestfish

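The suggestion earlier in this thread to run several loops in parallel with differently named log files can be sketched as follows. This is an illustrative helper only: the name `start_parallel_loops` and the `logN` file naming are assumptions, and the guestfish invocation passed to it is the one already quoted in this report.

```shell
# Hypothetical helper: start N copies of a test loop, each with its own log.
# Usage: start_parallel_loops N LOG_DIR COMMAND [ARGS...]
start_parallel_loops() {
    local n dir i
    n=$1
    dir=$2
    shift 2
    i=1
    while [ "$i" -le "$n" ]; do
        # Each worker repeats the command until it fails, overwriting its own
        # log on every run. A hung run leaves the worker blocked, so its log
        # holds the output of exactly the run that hung.
        ( while "$@" >"$dir/log$i" 2>&1; do :; done ) &
        echo "worker $i: pid $! log $dir/log$i"
        i=$((i + 1))
    done
}
```

For example, `start_parallel_loops 4 /tmp guestfish -vx -a /dev/null run` starts four workers (the count is arbitrary). A worker whose log has stopped growing while it keeps using CPU is the one to inspect; the printed pid identifies the worker subshell so it can be killed afterwards.
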
It's not the infamous bug because it seems this is not TCG. From the good/bad outputs I would guess that the next line should be:

[ 0.169181] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz (family: 0x6, model: 0x4f, stepping: 0x1)

This is classically either a kernel or qemu bug. As these are quite old versions of both (especially qemu) I'd suggest upgrading them. The latest versions are very stable. Otherwise open a bug against Ubuntu LTS, since qemu should always be able to boot the kernel reliably. For RHEL we have started to use qemu-sanity-check to verify this.

Closing as this is not a bug in libguestfs itself.
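The good/bad comparison used above to guess the next expected boot line can be approximated mechanically. The sketch below is an assumption-laden illustration, not the method actually used in this report: the function name `next_expected_line` is made up, and it assumes that the last non-empty line of the failing log also appears in the good log once kernel timestamps of the form `[ 0.123456] ` are ignored.

```shell
# Hypothetical helper: print the line that follows, in a good log, the last
# line seen in a failing log. Kernel timestamps are ignored when matching.
# Usage: next_expected_line GOOD_LOG BAD_LOG
next_expected_line() {
    local good bad last
    good=$1
    bad=$2
    # Last non-empty line of the failing log, with any leading timestamp removed.
    last=$(awk 'NF { sub(/^\[ *[0-9.]+\] /, ""); line = $0 } END { print line }' "$bad")
    # Walk the good log; after the first line whose timestamp-stripped text
    # matches, print the next line unchanged and stop. The value is passed via
    # the environment so that backslashes in log lines are not interpreted.
    LAST="$last" awk '
        found { print; exit }
        { text = $0; sub(/^\[ *[0-9.]+\] /, "", text) }
        text == ENVIRON["LAST"] { found = 1 }
    ' "$good"
}
```

Only the first match in the good log is used, so a message that repeats during boot can give a misleading answer, and a failing log that ends in a partially written line will produce no output at all.
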