Bug 1317843
| Summary: | `virt-builder --update` fails with: "dnf -y --best upgrade: command exited with an error" | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Kashyap Chamarthy <kchamart> | ||||
| Component: | dnf | Assignee: | Packaging Maintenance Team <packaging-team-maint> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 23 | CC: | awilliam, jsilhan, mluscon, packaging-team-maint, pnemade, ptoscano, rbalakri, rjones, vmukhame | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-03-29 22:13:28 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
(In reply to Kashyap Chamarthy from comment #0) [...] > Expected results > ---------------- > > `virt-builder` fails to successfully update a Fedora-23 machine. Copy+paste error above :-) Of course I meant: `virt-builder` *successfully* creates and updates a Fedora-23 VM If I remove "--update" flag, the command succeeds: $ virt-builder fedora-23 -o devstack1.qcow2 --format qcow2 --root-password password:fedora --selinux-relabel --size 100G [ 1.2] Downloading: http://libguestfs.org/download/builder/fedora-23.xz [ 2.0] Planning how to build this image [ 2.0] Uncompressing [ 5.5] Resizing (using virt-resize) to expand the disk to 100.0G [ 34.9] Opening the new disk [ 46.1] Setting a random seed [ 46.1] Setting passwords [ 47.2] SELinux relabelling [ 47.3] Finishing off Output file: devstack1.qcow2 Output size: 100.0G Output format: qcow2 Total usable space: 99.4G Free space: 98.6G (99%) *** Bug 1320754 has been marked as a duplicate of this bug. *** This is brute-force and slow, but it does find the true cause of the
error (eventually):
$ LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_BACKEND=direct \
virt-builder fedora-23 --arch x86_64 \
-o disk_f23_minimal_x86_64.img --size 20G \
--no-delete-on-failure \
--install strace \
--run-command "strace -o /tmp/log -f -s 1500 dnf -y -v --best update"
$ virt-cat disk_f23_minimal_x86_64.img /tmp/log | less
I believe it is caused by dnf not setting up a SIGPIPE handler,
and therefore dying (in librpm, but that's a coincidence) after
a SIGPIPE signal is received.
The log file is absolutely massive, so I'll upload it to my website
shortly.
Thanks a lot for looking into it, Richard! http://oirase.annexia.org/tmp/bz1317843/ I've just uploaded the last 100000 lines of the log file now. It's still compressing the whole log file, and I'll upload that when done. The log is 648MB. Try not to kill my webserver :-/ sha256 of the compressed log: 5a7a96b944d14cb55c42627dce16c1591f737c5fd176606fd472e451ed004e40 /tmp/log.xz I tried to capture a stack trace from gdb on exit. It wasn't completely successful, but it does still point to something to do with SIGPIPE. ------------------- [...] Cleanup : glibc-2.22-3.fc23.x86_64 283/285Detaching after fork from child process 1193. Cleanup : tzdata-2015g-1.fc23.noarch 284/285 Cleanup : libgcc-5.1.1-4.fc23.x86_64 285/285Detaching after fork from child process 1194. Detaching after fork from child process 1195. Detaching after fork from child process 1196. Detaching after fork from child process 1199. 1039 blocks Detaching after fork from child process 1588. No '/dev/log' or 'logger' included for syslog logging No '/dev/log' or 'logger' included for syslog logging Detaching after fork from child process 26818. Detaching after fork from child process 26822. Detaching after fork from child process 26823. Detaching after fork from child process 26825. Detaching after fork from child process 26826. Detaching after fork from child process 1040. Program received signal SIGPIPE, Broken pipe. 0x00007ffff6d56d5d in write () from /lib64/libc.so.6 Thread 1 (Thread 0x7ffff7fee700 (LWP 628)): #0 0x00007ffff6d56d5d in write () from /lib64/libc.so.6 No symbol table info available. #1 0x0000000000000000 in ?? () No symbol table info available. A debugging session is active. Inferior 1 [process 628] will be killed. Quit anyway? (y or n) [answered Y; input not from terminal] ------------------- The reproducer command you can try for yourself is this (all on a single line): LIBGUESTFS_MEMSIZE=4096 LIBGUESTFS_BACKEND=direct virt-builder fedora-23 --arch x86_64 -o disk_f23_minimal_x86_64.img --size 20G --no-delete-on-failure --install gdb --run-command 'gdb --batch -ex "break _exit" -ex "handle SIGPIPE pass" -ex "run" -ex "t a a bt full" -ex "quit" --args python3 /usr/bin/dnf -y -v --best update; exit 1' (Refer to the log.tail.xz file in http://oirase.annexia.org/tmp/bz1317843/ ) The failure happens in librpm's runExtScript() function. The dnf parent process writes a script called /var/tmp/rpm-tmp.PDQj6U. This script contains: ---------------------- /usr/bin/mandb -q # update cache ---------------------- Then dnf creates a pipe to write to this script. It then forks and execs the script. The shell is PID 1004. mandb is PID 1005. Into the pipe (connected to stdin of the above script) dnf writes: /usr/share/man/aa /usr/share/man/aa/man0p /usr/share/man/aa/man1 /usr/share/man/aa/man1p /usr/share/man/aa/man1x ... many many more lines of the same thing However as far as I can tell mandb never reads its stdin at all. Also mandb does NOT exit with any error. It appears that mandb is not expecting any input (also the mandb man page indicates that it is not expecting to read from stdin). Eventually when control is returned to dnf's write() syscall, it returns EPIPE and dnf gets SIGPIPE and dnf then crashes. OK I see. In the latest librpm, this commit was added:
commit 90d8cc16486479441477e89c2e09bd4f9f7604bb
Author: Lubos Kardos <lkardos>
Date: Fri Oct 30 14:42:32 2015 +0100
Ignore SIGPIPE signals during execucton of scriptlets (rhbz:1264198)
It looks as if this is a duplicate of bug 1264198.
*** This bug has been marked as a duplicate of bug 1264198 ***
There is an updated virt-builder fedora-23 image which fixes this problem. |
Created attachment 1136536 [details] Not truncated debugging plus tracing output (virt-builder -v -x [...]) Description of problem ----------------------- Running this: $ virt-builder fedora-23 -o devstack1.qcow2 --format qcow2 \ --root-password password:123456 --update --selinux-relabel \ --size 80G results in (not truncated verbose plus tracing output is attached): [...] No '/dev/log' or 'logger' included for syslog logging No '/dev/log' or 'logger' included for syslog logging virt-builder: error: dnf -y --best upgrade: command exited with an error If reporting bugs, run virt-builder with debugging enabled and include the complete output: virt-builder -v -x [...] However, the above command returns _success_ $ echo $? 0 Version ------- libguestfs-tools-c-1.32.3-2.fc23.x86_64 qemu-system-x86-2.4.1-7.fc23.x86_64 libvirt-daemon-kvm-1.2.18.2-2.fc23.x86_64 How reproducible: Consistently. Steps to Reproduce ------------------ $ virt-builder fedora-23 -o devstack1.qcow2 --format qcow2 \ --root-password password:123456 --update --selinux-relabel \ --size 80G Actual results -------------- `virt-builder` fails to successfully build and update a Fedora-23 machine. (Also, refer the attached complete debugging output.) Expected results ---------------- `virt-builder` fails to successfully update a Fedora-23 machine.