Red Hat Bugzilla – Full Text Bug Listing
|Summary:|New coreutils on rawhide appears to cause failures in package building|
|Product:|[Fedora] Fedora|
|Component:|coreutils|
|Version:|rawhide|
|Status:|CLOSED RAWHIDE|
|Reporter:|Alex Lancaster <alexl>|
|Assignee:|Ondrej Vasik <ovasik>|
|QA Contact:|Fedora Extras Quality Assurance <extras-qa>|
|CC:|bnocera, meyering, mtasaka, rjones, rpm, twaugh|
|Doc Type:|Bug Fix|
|Last Closed:|2008-06-09 08:26:58 EDT|
Description Alex Lancaster 2008-06-04 04:48:46 EDT
Something in the new coreutils appears to make commands such as touch fail, and this is causing many rawhide build failures. Here's just one: http://koji.fedoraproject.org/koji/getfile?taskID=644607&name=build.log See also the thread on fedora-devel-list: http://www.redhat.com/archives/fedora-devel-list/2008-June/thread.html#00153 At least five other packages have been affected by this. (Also reported in fedora-infrastructure: https://fedorahosted.org/fedora-infrastructure/ticket/593 )
Comment 1 Alex Lancaster 2008-06-04 05:00:20 EDT
Comment 2 Ondrej Vasik 2008-06-04 05:52:36 EDT
As I said on the devel list, the problem is likely caused by the combination of the old RHEL5 Xen kernel in koji and gl_futimens(). I'm not able to reproduce it on my machine and am still trying to get an strace of the failure; it would be really helpful to have one. It seems to be some kind of race condition: f-spot, which was reported as failing, passed a scratch build with coreutils-6.12-1.fc10 without trouble ( http://koji.fedoraproject.org/koji/taskinfo?taskID=644691 ).
Comment 3 Alex Lancaster 2008-06-04 08:18:55 EDT
Another failing build, LabPlot: http://koji.fedoraproject.org/koji/getfile?taskID=645053&name=build.log I don't know how to get a trace of a "real" build on koji itself; that probably needs the koji admins.
Comment 4 Ondrej Vasik 2008-06-04 08:46:55 EDT
You can get an strace easily by putting strace in front of the failing touch/cp command (and adding BuildRequires: strace). The strace output will then appear in build.log, since it is written to stderr. I tried that in a scratch build, but the issue hasn't shown up in the three scratch builds I've tried so far.
Comment 5 Ondrej Vasik 2008-06-04 10:32:17 EDT
Created attachment 308346: Strace of the failure. So far, most of the failures have been on the xenbuilder2 machine.
Comment 6 Jim Meyering 2008-06-04 11:05:22 EDT
Hi, as I just replied on fedora-devel, this looks like the same problem reported in this thread: http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/13684 Eric Blake fixed that with this change to gnulib's utimens.c: http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=93f08406537 This is probably the result of configuring/building coreutils on a new kernel and running the resulting binaries on an old kernel; with the gnulib fix, the tools work around that particular abuse at run time.
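The kind of run-time workaround described above can be pictured with a minimal C sketch. It is not the gnulib utimens.c code from that commit; it assumes only that a kernel without utimensat() makes the call fail with ENOSYS, and `set_file_times` and `utimensat_missing` are hypothetical names. The first ENOSYS is remembered so later calls go straight to the utimes() fallback.

```c
#define _XOPEN_SOURCE 700   /* expose utimensat(), utimes(), AT_FDCWD */
#include <errno.h>          /* errno, ENOSYS */
#include <fcntl.h>          /* AT_FDCWD */
#include <sys/stat.h>       /* utimensat() */
#include <sys/time.h>       /* utimes(), struct timeval */
#include <time.h>           /* struct timespec */

/* Hypothetical flag: set once the running kernel has reported that
   utimensat() does not exist, so later calls skip straight to utimes(). */
static int utimensat_missing = 0;

/* Hypothetical helper: set PATH's access time to TS[0] and its
   modification time to TS[1].  TS must be non-NULL, and the special
   UTIME_NOW / UTIME_OMIT values are not handled in this sketch. */
int set_file_times(const char *path, const struct timespec ts[2])
{
    if (!utimensat_missing) {
        int rc = utimensat(AT_FDCWD, path, ts, 0);
        if (rc != -1 || errno != ENOSYS)
            return rc;              /* success, or a genuine error */
        utimensat_missing = 1;      /* binary is newer than the kernel */
    }

    /* Fallback: utimes() takes microseconds instead of nanoseconds. */
    struct timeval tv[2];
    for (int i = 0; i < 2; i++) {
        tv[i].tv_sec = ts[i].tv_sec;
        tv[i].tv_usec = ts[i].tv_nsec / 1000;
    }
    return utimes(path, tv);
}
```

Caching the ENOSYS result keeps the common case (a kernel that has utimensat()) to a single system call, while binaries built on a new kernel still work on an old one. It only helps when the call is missing, not when it is present but misbehaving.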
Comment 7 Ondrej Vasik 2008-06-04 11:23:10 EDT
Hi Jim, thanks for joining the bugzilla discussion and suggesting a fix, but Eric Blake's patch does not fix the koji build issue (I spotted that patch before I built coreutils-6.12-1.fc10, so it is already applied in Fedora). As you can see in the strace, the koji Xen kernel does have the utimensat() call, so no fallback is needed; but that call seems to be broken and returns 280 instead of 0, and therefore the gl_futimens() function in cp/touch/mv/install fails.
Comment 8 Jim Meyering 2008-06-04 13:20:30 EDT
Ahh... thanks for that strace. And you're right that it looks like a kernel problem. I looked at the kernel code in fs/utimes.c's do_utimes function. There are several places where the returned variable, "error", is set to non-literal values:
    error = __user_walk_fd(dfd, filename, (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW, &nd);
    error = vfs_permission(&nd, MAY_WRITE);
    error = notify_change(dentry, &newattrs);
gnulib *could* provide a utimensat wrapper that detects this bogus return value and maps it to 0, but the minimal-impact approach (a configure-time run-test) would work only if coreutils were configured and built on a losing system. The alternative is to make every system pay the price of the extra comparison. Ugly, but probably the only useful work-around. I'll attach a patch.
Comment 9 Jim Meyering 2008-06-04 13:23:01 EDT
Created attachment 308374: possible work-around. This makes every utimensat-using application incur the (albeit small) cost of detecting and working around the kernel bug. Not pretty, but maybe what we need. Untested.
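A minimal C sketch of the "map the bogus value to 0" idea from the comments above: wrap utimensat() and report any positive return value as success. It is not the attached patch; `utimensat_mapped`, the `utimensat_fn` typedef, and the `buggy_utimensat` stand-in (which mimics the koji kernel by returning 280) are hypothetical names introduced for illustration. Taking the call as a function pointer lets the stand-in be swapped in for testing.

```c
#define _XOPEN_SOURCE 700   /* expose utimensat() and AT_FDCWD */
#include <fcntl.h>          /* AT_FDCWD */
#include <sys/stat.h>       /* utimensat() */
#include <time.h>           /* struct timespec */

/* Same signature as utimensat(), so the real call or a stand-in fits. */
typedef int (*utimensat_fn)(int, const char *, const struct timespec[2], int);

/* Hypothetical stand-in for the buggy kernel: it reports 280 instead
   of 0 and does nothing else. */
int buggy_utimensat(int dirfd, const char *path,
                    const struct timespec ts[2], int flags)
{
    (void) dirfd; (void) path; (void) ts; (void) flags;
    return 280;
}

/* Sketch of the mapping idea: any positive result is reported as 0.
   Genuine failures (-1 with errno set) pass through unchanged.
   Every caller pays for the extra comparison. */
int utimensat_mapped(utimensat_fn call, int dirfd, const char *path,
                     const struct timespec ts[2], int flags)
{
    int rc = call(dirfd, path, ts, flags);
    return rc > 0 ? 0 : rc;
}
```

Because the stand-in returns 280 without touching the file, `utimensat_mapped(buggy_utimensat, ...)` still reports success. If the real kernel behaved that way, timestamps would silently stay unchanged, which would be consistent with the "not preserved timestamps" problem reported later in this bug.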
Comment 10 Ondrej Vasik 2008-06-04 13:36:07 EDT
Thanks for the patch, Jim; that is exactly what I meant by an easy workaround for Fedora on fedora-devel-list. I will use it at least until the koji kernel issue (#442352) is addressed properly.
Comment 11 Matthias Clasen 2008-06-04 19:14:45 EDT
Ondrej, coreutils-6.12-2.fc10 brought back the same build failures that we had with 6.12-1. I've asked Jeremy to untag it so that builds can succeed.
Comment 12 Ondrej Vasik 2008-06-05 03:11:02 EDT
Matthias, do you have at least an strace of the failure with 6.12-2.fc10? There is no build log and no strace of the new failure... and without any tag I doubt I will be able to find out why it is still failing, since the 280 return value from the futimens call should now be mapped to 0. I have no way to reproduce it outside koji, as the build passes correctly on rawhide and FC-6 kernels. Is there any koji dist-tag that would let me get an strace with rawhide/dist-f10 packages and coreutils-6.12-2.fc10?
Comment 13 Ondrej Vasik 2008-06-06 14:59:57 EDT
OK, I got the necessary information from scratch-build strace tests; coreutils-6.12-3.fc10 should fix the problem properly.
Comment 14 Ondrej Vasik 2008-06-06 15:09:15 EDT
Created attachment 308560: Better workaround with testcase. The old workaround patch is not correct: it leaves timestamps unpreserved and does not cover all failure cases. This one includes a testcase and falls back to other system calls. Works OK so far...
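The fallback idea can be sketched in C as follows. This is an illustration under assumptions, not attachment 308560 itself: it assumes a positive utimensat() result means the timestamps may not have been set, and retries with utimes(). `set_times_checked`, `utimensat_fn`, and the `buggy_utimensat` stand-in (returning 280 and doing nothing) are hypothetical names.

```c
#define _XOPEN_SOURCE 700   /* expose utimensat(), utimes(), AT_FDCWD */
#include <fcntl.h>          /* AT_FDCWD */
#include <sys/stat.h>       /* utimensat() */
#include <sys/time.h>       /* utimes(), struct timeval */
#include <time.h>           /* struct timespec */

/* Same signature as utimensat(), so the real call or a stand-in fits. */
typedef int (*utimensat_fn)(int, const char *, const struct timespec[2], int);

/* Hypothetical stand-in for the buggy kernel: returns 280, does nothing. */
int buggy_utimensat(int dirfd, const char *path,
                    const struct timespec ts[2], int flags)
{
    (void) dirfd; (void) path; (void) ts; (void) flags;
    return 280;
}

/* Sketch: trust 0 and -1 from utimensat(), but treat any positive
   result as "timestamps possibly not set" and redo the work with
   utimes().  UTIME_NOW / UTIME_OMIT are not handled here. */
int set_times_checked(utimensat_fn call, const char *path,
                      const struct timespec ts[2])
{
    int rc = call(AT_FDCWD, path, ts, 0);
    if (rc <= 0)
        return rc;                  /* 0: success; -1: genuine error */

    /* Bogus positive result: fall back to utimes() (microseconds). */
    struct timeval tv[2];
    for (int i = 0; i < 2; i++) {
        tv[i].tv_sec = ts[i].tv_sec;
        tv[i].tv_usec = ts[i].tv_nsec / 1000;
    }
    return utimes(path, tv);
}
```

Unlike simply mapping the bogus value to 0, this spends a second system call only on an affected kernel, and the caller gets either a real success or a real error. utimes() has only microsecond resolution, so sub-microsecond precision is lost on the fallback path.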
Comment 15 Ondrej Vasik 2008-06-09 08:26:58 EDT
It has been working properly for a few days; closing as RAWHIDE.