Bug 449910 - New coreutils on rawhide appears to cause failures in package building
Summary: New coreutils on rawhide appears to cause failures in package building
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 449653
 
Reported: 2008-06-04 08:48 UTC by Alex Lancaster
Modified: 2008-06-09 12:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-09 12:26:58 UTC
Type: ---
Embargoed:


Attachments
Strace of the failure (1.39 KB, application/octet-stream)
2008-06-04 14:32 UTC, Ondrej Vasik
possible work-around (459 bytes, patch)
2008-06-04 17:23 UTC, Jim Meyering
Better workaround with testcase (2.92 KB, patch)
2008-06-06 19:09 UTC, Ondrej Vasik

Description Alex Lancaster 2008-06-04 08:48:46 UTC
Something in the new coreutils appears to break some commands, such as touch,
which is causing many rawhide build failures. Here's just one:

http://koji.fedoraproject.org/koji/getfile?taskID=644607&name=build.log

See also the thread on fedora-devel-list

http://www.redhat.com/archives/fedora-devel-list/2008-June/thread.html#00153 

At least 5 or so other packages have been affected by this.

(Also reported in fedora-infrastructure:

https://fedorahosted.org/fedora-infrastructure/ticket/593 )

Comment 1 Alex Lancaster 2008-06-04 09:00:20 UTC
Version: coreutils-6.12-1.fc10

Comment 2 Ondrej Vasik 2008-06-04 09:52:36 UTC
As I said on the devel list, the problem is likely caused by the combination of
the old RHEL5 xen kernel in koji and gl_futimens(). I'm not able to reproduce it
on my machine and am still trying to get an strace of the failure; it would be
really helpful to have one. It seems to be some kind of race condition: f-spot,
which was reported as failing, passed in a scratch build with
coreutils-6.12-1.fc10 without trouble (
http://koji.fedoraproject.org/koji/taskinfo?taskID=644691 ).

Comment 3 Alex Lancaster 2008-06-04 12:18:55 UTC
Another failing build, LabPlot:

http://koji.fedoraproject.org/koji/getfile?taskID=645053&name=build.log

Don't know how to get a trace on a "real" build on koji itself, probably needs
koji admins.

Comment 4 Ondrej Vasik 2008-06-04 12:46:55 UTC
You could get an strace easily by adding strace before the failing touch/cp
command (and BuildRequires: strace). The strace output will be available in the
build.log, as it is written to stderr. I tried that in a scratch build, but the
issue didn't show up in the three scratch builds I have tried so far.

Comment 5 Ondrej Vasik 2008-06-04 14:32:17 UTC
Created attachment 308346
Strace of the failure

So far most of the failures were on the xenbuilder2 machine.

Comment 6 Jim Meyering 2008-06-04 15:05:22 UTC
Hi, as I just replied on fedora-devel, this looks like the same problem
reported in this thread:

  http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/13684

Eric Blake fixed that with this change to gnulib's utimens.c:

  http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=93f08406537

This is probably the result of configuring/building coreutils on a new kernel
and running the resulting binaries on an old kernel; but with the gnulib
fix, the tools work around that particular abuse at run-time.



Comment 7 Ondrej Vasik 2008-06-04 15:23:10 UTC
Hi Jim, thanks for joining the bugzilla and suggesting a fix, but Eric Blake's
patch does not fix the issue with koji builds. I spotted that patch before I
built coreutils-6.12-1.fc10, so it is already applied in Fedora. As you can see
in the strace, the koji xen kernel has the utimensat() call, so no fallback is
necessary; but that call seems to be broken/buggy and returns 280 instead of 0,
and therefore the gl_futimens() function in cp/touch/mv/install fails.

Comment 8 Jim Meyering 2008-06-04 17:20:30 UTC
Ahh... thanks for that strace.  And you're right that it looks like a kernel
problem.  I looked at the kernel code in fs/utimes.c's do_utimes function. There
are several places where the returned variable, "error", is set to non-literal
values:

        error = __user_walk_fd(dfd, filename,
                               (flags & AT_SYMLINK_NOFOLLOW) ? 0 : LOOKUP_FOLLOW,
                               &nd);
        error = vfs_permission(&nd, MAY_WRITE);
        error = notify_change(dentry, &newattrs);

gnulib *could* provide a utimensat wrapper that detects this bogus return value
and maps it to 0, but the minimal-impact (configure-time run-test) approach
would work only if configured/built on a losing system.  The alternative is to
make every system pay the price of the extra comparison.  Ugly, but probably the
only useful work-around.  I'll attach a patch.

Comment 9 Jim Meyering 2008-06-04 17:23:01 UTC
Created attachment 308374
possible work-around

This makes every utimensat-using application incur the cost (albeit small) of
detecting and working around the kernel bug.  Not pretty, but maybe what we
need.  Untested.
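
The attached patch itself is not reproduced on this page. As a rough
illustration only of the idea described in Comment 8 (detect the bogus positive
return value and map it to 0), a wrapper might look like the sketch below. The
names normalize_utimensat_result and utimensat_workaround are hypothetical and
are not taken from gnulib or from the attachment; the only assumption about the
bug is what Comment 7 reports, namely that the call returns a positive value
(280) where 0 is expected.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* make sure utimensat() is declared */
#endif
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* utimensat() */
#include <time.h>       /* struct timespec */

/* Hypothetical helper: utimensat() is specified to return 0 on success
   and -1 on error, so any positive value is treated as the kernel bug
   and mapped to 0.  Negative values (real errors) pass through. */
static int normalize_utimensat_result(int result)
{
    return result > 0 ? 0 : result;
}

/* Hypothetical wrapper: every caller pays for one extra comparison,
   which is the cost Comment 8 calls ugly but probably unavoidable. */
static int utimensat_workaround(int dirfd, const char *path,
                                const struct timespec times[2], int flags)
{
    return normalize_utimensat_result(utimensat(dirfd, path, times, flags));
}
```

Note that a wrapper of this shape only hides the return value; it does not
check whether the timestamps were actually changed, which is the weakness
Comment 14 later reports for the first workaround.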

Comment 10 Ondrej Vasik 2008-06-04 17:36:07 UTC
Thanks for the patch, Jim; that is exactly what I meant by an easy workaround
for Fedora on fedora-devel-list. I will use it at least until the koji kernel
issue (#442352) is addressed correctly.

Comment 11 Matthias Clasen 2008-06-04 23:14:45 UTC
Ondrej, coreutils-6.12-2.fc10 brought back the same build failures that we've
had with 6.12-1. I've asked Jeremy to untag it, to allow builds to succeed.

Comment 12 Ondrej Vasik 2008-06-05 07:11:02 UTC
Matthias, do you have at least an strace of the failure with 6.12-2.fc10? There
is now no build log of the new failure and no strace of it, and without any tag
I doubt I will be able to find out why it is still failing, since the return
value 280 from the futimens call should now be changed to 0. I have no chance
to reproduce it outside koji, as the build passes on rawhide and FC-6 kernels
without trouble. Is there any koji dist-tag that would allow me to get an
strace with rawhide/dist-f10 packages and coreutils-6.12-2.fc10?

Comment 13 Ondrej Vasik 2008-06-06 18:59:57 UTC
OK, I got the necessary information from scratch-build strace tests;
coreutils-6.12-3.fc10 should fix the problem properly.

Comment 14 Ondrej Vasik 2008-06-06 19:09:15 UTC
Created attachment 308560
Better workaround with testcase

The old workaround patch is not correct: it causes timestamps not to be
preserved and does not cover all failure cases. This one contains a testcase
and provides a fallback to other system calls. Works OK so far...
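
The attachment is not shown on this page, so the following is only a minimal
sketch of what "fallback to other system calls" could mean, not the actual
patch. It assumes that the fallback is utimes() (the comment does not say which
calls are used) and that a positive return value or ENOSYS from utimensat()
means the timestamps were not set. The names timespec_to_timeval and
set_file_times are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* make sure utimensat() and utimes() are declared */
#endif
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <stddef.h>     /* NULL */
#include <sys/stat.h>   /* utimensat() */
#include <sys/time.h>   /* utimes(), struct timeval */
#include <time.h>       /* struct timespec */

/* Hypothetical helper: convert nanosecond to microsecond resolution,
   truncating the sub-microsecond part. */
static struct timeval timespec_to_timeval(struct timespec ts)
{
    struct timeval tv;
    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = ts.tv_nsec / 1000;
    return tv;
}

/* Hypothetical function: try utimensat() first.  A positive return
   value (the kernel bug) or ENOSYS (call missing) is taken to mean the
   timestamps were NOT set, so retry with utimes().  The special values
   UTIME_NOW and UTIME_OMIT are not handled by this sketch. */
static int set_file_times(const char *path, const struct timespec times[2])
{
    int result = utimensat(AT_FDCWD, path, times, 0);
    if (result == 0)
        return 0;                      /* genuine success */
    if (result < 0 && errno != ENOSYS)
        return result;                 /* genuine error, errno is set */

    if (times == NULL)
        return utimes(path, NULL);     /* NULL means "set to now" */

    struct timeval tv[2] = {
        timespec_to_timeval(times[0]), /* access time */
        timespec_to_timeval(times[1])  /* modification time */
    };
    return utimes(path, tv);
}
```

Unlike mapping the bogus value to 0, this retries the operation, so a caller
such as cp or touch is not told that timestamps were set when they were not.
The fallback loses sub-microsecond precision.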

Comment 15 Ondrej Vasik 2008-06-09 12:26:58 UTC
Seems to have worked properly for a few days; closing as RAWHIDE.

