Bug 475876
Summary: | CMake 2.6.2 dies on PPC64 builders (std::out_of_range) | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Lorenzo Villani <lorenzo>
Component: | cmake | Assignee: | Orion Poplawski <orion>
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high | Priority: | high
Version: | rawhide | CC: | kevin, orion, pertusus, rdieter
Hardware: | ppc64 | OS: | Linux
Keywords: | Reopened | Bug Blocks: | 238953
Fixed In Version: | 2.6.3-2.fc11 | Doc Type: | Bug Fix
Last Closed: | 2009-03-10 14:40:44 UTC | URL: | http://public.kitware.com/Bug/view.php?id=8704

Description
Lorenzo Villani
2008-12-10 21:25:50 UTC
I can't reproduce this on the only ppc/ppc64 machine I have access to, so I'm pretty much stuck. I'm building 2.6.3 RC-5 for rawhide, so you might try that once it completes and has been added to the repo. No idea if it will help, but worth a shot.

I'm giving it a try tomorrow.

There's a snag in my cmake build (Bug 475887). We'll have to see what Enrico has to say about that.

> We can't of course try again and again until it builds. :-)
Oh we can. ;-)
But of course it's not a good solution.

Okay, 2.6.3 RC-5 has been built and it looks like the newRepo task has finished.

(In reply to comment #5)
> Okay, 2.6.3 RC-5 has been built and it looks like the newRepo task has
> finished.
New task submitted by Kevin: http://koji.fedoraproject.org/koji/taskinfo?taskID=994835
It seems that the mockroot uses the 2.6.3 RC-5 version of cmake but the problem is still there.

I've been unable to reproduce myself with various debug methods enabled. I think it would be helpful to add:
LD_PRELOAD=libSegFault.so SEGFAULT_SIGNALS=abrt
to the start of the %{cmake.. line to try to catch future failures and get a stack trace.

FYI, as with the latest cmake/kdepimlibs combination it reproducibly (on Koji) crashes in the same file (according to the debugging output), I tried doing some debugging with messages:
http://cvs.fedoraproject.org/viewvc/rpms/kdepimlibs/devel/kdepimlibs-4.1.85-debug-cmake-crash.patch?revision=1.1&view=markup
and poof, there goes the bug.

This is a highly elusive Heisenbug. I suspect an uninitialized variable somewhere. Have you tried running it through valgrind on ppc64?

The Heisenbug disappeared somehow, I'm closing this bug report. It will be re-opened if necessary.

It disappeared because we were trying to debug it (that's why it's a Heisenbug ;-) ), we just kept the debug patch there so it keeps building.

Created attachment 328872 [details]
build.log on ppc64
The build log on ppc64
Link to job in koji: http://koji.fedoraproject.org/koji/taskinfo?taskID=1049740

André Wöbbeking e-mailed me this backtrace from:
LD_PRELOAD=libSegFault.so SEGFAULT_SIGNALS=abrt
(on x86_64):
cmake(_ZN25cmIncludeDirectoryCommand12AddDirectoryEPKcbb+0x2ca)[0x4ee7ea]
cmake(_ZN25cmIncludeDirectoryCommand11InitialPassERKSt6vectorISsSaISsEER17cmExecutionStatus+0xaa)[0x4fadfa]
cmake(_ZN9cmCommand17InvokeInitialPassERKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatus+0x4e)[0x527fbe]
cmake(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatus+0x2ec)[0x48c27c]
cmake(_ZN10cmMakefile12ReadListFileEPKcS1_PSs+0x49d)[0x492e4d]
cmake(_ZN16cmLocalGenerator9ConfigureEv+0xac)[0x59ef9c]
cmake(_ZN29cmLocalUnixMakefileGenerator39ConfigureEv+0x87)[0x5a2747]
cmake(_ZN10cmMakefile21ConfigureSubDirectoryEP16cmLocalGenerator+0xc6)[0x4956c6]
cmake(_ZN10cmMakefile15AddSubDirectoryEPKcS1_bbb+0x1ca)[0x49596a]
cmake(_ZN24cmAddSubDirectoryCommand11InitialPassERKSt6vectorISsSaISsEER17cmExecutionStatus+0x2d8)[0x506798]
cmake(_ZN9cmCommand17InvokeInitialPassERKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatus+0x4e)[0x527fbe]
cmake(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatus+0x2ec)[0x48c27c]
cmake(_ZN10cmMakefile12ReadListFileEPKcS1_PSs+0x49d)[0x492e4d]
cmake(_ZN16cmLocalGenerator9ConfigureEv+0xac)[0x59ef9c]
cmake(_ZN29cmLocalUnixMakefileGenerator39ConfigureEv+0x87)[0x5a2747]
cmake(_ZN17cmGlobalGenerator9ConfigureEv+0x2e5)[0x5758e5]
cmake(_ZN5cmake15ActualConfigureEv+0xc3)[0x4d82b3]
cmake(_ZN5cmake9ConfigureEv+0x44)[0x4d87c4]
cmake(_ZN5cmake3RunERKSt6vectorISsSaISsEEb+0x17e)[0x4dfede]
cmake(_Z8do_cmakeiPPc+0xcb8)[0x46b688]
cmake(main+0x2c)[0x46c2cc]
(He mailed me this on February 25, sorry for not posting it sooner, I was busy with other stuff.)

The function at the top of the backtrace has this code:
  // remove any leading or trailing spaces and \r
  pos = ret.size()-1;
  while(ret[pos] == ' ' || ret[pos] == '\r')
    {
    ret.erase(pos);
    pos--;
    }
  pos = 0;
  while(ret.size() && ret[pos] == ' ' || ret[pos] == '\r')
    {
    ret.erase(pos,1);
    }
I think this should be:
  // remove any leading or trailing spaces and \r
  pos = ret.size()-1;
  while(ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
    {
    ret.erase(pos);
    pos--;
    }
  pos = 0;
  while(ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
    {
    ret.erase(pos,1);
    }
But I haven't tested at all if this helps.

Unfortunately, the backtrace lacks details, I've asked André Wöbbeking if he can produce a Valgrind log with debugging information.

Note: This is cmIncludeDirectoryCommand::AddDirectory in Source/cmIncludeDirectoryCommand.cxx.

Created attachment 334448 [details]
Valgrind log provided by André Wöbbeking
Here's a Valgrind log, unfortunately also without debugging information.
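
For anyone who wants to try the guarded loops outside of CMake, here is a minimal standalone sketch. The names TrimSpacesAndCR, EraseThrowsOutOfRange and CheckTrimExamples are hypothetical and exist only for this illustration; they are not CMake functions. The explanation in the comments (an unchecked read past the string, followed by erase at an invalid position) is an assumed reading of the failure, not something confirmed with a debug build.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of CMake: strips leading and trailing
// spaces and '\r' using the guarded loops proposed in this report.
std::string TrimSpacesAndCR(std::string ret)
{
  // For an empty string this wraps around to std::string::npos.
  std::string::size_type pos = ret.size() - 1;
  // The ret.size() guard stops the loop once everything has been erased,
  // so ret[pos] is never read with pos == npos.
  while (ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
  {
    ret.erase(pos);
    pos--;
  }
  pos = 0;
  // The parentheses make the size guard apply to both comparisons.
  while (ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
  {
    ret.erase(pos, 1);
  }
  return ret;
}

// Hypothetical helper: reports whether s.erase(pos) throws
// std::out_of_range, which it does whenever pos > s.size().
bool EraseThrowsOutOfRange(std::string s, std::string::size_type pos)
{
  try
  {
    s.erase(pos);
  }
  catch (const std::out_of_range&)
  {
    return true;
  }
  return false;
}

// Small usage example covering the inputs that matter here.
void CheckTrimExamples()
{
  assert(TrimSpacesAndCR("  /usr/include \r") == "/usr/include");
  assert(TrimSpacesAndCR(" \r ").empty()); // whitespace only
  assert(TrimSpacesAndCR("").empty());     // empty input
  // Without the guard, pos can reach npos; erase at that position throws.
  assert(EraseThrowsOutOfRange("", std::string::npos));
}
```

The empty and whitespace-only inputs are the interesting cases: without the size guard, the loop condition reads ret[pos] after pos has wrapped around, and whether that stray read happens to look like a space or '\r' depends on whatever is in memory, which would fit the on-and-off behaviour seen on the builders.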
The Valgrind log says the bug is caused by a std::string created within cmIncludeDirectoryCommand::AddDirectory. There's only one such string: the string ret. It then crashes on a call to erase from within that same function. This can only be one of the ret.erase calls mentioned in comment #14. So I think my suggested fix should fix this problem.

Should be fixed in Rawhide now. (Still waiting for kdepimlibs to build against the new cmake though, the chainbuild is currently stuck in the waitrepo phase.)

This should also be fixed in F9 and F10. Can we just sync 2.6.3-2 or should it be backported to 2.6.2?

I'm for pushing 2.6.3.

Looks like my patch really fixed it. Successful kdepimlibs build here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=93380
Now to get it upstreamed...

Upstream bug report (with patch):
http://public.kitware.com/Bug/view.php?id=8704

Kevin - thanks for driving this. I see no reason not to sync 2.6.3-2 to F10. Do you want to keep driving this?

Yes, I'll handle it. I'd like to sync it to F9 as well (also currently on 2.6.2), is that OK with you?

(In reply to comment #22)
> Yes, I'll handle it. I'd like to sync it to F9 as well (also currently on
> 2.6.2), is that OK with you?
Yes. Sounds good. Thanks again.

cmake-2.6.3-2.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/cmake-2.6.3-2.fc10

cmake-2.6.3-2.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/cmake-2.6.3-2.fc9

cmake-2.6.3-3.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.

cmake-2.6.3-3.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.