Bug 475876 - CMake 2.6.2 dies on PPC64 builders (std::out_of_range)
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: cmake
Version: rawhide
Hardware: ppc64 Linux
Priority: high, Severity: high
Assigned To: Orion Poplawski
Fedora Extras Quality Assurance
http://public.kitware.com/Bug/view.ph...
: Reopened
Depends On:
Blocks: FE-ExcludeArch-ppc64/F-ExcludeArch-ppc64
Reported: 2008-12-10 16:25 EST by Lorenzo Villani
Modified: 2009-03-25 12:14 EDT (History)
Fixed In Version: 2.6.3-2.fc11
Doc Type: Bug Fix
Last Closed: 2009-03-10 10:40:44 EDT


Attachments
build.log on ppc64 (38.05 KB, application/octet-stream)
2009-01-13 10:43 EST, Lorenzo Villani
Valgrind log provided by André Wöbbeking (11.25 KB, text/plain)
2009-03-08 14:54 EDT, Kevin Kofler

Description Lorenzo Villani 2008-12-10 16:25:50 EST
CMake 2.6.2 fails to build packages on ppc64 builders. This is a recurring and apparently well reproducible bug.
We should still test whether the bug is reproducible on ppc32.

** Build tasks in koji (from earliest to latest):
http://koji.fedoraproject.org/koji/taskinfo?taskID=989531
http://koji.fedoraproject.org/koji/taskinfo?taskID=989611
http://koji.fedoraproject.org/koji/taskinfo?taskID=991850
http://koji.fedoraproject.org/koji/taskinfo?taskID=991912

The last build uses the --debug-output trick. Kevin reported that it worked once, but unfortunately it didn't work this time. We can't of course try again and again until it builds. :-)
Comment 1 Orion Poplawski 2008-12-10 16:47:31 EST
I can't reproduce this on the only ppc/ppc64 machine I have access to, so I'm pretty much stuck.

I'm building 2.6.3 RC-5 for rawhide, so you might try that once it completes and has been added to the repo.  No idea if it will help, but worth a shot.
Comment 2 Lorenzo Villani 2008-12-10 17:21:02 EST
I'll give it a try tomorrow.
Comment 3 Orion Poplawski 2008-12-10 17:23:35 EST
There's a snag in my cmake build (Bug 475887).  We'll have to see what Enrico has to say about that.
Comment 4 Kevin Kofler 2008-12-10 20:57:34 EST
> We can't of course try again and again until it builds. :-)

Oh we can. ;-)
But of course it's not a good solution.
Comment 5 Orion Poplawski 2008-12-11 12:24:36 EST
Okay, 2.6.3 RC-5 has been built and it looks like the newRepo task has finished.
Comment 6 Lorenzo Villani 2008-12-12 09:45:19 EST
(In reply to comment #5)
> Okay, 2.6.3 RC-5 has been built and it looks like the newRepo task has
> finished.

New task submitted by Kevin:
http://koji.fedoraproject.org/koji/taskinfo?taskID=994835

It seems that the mockroot uses the 2.6.3 RC-5 version of cmake but the problem is still there.
Comment 7 Orion Poplawski 2008-12-12 11:44:10 EST
I've been unable to reproduce it myself with various debug methods enabled.  I think it would be helpful to add:

LD_PRELOAD=libSegFault.so SEGFAULT_SIGNALS=abrt

to the start of the %{cmake..  line to try to catch future failures and get a stack trace.
Comment 8 Kevin Kofler 2008-12-12 11:54:06 EST
FYI, since the latest cmake/kdepimlibs combination reproducibly (on Koji) crashes in the same file (according to the debugging output), I tried doing some debugging with messages:
http://cvs.fedoraproject.org/viewvc/rpms/kdepimlibs/devel/kdepimlibs-4.1.85-debug-cmake-crash.patch?revision=1.1&view=markup
and poof, there goes the bug. This is a highly elusive Heisenbug. I suspect an uninitialized variable somewhere. Have you tried running it through valgrind on ppc64?
Comment 9 Lorenzo Villani 2009-01-13 10:32:18 EST
The Heisenbug disappeared somehow, so I'm closing this bug report. It will be re-opened if necessary.
Comment 10 Kevin Kofler 2009-01-13 10:39:22 EST
It disappeared because we were trying to debug it (that's why it's a Heisenbug ;-) ). We just kept the debug patch there so it keeps building.
Comment 11 Lorenzo Villani 2009-01-13 10:43:32 EST
Created attachment 328872 [details]
build.log on ppc64

The build log on ppc64
Comment 12 Lorenzo Villani 2009-01-13 10:44:19 EST
Link to job in koji: http://koji.fedoraproject.org/koji/taskinfo?taskID=1049740
Comment 13 Kevin Kofler 2009-03-07 18:06:43 EST
André Wöbbeking e-mailed me this backtrace from:

LD_PRELOAD=libSegFault.so SEGFAULT_SIGNALS=abrt

(on x86_64):

cmake(_ZN25cmIncludeDirectoryCommand12AddDirectoryEPKcbb+0x2ca)[0x4ee7ea]
cmake(_ZN25cmIncludeDirectoryCommand11InitialPassERKSt6vectorISsSaISsEER17cmExecutionStatus+0xaa)[0x4fadfa]
cmake(_ZN9cmCommand17InvokeInitialPassERKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatus+0x4e)[0x527fbe]
cmake(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatus+0x2ec)[0x48c27c]
cmake(_ZN10cmMakefile12ReadListFileEPKcS1_PSs+0x49d)[0x492e4d]
cmake(_ZN16cmLocalGenerator9ConfigureEv+0xac)[0x59ef9c]
cmake(_ZN29cmLocalUnixMakefileGenerator39ConfigureEv+0x87)[0x5a2747]
cmake(_ZN10cmMakefile21ConfigureSubDirectoryEP16cmLocalGenerator+0xc6)[0x4956c6]
cmake(_ZN10cmMakefile15AddSubDirectoryEPKcS1_bbb+0x1ca)[0x49596a]
cmake(_ZN24cmAddSubDirectoryCommand11InitialPassERKSt6vectorISsSaISsEER17cmExecutionStatus+0x2d8)[0x506798]
cmake(_ZN9cmCommand17InvokeInitialPassERKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatus+0x4e)[0x527fbe]
cmake(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatus+0x2ec)[0x48c27c]
cmake(_ZN10cmMakefile12ReadListFileEPKcS1_PSs+0x49d)[0x492e4d]
cmake(_ZN16cmLocalGenerator9ConfigureEv+0xac)[0x59ef9c]
cmake(_ZN29cmLocalUnixMakefileGenerator39ConfigureEv+0x87)[0x5a2747]
cmake(_ZN17cmGlobalGenerator9ConfigureEv+0x2e5)[0x5758e5]
cmake(_ZN5cmake15ActualConfigureEv+0xc3)[0x4d82b3]
cmake(_ZN5cmake9ConfigureEv+0x44)[0x4d87c4]
cmake(_ZN5cmake3RunERKSt6vectorISsSaISsEEb+0x17e)[0x4dfede]
cmake(_Z8do_cmakeiPPc+0xcb8)[0x46b688]
cmake(main+0x2c)[0x46c2cc]

(He mailed me this on February 25, sorry for not posting it sooner, I was busy with other stuff.)
Comment 14 Kevin Kofler 2009-03-07 18:21:43 EST
The function at the top of the backtrace has this code:
  // remove any leading or trailing spaces and \r
  pos = ret.size()-1;
  while(ret[pos] == ' ' || ret[pos] == '\r')
    {
    ret.erase(pos);
    pos--;
    }
  pos = 0;
  while(ret.size() && ret[pos] == ' ' || ret[pos] == '\r')
    {
    ret.erase(pos,1);
    }

I think this should be:
  // remove any leading or trailing spaces and \r
  pos = ret.size()-1;
  while(ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
    {
    ret.erase(pos);
    pos--;
    }
  pos = 0;
  while(ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
    {
    ret.erase(pos,1);
    }

But I haven't tested at all whether this helps. Unfortunately, the backtrace lacks details, so I've asked André Wöbbeking if he can produce a Valgrind log with debugging information.
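
For reference, here is a minimal self-contained sketch of the guarded loops. The helper name TrimSpacesAndCR is hypothetical and not part of CMake; it only wraps the suggested code so that it can be compiled and exercised on its own, and it says nothing about whether the real build failure goes away.

```cpp
#include <string>

// Hypothetical helper (NOT a CMake function): strips trailing and leading
// spaces and '\r' from a copy of the input, using the guarded loops
// suggested in this comment.
std::string TrimSpacesAndCR(std::string ret)
{
  // If ret is empty, size()-1 wraps around to npos; the size() guard in
  // the loop condition keeps ret[pos] from being evaluated in that case.
  std::string::size_type pos = ret.size() - 1;
  while (ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
  {
    ret.erase(pos); // drops the last character
    pos--;          // may wrap once ret is empty; the guard stops the loop
  }
  pos = 0;
  while (ret.size() && (ret[pos] == ' ' || ret[pos] == '\r'))
  {
    ret.erase(pos, 1); // drops the first character
  }
  return ret;
}
```

With the guard in both loops, an empty string, or one made up only of spaces and \r, comes back empty instead of indexing ret at an invalid position.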
Comment 15 Kevin Kofler 2009-03-07 18:23:43 EST
Note: This is cmIncludeDirectoryCommand::AddDirectory in Source/cmIncludeDirectoryCommand.cxx.
Comment 16 Kevin Kofler 2009-03-08 14:54:33 EDT
Created attachment 334448 [details]
Valgrind log provided by André Wöbbeking

Here's a Valgrind log, unfortunately also without debugging information.
Comment 17 Kevin Kofler 2009-03-08 15:00:04 EDT
The Valgrind log says the bug is caused by a std::string created within cmIncludeDirectoryCommand::AddDirectory. There's only one such string: the string ret. It then crashes on a call to erase from within that same function. This can only be one of the ret.erase calls mentioned in comment #14. So I think my suggested fix should fix this problem.
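
This would also match the std::out_of_range in the summary: std::string::erase(pos) throws that exception whenever pos is greater than size(). In the unguarded loop, pos can wrap around to npos once ret is empty, and whether the loop then goes on to call erase depends on whatever byte ret[pos] happens to read, which might explain the Heisenbug behaviour. A small sketch of the erase behaviour follows (EraseThrowsOutOfRange is a hypothetical helper, for illustration only):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (NOT a CMake function): returns true if erasing
// from position pos to the end of s throws std::out_of_range.
bool EraseThrowsOutOfRange(std::string s, std::string::size_type pos)
{
  try
  {
    s.erase(pos); // valid only for pos <= s.size()
  }
  catch (const std::out_of_range&)
  {
    return true;
  }
  return false;
}
```

Erasing at pos == size() is allowed and does nothing, so only a position past the end, such as npos on an empty string, triggers the exception.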
Comment 18 Kevin Kofler 2009-03-08 22:39:41 EDT
Should be fixed in Rawhide now. (Still waiting for kdepimlibs to build against the new cmake though, the chainbuild is currently stuck in the waitrepo phase.)

This should also be fixed in F9 and F10. Can we just sync 2.6.3-2 or should it be backported to 2.6.2? I'm for pushing 2.6.3.
Comment 19 Kevin Kofler 2009-03-09 03:59:32 EDT
Looks like my patch really fixed it. Successful kdepimlibs build here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=93380

Now to get it upstreamed...
Comment 20 Kevin Kofler 2009-03-09 04:14:55 EDT
Upstream bug report (with patch): http://public.kitware.com/Bug/view.php?id=8704
Comment 21 Orion Poplawski 2009-03-09 17:16:14 EDT
Kevin - thanks for driving this.  I see no reason not to sync 2.6.3-2 to F10.  Do you want to keep driving this?
Comment 22 Kevin Kofler 2009-03-09 17:19:21 EDT
Yes, I'll handle it. I'd like to sync it to F9 as well (also currently on 2.6.2), is that OK with you?
Comment 23 Orion Poplawski 2009-03-10 10:40:44 EDT
(In reply to comment #22)
> Yes, I'll handle it. I'd like to sync it to F9 as well (also currently on
> 2.6.2), is that OK with you?  

Yes.  Sounds good.  Thanks again.
Comment 24 Fedora Update System 2009-03-11 23:15:42 EDT
cmake-2.6.3-2.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/cmake-2.6.3-2.fc10
Comment 25 Fedora Update System 2009-03-11 23:16:50 EDT
cmake-2.6.3-2.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/cmake-2.6.3-2.fc9
Comment 26 Fedora Update System 2009-03-25 12:10:43 EDT
cmake-2.6.3-3.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 27 Fedora Update System 2009-03-25 12:14:29 EDT
cmake-2.6.3-3.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.
