565305 – test-case 1112 fails when building on ppc/ppc64

Summary: test-case 1112 fails when building on ppc/ppc64

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	curl
Sub Component:
Version:	13
Hardware:	powerpc
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	---
Assignee:	Kamil Dudka
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-02-14 15:53 UTC by Josh Boyer
Modified:	2010-04-30 23:41 UTC (History)
CC List:	2 users (show)
Fixed In Version:	curl-7.20.0-4.fc13
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-02-25 18:34:04 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
curl build log (526.27 KB, text/plain) 2010-02-15 21:20 UTC, Josh Boyer
latest build attempt (429.49 KB, text/plain) 2010-02-23 14:40 UTC, Josh Boyer
tweak hard-wired timeout for slow buildhosts (392 bytes, patch) 2010-02-24 13:40 UTC, Kamil Dudka

Description Josh Boyer 2010-02-14 15:53:09 UTC

Description of problem:

It seems some make check tests fail.

Comment 1 Daniel Stenberg 2010-02-15 21:04:27 UTC

Any chance we can see some output from the failure?

Comment 2 Josh Boyer 2010-02-15 21:20:25 UTC

Created attachment 394407 [details]
curl build log

Comment 3 Josh Boyer 2010-02-15 21:21:59 UTC

(In reply to comment #1)
> Any chance we can see some output from the failure?    

Build log attached.  Sorry about that, I thought I had attached it when I opened the bug and didn't notice it was missing.  You deserve kudos for not just closing this as CLOSED->WANKER.

Comment 4 Kamil Dudka 2010-02-15 21:32:34 UTC

I have seen a different failure specific to ppc64: an invalid read of size 8 within sscanf():

Invalid read of size 8
   at 0x41EB414: _IO_vfscanf@@GLIBC_2.4 (in /lib64/libc-2.11.1.so)
   by 0x41F78BB: __isoc99_sscanf (in /lib64/libc-2.11.1.so)
   by 0x40C762B: ???
   by 0x40C8963: Curl_connect
   by 0x40D1597: Curl_perform
   by 0x40D2697: curl_easy_perform
   by 0x10009467: main (main.c:5177)
 Address 0x7ff00e080 is just below the stack ptr.
 To suppress, use: --workaround-gcc296-bugs=yes

http://koji.fedoraproject.org/koji/getfile?taskID=1977189&name=build.log

This one looks more like a glibc/ppc64 issue, which can simply be suppressed with the valgrind option listed above.  But first I want to make sure it's not an actual error in the curl package.

Comment 5 Daniel Stenberg 2010-02-15 21:34:22 UTC

The build failure output looks totally crazy. In the logs for test 1, there are traces of test 197. In the log for test 2, there are traces of test 198 and so on.

It looks like perhaps two tests were run at the same time in the same dir or something like that. I can't explain it otherwise.

Comment 6 Kamil Dudka 2010-02-15 21:47:49 UTC

(In reply to comment #5)
> The build failure output looks totally crazy. In the logs for test 1, there are
> traces of test 197. In the log for test 2, there are traces of test 198 and so
> on.

Good point.  It may just be caused by running two ppc64 builds on the same build host.  There used to be a similar issue when building ppc+ppc64 (also i686+x86_64 or s390+s390x) together on one build host, but it should be resolved.  I've abused the test-suite as follows:

# replace hard wired port numbers in the test suite
sed -i s/899\\\([0-9]\\\)/%{?__isa_bits}9\\1/ tests/data/test*
./runtests.pl -a -b%{?__isa_bits}90 -p -v

I'll try to run the same again and perhaps track down the invalid read issue...

Comment 7 Daniel Stenberg 2010-02-15 22:15:26 UTC

The primary test script tests/runtests.pl has a -b option that is meant to be able to change the "base" port, exactly to allow multiple tests to run at the same time (assuming they execute in different file trees).

Comment 8 Kamil Dudka 2010-02-15 22:38:39 UTC

Sorry, I meant to explain the hack a bit.  %{?__isa_bits} is an rpm macro, which is expanded to either "32" or "64", depending on the build arch.  So we do use the -b option, with either 3290 or 6490.  The only problem is that some port numbers are hard-wired in tests/data/test* - try this:

$ grep 899[0-9] tests/data/test*

Thus sed rewrites those port numbers before the test suite runs, so that the affected test cases do not have to be excluded from the list because of this setup.
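To illustrate the substitution described above, here is a minimal sketch of what the sed expression does to a single line once rpm has expanded the macro. The shell variable isa_bits and the function rewrite_ports are hypothetical stand-ins (rpm itself expands %{?__isa_bits} inside the spec file), and the sample URL is invented for the demonstration; it is not taken from the curl test data.

```shell
# Hypothetical stand-in for the rpm macro %{?__isa_bits} (rpm expands it to 32 or 64).
isa_bits=64

# Rewrite a hard-wired 899x port to <isa_bits>9x, reading lines from stdin.
# Like the command in the spec file, there is no 'g' flag, so only the
# first match on each line is rewritten.
rewrite_ports() {
    sed "s/899\([0-9]\)/${1}9\1/"
}

# Invented sample line; real input would be the files matched by tests/data/test*.
printf 'http://127.0.0.1:8990/want\n' | rewrite_ports "$isa_bits"
```

With isa_bits set to 32 the same line would end up pointing at port 3290, which matches the -b3290 base port passed to runtests.pl on 32-bit builds.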

Comment 9 Josh Boyer 2010-02-23 14:40:26 UTC

Created attachment 395727 [details]
latest build attempt

This is the latest build attempt, which still fails in the checks.  It reached 99% completion instead of 97% though.

This build is really holding up a bunch of other stuff on the shadow ppc/ppc64 instance.  Would it be possible to get some attention on this, or even comment out the %check section for now?

Comment 10 Kamil Dudka 2010-02-23 15:05:29 UTC

(In reply to comment #9)
Josh, thank you for the reminder.  I'll have a go at that tomorrow if nothing urgent happens.

> Created an attachment (id=395727) [details]
> latest build attempt
> 
> This is the latest build attempt, which still fails in the checks.  It reached
> 99% completion instead of 97% though.

It looks much better.  Do you have any idea about what has improved the ratio?  I've changed nothing in the curl package itself...

> This build is really holding up a bunch of other stuff on the shadow ppc/ppc64
> instance.  Would it be possible to get some attention on this, or even comment
> out the %check section for now?

The second is *not* recommended.  If you need a stable package, use 7.19.7, which builds fine.  Disabling the test-suite without any investigation of its failure is always a bad idea.

Is the failure 100% reproducible?  Does the same test case always fail?  How exactly do you build the package?  I'll need to do the same with a modified package myself.

Comment 11 Josh Boyer 2010-02-23 15:23:19 UTC

(In reply to comment #10)
> > This is the latest build attempt, which still fails in the checks.  It reached
> > 99% completion instead of 97% though.
> 
> It looks much better.  Do you have any idea about what has improved the ratio? 
> I've changed nothing in the curl package itself...

No, I have no idea.  Honestly, I have no idea how any of that stuff works in the koji buildroots, as I was under the impression network access was not allowed.

> > This build is really holding up a bunch of other stuff on the shadow ppc/ppc64
> > instance.  Would it be possible to get some attention on this, or even comment
> > out the %check section for now?
> 
> The second is *not* recommended.  If you need a stable package, use 7.19.7,
> which builds fine.  Disabling the test-suite without any investigation of its
> failure is always a bad idea.

7.19.7 does me no good.  The buildsystem needs the newer curl package to make progress on building other packages due to how the koji shadow buildroots work.

> Is the failure 100% reproducible?  Does the same test case always fail?

Unknown.  The only thing I've noticed is that every time koji-shadow tries to build it, it fails.  The results should be browseable at http://66.227.170.160:8008/koji/index

> How exactly do you build the package?  I'll need to do the same with a modified package myself.

koji -s https://66.227.170.160/kojihub build dist-f13-updates-candidate <srpm>

(I think).  If that doesn't work, let me know and we'll figure something out.

Comment 12 Kamil Dudka 2010-02-24 11:55:43 UTC

(In reply to comment #10)
> Is the failure 100% reproducible?

AFAICT, yes.

> Does the same test case always fail?

Yes, it's the test-case 1112.

Comment 13 Kamil Dudka 2010-02-24 13:40:37 UTC

Created attachment 396059 [details]
tweak hard-wired timeout for slow buildhosts

It's not a bug in libcurl, but in the test-case 1112.  The buildhost is simply too slow and the hard-wired timeout is too short for the test to succeed.  With the attached patch it works.  I ran the test-case 1112 16x in a loop - 100% failure without the patch, 100% success with the patch applied.

I'll just disable the test-case 1112.
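A loop of the kind described above can be sketched as follows. This is only an illustration, not the exact commands that were used: the helper name repeat_and_count_failures is hypothetical, and the runtests.pl invocation in the comment assumes it is run from the tests/ directory of a built curl source tree.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
repeat_and_count_failures() {
    n=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Count every non-zero exit status as one failure.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Assumed usage inside tests/ of a curl source tree:
#   repeat_and_count_failures 16 ./runtests.pl 1112
# Stand-in command so the sketch runs anywhere:
repeat_and_count_failures 16 true
```

Running it once without and once with the patch applied gives the two failure counts to compare.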

Comment 14 Kamil Dudka 2010-02-24 20:44:59 UTC

Reported upstream: http://curl.haxx.se/mail/lib-2010-02/0195.html

Comment 15 Kamil Dudka 2010-02-25 09:47:05 UTC

Fixed in curl-7.20.0-2.fc13 by excluding test1112; nevertheless, it still seems to _hang_ indefinitely on test405 at times :-(

Comment 16 Josh Boyer 2010-02-25 15:57:17 UTC

(In reply to comment #15)
> Fixed in curl-7.20.0-2.fc13 by excluding test1112; nevertheless, it still
> seems to _hang_ indefinitely on test405 at times :-(

curl-7.20.0-2.fc13 does indeed build on the secondary build system now.  Many thanks for working on this.

The builders in the setup are rather slow, and they have varied workloads running on them at the same time as the curl builds, so I am not surprised that timeouts occur.

Comment 17 Kamil Dudka 2010-02-25 18:34:04 UTC

Great!  The reported bug has been addressed, so I am closing it now.  Feel free to report the other hanging test-case separately if it wasn't an accident.

Comment 18 Kamil Dudka 2010-04-24 22:21:44 UTC

(In reply to comment #15)
> Fixed in curl-7.20.0-2.fc13 by excluding test1112; nevertheless, it still
> seems to _hang_ indefinitely on test405 at times :-(

The issue with hanging test405 should be resolved in curl-7.20.1-4.fc14, upstream commit 82e9b78:

http://github.com/bagder/curl/commit/82e9b78a388ab539c8784cd853adf6e4a97d52c5

Comment 19 Fedora Update System 2010-04-27 07:11:42 UTC

curl-7.20.0-4.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/curl-7.20.0-4.fc13

Comment 20 Fedora Update System 2010-04-30 23:41:10 UTC

curl-7.20.0-4.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.
