1190978 – gcc 5.0.0 causes FTBFS of postgresql

Summary: gcc 5.0.0 causes FTBFS of postgresql

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gcc
Sub Component:
Version:	rawhide
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	http://koji.fedoraproject.org/koji/ta...
Whiteboard:
Depends On:
Blocks:

Reported:	2015-02-10 07:47 UTC by Petr Pisar
Modified:	2015-02-15 17:36 UTC
CC List:	11 users
Fixed In Version:	gcc-5.0.0-0.13.fc22
Clone Of:
Environment:
Last Closed:	2015-02-15 15:50:19 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
Small reproducer (75.18 KB, application/x-gzip) 2015-02-13 14:47 UTC, Pavel Raiskup	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
GNU Compiler Collection	65053	0	None	None	None	Never

Description Petr Pisar 2015-02-10 07:47:43 UTC

postgresql-9.4.1-1.fc22 fails to build in F22:

============== creating temporary installation        ==============
============== initializing database system           ==============
pg_regress: initdb failed
Examine /builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/log/initdb.log for the reason.
Command was: "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/./tmp_check/install//usr/bin/initdb" -D "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/./tmp_check/data" -L "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/./tmp_check/install//usr/share/pgsql" --noclean --nosync > "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/log/initdb.log" 2>&1
GNUmakefile:120: recipe for target 'check' failed
make[1]: Leaving directory '/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl'
make[1]: *** [check] Error 2
make: *** [check-plperl-recurse] Error 2
Makefile:35: recipe for target 'check-plperl-recurse' failed
make: Leaving directory '/builddir/build/BUILD/postgresql-9.4.1/src/pl'
+ test_failure=1
+ set +x
=== make failure: src/pl/plperl/regression.diffs ===
+ mv src/Makefile.global src/Makefile.global.save
+ cp src/Makefile.global.python3 src/Makefile.global
RPM build errors:
cp: error writing 'src/Makefile.global': No space left on device
cp: failed to extend 'src/Makefile.global': No space left on device
error: Bad exit status from /var/tmp/rpm-tmp.Usxj2a (%build)
    Bad exit status from /var/tmp/rpm-tmp.Usxj2a (%build)
Child return code was: 1
EXCEPTION: Command failed. See logs for output.

Difference between working and failing build root:

    perl-Encode       2:2.68-1.fc22   >  2:2.70-1.fc22
    libgcc            4.9.2-5.fc22    >  5.0.0-0.7.fc22
    libgomp           4.9.2-5.fc22    >  5.0.0-0.7.fc22
    shared-mime-info  1.4-1.fc22      >  1.4-2.fc22
    libstdc++         4.9.2-5.fc22    >  5.0.0-0.7.fc22
    gcc-c++           4.9.2-5.fc22    >  5.0.0-0.7.fc22
    gcc               4.9.2-5.fc22    >  5.0.0-0.7.fc22
    isl               (none)          >  0.14-3.fc22
    cpp               4.9.2-5.fc22    >  5.0.0-0.7.fc22
    libstdc++-devel   4.9.2-5.fc22    >  5.0.0-0.7.fc22

Comment 1 Pavel Raiskup 2015-02-11 14:45:28 UTC

Thanks for the report.  Reproduced: initdb (more precisely, its postgres
subprocess) eats a lot of space in the PGDATA directory when built with
gcc-5.0.0 (I'm not entirely sure gcc is the real trigger) at -O2.  With -O1 or
-O0, initdb works fine.

I'll try to figure out how to debug this properly, or at least cut out a
minimal example.  The postgres process aborts while initdb feeds it the BKI
file:

#0  0x00007ffff7330187 in raise () from /lib64/libc.so.6
#1  0x00007ffff7331dea in abort () from /lib64/libc.so.6
#2  0x000000000073a1c9 in errfinish (dummy=dummy@entry=0) at elog.c:569
#3  0x000000000073bbb0 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x772b70 "cannot abort transaction %u, it was already committed") at elog.c:1362
#4  0x00000000004b05f3 in RecordTransactionAbort (isSubXact=isSubXact@entry=0 '\000') at xact.c:1467
#5  0x00000000004b06b4 in AbortTransaction () at xact.c:2415
#6  0x00000000004b3955 in AbortOutOfAnyTransaction () at xact.c:4000
#7  0x00000000007447c9 in ShutdownPostgres (code=<optimized out>, arg=<optimized out>) at postinit.c:1058
#8  0x000000000064994d in shmem_exit (code=code@entry=1) at ipc.c:230
#9  0x0000000000649a35 in proc_exit_prepare (code=code@entry=1) at ipc.c:187
#10 0x0000000000649aa8 in proc_exit (code=code@entry=1) at ipc.c:102
#11 0x000000000073a1f5 in errfinish (dummy=<optimized out>) at elog.c:555
#12 0x0000000000660ef8 in mdextend (reln=0xc0cd90, forknum=FSM_FORKNUM, blocknum=<optimized out>, buffer=0xc16660 "", skipFsync=<optimized out>) at md.c:527
#13 0x00000000006471c4 in fsm_extend (fsm_nblocks=1055795, rel=0xbefb80) at freespace.c:587
#14 fsm_readbuf (rel=rel@entry=0xbefb80, addr=..., addr@entry=..., extend=extend@entry=1 '\001') at freespace.c:525
#15 0x00000000006472ee in fsm_set_and_search (rel=rel@entry=0xbefb80, addr=..., slot=slot@entry=3518, newValue=<optimized out>, minValue=minValue@entry=6 '\006') at freespace.c:615
#16 0x000000000064778d in RecordAndGetPageWithFreeSpace (rel=rel@entry=0xbefb80, oldPage=oldPage@entry=4294967295, oldSpaceAvail=oldSpaceAvail@entry=0, spaceNeeded=spaceNeeded@entry=176) at freespace.c:159
#17 0x000000000048f1b2 in RelationGetBufferForTuple (relation=relation@entry=0xbefb80, len=176, otherBuffer=otherBuffer@entry=0, options=options@entry=0, bistate=bistate@entry=0x0, vmbuffer=vmbuffer@entry=0x7fffffffdb0c, vmbuffer_other=0x0) at hio.c:414
#18 0x0000000000488c9a in heap_insert (relation=0xbefb80, tup=tup@entry=0xc07800, cid=<optimized out>, options=options@entry=0, bistate=bistate@entry=0x0) at heapam.c:2082
#19 0x00000000004899be in simple_heap_insert (relation=<optimized out>, tup=tup@entry=0xc07800) at heapam.c:2572
#20 0x00000000004cf211 in InsertOneTuple (objectid=1242) at bootstrap.c:799
#21 0x00000000004cdcd9 in boot_yyparse () at bootparse.y:277
#22 0x00000000004ce83f in BootstrapModeMain () at bootstrap.c:491
#23 AuxiliaryProcessMain (argc=5, argc@entry=6, argv=0xba2998, argv@entry=0xba2990) at bootstrap.c:411
#24 0x00000000004609db in main (argc=6, argv=0xba2990) at main.c:219

Pavel

Comment 2 Tom Lane 2015-02-11 15:31:03 UTC

Judging from the stack trace, I'd say that something is busted in the logic that determines where in a relation (aka table, file) there is a page with enough free space to insert a new tuple.  For some reason it's repeatedly deciding it can't find enough space and then extending the relation by another page.  This probably points to a compiler bug or overenthusiastic optimization manifesting somewhere in the FSM (free space map) logic.
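To make the "find a page with enough free space, else extend the relation" logic concrete, here is a deliberately simplified, hypothetical sketch. It is not PostgreSQL's FSM (the real one in freespace.c is a multi-level tree of FSM pages, not a flat scan), and the names `Relation`, `find_page_with_free_space`, `extend_relation` and `get_page_for_tuple` are invented for illustration. The toy also ignores page headers and assumes an 8192-byte page.

```c
#include <assert.h>
#include <stdint.h>

/* Toy limits; TOY_PAGE_SIZE assumes PostgreSQL's default 8192-byte BLCKSZ. */
#define TOY_MAX_PAGES 16
#define TOY_PAGE_SIZE 8192u

typedef uint32_t BlockNumber;
#define INVALID_BLOCK ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical toy relation: free bytes per page, no page headers. */
typedef struct {
    BlockNumber npages;
    uint32_t free_space[TOY_MAX_PAGES];
} Relation;

/* Linear scan for the first page with at least `needed` free bytes. */
BlockNumber find_page_with_free_space(const Relation *rel, uint32_t needed)
{
    for (BlockNumber blk = 0; blk < rel->npages; blk++)
        if (rel->free_space[blk] >= needed)
            return blk;
    return INVALID_BLOCK;
}

/* Append an empty page; INVALID_BLOCK if the toy array is already full. */
BlockNumber extend_relation(Relation *rel)
{
    if (rel->npages >= TOY_MAX_PAGES)
        return INVALID_BLOCK;
    rel->free_space[rel->npages] = TOY_PAGE_SIZE;
    return rel->npages++;   /* index of the page just added */
}

/* Choose a page for a tuple of `len` bytes, extending only if the search fails. */
BlockNumber get_page_for_tuple(Relation *rel, uint32_t len)
{
    assert(len <= TOY_PAGE_SIZE);
    BlockNumber blk = find_page_with_free_space(rel, len);
    if (blk == INVALID_BLOCK)
        blk = extend_relation(rel);
    if (blk != INVALID_BLOCK)
        rel->free_space[blk] -= len;
    return blk;
}
```

With len = 176 (the tuple length in the backtrace), 46 tuples fit on one toy page, so 100 inserts leave rel.npages at 3. If the search wrongly reported failure on every call, every insert would extend the relation instead, which is the runaway growth described above.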

I'm a bit busy right now but am willing to help out if you can't isolate it quickly.

Comment 3 Tom Lane 2015-02-12 23:13:27 UTC

I dug into this a bit in a rawhide mock installation.  Your stack trace above is telling the truth: RelationGetBufferForTuple really is passing oldPage=4294967295 to RecordAndGetPageWithFreeSpace.  The latter then goes nuts extending the free space map out to cover such a high block number.  (So it's not really an infinite loop, but it does consume unreasonable amounts of disk space.)
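For scale: frame #13 of the backtrace in comment 1 shows fsm_extend being asked for fsm_nblocks=1055795. Assuming the default 8192-byte block size (the build's actual BLCKSZ is not stated in this report), a free space map fork that long would be about 8 GiB, which is consistent with the "No space left on device" errors in the build log. The sketch below only does that arithmetic and pins down that 4294967295 is the all-ones 32-bit value PostgreSQL calls InvalidBlockNumber; `fork_size_bytes` is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* PostgreSQL spells this InvalidBlockNumber: the all-ones 32-bit value. */
#define INVALID_BLOCK_NUMBER ((BlockNumber) 0xFFFFFFFF)

/* Assumption: PostgreSQL's default BLCKSZ of 8192 bytes. */
#define ASSUMED_BLCKSZ 8192ULL

/* Hypothetical helper: bytes occupied by a fork that is `nblocks` pages long. */
uint64_t fork_size_bytes(uint64_t nblocks)
{
    assert(nblocks <= UINT64_MAX / ASSUMED_BLCKSZ);  /* guard against overflow */
    return nblocks * ASSUMED_BLCKSZ;
}
```

fork_size_bytes(1055795) gives 8649072640 bytes, a little over 8 GiB.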

Now the thing is that the logic in RelationGetBufferForTuple() looks like this:

while (targetBlock != InvalidBlockNumber)
{
    /* ... do a bunch of stuff that does not change targetBlock ... */

    targetBlock = RecordAndGetPageWithFreeSpace(relation,
                                                targetBlock,
                                                pageFreeSpace,
                                                len + saveFreeSpace);
}

It is therefore impossible on its face that this code ever passes 4294967295
(a/k/a InvalidBlockNumber) to RecordAndGetPageWithFreeSpace.  And yet it is
doing that: I put a test for oldPage == InvalidBlockNumber into RecordAndGetPageWithFreeSpace, and it fired.
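The argument is a loop-invariant one: the body only runs while targetBlock != InvalidBlockNumber, and nothing in the body changes targetBlock before the call, so the callee can never see InvalidBlockNumber. A hypothetical sketch of the same shape follows; `record_and_get_page` and `search_from` are invented stand-ins, and the stand-in's "walk down to block 0" behavior is arbitrary. The assert plays the role of the check described above: compiled correctly, no input can make it fire, whereas in the miscompiled postgres the equivalent check did.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define INVALID_BLOCK_NUMBER ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical stand-in for RecordAndGetPageWithFreeSpace(): returns the next
 * candidate page, or INVALID_BLOCK_NUMBER once there is none (here: after block 0). */
BlockNumber record_and_get_page(BlockNumber old_page)
{
    /* The caller's loop condition guarantees this can never fire. */
    assert(old_page != INVALID_BLOCK_NUMBER);
    return old_page == 0 ? INVALID_BLOCK_NUMBER : old_page - 1;
}

/* Same loop shape as RelationGetBufferForTuple(); returns the number of calls made. */
unsigned search_from(BlockNumber target_block)
{
    unsigned calls = 0;
    while (target_block != INVALID_BLOCK_NUMBER) {
        /* ... work that does not modify target_block ... */
        target_block = record_and_get_page(target_block);
        calls++;
    }
    return calls;
}
```

Starting from InvalidBlockNumber the loop makes no calls at all; starting from block 3 it makes four (3, 2, 1, 0).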

I think we can safely classify this as a gcc bug, and a pretty bad one too.

Comment 4 Pavel Raiskup 2015-02-13 12:29:14 UTC

Thanks, Tom, for looking at it.  Yes, I agree: a clear gcc bug.  I'm trying to
cut out a minimal example, and then I'll switch the component to gcc.

Comment 5 Pavel Raiskup 2015-02-13 14:47:31 UTC

Created attachment 991396 [details]
Small reproducer

OK, it could definitely be minimized further, but for gcc's purposes the
attached example should be good enough.

main() calls reproduce() from a different module, which in turn calls
functions from yet another module.  Check the reproduce() function
(reproducer.c), simplified:

    block = invalid;
    while (block != invalid) {
        // should never run
        printf("but this is run with -O2\n");
    }

=== wrong behavior ===

    $ ./configure CFLAGS="-O2 -g3" >/dev/null
    $ make >/dev/null
    $ ./hello
    equals(a,b): 0, a = 4294967295, b = 4294967295

=== expected behavior ===

    $ ./configure CFLAGS="-O2 -g3" >/dev/null
    $ make >/dev/null
    $ ./hello # should be silent

=== with -O0 the program behaves correctly ===

    $ ./configure CFLAGS="-O0 -g3" >/dev/null
    $ make >/dev/null
    $ ./hello # should be silent

Comment 6 Marek Polacek 2015-02-13 15:56:49 UTC

Reduced:

int i;

__attribute__ ((noinline))
unsigned int foo (void)
{
  return 0;
}

int
main ()
{
  unsigned int u = -1;
  if (u == -1)
    {
      unsigned int n = foo ();
      if (n > 0)
	u = n - 1;
    }

  while (u != -1)
    {
      asm ("" : "+g" (u));
      u = -1;
      i = 1;
    }

  if (i)
    __builtin_abort ();
}

Comment 7 Marek Polacek 2015-02-13 15:58:24 UTC

OK with -O and with -O2 -fno-tree-vrp; fails with -O2.
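Why the reduced test must not abort: in `u == -1` and `u != -1`, the usual arithmetic conversions turn -1 into an unsigned int, i.e. UINT_MAX, so u starts out "equal to -1". foo() returns 0, `n > 0` is false, u is never changed, the while body never runs, and i stays 0. The hypothetical helpers below restate that control flow without the noinline/asm tricks so the values can be checked directly. That -fno-tree-vrp makes the failure disappear suggests the miscompilation is in GCC's value range propagation pass; the actual defect is tracked upstream as GCC bug 65053 (see the Links table).

```c
#include <assert.h>
#include <limits.h>

/* Value of u just before the while loop in the reduced test, given foo()'s result. */
unsigned int u_before_loop(unsigned int foo_result)
{
    unsigned int u = -1;          /* converts to UINT_MAX */
    if (u == -1) {                /* -1 is converted to unsigned: always true here */
        unsigned int n = foo_result;
        if (n > 0)
            u = n - 1;
    }
    return u;
}

/* Nonzero if the loop body (and so i = 1 and the abort) would be reached. */
int loop_body_runs(unsigned int foo_result)
{
    return u_before_loop(foo_result) != -1;
}
```

With foo() returning 0, as in the test, u_before_loop(0) is UINT_MAX and loop_body_runs(0) is 0, so a correct compilation never reaches __builtin_abort().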

Comment 8 Jakub Jelinek 2015-02-13 16:32:43 UTC

Tracking this upstream now.

Comment 9 Tom Lane 2015-02-15 17:36:44 UTC

Confirmed that postgresql-9.4.1-1.fc23 builds (including passing its self-tests) with gcc-5.0.0-0.13.fc23.x86_64, where it did not with gcc-5.0.0-0.12.fc23.x86_64.  Thanks for the quick turnaround!
