Bug 1190978 - gcc 5.0.0 causes FTBFS of postgresql
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL: http://koji.fedoraproject.org/koji/ta...
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-10 07:47 UTC by Petr Pisar
Modified: 2015-02-15 17:36 UTC

Fixed In Version: gcc-5.0.0-0.13.fc22
Clone Of:
Environment:
Last Closed: 2015-02-15 15:50:19 UTC
Type: Bug
Embargoed:


Attachments
Small reproducer (75.18 KB, application/x-gzip)
2015-02-13 14:47 UTC, Pavel Raiskup


Links
System                   ID     Private  Priority  Status  Summary  Last Updated
GNU Compiler Collection  65053  0        None      None    None     Never

Description Petr Pisar 2015-02-10 07:47:43 UTC
postgresql-9.4.1-1.fc22 fails to build in F22:

============== creating temporary installation        ==============
============== initializing database system           ==============
pg_regress: initdb failed
Examine /builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/log/initdb.log for the reason.
Command was: "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/./tmp_check/install//usr/bin/initdb" -D "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/./tmp_check/data" -L "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/./tmp_check/install//usr/share/pgsql" --noclean --nosync > "/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl/log/initdb.log" 2>&1
GNUmakefile:120: recipe for target 'check' failed
make[1]: Leaving directory '/builddir/build/BUILD/postgresql-9.4.1/src/pl/plperl'
make[1]: *** [check] Error 2
make: *** [check-plperl-recurse] Error 2
Makefile:35: recipe for target 'check-plperl-recurse' failed
make: Leaving directory '/builddir/build/BUILD/postgresql-9.4.1/src/pl'
+ test_failure=1
+ set +x
=== make failure: src/pl/plperl/regression.diffs ===
+ mv src/Makefile.global src/Makefile.global.save
+ cp src/Makefile.global.python3 src/Makefile.global
RPM build errors:
cp: error writing 'src/Makefile.global': No space left on device
cp: failed to extend 'src/Makefile.global': No space left on device
error: Bad exit status from /var/tmp/rpm-tmp.Usxj2a (%build)
    Bad exit status from /var/tmp/rpm-tmp.Usxj2a (%build)
Child return code was: 1
EXCEPTION: Command failed. See logs for output.

Difference between working and failing build root:

    perl-Encode       2:2.68-1.fc22   >   2:2.70-1.fc22
    libgcc            4.9.2-5.fc22    >   5.0.0-0.7.fc22
    libgomp           4.9.2-5.fc22    >   5.0.0-0.7.fc22
    shared-mime-info  1.4-1.fc22      >   1.4-2.fc22
    libstdc++         4.9.2-5.fc22    >   5.0.0-0.7.fc22
    gcc-c++           4.9.2-5.fc22    >   5.0.0-0.7.fc22
    gcc               4.9.2-5.fc22    >   5.0.0-0.7.fc22
    isl                               >   0.14-3.fc22
    cpp               4.9.2-5.fc22    >   5.0.0-0.7.fc22
    libstdc++-devel   4.9.2-5.fc22    >   5.0.0-0.7.fc22

Comment 1 Pavel Raiskup 2015-02-11 14:45:28 UTC
Thanks for the report.  Reproduced: initdb (or rather its postgres sub-process)
eats a lot of space in the PGDATA dir when we use gcc-5.0.0 together with -O2
(I'm not entirely sure yet that gcc is the real trigger).  With -O1/-O0 initdb
works fine.

I'll try to figure out how to debug this properly, or at least come up with a
minimal example.  The postgres process aborts while initdb is feeding it the
bki file:

#0  0x00007ffff7330187 in raise () from /lib64/libc.so.6
#1  0x00007ffff7331dea in abort () from /lib64/libc.so.6
#2  0x000000000073a1c9 in errfinish (dummy=dummy@entry=0) at elog.c:569
#3  0x000000000073bbb0 in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x772b70 "cannot abort transaction %u, it was already committed") at elog.c:1362
#4  0x00000000004b05f3 in RecordTransactionAbort (isSubXact=isSubXact@entry=0 '\000') at xact.c:1467
#5  0x00000000004b06b4 in AbortTransaction () at xact.c:2415
#6  0x00000000004b3955 in AbortOutOfAnyTransaction () at xact.c:4000
#7  0x00000000007447c9 in ShutdownPostgres (code=<optimized out>, arg=<optimized out>) at postinit.c:1058
#8  0x000000000064994d in shmem_exit (code=code@entry=1) at ipc.c:230
#9  0x0000000000649a35 in proc_exit_prepare (code=code@entry=1) at ipc.c:187
#10 0x0000000000649aa8 in proc_exit (code=code@entry=1) at ipc.c:102
#11 0x000000000073a1f5 in errfinish (dummy=<optimized out>) at elog.c:555
#12 0x0000000000660ef8 in mdextend (reln=0xc0cd90, forknum=FSM_FORKNUM, blocknum=<optimized out>, buffer=0xc16660 "", skipFsync=<optimized out>) at md.c:527
#13 0x00000000006471c4 in fsm_extend (fsm_nblocks=1055795, rel=0xbefb80) at freespace.c:587
#14 fsm_readbuf (rel=rel@entry=0xbefb80, addr=..., addr@entry=..., extend=extend@entry=1 '\001') at freespace.c:525
#15 0x00000000006472ee in fsm_set_and_search (rel=rel@entry=0xbefb80, addr=..., slot=slot@entry=3518, newValue=<optimized out>, minValue=minValue@entry=6 '\006') at freespace.c:615
#16 0x000000000064778d in RecordAndGetPageWithFreeSpace (rel=rel@entry=0xbefb80, oldPage=oldPage@entry=4294967295, oldSpaceAvail=oldSpaceAvail@entry=0, spaceNeeded=spaceNeeded@entry=176) at freespace.c:159
#17 0x000000000048f1b2 in RelationGetBufferForTuple (relation=relation@entry=0xbefb80, len=176, otherBuffer=otherBuffer@entry=0, options=options@entry=0, bistate=bistate@entry=0x0, vmbuffer=vmbuffer@entry=0x7fffffffdb0c, 
    vmbuffer_other=0x0) at hio.c:414
#18 0x0000000000488c9a in heap_insert (relation=0xbefb80, tup=tup@entry=0xc07800, cid=<optimized out>, options=options@entry=0, bistate=bistate@entry=0x0) at heapam.c:2082
#19 0x00000000004899be in simple_heap_insert (relation=<optimized out>, tup=tup@entry=0xc07800) at heapam.c:2572
#20 0x00000000004cf211 in InsertOneTuple (objectid=1242) at bootstrap.c:799
#21 0x00000000004cdcd9 in boot_yyparse () at bootparse.y:277
#22 0x00000000004ce83f in BootstrapModeMain () at bootstrap.c:491
#23 AuxiliaryProcessMain (argc=5, argc@entry=6, argv=0xba2998, argv@entry=0xba2990) at bootstrap.c:411
#24 0x00000000004609db in main (argc=6, argv=0xba2990) at main.c:219

Pavel

Comment 2 Tom Lane 2015-02-11 15:31:03 UTC
Judging from the stack trace, I'd say that something is busted in the logic that determines where in a relation (aka table, file) there is a page with enough free space to insert a new tuple.  For some reason it's repeatedly deciding it can't find enough space and then extending the relation by another page.  This probably points to a compiler bug or overenthusiastic optimization manifesting somewhere in the FSM (free space map) logic.

I'm a bit busy right now but am willing to help out if you can't isolate it quickly.

Comment 3 Tom Lane 2015-02-12 23:13:27 UTC
I dug into this a bit in a rawhide mock installation.  It appears that your stack trace above is telling the truth that RelationGetBufferForTuple is passing oldPage=oldPage@entry=4294967295 to RecordAndGetPageWithFreeSpace.  The latter then goes nuts extending the free space map out to such a high block number.  (So it's not really an infinite loop, but it is consuming unreasonable amounts of disk space.)
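
For scale, the fsm_nblocks=1055795 request in frame #13 of the trace in comment 1 can be turned into bytes. This is a minimal sketch that assumes PostgreSQL's default block size of 8192 bytes (the BLCKSZ actually used by this build is not stated in the report); the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: PostgreSQL's default block size; not confirmed for this build. */
#define ASSUMED_BLCKSZ 8192u

/* Hypothetical helper: bytes occupied by a relation fork of nblocks pages. */
static uint64_t fork_size_bytes(uint32_t nblocks)
{
    return (uint64_t) nblocks * ASSUMED_BLCKSZ;
}

/* Hypothetical helper: the oldPage value in the trace is just (uint32_t) -1. */
static int is_all_ones_block(uint32_t block)
{
    return block == (uint32_t) -1;
}
```

Under that assumption, fork_size_bytes(1055795) is 8649072640 bytes, roughly 8 GiB for the free space map alone, which fits the "No space left on device" errors in the build log.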

Now the thing is that the logic in RelationGetBufferForTuple() looks like this:

while (targetBlock != InvalidBlockNumber)
{
   ... do a bunch of stuff that does not change targetBlock ...

   targetBlock = RecordAndGetPageWithFreeSpace(relation,
					       targetBlock,
					       pageFreeSpace,
					       len + saveFreeSpace);
}

It is therefore impossible on its face that this code ever passes 4294967295
(a/k/a InvalidBlockNumber) to RecordAndGetPageWithFreeSpace.  And yet it is
doing that: I put a test for oldPage == InvalidBlockNumber into RecordAndGetPageWithFreeSpace, and it fired.

I think we can safely classify this as a gcc bug, and a pretty bad one too.
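
The invariant argued above can be written down as a small stand-alone model. This is only a sketch: next_candidate_block() is a hypothetical stand-in for RecordAndGetPageWithFreeSpace, not the PostgreSQL code, and the counter plays the role of the test for oldPage == InvalidBlockNumber. The typedef and macro mirror PostgreSQL's block number definitions.

```c
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Counts how often the callee was handed InvalidBlockNumber. */
static int invalid_block_calls = 0;

/* Hypothetical stand-in for RecordAndGetPageWithFreeSpace(). */
static BlockNumber next_candidate_block(BlockNumber old_page)
{
    if (old_page == InvalidBlockNumber)
        invalid_block_calls++;      /* the guard: must never fire */

    /* Pretend no page has enough free space, so the caller stops. */
    return InvalidBlockNumber;
}

/* Same shape as the loop in RelationGetBufferForTuple(). */
static int run_search_loop(BlockNumber target_block)
{
    int iterations = 0;

    while (target_block != InvalidBlockNumber)
    {
        iterations++;
        target_block = next_candidate_block(target_block);
    }
    return iterations;
}
```

With a correct compiler, run_search_loop(InvalidBlockNumber) performs zero iterations and invalid_block_calls stays at 0 for any starting block. The failure reported here corresponds to that counter becoming nonzero.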

Comment 4 Pavel Raiskup 2015-02-13 12:29:14 UTC
Thanks, Tom, for looking at it.  Yes, I agree: clearly a gcc bug.  I'm trying to
cut it down to a minimal example, and then I'll switch the bug over to gcc.

Comment 5 Pavel Raiskup 2015-02-13 14:47:31 UTC
Created attachment 991396
Small reproducer

OK, it could definitely be minimized further, but for gcc purposes the attached
example should be good enough.

main() calls reproduce() from a different module, which in turn calls other
functions from yet another module.  Check the reproduce() function
(reproducer.c); simplified:

    block = invalid;
    while (block != invalid) {
        // should never run
        printf("but this is run with -O2\n");
    }

=== wrong behavior ===

    $ ./configure CFLAGS="-O2 -g3" >/dev/null
    $ make >/dev/null
    $ ./hello
    equals(a,b): 0, a = 4294967295, b = 4294967295

=== expected behavior ===

    $ ./configure CFLAGS="-O2 -g3" >/dev/null
    $ make >/dev/null
    $ ./hello # should be silent

=== with -O0 the program behaves correctly ===

    $ ./configure CFLAGS="-O0 -g3" >/dev/null
    $ make >/dev/null
    $ ./hello # should be silent

Comment 6 Marek Polacek 2015-02-13 15:56:49 UTC
Reduced:

int i;

__attribute__ ((noinline))
unsigned int foo (void)
{
  return 0;
}

int
main ()
{
  unsigned int u = -1;
  if (u == -1)
    {
      unsigned int n = foo ();
      if (n > 0)
	u = n - 1;
    }

  while (u != -1)
    {
      asm ("" : "+g" (u));
      u = -1;
      i = 1;
    }

  if (i)
    __builtin_abort ();
}

Comment 7 Marek Polacek 2015-02-13 15:58:24 UTC
Ok with -O and -O2 -fno-tree-vrp; fails with -O2.
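
When bisecting a miscompile like this, the same pass can also be switched off for a single function instead of the whole translation unit. The sketch below uses GCC's optimize function attribute for that. It is a debugging aid only, it was not tried in this report, and whether it would have avoided the failure here is untested. The names NO_TREE_VRP, zero() and loop_body_ran() are made up; the function mirrors the reduced test case but returns a flag instead of aborting.

```c
/* GCC-only: strings passed to optimize() get a -f prefix, so this is
 * -fno-tree-vrp for the annotated function.  Other compilers get no-op. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VRP __attribute__((optimize("no-tree-vrp")))
#else
#define NO_TREE_VRP
#endif

/* Opaque source of 0, as foo() in the reduced test case. */
__attribute__((noinline))
static unsigned int zero(void)
{
    return 0;
}

/* Returns 1 if the loop body ran, which must never happen. */
NO_TREE_VRP
static int loop_body_ran(void)
{
    unsigned int u = -1;
    int ran = 0;

    if (u == (unsigned int) -1)
    {
        unsigned int n = zero();
        if (n > 0)
            u = n - 1;
    }

    while (u != (unsigned int) -1)
    {
        u = -1;
        ran = 1;
    }
    return ran;
}
```

A correct build returns 0 from loop_body_ran() at any optimization level.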

Comment 8 Jakub Jelinek 2015-02-13 16:32:43 UTC
Tracking this upstream now.

Comment 9 Tom Lane 2015-02-15 17:36:44 UTC
Confirmed that postgresql-9.4.1-1.fc23 builds (including passing its self-tests) with gcc-5.0.0-0.13.fc23.x86_64, where it did not with gcc-5.0.0-0.12.fc23.x86_64.  Thanks for the quick turnaround!

