Bug 1039605

Summary: [abrt] postgresql-server-9.3.2-1.fc20: errfinish: Process /usr/bin/postgres was killed by signal 6 (SIGABRT)
Product: [Fedora] Fedora Reporter: Jozef Mlich <jmlich>
Component: postgresql Assignee: Pavel Raiskup <praiskup>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 20 CC: devrim, el, hhorak, jmlich, jstanek, praiskup, tgl
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/56fbedc48f7146c79a7084f760215c0ea65e097b
Whiteboard: abrt_hash:80fd2d181c6c1296ee663b9cdbfcbf317544d610
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-12 08:47:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
File: backtrace none
File: cgroup none
File: core_backtrace none
File: dso_list none
File: environ none
File: limits none
File: maps none
File: open_fds none
File: proc_pid_status none
File: var_log_messages none
backtrace none

Description Jozef Mlich 2013-12-09 15:22:01 UTC
Version-Release number of selected component:
postgresql-server-9.3.2-1.fc20

Additional info:
reporter:       libreport-2.1.9
backtrace_rating: 4
cmdline:        'postgres: checkpointer process   ' '' '' '' '' '' '' '' '' '' '' '' '' '' '' ''
crash_function: errfinish
executable:     /usr/bin/postgres
kernel:         3.11.9-300.fc20.x86_64
runlevel:       N 5
type:           CCpp
uid:            26

Truncated backtrace:
Thread no. 1 (10 frames)
 #2 errfinish
 #3 UpdateControlFile
 #4 CreateCheckPoint
 #5 ShutdownXLOG
 #6 CheckpointerMain
 #7 AuxiliaryProcessMain
 #8 StartChildProcess
 #9 reaper
 #11 __select_nocancel at ../sysdeps/unix/syscall-template.S:81
 #12 ServerLoop

Comment 1 Jozef Mlich 2013-12-09 15:22:08 UTC
Created attachment 834358 [details]
File: backtrace

Comment 2 Jozef Mlich 2013-12-09 15:22:11 UTC
Created attachment 834359 [details]
File: cgroup

Comment 3 Jozef Mlich 2013-12-09 15:22:13 UTC
Created attachment 834360 [details]
File: core_backtrace

Comment 4 Jozef Mlich 2013-12-09 15:22:15 UTC
Created attachment 834361 [details]
File: dso_list

Comment 5 Jozef Mlich 2013-12-09 15:22:17 UTC
Created attachment 834362 [details]
File: environ

Comment 6 Jozef Mlich 2013-12-09 15:22:19 UTC
Created attachment 834363 [details]
File: limits

Comment 7 Jozef Mlich 2013-12-09 15:22:21 UTC
Created attachment 834364 [details]
File: maps

Comment 8 Jozef Mlich 2013-12-09 15:22:23 UTC
Created attachment 834365 [details]
File: open_fds

Comment 9 Jozef Mlich 2013-12-09 15:22:25 UTC
Created attachment 834366 [details]
File: proc_pid_status

Comment 10 Jozef Mlich 2013-12-09 15:22:29 UTC
Created attachment 834367 [details]
File: var_log_messages

Comment 11 Tom Lane 2013-12-09 15:27:18 UTC
The stack trace is pretty uninformative (unless you can get one with debug symbols).  Can we see the postmaster log?

Comment 12 Jozef Mlich 2013-12-09 16:50:23 UTC
Created attachment 834394 [details]
backtrace

Backtrace created with proper debug symbols, using gdb's "thread apply all bt full" command.

Comment 13 Jozef Mlich 2013-12-09 16:53:43 UTC
Unfortunately, I am not able to provide any other logs.

Comment 14 Tom Lane 2013-12-09 17:31:43 UTC
OK, so the error is being thrown from here:

#3  0x00000000004b5705 in UpdateControlFile () at xlog.c:3759

which in 9.3.2 is this code:

	fd = BasicOpenFile(XLOG_CONTROL_FILE,
					   O_RDWR | PG_BINARY,
					   S_IRUSR | S_IWUSR);
	if (fd < 0)
		ereport(PANIC,
				(errcode_for_file_access(),
				 errmsg("could not open control file \"%s\": %m",
						XLOG_CONTROL_FILE)));

It would sure be interesting to know what errno was reported, but without the postmaster log we're probably not going to find that out (unless you can dig into the core dump?  What we'd want to look at is the contents of elog.c's errordata[0] struct.)

In any case, the control file certainly ought to be there and be readable.  So this is looking like user error or filesystem misfeasance, and not anything particularly exciting in postgres itself.
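To illustrate the failure mode described above: the quoted open call passes O_RDWR without O_CREAT, so a missing file is never created and the open simply fails with errno set. The following is a minimal Python sketch of that behaviour, not PostgreSQL code. The helper name open_existing_rdwr and the temporary directory standing in for a data directory are hypothetical.

```python
import errno
import os
import tempfile


def open_existing_rdwr(path):
    """Open path read-write without O_CREAT.

    Returns (fd, None) on success or (None, errno_value) on failure.
    """
    try:
        return os.open(path, os.O_RDWR), None
    except OSError as exc:
        return None, exc.errno


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as datadir:
        # Hypothetical stand-in for a data directory that lacks global/pg_control.
        control_path = os.path.join(datadir, "global", "pg_control")
        fd, err = open_existing_rdwr(control_path)
        print(fd, errno.errorcode.get(err))
```

Because O_CREAT is absent, the permission bits in the original call have no effect on an open like this one; they would only matter if the file were being created.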

Comment 15 Jozef Mlich 2013-12-11 08:47:15 UTC
I am not very skilled with the gdb command line. The following output hopefully contains the errordata[0] value:

(gdb) p errordata[0]
$1 = {elevel = 22, output_to_server = 1 '\001', output_to_client = 0 '\000', 
  show_funcname = 0 '\000', hide_stmt = 0 '\000', 
  filename = 0x757fc5 "xlog.c", lineno = 3762, 
  funcname = 0x75ed70 <__func__.18668> "UpdateControlFile", 
  domain = 0x82a435 "postgres-9.3", context_domain = 0x0, 
  sqlerrcode = 16908805, 
  message = 0x1f0af50 "could not open control file \"global/pg_control\": No such file or directory", detail = 0x0, detail_log = 0x0, hint = 0x0, 
  context = 0x0, schema_name = 0x0, table_name = 0x0, column_name = 0x0, 
  datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, 
  internalquery = 0x0, saved_errno = 2}

I can provide the core dump privately. We can also discuss exploring the core dump via IRC (freenode/#postgresql/jmlich).
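For readers who want to cross-check the numeric fields in the errordata[0] dump above against its message string, here is a small Python sketch. It assumes PostgreSQL's packing of the five SQLSTATE characters into six bits each, with the first character in the lowest bits, and it assumes the 9.3 severity numbering (ERROR 20, FATAL 21, PANIC 22), which should be verified against elog.h for the version in question because later releases renumbered these levels. The function names are hypothetical.

```python
import errno
import os

# Assumption: severity codes as defined in PostgreSQL 9.3's elog.h.
# Later releases use different numbers, so check the header for your version.
ELEVEL_NAMES_9_3 = {20: "ERROR", 21: "FATAL", 22: "PANIC"}


def decode_sqlstate(sqlerrcode):
    """Unpack a packed SQLSTATE integer into its five characters.

    Each character is stored as (ord(ch) - ord('0')) in six bits,
    with the first character in the lowest-order bits.
    """
    chars = []
    for _ in range(5):
        chars.append(chr((sqlerrcode & 0x3F) + ord("0")))
        sqlerrcode >>= 6
    return "".join(chars)


def describe_errordata(elevel, sqlerrcode, saved_errno):
    """Translate the numeric errordata fields into readable names."""
    return {
        "severity": ELEVEL_NAMES_9_3.get(elevel, "unknown (%d)" % elevel),
        "sqlstate": decode_sqlstate(sqlerrcode),
        "errno_name": errno.errorcode.get(saved_errno, "unknown"),
        "errno_text": os.strerror(saved_errno),
    }


if __name__ == "__main__":
    # Values copied from the gdb output in this comment.
    print(describe_errordata(elevel=22, sqlerrcode=16908805, saved_errno=2))
```

With the values from the dump, sqlerrcode decodes to SQLSTATE 58P01 (undefined_file) and saved_errno 2 is ENOENT, which agrees with the "No such file or directory" text in the message field.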

Comment 16 Tom Lane 2013-12-11 16:35:26 UTC
(In reply to Jozef Mlich from comment #15)
>   message = 0x1f0af50 "could not open control file \"global/pg_control\": No
> such file or directory",

OK, that's what we needed to know right there.  So the question becomes, what happened to pg_control?  Postgres certainly didn't remove that file.  I'm still thinking this is user error or a filesystem problem.

Comment 17 Jozef Mlich 2013-12-12 08:47:43 UTC
Since I cannot reproduce this bug, I am closing it. Feel free to reopen it if you want to explore the core dump further.

Comment 18 Pavel Raiskup 2014-11-24 11:19:30 UTC
*** Bug 1167105 has been marked as a duplicate of this bug. ***