Bug 1235902

Summary:	Segmentation fault on ARM with psql
Product:	[Fedora] Fedora	Reporter:	ToBeReplaced
Component:	gssproxy	Assignee:	Robbie Harwood <rharwood>
Status:	CLOSED ERRATA	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	22	CC:	dpal, gdeschner, rharwood, ssorce, ToBeReplaced
Target Milestone:	---
Target Release:	---
Hardware:	armv7l
OS:	Linux
Whiteboard:
Fixed In Version:	gssproxy-0.4.1-3.fc23 gssproxy-0.4.1-2.fc22 gssproxy-0.4.1-2.fc21	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:
Clones:	1344518		Environment:
Last Closed:	2015-11-01 02:30:44 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1344518

Description ToBeReplaced 2015-06-26 04:34:11 UTC

Description of problem:
Using gssproxy on ARM with psql causes a segmentation fault.

Version-Release number of selected component (if applicable):
0.4.1-1

How reproducible:
Every time

Steps to Reproduce:
1. Add a service and corresponding keytab for psql/host.example.domain to gssproxy.conf as indicated below
2. (Re)start gssproxy
3. As uid 1000, run: $ GSS_USE_PROXY="yes" psql -h database.example.domain -U psql/host.example.domain -d test

Actual results:
Segmentation Fault

Expected results:
Forwarding of credentials for psql/host.example.domain to remote postgres server.

Additional info:

Here is /etc/gssproxy/gssproxy.conf:

[gssproxy]

[service/nfs-server]
  mechs = krb5
  socket = /run/gssproxy.sock
  cred_store = keytab:/etc/krb5.keytab
  trusted = yes
  kernel_nfsd = yes
  euid = 0

[service/psql]
  mechs = krb5
  cred_store = keytab:/etc/gssproxy/psql.keytab
  cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_psql
  cred_store = client_keytab:/etc/gssproxy/psql.keytab
  cred_usage = initiate
  euid = 1000

[service/nfs-client]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:FILE:/var/lib/gssproxy/clients/krb5cc_%U
  cred_store = client_keytab:/var/lib/gssproxy/clients/%U.keytab
  cred_usage = initiate
  allow_any_uid = yes
  trusted = yes
  euid = 0

And here is a gdb trace:

$ GSS_USE_PROXY="yes" gdb --args psql -h database.example.domain -U psql/host.example.domain -d test
GNU gdb (GDB) Fedora 7.9.1-13.fc22
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "armv7hl-redhat-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from psql...Reading symbols from /usr/lib/debug/usr/bin/psql.debug...done.
done.
(gdb) run
Starting program: /usr/bin/psql -h database.example.domain -U psql/host.example.domain -d test
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

Cannot parse expression `.L976 4@r4'.
(gdb) bt
#0  dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, 
    auxv=<optimized out>) at rtld.c:2167
#1  0xb6fe7b90 in _dl_sysdep_start (start_argptr=start_argptr@entry=0xbefff660, 
    dl_main=0xb6fd13c8 <dl_main>) at ../elf/dl-sysdep.c:249
#2  0xb6fd1338 in _dl_start_final (arg=0xbefff660, arg@entry=0x0, info=0xbefff3e0, 
    info@entry=0xbefff3d8) at rtld.c:306
#3  0xb6fd4d0c in _dl_start (arg=0x0) at rtld.c:414
#4  0xb6fd0ad0 in _start () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) continue
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x8) at malloc.c:2936
2936      if (chunk_is_mmapped (p))                       /* release mmapped memory. */
Missing separate debuginfos, use: dnf debuginfo-install cyrus-sasl-lib-2.1.26-23.fc22.armv7hl ncurses-libs-5.9-18.20150214.fc22.armv7hl nspr-4.10.8-1.fc22.armv7hl nss-3.19.1-1.0.fc22.armv7hl nss-softokn-freebl-3.19.1-1.0.fc22.armv7hl nss-util-3.19.1-1.0.fc22.armv7hl openldap-2.4.40-12.fc22.armv7hl pcre-8.37-1.fc22.armv7hl
(gdb) bt
#0  __GI___libc_free (mem=0x8) at malloc.c:2936
#1  0xb6357740 in gssrpc_xdr_bytes (xdrs=0xbeefced4, cpp=0xbeefcf78, sizep=<optimized out>, 
    maxsize=400) at xdr.c:478
#2  0xb637d14c in xdr_gp_rpc_opaque_auth (xdrs=0xbeefced4, objp=0xbeefcf70) at rpcgen/gp_rpc_xdr.c:21
#3  0xb637d1fc in xdr_gp_rpc_accepted_reply (xdrs=0xbeefced4, objp=0xbeefcf70)
    at rpcgen/gp_rpc_xdr.c:99
#4  0xb637d410 in xdr_gp_rpc_reply_header (xdrs=0xbeefced4, objp=0xbeefcf6c) at rpcgen/gp_rpc_xdr.c:202
#5  0xb637d45c in xdr_gp_rpc_msg_union (xdrs=0xbeefced4, objp=0xbeefcf68) at rpcgen/gp_rpc_xdr.c:226
#6  0xb637d4ac in xdr_gp_rpc_msg (xdrs=0xbeefced4, xdrs@entry=0xbeefcecc, objp=objp@entry=0xbeefcf64)
    at rpcgen/gp_rpc_xdr.c:240
#7  0xb6356f24 in gssrpc_xdr_free (proc=0xb637d488 <xdr_gp_rpc_msg>, objp=objp@entry=0xbeefcf64)
    at xdr.c:81
#8  0xb6382df4 in gpm_make_call (proc=550216, proc@entry=8, arg=arg@entry=0xbeffd040, res=0x8, 
    res@entry=0xbeffcfe8) at src/client/gpm_common.c:504
#9  0xb6381e44 in gpm_init_sec_context (minor_status=0xbeffd19c, minor_status@entry=0xaffb0, 
    cred_handle=<optimized out>, context_handle=0xaffa0, context_handle@entry=0x88f08, 
    target_name=0xafe98, mech_type=mech_type@entry=0xb1870, req_flags=2, req_flags@entry=3065765364, 
    time_req=0, time_req@entry=727152, input_cb=input_cb@entry=0x0, input_token=0x0, 
    input_token@entry=0x14, actual_mech_type=0x0, actual_mech_type@entry=0x86548 <__stack_chk_guard>, 
    output_token=0x88f08, output_token@entry=0x0, ret_flags=ret_flags@entry=0x0, 
    time_rec=time_rec@entry=0x0) at src/client/gpm_init_sec_context.c:91
#10 0xb6385814 in gssi_init_sec_context (minor_status=minor_status@entry=0xbeffd284, 
    claimant_cred_handle=0x0, claimant_cred_handle@entry=0xb23e0, context_handle=0xb23e8, 
    context_handle@entry=0x8, target_name=0xa12b0, target_name@entry=0xafd30, mech_type=0xb1870, 
    req_flags=2, req_flags@entry=0, time_req=0, time_req@entry=550216, input_cb=0x0, 
    input_cb@entry=0x2, input_token=input_token@entry=0x0, 
    actual_mech_type=actual_mech_type@entry=0x0, output_token=output_token@entry=0x88f08, 
    ret_flags=ret_flags@entry=0x0, time_rec=time_rec@entry=0x0)
    at src/mechglue/gpp_init_sec_context.c:177
#11 0xb6bbddf4 in gss_init_sec_context (minor_status=0xbeffd284, minor_status@entry=0xbeffd27c, 
    claimant_cred_handle=claimant_cred_handle@entry=0x0, context_handle=context_handle@entry=0x88ef8, 
    target_name=0xb5498, req_mech_type=req_mech_type@entry=0x0, req_flags=req_flags@entry=2, 
    time_req=time_req@entry=0, input_chan_bindings=input_chan_bindings@entry=0x0, input_token=0x0, 
    actual_mech_type=actual_mech_type@entry=0x0, output_token=output_token@entry=0x88f08, 
    ret_flags=ret_flags@entry=0x0, time_rec=time_rec@entry=0x0) at g_init_sec_context.c:210
#12 0xb6fa0688 in pg_GSS_continue (conn=conn@entry=0x88ca8) at fe-auth.c:108
#13 0xb6fa0a2c in pg_GSS_startup (conn=0x88ca8) at fe-auth.c:222
#14 pg_fe_sendauth (areq=<optimized out>, conn=conn@entry=0x88ca8) at fe-auth.c:596
#15 0xb6fa4df4 in PQconnectPoll (conn=conn@entry=0x88ca8) at fe-connect.c:2495
#16 0xb6fa5a44 in connectDBComplete (conn=conn@entry=0x88ca8) at fe-connect.c:1596
#17 0xb6fa61d4 in PQconnectdbParams (keywords=keywords@entry=0x88c58, values=values@entry=0x88c80, 
    expand_dbname=expand_dbname@entry=1) at fe-connect.c:462
#18 0x00014410 in main (argc=<optimized out>, argv=0x4c150) at startup.c:219
(gdb) continue
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Comment 1 ToBeReplaced 2015-06-26 04:38:33 UTC

This was unable to be reproduced on x86_64. The same configuration resulted in the expected (desired) behavior. The test on x86_64 was run on a machine that included a lot of other software (gnome-desktop, for example). The armv7l machine was based off of Fedora 22 Minimal, and only contained upgrades, installation and registration as a freeipa-client, and psql.

Comment 2 Roland Mainz 2015-07-03 16:02:48 UTC

(In reply to ToBeReplaced from comment #1)
> This was unable to be reproduced on x86_64. The same configuration resulted
> in the expected (desired) behavior. The test on x86_64 was run on a machine
> that included a lot of other software (gnome-desktop, for example). The
> armv7l machine was based off of Fedora 22 Minimal, and only contained
> upgrades, installation and registration as a freeipa-client, and psql.

Raw guessing: 32bit issue or compiler issue... ;-/

Reporter:
Could you please run *both* gssproxy AND psql under valgrind to see whether either produces any warnings/errors prior to the crash?

Comment 3 Roland Mainz 2015-07-03 16:05:00 UTC

Reporter:
I need to reproduce the problem... is there any recommended emulator (with ethernet network access) you can recommend (preferably with instructions how to install Fedora on it) ?

Comment 4 ToBeReplaced 2015-07-03 16:15:47 UTC

(In reply to Roland Mainz from comment #3)
> Reporter:
> I need to reproduce the problem... is there any recommended emulator (with
> ethernet network access) you can recommend (preferably with instructions how
> to install Fedora on it) ?

Fedora supports Versatile Express Emulation with QEMU, so you might try that. I have never used it, so I can't provide any additional commentary. Instructions here: https://fedoraproject.org/wiki/Architectures/ARM/F22/Installation#For_Versatile_Express_Emulation_with_QEMU

The problem presented itself on a BeagleBone Black. I imagine it would present on any of the armv7l devices that Fedora supports (e.g. PandaBoard, CubieTruck). Installation instructions are found on the same page as above.

As for valgrind: I'll do that, but it likely won't be until Monday.

Comment 5 Fedora Admin XMLRPC Client 2015-09-04 00:08:47 UTC

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 6 Robbie Harwood 2015-09-10 18:08:29 UTC

Did you get the chance to valgrind this?

Comment 7 ToBeReplaced 2015-09-11 17:55:09 UTC

No, I had to move forward. I still intend to get to this, but unfortunately I have to keep putting it off.

In case it's easier to reproduce, sudo 1.8.14p3-1.fc22 also segfaults on ARM devices that are IPA clients (and thus use gssproxy), which I suspect is related.

Comment 8 ToBeReplaced 2015-09-25 19:36:40 UTC

I had an old Fedora 21 machine that I could afford to brick, so I upgraded it to rawhide packages (same versions): gssproxy 0.4.1.

The psql side (with GSS_USE_PROXY="YES"):

==1017== Memcheck, a memory error detector
==1017== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1017== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==1017== Command: psql -h database.example.domain -U psql/host.example.domain -d test
==1017== 
disInstr(arm): unhandled instruction: 0xEC510F1E
                 cond=14(0xE) 27:20=197(0xC5) 4:4=1 3:0=14(0xE)
==1017== valgrind: Unrecognised instruction at address 0x4ae0be8.
==1017==    at 0x4AE0BE8: _armv7_tick (armv4cpuid.S:94)
==1017== Your program just tried to execute an instruction that Valgrind
==1017== did not recognise.  There are two possible reasons for this.
==1017== 1. Your program has a bug and erroneously jumped to a non-code
==1017==    location.  If you are running Memcheck and you just saw a
==1017==    warning about a bad jump, it's probably your program's fault.
==1017== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1017==    i.e. it's Valgrind's fault.  If you think this is the case or
==1017==    you are not sure, please let us know and we'll try to fix it.
==1017== Either way, Valgrind will now raise a SIGILL signal which will
==1017== probably kill your program.
==1017== Invalid free() / delete / delete[] / realloc()
==1017==    at 0x4836A08: free (in /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)
==1017==  Address 0x8 is not stack'd, malloc'd or (recently) free'd
==1017== 
psql: GSSAPI continuation error: Unspecified GSS failure.  Minor code may provide more information
GSSAPI continuation error: No Kerberos credentials available
==1017== 
==1017== HEAP SUMMARY:
==1017==     in use at exit: 88,579 bytes in 3,096 blocks
==1017==   total heap usage: 6,172 allocs, 3,077 frees, 548,384 bytes allocated
==1017== 
==1017== LEAK SUMMARY:
==1017==    definitely lost: 64 bytes in 4 blocks
==1017==    indirectly lost: 114 bytes in 6 blocks
==1017==      possibly lost: 0 bytes in 0 blocks
==1017==    still reachable: 88,401 bytes in 3,086 blocks
==1017==         suppressed: 0 bytes in 0 blocks
==1017== Rerun with --leak-check=full to see details of leaked memory
==1017== 
==1017== For counts of detected and suppressed errors, rerun with: -v
==1017== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 8)

The gssproxy side showed no errors:
==1055== Memcheck, a memory error detector
==1055== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1055== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==1055== Command: gssproxy -i
==1055== 
gssproxy[1055]: (OID: { 1 2 840 113554 1 2 2 }) Unspecified GSS failure.  Minor code may provide more information, No credentials cache found
^C==1055== 
==1055== HEAP SUMMARY:
==1055==     in use at exit: 11,412 bytes in 95 blocks
==1055==   total heap usage: 1,917 allocs, 1,822 frees, 291,081 bytes allocated
==1055== 
==1055== LEAK SUMMARY:
==1055==    definitely lost: 66 bytes in 3 blocks
==1055==    indirectly lost: 118 bytes in 6 blocks
==1055==      possibly lost: 5,172 bytes in 35 blocks
==1055==    still reachable: 6,056 bytes in 51 blocks
==1055==         suppressed: 0 bytes in 0 blocks
==1055== Rerun with --leak-check=full to see details of leaked memory
==1055== 
==1055== For counts of detected and suppressed errors, rerun with: -v
==1055== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 14 from 2)

It looks like `_armv7_tick` might be the root of both issues.

On sudo 1.8.14p3, with sssd 1.13.0 running (no gssproxy required):
(gdb) run
Starting program: /usr/bin/sudo true
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
sss_sudo_free_result (result=0x19c) at src/sss_client/sudo/sss_sudo.c:204
204         sss_sudo_free_rules(result->num_rules, result->rules);
(gdb) bt
#0  sss_sudo_free_result (result=0x19c) at src/sss_client/sudo/sss_sudo.c:204
#1  0xb6a541d8 in sudo_sss_setdefs (nss=<optimized out>) at ./sssd.c:456
#2  0xb6a4cf30 in sudoers_policy_init (info=info@entry=0xbeb2e450, envp=envp@entry=0xb6f94eb0 <__stack_chk_guard>) at ./sudoers.c:195
#3  0xb6a47e38 in sudoers_policy_open (version=<optimized out>, conversation=<optimized out>, plugin_printf=<optimized out>, settings=0xb8506398, 
    user_info=0xb85008e8, envp=0xbeb2e7a0, args=0x0) at ./policy.c:621
#4  0xb6f99c94 in policy_open (plugin=0xb6fc5fac <policy_plugin>, user_env=0xd696914, user_info=0xb85008e8, settings=<optimized out>)
    at ./sudo.c:1189
#5  main (argc=<optimized out>, argv=<optimized out>, envp=0xd696914) at ./sudo.c:206
(gdb) c
Continuing.

On starting sssd:
(gdb) run
Starting program: /usr/sbin/sssd -i
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
_armv7_tick () at armv4cpuid.S:94
94              mrrc    p15,1,r0,r1,c14         @ CNTVCT
(gdb) bt
#0  _armv7_tick () at armv4cpuid.S:94
#1  0xb6a7af9c in OPENSSL_cpuid_setup () at armcap.c:157
#2  0xb6fe86b8 in call_init (l=<optimized out>, argc=argc@entry=2, argv=argv@entry=0xbefff794, env=env@entry=0xbefff7a0) at dl-init.c:76
#3  0xb6fe8814 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:34
#4  _dl_init (main_map=0xb6fff908, argc=2, argv=0xbefff794, env=0xbefff7a0) at dl-init.c:124
#5  0xb6fd8b44 in _dl_start_user () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
Detaching after fork from child process 1235.
Detaching after fork from child process 1236.
Detaching after fork from child process 1237.
Detaching after fork from child process 1238.
Detaching after fork from child process 1239.
Detaching after fork from child process 1240.
(gdb) c
Continuing.

Comment 9 ToBeReplaced 2015-09-25 19:44:43 UTC

Also of note, I was able to install the missing debuginfos if I removed postgresql-debuginfo (space limited). This yielded:

Missing separate debuginfos, use: debuginfo-install postgresql-9.3.9-1.fc21.armv7hl
(gdb) run
Starting program: /usr/bin/psql -h database.example.domain -U psql/host.example.domain -d test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
_armv7_tick () at armv4cpuid.S:94
94              mrrc    p15,1,r0,r1,c14         @ CNTVCT
(gdb) bt
#0  _armv7_tick () at armv4cpuid.S:94
#1  0xb6c39f9c in OPENSSL_cpuid_setup () at armcap.c:157
#2  0xb6fe86b8 in call_init (l=<optimized out>, argc=argc@entry=7, argv=argv@entry=0xbefff654, env=env@entry=0xbefff674) at dl-init.c:76
#3  0xb6fe8814 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:34
#4  _dl_init (main_map=0xb6fff908, argc=7, argv=0xbefff654, env=0xbefff674) at dl-init.c:124
#5  0xb6fd8b44 in _dl_start_user () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x8) at malloc.c:2934
2934      if (chunk_is_mmapped (p))                       /* release mmapped memory. */
(gdb) bt
#0  __GI___libc_free (mem=0x8) at malloc.c:2934
#1  0xb6407840 in gssrpc_xdr_bytes () from /lib/libgssrpc.so.4
#2  0xb642515c in xdr_gp_rpc_opaque_auth () from /usr/lib/gssproxy/proxymech.so
#3  0xb6425270 in xdr_gp_rpc_accepted_reply () from /usr/lib/gssproxy/proxymech.so
#4  0xb6425484 in xdr_gp_rpc_reply_header () from /usr/lib/gssproxy/proxymech.so
#5  0xb64254d0 in xdr_gp_rpc_msg_union () from /usr/lib/gssproxy/proxymech.so
#6  0xb6425520 in xdr_gp_rpc_msg () from /usr/lib/gssproxy/proxymech.so
#7  0xb6406fc8 in gssrpc_xdr_free () from /lib/libgssrpc.so.4
#8  0xb642ae20 in gpm_make_call () from /usr/lib/gssproxy/proxymech.so
#9  0xb6429e28 in gpm_init_sec_context () from /usr/lib/gssproxy/proxymech.so
#10 0xb642d82c in gssi_init_sec_context () from /usr/lib/gssproxy/proxymech.so
#11 0xb6af80f0 in gss_init_sec_context () from /lib/libgssapi_krb5.so.2
#12 0xb6fa8d48 in pg_GSS_continue () from /lib/libpq.so.5
#13 0xb6fa9108 in pg_fe_sendauth () from /lib/libpq.so.5
#14 0xb6fad3c4 in PQconnectPoll () from /lib/libpq.so.5
#15 0xb6fadfc8 in connectDBComplete () from /lib/libpq.so.5
#16 0xb6fae72c in PQconnectdbParams () from /lib/libpq.so.5
#17 0x0000c290 in main ()
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Comment 10 Robbie Harwood 2015-10-20 17:32:55 UTC

I have reproduced this on a test VM (read: *extremely* slow, but I can install all the debuginfos I want).  I am intimately familiar with the postgres code, so I'm walking through that.

Some notes:
- actually configuring gssproxy seems to be unnecessary; the bug happens before that
- I'm running as an IPA user with non-1000 uid
- we really should release a new gssproxy so that we can use the symbolic UIDs everywhere
- bug does not trigger if GSS_USE_PROXY="no"
- a valid postgres server is needed because it doesn't access creds until two send-and-reply cycles have happened
- ... but since the handshake doesn't complete anyway, you don't actually need to be a user

Comment 11 Simo Sorce 2015-10-21 00:48:34 UTC

It would be really useful to have the corresponding gssproxy logs when this happens.

Comment 12 Simo Sorce 2015-10-21 00:51:07 UTC

Running the client under valgrind may also help, though if the VM is already slow, valgrind will certainly make it slower.

Comment 13 Simo Sorce 2015-10-21 00:59:47 UTC

Adding a memset(&msg, 0, sizeof(gp_rpc_msg)); just before we decode the header may be a good idea. I suspect the issue is some failure in the XDR layer that then gets clobbered because the msg structure is dirty.
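
A minimal sketch of why zeroing the structure matters, assuming the cleanup path frees every pointer in the message whether or not decoding got that far. All `demo_*` names are hypothetical stand-ins, not the actual gssproxy or gssrpc code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a decoded RPC message; NOT the real gp_rpc_msg. */
struct demo_msg {
    unsigned int xid;
    char *verifier;          /* heap buffer owned by the message */
    size_t verifier_len;
};

/* Hypothetical decoder: on short input it fails before touching msg at all. */
static int demo_decode(struct demo_msg *msg, const unsigned char *buf, size_t len)
{
    if (len < 4) {
        return -1;           /* msg is left exactly as the caller prepared it */
    }
    msg->xid = ((unsigned int)buf[0] << 24) | ((unsigned int)buf[1] << 16) |
               ((unsigned int)buf[2] << 8) | (unsigned int)buf[3];
    msg->verifier_len = len - 4;
    msg->verifier = malloc(msg->verifier_len + 1);
    if (msg->verifier == NULL) {
        return -1;
    }
    memcpy(msg->verifier, buf + 4, msg->verifier_len);
    msg->verifier[msg->verifier_len] = '\0';
    return 0;
}

/* Cleanup that releases whatever the pointer fields currently hold. */
static void demo_free(struct demo_msg *msg)
{
    free(msg->verifier);     /* free(NULL) is a no-op; free(garbage) is not */
    msg->verifier = NULL;
}

/* Decode, report the xid on success, and always run the cleanup path. */
int demo_call(const unsigned char *buf, size_t len, unsigned int *xid_out)
{
    struct demo_msg msg;
    int ret;

    assert(buf != NULL);
    /* Without this memset, msg.verifier holds stack garbage when
     * demo_decode() fails early, and demo_free() passes it to free(). */
    memset(&msg, 0, sizeof(msg));

    ret = demo_decode(&msg, buf, len);
    if (ret == 0 && xid_out != NULL) {
        *xid_out = msg.xid;
    }
    demo_free(&msg);
    return ret;
}
```

With the memset in place, a decode that fails early leaves `verifier` as NULL on mainstream platforms, so the cleanup's free() does nothing instead of being called on a stray value such as the 0x8 seen in the traces.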

Comment 14 Robbie Harwood 2015-10-21 16:03:11 UTC

Logs from gssproxy:

Debug Enabled
Client connected (fd = 11) (pid = 27905) (uid = 0) (gid = 0) (context = system_u:system_r:kernel_t:s0)
Client connected (fd = 12) (pid = 27914) (uid = 1523400003) (gid = 1523400003) (context = unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023)
gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "psql", euid: 1523400003, socket: (null)


From valgrind (this matches what ToBeReplaced was seeing):

==27983== 2 errors in context 1 of 1:
==27983== Invalid free() / delete / delete[] / realloc()
==27983==    at 0x4846A08: free (in /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)
==27983==  Address 0x8 is not stack'd, malloc'd or (recently) free'd

I don't know what was going on before, but I don't seem to be able to reproduce this with a custom-built gssproxy anymore.  However, adding the memset in causes the segfault to occur.

Comment 15 Fedora Update System 2015-10-21 17:45:25 UTC

gssproxy-0.4.1-3.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-20d1e0d890

Comment 16 Fedora Update System 2015-10-21 17:45:32 UTC

gssproxy-0.4.1-2.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-91663ccfea

Comment 17 Robbie Harwood 2015-10-21 17:47:28 UTC

Whoops, that should read "causes the segfault to *not* occur".  Otherwise this chain of events is kind of confusing...

Comment 18 Fedora Update System 2015-10-24 12:09:18 UTC

gssproxy-0.4.1-3.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update gssproxy'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-20d1e0d890

Comment 19 Fedora Update System 2015-10-26 10:27:51 UTC

gssproxy-0.4.1-2.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update gssproxy'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-25ab0dae49

Comment 20 Fedora Update System 2015-10-26 18:30:00 UTC

gssproxy-0.4.1-2.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update gssproxy'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-91663ccfea

Comment 21 Fedora Update System 2015-11-01 02:30:41 UTC

gssproxy-0.4.1-3.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 22 Fedora Update System 2015-11-01 21:49:47 UTC

gssproxy-0.4.1-2.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 23 Fedora Update System 2015-11-04 22:50:55 UTC

gssproxy-0.4.1-2.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Dominik 'Rathann' Mierzejewski 2016-03-30 10:45:02 UTC

(In reply to ToBeReplaced from comment #9)
> Also of note, I was able to install the missing debuginfos if I removed
> postgresql-debuginfo (space limited). This yielded:
> 
> Missing separate debuginfos, use: debuginfo-install
> postgresql-9.3.9-1.fc21.armv7hl
> (gdb) run
> Starting program: /usr/bin/psql -h database.example.domain -U
> psql/host.example.domain -d test
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/libthread_db.so.1".
> 
> Program received signal SIGILL, Illegal instruction.
> _armv7_tick () at armv4cpuid.S:94
> 94              mrrc    p15,1,r0,r1,c14         @ CNTVCT
> (gdb) bt

FWIW, the crash at the mrrc instruction is due to performance counters being inaccessible from userland on ARMv7 by default. See, for example, http://neocontra.blogspot.co.uk/2013/05/user-mode-performance-counters-for.html
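
The gdb sessions quoted earlier carry on after this SIGILL once `c` is entered, which is consistent with the instruction being run as a capability probe under a temporary signal handler. A minimal sketch of that probing pattern, assuming a POSIX system. Here raise(SIGILL) stands in for the privileged mrrc read so the example runs on any architecture, and all `demo_*` names are hypothetical rather than OpenSSL's actual code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf demo_probe_env;

/* SIGILL handler: abandon the probe and jump back to the saved context. */
static void demo_on_sigill(int sig)
{
    (void)sig;
    siglongjmp(demo_probe_env, 1);
}

/* Run probe() with a temporary SIGILL handler installed.
 * Returns 1 if probe() completed, 0 if it raised SIGILL. */
int demo_probe(void (*probe)(void))
{
    struct sigaction sa, old;
    volatile int supported = 0;

    assert(probe != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_on_sigill;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old);

    /* savemask=1 so the signal mask is restored after siglongjmp(). */
    if (sigsetjmp(demo_probe_env, 1) == 0) {
        probe();
        supported = 1;       /* reached only if no SIGILL was delivered */
    }

    sigaction(SIGILL, &old, NULL);   /* put the previous handler back */
    return supported;
}

/* Stand-in for an instruction the CPU or kernel refuses to execute. */
void demo_trapping_instruction(void)
{
    raise(SIGILL);
}

/* Stand-in for an instruction that executes normally. */
void demo_working_instruction(void)
{
}
```

Under a debugger the signal is reported before the handler sees it, so a SIGILL stop at a probe like this does not by itself indicate a crash. Continuing passes the signal on to the handler and the program proceeds.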