Bug 1536555 - gcc: Uses %ebx in stack probing loop in PIC mode on i386
Summary: gcc: Uses %ebx in stack probing loop in PIC mode on i386
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: i686
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jeff Law
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-19 16:08 UTC by Bruno Wolff III
Modified: 2018-01-27 10:34 UTC

Fixed In Version: gcc-7.2.1-8.fc28
Clone Of:
Environment:
Last Closed: 2018-01-27 10:34:18 UTC
Type: ---
Embargoed:


Attachments
coredump (442.92 KB, application/octet-stream)
2018-01-22 13:56 UTC, Vojtech Trefny


Links
System                   ID     Private  Priority  Status  Summary  Last Updated
GNU Compiler Collection  83994  0        None      None    None     2018-01-23 14:53:31 UTC

Description Bruno Wolff III 2018-01-19 16:08:55 UTC

openssh-server-7.6p1-5.1.fc28 does not work on i686, but openssh-server-7.6p1-4.fc28 does. openssh-server-7.6p1-5.1.fc28 does work on x86_64.
I got the following console warning:
Jan 18 22:27:29 wolff kernel: sshd[28582]: segfault at 0 ip   (null) sp bfb589cc error 4 in sshd[4be000+dc000]

Reproducible: Always

Comment 1 Jakub Jelen 2018-01-22 08:04:08 UTC
Please provide a backtrace or coredump.

I don't have an i386 rawhide server to reproduce the issue. Also, I don't see any obvious error in the recent changes that could cause it.

Comment 2 Vojtech Trefny 2018-01-22 13:56:40 UTC
Created attachment 1384446
coredump

Comment 4 Jakub Jelen 2018-01-22 15:52:02 UTC
The backtrace on Vojtech's machine points to the select() syscall:

(gdb) bt
#0  0x3839636d in ?? ()
#1  0x00471373 in wait_until_can_do_something (max_time_ms=0, nallocp=0xbf9131ac, maxfdp=0xbf9131a8, writesetp=0xbf9131a4, readsetp=0xbf9131a0, connection_out=3, connection_in=3, ssh=0x1898c90)
    at serverloop.c:267
#2  server_loop2 (ssh=<optimized out>, ssh@entry=0x1898c90, authctxt=<optimized out>, authctxt@entry=0x1899df0) at serverloop.c:403
#3  0x00479af4 in do_authenticated2 (authctxt=0x1899df0, ssh=0x1898c90) at session.c:2637
#4  do_authenticated (ssh=0x1898c90, authctxt=0x1899df0) at session.c:312
#5  0x00463a2e in main (ac=<optimized out>, av=<optimized out>) at sshd.c:2220

Strace does not show anything more interesting either:

[pid  4240] clock_gettime(CLOCK_BOOTTIME, {tv_sec=8263, tv_nsec=585187385}) = 0
[pid  4240] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x3839636d} ---
[pid  4240] +++ killed by SIGSEGV (core dumped) +++


I took the opportunity to rebuild the previous version on the new rawhide (with another fix for a problem that prevented building [1]) and the result was still the same: same segfault and unable to connect.

So this is most probably not a bug in OpenSSH but in one of its dependencies that were updated since the last build (2017-12-14), so there are probably several possibilities.

[1] https://src.fedoraproject.org/rpms/openssh/c/38b67ad

Comment 5 Jakub Jelen 2018-01-22 16:38:11 UTC
This is related to bug #1515865, which modifies the CFLAGS during the build and breaks packages on ix86. Downgrading redhat-rpm-config to

  redhat-rpm-config.noarch 74-1.fc28                                                                    

and rebuilding the openssh package made it work for me again.

We will probably need to either exclude the i686 architecture as well or find the underlying problem, because at this point the rawhide OpenSSH is unable to authenticate users on this architecture.

Comment 6 Florian Weimer 2018-01-22 18:04:36 UTC
This is most likely a gcc bug if it is indeed caused by enabling -fstack-clash-protection.

Is it sufficient to upgrade an i386 VM to rawhide to reproduce this?

Comment 7 Jakub Jelen 2018-01-23 08:33:05 UTC
Yes, upgrading an i386 VM to rawhide should do that (I tested on a KVM VM). Or vtrefny can give you access to a "ready-made" machine demonstrating the problem.

It sounds like a gcc bug, so feel free to reassign it to the appropriate component. But I was not able to collect more debug information than mentioned above.

Comment 8 Florian Weimer 2018-01-23 13:45:22 UTC
It's a GCC bug:

Dump of assembler code for function process_input:
   0x5656b210 <+0>:     lea    -0x4000(%esp),%ebx
   0x5656b217 <+7>:     sub    $0x1000,%esp
   0x5656b21d <+13>:    orl    $0x0,(%esp)
   0x5656b221 <+17>:    cmp    %ebx,%esp
   0x5656b223 <+19>:    jne    0x5656b217 <process_input+7>
   0x5656b225 <+21>:    sub    $0x3c,%esp
   0x5656b228 <+24>:    mov    %ebx,0x402c(%esp)
   0x5656b22f <+31>:    call   0x56560490 <__x86.get_pc_thunk.bx>

So this function clobbers %ebx, despite being compiled in PIC mode.

I still need to construct a reproducer from this.
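
The effect of the listed prologue can be illustrated with a small C model. This is only a sketch of the instruction sequence shown above, not GCC's code and not a CPU emulator; the names struct regs, buggy_prologue, and PROBE_INTERVAL are hypothetical and introduced purely for illustration.

```c
#include <stdint.h>

/* Toy model of the two registers involved; hypothetical, illustration only. */
struct regs {
    uint32_t esp;
    uint32_t ebx;
};

/* Step size taken from "sub $0x1000,%esp" in the listing. */
#define PROBE_INTERVAL 0x1000u

/*
 * Mirrors the order of operations in the listed prologue: ebx is
 * overwritten with the loop limit before it has been saved anywhere.
 * probed_size must be a nonzero multiple of PROBE_INTERVAL, otherwise
 * the loop condition is never met.
 * Returns the value that the later "mov %ebx,0x402c(%esp)" would
 * store as the "saved" ebx. *probes receives the number of loop steps.
 */
static uint32_t buggy_prologue(struct regs *r, uint32_t probed_size,
                               unsigned *probes)
{
    *probes = 0;
    r->ebx = r->esp - probed_size;   /* lea -0x4000(%esp),%ebx */
    do {
        r->esp -= PROBE_INTERVAL;    /* sub $0x1000,%esp */
        ++*probes;                   /* orl $0x0,(%esp) touches the page */
    } while (r->esp != r->ebx);      /* cmp %ebx,%esp ; jne */
    return r->ebx;                   /* caller's ebx is already gone here */
}
```

With a starting esp of 0xbf914000 and the 0x4000-byte region from the listing, the model takes four steps, and the value it returns is the loop limit rather than whatever ebx held on entry. That is why the store of %ebx after the loop cannot preserve the caller's value.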

Comment 9 Florian Weimer 2018-01-23 14:42:40 UTC
Reproducer (compile with -O2 -m32 -march=i686 -fpic -fstack-clash-protection):

void f1 (char *);

__attribute__ ((regparm (3)))
int
f2 (int arg1, int arg2, int arg3)
{
  char buf[16384];
  f1 (buf);
  f1 (buf);
  return 0;
}

Note that GCC automatically selects three register parameters for the static function because its address never leaks, so a custom ABI can be used.  -march=i686 appears to be required to trigger this.
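
The note above can be sketched as a variant without the explicit attribute: a file-local function whose address is never taken, which leaves the compiler free to pick a register-based calling convention on its own. The names consume, local_worker, call_worker, and sink are hypothetical, and consume is a local stand-in for the external f1. Whether this variant actually produces the bad prologue with an affected GCC is an assumption that has not been tested here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the external f1() of the reproducer. */
static volatile char sink;

__attribute__ ((noinline))
static void consume (char *buf, size_t len)
{
  /* Touch both ends so the buffer cannot be optimized away. */
  buf[0] = 1;
  buf[len - 1] = 2;
  sink = buf[0] + buf[len - 1];
}

/* static and address never taken: on i386 the compiler may pass all
   three arguments in registers without any regparm attribute.
   noinline (a GCC/Clang extension) keeps the function from being
   merged into its caller.  */
__attribute__ ((noinline))
static int
local_worker (int arg1, int arg2, int arg3)
{
  char buf[16384];   /* large frame, same size as in the reproducer */
  consume (buf, sizeof buf);
  consume (buf, sizeof buf);
  return arg1 + arg2 + arg3;
}

int
call_worker (int x)
{
  return local_worker (x, x + 1, x + 2);
}
```

Compiling this with the flags from the reproducer and inspecting the prologue of local_worker would show whether the compiler chose register parameters by itself.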

Comment 10 Jeff Law 2018-01-23 23:39:16 UTC
Patch posted upstream.  Ideally we'll have this wrapped up tomorrow.

Comment 11 Jeff Law 2018-01-24 22:14:36 UTC
gcc-7.2.1-8.fc28 with a fix for this is building in koji.

Comment 12 Bruno Wolff III 2018-01-24 22:33:27 UTC
Are affected packages going to get rebuilt too? rpm and dnf are having problems for me after last night's updates and I'm probably going to need to go to some extra lengths to get updates installed. I'd like to fix everything at once.

Comment 13 Jeff Law 2018-01-24 22:34:41 UTC
I can't really speak for those packages....  Of course everything for F28 is going to be rebuilt next week as part of the gcc-8 mass rebuild...

Comment 14 Bruno Wolff III 2018-01-25 07:56:58 UTC
It does look like rpm probably got hit by this and the bad rpm went out, so i686 rawhide users are probably going to have issues recovering. Downgrading is broken, so falling back isn't easy. I'll figure out something, but getting a new rpm build out sooner rather than later might save some trouble for people that don't update daily.

Comment 15 Florian Weimer 2018-01-25 11:59:15 UTC
(In reply to Bruno Wolff III from comment #14)
> It does look like rpm probably got hit by this and the bad rpm went out, so
> i686 rawhide users are probably going to have issues recovering. Downgrading
> is broken, so falling back isn't easy. I'll figure out something, but
> getting a new rpm build out sooner rather than later might save some trouble
> for people that don't update daily.

Yes, looks like librpm contains a bad sequence:

Dump of assembler code for function fsmVerify:
   0x0002d140 <+0>:     lea    -0x10000(%esp),%ebx
   0x0002d147 <+7>:     sub    $0x1000,%esp
   0x0002d14d <+13>:    orl    $0x0,(%esp)
   0x0002d151 <+17>:    cmp    %ebx,%esp
   0x0002d153 <+19>:    jne    0x2d147 <fsmVerify+7>
   0x0002d155 <+21>:    sub    $0xcc,%esp
   0x0002d15b <+27>:    mov    %ebx,0x100bc(%esp)
   0x0002d162 <+34>:    call   0xc950 <__x86.get_pc_thunk.bx>
   0x0002d167 <+39>:    add    $0x48461,%ebx

gcc is still building, but I will rebuild rpm once gcc is done.

Comment 16 Bruno Wolff III 2018-01-25 20:31:28 UTC
Thanks. I see that rpm finished, but I can't test it until tonight. It looks like I can use rpm2cpio and cpio to overwrite the current rpm packages without using rpm. I'm hoping that will work well enough to give me a working rpm.
The openssh build failed because pam_ssh_agent_auth (and its debug version) uses a different NVR that apparently needs to be updated separately from the main NVR for the package. Once I have rpm/dnf working again, though, I can downgrade openssh to fix things.

Comment 17 Florian Weimer 2018-01-25 20:49:51 UTC
(In reply to Bruno Wolff III from comment #16)
> Thanks. I see that rpm finished, but I can't test it until tonight. It looks
> like I can use rpm2cpio and cpio to overwrite the current rpm packages
> without using rpm. I'm hoping that will work well enough to give me a
> working rpm.
> The openssh build failed because the pam_ssh_agent_auth (and debug version)
> use a different NVR that appears to be needed to be updately separately from
> the main NVR for the package. Though once I have rpm/dnf working again I can
> downgrade openssh to fix things.

Oops, I upgraded the wrong bug.  Please see this comment for recovery instructions:

https://bugzilla.redhat.com/show_bug.cgi?id=1538648#c9

Comment 18 Jakub Jelen 2018-01-26 14:44:55 UTC
I just tried to rebuild OpenSSH and the configure step already failed with an error on x86_64 (with gcc.x86_64 7.3.1-1.fc28) [1]:

checking for gcc... gcc
checking whether the C compiler works... no
configure: error: in `/builddir/build/BUILD/openssh-7.6p1':
configure: error: C compiler cannot create executables
See `config.log' for more details
RPM build errors:

It looks like GCC is still not in good shape. Is this issue already tracked in bug #1538648 or is it a new issue?

[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=24463856

Comment 19 Florian Weimer 2018-01-26 14:51:28 UTC
(In reply to Jakub Jelen from comment #18)
> I just tried to rebuild OpenSSH and the configure step already failed with
> error on x86_64 (with gcc.x86_64 7.3.1-1.fc28) [1]:
> 
> checking for gcc... gcc
> checking whether the C compiler works... no
> configure: error: in `/builddir/build/BUILD/openssh-7.6p1':
> configure: error: C compiler cannot create executables
> See `config.log' for more details
> RPM build errors:
> 
> It looks like GCC is still not in a shape. Is this issue already tracked in
> the bug #1538648 or is it a new issue?
> 
> [1] https://koji.fedoraproject.org/koji/taskinfo?taskID=24463856

No, I think this was fixed in annobin-3.1-3.fc28, which had to be rebuilt for gcc-7.3.1.

Comment 20 Jakub Jelen 2018-01-26 15:28:46 UTC
Thanks. The build goes fine now. I will rebuild the package in rawhide to have working OpenSSH.

Bruno, Vojtech, can you verify that this build solves the problems in your environment?

https://koji.fedoraproject.org/koji/taskinfo?taskID=24470109

Comment 21 Bruno Wolff III 2018-01-26 15:51:12 UTC
The rpm file you gave me fixed rpm (and via that dnf) and was easier than what I had been planning. I updated to the compose from Fedora-Rawhide-20180125.n.0 overnight and am in the process of updating to Fedora-Rawhide-20180126.n.0.
I was also able to downgrade openssh-server so that I could access the machine remotely.
I am still seeing something weird with su where 'less' appears to be getting run before I get the new shell prompt. But other than the timing of when this changed, there isn't anything tying it to this bug.
So as far as I can tell you have fixed the root cause and I just need to hold openssh-server until there is a new build. Updating to the bad version doesn't break existing connections so it isn't a big deal as I can downgrade it back remotely.

Comment 22 Bruno Wolff III 2018-01-26 20:01:00 UTC
openssh-7.6p1-6.fc28 fixes openssh for me.

Comment 23 Jakub Jelen 2018-01-27 09:45:38 UTC
Thank you for the verification.

Since this issue looks resolved, we should be able to close this bug. But it is now assigned to gcc, so I don't want to touch it in case there is something I missed.

Comment 24 Florian Weimer 2018-01-27 10:34:18 UTC
Note that there is a closely related bug 1538648, which is not yet fixed but is worked around with a redhat-rpm-config change. That workaround is not applied to all builds; see bug 1217376, bug 1284684, and bug 1539092 for a few known settings where older redhat-rpm-config flags are used.

