openssh-server-7.6p1-5.1.fc28 does not work on i686, but openssh-server-7.6p1-4.fc28 does. openssh-server-7.6p1-5.1.fc28 does work on x86_64. I got the following console warning:
    Jan 18 22:27:29 wolff kernel: sshd[28582]: segfault at 0 ip (null) sp bfb589cc error 4 in sshd[4be000+dc000]
Reproducible: Always
Please provide a backtrace or coredump. I don't have an i386 rawhide server to reproduce the issue. Also, I don't see any obvious error in the recent changes that could cause it.
Created attachment 1384446 (coredump)
The backtrace on Vojtech's machine points to the select() syscall:
    (gdb) bt
    #0  0x3839636d in ?? ()
    #1  0x00471373 in wait_until_can_do_something (max_time_ms=0, nallocp=0xbf9131ac, maxfdp=0xbf9131a8, writesetp=0xbf9131a4, readsetp=0xbf9131a0, connection_out=3, connection_in=3, ssh=0x1898c90) at serverloop.c:267
    #2  server_loop2 (ssh=<optimized out>, ssh@entry=0x1898c90, authctxt=<optimized out>, authctxt@entry=0x1899df0) at serverloop.c:403
    #3  0x00479af4 in do_authenticated2 (authctxt=0x1899df0, ssh=0x1898c90) at session.c:2637
    #4  do_authenticated (ssh=0x1898c90, authctxt=0x1899df0) at session.c:312
    #5  0x00463a2e in main (ac=<optimized out>, av=<optimized out>) at sshd.c:2220
Strace does not show anything more interesting either:
    [pid 4240] clock_gettime(CLOCK_BOOTTIME, {tv_sec=8263, tv_nsec=585187385}) = 0
    [pid 4240] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x3839636d} ---
    [pid 4240] +++ killed by SIGSEGV (core dumped) +++
I took the opportunity to rebuild the previous version on the new rawhide (with another fix for an issue that prevented building [1]) and the result was the same: the same segfault, and still unable to connect. So this is most probably not a bug in OpenSSH but in one of its dependencies that were updated since the last build (2017-12-14), so there are probably several candidates.
[1] https://src.fedoraproject.org/rpms/openssh/c/38b67ad
This is related to bug #1515865, which modifies the CFLAGS during the build and breaks packages on ix86. Downgrading redhat-rpm-config to redhat-rpm-config.noarch 74-1.fc28 and rebuilding the openssh package made it work for me again. We will probably need to either exclude the i686 architecture as well or find the underlying problem, because at this point the rawhide OpenSSH is unable to authenticate users on this architecture.
This is most likely a gcc bug if it is indeed caused by enabling -fstack-clash-protection. Is it sufficient to upgrade an i386 VM to rawhide to reproduce this?
Yes, upgrading an i386 VM to rawhide should do that (I tested on a KVM VM). Or vtrefny can give you access to a "ready-made" machine demonstrating the problem. It sounds like a gcc bug, so feel free to reassign it to the appropriate component. But I was not able to collect more debug information than mentioned above.
It's a GCC bug:
    Dump of assembler code for function process_input:
       0x5656b210 <+0>:   lea    -0x4000(%esp),%ebx
       0x5656b217 <+7>:   sub    $0x1000,%esp
       0x5656b21d <+13>:  orl    $0x0,(%esp)
       0x5656b221 <+17>:  cmp    %ebx,%esp
       0x5656b223 <+19>:  jne    0x5656b217 <process_input+7>
       0x5656b225 <+21>:  sub    $0x3c,%esp
       0x5656b228 <+24>:  mov    %ebx,0x402c(%esp)
       0x5656b22f <+31>:  call   0x56560490 <__x86.get_pc_thunk.bx>
So this function clobbers %ebx, despite being compiled in PIC mode. I still need to construct a reproducer from this.
Reproducer (compile with -O2 -m32 -march=i686 -fpic -fstack-clash-protection):
    void f1 (char *);
    __attribute__ ((regparm (3)))
    int
    f2 (int arg1, int arg2, int arg3)
    {
      char buf[16384];
      f1 (buf);
      f1 (buf);
      return 0;
    }
Note that for the static function in sshd, GCC automatically selects three register parameters because its address never leaks, so a custom ABI can be used; the reproducer requests the same thing explicitly with regparm (3). -march=i686 appears to be required to trigger this.
Patch posted upstream. Ideally we'll have this wrapped up tomorrow.
gcc-7.2.1-8.fc28 with a fix for this is building in koji.
Are affected packages going to get rebuilt too? rpm and dnf are having problems for me after last night's updates and I'm probably going to need to go to some extra lengths to get updates installed. I'd like to fix everything at once.
I can't really speak for those packages.... Of course everything for F28 is going to be rebuilt next week as part of the gcc-8 mass rebuild...
It does look like rpm probably got hit by this and the bad rpm went out, so i686 rawhide users are probably going to have issues recovering. Downgrading is broken, so falling back isn't easy. I'll figure out something, but getting a new rpm build out sooner rather than later might save some trouble for people that don't update daily.
(In reply to Bruno Wolff III from comment #14)
> It does look like rpm probably got hit by this and the bad rpm went out, so
> i686 rawhide users are probably going to have issues recovering. Downgrading
> is broken, so falling back isn't easy. I'll figure out something, but
> getting a new rpm build out sooner rather than later might save some trouble
> for people that don't update daily.
Yes, looks like librpm contains a bad sequence:
    Dump of assembler code for function fsmVerify:
       0x0002d140 <+0>:   lea    -0x10000(%esp),%ebx
       0x0002d147 <+7>:   sub    $0x1000,%esp
       0x0002d14d <+13>:  orl    $0x0,(%esp)
       0x0002d151 <+17>:  cmp    %ebx,%esp
       0x0002d153 <+19>:  jne    0x2d147 <fsmVerify+7>
       0x0002d155 <+21>:  sub    $0xcc,%esp
       0x0002d15b <+27>:  mov    %ebx,0x100bc(%esp)
       0x0002d162 <+34>:  call   0xc950 <__x86.get_pc_thunk.bx>
       0x0002d167 <+39>:  add    $0x48461,%ebx
gcc is still building, but I will rebuild rpm once gcc is done.
Thanks. I see that rpm finished, but I can't test it until tonight. It looks like I can use rpm2cpio and cpio to overwrite the current rpm packages without using rpm. I'm hoping that will work well enough to give me a working rpm. The openssh build failed because pam_ssh_agent_auth (and its debug version) uses a different NVR that appears to need to be updated separately from the main NVR for the package. Once I have rpm/dnf working again, though, I can downgrade openssh to fix things.
(In reply to Bruno Wolff III from comment #16)
> Thanks. I see that rpm finished, but I can't test it until tonight. It looks
> like I can use rpm2cpio and cpio to overwrite the current rpm packages
> without using rpm. I'm hoping that will work well enough to give me a
> working rpm.
> The openssh build failed because the pam_ssh_agent_auth (and debug version)
> use a different NVR that appears to need to be updated separately from
> the main NVR for the package. Though once I have rpm/dnf working again I can
> downgrade openssh to fix things.
Oops, I updated the wrong bug. Please see this comment for recovery instructions: https://bugzilla.redhat.com/show_bug.cgi?id=1538648#c9
I just tried to rebuild OpenSSH and the configure step already failed with an error on x86_64 (with gcc.x86_64 7.3.1-1.fc28) [1]:
    checking for gcc... gcc
    checking whether the C compiler works... no
    configure: error: in `/builddir/build/BUILD/openssh-7.6p1':
    configure: error: C compiler cannot create executables
    See `config.log' for more details
    RPM build errors:
It looks like GCC is still not in good shape. Is this issue already tracked in bug #1538648, or is it a new issue?
[1] https://koji.fedoraproject.org/koji/taskinfo?taskID=24463856
(In reply to Jakub Jelen from comment #18)
> I just tried to rebuild OpenSSH and the configure step already failed with
> an error on x86_64 (with gcc.x86_64 7.3.1-1.fc28) [1]:
>
> checking for gcc... gcc
> checking whether the C compiler works... no
> configure: error: in `/builddir/build/BUILD/openssh-7.6p1':
> configure: error: C compiler cannot create executables
> See `config.log' for more details
> RPM build errors:
>
> It looks like GCC is still not in good shape. Is this issue already tracked in
> bug #1538648, or is it a new issue?
>
> [1] https://koji.fedoraproject.org/koji/taskinfo?taskID=24463856
No, I think this was fixed in annobin-3.1-3.fc28, which had to be rebuilt for gcc-7.3.1.
Thanks. The build goes fine now. I will rebuild the package in rawhide to have a working OpenSSH. Bruno, Vojtech, can you verify that this build solves the problems in your environment? https://koji.fedoraproject.org/koji/taskinfo?taskID=24470109
The rpm file you gave me fixed rpm (and via that dnf) and was easier than what I had been planning. I updated to the compose from Fedora-Rawhide-20180125.n.0 overnight and am in the process of updating to Fedora-Rawhide-20180126.n.0. I was also able to downgrade openssh-server so that I could access the machine remotely. I am still seeing something weird with su, where 'less' appears to be run before I get the new shell prompt. But other than the timing of when this changed, there isn't anything tying it to this bug. So as far as I can tell you have fixed the root cause and I just need to hold openssh-server until there is a new build. Updating to the bad version doesn't break existing connections, so it isn't a big deal, as I can downgrade it again remotely.
openssh-7.6p1-6.fc28 fixes openssh for me.
Thank you for the verification. Since this issue looks resolved, we should be able to close this bug. But it is now assigned to gcc, so I don't want to touch it in case there is something I am missing.
Note that there is a closely related bug 1538648, not yet fixed but worked around with a redhat-rpm-config change (which is not applied to all builds; see bug 1217376, bug 1284684, and bug 1539092 for a few known settings where older redhat-rpm-config flags are used).