Bug 1298760
Summary: | ssh_exchange_identification: read: Connection reset by peer | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bogdan <bogdan> |
Component: | openssh | Assignee: | Jakub Jelen <jjelen> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 23 | CC: | jjelen, mattias.ellert, mgrepl, plautrba, tmraz |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-01-18 15:10:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Bogdan
2016-01-14 23:41:40 UTC
This is not caused by any firewall as port 22 is open on the box. Before the rsync run, all is ok. Also noticed same issue with SSH when toggling the windows in mc with CTRL+O. This looks to me like a memory overlapping/sharing issue where rsync rides/overwrites the memory segment for ssh and once the daemon gets corrupt, daemon becomes not functional. Without getting any more information from the server (logs, core dumps, retraces) or reliable reproducer we are not able to investigate this issue any further. At least verbose log from client might be helpful, debug log from server should give us some clue. Can you provide something like this? From these words, it sounds for me as disk space exhaustion or similar resource problem. If it would be memory issue, it would be more serious problem than only openssh. I just got to reproduce it. After the rsync run, I got locked out. This is what ssh -vvv root@fileserver gives me: [sysadmin@gateway ~]$ ssh -vvv root@fileserver OpenSSH_7.1p1, OpenSSL 1.0.2e-fips 3 Dec 2015 debug1: Reading configuration data /etc/ssh/ssh_config debug1: /etc/ssh/ssh_config line 56: Applying options for * debug2: ssh_connect: needpriv 0 debug1: Connecting to fileserver [192.168.100.253] port 22. debug1: Connection established. debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_rsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_rsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_dsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_dsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_ecdsa type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_ecdsa-cert type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_ed25519 type -1 debug1: key_load_public: No such file or directory debug1: identity file /home/sysadmin/.ssh/id_ed25519-cert type -1 debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_7.1 ssh_exchange_identification: read: Connection reset by peer I was logged in all the time when the rsync happened. The nemory sits at around 900MB usage out of 6GB or so. Memory cache is at around 500MB max and I am emptying the cache after every rsync to make sure the memory is fine. There is plenty of disk space. The rsync syncs from /home to /storage that is a second physical drive and to /nas that is a samba drive mounted locally. [root@fileserver scripts]# df Filesystem 1K-blocks Used Available Use% Mounted on devtmpfs 3523556 4 3523552 1% /dev tmpfs 3535452 0 3535452 0% /dev/shm tmpfs 3535452 760 3534692 1% /run tmpfs 3535452 0 3535452 0% /sys/fs/cgroup /dev/mapper/fedora-root 52403200 5696356 46706844 11% / tmpfs 3535452 64 3535388 1% /tmp /dev/sda1 487652 160792 297164 36% /boot /dev/mapper/fedora-home 1863643824 251882132 1517070980 15% /home /dev/sdb 2884285104 420890532 2316881244 16% /storage //nas.domain/Administration/backup 2870309064 479849064 2390357600 17% /nas tmpfs 707092 0 707092 0% /run/user/0 If I try to reload ssh, I get this: [root@fileserver scripts]# systemctl reload sshd Job for sshd.service failed because the control process exited with error code. See "systemctl status sshd.service" and "journalctl -xe" for details. [root@fileserver scripts]# systemctl status sshd.service ● sshd.service - OpenSSH server daemon Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled) Active: active (running) (Result: exit-code) since Fri 2016-01-15 10:01:17 MST; 6h ago Docs: man:sshd(8) man:sshd_config(5) Process: 12618 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=208/STDIN) Main PID: 932 (sshd) CGroup: /system.slice/sshd.service └─932 /usr/sbin/sshd -D Jan 15 15:27:24 fileserver.domain sshd[11428]: pam_succeed_if(sshd:auth): requirement "uid >= 1000" not met by user "root" Jan 15 15:27:26 fileserver.domain sshd[11428]: Failed password for root from 192.168.100.1 port 54808 ssh2 Jan 15 15:27:33 fileserver.domain sshd[11428]: Accepted password for root from 192.168.100.1 port 54808 ssh2 Jan 15 15:37:06 fileserver.domain sshd[11620]: Accepted password for root from 192.168.100.1 port 54852 ssh2 Jan 15 15:39:15 fileserver.domain sshd[11693]: Accepted password for root from 192.168.100.1 port 58822 ssh2 Jan 15 16:12:39 fileserver.domain sshd[932]: Couldn't open /dev/null: No such file or directory Jan 15 16:13:07 fileserver.domain sshd[932]: Couldn't open /dev/null: No such file or directory Jan 15 16:13:31 fileserver.domain sshd[932]: Couldn't open /dev/null: No such file or directory Jan 15 16:23:40 fileserver.domain systemd[1]: sshd.service: Control process exited, code=exited status=208 Jan 15 16:23:40 fileserver.domain systemd[1]: Reload failed for OpenSSH server daemon. The strange part is that SSH comes back after 30 mins of refusing the connections... This looks really weird. > Jan 15 16:12:39 fileserver.domain sshd[932]: Couldn't open /dev/null: No such file or directory > Jan 15 16:13:07 fileserver.domain sshd[932]: Couldn't open /dev/null: No such file or directory > Jan 15 16:13:31 fileserver.domain sshd[932]: Couldn't open /dev/null: No such file or directory Another idea that came to my mind is that it might be hardware problem, that some part of memory might get corrupted, but if you managed to reproduce it again it sounds quite unlikely (though some memtest would not hurt). To understand the errors, you issued three connection respective to the three quoted logs above? Or is it from the one connection? /dev/null should be available. Unless the data segment with the value _PATH_DEVNULL is overwritten by something else (as you tried to propose) and the actual opening file is different. This would be serious issue with kernel. I build scratch build to add what file is actually opening, if it occurs in `sanitise_stdfd` and the constant is malformed: http://koji.fedoraproject.org/koji/taskinfo?taskID=12590686 diff --git a/misc.c b/misc.c index c5d2686..bcc5722 100644 --- a/misc.c +++ b/misc.c @@ -728,8 +728,8 @@ sanitise_stdfd(void) int nullfd, dupfd; if ((nullfd = dupfd = open(_PATH_DEVNULL, O_RDWR)) == -1) { - fprintf(stderr, "Couldn't open /dev/null: %s\n", - strerror(errno)); + fprintf(stderr, "Couldn't open %s: %s\n", + _PATH_DEVNULL, strerror(errno)); exit(1); } while (++dupfd <= 2) { Could you try to install this build as a server and re-reproduce it, if the path in error message changes? Also it would be useful to see move verbose log from server (`LogLevel` option in `sshd_config`) and how we did get to the point of this error. Then it should be more clear from the `/var/log/secure` or journal. It is very possible that the bug I am seeing is caused by the kernel itself. I have a script that outputs the errors to /dev/null. As soon as I took the redirection out, I do not need to drop the disk caches anymore and it seems to be stable. So, for some reason outputting to /dev/null causes the issue. On the other hand, there is not much error dropped, so I do not understand of why /dev/null would become inaccessible. To get back on track, this does not seem to be an SSH bug after all, ssh just gets affected by it. As I would not go through the effort of posting this as kernel bug, I will close the issue, but the issue is still present and caused by redirection of 2> /dev/null. |