Bug 75108

Summary: openssh is getting ENOBUFS and dying
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <aander07>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 7.2
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-07-29 13:46:54 UTC
Attachments:
  strace of a failing sshd showing ENOBUFS from write()
  strace of a failing sshd showing ENOBUFS from write()
  another strace of a failing sshd showing ENOBUFS from write()

Description Need Real Name 2002-10-04 16:10:25 UTC
Description of Problem:

This problem was not seen until we started using 2.4.x kernels.  Machine A
is running 2.2.19-6.2.7, machine B is running 2.4.9-34.  When using rsync over
ssh or scp, approximately 200K into the file, the ssh process will receive an
ENOBUFS from the kernel, and die.

This does not appear to be a low memory situation:

             total       used       free     shared    buffers     cached
Mem:       1027984    1012184      15800         52      45964     705656
-/+ buffers/cache:     260564     767420
Swap:      2096376      18400    2077976

A work-around for this has been to traffic shape machine A so that it does not
send data as fast to machine B.  This tends to allow machine B to copy more
data before getting into an ENOBUFS situation.

We have tried using an openssh patched to retry when it gets an ENOBUFS, but then
it just gets into a write()/ENOBUFS loop and spins on the CPU instead of dying.
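
For illustration, a minimal sketch of such a retry loop (hedged: this is not the
actual openssh change, just the general shape of the idea).  With nothing to wait
on between attempts, a persistent ENOBUFS turns the failed write() into a busy
loop, which matches the CPU spinning described above.

/* Illustrative sketch only, not the actual openssh patch. */
#include <errno.h>
#include <unistd.h>

ssize_t write_retry(int fd, const void *buf, size_t len)
{
        for (;;) {
                ssize_t n = write(fd, buf, len);
                if (n >= 0)
                        return n;       /* full or partial write succeeded */
                if (errno == ENOBUFS || errno == EINTR)
                        continue;       /* retry immediately; spins while ENOBUFS persists */
                return -1;              /* any other error is passed up to the caller */
        }
}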

Version-Release number of selected component (if applicable):

Known to have been encountered in 2.4.9-31 and 2.4.9-34.  2.4.18-* exhibits
problems under our workloads and is not an option at this point in time.

How Reproducible:

Fairly high reproducibility given a large working set.  Smaller files tend not
to trigger this.

Comment 1 Need Real Name 2002-10-04 16:13:32 UTC
To be clear, ssh is getting ENOBUFS and dying on machine B, and the memory
output from above is from machine B, and represents a typical state when this
issue is encountered.

Comment 2 Need Real Name 2002-10-04 16:20:03 UTC
Machine A configuration:

             total       used       free     shared    buffers     cached
Mem:        523856     521624       2232          0      10868     468352
-/+ buffers/cache:      42404     481452
Swap:      2104432          0    2104432

Module                  Size  Used by
cls_u32                 4528   0
sch_tbf                 2192   0
sch_cbq                11248   0
vfat                    9712   0 (autoclean) (unused)
fat                    31968   0 (autoclean) [vfat]
e100                   37968   2 (autoclean)
nfs                    73472   7 (autoclean)
lockd                  45168   1 (autoclean) [nfs]
sunrpc                 64816   1 (autoclean) [nfs lockd]
dummy0                   960   0 (autoclean) (unused)
raid0                   3136   1 (autoclean)

e100 driver reports:

Description               e100 - Intel(R) PRO/100+ Server Adapter
Driver_Name               Intel(R) PRO/100 Fast Ethernet Adapter - Loadable driver
Driver_Version            1.3.20
PCI_Vendor                0x8086
PCI_Device_ID             0x1229
PCI_Subsystem_Vendor      0x8086
PCI_Subsystem_ID          0x100c
PCI_Revision_ID           0x08

Machine B configuration:

Module                  Size  Used by
cls_u32                 4528   0
sch_tbf                 2192   0
sch_cbq                11248   0
vfat                    9712   0 (autoclean) (unused)
fat                    31968   0 (autoclean) [vfat]
e100                   37968   2 (autoclean)
nfs                    73472   7 (autoclean)
lockd                  45168   1 (autoclean) [nfs]
sunrpc                 64816   1 (autoclean) [nfs lockd]
dummy0                   960   0 (autoclean) (unused)
raid0                   3136   1 (autoclean)

(The tainted flag comes from the e100 driver.)

e100 driver reports:

Description               Intel(R) PRO/100+ Server Adapter (PILA8470B)
Driver_Name               e100
Driver_Version            1.6.22
PCI_Vendor                0x8086
PCI_Device_ID             0x1229
PCI_Subsystem_Vendor      0x8086
PCI_Subsystem_ID          0x100c
PCI_Revision_ID           0x0008


Comment 3 Need Real Name 2002-10-04 16:26:46 UTC
During the file copy process, the data is being read from system A, and copied
to system B.  Again, this behavior has been noted both with straight scp from
system A to system B, and with rsync over ssh from system A to system B.

We have noted two other things of interest: 

1) Pure scp tends to trigger this faster; rsync over ssh tends to do marginally
better.

2) If we strace the rsync over ssh on machine A, or traffic shape machine A down
to 2Mb/s of output to machine B, the transfer tends to complete more often.  We still
get failures, but the key is slowing down the rate at which data is sent to machine B.

Comment 4 Bill Nottingham 2002-10-04 16:27:25 UTC
The push is from machine A -> machine B; most of the data goes that direction.

Comment 5 Need Real Name 2002-10-04 19:35:05 UTC
One more data point: I have been unable to trigger this so far just using
netcat to copy large files.  This appears isolated to the use of openssh.

Comment 6 David Miller 2002-10-04 23:24:35 UTC
Even though 2.4.18 has problems with your workload, can you at
least test whether 2.4.18 makes this problem go away?

Can you also make sure to enter into bugzilla the problems that
the 2.4.18 kernel causes, because those problems need to be taken
care of eventually should we use 2.4.18+ kernels for future
errata.


Comment 7 David Miller 2002-10-04 23:53:20 UTC
After reviewing the code paths in question, the only ways that
a TCP socket sendmsg() call can return ENOBUFS are if sendmsg():

1) Is given a msg_controllen > INT_MAX
2) Fails a sock_kmalloc() of size msg_controllen

What is msg_controllen when openssh gets these ENOBUFS errors
back from a TCP sendmsg() call?

Also, in the future it would be really nice if captured strace
output was provided, not just "gets ENOBUFS", as the latter does
not tell us what system call the error is being returned from.
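
To make condition 1 concrete, here is a small, hypothetical demo program (not
part of the original report) that builds a msghdr with msg_controllen > INT_MAX
and prints the errno that sendmsg() returns; on the 2.4 kernels discussed here
that should be ENOBUFS.  The length check sits in the generic sendmsg() path, so
an AF_UNIX socketpair should be enough to stand in for the ssh TCP connection.

/* Hypothetical demo of condition 1 above: sendmsg() with
 * msg_controllen > INT_MAX.  Illustrative only. */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int main(void)
{
        int sv[2];
        char byte = 'x';
        struct iovec iov = { &byte, 1 };
        struct msghdr msg;

        /* A connected stream socketpair stands in for the ssh connection. */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
                perror("socketpair");
                return 1;
        }

        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = &byte;                  /* should never be read: the length check fails first */
        msg.msg_controllen = (size_t)INT_MAX + 1; /* condition 1 */

        if (sendmsg(sv[0], &msg, 0) < 0)
                printf("sendmsg: %s (errno=%d)\n", strerror(errno), errno);
        else
                printf("sendmsg unexpectedly succeeded\n");
        return 0;
}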


Comment 8 Need Real Name 2002-10-05 15:22:07 UTC
Created attachment 78906 [details]
strace of a failing sshd showing ENOBUFS from write()

Comment 9 Need Real Name 2002-10-05 15:22:13 UTC
Created attachment 78907 [details]
strace of a failing sshd showing ENOBUFS from write()

Comment 10 Need Real Name 2002-10-05 15:22:26 UTC
Created attachment 78908 [details]
another strace of a failing sshd showing ENOBUFS from write()

Comment 11 Need Real Name 2002-10-05 15:26:25 UTC
What do I need to do to supply the value for msg_controllen?  I also have a
~250MB tcpdump taken during one of these failures if you need that, but trying
to upload that to bugzilla does not seem the smartest approach.
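
One hypothetical way to capture msg_controllen at the moment of failure, without
wading through the full tcpdump, would be an LD_PRELOAD interposer that wraps
sendmsg() and logs the msghdr whenever the call fails with ENOBUFS.  This is a
sketch, not something suggested elsewhere in this thread:

/* Hypothetical LD_PRELOAD interposer, illustrative only: logs
 * msg_controllen whenever sendmsg() fails with ENOBUFS.
 * Build: gcc -shared -fPIC -o enobufs_log.so enobufs_log.c -ldl
 * Run:   LD_PRELOAD=./enobufs_log.so <program>                    */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t sendmsg(int fd, const struct msghdr *msg, int flags)
{
        static ssize_t (*real_sendmsg)(int, const struct msghdr *, int);
        ssize_t n;
        int saved_errno;

        if (real_sendmsg == NULL)
                real_sendmsg = (ssize_t (*)(int, const struct msghdr *, int))
                        dlsym(RTLD_NEXT, "sendmsg");

        n = real_sendmsg(fd, msg, flags);
        saved_errno = errno;
        if (n < 0 && saved_errno == ENOBUFS && msg != NULL)
                fprintf(stderr, "sendmsg(fd=%d) = ENOBUFS, msg_controllen=%lu\n",
                        fd, (unsigned long)msg->msg_controllen);
        errno = saved_errno;    /* keep errno intact for the caller */
        return n;
}

Since the attached straces show the ENOBUFS actually coming back from write(), a
similar wrapper around write() could be used to log the failing calls on that
path as well.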


Comment 12 Need Real Name 2002-10-18 17:26:59 UTC
As a data point, 2.4.18-17.7.x has not exhibited this same behavior yet.

Comment 13 william ewing 2003-07-28 15:27:11 UTC
Has there been any update on this call, as I am experiencing a similar problem?
I am using kernel 2.4.9-34enterprise, and when a burst of users are logging on I
can get kernel errors, but also certain commands within the application (HP
OpenMail) no longer work. I logged this initially with HP and they got me to do
straces on certain processes, and their report is as follows:

> > 
> > I also did a strace on omsessd on the system here to compare the output.
> > 
> > I also got occurrences of "kill(..., SIG_)) = -1 EPERM" and the
> > "accept(3,...[110]) = -1 EAGAIN", and don't think that these 
> > show what the problem is.
> > 
> > Looking at the trace for the failing case, there is firstly 
> > the expected read
> > 
> > read (4, 
> > "\30\0\0\0\5\0\0\0\0\0\0\0\1\0\0\0\377\377\0\0\0\0\0\0\0",40)=24
> > 
> > This matches the write from omstat.
> > 
> > The write from omsessd then gives the following
> > 
> > write(4, "........", 35068) = -1 ENOBUFS (No buffer space available)
> > 
> > compared to
> > 
> > write(4, ".......", 18912) = 18912
> > 
> > when omstat works.
> > 
> > So it looks like the full data is being written out to the 
> > socket, but there is a problem with buffer space.
> > --



Comment 14 Arjan van de Ven 2003-07-29 13:46:54 UTC
william ewing: you're running a WAAAAAAAAAAAY too old kernel.


Comment 15 william ewing 2003-07-30 12:58:39 UTC
Are you saying that this problem has been resolved in a later kernel?

Comment 16 Arjan van de Ven 2003-07-30 12:59:54 UTC
yes