Bug 84212 - iptables + vlans = availability problems
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: iptables
Version: 7.3
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Thomas Woerner
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-02-13 16:12 UTC by Need Real Name
Modified: 2005-10-31 22:00 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-10-08 10:59:23 UTC
Embargoed:



Description Need Real Name 2003-02-13 16:12:43 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a)
Gecko/20021207 Phoenix/0.5

Description of problem:
I have a box here working as a layer 3 firewall and router.
It has 5 NICs; two of them have been split into sub-interfaces, and these
sub-interfaces are the VLAN interfaces.

We started to see availability problems about a week ago.
The system hangs without any apparent reason, and there were no strange messages on the
console or in the logs.

I searched for the cause and discovered that the box loses memory and cannot
recover it.

Here is a capture from the sar program (sysstat package):


00:00:00    kbmemfree kbmemused  %memused
00:10:00       225612     30268     11,83
00:20:00       225556     30324     11,85
00:30:00       225212     30668     11,99
00:40:00       225484     30396     11,88
00:50:00       225428     30452     11,90
01:00:00       225080     30800     12,04
01:10:00       225332     30548     11,94
01:20:00       225276     30604     11,96
01:30:00       224928     30952     12,10
01:40:00       223800     32080     12,54
01:50:00       223696     32184     12,58
02:00:00       223352     32528     12,71
02:10:00       224304     31576     12,34
02:20:00       224248     31632     12,36
02:30:00       223904     31976     12,50
02:40:00       224180     31700     12,39
02:50:00       224124     31756     12,41
03:00:00       223784     32096     12,54
03:10:00       224032     31848     12,45
03:20:00       223980     31900     12,47
03:30:00       223628     32252     12,60
03:40:00       223908     31972     12,49
03:50:00       223844     32036     12,52
04:00:00       223492     32388     12,66
04:10:00       184592     71288     27,86
04:20:00       184540     71340     27,88
04:30:00       184192     71688     28,02
04:40:00       184472     71408     27,91
04:50:00       184412     71468     27,93
05:00:00       184068     71812     28,06
05:10:00       184324     71556     27,96
05:20:00       184264     71616     27,99
05:30:00       183924     71956     28,12
05:40:00       184144     71736     28,04
05:50:00       184092     71788     28,06
06:00:00       183744     72136     28,19
06:10:00       183988     71892     28,10
06:20:00       183932     71948     28,12
06:30:00       183592     72288     28,25
06:40:00       183860     72020     28,15
06:50:00       183808     72072     28,17
07:00:00       183464     72416     28,30
07:10:00       183720     72160     28,20
07:20:00       183668     72212     28,22
07:30:00       183308     72572     28,36
07:40:00       183588     72292     28,25
07:50:00       183536     72344     28,27
08:00:00       183192     72688     28,41
08:10:00       183440     72440     28,31
08:20:00       183388     72492     28,33
08:30:00       183044     72836     28,46
08:40:00       183316     72564     28,36
08:50:00       183260     72620     28,38
09:00:00       182916     72964     28,51
09:10:00       183160     72720     28,42
09:20:00       183104     72776     28,44
09:30:00       182760     73120     28,58
09:40:00       182988     72892     28,49
09:50:00       182936     72944     28,51
10:00:00       182592     73288     28,64
10:10:00       182844     73036     28,54
10:20:00       182784     73096     28,57
10:30:00       182440     73440     28,70
10:40:00       182716     73164     28,59
10:50:00       182660     73220     28,61
11:00:00       181344     74536     29,13
11:10:00       181180     74700     29,19
11:20:00       181144     74736     29,21
11:30:00       180780     75100     29,35
11:40:00       181060     74820     29,24
11:50:00       179776     76104     29,74
12:00:00       180108     75772     29,61
12:10:00       180348     75532     29,52
12:20:00       180152     75728     29,60
12:30:00       179764     76116     29,75
12:40:00       180040     75840     29,64
12:50:00       179980     75900     29,66
13:00:00       178672     77208     30,17
13:10:00       178924     76956     30,08
13:20:00       179128     76752     30,00
13:30:00       179396     76484     29,89
13:40:00       178956     76924     30,06
13:50:00       178904     76976     30,08
14:00:00       178256     77624     30,34
14:10:00       178508     77372     30,24
14:20:00       178460     77420     30,26
14:30:00       178116     77764     30,39
14:40:00       178388     77492     30,28
14:50:00       178332     77548     30,31
15:00:00       177980     77900     30,44
15:10:00       178232     77648     30,35
15:20:00       178176     77704     30,37
15:30:00       178736     77144     30,15
15:40:00       179008     76872     30,04
15:50:00       178952     76928     30,06
16:00:00       178584     77296     30,21
16:10:00       178840     77040     30,11
16:20:00       178736     77144     30,15
16:30:00       178364     77516     30,29
16:40:00       178636     77244     30,19
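
To make the trend in the table above easier to see, here is a minimal sketch in Python that parses rows in this layout (note the decimal comma in %memused, which comes from the reporter's locale) and reports the largest rise in kbmemused between consecutive samples. The function names are illustrative, and the embedded sample rows are copied from the table. In these rows kbmemfree + kbmemused always adds up to 255880 kB, so kbmemused appears to be simply total minus free and may therefore include buffers and page cache.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    time: str
    kbmemfree: int
    kbmemused: int
    pct_used: float


def parse_sar_memory(text: str) -> List[Sample]:
    """Parse 'time kbmemfree kbmemused %memused' rows, skipping headers."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[1].isdigit():
            continue  # header or blank line
        # The reporter's locale prints a decimal comma (e.g. "11,83").
        pct = float(fields[3].replace(",", "."))
        samples.append(Sample(fields[0], int(fields[1]), int(fields[2]), pct))
    return samples


def largest_jump(samples: List[Sample]) -> Tuple[str, str, int]:
    """Return (from_time, to_time, delta_kb) for the biggest rise in kbmemused."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    pairs = zip(samples, samples[1:])
    before, after = max(pairs, key=lambda p: p[1].kbmemused - p[0].kbmemused)
    return before.time, after.time, after.kbmemused - before.kbmemused


# Rows copied from the sar capture above.
SAMPLE = """\
00:00:00    kbmemfree kbmemused  %memused
03:50:00       223844     32036     12,52
04:00:00       223492     32388     12,66
04:10:00       184592     71288     27,86
04:20:00       184540     71340     27,88
"""

if __name__ == "__main__":
    rows = parse_sar_memory(SAMPLE)
    print(largest_jump(rows))
```

On these rows the largest rise is 38900 kB, between the 04:00 and 04:10 samples; the rest of the table shows only slow growth after that step.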

The box is not running any services, only sshd for remote administration.

The only way to recover the memory is a hard reboot.
This box is running a 2.4.19 kernel with a 3Com patch to avoid MTU problems;
the patch is available at http://www.bewley.net/linux/vlan/patches/vlan-3c59x.patch



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Use kernel 2.4.19 with VLAN support built as a module and patched with
http://www.bewley.net/linux/vlan/patches/vlan-3c59x.patch
2. Create a firewall script using netfilter.
3. Wait for the crash.
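
While waiting for the hang in the last step, it may help to record more than free memory, so that growth in buffers and page cache can be told apart from memory that is really lost. The following is a minimal Python sketch, assuming the usual `Name: value kB` lines of /proc/meminfo (any summary lines in a different layout are skipped). The function names are hypothetical, and the Buffers and Cached figures in the example are invented for illustration, not measured on the affected box.

```python
import re
from typing import Dict

# Matches lines such as "MemFree:   223492 kB"; other layouts are ignored.
_MEMINFO_LINE = re.compile(r"^(\w+):\s+(\d+)\s+kB\s*$")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return a {field: kilobytes} mapping from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        match = _MEMINFO_LINE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def used_excluding_cache(info: Dict[str, int]) -> int:
    """Memory in use after subtracting free memory, buffers and page cache."""
    return (info["MemTotal"] - info["MemFree"]
            - info.get("Buffers", 0) - info.get("Cached", 0))


# Illustrative values only: Buffers and Cached are invented, not measured.
EXAMPLE = """\
MemTotal:       255880 kB
MemFree:        184592 kB
Buffers:         20000 kB
Cached:          30000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = EXAMPLE  # fall back to the sample on systems without /proc
    print(used_excluding_cache(parse_meminfo(text)))
```

Run periodically (for example from cron) and logged with a timestamp, a steadily rising figure would point to a real leak rather than cache growth.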
    

Actual Results:  The system crashes after some time because there is no free memory. Maybe
there is another problem too? I don't know.

Expected Results:  long uptime, no availability problems at all.

Additional info:

This problem has been reported to the netfilter core team at:
https://bugzilla.netfilter.org/cgi-bin/bugzilla/show_bug.cgi?id=40
You may find other useful information there.

Comment 1 Need Real Name 2003-04-28 15:28:21 UTC
Same results with kernel 2.4.20

Comment 2 acount closed by user 2003-05-22 01:08:50 UTC
VLAN support in Red Hat Linux is nearly unusable.

Comment 3 Need Real Name 2003-06-26 09:02:06 UTC
Same results with 2.4.21.
I get more uptime, but in the end I obtain the same result: the box hangs.
I will upgrade to iptables 1.2.8 today.

The last resort is to replace the 3Com NICs. I am thinking of Intel Pro NICs; as far as I
know they have native 802.1q support, so they don't need to be patched.


Comment 4 Thomas Woerner 2003-07-01 10:17:44 UTC
Are you sure that this is not a bug in the vlan-3c59x patch? I have never seen
this before.

Comment 5 Need Real Name 2003-07-01 10:53:17 UTC
I don't know whether the problem is the patch for the 3Com NICs.

I would like to try the drivers from http://www.scyld.com/network/, which seem to be more
capable than the kernel drivers. At least the Scyld drivers have 802.1q support
included and don't need any kind of patch.

I will try them as soon as possible, but not today: my desktop crashed, or more precisely my hard
disk died :-( AAAAHH!!! If anything can go wrong, it will.

The latest info from the box is...

Linux version 2.4.21-v2 (root.es) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)) #2 jue jun 26 11:14:18 CEST 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
 BIOS-e820: 000000000fef0000 - 000000000fef3000 (ACPI NVS)
 BIOS-e820: 000000000fef3000 - 000000000ff00000 (ACPI data)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
254MB LOWMEM available.
On node 0 totalpages: 65264
zone(0): 4096 pages.
zone(1): 61168 pages.
zone(2): 0 pages.
Kernel command line: initrd=initrd.img root=/dev/hda8 BOOT_IMAGE=vmlinuz auto
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
Initializing CPU#0
Detected 601.378 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1199.30 BogoMIPS
Memory: 255736k/261056k available (1227k kernel code, 4936k reserved, 405k data, 252k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
CPU: Intel Pentium III (Coppermine) stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 601.3629 MHz.
..... host bus clock speed is 133.6360 MHz.
cpu: 0, clocks: 1336360, slice: 668180
CPU0<T0:1336352,T1:668160,D:12,S:668180,C:1336360>
PCI: PCI BIOS revision 2.10 entry at 0xfb180, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
Transparent bridge - Intel Corp. 82801AA PCI Bridge
PCI: Using IRQ router PIIX [8086/2410] at 00:1f.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
 tbxface-0099 [01] Acpi_load_tables      : ACPI Tables successfully loaded
Parsing Methods:...........................................................................................
91 Control Methods found and parsed (352 nodes total)
ACPI Namespace successfully loaded at root c02ec920
ACPI: Core Subsystem version [20011018]
evxfevnt-0081 [-23] Acpi_enable           : Transition to ACPI mode successful
Executing device _INI methods:................................
32 Devices found: 32 _STA, 0 _INI
Completing Region and Field initialization:............................................
20/24 Regions, 24/24 Fields initialized (352 nodes total)
ACPI: Subsystem enabled
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI
ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Compaq SMART2 Driver (v 2.4.25)
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH: IDE controller at PCI slot 00:1f.1
ICH: chipset revision 2
ICH: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
hda: FUJITSU MPE3084AE, ATA DISK drive
blk: queue c0305200, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: 16514064 sectors (8455 MB) w/512KiB Cache, CHS=1092/240/63, UDMA(33)
Partition check:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 >
usb.c: registered new driver hub
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 32768)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 252k freed
Real Time Clock Driver v1.10e
Adding Swap: 257032k swap-space (priority -1)
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,8), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,9), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,5), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,6), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,7), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (2039 buckets, 16312 max) - 292 bytes per conntrack
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
01:03.0: 3Com PCI 3c905C Tornado at 0xb000. Vers LK1.1.16
 00:01:02:f9:ed:f8, IRQ 11
  product code 4552 rev 00.13 date 11-30-00
  Internal config register is 1800000, transceivers 0xa.
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 782d.
  Enabling bus-master transmits and whole-frame receives.
01:03.0: scatter/gather enabled. h/w checksums enabled
See Documentation/networking/vortex.txt
01:04.0: 3Com PCI 3c905C Tornado at 0xb400. Vers LK1.1.16
 00:50:da:3c:ab:34, IRQ 12
  product code 5957 rev 00.13 date 10-17-99
  Internal config register is 1800000, transceivers 0xa.
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 782d.
  Enabling bus-master transmits and whole-frame receives.
01:04.0: scatter/gather enabled. h/w checksums enabled
See Documentation/networking/vortex.txt
01:07.0: 3Com PCI 3c905B Cyclone 100baseTx at 0xbc00. Vers LK1.1.16
 00:01:02:29:0e:b0, IRQ 11
  product code 4347 rev 00.12 date 01-20-00
  Internal config register is 1800000, transceivers 0xa.
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 786d.
  Enabling bus-master transmits and whole-frame receives.
01:07.0: scatter/gather enabled. h/w checksums enabled
See Documentation/networking/vortex.txt
01:0a.0: 3Com PCI 3c905B Cyclone 100baseTx at 0xc800. Vers LK1.1.16
 00:10:5a:60:6b:24, IRQ 10
  product code 5152 rev 00.12 date 10-18-98
  Internal config register is 1800000, transceivers 0xa.
  8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
  MII transceiver found at address 24, status 786d.
  Enabling bus-master transmits and whole-frame receives.
01:0a.0: scatter/gather enabled. h/w checksums enabled
ne2k-pci.c:v1.02 10/19/2000 D. Becker/P. Gortmaker
  http://www.scyld.com/network/ne2k-pci.html
eth4: KTI ET32P2 found at 0xb800, IRQ 9, 00:40:F6:74:0D:44.
802.1Q VLAN Support v1.8 Ben Greear <greearb>
All bugs added by David S. Miller <davem>
vlan2: add 01:00:5e:00:00:01 mcast address to master interface
vlan3: add 01:00:5e:00:00:01 mcast address to master interface
vlan4: add 01:00:5e:00:00:01 mcast address to master interface
vlan6: add 01:00:5e:00:00:01 mcast address to master interface

# iptables -V
iptables v1.2.8

# uptime
12:56pm  up 5 days, 20 min,  1 user,  load average: 0.00, 0.00, 0.00

Note that the box has been running 2.4.21 + iptables 1.2.8 for the last 5 days.
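
As a rough sanity check on whether connection tracking alone could account for the memory growth, the ip_conntrack line in the boot log above gives a maximum of 16312 entries at 292 bytes each. The short Python sketch below works through that arithmetic and compares it with the step in the sar capture from the description. It assumes the per-entry size in the log is the whole cost of an entry (hash buckets and any other allocations are ignored), the two figures come from different kernel builds, and the names are illustrative.

```python
# Figures taken from the ip_conntrack line in the boot log.
CONNTRACK_MAX = 16312        # maximum tracked connections
BYTES_PER_CONNTRACK = 292    # size reported per entry

# Rise in kbmemused between the 04:00 and 04:10 sar samples in the description.
SAR_JUMP_KB = 71288 - 32388


def conntrack_ceiling_kb(max_entries: int, bytes_per_entry: int) -> float:
    """Upper bound, in kB, on memory held by a completely full conntrack table."""
    return max_entries * bytes_per_entry / 1024


if __name__ == "__main__":
    ceiling = conntrack_ceiling_kb(CONNTRACK_MAX, BYTES_PER_CONNTRACK)
    print(f"full conntrack table: {ceiling:.0f} kB")
    print(f"sar jump at 04:10:    {SAR_JUMP_KB} kB")
```

Under that assumption a completely full table is on the order of 4.5 MB, well below the 38900 kB step seen in the sar data, so a full conntrack table by itself would not explain the growth. A leak elsewhere, or ordinary cache growth, would both remain possible.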



Comment 6 Need Real Name 2003-08-29 10:07:24 UTC
Latest news...

I upgraded to iptables 1.2.8 a few weeks ago, using Harald Welte's RPMs from:
ftp://gnumonks.org/pub/rpms

The kernel is still 2.4.21.

The latest uptime was around 56 days, double what I got before!!!
I could only get around two weeks with netfilter 1.2.7.

Anyway, the box still has the same problem: it hangs after that many days. At
least that is what the other technician told me; I was on holiday, so I can't
verify that information.

Comment 7 Thomas Woerner 2004-10-08 10:59:23 UTC
Please verify this with a newer version of Red Hat Enterprise Linux or
Fedora Core and reopen it against the new version if it still occurs.

Closing as "not a bug" for now.


