From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021207 Phoenix/0.5

Description of problem:
I have a box here working as a firewall/router at layer 3. It has 5 NICs; two of them have been split into sub-interfaces, and these sub-interfaces are the VLAN interfaces. We started to detect availability problems about a week ago. The system hangs without any apparent reason; there were no strange messages on the console or in the logs. I searched for the cause and discovered that the box loses memory and cannot recover it. Here is a capture from the sar program (sysstat package):

00:00:00 kbmemfree kbmemused %memused
00:10:00    225612     30268    11,83
00:20:00    225556     30324    11,85
00:30:00    225212     30668    11,99
00:40:00    225484     30396    11,88
00:50:00    225428     30452    11,90
01:00:00    225080     30800    12,04
01:10:00    225332     30548    11,94
01:20:00    225276     30604    11,96
01:30:00    224928     30952    12,10
01:40:00    223800     32080    12,54
01:50:00    223696     32184    12,58
02:00:00    223352     32528    12,71
02:10:00    224304     31576    12,34
02:20:00    224248     31632    12,36
02:30:00    223904     31976    12,50
02:40:00    224180     31700    12,39
02:50:00    224124     31756    12,41
03:00:00    223784     32096    12,54
03:10:00    224032     31848    12,45
03:20:00    223980     31900    12,47
03:30:00    223628     32252    12,60
03:40:00    223908     31972    12,49
03:50:00    223844     32036    12,52
04:00:00    223492     32388    12,66
04:10:00    184592     71288    27,86
04:20:00    184540     71340    27,88
04:30:00    184192     71688    28,02
04:40:00    184472     71408    27,91
04:50:00    184412     71468    27,93
05:00:00    184068     71812    28,06
05:10:00    184324     71556    27,96
05:20:00    184264     71616    27,99
05:30:00    183924     71956    28,12
05:40:00    184144     71736    28,04
05:50:00    184092     71788    28,06
06:00:00    183744     72136    28,19
06:10:00    183988     71892    28,10
06:20:00    183932     71948    28,12
06:30:00    183592     72288    28,25
06:40:00    183860     72020    28,15
06:50:00    183808     72072    28,17
07:00:00    183464     72416    28,30
07:10:00    183720     72160    28,20
07:20:00    183668     72212    28,22
07:30:00    183308     72572    28,36
07:40:00    183588     72292    28,25
07:50:00    183536     72344    28,27
08:00:00    183192     72688    28,41
08:10:00    183440     72440    28,31
08:20:00    183388     72492    28,33
08:30:00    183044     72836    28,46
08:40:00    183316     72564    28,36
08:50:00    183260     72620    28,38
09:00:00    182916     72964    28,51
09:10:00    183160     72720    28,42
09:20:00    183104     72776    28,44
09:30:00    182760     73120    28,58
09:40:00    182988     72892    28,49
09:50:00    182936     72944    28,51
10:00:00    182592     73288    28,64
10:10:00    182844     73036    28,54
10:20:00    182784     73096    28,57
10:30:00    182440     73440    28,70
10:40:00    182716     73164    28,59
10:50:00    182660     73220    28,61
11:00:00    181344     74536    29,13
11:10:00    181180     74700    29,19
11:20:00    181144     74736    29,21
11:30:00    180780     75100    29,35
11:40:00    181060     74820    29,24
11:50:00    179776     76104    29,74
12:00:00    180108     75772    29,61
12:10:00    180348     75532    29,52
12:20:00    180152     75728    29,60
12:30:00    179764     76116    29,75
12:40:00    180040     75840    29,64
12:50:00    179980     75900    29,66
13:00:00    178672     77208    30,17
13:10:00    178924     76956    30,08
13:20:00    179128     76752    30,00
13:30:00    179396     76484    29,89
13:40:00    178956     76924    30,06
13:50:00    178904     76976    30,08
14:00:00    178256     77624    30,34
14:10:00    178508     77372    30,24
14:20:00    178460     77420    30,26
14:30:00    178116     77764    30,39
14:40:00    178388     77492    30,28
14:50:00    178332     77548    30,31
15:00:00    177980     77900    30,44
15:10:00    178232     77648    30,35
15:20:00    178176     77704    30,37
15:30:00    178736     77144    30,15
15:40:00    179008     76872    30,04
15:50:00    178952     76928    30,06
16:00:00    178584     77296    30,21
16:10:00    178840     77040    30,11
16:20:00    178736     77144    30,15
16:30:00    178364     77516    30,29
16:40:00    178636     77244    30,19

The box is not running any service, only sshd for remote administration. The only way to recover the memory is a hard reboot.
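To put a number on the creep in the capture above, one option is a least-squares fit of kbmemused against time. The sketch below is not part of the original report; it uses hourly points copied from the table and treats the one-off jump between 04:00 and 04:10 separately, since that single step would otherwise dominate the fit. One caveat: on 2.4 kernels sar computes kbmemused as total minus free memory, so buffers and page cache count as "used". A rising kbmemused is therefore not by itself proof of a kernel leak; subtracting the kbbuffers and kbcached columns of `sar -r` (not included in the capture) should give a fairer picture.

```python
"""Rough trend estimate for the sar capture above (a sketch, not a diagnosis)."""

# Hourly (time, kbmemused) points copied from the capture, starting at 04:10,
# i.e. after the one-off jump between the 04:00 and 04:10 samples.
AFTER_STEP = [
    ("04:10:00", 71288), ("05:00:00", 71812), ("06:00:00", 72136),
    ("07:00:00", 72416), ("08:00:00", 72688), ("09:00:00", 72964),
    ("10:00:00", 73288), ("11:00:00", 74536), ("12:00:00", 75772),
    ("13:00:00", 77208), ("14:00:00", 77624), ("15:00:00", 77900),
    ("16:00:00", 77296), ("16:40:00", 77244),
]


def to_hours(hhmmss: str) -> float:
    """Convert an HH:MM:SS timestamp to hours since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h + m / 60 + s / 3600


def slope_kb_per_hour(samples) -> float:
    """Ordinary least-squares slope of kbmemused versus time, in kB per hour."""
    xs = [to_hours(t) for t, _ in samples]
    ys = [float(kb) for _, kb in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    step_kb = 71288 - 32388  # kbmemused jump between 04:00 and 04:10
    print(f"one-off step: {step_kb} kB")
    print(f"trend after the step: {slope_kb_per_hour(AFTER_STEP):.0f} kB/hour")
```

The step at 04:10 roughly lines up with the time a stock Red Hat /etc/crontab runs cron.daily (04:02), so it may be cache filled by the daily jobs rather than leaked memory; treat that as an assumption to check against the buffer and cache columns, not as a finding.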
This box is running a 2.4.19 kernel with a 3Com patch to avoid MTU problems; the patch is available at http://www.bewley.net/linux/vlan/patches/vlan-3c59x.patch

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Use kernel 2.4.19 with VLAN support built as a module and patched with http://www.bewley.net/linux/vlan/patches/vlan-3c59x.patch
2. Create a firewall script using netfilter.
3. Wait for the crash.

Actual Results:
The system crashes after some time because there is no free memory. Maybe there is another problem too; I don't know.

Expected Results:
Long uptime, with no availability problems at all.

Additional info:
This problem has been reported to the netfilter core team at: https://bugzilla.netfilter.org/cgi-bin/bugzilla/show_bug.cgi?id=40
You may find other useful information there.
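Some context on the "MTU problems" mentioned above: an 802.1Q tag inserts 4 bytes into every Ethernet frame, so a full-size tagged frame is 1522 bytes on the wire instead of 1518. A NIC or driver that drops anything over 1518 bytes will lose full-MTU traffic on the VLAN interfaces unless their MTU is lowered. We assume this is what the vlan-3c59x patch works around; its contents have not been checked here. The Python sketch below only does that arithmetic and builds a tag; using VLAN ID 2 assumes the vlan2 interface in the boot log later in this report carries VID 2.

```python
import struct

# Frame-size arithmetic for plain vs. 802.1Q-tagged Ethernet (sizes in bytes).
ETH_HEADER = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4      # TPID 0x8100 (2) + Tag Control Information (2)
FCS = 4           # frame check sequence
MTU = 1500        # standard Ethernet payload size


def vlan_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: TPID 0x8100 followed by the TCI field."""
    if not 0 <= vid <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP must fit in 3 bits and DEI in 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", 0x8100, tci)


def max_frame_len(tagged: bool) -> int:
    """Largest on-wire frame carrying a full MTU payload, FCS included."""
    return ETH_HEADER + (VLAN_TAG if tagged else 0) + MTU + FCS


if __name__ == "__main__":
    print("untagged:", max_frame_len(False))     # 1518
    print("tagged:", max_frame_len(True))        # 1522
    print("tag for VID 2:", vlan_tag(2).hex())   # 81000002
```

The 4-byte difference is the whole story: either the receive path accepts 1522-byte frames, or the VLAN interfaces need an MTU of 1496 so that tagged frames stay within 1518 bytes.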
Same results with kernel 2.4.20
VLAN support in Red Hat Linux is nearly unusable.
Same results with 2.4.21. I get more uptime, but in the end I get the same result: the box hangs. I will upgrade to iptables 1.2.8 today. The last resort is to replace the 3Com NICs. I am thinking of Intel Pro NICs since, as far as I know, they support 802.1Q natively and don't need to be patched.
Are you sure that this is not a bug in the vlan-3c59x patch? I have never seen this before.
I don't know whether the problem is the patch for the 3Com NICs. I would like to try the drivers from http://www.scyld.com/network/; they seem to be more capable than the kernel drivers. At least the Scyld drivers include 802.1Q support and don't need any kind of patch. I will try them as soon as possible, but not today: my desktop crashed, or more exactly, my hard disk died :-( AAAAHH!!! If anything can go wrong, it will.

The latest information from the box is:

Linux version 2.4.21-v2 (root.es) (gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)) #2 jue jun 26 11:14:18 CEST 2003
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000fef0000 (usable)
BIOS-e820: 000000000fef0000 - 000000000fef3000 (ACPI NVS)
BIOS-e820: 000000000fef3000 - 000000000ff00000 (ACPI data)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
254MB LOWMEM available.
On node 0 totalpages: 65264
zone(0): 4096 pages.
zone(1): 61168 pages.
zone(2): 0 pages.
Kernel command line: initrd=initrd.img root=/dev/hda8 BOOT_IMAGE=vmlinuz auto
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
Initializing CPU#0
Detected 601.378 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1199.30 BogoMIPS
Memory: 255736k/261056k available (1227k kernel code, 4936k reserved, 405k data, 252k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
CPU: Common caps: 0383fbff 00000000 00000000 00000000
CPU: Intel Pentium III (Coppermine) stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 601.3629 MHz.
..... host bus clock speed is 133.6360 MHz.
cpu: 0, clocks: 1336360, slice: 668180
CPU0<T0:1336352,T1:668160,D:12,S:668180,C:1336360>
PCI: PCI BIOS revision 2.10 entry at 0xfb180, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
Transparent bridge - Intel Corp. 82801AA PCI Bridge
PCI: Using IRQ router PIIX [8086/2410] at 00:1f.0
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Journalled Block Device driver loaded
tbxface-0099 [01] Acpi_load_tables : ACPI Tables successfully loaded
Parsing Methods:...........................................................................................
91 Control Methods found and parsed (352 nodes total)
ACPI Namespace successfully loaded at root c02ec920
ACPI: Core Subsystem version [20011018]
evxfevnt-0081 [-23] Acpi_enable : Transition to ACPI mode successful
Executing device _INI methods:................................
32 Devices found: 32 _STA, 0 _INI
Completing Region and Field initialization:............................................
20/24 Regions, 24/24 Fields initialized (352 nodes total)
ACPI: Subsystem enabled
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
Compaq SMART2 Driver (v 2.4.25)
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH: IDE controller at PCI slot 00:1f.1
ICH: chipset revision 2
ICH: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
hda: FUJITSU MPE3084AE, ATA DISK drive
blk: queue c0305200, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: 16514064 sectors (8455 MB) w/512KiB Cache, CHS=1092/240/63, UDMA(33)
Partition check:
hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 >
usb.c: registered new driver hub
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 32768)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 252k freed
Real Time Clock Driver v1.10e
Adding Swap: 257032k swap-space (priority -1)
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,8), internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,9), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,5), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,6), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,7), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (2039 buckets, 16312 max) - 292 bytes per conntrack
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
01:03.0: 3Com PCI 3c905C Tornado at 0xb000. Vers LK1.1.16
00:01:02:f9:ed:f8, IRQ 11
product code 4552 rev 00.13 date 11-30-00
Internal config register is 1800000, transceivers 0xa.
8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
MII transceiver found at address 24, status 782d.
Enabling bus-master transmits and whole-frame receives.
01:03.0: scatter/gather enabled. h/w checksums enabled
See Documentation/networking/vortex.txt
01:04.0: 3Com PCI 3c905C Tornado at 0xb400. Vers LK1.1.16
00:50:da:3c:ab:34, IRQ 12
product code 5957 rev 00.13 date 10-17-99
Internal config register is 1800000, transceivers 0xa.
8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
MII transceiver found at address 24, status 782d.
Enabling bus-master transmits and whole-frame receives.
01:04.0: scatter/gather enabled. h/w checksums enabled
See Documentation/networking/vortex.txt
01:07.0: 3Com PCI 3c905B Cyclone 100baseTx at 0xbc00. Vers LK1.1.16
00:01:02:29:0e:b0, IRQ 11
product code 4347 rev 00.12 date 01-20-00
Internal config register is 1800000, transceivers 0xa.
8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
MII transceiver found at address 24, status 786d.
Enabling bus-master transmits and whole-frame receives.
01:07.0: scatter/gather enabled. h/w checksums enabled
See Documentation/networking/vortex.txt
01:0a.0: 3Com PCI 3c905B Cyclone 100baseTx at 0xc800. Vers LK1.1.16
00:10:5a:60:6b:24, IRQ 10
product code 5152 rev 00.12 date 10-18-98
Internal config register is 1800000, transceivers 0xa.
8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
MII transceiver found at address 24, status 786d.
Enabling bus-master transmits and whole-frame receives.
01:0a.0: scatter/gather enabled. h/w checksums enabled
ne2k-pci.c:v1.02 10/19/2000 D. Becker/P. Gortmaker
http://www.scyld.com/network/ne2k-pci.html
eth4: KTI ET32P2 found at 0xb800, IRQ 9, 00:40:F6:74:0D:44.
802.1Q VLAN Support v1.8 Ben Greear <greearb>
All bugs added by David S. Miller <davem>
vlan2: add 01:00:5e:00:00:01 mcast address to master interface
vlan3: add 01:00:5e:00:00:01 mcast address to master interface
vlan4: add 01:00:5e:00:00:01 mcast address to master interface
vlan6: add 01:00:5e:00:00:01 mcast address to master interface

# iptables -V
iptables v1.2.8
# uptime
12:56pm up 5 days, 20 min, 1 user, load average: 0.00, 0.00, 0.00

Note that the box has been running 2.4.21 + iptables 1.2.8 for the last 5 days.
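One line of the boot log can be checked against the sar capture: ip_conntrack reports a ceiling of 16312 entries at 292 bytes each. The back-of-the-envelope sketch below multiplies these out (about 4.5 MiB) and prints it next to the rise in kbmemused between the first and last sar samples (about 46 MB). Assuming the limit is enforced and ignoring slab and hash-table overhead, a full conntrack table alone would not seem to explain growth of that size; this is our arithmetic, not something established in the thread.

```python
"""Upper bound on ip_conntrack memory from the boot log (a rough sketch)."""

MAX_CONNTRACKS = 16312   # "16312 max" from the ip_conntrack boot message
BYTES_PER_ENTRY = 292    # "292 bytes per conntrack" from the same message


def conntrack_table_kib(entries: int, entry_size: int) -> float:
    """Memory for `entries` conntrack structs in KiB (no slab/hash overhead)."""
    return entries * entry_size / 1024


if __name__ == "__main__":
    cap_kib = conntrack_table_kib(MAX_CONNTRACKS, BYTES_PER_ENTRY)
    growth_kb = 77244 - 30268  # kbmemused: last sar sample minus first
    print(f"conntrack ceiling: {cap_kib:.0f} KiB")
    print(f"observed kbmemused growth: {growth_kb} kB")
```

If conntrack were suspected anyway, comparing the number of lines in /proc/net/ip_conntrack over time with the sar samples would be a more direct check than this bound.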
Latest news... I upgraded to iptables 1.2.8 a few weeks ago, using Harald Welte's RPMs from ftp://gnumonks.org/pub/rpms. The kernel is still 2.4.21. The latest uptime was around 56 days, about four times as long!!! With netfilter 1.2.7 I could only get around two weeks. Anyway, the box still has the same problem: it hangs after that many days. At least that is what the other technician told me; I was on holiday, so I can't verify that information.
Please verify this with a newer version of Red Hat Enterprise Linux or Fedora Core and reopen it against the new version if it still occurs. Closing as "not a bug" for now.