We installed a new version of the cluster software (cman-1.0.4) and got errors in dmesg when the cman service tries to start.

=== message from CMAN ===
CMAN 2.6.14.1-20051219.162641.FC5.10 (built Feb 2 2006 08:45:18) installed
NET: Registered protocol family 30
CMAN: Waiting to join or form a Linux-cluster
CMAN: sending membership request
CMAN: sending membership request
CMAN: got node z3
CMAN: sendmsg failed: -13
CMAN: sendmsg failed: -13
CMAN: quorum regained, resuming activity
CMAN: sendmsg failed: -13
CMAN: send_queued_message failed, error -13
CMAN: sendmsg failed: -13
CMAN: send_queued_message failed, error -13
CMAN: sendmsg failed: -13
DLM 2.6.14.1-20051219.162641.FC5.10 (built Feb 2 2006 08:46:32) installed
CMAN: sendmsg failed: -13
CMAN: sendmsg failed: -13
CMAN: No functional network interfaces, leaving cluster
CMAN: sendmsg failed: -13
CMAN: sendmsg failed: -13
CMAN: we are leaving the cluster. Shutdown
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown
===========

This is probably an error in the new code for sending cluster messages. I found the message "Say something if sendmsg fails."
at this URL: http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/cman-kernel/src/cnxman.c?rev=1.55&content-type=text/x-cvsweb-markup&cvsroot=cluster

One characteristic of our servers is that we use a bonding interface:

=== ifconfig ===
bond0     Link encap:Ethernet HWaddr 00:0B:CD:EF:F1:D7
          inet addr:x.x.x.x Bcast:x.x.x.x Mask:255.255.255.224
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
          RX packets:186012691 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48577497 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1141297473 (1.0 GiB) TX bytes:3641909279 (3.3 GiB)

bond0:0   Link encap:Ethernet HWaddr 00:0B:CD:EF:F1:D7
          inet addr:192.168.0.165 Bcast:192.168.0.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1

eth0      Link encap:Ethernet HWaddr 00:0B:CD:EF:F1:D7
          UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:185935375 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48577497 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1135633654 (1.0 GiB) TX bytes:3641909279 (3.3 GiB)
          Interrupt:185

eth1      Link encap:Ethernet HWaddr 00:0B:CD:EF:F1:D7
          UP BROADCAST RUNNING NOARP SLAVE MULTICAST MTU:1500 Metric:1
          RX packets:77316 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5663819 (5.4 MiB) TX bytes:0 (0.0 b)
          Interrupt:193

eth2      Link encap:Ethernet HWaddr 00:04:23:AA:E0:60
          inet addr:10.65.73.31 Bcast:10.65.73.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:336794267 errors:0 dropped:0 overruns:0 frame:0
          TX packets:110800 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3615713202 (3.3 GiB) TX bytes:7091200 (6.7 MiB)
          Base address:0x3000 Memory:f7de0000-f7e00000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:30059 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30059 errors:0
dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2694690 (2.5 MiB) TX bytes:2694690 (2.5 MiB) ======= Dmesg: ======== Linux version 2.6.15-prep (root@z4) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #3 SMP Sun Jan 29 21:14:49 EST 2006 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f000 (usable) BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 00000000f3ffa000 (usable) BIOS-e820: 00000000f3ffa000 - 00000000f4000000 (ACPI data) BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved) BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved) 3007MB HIGHMEM available. 896MB LOWMEM available. found SMP MP-table at 000f4fd0 Using x86 segment limits to approximate NX protection On node 0 totalpages: 999418 DMA zone: 4096 pages, LIFO batch:0 DMA32 zone: 0 pages, LIFO batch:0 Normal zone: 225280 pages, LIFO batch:31 HighMem zone: 770042 pages, LIFO batch:31 DMI 2.3 present. ACPI: RSDP (v000 COMPAQ ) @ 0x000f4f70 ACPI: RSDT (v001 COMPAQ P29 0x00000002 <D2>^D 0x0000162e) @ 0xf3ffa000 ACPI: FADT (v001 COMPAQ P29 0x00000002 <D2>^D 0x0000162e) @ 0xf3ffa040 ACPI: MADT (v001 COMPAQ 00000083 0x00000002 0x00000000) @ 0xf3ffa100 ACPI: SPCR (v001 COMPAQ SPCRRBSU 0x00000001 <D2>^D 0x0000162e) @ 0xf3ffa1c0 ACPI: DSDT (v001 COMPAQ DSDT 0x00000001 MSFT 0x0100000b) @ 0x00000000 ACPI: Local APIC enabled (0). 
ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled) Processor #6 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 15:2 APIC version 20 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled) Processor #7 15:2 APIC version 20 ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-15 ACPI: IOAPIC (id[0x03] address[0xfec01000] gsi_base[16]) IOAPIC[1]: apic_id 3, version 17, address 0xfec01000, GSI 16-31 ACPI: IOAPIC (id[0x04] address[0xfec02000] gsi_base[32]) IOAPIC[2]: apic_id 4, version 17, address 0xfec02000, GSI 32-47 ACPI: IOAPIC (id[0x05] address[0xfec03000] gsi_base[48]) IOAPIC[3]: apic_id 5, version 17, address 0xfec03000, GSI 48-63 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Enabling APIC mode: Flat. Using 4 I/O APICs LAPIC enabled (0), calling get_smp_config Using ACPI (MADT) for SMP configuration information Allocating PCI resources starting at f5000000 (gap: f4000000:0ac00000) Built 1 zonelists Kernel command line: ro root=/dev/md1 mapped APIC to ffffd000 (fee00000) mapped IOAPIC to ffffc000 (fec00000) mapped IOAPIC to ffffb000 (fec01000) mapped IOAPIC to ffffa000 (fec02000) mapped IOAPIC to ffff9000 (fec03000) Initializing CPU#0 PID hash table entries: 4096 (order: 12, 65536 bytes) Detected 3186.683 MHz processor. 
Using tsc for high-res timesource Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 3959780k/3997672k available (2257k kernel code, 36640k reserved, 649k data, 196k init, 3080168k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay using timer specific routine.. 6382.33 BogoMIPS (lpj=12764671) Mount-cache hash table entries: 512 CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: L3 cache: 1024K CPU: Physical Processor ID: 0 CPU: After all inits, caps: bfebf3ff 00000000 00000000 00000080 00004400 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#0. CPU0: Intel P4/Xeon Extended MCE MSRs (12) available CPU0: Thermal monitoring enabled mtrr: v2.0 (20020519) Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. CPU0: Intel(R) Xeon(TM) CPU 3.20GHz stepping 05 Booting processor 1/1 eip 2000 Initializing CPU#1 Calibrating delay using timer specific routine.. 6373.16 BogoMIPS (lpj=12746338) CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: L3 cache: 1024K CPU: Physical Processor ID: 0 CPU: After all inits, caps: bfebf3ff 00000000 00000000 00000080 00004400 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#1. 
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available CPU1: Thermal monitoring enabled CPU1: Intel(R) Xeon(TM) CPU 3.20GHz stepping 05 Booting processor 2/6 eip 2000 Initializing CPU#2 Calibrating delay using timer specific routine.. 6373.22 BogoMIPS (lpj=12746440) CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: L3 cache: 1024K CPU: Physical Processor ID: 3 CPU: After all inits, caps: bfebf3ff 00000000 00000000 00000080 00004400 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#2. CPU2: Intel P4/Xeon Extended MCE MSRs (12) available CPU2: Thermal monitoring enabled CPU2: Intel(R) Xeon(TM) CPU 3.20GHz stepping 05 Booting processor 3/7 eip 2000 Initializing CPU#3 Calibrating delay using timer specific routine.. 6372.92 BogoMIPS (lpj=12745858) CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 CPU: Trace cache: 12K uops, L1 D cache: 8K CPU: L2 cache: 512K CPU: L3 cache: 1024K CPU: Physical Processor ID: 3 CPU: After all inits, caps: bfebf3ff 00000000 00000000 00000080 00004400 00000000 00000000 Intel machine check architecture supported. Intel machine check reporting enabled on CPU#3. CPU3: Intel P4/Xeon Extended MCE MSRs (12) available CPU3: Thermal monitoring enabled CPU3: Intel(R) Xeon(TM) CPU 3.20GHz stepping 05 Total of 4 processors activated (25501.65 BogoMIPS). ENABLING IO-APIC IRQs ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1 checking TSC synchronization across 4 CPUs: passed. Brought up 4 CPUs checking if image is initramfs... 
it is Freeing initrd memory: 1032k freed NET: Registered protocol family 16 ACPI: bus type pci registered PCI: PCI BIOS revision 2.10 entry at 0xf0094, last bus=9 PCI: Using configuration type 1 mtrr: your CPUs had inconsistent fixed MTRR settings mtrr: probably your BIOS does not setup all CPUs. mtrr: corrected configuration. ACPI: Subsystem revision 20050902 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) Boot video device is 0000:00:03.0 PCI: Ignoring BAR0-3 of IDE controller 0000:00:0f.1 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Root Bridge [PCI1] (0000:01) PCI: Probing PCI hardware (bus 01) ACPI: PCI Interrupt Routing Table [\_SB_.PCI1._PRT] ACPI: PCI Root Bridge [PCI2] (0000:02) PCI: Probing PCI hardware (bus 02) ACPI: PCI Interrupt Routing Table [\_SB_.PCI2._PRT] ACPI: PCI Root Bridge [PCI3] (0000:03) PCI: Probing PCI hardware (bus 03) ACPI: PCI Interrupt Routing Table [\_SB_.PCI3._PRT] ACPI: PCI Root Bridge [PCI4] (0000:06) PCI: Probing PCI hardware (bus 06) ACPI: PCI Interrupt Routing Table [\_SB_.PCI4._PRT] ACPI: PCI Interrupt Link [IUSB] (IRQs 4 5 7 10 *11 15) ACPI: PCI Interrupt Link [IN16] (IRQs 4 5 7 10 11 15) *3 ACPI: PCI Interrupt Link [IN17] (IRQs 4 *5 7 10 11 15) ACPI: PCI Interrupt Link [IN18] (IRQs 4 5 *7 10 11 15) ACPI: PCI Interrupt Link [IN19] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN20] (IRQs 4 5 *7 10 11 15) ACPI: PCI Interrupt Link [IN21] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN22] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN23] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN24] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN25] (IRQs 4 5 7 10 11 15) *0, disabled. 
ACPI: PCI Interrupt Link [IN26] (IRQs 4 5 7 *10 11 15) ACPI: PCI Interrupt Link [IN27] (IRQs 4 5 7 10 11 *15) ACPI: PCI Interrupt Link [IN28] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN29] (IRQs 4 5 7 10 *11 15) ACPI: PCI Interrupt Link [IN30] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN31] (IRQs 4 5 7 10 *11 15) ACPI: PCI Interrupt Link [IN32] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN33] (IRQs 4 5 7 10 11 15) *0, disabled. ACPI: PCI Interrupt Link [IN34] (IRQs 4 5 7 10 11 15) *0, disabled. Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 8 devices SCSI subsystem initialized PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report PCI: Device 0000:00:00.0 not found by BIOS PCI: Device 0000:00:00.1 not found by BIOS PCI: Device 0000:00:00.2 not found by BIOS PCI: Device 0000:00:0f.0 not found by BIOS PCI: Device 0000:00:0f.3 not found by BIOS PCI: Device 0000:00:10.0 not found by BIOS PCI: Device 0000:00:10.2 not found by BIOS PCI: Device 0000:00:11.0 not found by BIOS PCI: Device 0000:00:11.2 not found by BIOS pnp: 00:00: ioport range 0xf50-0xf58 has been reserved pnp: 00:00: ioport range 0x408-0x40f has been reserved pnp: 00:00: ioport range 0x900-0x903 could not be reserved pnp: 00:00: ioport range 0x910-0x911 could not be reserved pnp: 00:00: ioport range 0x920-0x923 could not be reserved pnp: 00:00: ioport range 0x930-0x937 has been reserved pnp: 00:00: ioport range 0x940-0x947 has been reserved Machine check exception polling timer started. 
audit: initializing netlink socket (disabled) audit(1138616034.080:1): initialized highmem bounce pool size: 64 pages Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) SGI XFS with ACLs, large block numbers, no debug enabled SGI XFS Quota Management subsystem Initializing Cryptographic API ksign: Installing public key data Loading keyring io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered ACPI: Thermal Zone [THM0] (8 C) Real Time Clock Driver v1.12 PNP: PS/2 Controller [PNP0303:KBD,PNP0f0e:PS2M] at 0x60,0x64 irq 1,12 serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize Fusion MPT base driver 3.03.04 Copyright (c) 1999-2005 LSI Logic Corporation Fusion MPT SPI Host driver 3.03.04 ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 26 (level, low) -> IRQ 169 mptbase: Initiating ioc0 bringup ioc0: 53C1030: Capabilities={Initiator,Target} scsi0 : ioc0: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=169 Vendor: COMPAQ Model: BD14685A26 Rev: HPB6 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sda: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sda: drive cache: write through SCSI device sda: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sda: drive cache: write through sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > sd 0:0:0:0: Attached scsi disk sda Vendor: COMPAQ Model: BD14686225 Rev: HPB6 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdb: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdb: drive cache: write through SCSI device sdb: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdb: drive cache: write through sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 > sd 0:0:1:0: Attached scsi disk sdb Vendor: COMPAQ Model: BD14686225 Rev: HPB6 Type: Direct-Access ANSI SCSI revision: 03 SCSI 
device sdc: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdc: drive cache: write through SCSI device sdc: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdc: drive cache: write through sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 > sd 0:0:2:0: Attached scsi disk sdc Vendor: COMPAQ Model: BD14687B52 Rev: HPB5 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdd: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdd: drive cache: write through SCSI device sdd: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdd: drive cache: write through sdd: sdd1 sdd2 sdd3 sdd4 < sdd5 sdd6 sdd7 > sd 0:0:3:0: Attached scsi disk sdd Vendor: COMPAQ Model: BD14686225 Rev: HPB6 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sde: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sde: drive cache: write through SCSI device sde: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sde: drive cache: write through sde: sde1 sde2 sde3 sde4 < sde5 sde6 sde7 > sd 0:0:4:0: Attached scsi disk sde Vendor: COMPAQ Model: BD14686225 Rev: HPB6 Type: Direct-Access ANSI SCSI revision: 03 SCSI device sdf: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdf: drive cache: write through SCSI device sdf: 286749488 512-byte hdwr sectors (146816 MB) SCSI device sdf: drive cache: write through sdf: sdf1 sdf2 sdf3 sdf4 < sdf5 sdf6 sdf7 > sd 0:0:5:0: Attached scsi disk sdf Vendor: COMPAQ Model: PROLIANT 4LCI Rev: 1.84 Type: Processor ANSI SCSI revision: 02 ACPI: PCI Interrupt 0000:06:02.1[B] -> GSI 27 (level, low) -> IRQ 177 mptbase: Initiating ioc1 bringup ioc1: 53C1030: Capabilities={Initiator,Target} scsi1 : ioc1: LSI53C1030, FwRev=01032700h, Ports=1, MaxQ=255, IRQ=177 Vendor: Promise Model: 12 Disk RAID5 Rev: V0.0 Type: Direct-Access ANSI SCSI revision: 04 SCSI device sdg: 2674804352 1024-byte hdwr sectors (2739000 MB) SCSI device sdg: drive cache: write through SCSI device sdg: 2674804352 1024-byte hdwr sectors (2739000 MB) SCSI device sdg: drive 
cache: write through sdg: unknown partition table sd 1:0:1:0: Attached scsi disk sdg Vendor: Promise Model: 12 Disk RAID5 Rev: V0.0 Type: Direct-Access ANSI SCSI revision: 04 SCSI device sdh: 2674804352 1024-byte hdwr sectors (2739000 MB) SCSI device sdh: drive cache: write through SCSI device sdh: 2674804352 1024-byte hdwr sectors (2739000 MB) SCSI device sdh: drive cache: write through sdh: unknown partition table sd 1:0:3:0: Attached scsi disk sdh Fusion MPT misc device (ioctl) driver 3.03.04 mptctl: Registered with Fusion MPT base driver mptctl: /dev/mptctl @ (major,minor=10,220) mice: PS/2 mouse device common for all mice md: raid1 personality registered as nr 3 md: raid5 personality registered as nr 4 raid5: automatically using best checksumming function: pIII_sse pIII_sse : 3459.000 MB/sec raid5: using function: pIII_sse (3459.000 MB/sec) md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel NET: Registered protocol family 2 input: AT Translated Set 2 keyboard as /class/input/input0 IP route cache hash table entries: 131072 (order: 7, 524288 bytes) TCP established hash table entries: 524288 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 7, 524288 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered TCP bic registered Initializing IPsec netlink socket NET: Registered protocol family 1 NET: Registered protocol family 17 Starting balanced_irq Using IPI Shortcut mode Freeing unused kernel memory: 196k freed Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx md: Autodetecting RAID arrays. md: autorun ... md: considering sdf7 ... md: adding sdf7 ... md: sdf5 has different UUID to sdf7 md: sdf3 has different UUID to sdf7 md: sdf2 has different UUID to sdf7 md: sdf1 has different UUID to sdf7 md: adding sde7 ... 
md: sde5 has different UUID to sdf7 md: sde3 has different UUID to sdf7 md: sde2 has different UUID to sdf7 md: sde1 has different UUID to sdf7 md: adding sdd7 ... md: sdd5 has different UUID to sdf7 md: sdd3 has different UUID to sdf7 md: sdd2 has different UUID to sdf7 md: sdd1 has different UUID to sdf7 md: adding sdc7 ... md: sdc5 has different UUID to sdf7 md: sdc3 has different UUID to sdf7 md: sdc2 has different UUID to sdf7 md: sdc1 has different UUID to sdf7 md: adding sdb7 ... md: sdb5 has different UUID to sdf7 md: sdb3 has different UUID to sdf7 md: sdb2 has different UUID to sdf7 md: sdb1 has different UUID to sdf7 md: adding sda7 ... md: sda5 has different UUID to sdf7 md: sda3 has different UUID to sdf7 md: sda2 has different UUID to sdf7 md: sda1 has different UUID to sdf7 md: created md4 md: bind<sda7> md: bind<sdb7> md: bind<sdc7> md: bind<sdd7> md: bind<sde7> md: bind<sdf7> md: running: <sdf7><sde7><sdd7><sdc7><sdb7><sda7> raid5: device sdf7 operational as raid disk 5 raid5: device sde7 operational as raid disk 4 raid5: device sdd7 operational as raid disk 3 raid5: device sdc7 operational as raid disk 2 raid5: device sdb7 operational as raid disk 1 raid5: device sda7 operational as raid disk 0 raid5: allocated 6285kB for md4 raid5: raid level 5 set md4 active with 6 out of 6 devices, algorithm 2 RAID5 conf printout: --- rd:6 wd:6 fd:0 disk 0, o:1, dev:sda7 disk 1, o:1, dev:sdb7 disk 2, o:1, dev:sdc7 disk 3, o:1, dev:sdd7 disk 4, o:1, dev:sde7 disk 5, o:1, dev:sdf7 md: considering sdf5 ... md: adding sdf5 ... md: sdf3 has different UUID to sdf5 md: sdf2 has different UUID to sdf5 md: sdf1 has different UUID to sdf5 md: adding sde5 ... md: sde3 has different UUID to sdf5 md: sde2 has different UUID to sdf5 md: sde1 has different UUID to sdf5 md: adding sdd5 ... md: sdd3 has different UUID to sdf5 md: sdd2 has different UUID to sdf5 md: sdd1 has different UUID to sdf5 md: adding sdc5 ... 
md: sdc3 has different UUID to sdf5 md: sdc2 has different UUID to sdf5 md: sdc1 has different UUID to sdf5 md: adding sdb5 ... md: sdb3 has different UUID to sdf5 md: sdb2 has different UUID to sdf5 md: sdb1 has different UUID to sdf5 md: adding sda5 ... md: sda3 has different UUID to sdf5 md: sda2 has different UUID to sdf5 md: sda1 has different UUID to sdf5 md: created md3 md: bind<sda5> md: bind<sdb5> md: bind<sdc5> md: bind<sdd5> md: bind<sde5> md: bind<sdf5> md: running: <sdf5><sde5><sdd5><sdc5><sdb5><sda5> md: md3: raid array is not clean -- starting background reconstruction raid5: device sdf5 operational as raid disk 5 raid5: device sde5 operational as raid disk 4 raid5: device sdd5 operational as raid disk 3 raid5: device sdc5 operational as raid disk 2 raid5: device sdb5 operational as raid disk 1 raid5: device sda5 operational as raid disk 0 raid5: allocated 6285kB for md3 raid5: raid level 5 set md3 active with 6 out of 6 devices, algorithm 2 RAID5 conf printout: --- rd:6 wd:6 fd:0 disk 0, o:1, dev:sda5 disk 1, o:1, dev:sdb5 disk 2, o:1, dev:sdc5 disk 3, o:1, dev:sdd5 disk 4, o:1, dev:sde5 disk 5, o:1, dev:sdf5 md: considering sdf3 ... md: adding sdf3 ... md: sdf2 has different UUID to sdf3 md: sdf1 has different UUID to sdf3 md: adding sde3 ... md: sde2 has different UUID to sdf3 md: sde1 has different UUID to sdf3 md: adding sdd3 ... md: sdd2 has different UUID to sdf3 md: sdd1 has different UUID to sdf3 md: adding sdc3 ... md: sdc2 has different UUID to sdf3 md: sdc1 has different UUID to sdf3 md: adding sdb3 ... md: sdb2 has different UUID to sdf3 md: sdb1 has different UUID to sdf3 md: adding sda3 ... md: sda2 has different UUID to sdf3 md: sda1 has different UUID to sdf3 md: syncing RAID array md3 md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc. md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction. md: using 128k window, over a total of 1742912 blocks. 
md: created md2 md: bind<sda3> md: bind<sdb3> md: bind<sdc3> md: bind<sdd3> md: bind<sde3> md: bind<sdf3> md: running: <sdf3><sde3><sdd3><sdc3><sdb3><sda3> raid5: device sdf3 operational as raid disk 5 raid5: device sde3 operational as raid disk 4 raid5: device sdd3 operational as raid disk 3 raid5: device sdc3 operational as raid disk 2 raid5: device sdb3 operational as raid disk 1 raid5: device sda3 operational as raid disk 0 raid5: allocated 6285kB for md2 raid5: raid level 5 set md2 active with 6 out of 6 devices, algorithm 2 RAID5 conf printout: --- rd:6 wd:6 fd:0 disk 0, o:1, dev:sda3 disk 1, o:1, dev:sdb3 disk 2, o:1, dev:sdc3 disk 3, o:1, dev:sdd3 disk 4, o:1, dev:sde3 disk 5, o:1, dev:sdf3 md: considering sdf2 ... md: adding sdf2 ... md: sdf1 has different UUID to sdf2 md: adding sde2 ... md: sde1 has different UUID to sdf2 md: adding sdd2 ... md: sdd1 has different UUID to sdf2 md: adding sdc2 ... md: sdc1 has different UUID to sdf2 md: adding sdb2 ... md: sdb1 has different UUID to sdf2 md: adding sda2 ... md: sda1 has different UUID to sdf2 md: created md1 md: bind<sda2> md: bind<sdb2> md: bind<sdc2> md: bind<sdd2> md: bind<sde2> md: bind<sdf2> md: running: <sdf2><sde2><sdd2><sdc2><sdb2><sda2> raid5: device sdf2 operational as raid disk 5 raid5: device sde2 operational as raid disk 4 raid5: device sdd2 operational as raid disk 3 raid5: device sdc2 operational as raid disk 2 raid5: device sdb2 operational as raid disk 1 raid5: device sda2 operational as raid disk 0 raid5: allocated 6285kB for md1 raid5: raid level 5 set md1 active with 6 out of 6 devices, algorithm 2 RAID5 conf printout: --- rd:6 wd:6 fd:0 disk 0, o:1, dev:sda2 disk 1, o:1, dev:sdb2 disk 2, o:1, dev:sdc2 disk 3, o:1, dev:sdd2 disk 4, o:1, dev:sde2 disk 5, o:1, dev:sdf2 md: considering sdf1 ... md: adding sdf1 ... md: adding sde1 ... md: adding sdd1 ... md: adding sdc1 ... md: adding sdb1 ... md: adding sda1 ... 
md: created md0 md: bind<sda1> md: bind<sdb1> md: bind<sdc1> md: bind<sdd1> md: bind<sde1> md: bind<sdf1> md: running: <sdf1><sde1><sdd1><sdc1><sdb1><sda1> raid1: raid set md0 active with 6 out of 6 mirrors md: ... autorun DONE. md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. kjournald starting. Commit interval 5 seconds EXT3-fs: md1: orphan cleanup on readonly fs ext3_orphan_cleanup: deleting unreferenced inode 84028 EXT3-fs: md1: 1 orphan inode deleted EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. usbcore: registered new driver usbfs usbcore: registered new driver hub sd 0:0:0:0: Attached scsi generic sg0 type 0 sd 0:0:1:0: Attached scsi generic sg1 type 0 sd 0:0:2:0: Attached scsi generic sg2 type 0 sd 0:0:3:0: Attached scsi generic sg3 type 0 sd 0:0:4:0: Attached scsi generic sg4 type 0 sd 0:0:5:0: Attached scsi generic sg5 type 0 0:0:15:0: Attached scsi generic sg6 type 3 sd 1:0:1:0: Attached scsi generic sg7 type 0 sd 1:0:3:0: Attached scsi generic sg8 type 0 SvrWks CSB5: IDE controller at PCI slot 0000:00:0f.1 SvrWks CSB5: chipset revision 147 SvrWks CSB5: not 100% native mode: will probe irqs later SvrWks CSB5: simplex device: DMA forced ide0: BM-DMA at 0x2000-0x2007, BIOS settings: hda:pio, hdb:pio SvrWks CSB5: simplex device: DMA forced ide1: BM-DMA at 0x2008-0x200f, BIOS settings: hdc:pio, hdd:pio Probing IDE interface ide0... hda: COMPAQ CD-ROM SN-124, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... 
Ethernet Channel Bonding Driver: v2.6.5 (November 4, 2005) bonding: MII link monitoring set to 100 ms tg3.c:v3.47 (Dec 28, 2005) ACPI: PCI Interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 185 eth0: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:ef:f1:d7 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] eth0: dma_rwctrl[769f4000] ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 193 eth1: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:ef:f1:d6 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1] eth1: dma_rwctrl[769f4000] Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI Copyright (c) 1999-2005 Intel Corporation. ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 20 (level, low) -> IRQ 201 e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection pci_hotplug: PCI Hot Plug PCI Core version: 0.5 cpqphp: Compaq Hot Plug PCI Controller Driver version: 0.9.8 ACPI: PCI Interrupt 0000:06:1e.0[A] -> GSI 18 (level, low) -> IRQ 209 cpqphp: Hot Plug Subsystem Device ID: a2fe cpqphp: Initializing the PCI hot plug controller residing on PCI bus 6 PCI: Using BIOS Interrupt Routing Table PCI: Using BIOS Interrupt Routing Table piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device piix4_smbus 0000:00:0f.0: Working around buggy BIOS (I2C) ohci_hcd: 2005 April 22 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ACPI: PCI Interrupt Link [IUSB] enabled at IRQ 11 ACPI: PCI Interrupt 0000:00:0f.2[A] -> Link [IUSB] -> GSI 11 (level, low) -> IRQ 11 ohci_hcd 0000:00:0f.2: OHCI Host Controller ohci_hcd 0000:00:0f.2: new USB bus registered, assigned bus number 1 ohci_hcd 0000:00:0f.2: irq 11, io mem 0xf5ef0000 hub 1-0:1.0: USB hub found hub 1-0:1.0: 4 ports detected md: Autodetecting RAID arrays. md: autorun ... md: ... autorun DONE. EXT3 FS on md1, internal journal kjournald starting. 
Commit interval 5 seconds EXT3 FS on md0, internal journal EXT3-fs: mounted filesystem with ordered data mode. XFS mounting filesystem md2 Starting XFS recovery on filesystem: md2 (logdev: internal) Ending XFS recovery on filesystem: md2 (logdev: internal) XFS mounting filesystem md3 Starting XFS recovery on filesystem: md3 (logdev: internal) Ending XFS recovery on filesystem: md3 (logdev: internal) XFS mounting filesystem md4 Starting XFS recovery on filesystem: md4 (logdev: internal) Ending XFS recovery on filesystem: md4 (logdev: internal) Adding 1469908k swap on /dev/sda6. Priority:1 extents:1 across:1469908k Adding 1469908k swap on /dev/sdb6. Priority:1 extents:1 across:1469908k Adding 1469908k swap on /dev/sdc6. Priority:1 extents:1 across:1469908k Adding 1469908k swap on /dev/sdd6. Priority:1 extents:1 across:1469908k Adding 1469908k swap on /dev/sde6. Priority:1 extents:1 across:1469908k Adding 1469908k swap on /dev/sdf6. Priority:1 extents:1 across:1469908k ======= Installed rpm packages: ===== [root@z4 ~]# rpm -qa | egrep -i "gfs|cman|dlm|magma|ccsd|gnbd|gulm" | sort cman-1.0.4-0.FC5.1 cman-devel-1.0.4-0.FC5.1 cman-kernel-2.6.14.1-20051219.162641.FC5.10 cman-kernel-smp-2.6.14.1-20051219.162641.FC5.10 cman-kernheaders-2.6.14.1-20051219.162641.FC5.10 dlm-1.0.0-9.FC5 dlm-devel-1.0.0-9.FC5 dlm-kernel-2.6.14.1-20051219.162641.FC5.8 dlm-kernel-smp-2.6.14.1-20051219.162641.FC5.8 dlm-kernheaders-2.6.14.1-20051219.162641.FC5.8 GFS-6.1.4-0.FC5.1 GFS-kernel-2.6.14.1-20051219.162641.FC5.9 GFS-kernel-smp-2.6.14.1-20051219.162641.FC5.9 GFS-kernheaders-2.6.14.1-20051219.162641.FC5.9 gnbd-1.0.2-0.2 gnbd-kernel-2.6.14.0-20051108.134753.FC5.14 gnbd-kernel-smp-2.6.14.0-20051108.134753.FC5.14 gnbd-kernheaders-2.6.14.0-20051108.134753.FC5.14 gulm-1.0.5-0.FC5.1 gulm-devel-1.0.5-0.FC5.1 magma-1.0.3-3.1 magma-devel-1.0.3-3.1 magma-plugins-1.0.5-0.FC5.1 ========= === cluster.conf === <?xml version="1.0"?> <cluster name="test" config_version="1"> <cman two_node="1" 
expected_votes="1">
</cman>
<clusternodes>
  <clusternode name="z3" votes="1">
    <fence>
      <method name="1">
        <device name="HPiLO_z3"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="z4" votes="1">
    <fence>
      <method name="1">
        <device name="HPiLO_z4"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fence_devices>
  <fencedevice agent="fence_ilo" hostname="10.2.3.4" name="HPiLO_z4" login="claman" passwd="6cdgyvkBWblhhor9"/>
  <fencedevice agent="fence_ilo" hostname="10.2.3.3" name="HPiLO_z3" login="claman" passwd="6SpdmkmwttwgKv"/>
</fence_devices>
</cluster>
=======

With cman-1.0.0 everything works fine.
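
For reference, the -13 in the CMAN messages looks like a negated errno value, which is how kernel code conventionally reports errors. On Linux, errno 13 is EACCES ("Permission denied"). Below is a minimal sketch for decoding such a return code, assuming Linux errno numbering. The helper names are hypothetical and are not part of cman.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Assumption: Linux errno numbering, where EACCES is 13. */
static_assert(EACCES == 13, "this sketch assumes Linux errno numbering");

/* Hypothetical helper: kernel-style return codes are negative errno
 * values, so a logged "-13" maps to errno 13. Non-negative codes are
 * treated as "no error" and map to 0. */
static int errno_from_kernel_rc(int rc)
{
    return rc < 0 ? -rc : 0;
}

/* Hypothetical helper: returns 1 if the return code means EACCES. */
static int is_permission_denied(int rc)
{
    return errno_from_kernel_rc(rc) == EACCES;
}

/* Hypothetical helper: human-readable text for a kernel-style code,
 * as provided by the C library's strerror(). */
static const char *describe_kernel_rc(int rc)
{
    return strerror(errno_from_kernel_rc(rc));
}
```

With these helpers, `is_permission_denied(-13)` is true, so "sendmsg failed: -13" would read as a permission failure on the send path rather than, say, an unreachable network. That is only a reading of the error number, not a diagnosis of the cause.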
Sounds like you've got your kernel & userspace out of step. Make sure you are using the latest cman_tool with that kernel.
Yep. The userland packages are definitely out of date, but the kernel packages are more recent.
With the new packages I got the same error. I tried different versions; the last working one is 1.01.00. The problem is probably in the following changes:
diff -i cnxman.c cnxman.c.new
--- cnxman.c 2005-10-03 16:01:13.000000000 -0400
+++ cnxman.c.new 2006-02-20 10:32:27.000000000 -0500
@@ -32,7 +32,7 @@
 #include "sm_user.h"
 #include "config.h"
 
-#define CMAN_RELEASE_NAME "1.01.00"
+#define CMAN_RELEASE_NAME "2.6.15.0-20051219.162641.FC5.11.7"
 
 static void process_incoming_packet(struct cl_comms_socket *csock,
                                     struct msghdr *msg, struct kvec *vec, int veclen, int len);
@@ -55,6 +55,7 @@
 static int send_or_queue_message(struct socket *sock, void *buf, int len,
                                  struct sockaddr_cl *caddr, unsigned int flags);
 static struct cl_comms_socket *get_next_interface(struct cl_comms_socket *cur);
+static struct cl_comms_socket *get_peer_interface(int if_num, int mcast);
 static void check_for_unacked_nodes(void);
 static void free_cluster_sockets(void);
 static uint16_t generate_cluster_id(char *name);
@@ -859,7 +860,7 @@
     /* Have we received this message before ? If so just ignore it, it's a
      * resend for someone else's benefit */
     if (!(flags & MSG_NOACK) &&
-        rem_node && le16_to_cpu(header->seq) == rem_node->last_seq_recv) {
+        rem_node && ((short)le16_to_cpu(header->seq) <= (short)rem_node->last_seq_recv)) {
         P_COMMS
             ("Discarding message - Already seen this sequence number %d\n",
              rem_node->last_seq_recv);
@@ -1168,6 +1169,7 @@
 static int add_clsock(int broadcast, int number, struct socket *sock,
                       struct file *file)
 {
+    struct cl_comms_socket *peer;
     struct cl_comms_socket *newsock =
         kmalloc(sizeof (struct cl_comms_socket), GFP_KERNEL);
     if (!newsock)
@@ -1198,9 +1200,17 @@
                                     &newsock->addr_len, 0);
 
     num_interfaces = max(num_interfaces, newsock->number);
-    if (!current_interface && newsock->broadcast)
+    if (!current_interface && newsock->recv_only)
         current_interface = newsock;
 
+    /* Get peer, if this fails because we're the first socket with this
+       number then that's fine. The subsequent call will fill in both */
+    peer = get_peer_interface(number, !broadcast);
+    if (peer) {
+        peer->peer = newsock;
+        newsock->peer = peer;
+    }
+
     /* Hook data_ready */
     newsock->sock->sk->sk_data_ready = cnxman_data_ready;
 
@@ -1754,14 +1764,14 @@
     if (!atomic_read(&cnxman_running))
         return -ENOTCONN;
 
-    if (!we_are_a_cluster_member)
-        return -ENOENT;
+    /* FORCE overrides several checks */
+    if (!(leave_flags & CLUSTER_LEAVEFLAG_FORCE)) {
+        if (!we_are_a_cluster_member)
+            return -ENOENT;
 
-    if (in_transition())
-        return -EBUSY;
+        if (in_transition())
+            return -EBUSY;
 
-    /* Ignore the use count if FORCE is set */
-    if (!(leave_flags & CLUSTER_LEAVEFLAG_FORCE)) {
         if (atomic_read(&use_count))
             return -ENOTCONN;
     }
@@ -2018,8 +2028,8 @@
     vec[0].iov_len = saved_msg_len;
 
     memset(&msg, 0, sizeof (msg));
-    msg.msg_name = &current_interface->saddr;
-    msg.msg_namelen = current_interface->addr_len;
+    msg.msg_name = &current_interface->peer->saddr;
+    msg.msg_namelen = current_interface->peer->addr_len;
 
     result = kernel_sendmsg(current_interface->sock, &msg, vec, 1,
                             saved_msg_len);
@@ -2126,22 +2136,22 @@
     struct sockaddr_in6 daddr;
     struct cl_comms_socket *clsock;
     int result = 0;
-    int errors = 0;
+    static int errors = 0;
 
     our_msg->msg_name = &daddr;
 
     list_for_each_entry(clsock, &socket_list, list) {
 
-        /* Don't send out a recv-only socket */
-        if (!clsock->recv_only) {
+        /* Don't send out of a broadcast socket */
+        if (clsock->recv_only) {
 
             /* For temporary node IDs send to the node's real IP address */
             if (nodeid < 0) {
                 get_addr_from_temp_nodeid(nodeid, (char *)&daddr, &our_msg->msg_namelen);
             }
             else {
-                memcpy(&daddr, &clsock->saddr, clsock->addr_len);
-                our_msg->msg_namelen = clsock->addr_len;
+                memcpy(&daddr, &clsock->peer->saddr, clsock->peer->addr_len);
+                our_msg->msg_namelen = clsock->peer->addr_len;
             }
 
             result = __send_and_save(clsock, our_msg, vec, veclen,
@@ -2149,11 +2159,13 @@
                                      !(flags & MSG_NOACK));
             if (result < 0)
                 errors++;
+            else
+                errors = 0;
         }
     }
 
     /* If all the interfaces error then die */
-    if (errors == num_interfaces) {
+    if (errors >= num_interfaces * cman_config.max_retries) {
         printk(KERN_ERR CMAN_NAME ": No functional network interfaces, leaving cluster\n");
         quit_threads = 1;
         wake_up_interruptible(&cnxman_waitq);
@@ -2347,8 +2359,8 @@
     else {
         /* Send to only the current socket - resends will use the
          * others if necessary */
-        our_msg.msg_name = &current_interface->saddr;
-        our_msg.msg_namelen = current_interface->addr_len;
+        our_msg.msg_name = &current_interface->peer->saddr;
+        our_msg.msg_namelen = current_interface->peer->addr_len;
 
         result =
             __send_and_save(current_interface, &our_msg,
@@ -3092,7 +3104,7 @@
         struct cl_comms_socket *sock;
         sock = list_entry(socklist, struct cl_comms_socket, list);
 
-        if (!sock->recv_only && sock->number == next)
+        if (sock->recv_only && sock->number == next)
             return sock;
     }
 
@@ -3100,6 +3112,22 @@
     return NULL;
 }
 
+static struct cl_comms_socket *get_peer_interface(int if_num, int mcast)
+{
+    struct list_head *socklist;
+
+    list_for_each(socklist, &socket_list) {
+        struct cl_comms_socket *sock;
+        sock = list_entry(socklist, struct cl_comms_socket, list);
+
+        if (sock->broadcast == mcast && sock->number == if_num)
+            return sock;
+    }
+
+    return NULL;
+}
+
+
 /* MUST be called with the barrier list lock held */
 static struct cl_barrier *find_barrier(char *name)
 {
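The hunk around the "No functional network interfaces" message changes how send failures are counted: `errors` becomes `static`, is reset on any successful send, and the node leaves once `errors >= num_interfaces * cman_config.max_retries`. A small Python model of just that counter logic, as a sketch for reasoning about the dmesg output: `SendFailureCounter` is a hypothetical name, and the `max_retries` value used in the example is an assumption, not the module's real default.

```python
class SendFailureCounter:
    """Models the patched 'static int errors' counter from the diff above."""

    def __init__(self, num_interfaces, max_retries):
        self.num_interfaces = num_interfaces
        self.max_retries = max_retries  # stands in for cman_config.max_retries
        self.errors = 0                 # persists across calls, like a C static

    def record_call(self, results):
        """Feed the send results of one pass over the sending sockets.

        Negative results count as failures; any success resets the counter.
        Returns True if the node would decide to leave the cluster.
        """
        for result in results:
            if result < 0:
                self.errors += 1
            else:
                self.errors = 0
        return self.errors >= self.num_interfaces * self.max_retries


if __name__ == "__main__":
    # One interface, assumed max_retries of 3, every send failing with -13.
    counter = SendFailureCounter(num_interfaces=1, max_retries=3)
    for attempt in range(1, 4):
        print(attempt, counter.record_call([-13]))
```

Under this model, a node whose every sendmsg fails with -13 reaches the threshold after a few passes, which matches the repeated "sendmsg failed: -13" lines followed by "leaving cluster" in the report.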
Yes, that's a fix for bz#166752. The real problem here is that the kernel & userland are built from different CVS branches.
I tried the latest binary rpm from http://download.fedora.redhat.com/pub/fedora/linux/core/test/4.92/i386/os/Fedora/RPMS/ and still got the error :(
CMAN: sendmsg failed: -13
CMAN: sendmsg failed: -13
CMAN: No functional network interfaces, leaving cluster
CMAN: sendmsg failed: -13
CMAN: sendmsg failed: -13
CMAN: we are leaving the cluster. Shutdown
WARNING: dlm_emergency_shutdown
WARNING: dlm_emergency_shutdown
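For reference, the -13 in these messages is a negated errno value, and on Linux errno 13 is EACCES ("Permission denied"). The ip(7) man page lists sending to a broadcast address without SO_BROADCAST set on the socket as one cause of EACCES; whether that is the cause here is only an inference from the log. A minimal Python sketch for decoding such lines, where `decode_sendmsg_errors` is a hypothetical helper name:

```python
import errno
import os
import re

# Matches the kernel log lines quoted in this report.
SENDMSG_RE = re.compile(r"CMAN: sendmsg failed: -(\d+)")


def decode_sendmsg_errors(dmesg_text):
    """Return (number, symbolic name, message) for each distinct errno seen."""
    codes = {int(m.group(1)) for m in SENDMSG_RE.finditer(dmesg_text)}
    return [
        (code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
        for code in sorted(codes)
    ]


if __name__ == "__main__":
    sample = (
        "CMAN: sendmsg failed: -13\n"
        "CMAN: sendmsg failed: -13\n"
        "CMAN: No functional network interfaces, leaving cluster\n"
    )
    for number, name, message in decode_sendmsg_errors(sample):
        print(number, name, message)
```

The symbolic names come from the errno table of the machine running the script, so it should be run on Linux to match the kernel's numbering.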
Can you try this rpm and let me know if it works for you? http://download.fedora.redhat.com/pub/fedora/linux/core/development/i386/Fedora/RPMS/cman-1.0.5-0.FC5.0.i386.rpm
No luck. We tested this userspace tool **with the latest development version of cman_kernel**: http://download.fedora.redhat.com/pub/fedora/linux/core/development/SRPMS/cman-kernel-2.6.15.0-20051219.162641.FC5.11.7.src.rpm
But this cman_tool, like the previous one, works well with the old cman kernel module (without the described changes in cnxman.c), so we think the problem is actually not in the userspace tool but in the kernel module. Should this bug be moved to the "cman_kernel" component?
Err, Chris, the source RPM has an old cman_tool binary in it! So "make" isn't building the new one from source!
jeltz:~/dev/rpms/cman/devel$ rm -rf cman-1.0.5
jeltz:~/dev/rpms/cman/devel$ tar -xzf cman-1.0.5.tar.gz
jeltz:~/dev/rpms/cman/devel$ cd cman-1.0.5
jeltz:~/dev/rpms/cman/devel/cman-1.0.5$ cd cman_tool/
jeltz:~/dev/rpms/cman/devel/cman-1.0.5/cman_tool$ make
make: Nothing to be done for `all'.
jeltz:~/dev/rpms/cman/devel/cman-1.0.5/cman_tool$ ls -l
total 180
-rwxrwxr-x 1 patrick patrick 49304 Mar 1 20:14 cman_tool
-rw-r--r-- 1 patrick patrick 2328 Mar 1 20:14 cman_tool.h
drwxrwxr-x 2 patrick patrick 4096 Mar 1 20:14 CVS
-rw-r--r-- 1 patrick patrick 12703 Mar 1 20:14 join.c
-rw-r--r-- 1 patrick patrick 13214 Mar 1 20:14 join_ccs.c
-rw-rw-r-- 1 patrick patrick 10388 Mar 1 20:14 join_ccs.o
-rw-rw-r-- 1 patrick patrick 18020 Mar 1 20:14 join.o
-rw-r--r-- 1 patrick patrick 17126 Mar 1 20:14 main.c
-rw-rw-r-- 1 patrick patrick 25652 Mar 1 20:14 main.o
-rw-r--r-- 1 patrick patrick 1567 Mar 1 20:14 Makefile
I've found the mistake. I've built a new rpm and will update the bug once it has been put into FC5.
The new package has been built and tested. It is cman-1.0.5.FC5.1 and should appear online soon. Please let me know if the updated packages work for you.
Same results: cman is OK, cman_kernel is bad. We've tested this version of cman_tool and the init script, and it works with our custom 2.6.15 kernel with modified cman_kernel sources (cnxman.c rolled back to 1.01.00, as described above in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179734#c3). However, with the available cman_kernel 2.6.15.0-20051219.162641.FC5.11.7 (http://download.fedora.redhat.com/pub/fedora/linux/core/development/SRPMS/cman-kernel-2.6.15.0-20051219.162641.FC5.11.7.src.rpm), which is the same as 2.6.14.1-20051219.162641.FC5.10 except for the kernel requirements in the .spec file, the cluster doesn't work: cman complains about missing interfaces, as described at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179734#c5. Maybe this bug should be moved or cross-posted to cman_kernel?
Can you double-check that you have the right cman_tool binary? If you have the updated one, then running 'strace cman_tool join' should show two calls to setsockopt(<n>, SOL_SOCKET, SO_BROADCAST, [1], 4). If you're only seeing one, then it's an old binary. If you are seeing two, then it should work. The binary in the latest package certainly does two on my machine. If you're seeing two such calls and it isn't working, can you attach the strace to this bugzilla, please?
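The check described above can be automated on a saved trace (for example one captured with `strace -o trace.txt cman_tool join`). This is a minimal Python sketch, assuming the trace lines look exactly like the setsockopt call quoted in the comment; `count_broadcast_setsockopt` and `classify_binary` are hypothetical helper names, and the sample trace below is made up for illustration.

```python
import re

# The call pattern quoted in the comment, with <n> being any descriptor number.
BROADCAST_RE = re.compile(
    r"setsockopt\(\d+, SOL_SOCKET, SO_BROADCAST, \[1\], 4\)"
)


def count_broadcast_setsockopt(strace_text):
    """Count setsockopt(..., SO_BROADCAST, [1], 4) calls in strace output."""
    return len(BROADCAST_RE.findall(strace_text))


def classify_binary(strace_text):
    """Apply the rule from the comment: two calls = updated, one = old."""
    count = count_broadcast_setsockopt(strace_text)
    if count == 2:
        return "updated"
    if count == 1:
        return "old"
    return "unknown"  # the comment does not cover other counts


if __name__ == "__main__":
    # Hypothetical trace excerpt, not taken from a real run.
    sample = (
        "setsockopt(5, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0\n"
        "setsockopt(6, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0\n"
    )
    print(classify_binary(sample))
```

Any count other than one or two is reported as "unknown" rather than guessed at, since the comment only describes those two cases.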
Can you provide the output of the following two commands:
rpm -q cman
rpm -q cman-kernel
Thanks!
(In reply to comment #12)
> Can you double-check that you have the right cman_tool binary?
I'll try with strace later. The binary is correct; it was compiled from the src.rpm.
(In reply to comment #13)
> Can you provide the output of the following two commands:
> rpm -q cman
cman-1.0.5-0.FC5.1 (since the fix in https://bugzilla.redhat.com/bugzilla/process_bug.cgi#c10)
> rpm -q cman-kernel
cman-kernel-2.6.14.1-20051219.162641.FC5.10
It is the original version with a patched spec file so that it compiles against our custom 2.6.15-1.1826.2.10_FC5 kernel.
It works fine for me on FC5t3 with those latest RPMS:
$ rpm -qa|grep cman
cman-kernel-2.6.15.0-20051219.162641.FC5.11.7
cman-1.0.5-0.FC5.1
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp