Bug 763694 (GLUSTER-1962)

Summary: 3.1GA Platform fails "add server" handshake
Product: [Retired] GlusterSP
Reporter: Allen Lu <allen>
Component: core
Assignee: Balamurugan Arumugam <bala>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: urgent
Version: 3.1.0
CC: platform
Target Milestone: 3.1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix

Description Allen Lu 2010-10-15 02:18:52 UTC
Testing on VMware shows that the first node and the new node initially connect but then fail to continue.

transport.log

[2010-10-14 19:14:13.843620] I [/usr/lib64/Agents/Utils.py:251:runCommand()]: runCommand(): execution status of command [['ping', '-qnc', '1', '10.1.10.164']] = [{'Status': 1, 'Stderr': '', 'Stdout': 'PING 10.1.10.164 (10.1.10.164) 56(84) bytes of data.\n\n--- 10.1.10.164 ping statistics ---\n1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3000ms\n\n'}]
[2010-10-14 19:14:13.845435] I [/usr/lib64/Agents/Utils.py:168:runCommandBG()]: runCommandBG(): Trying to execute command [mv -f /tmp/GSPSAFIBuQn /etc/dnsmasq.d/dhcp.conf]
[2010-10-14 19:14:25.817373] E [/usr/sbin/transport.py:92:<module>]: session timed out



The first node and the new node wait on each other forever.
Reproduced on 10.1.10.161 and 162 on vm cloud server 10.1.10.210 under resource 'allen' and "platform1" "platform2".
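
For triage, the two signals in the excerpt above (a command whose 'Status' is non-zero, and an E-level entry) can be pulled out of transport.log mechanically. The following is a minimal sketch, assuming every entry follows the "[timestamp] LEVEL [source]: message" layout seen in the lines quoted here; find_problems and both regular expressions are hypothetical helpers written for this report, not part of the Gluster platform code.

```python
import re

# Assumed entry layout, taken from the lines quoted in this report:
#   [2010-10-14 19:14:25.817373] E [/usr/sbin/transport.py:92:<module>]: session timed out
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\]: (?P<msg>.*)$"
)

# Matches the runCommand() result message and captures the argv list and exit status.
STATUS_RE = re.compile(
    r"execution status of command \[\[(?P<cmd>.*?)\]\] = \[\{'Status': (?P<status>-?\d+)"
)


def find_problems(lines):
    """Return (timestamp, kind, detail) tuples for E-level entries and failed commands."""
    problems = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a log entry in the assumed layout
        ts, level, msg = m.group("ts", "level", "msg")
        if level == "E":
            problems.append((ts, "error", msg))
            continue
        s = STATUS_RE.search(msg)
        if s and int(s.group("status")) != 0:
            # Rebuild the command line from the quoted argv items.
            argv = re.findall(r"'([^']*)'", s.group("cmd"))
            problems.append((ts, "failed-command", " ".join(argv)))
    return problems
```

Run against the log with something like `find_problems(open("transport.log"))`; on the excerpt above it would flag the unanswered ping to 10.1.10.164 and the session timeout that follows it.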

Comment 1 Allen Lu 2010-10-15 14:13:36 UTC
Unfortunately adjusting memory did not work. I wiped out the previous config and tried at 1GB, then again with 2GB. The first node is hanging after the second node displays "Now installation will be controlled by Gluster management console (10.1.10.161)"

# tail -f transport.log
[2010-10-15 10:05:26.752972] I [/usr/lib64/Agents/Utils.py:168:runCommandBG()]: runCommandBG(): Trying to execute command [rm -f /etc/dnsmasq.d/dhcp.conf]
[2010-10-15 10:05:41.720319] E [/usr/sbin/transport.py:92:<module>]: session timed out
[2010-10-15 10:07:47.197832] I [/usr/lib64/Agents/Utils.py:468:getInstalledServerCount()]: failed to read file /GLUSTER/servers/$installer$/installed-server-count
[2010-10-15 10:07:47.198580] I [/usr/lib64/Agents/Utils.py:468:getInstalledServerCount()]: failed to read file /GLUSTER/servers/$installer$/installed-server-count
[2010-10-15 10:07:47.200404] I [/usr/lib64/Agents/Utils.py:168:runCommandBG()]: runCommandBG(): Trying to execute command [['ping', '-qnc', '1', '10.1.10.165']]
[2010-10-15 10:07:47.203106] I [/usr/lib64/Agents/Utils.py:251:runCommand()]: runCommand(): execution status of command [['ping', '-qnc', '1', '10.1.10.165']] = [{'Status': 0, 'Stderr': '', 'Stdout': 'PING 10.1.10.165 (10.1.10.165) 56(84) bytes of data.\n\n--- 10.1.10.165 ping statistics ---\n1 packets transmitted, 1 received, 0% packet loss, time 0ms\nrtt min/avg/max/mdev = 0.275/0.275/0.275/0.000 ms\n'}]
[2010-10-15 10:07:47.203994] I [/usr/lib64/Agents/Utils.py:168:runCommandBG()]: runCommandBG(): Trying to execute command [['ping', '-qnc', '1', '10.1.10.164']]
[2010-10-15 10:07:50.206922] I [/usr/lib64/Agents/Utils.py:251:runCommand()]: runCommand(): execution status of command [['ping', '-qnc', '1', '10.1.10.164']] = [{'Status': 1, 'Stderr': '', 'Stdout': 'PING 10.1.10.164 (10.1.10.164) 56(84) bytes of data.\n\n--- 10.1.10.164 ping statistics ---\n1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3000ms\n\n'}]
[2010-10-15 10:07:50.208829] I [/usr/lib64/Agents/Utils.py:168:runCommandBG()]: runCommandBG(): Trying to execute command [mv -f /tmp/GSPSAQnw0hc /etc/dnsmasq.d/dhcp.conf]
[2010-10-15 10:08:02.177537] E [/usr/sbin/transport.py:92:<module>]: session timed out

dmesg on console node
pciehp 0000:00:17.0:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.0:pcie04: service driver pciehp loaded
pciehp 0000:00:17.1:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.1:pcie04: service driver pciehp loaded
pciehp 0000:00:17.2:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.2:pcie04: service driver pciehp loaded
pciehp 0000:00:17.3:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.3:pcie04: service driver pciehp loaded
pciehp 0000:00:17.4:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.4:pcie04: service driver pciehp loaded
pciehp 0000:00:17.5:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.5:pcie04: service driver pciehp loaded
pciehp 0000:00:17.6:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.6:pcie04: service driver pciehp loaded
pciehp 0000:00:17.7:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:17.7:pcie04: service driver pciehp loaded
pciehp 0000:00:18.0:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.0:pcie04: service driver pciehp loaded
pciehp 0000:00:18.1:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.1:pcie04: service driver pciehp loaded
pciehp 0000:00:18.2:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.2:pcie04: service driver pciehp loaded
pciehp 0000:00:18.3:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.3:pcie04: service driver pciehp loaded
pciehp 0000:00:18.4:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.4:pcie04: service driver pciehp loaded
pciehp 0000:00:18.5:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.5:pcie04: service driver pciehp loaded
pciehp 0000:00:18.6:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.6:pcie04: service driver pciehp loaded
pciehp 0000:00:18.7:pcie04: HPC vendor_id 15ad device_id 7a0 ss_vid 0 ss_did 0
pciehp 0000:00:18.7:pcie04: service driver pciehp loaded
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
acpiphp: Slot [32] registered
acpiphp: Slot [33] registered
acpiphp: Slot [34] registered
acpiphp: Slot [35] registered
acpiphp: Slot [36] registered
acpiphp: Slot [37] registered
acpiphp: Slot [38] registered
acpiphp: Slot [39] registered
acpiphp: Slot [40] registered
acpiphp: Slot [41] registered
acpiphp: Slot [42] registered
acpiphp: Slot [43] registered
acpiphp: Slot [44] registered
acpiphp: Slot [45] registered
acpiphp: Slot [46] registered
acpiphp: Slot [47] registered
acpiphp: Slot [48] registered
acpiphp: Slot [49] registered
acpiphp: Slot [50] registered
acpiphp: Slot [51] registered
acpiphp: Slot [52] registered
acpiphp: Slot [53] registered
acpiphp: Slot [54] registered
acpiphp: Slot [55] registered
acpiphp: Slot [56] registered
acpiphp: Slot [57] registered
acpiphp: Slot [58] registered
acpiphp: Slot [59] registered
acpiphp: Slot [60] registered
acpiphp: Slot [61] registered
acpiphp: Slot [62] registered
acpiphp: Slot [63] registered
acpiphp_glue: Slot 160 already registered by another hotplug driver
acpiphp_glue: Slot 192 already registered by another hotplug driver
acpiphp_glue: Slot 224 already registered by another hotplug driver
acpiphp_glue: Slot 256 already registered by another hotplug driver
acpiphp_glue: Slot 161 already registered by another hotplug driver
acpiphp_glue: Slot 162 already registered by another hotplug driver
acpiphp_glue: Slot 163 already registered by another hotplug driver
acpiphp_glue: Slot 164 already registered by another hotplug driver
acpiphp_glue: Slot 165 already registered by another hotplug driver
acpiphp_glue: Slot 166 already registered by another hotplug driver
acpiphp_glue: Slot 167 already registered by another hotplug driver
acpiphp_glue: Slot 193 already registered by another hotplug driver
acpiphp_glue: Slot 194 already registered by another hotplug driver
acpiphp_glue: Slot 195 already registered by another hotplug driver
acpiphp_glue: Slot 196 already registered by another hotplug driver
acpiphp_glue: Slot 197 already registered by another hotplug driver
acpiphp_glue: Slot 198 already registered by another hotplug driver
acpiphp_glue: Slot 199 already registered by another hotplug driver
acpiphp_glue: Slot 225 already registered by another hotplug driver
acpiphp_glue: Slot 226 already registered by another hotplug driver
acpiphp_glue: Slot 227 already registered by another hotplug driver
acpiphp_glue: Slot 228 already registered by another hotplug driver
acpiphp_glue: Slot 229 already registered by another hotplug driver
acpiphp_glue: Slot 230 already registered by another hotplug driver
acpiphp_glue: Slot 231 already registered by another hotplug driver
acpiphp_glue: Slot 257 already registered by another hotplug driver
acpiphp_glue: Slot 258 already registered by another hotplug driver
acpiphp_glue: Slot 259 already registered by another hotplug driver
acpiphp_glue: Slot 260 already registered by another hotplug driver
acpiphp_glue: Slot 261 already registered by another hotplug driver
acpiphp_glue: Slot 262 already registered by another hotplug driver
acpiphp_glue: Slot 263 already registered by another hotplug driver
ACPI: AC Adapter [ACAD] (on-line)
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button [PWRF]
input: Sleep Button as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input1
ACPI: Sleep Button [SLPB]
processor ACPI_CPU:00: registered as cooling_device0
ACPI: Processor [CPU0] (supports 8 throttling states)
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
agpgart-intel 0000:00:00.0: Intel 440BX Chipset
agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0x0
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:0a: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
brd: module loaded
loop: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input2
Driver 'sd' needs updating - please use bus_type methods
Driver 'sr' needs updating - please use bus_type methods
ata_piix 0000:00:07.1: version 2.13
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0x10c0 irq 14
ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0x10c8 irq 15
Fixed MDIO Bus: probed
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:MOUS] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
ata1: port disabled. ignoring.
rtc_cmos 00:04: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one month, y3k, 114 bytes nvram
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel
cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
PM: Resume from disk failed.
registered taskstats version 1
  Magic number: 14:546:980
Initalizing network drop monitor service
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
ata2.00: ATAPI: VMware Virtual IDE CDROM Drive, 00000001, max UDMA/33
ata2.00: configured for UDMA/33
scsi 1:0:0:0: CD-ROM            NECVMWar VMware IDE CDR10 1.00 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 1x/1x xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 1:0:0:0: Attached scsi CD-ROM sr0
sr 1:0:0:0: Attached scsi generic sg0 type 5
Freeing unused kernel memory: 1292k freed
Write protecting the kernel read-only data: 5964k
Fusion MPT base driver 3.04.07
Copyright (c) 1999-2008 LSI Corporation
Fusion MPT SPI Host driver 3.04.07
  alloc irq_desc for 17 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
mptspi 0000:00:10.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
mptbase: ioc0: Initiating bringup
ioc0: LSI53C1030 B0: Capabilities={Initiator}
scsi2 : ioc0: LSI53C1030 B0, FwRev=01032920h, Ports=1, MaxQ=128, IRQ=17
scsi: waiting for bus probes to complete ...
scsi 2:0:0:0: Direct-Access     VMware   Virtual disk     1.0  PQ: 0 ANSI: 2
scsi target2:0:0: Beginning Domain Validation
scsi target2:0:0: Domain Validation skipping write tests
scsi target2:0:0: Ending Domain Validation
scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
sd 2:0:0:0: [sda] 16777216 512-byte hardware sectors: (8.58 GB/8.00 GiB)
sd 2:0:0:0: [sda] Test WP failed, assume Write Enabled
sd 2:0:0:0: [sda] Cache data unavailable
sd 2:0:0:0: [sda] Assuming drive cache: write through
sd 2:0:0:0: [sda] Test WP failed, assume Write Enabled
sd 2:0:0:0: [sda] Cache data unavailable
sd 2:0:0:0: [sda] Assuming drive cache: write through
 sda:<5>sd 2:0:0:0: Attached scsi generic sg1 type 0
 sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
type=1404 audit(1287136689.357:2): selinux=0 auid=4294967295 ses=4294967295
udev: starting version 141
piix4_smbus 0000:00:07.3: Host SMBus controller not enabled!
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
parport_pc 00:08: reported by Plug and Play ACPI
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Intel(R) PRO/1000 Network Driver - version 7.3.21-k3-NAPI
Copyright (c) 1999-2006 Intel Corporation.
  alloc irq_desc for 18 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
e1000 0000:02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
e1000 0000:02:00.0: setting latency timer to 64
e1000: 0000:02:00.0: e1000_probe: (PCI:66MHz:32-bit) 00:50:56:90:00:57
input: PC Speaker as /devices/platform/pcspkr/input/input5
ppdev: user-space parallel port driver
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on sda1, internal journal
platform microcode: firmware: requesting intel-ucode/06-2c-02
Microcode Update Driver: v2.00 <tigran.co.uk>, Peter Oruba
Microcode Update Driver: v2.00 removed.
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
NET: Registered protocol family 27
e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
eth0: no IPv6 routers present

Comment 2 Allen Lu 2010-10-16 23:36:28 UTC
I tested the .vmdk version and have the same exact issue.

Comment 3 Allen Lu 2010-10-18 15:16:21 UTC
Confirmed that the img works on bare metal when booted from USB.

Comment 4 Allen Lu 2010-10-18 17:47:39 UTC
The problem appears to be caused by pointing DNS on the first server to itself. I had thought this would also work if no other DNS server existed.
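
A quick way to spot this configuration before starting "add server" is to check whether any configured resolver is the host itself. The sketch below assumes the resolver list is kept in standard resolv.conf syntax ("nameserver <address>" lines); this report does not say where the platform stores its DNS setting, and nameservers and self_referencing are hypothetical helper names. The host's own addresses are passed in by the caller.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Extract the addresses from 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        # Drop '#' and ';' comments, then split on whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def self_referencing(resolv_conf_text, local_addresses):
    """Return the nameservers that are loopback or one of the host's own addresses."""
    local = {ipaddress.ip_address(a) for a in local_addresses}
    hits = []
    for server in nameservers(resolv_conf_text):
        try:
            addr = ipaddress.ip_address(server)
        except ValueError:
            continue  # skip entries that are not plain IP addresses
        if addr.is_loopback or addr in local:
            hits.append(server)
    return hits
```

For example, on the first node one could call `self_referencing(open("/etc/resolv.conf").read(), ["10.1.10.161"])`, where 10.1.10.161 is that node's own address; a non-empty result means DNS points back at the node.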