Bug 129458 - clvmd Oops: Can't bind to port 21064
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 4
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Christine Caulfield
QA Contact: GFS Bugs
Depends On:
Blocks:
Reported: 2004-08-09 10:23 EDT by Derek Anderson
Modified: 2010-01-11 21:56 EST
Doc Type: Bug Fix
Last Closed: 2004-10-04 15:29:02 EDT
Description Derek Anderson 2004-08-09 10:23:10 EDT
Description of problem:
Running cluster setup/teardown tests after the latest cman changes
were checked in over the weekend.  Testing on a 2-node cluster.  On
the second startup I got this kernel Oops when clvmd was started.

dlm: Can't bind to port 21064
dlm: clvmd: recover event 3 (first)
dlm: clvmd: add nodes
Unable to handle kernel NULL pointer dereference at virtual address 00000046
 printing eip:
e04c725a
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: gfs lock_dlm lock_harness loop dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<e04c725a>]    Not tainted
EFLAGS: 00010286   (2.6.7)
EIP is at send_to_sock+0x3a/0x210 [dlm]
eax: 00000002   ebx: dab9c060   ecx: 00000000   edx: 0000002b
esi: e04dabd0   edi: dab9c060   ebp: 00000000   esp: dabaffa0
ds: 007b   es: 007b   ss: 0068
Process dlm_sendd (pid: 5905, threadinfo=dabae000 task=dac8e630)
Stack: 00000000 e04dab70 00000000 dab9c068 dac8f6b0 dab9c060 e04dabd0 dabae000
       00000000 e04c7649 dabae000 00000000 00000000 e04c7993 e04d1ccd 00000000
       0000007b 0000007b ffffffff e04c78f0 c010429d 00000000 00000000 00000000
Call Trace:
 [<e04c7649>] process_output_queue+0x59/0x80 [dlm]
 [<e04c7993>] dlm_sendd+0xa3/0x100 [dlm]
 [<e04c78f0>] dlm_sendd+0x0/0x100 [dlm]
 [<c010429d>] kernel_thread_helper+0x5/0x18

Code: 8b 40 44 89 44 24 10 8d 47 30 89 44 24 08 90 8d b4 26 00 00
Aug  9 15:04:20 link-10 kernel: dlm: Can't bind to port 21064
Aug  9 15:04:20 link-10 kernel: dlm: clvmd: recover event 3 (first)
Aug  9 15:04:20 link-10 kernel: dlm: clvmd: add nodes
Aug  9 15:04:20 link-10 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000046
Aug  9 15:04:20 link-10 kernel:  printing eip:
Aug  9 15:04:20 link-10 kernel: e04c725a
Aug  9 15:04:20 link-10 kernel: *pde = 00000000
Aug  9 15:04:20 link-10 kernel: Oops: 0000 [#1]
Aug  9 15:04:20 link-10 kernel: Modules linked in: gfs lock_dlm lock_harness loop dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Aug  9 15:04:20 link-10 kernel: CPU:    0
Aug  9 15:04:20 link-10 kernel: EIP:    0060:[<e04c725a>]    Not tainted
Aug  9 15:04:20 link-10 kernel: EFLAGS: 00010286   (2.6.7)
Aug  9 15:04:20 link-10 kernel: EIP is at send_to_sock+0x3a/0x210 [dlm]
Aug  9 15:04:20 link-10 kernel: eax: 00000002   ebx: dab9c060   ecx: 00000000   edx: 0000002b
Aug  9 15:04:20 link-10 kernel: esi: e04dabd0   edi: dab9c060   ebp: 00000000   esp: dabaffa0
Aug  9 15:04:20 link-10 kernel: ds: 007b   es: 007b   ss: 0068
Aug  9 15:04:20 link-10 kernel: Process dlm_sendd (pid: 5905, threadinfo=dabae000 task=dac8e630)
Aug  9 15:04:20 link-10 kernel: Stack: 00000000 e04dab70 00000000 dab9c068 dac8f6b0 dab9c060 e04dabd0 dabae000
Aug  9 15:04:20 link-10 kernel:        00000000 e04c7649 dabae000 00000000 00000000 e04c7993 e04d1ccd 00000000
Aug  9 15:04:20 link-10 kernel:        0000007b 0000007b ffffffff e04c78f0 c010429d 00000000 00000000 00000000
Aug  9 15:04:20 link-10 kernel: Call Trace:
Aug  9 15:04:20 link-10 kernel:  [<e04c7649>] process_output_queue+0x59/0x80 [dlm]
Aug  9 15:04:20 link-10 kernel:  [<e04c7993>] dlm_sendd+0xa3/0x100 [dlm]
Aug  9 15:04:20 link-10 kernel:  [<e04c78f0>] dlm_sendd+0x0/0x100 [dlm]
Aug  9 15:04:20 link-10 kernel:  [<c010429d>] kernel_thread_helper+0x5/0x18
Aug  9 15:04:20 link-10 kernel:
Aug  9 15:04:20 link-10 kernel: Code: 8b 40 44 89 44 24 10 8d 47 30 89 44 24 08 90 8d b4 26 00 00
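
The faulting address in the trace can be cross-checked against the register dump. The sketch below is not part of the DLM source; it is a small illustrative decoder for one instruction form, and it assumes the Code: bytes begin at the faulting EIP, which the report itself does not state. The function name mov_disp8_address and the register array layout (x86 encoding order: eax, ecx, edx, ebx, esp, ebp, esi, edi) are choices made for this example.

```c
#include <stdint.h>

/*
 * Illustrative only: compute the memory address read by a 32-bit x86
 * "mov r32, [r32 + disp8]" instruction (opcode 0x8b, ModRM mod=01).
 * regs[] holds register values in x86 encoding order:
 *   eax, ecx, edx, ebx, esp, ebp, esi, edi.
 * Returns 0 on success, -1 if the bytes are not in this exact form.
 */
int mov_disp8_address(const uint8_t *code, const uint32_t regs[8],
                      uint32_t *addr)
{
    if (code[0] != 0x8b)
        return -1;              /* not "mov r32, r/m32" */

    uint8_t modrm = code[1];
    uint8_t mod = modrm >> 6;   /* addressing mode */
    uint8_t rm = modrm & 7;     /* base register */

    if (mod != 1)
        return -1;              /* only the 8-bit displacement form */
    if (rm == 4)
        return -1;              /* a SIB byte would follow; not handled */

    int8_t disp = (int8_t)code[2];          /* sign-extended disp8 */
    *addr = regs[rm] + (uint32_t)(int32_t)disp;
    return 0;
}
```

Under that assumption, the first three Code: bytes (8b 40 44) decode to a load from eax + 0x44. With eax = 00000002 from the dump, the result is 0x46, which matches the reported virtual address 00000046 and is consistent with send_to_sock reading a field through an invalid pointer value.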


Version-Release number of selected component (if applicable):
cman_tool DEVEL.1092080900 (built Aug  9 2004 14:49:26)
Copyright (C) Red Hat, Inc.  2004  All rights reserved.

How reproducible:
Will retry after bug is filed.

Comment 1 Christine Caulfield 2004-08-10 05:48:21 EDT
Well, the oops should be gone now. If DLM gets an error binding to any
socket it will fail initialisation properly, rather than letting
everything blunder on into disaster.

The EADDRINUSE return is a function of TCP: you need to leave a
certain period of time before it will let you bind to a port that
has recently been in use.

We do set SO_REUSEADDR on the socket, which should mitigate that a
little, but I think the best we can do is fail gracefully in this situation.
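
For illustration, the pattern described in this comment (set SO_REUSEADDR, then fail cleanly if bind still returns an error) might look like the following in userspace C. This is only a sketch using the ordinary sockets API; it is not the in-kernel DLM code, and the helper name open_listener and its negative-errno return convention are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP listening socket on `port`.
 * Returns the descriptor on success, or a negative errno value
 * (for example -EADDRINUSE) on failure. Nothing is left half set up:
 * the socket is closed before the error is returned.
 */
int open_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int one = 1;
    int err;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    /* Permit rebinding while old connections on the port are in TIME_WAIT. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    if (listen(fd, 8) < 0)
        goto fail;

    return fd;

fail:
    err = errno;    /* save before close() can overwrite it */
    close(fd);
    return -err;
}
```

A caller would check for a negative return and abort its own initialisation at that point, instead of continuing with a socket that was never bound.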
Comment 2 Derek Anderson 2004-10-04 15:29:02 EDT
OK, I have seen the "Can't bind to port 21064" message numerous times since
this change went in and have not seen it Oops.  Waiting for the socket
to time out seems to work.

DEVEL.1096898839 (built Oct  4 2004 09:08:31)
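
The "wait for the socket to time out" workaround can also be automated in a test harness. The sketch below is a userspace illustration only, assuming a plain TCP listener and hypothetical helper names (try_listen, listen_with_retry); the retry count and delay are placeholders, not values taken from this bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* One attempt: returns a listening descriptor or a negative errno value. */
static int try_listen(uint16_t port)
{
    struct sockaddr_in addr;
    int one = 1;
    int err;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 8) < 0) {
        err = errno;
        close(fd);
        return -err;
    }
    return fd;
}

/*
 * Retry only while the port is reported busy (EADDRINUSE), sleeping
 * delay_ms between attempts. Any other error is returned immediately.
 * Returns -EINVAL if attempts is not positive.
 */
int listen_with_retry(uint16_t port, int attempts, long delay_ms)
{
    int rc = -EINVAL;

    for (int i = 0; i < attempts; i++) {
        rc = try_listen(port);
        if (rc != -EADDRINUSE)
            return rc;
        if (i + 1 < attempts) {
            struct timespec ts = { delay_ms / 1000,
                                   (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return rc;
}
```

For example, listen_with_retry(21064, 10, 1000) would try for roughly ten seconds before giving up with -EADDRINUSE.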
Comment 3 Kiersten (Kerri) Anderson 2004-11-16 14:09:35 EST
Updating version to the right level in the defects.  Sorry for the storm.
