Bug 190408 - Unable to manage large number of services on 8 node cluster
Status: CLOSED DUPLICATE of bug 182454
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Blocks: 180185
Reported: 2006-05-01 20:57 EDT by Henry Harris
Modified: 2009-04-16 16:20 EDT

Doc Type: Bug Fix
Last Closed: 2006-05-12 13:05:29 EDT

Attachments: None
Description Henry Harris 2006-05-01 20:57:57 EDT
Description of problem: With 29 IP services configured on an 8-node cluster, 3
of the virtual IPs were never created, i.e. they were not listed as secondary
IPs by "ip -o -f inet addr". Also, the clusvcadm -r command either hung or
returned the error message "resource groups locked" when trying to relocate
any of the IP services.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Configure 29 IP services on an 8-node cluster
2. Start the cluster
3. Run "ip -o -f inet addr" on all nodes and verify that all 29 IPs are
   listed as secondary addresses
4. Run clusvcadm -r on any IP service
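
The check in step 3 can be scripted. A minimal sketch, assuming the stock
one-line-per-address output of "ip -o -f inet addr" in which secondary
addresses carry the "secondary" flag; the expected total of 29 comes from
this report's configuration:

```shell
#!/bin/sh
# Count input lines carrying the "secondary" flag (reads stdin).
count_secondary() {
    grep -c 'secondary'
}

# On a cluster node, compare the live per-node count against the total.
# 29 is the number of configured IP services in this report; across the
# 8 nodes the per-node counts should sum to 29 (only 26 were seen here).
if command -v ip >/dev/null 2>&1; then
    echo "secondary IPs on this node: $(ip -o -f inet addr | count_secondary)"
fi
```

Run on every node and sum the counts; a total below 29 reproduces the
missing-IP symptom.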
  
Actual results:
Only 26 IPs were listed as secondary; clusvcadm -r hung or returned the
"resource groups locked" message

Expected results:
All 29 IPs should be created and should be manageable with clusvcadm

Additional info: An strace of the hung clusvcadm -r shows it blocked in the
final select() call:

open("/etc/hosts", O_RDONLY)            = 4
fcntl(4, F_GETFD)                       = 0
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=1384, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a9568f000
read(4, "# Do not remove the following li"..., 4096) = 1384
close(4)                                = 0
munmap(0x2a9568f000, 4096)              = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
fcntl(4, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(4, {sa_family=AF_INET, sin_port=htons(41966), sin_addr=inet_addr("10.10.10.6")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(5, [4], [4], NULL, {5, 0})       = 1 (out [4], left {5, 0})
getsockopt(4, SOL_SOCKET, SO_ERROR, [8589934592], [4]) = 0
fcntl(4, F_SETFL, O_RDWR)               = 0
write(4, "\22:\274\0\0\0\0l\0\23\205\202\0\0\0\0\0\0\0\000192.16"..., 108) = 108
select(5, [4], NULL, [4], NULL
Comment 1 Lon Hohberger 2006-05-02 10:07:04 EDT
It sounds like the rgmanager service group is not in the 'run' state.

Can you check /proc/cluster/services ?
Comment 2 Henry Harris 2006-05-02 11:03:30 EDT
Here's the output:


Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[8 1 3 2 4 5 6 7]

DLM Lock Space:  "clvmd"                             2   3 run       -
[4 6 7 8 5 1 3 2]

DLM Lock Space:  "crosswalk"                         3   4 run       -
[8 1 3 4 5 2 6 7]

DLM Lock Space:  "TestVolume_01"                     5   6 run       -
[8 1 3 4 5 2 6 7]

DLM Lock Space:  "TestVolume_02"                     7   8 run       -
[8 1 3 4 5 2 6 7]

DLM Lock Space:  "TestVolume_03"                     9  10 run       -
[8 1 3 4 5 2 6 7]

DLM Lock Space:  "TestVolume_04"                    11  12 run       -
[3 5 1 8 4 2 7 6]

DLM Lock Space:  "TestVolume_05"                    13  14 run       -
[3 1 5 8 4 2 7 6]

DLM Lock Space:  "TestVolume_06"                    15  16 run       -
[3 1 5 4 2 8 7 6]

DLM Lock Space:  "TestVolume_08"                    17  18 run       -
[3 5 4 1 2 8 6 7]

DLM Lock Space:  "TestVolume_07"                    19  20 run       -
[4 5 3 1 2 6 7 8]

DLM Lock Space:  "Magma"                            22  23 run       -
[3 1 2 4 5 6 7]

DLM Lock Space:  "SnapVolume"                       23  24 run       -
[3 1 2 4 5 6 7 8]

GFS Mount Group: "crosswalk"                         4   5 run       -
[8 1 3 4 5 2 6 7]

GFS Mount Group: "TestVolume_01"                     6   7 run       -
[8 1 3 4 5 2 6 7]

GFS Mount Group: "TestVolume_02"                     8   9 run       -
[8 1 3 4 5 2 6 7]

GFS Mount Group: "TestVolume_03"                    10  11 run       -
[8 1 3 4 5 2 6 7]

GFS Mount Group: "TestVolume_04"                    12  13 run       -
[3 5 1 8 4 2 7 6]

GFS Mount Group: "TestVolume_05"                    14  15 run       -
[3 1 5 8 4 2 7 6]

GFS Mount Group: "TestVolume_06"                    16  17 run       -
[3 1 5 4 2 8 6 7]

GFS Mount Group: "TestVolume_08"                    18  19 run       -
[3 5 4 1 2 6 7 8]

GFS Mount Group: "TestVolume_07"                    20  21 run       -
[4 5 3 1 2 6 7 8]

GFS Mount Group: "SnapVolume"                       24  25 run       -
[3 1 2 4 5 6 7 8]

User:            "usrm::manager"                    21  22 update    U-6,2,8
[3 1 2 4 5 6 7 8]
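
The non-"run" state in the last entry above is the telltale. It can be
spotted mechanically; a minimal sketch, assuming the /proc/cluster/services
column layout shown in this output (quoted group name, then GID, LID, and
State fields):

```shell
#!/bin/sh
# Print any service group whose State column is not "run", reading
# /proc/cluster/services-style output from stdin.
flag_not_running() {
    awk -F'"' '/"/ {
        name = $2            # group name between the double quotes
        split($3, f, " ")    # f[1]=GID, f[2]=LID, f[3]=State
        if (f[3] != "run") print name, f[3]
    }'
}

# Usage on a cluster node:
#   flag_not_running < /proc/cluster/services
```

Against the output above this prints only "usrm::manager update", matching
Lon's suspicion that the rgmanager service group is stuck outside the 'run'
state.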
Comment 3 Lon Hohberger 2006-05-02 14:13:18 EDT
Suspicion confirmed, thanks.
Comment 4 Lon Hohberger 2006-05-12 13:05:29 EDT

*** This bug has been marked as a duplicate of 182454 ***
