| Summary: | cmirror create deadlock - 'clogd: cpg_initialize failed: Cannot join cluster' | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> | |
| Component: | lvm2-cluster | Assignee: | Jonathan Earl Brassow <jbrassow> | |
| Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 5.3 | CC: | agk, ccaulfie, dwysocha, heinzm, jbrassow, lmiksik, nperic, prajnoha, prockai, thornber, zkabelac | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 782156 | Environment: | ||
| Last Closed: | 2013-04-10 20:18:03 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 782156, 807971, 928849 | |||
Description
Corey Marthaler
2011-12-12 23:43:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

*** Bug 782156 has been marked as a duplicate of this bug. ***

Encountered the same issue while testing mirrors in a cluster. With reserved_memory set to 32768, the error shows itself around the 220th mirror. The test was at this step:

[lvm_cluster_mirror] [lvm_cluster_mirror_sanity] 230 a1: lvcreate -m 1 -n 500_230 -L 25M --nosync mirror_sanity
[lvm_cluster_mirror] [lvm_cluster_mirror_sanity] WARNING: New mirror won't be synchronised. Don't read what you didn't write!

Errors in /var/log/messages:

(08:25:09) [root@a1:/var/log]$ tail /var/log/messages
Jan 16 08:26:23 a1 kernel: clogd(26041): unaligned access to 0x600000000001160c, ip=0x4000000000005ef0
Jan 16 08:26:23 a1 kernel: clogd(26041): unaligned access to 0x6000000000011614, ip=0x4000000000005f10
Jan 16 08:26:23 a1 clogd[26041]: cpg_mcast_joined error: SA_AIS_ERR_BAD_HANDLE
Jan 16 08:26:28 a1 last message repeated 36 times
Jan 16 08:26:28 a1 kernel: kernel unaligned access to 0xe0000001f1a60394, ip=0xa00000020371e4d0
Jan 16 08:26:28 a1 kernel: kernel unaligned access to 0xe0000001f1a6043c, ip=0xa00000020371e560
Jan 16 08:26:28 a1 kernel: clogd(26041): unaligned access to 0x600000000001160c, ip=0x400000000006f9f0
Jan 16 08:26:28 a1 kernel: clogd(26041): unaligned access to 0x600000000001160c, ip=0x4000000000005ef0
Jan 16 08:26:28 a1 kernel: clogd(26041): unaligned access to 0x6000000000011614, ip=0x4000000000005f10
Jan 16 08:26:28 a1 clogd[26041]: cpg_mcast_joined error: SA_AIS_ERR_BAD_HANDLE

The operation can be unblocked by running vgs or vgscan on the active node. That command can itself get stuck; running the same command on one other node then unblocks both. After that, mirror creation continues for perhaps 10 to 15 more mirrors before it deadlocks again. As far as I can tell, this can be repeated indefinitely. The errors showing in /var/log/messages are then:

Jan 16 11:04:40 a1 kernel: kernel unaligned access to 0xe0000001f3490714, ip=0xa000000202f424d0
Jan 16 11:04:40 a1 kernel: kernel unaligned access to 0xe0000001f34907bc, ip=0xa000000202f42560
Jan 16 11:04:40 a1 kernel: clogd(4059): unaligned access to 0x600000000001160c, ip=0x400000000006f9f0
Jan 16 11:04:40 a1 kernel: clogd(4059): unaligned access to 0x60000000000116b4, ip=0x4000000000057f60
Jan 16 11:04:40 a1 kernel: clogd(4059): unaligned access to 0x60000000000116b4, ip=0x4000000000058260
Jan 16 11:04:40 a1 clogd[4059]: cpg_initialize failed: Cannot join cluster
Jan 16 11:04:40 a1 clogd[4059]: clog_resume: Failed to create cluster CPG
Jan 16 11:04:40 a1 lvm[4524]: Monitoring mirror device mirror_sanity-500_322 for events

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

There simply may need to be limits placed on the number of cluster mirrors that are allowed. It doesn't look like checkpointing/CPG can handle the load of all the mirrors.
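
For anyone wanting to check how many cluster mirrors a given setup tolerates, the creation step from the test output can be looped. This is only a rough sketch, not the actual lvm_cluster_mirror_sanity test: the function name create_mirrors and the VG, COUNT and RUN variables are made up for illustration, and the only thing taken from this report is the lvcreate command line and the mirror_sanity volume group name. It defaults to a dry run that just prints the commands.

```shell
#!/bin/sh
# Hypothetical reproduction helper; NOT the original QE test.
# The lvcreate arguments are copied from the test output in this report.
VG=${VG:-mirror_sanity}   # volume group name as seen in the report
COUNT=${COUNT:-230}       # the report hit the hang at roughly 220-230 mirrors
RUN=${RUN:-0}             # 0 = dry run (print commands), 1 = really run lvcreate

create_mirrors() {
    i=1
    while [ "$i" -le "$COUNT" ]; do
        cmd="lvcreate -m 1 -n 500_$i -L 25M --nosync $VG"
        if [ "$RUN" = "1" ]; then
            # Stop at the first failure so the mirror count is easy to read off.
            $cmd || { echo "lvcreate failed at mirror $i" >&2; return 1; }
        else
            echo "$cmd"
        fi
        i=$((i + 1))
    done
}

create_mirrors
```

Running it with RUN=1 on a test cluster creates real mirrors in the volume group, so it should only be pointed at a scratch VG. Note that in this report lvcreate hangs rather than fails, so the last command printed or started is the number to look at, not the error message.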