| Summary: | cmirror does not handle (POLLHUP|POLLERR|POLLINVAL) | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jonathan Earl Brassow <jbrassow> |
| Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow> |
| Status: | CLOSED WONTFIX | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.1 | CC: | agk, dwysocha, heinzm, jbrassow, prajnoha, prockai, thornber, zkabelac |
| Target Milestone: | rc | ||
| Target Release: | 6.2 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-10-15 22:07:13 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 756082 | ||
Description
Jonathan Earl Brassow
2011-03-22 14:21:51 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Adding QA ack for 6.2. Devel will need to provide unit testing results however before this bug can be ultimately verified by QA.

This issue still exists in the latest 6.2 build, however I see this isn't currently included in the 6.2 lvm2 errata.

[root@hayes-01 ~]# service clvmd start
Starting clvmd:
Activating VG(s): [HANG]

hayes-01:
device-mapper: dm-log-userspace: [pERyFOX6] Request timed out: [5/201681] - retrying
Aug 18 10:54:33 hayes-01 kernel: device-mapper: dm-log-userspace: [pERyFOX6] Request timed out: [5/201681] - retrying
device-mapper: dm-log-userspace: [pERyFOX6] Request timed out: [5/201682] - retrying

hayes-03:
Aug 18 10:50:12 hayes-03 cmirrord[8772]: [pERyFOX6] Failed to open checkpoint for 1: SA_AIS_ERR_LIBRARY
Aug 18 10:50:11 hayes-03 kernel: device-mapper: dm-log-userspace: [pERyFOX6] Request timed out: [13/1919] - retrying
Aug 18 10:50:12 hayes-03 cmirrord[8772]: [pERyFOX6] Failed to export checkpoint for 1

2.6.32-188.el6.x86_64
lvm2-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6 BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011

Hmmm, this is going to have to wait. The changes are much more involved and intrusive than I thought.
It is not simply a matter of responding to SIGHUP and reconnecting to corosync. There is live state on the system that must be transmitted, probably via checkpoint. This is quite different from the start-up scenario, where incoming nodes do not already have a live impression of the system. Additionally, this bug was filed (by me) to see if we could better handle a situation that isn't allowed in the first place: shutting down a service that cmirrord depends on without first shutting down cmirrord.

I'm pushing this out to 6.3 and changing the scope of this bug. If we are trying to protect against the scenario where everything is shut down except cmirrord, and those things are then restarted and cmirrord is expected to work, then I will simply check whether there are any active logs. If there are none, cmirrord will re-form the connection with corosync. If there are active logs, the reconnect will be refused. This simplified handling should be more than sufficient for a scenario that is not allowed.
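
The reduced scope described above can be illustrated with a minimal C sketch. It is not taken from the cmirrord sources: the names `reconnect_action`, `conn_fault`, `decide_reconnect`, and `poll_revents` are hypothetical, the count of active logs is assumed to be tracked elsewhere, and the actual reconnect to corosync is left out. Note that POSIX spells the third flag `POLLNVAL`, which is what the sketch tests for.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical outcome of checking the cluster connection; not cmirrord code. */
enum reconnect_action {
    CONN_OK,        /* no fault reported on the descriptor */
    CONN_RECONNECT, /* fault seen and no active logs: safe to re-form */
    CONN_REFUSE     /* fault seen while logs are active: refuse */
};

/* True when poll() flagged the descriptor as hung up, errored, or invalid. */
static bool conn_fault(short revents)
{
    return (revents & (POLLHUP | POLLERR | POLLNVAL)) != 0;
}

/* Apply the simplified policy: reconnect only when no logs are active. */
static enum reconnect_action decide_reconnect(short revents,
                                              unsigned active_logs)
{
    if (!conn_fault(revents))
        return CONN_OK;
    return active_logs == 0 ? CONN_RECONNECT : CONN_REFUSE;
}

/* Poll one descriptor without blocking; returns 0 if poll() itself fails. */
static short poll_revents(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    if (poll(&pfd, 1, 0) < 0)
        return 0;
    return pfd.revents;
}
```

A caller would pass the result of `poll_revents()` on the cluster descriptor, together with its current count of active logs, to `decide_reconnect()`, and only attempt to re-form the connection on `CONN_RECONNECT`. Keeping the decision separate from the poll call makes the refuse-when-active rule easy to exercise without a running cluster.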