Bug 227398
Summary: | cmirror request to LRT_CLEAR_REGION fails and causes cmd to run for many days | ||
---|---|---|---|
Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> |
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4 | CC: | agk, cfeist, dwysocha, jbrassow, mbroz, prockai |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-08-05 21:43:44 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Corey Marthaler
2007-02-05 19:52:35 UTC
Just a note that this has been reproduced. The following has been going on now for 20+ hours.

Feb 12 13:00:31 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 32 of 437462 (1%)
Feb 12 13:00:39 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 64 of 437526 (1%)
Feb 12 13:00:47 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 96 of 437590 (1%)
[...]
Feb 13 08:52:14 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 285728 of 1008866 (29%)
Feb 13 08:52:22 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 285760 of 1008930 (29%)

Please help me reproduce with the latest cmirror-kernel package (>= 2/21/2007).

Marking modified, as I believe this has been fixed in the process of fixing other bugs.

Hit this last night on link-04 with the latest cmirror build on the link cluster (link-02,4,7,8).

Ok, we've proved the bug is still around. Now we need to either:
A) find the boundary conditions of the bug
B) find quicker/easier ways to reproduce the bug
Obviously, 'B' is the best. I need to know how many mirrors you had running on the system at the time, what was being done to them, whether they were 3-way mirrors or not, etc. Then I need to know if you can reproduce with just one mirror on the system, and/or with just 2-way mirrors... simplify. Again, the best thing that you could do is find a way to trigger the bug that is simple and straightforward and that causes it to show up in a reasonable timeframe.

I suspect that the server is getting stuck while trying to read a log device that has been suspended. I'm now forcing the resume to establish the log server before it completes to try to mitigate this problem.

post -> modified

modified -> needinfo

Changes for bug 231230 included a partial reversal of some of the changes designed for this bug. I've tried to be careful in my selection of which changes to revert, but this bug should be reinvestigated.
needinfo -> modified

I guess it was pointless for me to change that... QA will retest to validate it anyway.

I had I/O going to 3 gfs filesystems on top of cmirrors, all up and down converting for almost 24 hours and wasn't able to reproduce this bug. Marking verified.

Fixed in current release (4.7).
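
For anyone trying to reproduce this, the "retried requests" counters in the kernel log excerpt quoted earlier in this report can be summarized with a short script. The following is only a sketch: the function names are hypothetical, the regular expression is based solely on the log lines quoted in this bug, and it assumes that both numbers are cumulative counters, which the report does not confirm.

```python
import re

# Matches the dm-cmirror counter lines as they are quoted in this bug report.
# Assumption: the format is "<retried> of <total> (<pct>%)".
LINE_RE = re.compile(
    r"Clustered mirror retried requests :: (\d+) of (\d+) \((\d+)%\)"
)


def parse_retry_lines(text):
    """Return a list of (retried, total, reported_pct) tuples found in text."""
    return [tuple(int(g) for g in m.groups()) for m in LINE_RE.finditer(text)]


def interval_retry_rates(samples):
    """For each pair of consecutive samples, return the fraction of the
    requests issued in that interval that were retried.

    Assumes both counters are cumulative (an assumption, not stated in the bug).
    """
    rates = []
    for (r0, t0, _), (r1, t1, _) in zip(samples, samples[1:]):
        new_total = t1 - t0
        if new_total > 0:
            rates.append((r1 - r0) / new_total)
    return rates


# The first three consecutive lines from the report (no elided gap between them).
log = """\
Feb 12 13:00:31 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 32 of 437462 (1%)
Feb 12 13:00:39 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 64 of 437526 (1%)
Feb 12 13:00:47 link-04 kernel: dm-cmirror: Clustered mirror retried requests :: 96 of 437590 (1%)
"""

samples = parse_retry_lines(log)
rates = interval_retry_rates(samples)
for (retried, total, pct), rate in zip(samples[1:], rates):
    print(f"{retried} of {total} (reported {pct}%), interval retry rate {rate:.0%}")
```

Under the cumulative-counter assumption, each 8-second interval in these three lines shows 64 new requests of which 32 were retried, so the per-interval rate is far higher than the overall percentage printed by the kernel. Comparing the two figures may help confirm quickly whether a test run has hit the stuck state.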