Bug 234613
| Summary: | primary leg failure of cmirror can cause down conversion to deadlock | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Corey Marthaler <cmarthal> | ||||||||||
| Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> | ||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Cluster QE <mspqa-list> | ||||||||||
| Severity: | medium | Docs Contact: | |||||||||||
| Priority: | high | ||||||||||||
| Version: | 4 | CC: | agk, cfeist, dwysocha, fweyns, jbrassow, mbroz, prockai | ||||||||||
| Target Milestone: | --- | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | All | ||||||||||||
| OS: | Linux | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2008-08-05 21:42:22 UTC | Type: | --- | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: |
|
||||||||||||
Created attachment 151315 [details]
log/kern dump from link-02
Created attachment 151316 [details]
log/kern dump from link-04
Created attachment 151317 [details]
log/kern dump from link-07
Created attachment 151318 [details]
log/kern dump from link-08
Hit this again today, only this time with a synced core log mirror. Senario: Kill primary leg of synced core log 2 leg mirror ****** Mirror hash info for this scenario ****** * name: fail_primary_synced_core_log_2_legs * sync: 1 * disklog: 0 * failpv: /dev/sdd1 * legs: 2 * pvs: /dev/sdd1 /dev/sdb1 ************************************************ Creating mirror on link-04... qarsh root@link-04 lvcreate --corelog -m 1 -n fail_primary_synced_core_log_2_legs -L 800M helter_skelter /dev/sdd1:0-500 /dev/sdb1:0-500 Creating gfs on top of mirror on link-04... Creating mnt point /mnt/fail_primary_synced_core_log_2_legs on link-02... Mounting gfs on link-02... Creating mnt point /mnt/fail_primary_synced_core_log_2_legs on link-04... Mounting gfs on link-04... Creating mnt point /mnt/fail_primary_synced_core_log_2_legs on link-07... Mounting gfs on link-07... Creating mnt point /mnt/fail_primary_synced_core_log_2_legs on link-08... Mounting gfs on link-08... Waiting for mirror to sync Verifying that the mirror is fully syncd, currently at ...33.50% ...43.00% ...54.00% ...63.00% ...72.50% ...81.50% ...91.50% ...100.00% Disabling device sde on link-02 Disabling device sdd on link-04 Disabling device sde on link-07 Disabling device sde on link-08 Attempting I/O to cause mirror down conversion on link-04 10+0 records in 10+0 records out Verifying the down conversion from mirror to linear /dev/sdd1: open failed: No such device or address [DEADLOCK] [root@link-02 tmp]# dmsetup ls helter_skelter-fail_primary_synced_core_log_2_legs (253, 4) helter_skelter-fail_primary_synced_core_log_2_legs_mimage_1 (253, 3) helter_skelter-fail_primary_synced_core_log_2_legs_mimage_0 (253, 2) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@link-02 tmp]# dmsetup table helter_skelter-fail_primary_synced_core_log_2_legs: 0 1638400 mirror clustered_core 3 1024 LVM-TEOcHjQ0N3Czu9Jb2oIEJovnrJeucbpr5TJl37Ajeq36q4ZW9GhZlh0Txxwmbict block_on_error 2 253:2 0 253:3 0 helter_skelter-fail_primary_synced_core_log_2_legs_mimage_1: 0 1638400 linear 8:33 384 helter_skelter-fail_primary_synced_core_log_2_legs_mimage_0: 0 1638400 linear 8:65 384 VolGroup00-LogVol01: 0 4063232 linear 3:2 73728384 VolGroup00-LogVol00: 0 73728000 linear 3:2 384 [root@link-02 tmp]# dmsetup info Name: helter_skelter-fail_primary_synced_core_log_2_legs State: SUSPENDED Tables present: LIVE & INACTIVE Open count: 1 Event number: 0 Major, minor: 253, 4 Number of targets: 1 UUID: LVM-TEOcHjQ0N3Czu9Jb2oIEJovnrJeucbpr5TJl37Ajeq36q4ZW9GhZlh0Txxwmbict Name: helter_skelter-fail_primary_synced_core_log_2_legs_mimage_1 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 3 Number of targets: 1 UUID: LVM-TEOcHjQ0N3Czu9Jb2oIEJovnrJeucbprHsq3GcGBCgULHz3btSw3FpzkemU1m5os Name: helter_skelter-fail_primary_synced_core_log_2_legs_mimage_0 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-TEOcHjQ0N3Czu9Jb2oIEJovnrJeucbprWzqZcXF00YdzIVWCSdQhqHQXVmz9lf0c Name: VolGroup00-LogVol01 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-i9Sdf1SXACVF330Sb7WdqECxIr1LdJtDxCyO7AGloRnyFBWnRK03Ir4tIJVh1hdX Name: VolGroup00-LogVol00 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 UUID: LVM-i9Sdf1SXACVF330Sb7WdqECxIr1LdJtDiNgxxFoLGSZ82yY0VHP39K4fbTxLg5Xn I appear to have hit this same issue now without using a corelog. I had a disk log with GFS on top, and the mirror was fully sync'ed so the down conversion should have worked. Also, after I noticed the deadlock, a 'vgreduce --removemissing helter_skelter' didn't cause the down convert to work, it only hung as well. I'll need to reboot it appears to get out of this state. Version: cmirror-kernel-2.6.9-27.0 Senario: Kill primary leg of synced 3 leg mirror ****** Mirror hash info for this scenario ****** * name: fail_primary_synced_3_legs * sync: 1 * disklog: 1 * failpv: /dev/sda1 * legs: 3 * pvs: /dev/sda1 /dev/sdg1 /dev/sdh1 /dev/sde1 ************************************************ Creating mirror on link-04... qarsh root@link-04 lvcreate -m 2 -n fail_primary_synced_3_legs -L 800M helter_skelter /dev/sda1:0-500 /dev/sdg1:0-500 /dev/sdh1:0-500 /dev/sde1:0-50 Creating gfs on top of mirror on link-04... Creating mnt point /mnt/fail_primary_synced_3_legs on link-02... Mounting gfs on link-02... Creating mnt point /mnt/fail_primary_synced_3_legs on link-04... Mounting gfs on link-04... Creating mnt point /mnt/fail_primary_synced_3_legs on link-07... Mounting gfs on link-07... Creating mnt point /mnt/fail_primary_synced_3_legs on link-08... Mounting gfs on link-08... Waiting for mirror to sync Verifying that the mirror is fully syncd, currently at ...25.00% ...32.50% ...40.00% ...47.50% ...55.00% ...62.50% ...70.50% ...78.00% ...85.00% ...93.00% ...100.00% Disabling device sdb on link-02 Disabling device sda on link-04 Disabling device sdb on link-07 Disabling device sdb on link-08 Attempting I/O to cause mirror down conversion on link-04 10+0 records in 10+0 records out Verifying the down conversion from mirror to linear /dev/sda1: open failed: No such device or address [DEADLOCK] [root@link-08 ~]# dmsetup info Name: helter_skelter-fail_primary_synced_3_legs_mlog State: SUSPENDED Tables present: LIVE Open count: 3 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-Ba0fgkaq9jLMYmNhI3j24yTcOAu9BI9M52OhctNaGMcMWJOtumO9QPYAYdLa08NI Name: helter_skelter-fail_primary_synced_3_legs_mimage_2 State: SUSPENDED Tables present: LIVE Open count: 2 Event number: 0 Major, minor: 253, 5 Number of targets: 1 UUID: LVM-Ba0fgkaq9jLMYmNhI3j24yTcOAu9BI9MGM6Xmg509Nr4rTseHkdOGCLQFtaye7y6 Name: helter_skelter-fail_primary_synced_3_legs_mimage_1 State: SUSPENDED Tables present: LIVE Open count: 2 Event number: 0 Major, minor: 253, 4 Number of targets: 1 UUID: LVM-Ba0fgkaq9jLMYmNhI3j24yTcOAu9BI9MJEtCS6KEIRGbf4xQSrVByNJD9Bu5S2OA Name: helter_skelter-fail_primary_synced_3_legs_mimage_0 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 3 Number of targets: 1 UUID: LVM-Ba0fgkaq9jLMYmNhI3j24yTcOAu9BI9MyxjkVuZYmcm8IgHl7zUyAtec3S9pJcx9 Name: VolGroup00-LogVol01 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkT1h6aidVTpJgaACOvfJduKZ88jI03FEf Name: helter_skelter-fail_primary_synced_3_legs State: SUSPENDED Tables present: LIVE & INACTIVE Open count: 1 Event number: 0 Major, minor: 253, 6 Number of targets: 1 UUID: LVM-Ba0fgkaq9jLMYmNhI3j24yTcOAu9BI9MJ2UoH58JseqQejQTIOyxOoNZljKyhO1h Name: VolGroup00-LogVol00 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkXAqtIL9TdeqFs3GAoSPDj3JVguOzhxCd What is the deadlock timeout set to? How much I/O is happening? Are there clvmd timeouts happening? I'm setting this one to modified, given all the changes since the last reproduction. Please attempt with cmirror-kernel >= 2.6.9-30.1 I don't think this was ever a problem in cmirror-kernel. I think it's a problem with clvmd timeouts. If you can reproduce the problem, look for clvmd timeout problems (should say something like "Error locking on node X: ... timeout"). If you find those errors, then it is bug 235092 - otherwise, this bug is valid. needinfo -> modified I have been unable to reproduce this bug with the latest version. Marking verified in cmirror-kernel-2.6.9-32.0. Fixed in current release (4.7). |
Description of problem: Here is the output of the failed testcase: Senario: Kill primary leg of non synced core log 2 leg mirror ****** Mirror hash info for this scenario ****** * name: fail_primary_non_synced_core_log_2_legs * sync: 0 * disklog: 0 * failpv: /dev/sdh1 * legs: 2 * pvs: /dev/sdh1 /dev/sda1 ************************************************ Creating mirror on link-04... qarsh root@link-04 lvcreate --corelog -m 1 -n fail_primary_non_synced_core_log_2_legs -L 500M helter_skelter /dev/sdh1:0-500 /dev/sda1:0-500 mirror is only 32.80% synced right now Disabling device sda on link-02 Disabling device sdh on link-04 Disabling device sda on link-07 Disabling device sda on link-08 Attempting I/O to cause mirror down conversion on link-04 10+0 records in 10+0 records out Verifying the down conversion from mirror to linear /dev/sdh1: open failed: No such device or address LINK-02: [root@link-02 ~]# dmsetup ls helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 (253, 3) helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 (253, 2) helter_skelter-fail_primary_non_synced_core_log_2_legs (253, 4) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@link-02 ~]# dmsetup table helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1: 0 1024000 linear 8:17 384 helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0: 0 1024000 linear 8:1 384 helter_skelter-fail_primary_non_synced_core_log_2_legs: 0 1024000 mirror clustered_core 3 1024 LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 block_on_error 2 253:2 0 253:3 0 VolGroup00-LogVol01: 0 4063232 linear 3:2 73728384 VolGroup00-LogVol00: 0 73728000 linear 3:2 384 [root@link-02 ~]# dmsetup info Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 3 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweQA9o6rbcl0VdBZ7GsPLtCbxDkBdPnhYQ Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnswe4P8lCbUdfIxiXJ81KLN58zgzou37yKG5 Name: helter_skelter-fail_primary_non_synced_core_log_2_legs State: SUSPENDED Tables present: LIVE & INACTIVE Open count: 0 Event number: 0 Major, minor: 253, 4 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 Name: VolGroup00-LogVol01 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-i9Sdf1SXACVF330Sb7WdqECxIr1LdJtDxCyO7AGloRnyFBWnRK03Ir4tIJVh1hdX Name: VolGroup00-LogVol00 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 UUID: LVM-i9Sdf1SXACVF330Sb7WdqECxIr1LdJtDiNgxxFoLGSZ82yY0VHP39K4fbTxLg5Xn LINK-04: [root@link-04 ~]# dmsetup ls helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 (253, 3) helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 (253, 2) helter_skelter-fail_primary_non_synced_core_log_2_legs (253, 4) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@link-04 ~]# dmsetup table helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1: 0 1024000 linear 8:1 384 helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0: 0 1024000 error helter_skelter-fail_primary_non_synced_core_log_2_legs: 0 1024000 mirror clustered_core 3 1024 LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 block_on_error 2 253:2 0 253:3 0 VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760 VolGroup00-LogVol00: 0 151781376 linear 3:2 384 [root@link-04 ~]# dmsetup info Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 3 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweQA9o6rbcl0VdBZ7GsPLtCbxDkBdPnhYQ Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnswe4P8lCbUdfIxiXJ81KLN58zgzou37yKG5 Name: helter_skelter-fail_primary_non_synced_core_log_2_legs State: SUSPENDED Tables present: LIVE & INACTIVE Open count: 0 Event number: 2 Major, minor: 253, 4 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 Name: VolGroup00-LogVol01 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-Uw05H7EuFJTbRfMQk06fp6KLnbSv1UytIV2Iledh5wbylTH5N1KnoLWAOOZzkBwA Name: VolGroup00-LogVol00 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 UUID: LVM-Uw05H7EuFJTbRfMQk06fp6KLnbSv1Uyt76Wfag76bLltNNkvUPA5iOSBleOpFPde LINK-07: [root@link-07 ~]# dmsetup ls helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 (253, 3) helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 (253, 2) helter_skelter-fail_primary_non_synced_core_log_2_legs (253, 4) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@link-07 ~]# dmsetup table helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1: 0 1024000 linear 8:17 384 helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0: 0 1024000 linear 8:1 384 helter_skelter-fail_primary_non_synced_core_log_2_legs: 0 1024000 mirror clustered_core 3 1024 LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 block_on_error 2 253:2 0 253:3 0 VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760 VolGroup00-LogVol00: 0 151781376 linear 3:2 384 [root@link-07 ~]# dmsetup info Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 3 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweQA9o6rbcl0VdBZ7GsPLtCbxDkBdPnhYQ Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnswe4P8lCbUdfIxiXJ81KLN58zgzou37yKG5 Name: helter_skelter-fail_primary_non_synced_core_log_2_legs State: SUSPENDED Tables present: LIVE & INACTIVE Open count: 0 Event number: 0 Major, minor: 253, 4 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 Name: VolGroup00-LogVol01 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xQjIYEP6InPgUU5nsDPYSZl5EAEKqRWcY Name: VolGroup00-LogVol00 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xrapcuzOGNgADTzIRUNTk0MZBbtWAyXhh LINK-08: [root@link-08 ~]# dmsetup ls helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 (253, 3) helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 (253, 2) helter_skelter-fail_primary_non_synced_core_log_2_legs (253, 4) VolGroup00-LogVol01 (253, 1) VolGroup00-LogVol00 (253, 0) [root@link-08 ~]# dmsetup table helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1: 0 1024000 linear 8:17 384 helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0: 0 1024000 linear 8:1 384 helter_skelter-fail_primary_non_synced_core_log_2_legs: 0 1024000 mirror clustered_core 3 1024 LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 block_on_error 2 253:2 0 253:3 0 VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760 VolGroup00-LogVol00: 0 151781376 linear 3:2 384 [root@link-08 ~]# dmsetup info Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_1 State: SUSPENDED Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 3 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweQA9o6rbcl0VdBZ7GsPLtCbxDkBdPnhYQ Name: helter_skelter-fail_primary_non_synced_core_log_2_legs_mimage_0 State: SUSPENDED Tables present: LIVE Open count: 2 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnswe4P8lCbUdfIxiXJ81KLN58zgzou37yKG5 Name: helter_skelter-fail_primary_non_synced_core_log_2_legs State: SUSPENDED Tables present: LIVE & INACTIVE Open count: 0 Event number: 0 Major, minor: 253, 4 Number of targets: 1 UUID: LVM-poU7xsDZFkvWZmvWHmrRfnIUPYDVnsweCY2aS6BqlJ83uYWaF9KN1Yxl24li2Uj8 Name: VolGroup00-LogVol01 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkT1h6aidVTpJgaACOvfJduKZ88jI03FEf Name: VolGroup00-LogVol00 State: ACTIVE Tables present: LIVE Open count: 1 Event number: 0 Major, minor: 253, 0 Number of targets: 1 UUID: LVM-kXi6raoZjwmxhIMi9yE47lLMxD7rCSgkXAqtIL9TdeqFs3GAoSPDj3JVguOzhxCd I'll attach the logs and stuck processes... Version-Release number of selected component (if applicable): 2.6.9-50.ELsmp cmirror-kernel-2.6.9-26.0