Bug 471042
Summary: | RHEL5 cmirror tracker: cluster pvmove performance has regressed | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Corey Marthaler <cmarthal> |
Component: | cmirror | Assignee: | Jonathan Earl Brassow <jbrassow> |
Status: | CLOSED NOTABUG | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 5.3 | CC: | agk, bstevens, ccaulfie, ddomingo, dwysocha, edamato, heinzm, mbroz, pep, rlerch, syeghiay |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-12-08 20:34:44 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Corey Marthaler 2008-11-11 15:24:12 UTC
I ran the same pvmove scenario first on the older builds (cmirror-1.1.33-1.el5 / kmod-cmirror-0.1.20-1.el5) and then again on the newer builds (cmirror-1.1.34-5.el5 / kmod-cmirror-0.1.21-2.el5); the newer builds were about 20X slower.

cmirror-1.1.33-1.el5/kmod-cmirror-0.1.20-1.el5:

============================================================
Iteration 1 of 1 started at Wed Nov 12 16:48:01 CST 2008
============================================================
SCENARIO - [segmented_pvmove]
Create a couple small segmented lvs and then pvmove them
Running lv_seg on hayes-02 to create the segmented volumes
Deactivating seg_pvmove0 mirror
Moving data from pv /dev/etherd/e1.1p5
Finding volume group "mirror_sanity"
Executing: /sbin/modprobe dm-log-clustered
Archiving volume group "mirror_sanity" metadata (seqno 27).
Creating logical volume pvmove0
Moving 3662 extents of logical volume mirror_sanity/seg_pvmove0
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 28).
Checking progress every 15 seconds
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 30).
Deactivating mirror seg_pvmove0... and removing
Deactivating mirror seg_pvmove1... and removing

real    13m17.521s
user    0m0.843s
sys     0m0.458s

cmirror-1.1.34-5.el5/kmod-cmirror-0.1.21-2.el5:

============================================================
Iteration 1 of 1 started at Wed Nov 12 11:04:17 CST 2008
============================================================
SCENARIO - [segmented_pvmove]
Create a couple small segmented lvs and then pvmove them
Running lv_seg on hayes-03 to create the segmented volumes
Deactivating seg_pvmove0 mirror
Moving data from pv /dev/etherd/e1.1p5
Finding volume group "mirror_sanity"
Executing: /sbin/modprobe dm-log-clustered
Archiving volume group "mirror_sanity" metadata (seqno 27).
Creating logical volume pvmove0
Moving 3662 extents of logical volume mirror_sanity/seg_pvmove0
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 28).
Checking progress every 15 seconds
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/mirror_sanity" (seqno 30).
Deactivating mirror seg_pvmove0... and removing
Deactivating mirror seg_pvmove1... and removing

real    264m4.870s
user    0m0.833s
sys     0m0.495s

Here is the exact cmdline I ran:

./mirror_sanity -l /home/msp/cmarthal/work/rhel5/sts-root -r /usr/tests/sts-rhel5.3 -R ../../var/share/resource_files/hayes.xml -e segmented_pvmove -i 1

Are all mirrors suffering in the same way, or is it just this test case?

This appears to be a regression across the board for pvmoves. I tried the basic pvmove cases and they also appear to be much slower (about twice as slow) than the previous builds.

cmirror-1.1.33-1.el5/kmod-cmirror-0.1.20-1.el5:

[cmarthal@silver bin]$ time ./pv_shuffle -l /home/msp/cmarthal/work/rhel5/sts-root -r /usr/tests/sts-rhel5.3 -R ../../var/share/resource_files/hayes.xml -i 2
1 disk(s) to be used:
hayes-01=/dev/etherd/e1.1
hayes-02=/dev/etherd/e1.1
hayes-03=/dev/etherd/e1.1
on hayes-01...
dicing /dev/etherd/e1.1 into 7...
re-reading disks on hayes-03...
re-reading disks on hayes-01...
re-reading disks on hayes-02...
vgscan on all nodes...
creating lvm devices...
create PV(s) out of all avail partitions on hayes-01
create VG(s) on hayes-01
pv_shuffle
creating linear on hayes-01
creating stripe on hayes-01
Iteration 1 of 2 started on hayes-03 at Mon Nov 17 14:27:13 CST 2008
FREE PVS: /dev/etherd/e1.1p1 /dev/etherd/e1.1p2
INUSE PVS: /dev/etherd/e1.1p4 /dev/etherd/e1.1p5 /dev/etherd/e1.1p6 /dev/etherd/e1.1p7
NOT INUSE PVS: /dev/etherd/e1.1p3
Adding /dev/etherd/e1.1p2 to the volume group on hayes-03
Moving data from /dev/etherd/e1.1p4 to /dev/etherd/e1.1p2 on hayes-03
Finding volume group "pv_shuffle"
Executing: /sbin/modprobe dm-log-clustered
Archiving volume group "pv_shuffle" metadata (seqno 4).
Creating logical volume pvmove0
Moving 427 extents of logical volume pv_shuffle/stripe
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 5).
Checking progress every 15 seconds
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 7).
Removing /dev/etherd/e1.1p3 from the volume group on hayes-03
Iteration 2 of 2 started on hayes-02 at Mon Nov 17 14:58:30 CST 2008
FREE PVS: /dev/etherd/e1.1p1 /dev/etherd/e1.1p3
INUSE PVS: /dev/etherd/e1.1p2 /dev/etherd/e1.1p5 /dev/etherd/e1.1p6 /dev/etherd/e1.1p7
NOT INUSE PVS: /dev/etherd/e1.1p4
Adding /dev/etherd/e1.1p3 to the volume group on hayes-02
Moving data from /dev/etherd/e1.1p7 to /dev/etherd/e1.1p3 on hayes-02
Finding volume group "pv_shuffle"
Executing: /sbin/modprobe dm-log-clustered
Archiving volume group "pv_shuffle" metadata (seqno 9).
Creating logical volume pvmove0
Moving 1280 extents of logical volume pv_shuffle/linear
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 10).
Checking progress every 15 seconds
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 12).
Removing /dev/etherd/e1.1p4 from the volume group on hayes-02

real    35m3.384s
user    0m0.943s
sys     0m0.708s

cmirror-1.1.34-5.el5/kmod-cmirror-0.1.21-2.el5:

[cmarthal@silver bin]$ time ./pv_shuffle -l /home/msp/cmarthal/work/rhel5/sts-root -r /usr/tests/sts-rhel5.3 -R ../../var/share/resource_files/hayes.xml -i 2
1 disk(s) to be used:
hayes-01=/dev/etherd/e1.1
hayes-02=/dev/etherd/e1.1
hayes-03=/dev/etherd/e1.1
on hayes-01...
dicing /dev/etherd/e1.1 into 7...
re-reading disks on hayes-03...
re-reading disks on hayes-01...
re-reading disks on hayes-02...
vgscan on all nodes...
creating lvm devices...
create PV(s) out of all avail partitions on hayes-01
create VG(s) on hayes-01
pv_shuffle
creating linear on hayes-01
creating stripe on hayes-01
Iteration 1 of 2 started on hayes-03 at Mon Nov 17 15:10:20 CST 2008
FREE PVS: /dev/etherd/e1.1p1 /dev/etherd/e1.1p2
INUSE PVS: /dev/etherd/e1.1p4 /dev/etherd/e1.1p5 /dev/etherd/e1.1p6 /dev/etherd/e1.1p7
NOT INUSE PVS: /dev/etherd/e1.1p3
Adding /dev/etherd/e1.1p2 to the volume group on hayes-03
Moving data from /dev/etherd/e1.1p5 to /dev/etherd/e1.1p2 on hayes-03
Finding volume group "pv_shuffle"
Executing: /sbin/modprobe dm-log-clustered
Archiving volume group "pv_shuffle" metadata (seqno 4).
Creating logical volume pvmove0
Moving 427 extents of logical volume pv_shuffle/stripe
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 5).
Checking progress every 15 seconds
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 7).
Removing /dev/etherd/e1.1p3 from the volume group on hayes-03
Iteration 2 of 2 started on hayes-03 at Mon Nov 17 15:41:40 CST 2008
FREE PVS: /dev/etherd/e1.1p1 /dev/etherd/e1.1p3
INUSE PVS: /dev/etherd/e1.1p2 /dev/etherd/e1.1p4 /dev/etherd/e1.1p6 /dev/etherd/e1.1p7
NOT INUSE PVS: /dev/etherd/e1.1p5
Adding /dev/etherd/e1.1p3 to the volume group on hayes-03
Moving data from /dev/etherd/e1.1p2 to /dev/etherd/e1.1p3 on hayes-03
Finding volume group "pv_shuffle"
Executing: /sbin/modprobe dm-log-clustered
Archiving volume group "pv_shuffle" metadata (seqno 9).
Creating logical volume pvmove0
Moving 427 extents of logical volume pv_shuffle/stripe
Updating volume group metadata
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 10).
Checking progress every 15 seconds
Removing temporary pvmove LV
Writing out final volume group after pvmove
Creating volume group backup "/etc/lvm/backup/pv_shuffle" (seqno 12).
Removing /dev/etherd/e1.1p5 from the volume group on hayes-03

real    63m14.555s
user    0m0.940s
sys     0m0.715s

FWIW, I retested this with the fix for bz 464550 that Jon built for me, and that time was:

real    42m25.138s
user    0m0.923s
sys     0m0.741s

I stole Corey's 'hayes' machines (the only ones I can reproduce this on) and did some profiling. I used a SystemTap script to print out every request that goes to the userspace server. At the end of each 'Exiting' line you can see the time it takes to process the request; I'm also printing the time between calls. You can easily see that the majority of the time is spent doing the work, not making cmirror requests. Resync work is handed out in 4MB chunks. The time between getting the work and marking it finished is roughly the largest number shown, 404227us (time spent between requests). That works out to less than 10MB/s, which means longer than 160 minutes to complete.
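The throughput estimate can be sanity-checked with a little arithmetic. This is just a sketch using the two numbers quoted above (the 4MB resync chunk size and the 404227us gap between requests):

```python
# Sanity-check the "<10MB/s" estimate: resync work is handed out in 4MB
# chunks, and the largest gap between log-server requests (the time spent
# actually copying one chunk) was 404227us.
chunk_bytes = 4 * 1024 * 1024          # one 4MB resync chunk
gap_seconds = 404227 / 1_000_000       # largest "time between calls"

mb_per_s = chunk_bytes / gap_seconds / (1024 * 1024)
print(f"~{mb_per_s:.1f} MB/s")         # ~9.9 MB/s, i.e. just under 10MB/s
```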
Entering: dm_clog_consult_server(DM_CLOG_GET_RESYNC_WORK) [kmirrord/20217]
Exiting : dm_clog_consult_server() [kmirrord/20217] 475us
Time between calls: 3us
Entering: dm_clog_consult_server(DM_CLOG_IN_SYNC) [kmirrord/20217]
Exiting : dm_clog_consult_server() [kmirrord/20217] 17us
Time between calls: 9us
Entering: dm_clog_consult_server(DM_CLOG_GET_SYNC_COUNT) [kmirrord/20217]
Exiting : dm_clog_consult_server() [kmirrord/20217] 61us
Time between calls: 404227us
Entering: dm_clog_consult_server(DM_CLOG_SET_REGION_SYNC) [kmirrord/20217]
Exiting : dm_clog_consult_server() [kmirrord/20217] 695us
Time between calls: 3us
Entering: dm_clog_consult_server(DM_CLOG_CLEAR_REGION) [kmirrord/20217]
Exiting : dm_clog_consult_server() [kmirrord/20217] 18us
Time between calls: 1us
Entering: dm_clog_consult_server(DM_CLOG_FLUSH) [kmirrord/20217]
Exiting : dm_clog_consult_server() [kmirrord/20217] 702us

It is with great pride that I close this NOTABUG. The problem was network hardware related. I'll let Corey post the confirming data from his machines if he wants to.

Removing the release note request on this one based on comment #12.

Just a follow-up to comment #11: the hardware issue referenced was the RX-ERRs sometimes seen on one or all of the nodes in the cluster during cmirror/pvmove operations.

Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0  4918652   1575      0      0  5307271      0      0      0 BMRU
lo    16436   0      121      0      0      0      121      0      0      0 LRU

Oddly, these errors are seen on machines in all three of my clusters (all on different subnets), and only when doing cmirror/openais-intensive work. I don't see the errors when not running these operations, and other clusters on the same networks do not see any errors while running other tests.
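One rough way to confirm that kind of correlation is to sample the RX error counters before and after a run and compare. The helper below is a hypothetical sketch (not part of the test suite used here) that parses /proc/net/dev-style text; the sample content is made up for illustration, not taken from the hayes nodes:

```python
# Hypothetical helper: pull per-interface RX error counts out of
# /proc/net/dev-style text, so counters can be compared before and after
# a cmirror/pvmove run to see whether they only grow under that load.
def parse_rx_errs(text):
    errs = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        iface, stats = line.split(":", 1)
        # /proc/net/dev receive columns: bytes packets errs drop fifo ...
        errs[iface.strip()] = int(stats.split()[2])
    return errs

sample = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 99999999 4918652 1575    0    0     0          0         0 88888888 5307271    0    0    0     0       0          0
    lo:   100000     121    0    0    0     0          0         0   100000     121    0    0    0     0       0          0
"""
print(parse_rx_errs(sample))   # {'eth0': 1575, 'lo': 0}
```

On a live node the text would come from reading /proc/net/dev before and after the pvmove, with the difference in the errs column showing whether the errors accumulate during the run.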