Bug 1721395
| Field | Value |
|---|---|
| Summary | disk boot delay and high cpu |
| Product | Red Hat Enterprise Linux 8 |
| Component | device-mapper-multipath |
| Version | 8.4 |
| Status | CLOSED WONTFIX |
| Severity | unspecified |
| Priority | medium |
| Reporter | Kapetanakis Giannis <bilias> |
| Assignee | Ben Marzinski <bmarzins> |
| QA Contact | Lin Li <lilin> |
| CC | agk, bmarzins, bugs, heinzm, jbrassow, lilin, msnitzer, mtessun, prajnoha, tnisan, zkabelac |
| Target Milestone | rc |
| Target Release | 8.5 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| oVirt Team | Storage |
| Last Closed | 2021-03-15 07:36:50 UTC |
Created attachment 1581601 [details]
VM open fd
So the problem is not completely solved by changing the disk to virtio. See dmesg of the same VM:

1) with multipath enabled and iSCSI access to both ports of the storage
2) with access to only one iSCSI port, logged out from all other ports of the storage

Created attachment 1581802 [details]
dmesg with multipath
Created attachment 1581803 [details]
dmesg without multipath
Much faster boot time (and reboot).
Hi Martin, can you please assign this to the relevant team? It seems to be out of the RHV Storage team's scope.

Just for the record, I've already contacted Dell support about the storage and it doesn't seem to be a storage (device) problem.

Moving this to multipath for now, as logging out of one mpath target seems to fix the issue. The I/O timeouts also seem to point to that conclusion.

Are you still able to reproduce this? Do you have the log messages from when this occurs?

Yes, I still see this. It definitely happens after VM installation, upon first boot. It happens with virtio as well, but the delay is not as big. I'm using oVirt 4.3.10 and haven't moved to 4.4 yet. What kind of logs do you need?

It's likely too late for this to make it into RHEL 7. Do you know if this happens in RHEL 8?

Haven't tested this in RHEL8... Wild guess: is there a chance this is related to the kvm_intel preemption_timer? I have similar problems with virtio disks (not only IDE): too many events=POLLIN timeouts.

(In reply to Kapetanakis Giannis from comment #12)
> Haven't tested this in RHEL8...

It would be really helpful to know if this is still happening in RHEL 8. RHEL 7 is closed to all but the most serious errors, and this doesn't qualify.

> Wild guess:
> is there a chance this is related to kvm_intel preemption_timer?

Possibly. I have a hard time figuring out how the multipath device is even involved, since it works fine after boot, and from the multipath device's perspective IO is the same whether a guest is booting or not. It's possible that there is some different sort of access pattern to the device when a guest is booting, but I don't see why having only one iSCSI session would make that better.

What process are you stracing when you see all the poll event timeouts?

(In reply to Ben Marzinski from comment #13)
> What process are you stracing when you see all the poll event timeouts?

I'm stracing qemu-kvm on the hypervisor nodes. I also have similar delays when upgrading OpenBSD VMs: the upgrade reboots the VM with a minimal kernel and then extracts tar archives from and to the filesystem. It hangs for a while and then continues, all the time, with lots of events=POLLIN timeouts in strace.

Anyway, I will upgrade to RHEL 8 and see what the status is there. Thanks.

Moving this over to RHEL 8, to see if it can be reproduced there, since it is too late to get any fix into RHEL 7.

I'm evaluating the same thing on RHEL 8.3.2011. The situation seems the same.
A lot of:

14:36:29 ppoll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=13, events=POLLIN}, {fd=16, events=POLLIN}, {fd=18, events=POLLIN}, {fd=29, events=POLLIN}, {fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=36, events=POLLIN}], 12, {tv_sec=0, tv_nsec=996416}, NULL, 8) = 0 (Timeout) <0.001011>
14:36:29 ppoll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=13, events=POLLIN}, {fd=16, events=POLLIN}, {fd=18, events=POLLIN}, {fd=29, events=POLLIN}, {fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}, {fd=36, events=POLLIN}], 12, {tv_sec=0, tv_nsec=996427}, NULL, 8) = 0 (Timeout) <0.001011>

The setup is a physical box connected with iSCSI to the same storage, with LVM. The VMs are set up with libvirt (virt-manager) on KVM. The hangs occur while upgrading an OpenBSD VM; see https://bugzilla.redhat.com/show_bug.cgi?id=1721395#c14

I've also tried (again) logging out of one multipath session, and both the timeouts and the disk stalls stopped. So the situation is the same.

I believe you can replicate it easily:
1) install OpenBSD 6.7 (amd64)
2) boot
3) log in
4) run sysupgrade

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
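For reference, a ppoll trace like the one above can be captured and summarized along these lines. This is only a sketch: the strace invocation, log file name, and the sample log contents are illustrative, not taken from this report.

```shell
# Capture only ppoll calls from the running qemu-kvm process
# (illustrative; needs root and a running guest, so it is shown commented out):
#   strace -f -tt -e trace=ppoll -o qemu-ppoll.log -p "$(pidof qemu-kvm)"

# A tiny sample log in the same format, so the summary below is self-contained:
cat > qemu-ppoll.log <<'EOF'
14:36:29 ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=0, tv_nsec=996416}, NULL, 8) = 0 (Timeout) <0.001011>
14:36:29 ppoll([{fd=4, events=POLLIN}], 1, {tv_sec=0, tv_nsec=996427}, NULL, 8) = 0 (Timeout) <0.001011>
14:36:30 ppoll([{fd=4, events=POLLIN}], 1, NULL, NULL, 8) = 1 <0.000050>
EOF

# Count ppoll calls that expired with no ready descriptors:
grep -c '= 0 (Timeout)' qemu-ppoll.log
```

A high and steadily growing timeout count while the guest is stalled, compared with the count during normal operation, is the pattern described in this report.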
Created attachment 1581600 [details]
VM boot strace

Hi,

Not sure if this belongs here, but I'll try anyway since vdsm is the closest match. We've recently migrated from FC to iSCSI 10Gbps on an EMC Unity 350F storage array. The active controller on the EMC is connected to 2 switches, with a different VLAN per port. Nodes are connected to the two switches with bonding (active-backup). multipathd works on top of this.

We have problems booting VMs with IDE disks:
- boot stops after SeaBIOS at "Booting from Hard Disk"
- CPU of the VM process is at 100%
- Sometimes I've seen booting VMs with virtio disks delay 10-20 seconds when scanning LVM (after swap).

The setup is up to date:
kernel-3.10.0-957.21.2.el7.x86_64
vdsm-4.30.17-1.el7.x86_64
qemu-kvm-ev-2.12.0-18.el7_6.5.1.x86_64

The funny thing is that if I log out of one iSCSI (multipath) session, the boot continues:

360060160f1c04800f55b065c6c7cff11 dm-10 DGC ,VRAID
size=4.0T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 7:0:0:0  sdf 8:80  active ready running
| `- 8:0:0:0  sdg 8:96  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 9:0:0:0  sdh 8:112 active ready running
  `- 10:0:0:0 sdi 8:128 active ready running

Logging out from either sdf or sdg makes the boot continue.

In an strace of the process I see a lot of:
- EAGAIN (Resource temporarily unavailable)
- ppoll resumed, Timeout

After the VM is booted, if I re-login to all iSCSI sessions I don't see any problems. I don't see any errors on the network interfaces of the node or on the switches.
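With open-iscsi, the workaround described above (logging out of one of the two sessions on the active controller) would look roughly like the following. The target IQN and portal address are placeholders, not values from this report; they must be replaced with the session actually backing sdf or sdg.

```shell
# List active iSCSI sessions to find the one backing sdf or sdg:
iscsiadm -m session

# Log out of a single portal (placeholder IQN/portal, adjust to your setup):
iscsiadm -m node -T iqn.1992-04.com.emc:cx.example -p 192.0.2.10:3260 --logout

# Log back in once the guest has booted:
iscsiadm -m node -T iqn.1992-04.com.emc:cx.example -p 192.0.2.10:3260 --login
```

Note that per the reporter, re-logging in after boot causes no further problems, so this only masks the boot-time stall rather than fixing it.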