Bug 1941841
Field | Value
---|---
Summary | [Hyper-V][RHEL-7] Cannot boot kernel 3.10.0-1160.21.1.el7.x86_64 on Hyper-V
Product | Red Hat Enterprise Linux 7
Component | kernel
kernel sub component | Hyper-V
Status | CLOSED ERRATA
Severity | urgent
Priority | urgent
Version | 7.9
Keywords | Regression, ZStream
Target Milestone | rc
Hardware | x86_64
OS | Linux
Fixed In Version | kernel-3.10.0-1160.27.1.el7
Last Closed | 2021-06-08 22:31:47 UTC
Type | Bug
Bug Blocks | 1951491
Reporter | Akemi Yagi <toracat>
Assignee | Mohammed Gamal <mmorsy>
QA Contact | HuijingHei <hhei>
CC | acaringi, ajb, cavery, decui, hhei, huzhao, jbainbri, jen, jreznik, mmorsy, nmurray, online, ribarry, riehecky, vkuznets, xialiu, xuli, yacao, yuxisun

Description
Akemi Yagi
2021-03-22 21:44:24 UTC
Bumping this up in sev/prio and marking as Regression, as this potentially affects all RHEL 7 on Hyper-V, which is a large install base and not desirable to have broken.

Hi Akemi Yagi, thanks for taking the time to enter a bug report with us.

I am trying to reproduce the issue with 3.10.0-1160.21.1.el7.x86_64 in a VM on a Hyper-V host, but could not reproduce it on Windows Server 2019 (17763-10.0-1-0.1757) or Windows Server 2016 (14393-10.0-4-0.4169). Perhaps it is related to a specific host build. Could you help check the host build version on which you hit this problem?

To get the host build, you can run 'dmesg | grep "Hyper-V Host Build"' on the VM.

Thanks!

I don't see it with kernel-3.10.0-1160.21.1.el7.x86_64 on Azure.
Hyper-V Host build: 18362-10.0-3-0.3216

(In reply to HuijingHei from comment #8)
> I am trying to reproduce the issue with 3.10.0-1160.21.1.el7.x86_64 in VM on
> Hyper-V host, but failed to reproduce on Windows server 2019
> (17763-10.0-1-0.1757) or Windows server 2016 (14393-10.0-4-0.4169), perhaps
> it is related to specify host build, could you help to check the host build
> version on which you meet this problem?
>
> To get host build, you can run 'dmesg | grep "Hyper-V Host Build"' on the vm

Hi HuijingHei,

I will ask the original reporters of the issue (CentOS bug and RH discussion forum).

On my PC Dell Precision 5820, Windows 10 for Workstation, version 20H2, OS build 19042.870, this is the Hyper-V version:

[marco@redhat7 ~]$ dmesg | grep "Hyper-V Host Build"
[ 0.000000] Hyper-V Host Build:19041-10.0-0-0.870

Marco

Hi HuijingHei,

Here are the replies from the original reporters:

(1) https://bugs.centos.org/view.php?id=18117#c38322

Edition: Windows 10 Business
Version: 20H2
OS build: 19042.867
Experience: Windows Feature Experience Pack 120.2212.551.0

There is something more going on. Yesterday I spun up a fresh install of CentOS 7 on a new VM in the same host (with the intent of testing the plus kernel) and I was unable to reproduce the issue. It still reliably reproduces on the existing CentOS 7 installs, however. Perhaps it is because the existing instances have a number of kernels installed?

(2) https://access.redhat.com/discussions/5895461#comment-2062041

On my PC Dell Precision 5820, Windows 10 for Workstation, version 20H2, OS build 19042.870, this is the Hyper-V version:

[marco@redhat7 ~]$ dmesg | grep "Hyper-V Host Build"
[ 0.000000] Hyper-V Host Build:19041-10.0-0-0.870

I can't repro the issue with CentOS 7.9 (3.10.0-1160.21.1.el7.x86_64 #1 SMP Tue Mar 16 18:28:22 UTC 2021) on my local Win10 (Hyper-V Host Build:19041-10.0-0-0.867).

Can we ask the original bug reporters to share the /var/log/Xorg.0.log? They can boot up the VM using the old good kernel, and check /var/log/Xorg.0.log.old.

(In reply to Akemi Yagi from comment #14)
> Here are the replies from the original reporters:
>
> (1) https://bugs.centos.org/view.php?id=18117#c38322
>
> Edition: Windows 10 Business
> Version: 20H2
> OS build: 19042.867
> Experience: Windows Feature Experience Pack 120.2212.551.0
>
> There is something more going on, yesterday I spun up a fresh install of
> CentOS 7 on a new VM in the same host (with the intent of testing the plus
> kernel) and I was unable to reproduce the issue. It still reliably
> reproduces on the existing CentOS 7 installs however. Perhaps it is because
> the existing instances have a number of kernels installed?
>
> (2) https://access.redhat.com/discussions/5895461#comment-2062041
>
> On my PC Dell Precision 5820, Windows 10 for Workstation, version 20H2, OS
> build 19042.870, this is the Hyper-V version:
>
> [marco@redhat7 ~]$ dmesg | grep "Hyper-V Host Build"
> [ 0.000000] Hyper-V Host Build:19041-10.0-0-0.870

Thanks Akemi and Marco for your info!

I cannot reproduce it with 3.10.0-1160.21.1.el7.x86_64 on my local Hyper-V host (Build:19041-10.0-0-0.867) either, the same result as comment #15. Could you try Dexuan's suggestion (comment #16) and share the failed boot logs /var/log/Xorg.0.log.old and /var/log/dmesg.old? Thanks!

Created attachment 1766193 [details]
dmesg actual
Created attachment 1766194 [details]
dmesg old
Created attachment 1766195 [details]
Xorg log actual
Created attachment 1766196 [details]
Xorg log old
Hi Dexuan,

I have a Dell Precision 5820, Windows 10 Pro for Workstation, version 20H2, OS build 19042.870, Hyper-V Host Build:19041-10.0-0-0.870.

I started the Red Hat 7.9 VM using a backup vhdx file (Kernel: Linux 3.10.0-1160.15.2.el7.x86_64), applied the latest update (kernel.x86_64 0:3.10.0-1160.21.1.el7), and restarted the VM; it got stuck. I turned off the VM and started it with the kernel 3.10.0-1160.15.2.el7.x86_64, and it was OK. I saved the dmesg and Xorg files (current and old) and posted them as the attachments above.

Tell me if you need more information.

Marco

All 4 log files (comment #19~22) show that 3.10.0-1160.15.2.el7.x86_64 is used, and I can't find any error. I remember people mentioned that the bug only repros with 3.10.0-1160.21.1 (but somehow HuijingHei and I can't repro it).

@Marco: Can you please check if you have some log files in /var/log/ that were generated with 3.10.0-1160.21.1?

Can you also enable the virtual serial console port for the VM and pass the "console=ttyS0" kernel parameter? This way you can check whether there is any error on the serial console when the issue repros. You need a tool, e.g. Putty, to get the log messages from the serial console.

FYI: How to "Use Putty to connect Hyper-V Linux VM by serial console": https://capsl0cker.github.io/memo.html

The putty tool is here: https://www.chiark.greenend.org.uk/~sgtatham/putty/ (we need to right-click the program, then "run it as Administrator", to open the VM's virtual serial console)

Created attachment 1766212 [details]
addit.log
Created attachment 1766214 [details]
grubby
Created attachment 1766215 [details]
yum.log
Created attachment 1766216 [details]
messages
Hi Dexuan,

I did this:

grep -rlw "3.10.0-1160.21.1" /var/log
/var/log/audit/audit.log
/var/log/grubby
/var/log/yum.log
/var/log/messages

I attached the 4 files.

Marco

Thanks for the new logs, but unfortunately these logs still show that the old kernel was running. I suspect the new kernel (3.10.0-1160.21.1) hung or panicked, so the VM was unable to save the messages in the /var/log/ folder. In this case, we need to check the log messages from the virtual serial console. Please refer to comment 24 and 25 to get the kernel messages when the issue repros. When you add the "console=ttyS0" kernel parameter, please also replace "rhgb quiet" with "ignore_loglevel" to get more debug messages.

Created attachment 1766292 [details]
Putty log
Hi Dexuan,

I modified the /etc/default/grub file with the 3.10.0-1160.15.2.el7.x86_64 kernel:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0"

I removed the "video=hyperv_fb:1600x1400" in the kernel config. Then:

grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl reboot

Started the 3.10.0-1160.15.2.el7.x86_64 kernel.

On Windows:

PS C:\WINDOWS\system32> Set-VMComPort -VMName "Red Hat 7" -Path \\.\pipe\redhat7 -Number 1
PS C:\WINDOWS\system32> Get-VMComPort -VMName "Red Hat 7"

VMName    Name  Path
------    ----  ----
Red Hat 7 COM 1 \\.\pipe\redhat7
Red Hat 7 COM 2

Installed Putty x64 on Windows, and ran it as Administrator:

‘Connection type’ -> ‘Serial’
Serial line: \\.\pipe\redhat7

It opened the connection, and I could log in with my account.

Then I stopped the VM and started it with the 3.10.0-1160.21.1.el7.x86_64 kernel, and it was just fine. I could connect with Putty and I saw the boot sequence, with no errors. I attach the log.

hostnamectl
Static hostname: redhat7.local
Icon name: computer-vm
Chassis: vm
Machine ID: 7f58a54ed36b42e28ac0e213325cb5b2
Boot ID: afa01bfdde3c4e60aff058898055d6fa
Virtualization: microsoft
Operating System: Red Hat Enterprise Linux
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.9:GA:server
Kernel: Linux 3.10.0-1160.21.1.el7.x86_64
Architecture: x86-64

The display with vmconnect is just 1152x864, but it works. With XRDP I can get 3000X1500.

I hope this will help.

Marco

I think that the problem is the video=hyperv_fb:1600x1400 configuration. You probably couldn't reproduce it because you don't have this setting.

Marco

The video=hyperv_fb setting works with the 3.10.0-1160.15.2.el7.x86_64 kernel, but not with the 3.10.0-1160.21.1.el7.x86_64 kernel.

Marco

(In reply to Marco Gregorini from comment #35)
> The video=hyperv_fb setting works with 3.10.0-1160.15.2.el7.x86_64 kernel,
> but not with 3.10.0-1160.21.1.el7.x86_64 kernel.
> Marco

Hi Marco, can you please re-collect the Putty log, but with the 3.10.0-1160.21.1 kernel + video=hyperv_fb:1600x1400 + ignore_loglevel console=ttyS0? It looks like this is the only combination with which you're able to reproduce the issue.

In my test, this combination still works just fine.

Note: Actually "video=hyperv_fb:1600x1400" is ignored (so the default 1152x864 is used) because the required video memory size is 1600*1400*32/8 / (1024.0*1024) ~= 8.54 MB, which is bigger than the 8MB VRAM size supported by the hyperv_fb driver in RHEL 7.9.

I'll be attaching my screenshot FYI, which shows 2 lines:

[ 2.281338] hyperv_fb: Screen resolution option is out of range: skipped
[ 2.281340] hyperv_fb: Screen resolution: 1152x864, Color depth: 32

Created attachment 1766404 [details]
3.10.0-1160.21.1 kernel + video=hyperv_fb:1600x1400 + ignore_loglevel console=ttyS0
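
The size check described in the note above is plain arithmetic and can be reproduced in a few lines. The sketch below is only an illustration: the names `framebuffer_mib`, `fits_in_vram`, and `VRAM_LIMIT_MIB` are made up here, and the 8 MB limit is taken from the comment above, not read from the driver source.

```python
# Illustrative only: these names are hypothetical, not taken from hyperv_fb.
VRAM_LIMIT_MIB = 8  # VRAM size quoted above for hyperv_fb in RHEL 7.9


def framebuffer_mib(width, height, depth_bits=32):
    """Return the size in MiB of one full screen at the given colour depth."""
    return width * height * depth_bits / 8 / (1024 * 1024)


def fits_in_vram(width, height, depth_bits=32, limit_mib=VRAM_LIMIT_MIB):
    """True if a width x height screen fits in the assumed VRAM limit."""
    return framebuffer_mib(width, height, depth_bits) <= limit_mib


if __name__ == "__main__":
    for w, h in [(1152, 864), (1600, 1400)]:
        size = framebuffer_mib(w, h)
        print(f"{w}x{h}: {size:.2f} MiB, fits={fits_in_vram(w, h)}")
```

Under these assumptions 1600x1400 needs about 8.54 MiB and is rejected, which is consistent with the "out of range: skipped" message and the fallback to 1152x864.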
Thanks Dexuan and Marco! 3.10.0-1160.21.1.el7.x86_64 with 'video=hyperv_fb:1600x1400' also works on my VM, and it actually uses 1152x864 instead.

Hi Dexuan,

I was wrong: I didn't set video=hyperv_fb to 1600X1400, it was 1920X1080 (HD). I set it by editing the kernel boot menu entry (see the attached png file), then I removed it.

I changed GRUB_CMDLINE_LINUX:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0 video=hyperv_fb:1920X1080"
grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl reboot

I started the 3.10.0-1160.15.2.el7.x86_64 kernel and the VM was fine, with 1920X1080 resolution. But starting the 3.10.0-1160.21.1.el7.x86_64 kernel did not work. The VM was stuck with the red signs on the top. I was able to record the output with putty, see the attached text file ("3.10.0-1160.21.1.el7.x86_64 failed.txt").

I changed GRUB_CMDLINE_LINUX again:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0 video=hyperv_fb:1600x1400"
grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl reboot

Both kernels start the VM with 1152x864 resolution (see the attached file with the boot of the 3.10.0-1160.21.1.el7.x86_64 kernel, "3.10.0-1160.21.1.el7.x86_64 ok.txt").

I hope this will help.

Marco

Created attachment 1766555 [details]
hyperv_fb:1920X1080
Created attachment 1766556 [details]
3.10.0-1160.21.1.el7.x86_64 failed
Created attachment 1766557 [details]
3.10.0-1160.21.1.el7.x86_64 ok
And (1920*1080*32)/8/(1024*1024) gives 7.91015625, so it should work with the 3.10.0-1160.21.1.el7.x86_64 kernel as it works with the 3.10.0-1160.15.2.el7.x86_64 kernel. So there is probably some bug in the 3.10.0-1160.21.1.el7.x86_64 kernel, I think just Hyper-V related.

Marco

Thanks Marco! Testing with 'video=hyperv_fb:1920x1080' and 3.10.0-1160.21.1.el7.x86_64, the VM fails to start at 'hyperv_fb: Screen resolution: 1920x1080, Color depth: 32'. The issue is not related to the host version.

Thanks, Marco! Now I'm able to repro the issue with 3.10.0-1160.15.2.el7.x86_64 + video=hyperv_fb:1920X1080. The key is to use "video=hyperv_fb:1920X1080".

I got the below panic from the host's Event Viewer: Applications and Services Logs -> Microsoft -> Windows -> Hyper-V-worker -> Admin:

'decui-co79' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 82303BB4-3A05-42E3-8C1A-EE20A798F9E1)

Guest message:
<4>[ 1.864470] RBP: ffff89b07546b618 R08: ffffffff99468420 R09: ffffb84000cc3000
<4>[ 1.864470] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff89b12eb96600
<4>[ 1.864471] R13: ffff89b12eb965a0 R14: ffffb84000cc3004 R15: ffffb84000cc2400
<4>[ 1.864472] FS: 00007fbd49ca88c0(0000) GS:ffff89b147c00000(0000) knlGS:0000000000000000
<4>[ 1.864472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 1.864473] CR2: ffffb84000cc3000 CR3: 0000000035494000 CR4: 00000000003606f0
<4>[ 1.864475] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 1.864476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[ 1.864476] Call Trace:
<4>[ 1.864480] [<ffffffff98ffb7ce>] ? fb_set_var+0x20e/0x440
<4>[ 1.864483] [<ffffffffc0344a04>] hvfb_cfb_imageblit+0x24/0x90 [hyperv_fb]
<4>[ 1.864485] [<ffffffff9900b95d>] bit_putcs+0x31d/0x5a0
<4>[ 1.864486] [<ffffffffc034426d>] ? hvfb_ondemand_refresh_throttle+0xcd/0xe0 [hyperv_fb]
<4>[ 1.864488] [<ffffffff99005e09>] ? fbcon_clear_margins+0x69/0x90
<4>[ 1.864489] [<ffffffff99006d5b>] fbcon_putcs+0x12b/0x160
<4>[ 1.864490] [<ffffffff9900b640>] ? bit_cursor+0x6a0/0x6a0
<4>[ 1.864492] [<ffffffff9907dc44>] do_update_region+0x114/0x1a0
<4>[ 1.864494] [<ffffffff9908071e>] redraw_screen+0x1fe/0x270
<4>[ 1.864495] [<ffffffff99080c7a>] vc_do_resize+0x4ea/0x520
<4>[ 1.864496] [<ffffffff99080ccc>] vc_resize+0x1c/0x20
<4>[ 1.864498] [<ffffffff99009a0d>] fbcon_init+0x36d/0x580
<4>[ 1.864499] [<ffffffff9907e560>] visual_init+0xd0/0x130
<4>[ 1.864500] [<ffffffff99081099>] do_bind_con_driver+0x169/0x340
<4>[ 1.864501] [<ffffffff990817a9>] do_take_over_console+0x49/0x60
<4>[ 1.864502] [<ffffffff99004c53>] do_fbcon_takeover+0x63/0xd0
<4>[ 1.864503] [<ffffffff9900a73d>] fbcon_event_notify+0x61d/0x730
<4>[ 1.864506] [<ffffffff99390b6f>] notifier_call_chain+0x4f/0x70
<4>[ 1.864508] [<ffffffff98ccc15d>] __blocking_notifier_call_chain+0x4d/0x70
<4>[ 1.864509] [<ffffffff98ccc196>] blocking_notifier_call_chain+0x16/0x20
<4>[ 1.864511] [<ffffffff98ffafcb>] fb_notifier_call_chain+0x1b/0x20
<4>[ 1.864512] [<ffffffff98ffc276>] register_framebuffer+0x1f6/0x340
<4>[ 1.864526] [<ffffffffc03459e2>] hvfb_probe+0x512/0x803 [hyperv_fb]
<4>[ 1.864530] [<ffffffffc02a0b81>] vmbus_probe+0x41/0xa0 [hv_vmbus]
<4>[ 1.864531] [<ffffffff990bb6a5>] driver_probe_device+0xc5/0x3e0
<4>[ 1.864532] [<ffffffff990bbaa3>] __driver_attach+0x93/0xa0
<4>[ 1.864534] [<ffffffff990bba10>] ? __device_attach+0x50/0x50
<4>[ 1.864535] [<ffffffff990b9245>] bus_for_each_dev+0x75/0xc0
<4>[ 1.864536] [<ffffffff990bb01e>] driver_attach+0x1e/0x20
<4>[ 1.864537] [<ffffffff990baac0>] bus_add_driver+0x200/0x2d0
<4>[ 1.864538] [<ffffffff990bc134>] driver_register+0x64/0xf0
<4>[ 1.864540] [<ffffffffc02a0b36>] __vmbus_driver_register+0x76/0x80 [hv_vmbus]
<4>[ 1.864541] [<ffffffffc034a000>] ? 0xffffffffc0349fff
<4>[ 1.864543] [<ffffffffc034a021>] hvfb_drv_init+0x21/0x1000 [hyperv_fb]
<4>[ 1.864545] [<ffffffff98c0210a>] do_one_initcall+0xba/0x240
<4>[ 1.864547] [<ffffffff98d1e62a>] load_module+0x271a/0x2bb0
<4>[ 1.864549] [<ffffffff98fb4710>] ? ddebug_proc_write+0x100/0x100
<4>[ 1.864551] [<ffffffff98d1ebaf>] SyS_init_module+0xef/0x140
<4>[ 1.864553] [<ffffffff99395f92>] system_call_fastpath+0x25/0x2a
<4>[ 1.864565] Code: ec b9 08 00 00 00 89 5d d0 eb 30 0f 1f 44 00 00 41 0f be 04 24 29 f9 4d 8d 71 04 d3 f8 44 21 d0 41 8b 1c 80 44 21 db 89 d8 31 f0 <41> 89 01 85 c9 75 06 49 83 c4 01 b1 08 4d 89 f1 83 ea 01 83 fa
<1>[ 1.864567] RIP [<ffffffff99012443>] cfb_imageblit+0x4d3/0x510
<4>[ 1.864567] RSP <ffff89b07546b5a8>
<4>[ 1.864567] CR2: ffffb84000cc3000
<4>[ 1.864569] ---[ end trace f293fabe7364caa3 ]---
<0>[ 1.864570] Kernel panic - not syncing: Fatal exception
<0>[ 1.864602] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Sorry for the mistake in my last reply -- I meant to say the issue repros with 3.10.0-1160.21.1, not .15.2

Actually the issue does not repro with 3.10.0-1160.15.2.el7.x86_64, so something must have changed between .15.2 and .21.1.

Found the src code of the 2 kernels at https://vault.centos.org/7.9.2009/centosplus/Source/SPackages/

Trying to find the difference...

After I apply the patch 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host") to 3.10.0-1160.21.1 (we need to comment out the line "case VERSION_WIN10_V5:" in synthvid_connect_vsp(), because VERSION_WIN10_V5 is not supported in 3.10.0-1160.21.1), the issue is fixed. But I'm not sure how exactly the issue is caused when we don't have 67e7cdb4829d.

About the VRAM's cache type, please note:

For v5.5+, please use "video: hyperv_fb: Fix the cache type when mapping the VRAM" (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f1251a48c17b54939d7477305e39679a565382c)

For v5.4 and older, please use the 2 patches (the first is a simple git-cherry-pick of the mainline patch, but unfortunately it breaks Xorg, so I made the second patch to un-break it):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=db49200b1dad3949fef14d0cf2aa426d879a7f16
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=9e60056b1f532520dae5333c24e2e2b944c929b7

To make the discussion easy, let me list all the patches involved here:

Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host")
Patch B: d21987d709e8 ("video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver")
Patch C: 5f1251a48c17 ("video: hyperv_fb: Fix the cache type when mapping the VRAM")
Patch D1: db49200b1dad ("video: hyperv_fb: Fix the cache type when mapping the VRAM") Note: Actually, D1 = C.
Patch D2: 9e60056b1f53 ("video: hyperv_fb: Fix the mmap() regression for v5.4.y and older")

3.10.0-1160.21.1 takes patch B and C, and the issue happens. If it also takes A (with the line "case VERSION_WIN10_V5:" commented out), then the issue is fixed.

Note: for RHEL 8.x, can Red Hat also please make sure the kernel picks patch A + B + C together? If you pick up C without B, then Xorg is broken; if you don't want B, then please pick up D1 + D2 rather than C. If you pick up B without A, then I suspect RHEL 8 may have the same bug here.

I found out why 3.10.0-1160.21.1 fails in the case of hyperv_fb:1920x1080: because of patch B, info->screen_base points to a shadow VRAM buffer whose size is "screen_width * screen_height * screen_depth / 8" bytes:

static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
{
	...
	dio_fb_size = screen_width * screen_height * screen_depth / 8;
	...
	/* Allocate memory for deferred IO */
	par->dio_vp = vzalloc(round_up(dio_fb_size, PAGE_SIZE));
	...
	info->fix.smem_len = dio_fb_size;
	info->screen_base = par->dio_vp;
	info->screen_size = dio_fb_size;

Note: at this time, screen_width and screen_height are still the initial values 1152 and 864, respectively.

In hvfb_probe(), after hvfb_getmem() is called, hvfb_get_option() is called to update screen_width to 1920 and screen_height to 1080, and next the updated values are used:

	info->var.xres_virtual = info->var.xres = screen_width;
	info->var.yres_virtual = info->var.yres = screen_height;

Later, when the kernel framebuffer subsystem tries to access a pixel outside of the range 1152x864, the kernel tries to access a memory location outside of the buffer info->screen_base, and this causes a page fault:

<1>[ 2.293602] BUG: unable to handle kernel paging request at ffffab6440dd9000
<1>[ 2.293623] IP: [<ffffffffacc1e543>] cfb_imageblit+0x4d3/0x510
<4>[ 2.293634] PGD 103170067 PUD 103171067 PMD 346b2067 PTE 0
<4>[ 2.293639] Oops: 0002 [#1] SMP
<4>[ 2.293677] Modules linked in: ata_piix(+) crct10dif_pclmul crct10dif_common hyperv_fb(+) crc32c_intel libata serio_raw hv_vmbus floppy dm_mirror dm_region_hash dm_log dm_mod fuse
<4>[ 2.293705] CPU: 0 PID: 269 Comm: systemd-udevd Not tainted 3.10.0 #1
<4>[ 2.293708] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
<4>[ 2.293710] task: ffff9b8ff4e9e2a0 ti: ffff9b8ff446c000 task.ti: ffff9b8ff446c000
<4>[ 2.293717] RIP: 0010:[<ffffffffacc1e543>] [<ffffffffacc1e543>] cfb_imageblit+0x4d3/0x510
<4>[ 2.293719] RSP: 0018:ffff9b8ff446f5a8 EFLAGS: 00010286
<4>[ 2.293720] RAX: 00000000ff000000 RBX: 0000000000000000 RCX: 0000000000000007
<4>[ 2.293721] RDX: 000000000000047f RSI: 00000000ff000000 RDI: 0000000000000001
<4>[ 2.293721] RBP: ffff9b8ff446f618 R08: ffffffffad069840 R09: ffffab6440dd9000
<4>[ 2.293722] R10: 0000000000000001 R11: 0000000000aaaaaa R12: fff

To fix the bug, we can pick up patch A (I prefer this), or just apply the below change:

--- drivers/video/hyperv_fb.c.old 2021-03-26 19:09:14.694996600 -0700
+++ drivers/video/hyperv_fb.c 2021-03-26 19:09:37.602996600 -0700
@@ -956,13 +956,14 @@
 		goto error1;
 	}

+	hvfb_get_option(info);
+
 	ret = hvfb_getmem(hdev, info);
 	if (ret) {
 		pr_err("No memory for framebuffer\n");
 		goto error2;
 	}

-	hvfb_get_option(info);
 	pr_info("Screen resolution: %dx%d, Color depth: %d\n",
 		screen_width, screen_height, screen_depth);

Hi Dexuan,

thanks for your help, though I cannot find a way to apply the patch 67e7cdb4829d. I downloaded the source code of "video: hyperv_fb: Fix the cache type when mapping the VRAM", and I will try to build it.

Regarding Red Hat 8, I have "Red Hat Enterprise Linux release 8.3 (Ootpa)". I set these kernelopts:

grub2-editenv - set "kernelopts=root=/dev/mapper/rhel_redhat8-root ro resume=/dev/mapper/rhel_redhat8-swap rd.lvm.lv=rhel_redhat8/root rd.lvm.lv=rhel_redhat8/swap ignore_level console=ttyS0 video=hyperv_fb:1920x1080"

On restart I can access the console with Putty, and vmconnect shows 1920x1080 resolution, with no error. I attach the console output.

Thanks again,
Marco

Created attachment 1766893 [details]
Console redhat 8
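
The ordering problem analysed above can be illustrated outside the kernel. The following is a toy model, not driver code: `ShadowFb`, `getmem`, `get_option`, and `probe` are hypothetical stand-ins for the probe steps, and a Python `bytearray` stands in for the vzalloc'ed shadow buffer. It only shows that sizing the buffer before the resolution option is parsed leaves it too small for 1920x1080, while swapping the two steps (as in the small diff above) does not.

```python
# Toy model of the probe ordering; all names here are hypothetical.
BYTES_PER_PIXEL = 4  # 32-bit colour depth


class ShadowFb:
    def __init__(self):
        # Initial values mentioned in the analysis above.
        self.width, self.height = 1152, 864
        self.buf = None

    def getmem(self):
        # Size the shadow buffer from the *current* width and height.
        self.buf = bytearray(self.width * self.height * BYTES_PER_PIXEL)

    def get_option(self, width, height):
        # Apply the video=hyperv_fb:WxH option.
        self.width, self.height = width, height

    def last_pixel_in_bounds(self):
        # Does the last visible pixel still lie inside the buffer?
        offset = (self.width * self.height - 1) * BYTES_PER_PIXEL
        return offset + BYTES_PER_PIXEL <= len(self.buf)


def probe(option, option_first):
    """Run the two probe steps in the given order and report buffer safety."""
    fb = ShadowFb()
    if option_first:
        fb.get_option(*option)
        fb.getmem()
    else:
        fb.getmem()
        fb.get_option(*option)
    return fb.last_pixel_in_bounds()


if __name__ == "__main__":
    print("getmem first:", probe((1920, 1080), option_first=False))
    print("option first:", probe((1920, 1080), option_first=True))
```

In this model the buffer holds 1152 * 864 * 4 = 3981312 bytes, while a 1920x1080 screen needs 8294400 bytes, so any write past the first 3981312 bytes falls outside the allocation.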
Glad to know that RHEL 8 doesn't have the bug.

For RHEL 7.9, I suppose Red Hat will integrate the fix soon, e.g. Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host").

(In reply to Dexuan Cui from comment #56)
> For RHEL 7.9, I suppose RedHat will integrate the fix soon, e.g. Patch A:
> 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from
> Hyper-V host").

How about patch D2? C (=D1) is in RHEL 7, so wouldn't it need D2 to fix it?

(In reply to Akemi Yagi from comment #57)
> (In reply to Dexuan Cui from comment #56)
>
> > For RHEL 7.9, I suppose RedHat will integrate the fix soon, e.g. Patch A:
> > 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from
> > Hyper-V host").
>
> How about patch D2? C (=D1) is in RHEL 7, so wouldn't it need D2 to fix it?

If a kernel has patch B, then it does not need D2 (please refer to the changelog of patch D2 for the details).

If a kernel has B, then it should have patch A as well, otherwise this bug happens.

If a kernel has neither A nor B, then it can have patch D1 and D2 to fix the VRAM performance issue.

I hope Red Hat engineers will let us know when it will be safe to download the patched kernel.

Thank you all,
Marco

(In reply to Dexuan Cui from comment #58)
> If a kernel has patch B, then it does not need D2 (please refer to the
> changelog of patch D2 for the details).
>
> If a kernel have B, then it should have patch A as well, otherwise this bug
> happens.
>
> If a kernel neither has A nor B, then it can have patch D1 and D2 to fix the
> VRAM performance issue.

The EL7 kernel 3.10.0-1160.21.1.el7 has the following patches applied:

- [video] hyperv_fb: Fix the cache type when mapping the VRAM (Mohammed Gamal) [1908896]
- [video] hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver (Mohammed Gamal) [1908896]

That is, the kernel has "Patch B" and "Patch C".
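
As a reading aid, the combination rules stated in this thread can be written down as a small check. This is a sketch only: `check_patches` and its messages are hypothetical, the labels A, B, C, D1, D2 are the ones introduced earlier in the thread, and the rules encoded are just the ones quoted there (B needs A; C without B needs D2; D1 is the same change as C).

```python
# Sketch of the patch-combination rules stated in this thread.
# The function name and the messages are hypothetical.
def check_patches(patches):
    """Return a list of problems for a set of patch labels (A, B, C, D1, D2)."""
    applied = set(patches)
    if "D1" in applied:
        # The thread notes that D1 is the same change as C.
        applied.add("C")

    problems = []
    if "B" in applied and "A" not in applied:
        problems.append("B without A: a video=hyperv_fb option can overrun the shadow buffer")
    if "C" in applied and "B" not in applied and "D2" not in applied:
        problems.append("C without B or D2: Xorg breaks")
    return problems


if __name__ == "__main__":
    # The combination reported for 3.10.0-1160.21.1.el7.
    for problem in check_patches({"B", "C"}):
        print(problem)
```

With the set {"B", "C"} the check reports the missing Patch A; with {"A", "B", "C"} or {"D1", "D2"} it reports nothing.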
Therefore, I have added "Patch A" and rebuilt the kernel as a CentOSPlus kernel:

kernel-plus-3.10.0-1160.21.1.el7.centos.plus.bug18117.2.x86_64

It is available here:

https://people.centos.org/toracat/kernel/7/bugs/18117/

Anyone is welcome to give it a try. Please note that the packages are not signed and are provided for testing purposes only.

(In reply to Akemi Yagi from comment #60)
> (In reply to Dexuan Cui from comment #58)
>
> > If a kernel has patch B, then it does not need D2 (please refer to the
> > changelog of patch D2 for the details).
> >
> > If a kernel have B, then it should have patch A as well, otherwise this bug
> > happens.
> >
> > If a kernel neither has A nor B, then it can have patch D1 and D2 to fix the
> > VRAM performance issue.
>
> The EL7 kernel 3.10.0-1160.21.1.el7 has the following patches applied:
>
> - [video] hyperv_fb: Fix the cache type when mapping the VRAM (Mohammed Gamal) [1908896]
> - [video] hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver (Mohammed Gamal) [1908896]
>
> That is, the kernel has "Patch B" and "Patch C".
>
> Therefore, I have added "Patch A" and rebuilt the kernel as a CentOSPlus
> kernel:
>
> kernel-plus-3.10.0-1160.21.1.el7.centos.plus.bug18117.2.x86_64
>
> It is available here:
>
> https://people.centos.org/toracat/kernel/7/bugs/18117/

Thanks all for your effort!

Tested with 3.10.0-1160.21.1.el7.centos.plus.bug18117.2.x86_64: the VM can start with 'video=hyperv_fb:1920x1080'.

I have a question: if "Patch A" is added, does this mean the guest will support 'Set-VMVideo test1 -HorizontalResolution x -VerticalResolution x -ResolutionType Single' run on the host? Thanks!

Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680 -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get 'Screen resolution: 7680x4320' (In reply to HuijingHei from comment #61) > I have a question, if add "Patch A", does this mean it will support on host > 'Set-VMVideo test1 -HorizontalResolution x -VerticalResolution x > -ResolutionType Single'? Thanks! Yes, with Patch A, Set-VMVideo is supposed to work, but for Gen-1 VM, the available VRAM size is only about 64MB (IMO this is a bug which even exists in the mainline. I just let the patch author know about this), so please make sure you don't set a too high resolution, e.g. I get the below error with 7680x4320: [ 2.468894] hv_vmbus: registering driver hyperv_fb [ 2.486380] hyperv_fb: Synthvid Version major 3, minor 5 [ 2.486555] hyperv_fb: Screen resolution: 1920x1080, Color depth: 32 [ 2.486561] hyperv_fb: Resource not available or (0x4000000 < 0x8000000) [ 2.486562] hyperv_fb: No memory for framebuffer [ 2.487018] hv_vmbus: probe failed for device 5620e0c7-8062-4dce-aeb7-520c7ef76171 (-12) [ 2.487023] hyperv_fb: probe of 5620e0c7-8062-4dce-aeb7-520c7ef76171 failed with error -12 This is because (7680 * 4320 * 32/8) / (1024.0*1024) = 126.5625, which > 64. I also tried 3840x4320, which worked fine. :-) IMO Gen-2 doesn't have the 64MB VRAM size limit. > Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680 > -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get > 'Screen resolution: 7680x4320' This is expected. (In reply to Dexuan Cui from comment #62) > (In reply to HuijingHei from comment #61) > > I have a question, if add "Patch A", does this mean it will support on host > > 'Set-VMVideo test1 -HorizontalResolution x -VerticalResolution x > > -ResolutionType Single'? Thanks! 
> > Yes, with Patch A, Set-VMVideo is supposed to work, but for Gen-1 VM, the > available VRAM size is only about 64MB (IMO this is a bug which even exists > in the mainline. I just let the patch author know about this), so please > make sure you don't set a too high resolution, e.g. I get the below error > with 7680x4320: > > [ 2.468894] hv_vmbus: registering driver hyperv_fb > [ 2.486380] hyperv_fb: Synthvid Version major 3, minor 5 > [ 2.486555] hyperv_fb: Screen resolution: 1920x1080, Color depth: 32 > [ 2.486561] hyperv_fb: Resource not available or (0x4000000 < 0x8000000) > [ 2.486562] hyperv_fb: No memory for framebuffer > [ 2.487018] hv_vmbus: probe failed for device > 5620e0c7-8062-4dce-aeb7-520c7ef76171 (-12) > [ 2.487023] hyperv_fb: probe of 5620e0c7-8062-4dce-aeb7-520c7ef76171 > failed with error -12 > > This is because (7680 * 4320 * 32/8) / (1024.0*1024) = 126.5625, which > 64. > > I also tried 3840x4320, which worked fine. :-) > > IMO Gen-2 doesn't have the 64MB VRAM size limit. > > > Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680 > > -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get > > 'Screen resolution: 7680x4320' > > This is expected. Thanks Dexuan for the confirmation! Thanks for your help, everyone. We plan to include the missing patch and push a fix out in an upcoming RHEL 7.9.z batch update. You can follow the current status in this BZ. Thanks Rick! Add rhel-7.9.z? flag to review (In reply to Dexuan Cui from comment #62) > Yes, with Patch A, Set-VMVideo is supposed to work, but for Gen-1 VM, the > available VRAM size is only about 64MB (IMO this is a bug which even exists > in the mainline. I just let the patch author know about this), so please > make sure you don't set a too high resolution, e.g. 
I get the below error > with 7680x4320: > > [ 2.468894] hv_vmbus: registering driver hyperv_fb > [ 2.486380] hyperv_fb: Synthvid Version major 3, minor 5 > [ 2.486555] hyperv_fb: Screen resolution: 1920x1080, Color depth: 32 > [ 2.486561] hyperv_fb: Resource not available or (0x4000000 < 0x8000000) > [ 2.486562] hyperv_fb: No memory for framebuffer > [ 2.487018] hv_vmbus: probe failed for device > 5620e0c7-8062-4dce-aeb7-520c7ef76171 (-12) > [ 2.487023] hyperv_fb: probe of 5620e0c7-8062-4dce-aeb7-520c7ef76171 > failed with error -12 > > This is because (7680 * 4320 * 32/8) / (1024.0*1024) = 126.5625, which > 64. > > I also tried 3840x4320, which worked fine. :-) > > IMO Gen-2 doesn't have the 64MB VRAM size limit. > > > Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680 > > -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get > > 'Screen resolution: 7680x4320' > > This is expected. Hi Dexuan, If I set video=hyperv_fb:3840x4320 in kernel parameter and restart guest, it does not make effect, but works using Set-VMVideo, is this by design? Thanks! # dmesg | grep hyperv_fb [ 0.000000] Kernel command line: .... video=hyperv_fb:3840x4320 [ 1.736969] hv_vmbus: registering driver hyperv_fb [ 1.739835] hyperv_fb: Synthvid Version major 3, minor 5 [ 1.741653] hyperv_fb: Screen resolution option is out of range: skipped [ 1.743175] hyperv_fb: Screen resolution: 1024x768, Color depth: 32 (In reply to HuijingHei from comment #72) > Hi Dexuan, > > If I set video=hyperv_fb:3840x4320 in kernel parameter and restart guest, it > does not make effect, but works using Set-VMVideo, is this by design? Thanks! > > # dmesg | grep hyperv_fb > [ 0.000000] Kernel command line: .... 
> video=hyperv_fb:3840x4320
> [ 1.736969] hv_vmbus: registering driver hyperv_fb
> [ 1.739835] hyperv_fb: Synthvid Version major 3, minor 5
> [ 1.741653] hyperv_fb: Screen resolution option is out of range: skipped
> [ 1.743175] hyperv_fb: Screen resolution: 1024x768, Color depth: 32

To make "video=hyperv_fb:3840x4320" work, we should make sure:
1. the kernel has "Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host")".
2. "Get-VMVideo -VMName your_vm_name" should report a resolution >= 3840x4320.

See https://docs.microsoft.com/en-us/powershell/module/hyper-v/set-vmvideo?view=windowsserver2019-ps:

-ResolutionType
Specifies the resolution type for the virtual machine display. The acceptable values for this parameter are:

Maximum. The input HorizontalResolution * VerticalResolution is the maximum supported resolution. All standard resolutions smaller than HorizontalResolution * VerticalResolution are also supported.
Single. The input HorizontalResolution * VerticalResolution is the only supported resolution.
Default. The supported resolutions are those in the list of standard resolutions. Input HorizontalResolution * VerticalResolution is ignored.

By default the max supported resolution for a VM should be 1920x1200. I guess this is why "video=hyperv_fb:3840x4320" didn't work for you(?)

For the -ResolutionType parameter, we don't have to specify the "Single" (which forces the VM to only use that resolution specified) -- we can also use "-ResolutionType Maximum" (which means we specify the max supported resolution and then we can use video=hyperv_fb:AxB to use a resolution that's <= the one we specify by Set-VMVideo).

(In reply to Dexuan Cui from comment #74)
>
> To make "video=hyperv_fb:3840x4320" work, we should make sure:
> 1. the kernel has "Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain
> screen resolution from Hyper-V host")".
> 2. "Get-VMVideo -VMName your_vm_name" should report a resolution >= 3840x4320.
> See
> https://docs.microsoft.com/en-us/powershell/module/hyper-v/set-vmvideo?view=windowsserver2019-ps:
>
> -ResolutionType
> Specifies the resolution type for the virtual machine display. The
> acceptable values for this parameter are:
>
> Maximum. The input HorizontalResolution * VerticalResolution is the maximum
> supported resolution. All standard resolutions smaller than
> HorizontalResolution * VerticalResolution are also supported.
> Single. The input HorizontalResolution * VerticalResolution is the only
> supported resolution.
> Default. The supported resolutions are those in the list of standard
> resolutions. Input HorizontalResolution * VerticalResolution is ignored.
>
> By default the max supported resolution for a VM should be 1920x1200. I
> guess this is why "video=hyperv_fb:3840x4320" didn't work for you(?)

Yes, on the host the VM's default resolution is 1920x1200. After changing the maximum resolution to 3840x4320 with "Set-VMVideo RHEL-8.4-GEN1-B -HorizontalResolution 3840 -VerticalResolution 4320 -ResolutionType Maximum", the VM can start with "video=hyperv_fb:3840x4320".

> For the -ResolutionType parameter, we don't have to specify the "Single"
> (which forces the VM to only use that resolution specified) -- we can also
> use "-ResolutionType Maximum" (which means we specify the max supported
> resolution and then we can use video=hyperv_fb:AxB to use a resolution
> that's <= the one we specify by Set-VMVideo).

Thanks Dexuan for your info!

Created bug 1948442 to track the gen1 VM issue with 7680x4320 in comment #62.

Hi HuijingHei, can you make the bug 1948442 public, or should it remain private?
Thanks
Marco

(In reply to Marco Gregorini from comment #77)
> can you make the bug 1948442 public, or should it remain
> private? Thanks Marco

Hi Marco, bug 1948442 is public now. Contact me if you need other help. Thanks!
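
For anyone following the arithmetic from comment #62 and the "-ResolutionType Maximum" discussion, here is a rough Python sketch of the two checks. It is only an illustration, not the hyperv_fb driver logic: the function names are made up for this example, the 64 MiB figure is taken from the 0x4000000 value in the dmesg output quoted above, and the per-dimension comparison against the host maximum is an assumed simplification of the "<= the one we specify by Set-VMVideo" rule.

```python
MIB = 1024 * 1024

# 0x4000000 bytes = 64 MiB, the available size shown in the quoted dmesg
# line "Resource not available or (0x4000000 < 0x8000000)".
GEN1_VRAM_BYTES = 0x4000000


def framebuffer_bytes(width, height, depth_bits=32):
    """Bytes needed for one frame: width * height * (depth / 8)."""
    return width * height * depth_bits // 8


def fits_in_vram(width, height, vram_bytes=GEN1_VRAM_BYTES, depth_bits=32):
    """True if a single frame fits in the given VRAM size."""
    return framebuffer_bytes(width, height, depth_bits) <= vram_bytes


def within_host_maximum(requested, host_max):
    """Assumed simplification: each requested dimension must not exceed
    the maximum reported by Get-VMVideo on the host."""
    return requested[0] <= host_max[0] and requested[1] <= host_max[1]


if __name__ == "__main__":
    print(framebuffer_bytes(7680, 4320) / MIB)   # 126.5625, more than 64
    print(framebuffer_bytes(3840, 4320) / MIB)   # 63.28125, fits in 64
    print(fits_in_vram(7680, 4320))              # False
    print(fits_in_vram(3840, 4320))              # True
    # Default host maximum of 1920x1200 vs. video=hyperv_fb:3840x4320
    print(within_host_maximum((3840, 4320), (1920, 1200)))  # False
    print(within_host_maximum((3840, 4320), (3840, 4320)))  # True
```

The numbers match what was observed in this BZ: 7680x4320 at 32 bpp needs about 126.6 MiB and fails on a Gen-1 VM, while 3840x4320 needs about 63.3 MiB and works once the host maximum is raised with Set-VMVideo.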
Thanks HuijingHei,
Marco

Verification passed with 3.10.0-1160.27.1.el7.x86_64; the VM works with video=hyperv_fb:1920x1080.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2314