Bug 1941841

Summary: [Hyper-V][RHEL-7] Cannot boot kernel 3.10.0-1160.21.1.el7.x86_64 on Hyper-V
Product: Red Hat Enterprise Linux 7 Reporter: Akemi Yagi <toracat>
Component: kernel Assignee: Mohammed Gamal <mmorsy>
kernel sub component: Hyper-V QA Contact: HuijingHei <hhei>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: acaringi, ajb, cavery, decui, hhei, huzhao, jbainbri, jen, jreznik, mmorsy, nmurray, online, ribarry, riehecky, vkuznets, xialiu, xuli, yacao, yuxisun
Version: 7.9 Keywords: Regression, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-3.10.0-1160.27.1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-06-08 22:31:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1951491    
Attachments:
Description | Flags
dmesg actual | none
dmesg old | none
Xorg log actual | none
Xorg log old | none
addit.log | none
grubby | none
yum.log | none
messages | none
Putty log | none
3.10.0-1160.21.1 kernel + video=hyperv_fb:1600x1400 + ignore_loglevel console=ttyS0 | none
hyperv_fb:1920X1080 | none
3.10.0-1160.21.1.el7.x86_64 failed | none
3.10.0-1160.21.1.el7.x86_64 ok | none
Console redhat 8 | none

Description Akemi Yagi 2021-03-22 21:44:24 UTC
* Description of problem (copied from https://bugs.centos.org/view.php?id=18117):

After updating to kernel version 3.10.0-1160.21.1.el7.x86_64 on a Hyper-V (Windows 10) VM, the OS will not boot. There appear to be error messages output but the display is affected (squashed) and they are illegible. Selecting another kernel from the grub menu allows the OS to boot.

* The following patch seems to have caused the issue:

[Hyper-V][RHEL-7.9] video: hyperv_fb: Fix the cache type when mapping the VRAM (BZ#1908896)

kernel commit:
commit 5f1251a48c17b54939d7477305e39679a565382c
Author: Dexuan Cui <decui>
Date:   Tue Nov 17 16:03:05 2020 -0800

    video: hyperv_fb: Fix the cache type when mapping the VRAM

* The issue has been fixed by:

Stable kernel commit:
commit 452f087d2ff6decf298149e0bfd9fa5c212a636d
Author: Dexuan Cui <decui>
Date:   Sat Jan 9 14:53:58 2021 -0800

    video: hyperv_fb: Fix the mmap() regression for v5.4.y and older

Comment 3 Jamie Bainbridge 2021-03-22 23:05:00 UTC
Bumping this up in sev/prio and marking as Regression, as this potentially affects all RHEL 7 on Hyper-V, which is a large install base and not desirable to have broken.

Comment 8 HuijingHei 2021-03-24 03:51:39 UTC
Hi Akemi Yagi, thanks for taking the time to enter a bug report with us.

I tried to reproduce the issue with 3.10.0-1160.21.1.el7.x86_64 in a VM on a Hyper-V host, but could not reproduce it on Windows Server 2019 (17763-10.0-1-0.1757) or Windows Server 2016 (14393-10.0-4-0.4169). Perhaps it is related to a specific host build. Could you check the host build version on which you hit this problem?

To get the host build, you can run 'dmesg | grep "Hyper-V Host Build"' on the VM.

Thanks!
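
For illustration, a minimal Python sketch of pulling the host build out of saved dmesg output. The pattern is an assumption inferred only from the sample lines quoted in this thread, and `parse_host_build` is a hypothetical helper name, not an existing tool.

```python
import re

# Pattern inferred from the sample lines in this thread only, e.g.
# "Hyper-V Host Build:19041-10.0-0-0.870". It is an assumption, not a spec.
HOST_BUILD_RE = re.compile(
    r"Hyper-V Host Build:\s*(\d+)-(\d+)\.(\d+)-(\d+)-(\d+)\.(\d+)",
    re.IGNORECASE,
)


def parse_host_build(text):
    """Return the six numeric fields of the first host-build line, or None."""
    match = HOST_BUILD_RE.search(text)
    if match is None:
        return None
    return tuple(int(field) for field in match.groups())
```

With the Windows Server 2019 build mentioned above, `parse_host_build("Hyper-V Host Build:17763-10.0-1-0.1757")` yields `(17763, 10, 0, 1, 0, 1757)`, which makes builds from different reporters easy to compare.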

Comment 9 Yuxin Sun 2021-03-24 04:37:39 UTC
I don't see it with kernel-3.10.0-1160.21.1.el7.x86_64 on Azure.
Hyper-V Host build: 18362-10.0-3-0.3216

Comment 10 Akemi Yagi 2021-03-24 09:18:51 UTC
(In reply to HuijingHei from comment #8)

> I tried to reproduce the issue with 3.10.0-1160.21.1.el7.x86_64 in a VM on
> a Hyper-V host, but could not reproduce it on Windows Server 2019
> (17763-10.0-1-0.1757) or Windows Server 2016 (14393-10.0-4-0.4169). Perhaps
> it is related to a specific host build. Could you check the host build
> version on which you hit this problem?
> 
> To get the host build, you can run 'dmesg | grep "Hyper-V Host Build"' on
> the VM.

Hi HuijingHei,

I will ask the original reporters of the issue (CentOS bug and RH discussion forum).

Comment 11 Marco Gregorini 2021-03-24 10:21:58 UTC
On my PC, a Dell Precision 5820 running Windows 10 for Workstation, version 20H2, OS build 19042.870, this is the Hyper-V version:

[marco@redhat7 ~]$ dmesg | grep "Hyper-V Host Build"
[    0.000000] Hyper-V Host Build:19041-10.0-0-0.870

Marco

Comment 14 Akemi Yagi 2021-03-24 15:27:06 UTC
Hi HuijingHei,

Here are the replies from the original reporters:

(1) https://bugs.centos.org/view.php?id=18117#c38322

Edition: Windows 10 Business
Version: 20H2
OS build: 19042.867
Experience: Windows Feature Experience Pack 120.2212.551.0

There is something more going on, yesterday I spun up a fresh install of CentOS 7 on a new VM in the same host (with the intent of testing the plus kernel) and I was unable to reproduce the issue. It still reliably reproduces on the existing CentOS 7 installs however. Perhaps it is because the existing instances have a number of kernels installed?

(2) https://access.redhat.com/discussions/5895461#comment-2062041

On my PC, a Dell Precision 5820 running Windows 10 for Workstation, version 20H2, OS build 19042.870, this is the Hyper-V version:

[marco@redhat7 ~]$ dmesg | grep "Hyper-V Host Build"
[    0.000000] Hyper-V Host Build:19041-10.0-0-0.870

Comment 15 Dexuan Cui 2021-03-24 23:57:20 UTC
I can't repro the issue with CentOS 7.9 (3.10.0-1160.21.1.el7.x86_64 #1 SMP Tue Mar 16 18:28:22 UTC 2021) on my local Win10 (Hyper-V Host Build:19041-10.0-0-0.867).

Comment 16 Dexuan Cui 2021-03-25 00:12:47 UTC
Can we ask the original bug reporters to share /var/log/Xorg.0.log? They can boot the VM using the old, known-good kernel and check /var/log/Xorg.0.log.old.

Comment 18 HuijingHei 2021-03-25 07:03:17 UTC
(In reply to Akemi Yagi from comment #14) 
> Here are the replies from the original reporters:
> 
> (1) https://bugs.centos.org/view.php?id=18117#c38322
> 
> Edition: Windows 10 Business
> Version: 20H2
> OS build: 19042.867
> Experience: Windows Feature Experience Pack 120.2212.551.0
> 
> There is something more going on, yesterday I spun up a fresh install of
> CentOS 7 on a new VM in the same host (with the intent of testing the plus
> kernel) and I was unable to reproduce the issue. It still reliably
> reproduces on the existing CentOS 7 installs however. Perhaps it is because
> the existing instances have a number of kernels installed?
> 
> (2) https://access.redhat.com/discussions/5895461#comment-2062041
> 
> On my PC, a Dell Precision 5820 running Windows 10 for Workstation, version
> 20H2, OS build 19042.870, this is the Hyper-V version:
> 
> [marco@redhat7 ~]$ dmesg | grep "Hyper-V Host Build"
> [    0.000000] Hyper-V Host Build:19041-10.0-0-0.870

Thanks Akemi and Marco for your info!

I cannot reproduce it with 3.10.0-1160.21.1.el7.x86_64 on my local Hyper-V host (Build:19041-10.0-0-0.867), the same result as comment #15.
Could you try Dexuan's suggestion (comment #16) and share the failed boot logs /var/log/Xorg.0.log.old and /var/log/dmesg.old? Thanks!

Comment 19 Marco Gregorini 2021-03-25 07:40:17 UTC
Created attachment 1766193 [details]
dmesg actual

Comment 20 Marco Gregorini 2021-03-25 07:41:28 UTC
Created attachment 1766194 [details]
dmesg old

Comment 21 Marco Gregorini 2021-03-25 07:42:17 UTC
Created attachment 1766195 [details]
Xorg log actual

Comment 22 Marco Gregorini 2021-03-25 07:43:06 UTC
Created attachment 1766196 [details]
Xorg log old

Comment 23 Marco Gregorini 2021-03-25 07:55:53 UTC
Hi Dexuan, I have a Dell Precision 5820, Windows 10 Pro for Workstation, version 20H2, OS build 19042.870, Hyper-V Host Build:19041-10.0-0-0.870. I started the Red Hat 7.9 VM using a backup vhdx file (Kernel: Linux 3.10.0-1160.15.2.el7.x86_64), applied the latest update (kernel.x86_64 0:3.10.0-1160.21.1.el7), and restarted the VM; it got stuck. I turned off the VM and started it with the 3.10.0-1160.15.2.el7.x86_64 kernel, and it was OK. I saved the dmesg and Xorg files (current and old) and posted them as the attachments above.

Tell me if you need more information.

Marco

Comment 24 Dexuan Cui 2021-03-25 08:06:16 UTC
All the 4 log files (comment #19~22) show that 3.10.0-1160.15.2.el7.x86_64 is used and I can't find any error.

I remember people mentioned that the bug only repros with 3.10.0-1160.21.1 (but somehow HuijingHei and I can't repro it).

@Marco: Can you please check if you have some log files in /var/log/ that are generated with 3.10.0-1160.21.1?

Can you also enable the virtual serial console port for the VM and pass the "console=ttyS0" kernel parameter? This way you can check whether there is any error on the serial console when the issue repros. You need a tool, e.g. Putty, to get the log messages from the serial console.

Comment 25 Dexuan Cui 2021-03-25 08:20:27 UTC
FYI: 
How to "Use Putty to connect Hyper-V Linux VM by serial console": https://capsl0cker.github.io/memo.html

The putty tool is here: https://www.chiark.greenend.org.uk/~sgtatham/putty/ (we need to right-click the program, then "run it as Administrator", to open the VM's virtual serial console)

Comment 26 Marco Gregorini 2021-03-25 08:37:45 UTC
Created attachment 1766212 [details]
addit.log

Comment 27 Marco Gregorini 2021-03-25 08:38:13 UTC
Created attachment 1766214 [details]
grubby

Comment 28 Marco Gregorini 2021-03-25 08:38:43 UTC
Created attachment 1766215 [details]
yum.log

Comment 29 Marco Gregorini 2021-03-25 08:39:30 UTC
Created attachment 1766216 [details]
messages

Comment 30 Marco Gregorini 2021-03-25 08:42:54 UTC
Hi Dexuan, I did this:

grep -rlw  "3.10.0-1160.21.1" /var/log
/var/log/audit/audit.log
/var/log/grubby
/var/log/yum.log
/var/log/messages

I attached the 4 files.

Marco

Comment 31 Dexuan Cui 2021-03-25 09:23:27 UTC
Thanks for the new logs, but unfortunately these logs still show that the old kernel was running. I suspect the new kernel (3.10.0-1160.21.1) hung or panicked, so the VM was unable to save the messages in the /var/log/ folder.

In this case, we need to check the log messages from the virtual serial console. Please refer to comments 24 and 25 to get the kernel messages when the issue repros. When you add the "console=ttyS0" kernel parameter, please also replace "rhgb quiet" with "ignore_loglevel" to get more debug messages.
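
The command-line change requested here can be sketched in Python as follows. This is only an illustration of the string edit: `debug_cmdline` is a hypothetical helper, and the result still has to be put into the GRUB configuration by hand.

```python
def debug_cmdline(cmdline, console="ttyS0"):
    """Drop 'rhgb' and 'quiet', then add ignore_loglevel and console=<console>.

    Only the kernel command-line string is rewritten; nothing is written
    to disk, and parameters that are already present are not duplicated.
    """
    dropped = {"rhgb", "quiet"}
    args = [arg for arg in cmdline.split() if arg not in dropped]
    for extra in ("ignore_loglevel", "console={}".format(console)):
        if extra not in args:
            args.append(extra)
    return " ".join(args)
```

For example, `debug_cmdline("rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet")` returns `rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0`.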

Comment 32 Marco Gregorini 2021-03-25 11:56:54 UTC
Created attachment 1766292 [details]
Putty log

Comment 33 Marco Gregorini 2021-03-25 11:58:07 UTC
Hi Dexuan, I modified the /etc/default/grub file with the 3.10.0-1160.15.2.el7.x86_64 kernel:

GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0"

I removed the "video=hyperv_fb:1600x1400" in the kernel config.

Then: 
grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl reboot

Started the 3.10.0-1160.15.2.el7.x86_64 kernel.

On Windows:

PS C:\WINDOWS\system32> Set-VMComPort -VMName "Red Hat 7" -Path \\.\pipe\redhat7 -Number 1
PS C:\WINDOWS\system32> Get-VMComPort -VMName "Red Hat 7"

VMName    Name  Path
------    ----  ----
Red Hat 7 COM 1 \\.\pipe\redhat7
Red Hat 7 COM 2

Installed Putty x64 on Windows and ran it as Administrator:

‘Connection type’ -> ‘Serial’
Serial line: \\.\pipe\redhat7

It opened the connection, and I could login with my account.

Then I stopped the VM, and started it with the 3.10.0-1160.21.1.el7.x86_64 kernel and it was just fine. I could connect with Putty and I saw the boot sequence, with no errors, I attach the log.

hostnamectl
   Static hostname: redhat7.local
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 7f58a54ed36b42e28ac0e213325cb5b2
           Boot ID: afa01bfdde3c4e60aff058898055d6fa
    Virtualization: microsoft
  Operating System: Red Hat Enterprise Linux
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.9:GA:server
            Kernel: Linux 3.10.0-1160.21.1.el7.x86_64
      Architecture: x86-64

The display with vmconnect is just 1152x864, but it works. With XRDP I can get 3000X1500.

I hope this will help.

Marco

Comment 34 Marco Gregorini 2021-03-25 12:19:24 UTC
I think that the problem is the video=hyperv_fb:1600x1400 configuration.
You probably couldn't reproduce it because you don't have this setting.
Marco

Comment 35 Marco Gregorini 2021-03-25 12:35:15 UTC
The video=hyperv_fb setting works with 3.10.0-1160.15.2.el7.x86_64 kernel, but not with 3.10.0-1160.21.1.el7.x86_64 kernel.
Marco

Comment 36 Dexuan Cui 2021-03-25 19:36:30 UTC
(In reply to Marco Gregorini from comment #35)
> The video=hyperv_fb setting works with 3.10.0-1160.15.2.el7.x86_64 kernel,
> but not with 3.10.0-1160.21.1.el7.x86_64 kernel.
> Marco

Hi Marco, can you please re-collect the Putty log, but with the 3.10.0-1160.21.1 kernel + video=hyperv_fb:1600x1400 + ignore_loglevel console=ttyS0? Looks like this is the only combination with which you're able to reproduce the issue.

In my test, this combination still works just fine.

Note: Actually "video=hyperv_fb:1600x1400" is ignored (so the default 1152x864 is used) because the required video memory size is 1600*1400*32/8 / (1024.0*1024) ~= 8.54 MB, which is bigger than the 8MB VRAM size supported by the hyperv_fb driver in RHEL 7.9.

I'll be attaching my screenshot FYI, which shows 2 lines:
[    2.281338] hyperv_fb: Screen resolution option is out of range: skipped
[    2.281340] hyperv_fb: Screen resolution: 1152x864, Color depth: 32
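
The size check in the note above can be written out as a small Python sketch. The 8 MB limit is taken from this comment; the function names are hypothetical and the comparison only mirrors the arithmetic, not the driver's actual code.

```python
# 8 MB VRAM size quoted in this comment for the hyperv_fb driver in RHEL 7.9.
VRAM_LIMIT_BYTES = 8 * 1024 * 1024


def vram_required_bytes(width, height, depth_bits=32):
    """Bytes needed for one full frame at the given resolution and depth."""
    return width * height * depth_bits // 8


def fits_in_vram(width, height, depth_bits=32, limit=VRAM_LIMIT_BYTES):
    """True if a full frame fits within the assumed VRAM limit."""
    return vram_required_bytes(width, height, depth_bits) <= limit
```

Here `vram_required_bytes(1600, 1400)` is 8960000 bytes (about 8.54 MB), so `fits_in_vram(1600, 1400)` is False, while the default 1152x864 needs 3981312 bytes and fits.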

Comment 37 Dexuan Cui 2021-03-25 19:39:58 UTC
Created attachment 1766404 [details]
3.10.0-1160.21.1 kernel + video=hyperv_fb:1600x1400 + ignore_loglevel console=ttyS0

Comment 38 HuijingHei 2021-03-26 02:32:48 UTC
Thanks Dexuan and Marco!

3.10.0-1160.21.1.el7.x86_64 with 'video=hyperv_fb:1600x1400' also works on my VM, which actually uses 1152x864 instead.

Comment 39 Marco Gregorini 2021-03-26 09:49:52 UTC
Hi Dexuan, I was wrong: I didn't set video=hyperv_fb to 1600X1400; it was 1920X1080 (HD). I had set it by editing the kernel entry in the boot menu (see the attached png file), and then I removed it.

I changed GRUB_CMDLINE_LINUX:
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0 video=hyperv_fb:1920X1080"
grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl reboot

I started the 3.10.0-1160.15.2.el7.x86_64 kernel and the VM was fine, with 1920X1080 resolution.

But starting the 3.10.0-1160.21.1.el7.x86_64 kernel did not work. The VM was stuck with the red signs at the top. I was able to record the output with Putty; see the attached text file ("3.10.0-1160.21.1.el7.x86_64 failed.txt").

I changed again GRUB_CMDLINE_LINUX:
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap ignore_loglevel console=ttyS0 video=hyperv_fb:1600x1400"
grub2-mkconfig -o /boot/grub2/grub.cfg
systemctl reboot

Both kernels start the VM with 1152x864 resolution (see the attached file with the boot of 3.10.0-1160.21.1.el7.x86_64 kernel, "3.10.0-1160.21.1.el7.x86_64 ok.txt").

I hope this will help.
Marco

Comment 40 Marco Gregorini 2021-03-26 09:52:32 UTC
Created attachment 1766555 [details]
hyperv_fb:1920X1080

Comment 41 Marco Gregorini 2021-03-26 09:53:14 UTC
Created attachment 1766556 [details]
3.10.0-1160.21.1.el7.x86_64 failed

Comment 42 Marco Gregorini 2021-03-26 09:54:02 UTC
Created attachment 1766557 [details]
3.10.0-1160.21.1.el7.x86_64 ok

Comment 43 Marco Gregorini 2021-03-26 10:18:25 UTC
And (1920*1080*32)/8/(1024*1024) gives 7.91015625, so it should work with the 3.10.0-1160.21.1.el7.x86_64 kernel as it works with the 3.10.0-1160.15.2.el7.x86_64 kernel.

So there is probably a bug in the 3.10.0-1160.21.1.el7.x86_64 kernel, which I think is only Hyper-V related.

Marco

Comment 44 HuijingHei 2021-03-26 12:20:34 UTC
Thanks Marco!
Tested with 'video=hyperv_fb:1920x1080' and 3.10.0-1160.21.1.el7.x86_64: the VM fails to start at 'hyperv_fb: Screen resolution: 1920x1080, Color depth: 32'. The issue is not related to the host version.

Comment 48 Dexuan Cui 2021-03-26 19:33:43 UTC
Thanks, Marco! Now I'm able to repro the issue with 3.10.0-1160.15.2.el7.x86_64 + video=hyperv_fb:1920X1080.
The key is to use "video=hyperv_fb:1920X1080".

I got the below panic from the host's Event Viewer: Applications and Services Logs -> Microsoft -> Windows -> Hyper-V-worker -> Admin:

'decui-co79' has encountered a fatal error.  The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0.  If the problem persists, contact Product Support for the guest operating system.  (Virtual machine ID 82303BB4-3A05-42E3-8C1A-EE20A798F9E1)

Guest message:
<4>[    1.864470] RBP: ffff89b07546b618 R08: ffffffff99468420 R09: ffffb84000cc3000
<4>[    1.864470] R10: 0000000000000001 R11: 0000000000aaaaaa R12: ffff89b12eb96600
<4>[    1.864471] R13: ffff89b12eb965a0 R14: ffffb84000cc3004 R15: ffffb84000cc2400
<4>[    1.864472] FS:  00007fbd49ca88c0(0000) GS:ffff89b147c00000(0000) knlGS:0000000000000000
<4>[    1.864472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[    1.864473] CR2: ffffb84000cc3000 CR3: 0000000035494000 CR4: 00000000003606f0
<4>[    1.864475] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[    1.864476] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[    1.864476] Call Trace:
<4>[    1.864480]  [<ffffffff98ffb7ce>] ? fb_set_var+0x20e/0x440
<4>[    1.864483]  [<ffffffffc0344a04>] hvfb_cfb_imageblit+0x24/0x90 [hyperv_fb]
<4>[    1.864485]  [<ffffffff9900b95d>] bit_putcs+0x31d/0x5a0
<4>[    1.864486]  [<ffffffffc034426d>] ? hvfb_ondemand_refresh_throttle+0xcd/0xe0 [hyperv_fb]
<4>[    1.864488]  [<ffffffff99005e09>] ? fbcon_clear_margins+0x69/0x90
<4>[    1.864489]  [<ffffffff99006d5b>] fbcon_putcs+0x12b/0x160
<4>[    1.864490]  [<ffffffff9900b640>] ? bit_cursor+0x6a0/0x6a0
<4>[    1.864492]  [<ffffffff9907dc44>] do_update_region+0x114/0x1a0
<4>[    1.864494]  [<ffffffff9908071e>] redraw_screen+0x1fe/0x270
<4>[    1.864495]  [<ffffffff99080c7a>] vc_do_resize+0x4ea/0x520
<4>[    1.864496]  [<ffffffff99080ccc>] vc_resize+0x1c/0x20
<4>[    1.864498]  [<ffffffff99009a0d>] fbcon_init+0x36d/0x580
<4>[    1.864499]  [<ffffffff9907e560>] visual_init+0xd0/0x130
<4>[    1.864500]  [<ffffffff99081099>] do_bind_con_driver+0x169/0x340
<4>[    1.864501]  [<ffffffff990817a9>] do_take_over_console+0x49/0x60
<4>[    1.864502]  [<ffffffff99004c53>] do_fbcon_takeover+0x63/0xd0
<4>[    1.864503]  [<ffffffff9900a73d>] fbcon_event_notify+0x61d/0x730
<4>[    1.864506]  [<ffffffff99390b6f>] notifier_call_chain+0x4f/0x70
<4>[    1.864508]  [<ffffffff98ccc15d>] __blocking_notifier_call_chain+0x4d/0x70
<4>[    1.864509]  [<ffffffff98ccc196>] blocking_notifier_call_chain+0x16/0x20
<4>[    1.864511]  [<ffffffff98ffafcb>] fb_notifier_call_chain+0x1b/0x20
<4>[    1.864512]  [<ffffffff98ffc276>] register_framebuffer+0x1f6/0x340
<4>[    1.864526]  [<ffffffffc03459e2>] hvfb_probe+0x512/0x803 [hyperv_fb]
<4>[    1.864530]  [<ffffffffc02a0b81>] vmbus_probe+0x41/0xa0 [hv_vmbus]
<4>[    1.864531]  [<ffffffff990bb6a5>] driver_probe_device+0xc5/0x3e0
<4>[    1.864532]  [<ffffffff990bbaa3>] __driver_attach+0x93/0xa0
<4>[    1.864534]  [<ffffffff990bba10>] ? __device_attach+0x50/0x50
<4>[    1.864535]  [<ffffffff990b9245>] bus_for_each_dev+0x75/0xc0
<4>[    1.864536]  [<ffffffff990bb01e>] driver_attach+0x1e/0x20
<4>[    1.864537]  [<ffffffff990baac0>] bus_add_driver+0x200/0x2d0
<4>[    1.864538]  [<ffffffff990bc134>] driver_register+0x64/0xf0
<4>[    1.864540]  [<ffffffffc02a0b36>] __vmbus_driver_register+0x76/0x80 [hv_vmbus]
<4>[    1.864541]  [<ffffffffc034a000>] ? 0xffffffffc0349fff
<4>[    1.864543]  [<ffffffffc034a021>] hvfb_drv_init+0x21/0x1000 [hyperv_fb]
<4>[    1.864545]  [<ffffffff98c0210a>] do_one_initcall+0xba/0x240
<4>[    1.864547]  [<ffffffff98d1e62a>] load_module+0x271a/0x2bb0
<4>[    1.864549]  [<ffffffff98fb4710>] ? ddebug_proc_write+0x100/0x100
<4>[    1.864551]  [<ffffffff98d1ebaf>] SyS_init_module+0xef/0x140
<4>[    1.864553]  [<ffffffff99395f92>] system_call_fastpath+0x25/0x2a
<4>[    1.864565] Code: ec b9 08 00 00 00 89 5d d0 eb 30 0f 1f 44 00 00 41 0f be 04 24 29 f9 4d 8d 71 04 d3 f8 44 21 d0 41 8b 1c 80 44 21 db 89 d8 31 f0 <41> 89 01 85 c9 75 06 49 83 c4 01 b1 08 4d 89 f1 83 ea 01 83 fa 
<1>[    1.864567] RIP  [<ffffffff99012443>] cfb_imageblit+0x4d3/0x510
<4>[    1.864567]  RSP <ffff89b07546b5a8>
<4>[    1.864567] CR2: ffffb84000cc3000
<4>[    1.864569] ---[ end trace f293fabe7364caa3 ]---
<0>[    1.864570] Kernel panic - not syncing: Fatal exception
<0>[    1.864602] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Comment 49 Dexuan Cui 2021-03-26 19:36:44 UTC
Sorry for the mistake in my last reply -- I meant to say the issue repros with 3.10.0-1160.21.1, not .15.2

Actually the issue does not repro with 3.10.0-1160.15.2.el7.x86_64, so something must have changed between .15.2 and  .21.1.

Comment 50 Dexuan Cui 2021-03-26 19:41:18 UTC
Found the src code of the 2 kernels at https://vault.centos.org/7.9.2009/centosplus/Source/SPackages/
Trying to find the difference...

Comment 51 Dexuan Cui 2021-03-26 23:10:28 UTC
After I apply the patch 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host") to 3.10.0-1160.21.1 (we need to comment out the line "case VERSION_WIN10_V5:" in synthvid_connect_vsp(), because VERSION_WIN10_V5 is not supported in 3.10.0-1160.21.1), the issue is fixed. But I'm not sure how exactly the issue is caused when we don't have 67e7cdb4829d.

Comment 52 Dexuan Cui 2021-03-26 23:23:25 UTC
About the VRAM's cache type, please note:

For v5.5+, please use "video: hyperv_fb: Fix the cache type when mapping the VRAM" (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f1251a48c17b54939d7477305e39679a565382c)

For v5.4 and older, please use the 2 patches (the first is a simple git-cherry-pick of the mainline patch, but unluckily it breaks Xorg, so I made the second patch to un-break it):

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=db49200b1dad3949fef14d0cf2aa426d879a7f16
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=9e60056b1f532520dae5333c24e2e2b944c929b7

To make the discussion easy, let me list all the patches involved here:

Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host")
Patch B: d21987d709e8 ("video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver")

Patch C: 5f1251a48c17 ("video: hyperv_fb: Fix the cache type when mapping the VRAM")

Patch D1: db49200b1dad ("video: hyperv_fb: Fix the cache type when mapping the VRAM")  Note: Actually, D1 = C. 
Patch D2: 9e60056b1f53 ("video: hyperv_fb: Fix the mmap() regression for v5.4.y and older")

3.10.0-1160.21.1 takes patch B and C, and the issue happens. If it also takes A (with the line "case VERSION_WIN10_V5:" commented out), then the issue is fixed.

Note: for RHEL 8.x, can Red Hat also please make sure the kernel picks patch A + B + C together? If you pick up C without B, then Xorg is broken; if you don't want B, then please pick up D1 + D2 rather than C. If you pick up B without A, then I suspect RHEL 8 may have the same bug here.

Comment 53 Dexuan Cui 2021-03-27 02:17:04 UTC
I found out why 3.10.0-1160.21.1 fails in the case of hyperv_fb:1920x1080: because of patch B, info->screen_base points to a shadow VRAM buffer whose size is "screen_width * screen_height * screen_depth / 8" bytes:

static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
{
   ...

        dio_fb_size =
                screen_width * screen_height * screen_depth / 8;

  ...
        /* Allocate memory for deferred IO */
        par->dio_vp = vzalloc(round_up(dio_fb_size, PAGE_SIZE));
   ...
        info->fix.smem_len = dio_fb_size;
        info->screen_base = par->dio_vp;
        info->screen_size = dio_fb_size;

Note: at this time, screen_width and screen_height are still the initial values 1152 and 864, respectively. 


In hvfb_probe(), after hvfb_getmem() is called, hvfb_get_option() is called to update screen_width to 1920, and screen_height to 1080, and next the updated values are used:

        info->var.xres_virtual = info->var.xres = screen_width;
        info->var.yres_virtual = info->var.yres = screen_height;

Later, when the kernel framebuffer subsystem tries to access a pixel outside of the range 1152x864, the kernel tries to access a memory location outside of the buffer info->screen_base, and this causes a page fault:

<1>[    2.293602] BUG: unable to handle kernel paging request at ffffab6440dd9000
<1>[    2.293623] IP: [<ffffffffacc1e543>] cfb_imageblit+0x4d3/0x510
<4>[    2.293634] PGD 103170067 PUD 103171067 PMD 346b2067 PTE 0
<4>[    2.293639] Oops: 0002 [#1] SMP 
<4>[    2.293677] Modules linked in: ata_piix(+) crct10dif_pclmul crct10dif_common hyperv_fb(+) crc32c_intel libata serio_raw hv_vmbus floppy dm_mirror dm_region_hash dm_log dm_mod fuse
<4>[    2.293705] CPU: 0 PID: 269 Comm: systemd-udevd Not tainted 3.10.0 #1
<4>[    2.293708] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
<4>[    2.293710] task: ffff9b8ff4e9e2a0 ti: ffff9b8ff446c000 task.ti: ffff9b8ff446c000
<4>[    2.293717] RIP: 0010:[<ffffffffacc1e543>]  [<ffffffffacc1e543>] cfb_imageblit+0x4d3/0x510
<4>[    2.293719] RSP: 0018:ffff9b8ff446f5a8  EFLAGS: 00010286
<4>[    2.293720] RAX: 00000000ff000000 RBX: 0000000000000000 RCX: 0000000000000007
<4>[    2.293721] RDX: 000000000000047f RSI: 00000000ff000000 RDI: 0000000000000001
<4>[    2.293721] RBP: ffff9b8ff446f618 R08: ffffffffad069840 R09: ffffab6440dd9000
<4>[    2.293722] R10: 0000000000000001 R11: 0000000000aaaaaa R12: fff

To fix the bug, we can pick up patch A (I prefer this), or just apply the below change:


--- drivers/video/hyperv_fb.c.old       2021-03-26 19:09:14.694996600 -0700
+++ drivers/video/hyperv_fb.c   2021-03-26 19:09:37.602996600 -0700
@@ -956,13 +956,14 @@
                goto error1;
        }

+       hvfb_get_option(info);
+
        ret = hvfb_getmem(hdev, info);
        if (ret) {
                pr_err("No memory for framebuffer\n");
                goto error2;
        }

-       hvfb_get_option(info);
        pr_info("Screen resolution: %dx%d, Color depth: %d\n",
                screen_width, screen_height, screen_depth);
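
The ordering problem described in this comment can be modelled with a short Python sketch. It is a user-space illustration only: it assumes the line length is derived from the updated xres (xres * depth / 8), and the function names are hypothetical rather than taken from the driver.

```python
def shadow_buffer_size(width, height, depth_bits=32):
    """Size in bytes of a shadow buffer allocated for width x height."""
    return width * height * depth_bits // 8


def first_out_of_bounds_row(buf_size, xres, yres, depth_bits=32):
    """First scanline that extends past buf_size, or None if all rows fit.

    Assumes a linear layout with line_length = xres * depth_bits / 8.
    """
    line_length = xres * depth_bits // 8
    for row in range(yres):
        if (row + 1) * line_length > buf_size:
            return row
    return None


# Buggy order: the buffer is sized with the initial 1152x864 values, and the
# resolution is raised to 1920x1080 afterwards.
buggy_size = shadow_buffer_size(1152, 864)
# Fixed order: the option is parsed first, so the buffer matches 1920x1080.
fixed_size = shadow_buffer_size(1920, 1080)
```

Under these assumptions `first_out_of_bounds_row(buggy_size, 1920, 1080)` is 518, meaning any write to scanline 518 or below lands outside the 3981312-byte buffer, while `first_out_of_bounds_row(fixed_size, 1920, 1080)` is None.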

Comment 54 Marco Gregorini 2021-03-27 15:00:21 UTC
Hi Dexuan, thanks for your help, though I cannot find a way to apply the patch 67e7cdb4829d. I downloaded the source code of "video: hyperv_fb: Fix the cache type when mapping the VRAM", and I will try to build it.

Regarding Redhat 8, I have "Red Hat Enterprise Linux release 8.3 (Ootpa)".

I set this kernelopts:
grub2-editenv - set "kernelopts=root=/dev/mapper/rhel_redhat8-root ro resume=/dev/mapper/rhel_redhat8-swap rd.lvm.lv=rhel_redhat8/root rd.lvm.lv=rhel_redhat8/swap ignore_level console=ttyS0 video=hyperv_fb:1920x1080"

On restart I can access the console with Putty, and vmconnect shows 1920x1080 resolution, with no error.

I attach the console output.

Thanks again, Marco

Comment 55 Marco Gregorini 2021-03-27 15:01:05 UTC
Created attachment 1766893 [details]
Console redhat 8

Comment 56 Dexuan Cui 2021-03-27 21:31:39 UTC
Glad to know that RHEL 8 doesn't have the bug.

For RHEL 7.9, I suppose RedHat will integrate the fix soon, e.g. Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host").

Comment 57 Akemi Yagi 2021-03-27 23:00:09 UTC
(In reply to Dexuan Cui from comment #56)
 
> For RHEL 7.9, I suppose RedHat will integrate the fix soon, e.g. Patch A:
> 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from
> Hyper-V host").

How about patch D2? C (=D1) is in RHEL 7, so wouldn't it need D2 to fix it?

Comment 58 Dexuan Cui 2021-03-27 23:19:57 UTC
(In reply to Akemi Yagi from comment #57)
> (In reply to Dexuan Cui from comment #56)
>  
> > For RHEL 7.9, I suppose RedHat will integrate the fix soon, e.g. Patch A:
> > 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from
> > Hyper-V host").
> 
> How about patch D2? C (=D1) is in RHEL 7, so wouldn't it need D2 to fix it?

If a kernel has patch B, then it does not need D2 (please refer to the changelog of patch D2 for the details).

If a kernel has B, then it should have patch A as well, otherwise this bug happens.

If a kernel has neither A nor B, then it can have patch D1 and D2 to fix the VRAM performance issue.
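
The patch rules from this comment and comment 52 can be summarised as a small Python check. This is a sketch of the stated rules only; `check_patch_set` is a hypothetical helper and the patch letters are the labels used in this thread.

```python
def check_patch_set(patches):
    """Return a list of problems for a set of patch labels (A, B, C, D1, D2).

    Encodes only the rules stated in this thread:
      * B without A triggers this bug.
      * C (= D1) without B breaks Xorg unless D2 is also applied.
    """
    applied = set(patches)
    if "D1" in applied:
        applied.add("C")  # D1 is the stable backport of C
    problems = []
    if "B" in applied and "A" not in applied:
        problems.append("B without A: boot failure with a video=hyperv_fb option")
    if "C" in applied and "B" not in applied and "D2" not in applied:
        problems.append("C without B or D2: Xorg is broken")
    return problems
```

For instance, `check_patch_set({"B", "C"})`, the combination shipped in 3.10.0-1160.21.1, reports the "B without A" problem, while `check_patch_set({"A", "B", "C"})` and `check_patch_set({"D1", "D2"})` both return an empty list.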

Comment 59 Marco Gregorini 2021-03-28 06:32:26 UTC
I hope Red Hat engineers will let us know when it will be safe to download the patched kernel.

Thank you all, Marco

Comment 60 Akemi Yagi 2021-03-28 23:46:49 UTC
(In reply to Dexuan Cui from comment #58)

> If a kernel has patch B, then it does not need D2 (please refer to the
> changelog of patch D2 for the details).
> 
> If a kernel has B, then it should have patch A as well, otherwise this bug
> happens.
> 
> If a kernel has neither A nor B, then it can have patch D1 and D2 to fix the
> VRAM performance issue.

The EL7 kernel 3.10.0-1160.21.1.el7 has the following patches applied:

- [video] hyperv_fb: Fix the cache type when mapping the VRAM (Mohammed Gamal) [1908896]
- [video] hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver (Mohammed Gamal) [1908896]

That is, the kernel has "Patch B" and "Patch C".

Therefore, I have added "Patch A" and rebuilt the kernel as a CentOSPlus kernel:

kernel-plus-3.10.0-1160.21.1.el7.centos.plus.bug18117.2.x86_64

It is available here:

https://people.centos.org/toracat/kernel/7/bugs/18117/

Anyone is welcome to give it a try. Please note that the packages are not signed and are provided for testing purposes only.

Comment 61 HuijingHei 2021-03-29 03:55:52 UTC
(In reply to Akemi Yagi from comment #60)
> (In reply to Dexuan Cui from comment #58)
> 
> > If a kernel has patch B, then it does not need D2 (please refer to the
> > changelog of patch D2 for the details).
> > 
> > If a kernel has B, then it should have patch A as well, otherwise this bug
> > happens.
> > 
> > If a kernel has neither A nor B, then it can have patch D1 and D2 to fix the
> > VRAM performance issue.
> 
> The EL7 kernel 3.10.0-1160.21.1.el7 has the following patches applied:
> 
> - [video] hyperv_fb: Fix the cache type when mapping the VRAM (Mohammed
> Gamal) [1908896]
> - [video] hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer
> driver (Mohammed Gamal) [1908896]
> 
> That is, the kernel has "Patch B" and "Patch C".
> 
> Therefore, I have added "Patch A" and rebuilt the kernel as a CentOSPlus
> kernel:
> 
> kernel-plus-3.10.0-1160.21.1.el7.centos.plus.bug18117.2.x86_64
> 
> It is available here:
> 
> https://people.centos.org/toracat/kernel/7/bugs/18117/

Thanks all for your effort! 
Tested with 3.10.0-1160.21.1.el7.centos.plus.bug18117.2.x86_64: the VM can start with 'video=hyperv_fb:1920x1080'.

I have a question: if "Patch A" is added, does this mean the guest will support running 'Set-VMVideo test1 -HorizontalResolution x -VerticalResolution x -ResolutionType Single' on the host? Thanks!

I started a Gen2 VM (after running 'Set-VMVideo test1 -HorizontalResolution 7680 -VerticalResolution 4320 -ResolutionType Single' on the host); after the VM starts, I get 'Screen resolution: 7680x4320'.

Comment 62 Dexuan Cui 2021-03-29 04:54:32 UTC
(In reply to HuijingHei from comment #61)
> I have a question, if add "Patch A", does this mean it will support on host
> 'Set-VMVideo test1 -HorizontalResolution x -VerticalResolution x
> -ResolutionType Single'? Thanks!

Yes, with Patch A, Set-VMVideo is supposed to work, but for a Gen-1 VM the available VRAM size is only about 64MB (IMO this is a bug which exists even in the mainline; I just let the patch author know about this), so please make sure you don't set too high a resolution. E.g. I get the below error with 7680x4320:

[    2.468894] hv_vmbus: registering driver hyperv_fb
[    2.486380] hyperv_fb: Synthvid Version major 3, minor 5
[    2.486555] hyperv_fb: Screen resolution: 1920x1080, Color depth: 32
[    2.486561] hyperv_fb: Resource not available or (0x4000000 < 0x8000000)
[    2.486562] hyperv_fb: No memory for framebuffer
[    2.487018] hv_vmbus: probe failed for device 5620e0c7-8062-4dce-aeb7-520c7ef76171 (-12)
[    2.487023] hyperv_fb: probe of 5620e0c7-8062-4dce-aeb7-520c7ef76171 failed with error -12

This is because (7680 * 4320 * 32/8) / (1024.0*1024) = 126.5625 MB, which is > 64 MB.

I also tried 3840x4320, which worked fine. :-)

IMO Gen-2 doesn't have the 64MB VRAM size limit.
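
As a rough sketch of the arithmetic above (this is not driver code), the helper below computes the framebuffer size a resolution needs at 32 bits per pixel and compares it with the roughly 64MB Gen-1 limit. The names framebuffer_mib, fits_gen1 and GEN1_VRAM_MIB are made up for illustration; the 64MB value is the approximate figure mentioned in this comment, and it corresponds to the 0x4000000 shown in the log.

```python
# Illustrative only: names and the 64 MiB constant are assumptions taken
# from the discussion, not values read from the hyperv_fb driver.
GEN1_VRAM_MIB = 64  # 0x4000000 bytes == 64 MiB


def framebuffer_mib(width, height, bits_per_pixel=32):
    """Return the framebuffer size in MiB for one full screen."""
    bytes_needed = width * height * (bits_per_pixel // 8)
    return bytes_needed / (1024 * 1024)


def fits_gen1(width, height, bits_per_pixel=32):
    """True if the framebuffer fits in the assumed Gen-1 VRAM size."""
    return framebuffer_mib(width, height, bits_per_pixel) <= GEN1_VRAM_MIB


if __name__ == "__main__":
    for w, h in [(7680, 4320), (3840, 4320), (1920, 1080)]:
        status = "fits" if fits_gen1(w, h) else "too large"
        print(f"{w}x{h}: {framebuffer_mib(w, h):.4f} MiB -> {status}")
```

With these assumptions, 7680x4320 needs 126.5625 MiB and is rejected, while 3840x4320 needs 63.28125 MiB and fits, which is consistent with the two results reported above.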
 
> Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680
> -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get
> 'Screen resolution: 7680x4320'

This is expected.

Comment 63 HuijingHei 2021-03-29 05:41:32 UTC
(In reply to Dexuan Cui from comment #62)
> (In reply to HuijingHei from comment #61)
> > I have a question, if add "Patch A", does this mean it will support on host
> > 'Set-VMVideo test1 -HorizontalResolution x -VerticalResolution x
> > -ResolutionType Single'? Thanks!
> 
> Yes, with Patch A, Set-VMVideo is supposed to work, but for Gen-1 VM, the
> available VRAM size is only about 64MB (IMO this is a bug which even exists
> in the mainline. I just let the patch author know about this), so please
> make sure you don't set a too high resolution, e.g. I get the below error
> with 7680x4320:
> 
> [    2.468894] hv_vmbus: registering driver hyperv_fb
> [    2.486380] hyperv_fb: Synthvid Version major 3, minor 5
> [    2.486555] hyperv_fb: Screen resolution: 1920x1080, Color depth: 32
> [    2.486561] hyperv_fb: Resource not available or (0x4000000 < 0x8000000)
> [    2.486562] hyperv_fb: No memory for framebuffer
> [    2.487018] hv_vmbus: probe failed for device
> 5620e0c7-8062-4dce-aeb7-520c7ef76171 (-12)
> [    2.487023] hyperv_fb: probe of 5620e0c7-8062-4dce-aeb7-520c7ef76171
> failed with error -12
> 
> This is because (7680 * 4320 * 32/8) / (1024.0*1024) = 126.5625, which > 64. 
> 
> I also tried 3840x4320, which worked fine. :-)
> 
> IMO Gen-2 doesn't have the 64MB VRAM size limit.
>  
> > Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680
> > -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get
> > 'Screen resolution: 7680x4320'
> 
> This is expected.

Thanks Dexuan for the confirmation!

Comment 64 Rick Barry 2021-03-29 16:02:28 UTC
Thanks for your help, everyone.

We plan to include the missing patch and push a fix out in an upcoming RHEL 7.9.z batch update. You can follow the current status in this BZ.

Comment 65 HuijingHei 2021-03-30 02:13:04 UTC
Thanks, Rick! I have added the rhel-7.9.z? flag for review.

Comment 72 HuijingHei 2021-04-08 03:25:26 UTC
(In reply to Dexuan Cui from comment #62)
> Yes, with Patch A, Set-VMVideo is supposed to work, but for Gen-1 VM, the
> available VRAM size is only about 64MB (IMO this is a bug which even exists
> in the mainline. I just let the patch author know about this), so please
> make sure you don't set a too high resolution, e.g. I get the below error
> with 7680x4320:
> 
> [    2.468894] hv_vmbus: registering driver hyperv_fb
> [    2.486380] hyperv_fb: Synthvid Version major 3, minor 5
> [    2.486555] hyperv_fb: Screen resolution: 1920x1080, Color depth: 32
> [    2.486561] hyperv_fb: Resource not available or (0x4000000 < 0x8000000)
> [    2.486562] hyperv_fb: No memory for framebuffer
> [    2.487018] hv_vmbus: probe failed for device
> 5620e0c7-8062-4dce-aeb7-520c7ef76171 (-12)
> [    2.487023] hyperv_fb: probe of 5620e0c7-8062-4dce-aeb7-520c7ef76171
> failed with error -12
> 
> This is because (7680 * 4320 * 32/8) / (1024.0*1024) = 126.5625, which > 64. 
> 
> I also tried 3840x4320, which worked fine. :-)
> 
> IMO Gen-2 doesn't have the 64MB VRAM size limit.
>  
> > Start gen2 vm (on host exec 'Set-VMVideo test1 -HorizontalResolution 7680
> > -VerticalResolution 4320 -ResolutionType Single'), after vm starts, get
> > 'Screen resolution: 7680x4320'
> 
> This is expected.

Hi Dexuan,

If I set video=hyperv_fb:3840x4320 in the kernel parameters and restart the guest, it does not take effect, but the same resolution works when using Set-VMVideo. Is this by design? Thanks!

# dmesg | grep hyperv_fb
[    0.000000] Kernel command line: .... video=hyperv_fb:3840x4320
[    1.736969] hv_vmbus: registering driver hyperv_fb
[    1.739835] hyperv_fb: Synthvid Version major 3, minor 5
[    1.741653] hyperv_fb: Screen resolution option is out of range: skipped
[    1.743175] hyperv_fb: Screen resolution: 1024x768, Color depth: 32

Comment 74 Dexuan Cui 2021-04-08 10:00:54 UTC
(In reply to HuijingHei from comment #72)
> Hi Dexuan,
> 
> If I set video=hyperv_fb:3840x4320 in kernel parameter and restart guest, it
> does not make effect, but works using Set-VMVideo, is this by design? Thanks!
> 
> # dmesg | grep hyperv_fb
> [    0.000000] Kernel command line: .... video=hyperv_fb:3840x4320
> [    1.736969] hv_vmbus: registering driver hyperv_fb
> [    1.739835] hyperv_fb: Synthvid Version major 3, minor 5
> [    1.741653] hyperv_fb: Screen resolution option is out of range: skipped
> [    1.743175] hyperv_fb: Screen resolution: 1024x768, Color depth: 32

To make "video=hyperv_fb:3840x4320" work, we should make sure:
1. the kernel has "Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host")".
2. "Get-VMVideo -VMName your_vm_name" should report a resolution >= 3840x4320. See https://docs.microsoft.com/en-us/powershell/module/hyper-v/set-vmvideo?view=windowsserver2019-ps:

-ResolutionType
Specifies the resolution type for the virtual machine display. The acceptable values for this parameter are:

Maximum. The input HorizontalResolution * VerticalResolution is the maximum supported resolution. All standard resolutions smaller than HorizontalResolution * VerticalResolution are also supported.
Single. The input HorizontalResolution * VerticalResolution is the only supported resolution.
Default. The supported resolutions are those in the list of standard resolutions. Input HorizontalResolution * VerticalResolution is ignored.

By default, the max supported resolution for a VM should be 1920x1200. I guess this is why "video=hyperv_fb:3840x4320" didn't work for you(?)

For the -ResolutionType parameter, we don't have to specify "Single" (which forces the VM to use only the specified resolution) -- we can also use "-ResolutionType Maximum" (which sets the max supported resolution, after which video=hyperv_fb:AxB can select any resolution that's <= the one specified by Set-VMVideo).
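
To illustrate the rule above, here is a simplified model (an assumption for illustration, not the actual hyperv_fb code): a "video=hyperv_fb:AxB" option is accepted only when both dimensions are within the maximum resolution the host reports. The helper names parse_video_option and option_accepted are hypothetical, and the model covers only the "-ResolutionType Maximum" case.

```python
import re


def parse_video_option(option):
    """Parse 'video=hyperv_fb:WxH' into (W, H); return None if malformed."""
    match = re.fullmatch(r"video=hyperv_fb:(\d+)x(\d+)", option)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def option_accepted(option, host_max):
    """Simplified check: the requested size must not exceed the host maximum.

    host_max is the (width, height) that Get-VMVideo would report for the VM.
    """
    requested = parse_video_option(option)
    if requested is None:
        return False
    width, height = requested
    max_width, max_height = host_max
    return width <= max_width and height <= max_height


if __name__ == "__main__":
    option = "video=hyperv_fb:3840x4320"
    # Default host maximum mentioned above: the option is out of range.
    print(option_accepted(option, (1920, 1200)))
    # After raising the maximum with Set-VMVideo ... -ResolutionType Maximum.
    print(option_accepted(option, (3840, 4320)))
```

Under this model the 3840x4320 request is skipped with the default 1920x1200 maximum and accepted once the maximum is raised to 3840x4320.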

Comment 76 HuijingHei 2021-04-12 08:12:51 UTC
(In reply to Dexuan Cui from comment #74)
> 
> To make "video=hyperv_fb:3840x4320" work, we should make sure:
> 1. the kernel has "Patch A: 67e7cdb4829d ("video: hyperv: hyperv_fb: Obtain
> screen resolution from Hyper-V host")".
> 2. "Get-VMVideo -VMName your_vm_name" should report a resolution >=
> 3840x4320. See
> https://docs.microsoft.com/en-us/powershell/module/hyper-v/set-vmvideo?view=windowsserver2019-ps:
> 
> -ResolutionType
> Specifies the resolution type for the virtual machine display. The
> acceptable values for this parameter are:
> 
> Maximum. The input HorizontalResolution * VerticalResolution is the maximum
> supported resolution. All standard resolutions smaller than
> HorizontalResolution * VerticalResolution are also supported.
> Single. The input HorizontalResolution * VerticalResolution is the only
> supported resolution.
> Default. The supported resolutions are those in the list of standard
> resolutions. Input HorizontalResolution * VerticalResolution is ignored.
> 
> By default the max supported resolution for a VM should be 1920x1200. I
> guess this is why  "video=hyperv_fb:3840x4320" didn't work for you(?)

Yes, on the host the VM's default resolution is 1920x1200. After changing the maximum resolution to 3840x4320 with "Set-VMVideo RHEL-8.4-GEN1-B -HorizontalResolution 3840 -VerticalResolution 4320 -ResolutionType Maximum", the VM can start with "video=hyperv_fb:3840x4320".
 
> For the -ResolutionType parameter, we don't have to specify the "Single"
> (which forces the VM to only use that resolution specified) -- we can also
> use "-ResolutionType Maximum" (which means we specify the max supported
> resolution and then we can use video=hyperv_fb:AxB to use a resolution
> that's <= the one we specify by Set-VMVideo).

Thanks, Dexuan, for your info!
I created bug 1948442 to track the gen1 VM issue with 7680x4320 in comment #62.

Comment 77 Marco Gregorini 2021-04-12 12:36:03 UTC
Hi HuijingHei, can you make the bug 1948442 public, or should it remain private? Thanks Marco

Comment 78 HuijingHei 2021-04-13 03:03:32 UTC
(In reply to Marco Gregorini from comment #77)
> can you make the bug 1948442 public, or should it remain
> private? Thanks Marco

Hi Marco, bug 1948442 is public now. Contact me if you need any other help. Thanks!

Comment 79 Marco Gregorini 2021-04-13 07:00:50 UTC
Thanks HuijingHei, Marco

Comment 88 HuijingHei 2021-04-30 02:28:05 UTC
Verification passed with 3.10.0-1160.27.1.el7.x86_64: the VM works with video=hyperv_fb:1920x1080.

Comment 94 errata-xmlrpc 2021-06-08 22:31:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2314