Bug 1588150 - [abrt] kthread_create_worker_on_cpu: WARNING: CPU: 0 PID: 200 at drivers/hv/hv_balloon.c:1224 balloon_up+0x347/0x420 [hv_balloon] [NEEDINFO]
Summary: [abrt] kthread_create_worker_on_cpu: WARNING: CPU: 0 PID: 200 at drivers/hv/h...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:0edd52bf0d4cdead4b5a15175aa...
Depends On:
Blocks:
 
Reported: 2018-06-06 19:29 UTC by Tom Delany
Modified: 2019-03-22 14:03 UTC (History)
CC List: 24 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-08-29 15:07:25 UTC
Type: ---
Embargoed:
jforbes: needinfo?


Attachments
File: dmesg (39.00 KB, text/plain)
2018-06-06 19:29 UTC, Tom Delany

Description Tom Delany 2018-06-06 19:29:28 UTC
Description of problem:
Occurred on reboot after upgrade from Fedora 27 to Fedora 28.

Additional info:
reporter:       libreport-2.9.5
WARNING: CPU: 0 PID: 200 at drivers/hv/hv_balloon.c:1224 balloon_up+0x347/0x420 [hv_balloon]
Modules linked in: vfat fat crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf hv_utils hv_balloon ptp pps_core joydev hv_storvsc scsi_transport_fc serio_raw hv_netvsc hid_hyperv hyperv_fb hyperv_keyboard crc32c_intel hv_vmbus
CPU: 0 PID: 200 Comm: kworker/0:2 Not tainted 4.16.13-200.fc27.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v3.0 04/10/2018
Workqueue: events balloon_up [hv_balloon]
RIP: 0010:balloon_up+0x347/0x420 [hv_balloon]
RSP: 0018:ffffa54bc10bbe48 EFLAGS: 00010206
RAX: 0000000000007d35 RBX: ffffffffc04a3628 RCX: ffff96f278e204a0
RDX: ffff96f278e204a0 RSI: 0000000000000000 RDI: ffffffffc04a3628
RBP: ffff96f278e20480 R08: ffff96f278e215a0 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000f4240 R12: ffff96f278e26500
R13: 0000000000000000 R14: ffff96f26fd16cc0 R15: ffffffffc04a3630
FS:  0000000000000000(0000) GS:ffff96f278e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000559943bbb268 CR3: 000000001220a006 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 process_one_work+0x175/0x360
 worker_thread+0x2e/0x380
 ? process_one_work+0x360/0x360
 kthread+0x113/0x130
 ? kthread_create_worker_on_cpu+0x70/0x70
 ? do_syscall_64+0x74/0x180
 ? SyS_exit_group+0x10/0x10
 ret_from_fork+0x35/0x40
Code: 44 24 08 00 e9 49 ff ff ff 8b 0d 95 25 00 00 8b 54 24 04 48 c7 c6 18 22 4a c0 48 c7 c7 b0 31 4a c0 e8 0e eb fb c2 e9 13 ff ff ff <0f> 0b e9 da fc ff ff 48 c7 c7 50 22 4a c0 31 ed 48 c7 c3 10 36 
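
For triage, the WARNING header above can be split into its parts: CPU, PID, source file and line, and the `function+offset/size` location, followed by the module name in brackets. The following is a minimal sketch, not part of abrt or any kernel tooling; the regular expression and the helper name `parse_warning` are illustrative assumptions based only on the line shown in this report.

```python
import re

# Hypothetical pattern matching the WARNING header format seen in this report.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_warning(text):
    """Return the fields of a kernel WARNING header, or None if absent."""
    m = WARN_RE.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    # Numeric fields: decimal for cpu/pid/line, hexadecimal for offset/size.
    for key in ("cpu", "pid", "line"):
        fields[key] = int(fields[key])
    for key in ("offset", "size"):
        fields[key] = int(fields[key], 16)
    return fields


header = ("WARNING: CPU: 0 PID: 200 at drivers/hv/hv_balloon.c:1224 "
          "balloon_up+0x347/0x420 [hv_balloon]")
info = parse_warning(header)
print(info["file"], info["line"], info["func"], info["module"])
```

Here the warning site is line 1224 of drivers/hv/hv_balloon.c, at offset 0x347 into `balloon_up`, whose total size is 0x420 bytes.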

Potential duplicate: bug 1168002

Comment 1 Tom Delany 2018-06-06 19:29:40 UTC
Created attachment 1448416
File: dmesg

Comment 2 Vitaly Kuznetsov 2018-06-11 11:31:17 UTC
K. Y., Dexuan,

here we see the following warning firing:

        /* The host balloons pages in 2M granularity. */
        WARN_ON_ONCE(num_pages % PAGES_IN_2M != 0);

previously, we thought this should never happen.
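
To make the condition concrete, here is a small user-space sketch of the arithmetic behind that check. It is not driver code: it assumes 4 KiB pages, so that PAGES_IN_2M works out to 512, and the function name `violates_2m_granularity` is made up for illustration.

```python
PAGE_SIZE = 4096                               # assumption: 4 KiB pages
PAGES_IN_2M = (2 * 1024 * 1024) // PAGE_SIZE   # 512 under that assumption


def violates_2m_granularity(num_pages):
    """Mirror the WARN_ON_ONCE condition: True when the request is not a
    whole number of 2 MiB chunks."""
    return num_pages % PAGES_IN_2M != 0


for n in (512, 1024, 700):
    print(n, violates_2m_granularity(n))
```

Under these assumptions, requests of 512 or 1024 pages pass, while a request of 700 pages would trip the warning.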

Comment 3 Dexuan Cui 2018-06-13 23:38:50 UTC
Back in 2014, I added the WARN_ON_ONCE() because I was told the host always does ballooning in 2MB, and at that time IIRC the guest hv_balloon driver's code had this assumption -- the code could go wrong if the assumption is not true.

But I'm not sure if the latest version of the guest hv_balloon code still has this assumption -- I haven't read the driver's code for quite some time. :-)

If there is no such assumption in the guest any more, I think we can remove the WARN_ON_ONCE().

I'm trying to reproduce the issue with Fedora 28:
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/log/?h=f28

Is there any special workload I should run? I'm asking because this is the first time I've been told that this warning can be triggered.

BTW, as I checked, f28 is missing some latest fixes from you:

git log  --oneline upstream-fedora/f28..next-20180613  -- drivers/hv/hv_balloon.c
cf21be9 hv_balloon: trace post_status
bba072d1 hv_balloon: fix bugs in num_pages_onlined accounting
4f098af hv_balloon: simplify hv_online_page()/hv_page_online_one()
223e1e4 hv_balloon: fix printk loglevel

But I guess these missing fixes can't resolve this bug.

Comment 4 Dexuan Cui 2018-06-13 23:43:43 UTC
BTW, can the original bug reporter, Tom, provide details about the VM and the host? E.g., how many vCPUs does the VM have? How much memory is initially configured? What's the host version (you can get this info from the VM by running dmesg |grep "Hyper-V Host Build")? Is there an easy and consistent way to trigger the bug?

Comment 5 Parag Warudkar 2018-06-17 04:05:26 UTC
(In reply to Dexuan Cui from comment #4)
> BTW, can the original bug reporter, Tom, provide details about the VM and
> the host? E.g., how many vCPUs does the VM have? How much memory is initially
> configured? What's the host version (you can get this info from the VM by
> running dmesg |grep "Hyper-V Host Build")? Is there an easy and consistent
> way to trigger the bug?

Since this happens regularly on my Fedora 28 VM running on NUC hardware, here are a few details:

Host: i5-4250U/16GB RAM, Windows 10 Pro 1803 + Latest updates (happening since 1503 version at least)
VM : 
2 Cores Min Ram 1024 | Max RAM 8192 | Dynamic Memory Enabled
VM Config version 8.2 Generation 2

$dmesg |grep "Hyper-V Host Build"
[    0.000000] Hyper-V Host Build:17134-10.0-0-0.112
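
When collecting this from several VMs, the build line can be parsed mechanically. The sketch below is only an illustration: the field names `build`, `major`, and `minor` are my reading of the line's layout and should be treated as assumptions, and the remaining fields are kept as an uninterpreted string.

```python
import re

# Hypothetical pattern for the "Hyper-V Host Build" line printed at boot.
HOST_BUILD_RE = re.compile(
    r"Hyper-V Host Build:\s*(\d+)-(\d+)\.(\d+)-(\S+)"
)


def parse_host_build(line):
    """Extract build and version numbers from a dmesg line, or None."""
    m = HOST_BUILD_RE.search(line)
    if m is None:
        return None
    build, major, minor, rest = m.groups()
    return {
        "build": int(build),
        "major": int(major),
        "minor": int(minor),
        "rest": rest,  # trailing fields, left uninterpreted here
    }


sample = "[    0.000000] Hyper-V Host Build:17134-10.0-0-0.112"
print(parse_host_build(sample))
```

For the line above this yields build 17134 with version 10.0, which is consistent with the Windows 10 1803 host described in this comment.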

I don't really have to do anything in particular to trigger this issue. A couple of days of uptime with little activity (SSH, a few commands here and there) will trigger it reliably.

Comment 6 Dexuan Cui 2018-06-17 16:35:53 UTC
Thanks for the detailed info! I'll try to reproduce it first.

Comment 7 Parag Warudkar 2018-06-17 23:52:43 UTC
I also have another VM running on that same host in parallel. It also has dynamic memory enabled. The 2nd VM is Windows Server, with RAM it can use: 8192MB, Min RAM 1024MB, Max RAM 8192MB. I mention it in case memory pressure helps trigger ballooning / shrinking faster.

Comment 8 Justin M. Forbes 2018-07-23 14:59:01 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.

Fedora 28 has now been rebased to 4.17.7-200.fc28.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 9 Justin M. Forbes 2018-08-29 15:07:25 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 5 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Comment 10 Edgar ivan naranjo 2019-03-22 14:03:21 UTC
Description of problem:
It happened suddenly, when starting the computer after an update.

Version-Release number of selected component:
kernel-core-4.20.16-100.fc28

Additional info:
reporter:       libreport-2.9.5
cmdline:        BOOT_IMAGE=/vmlinuz-4.20.16-100.fc28.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet LANG=es_CO.UTF-8
crash_function: pwq_unbound_release_workfn
kernel:         4.20.16-100.fc28.x86_64
runlevel:       N 5
type:           Kerneloops

Truncated backtrace:
WARNING: CPU: 0 PID: 1752 at net/mac80211/iface.c:1323 ieee80211_iface_work+0x2ec/0x350 [mac80211]
Modules linked in: fuse rfcomm ccm xt_CHECKSUM ipt_MASQUERADE tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc arc4 ath9k ath9k_common ath9k_hw intel_rapl x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp snd_hda_codec_realtek coretemp mac80211 kvm_intel snd_hda_codec_generic snd_hda_intel kvm snd_hda_codec ath irqbypass crct10dif_pclmul ath3k crc32_pclmul btusb snd_hda_core iTCO_wdt uvcvideo iTCO_vendor_support cfg80211 btrtl btbcm btintel ghash_clmulni_intel videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 intel_cstate videobuf2_common snd_hwdep intel_uncore snd_seq
 videodev intel_rapl_perf snd_seq_device rtsx_usb_ms snd_pcm memstick media joydev acer_wmi mei_me sparse_keymap wmi_bmof i2c_i801 ecdh_generic snd_timer mei snd rfkill lpc_ich soundcore pcc_cpufreq i915 rtsx_usb_sdmmc mmc_core i2c_algo_bit drm_kms_helper crc32c_intel drm serio_raw rtsx_usb video wmi
CPU: 0 PID: 1752 Comm: kworker/u16:48 Not tainted 4.20.16-100.fc28.x86_64 #1
Hardware name: Acer Aspire S3-391/Aspire S3-391, BIOS V1.17 05/24/2012
Workqueue: phy0 ieee80211_iface_work [mac80211]
RIP: 0010:ieee80211_iface_work+0x2ec/0x350 [mac80211]
Code: ef e8 08 9f ff ff e9 84 fe ff ff 48 63 4c 24 08 4c 89 f2 48 89 c6 48 89 ef e8 80 be ff ff e9 6c fe ff ff 0f 0b e9 8b fe ff ff <0f> 0b e9 8f fe ff ff 4c 89 fe 4c 89 ef e8 62 c7 04 00 e9 ab fd ff
RSP: 0000:ffffb1d181b33e58 EFLAGS: 00010246
RAX: 0000000000004288 RBX: ffff93944696cf00 RCX: 0000000000000088
RDX: ffff93944696cf20 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffff939449148780 R08: 0000000000000000 R09: 0000000030796870
R10: 8080808080808080 R11: 0000000000000006 R12: ffff93944696cf20
R13: ffff93944696c8c0 R14: ffff93938ba22af0 R15: ffff93939a697f00
FS:  0000000000000000(0000) GS:ffff93944b200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000043a0000 CR3: 0000000035550004 CR4: 00000000000606f0
Call Trace:
 process_one_work+0x1a1/0x3a0
 worker_thread+0x30/0x380
 ? pwq_unbound_release_workfn+0xd0/0xd0
 kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x35/0x40

