Bug 1210857 - Very slow boot on multicore
Summary: Very slow boot on multicore
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: F22FinalFreezeException
TreeView+ depends on / blocked
 
Reported: 2015-04-10 17:52 UTC by Zeeshan Ali
Modified: 2015-05-04 19:19 UTC (History)
17 users (show)

Fixed In Version: kernel-4.0.1-300.fc22
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-03 17:22:33 UTC
Type: Bug


Attachments (Terms of Use)
Screenshot of `systemd-analyze blame` w/o SMP (523.30 KB, image/png)
2015-04-10 17:52 UTC, Zeeshan Ali
no flags Details
Screenshot of `systemd-analyze blame` with SMP (525.56 KB, image/png)
2015-04-10 17:53 UTC, Zeeshan Ali
no flags Details

Description Zeeshan Ali 2015-04-10 17:52:35 UTC
Created attachment 1013220 [details]
Screenshot of `systemd-analyze blame` w/o SMP

Description of problem: If I launch live ISO[1] with this commandline:

-------------
/usr/bin/qemu-system-x86_64 -machine accel=kvm \
                            -m 2048 \
                            -drive file=/home/zeenix/Downloads/Fedora-Live-Workstation-x86_64-22_Beta-TC8.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw \
                            -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
-------------

The boot process till the desktop loads and I'm presented with the "Try or Install" dialog, takes about 35 seconds or less.

Now if I try to take advantage of multicore by adding "-smp 4,sockets=1,cores=2,threads=2" (which matches by host CPU configuration) to this commandline, I get much slower boot (1 minute 6 seconds). I can't be 100% sure of this but the system just feels very sluggish in general.

I tried booting on bare-metal on the same host and I didn't see any better performance so my guess is that Fedora 22 live ISOs have some sort of problem with multicore.

Version-Release number of selected component (if applicable): My host is F21 but I started doing these tests when Matthias told me about this issue on his F22 host so its unlikely to be specific to qemu on F21 either.

How reproducible: 100%

Additional info:

[1] http://dl.fedoraproject.org/pub/alt/stage/22_Beta_TC8/Workstation/x86_64/iso/Fedora-Live-Workstation-x86_64-22_Beta-TC8.iso

Comment 1 Zeeshan Ali 2015-04-10 17:53:53 UTC
Created attachment 1013221 [details]
Screenshot of `systemd-analyze blame` with SMP

Comment 2 Josh Boyer 2015-04-10 19:05:17 UTC
So you have two different host machines with two different kernel versions but both exhibit the same problem when using SMP in a guest?

Can you get the dmesg output from both kinds of boots from the guest?

Comment 3 Josh Boyer 2015-04-10 19:07:19 UTC
Oh, you said you also have the same problem on a bare-metal machine?  What are the details of that machine?

Comment 4 Zeeshan Ali 2015-04-10 20:54:57 UTC
(In reply to Josh Boyer from comment #2)
> So you have two different host machines with two different kernel versions
> but both exhibit the same problem when using SMP in a guest?

I only tested on one machine but Matthias and Christophe can reproduce this too so I think its not much dependent on some specific hardware.

> Can you get the dmesg output from both kinds of boots from the guest?

I could try, although it'd be difficult since libvirt does something internally to make the network in VM work and I can't just pass a param to qemu to set it up (or at least I'll need to refer to docs to find out how to do that).

(In reply to Josh Boyer from comment #3)
> Oh, you said you also have the same problem on a bare-metal machine?  What
> are the details of that machine?

Yeah, the boot time was pretty similar on bare-metal on the host machine. Here are its details:

-------------
$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 69
model name	: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
stepping	: 1
microcode	: 0x1c
cpu MHz		: 2787.117
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 5387.73
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 69
model name	: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
stepping	: 1
microcode	: 0x1c
cpu MHz		: 2438.121
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 5387.73
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 69
model name	: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
stepping	: 1
microcode	: 0x1c
cpu MHz		: 2343.832
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 5387.73
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 69
model name	: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
stepping	: 1
microcode	: 0x1c
cpu MHz		: 2423.671
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 5387.73
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

$ cat /proc/meminfo 
MemTotal:       11989160 kB
MemFree:         1912832 kB
MemAvailable:    8038164 kB
Buffers:          424644 kB
Cached:          6051816 kB
SwapCached:            0 kB
Active:          6697760 kB
Inactive:        2818392 kB
Active(anon):    3081944 kB
Inactive(anon):   419460 kB
Active(file):    3615816 kB
Inactive(file):  2398932 kB
Unevictable:          48 kB
Mlocked:              48 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               236 kB
Writeback:             0 kB
AnonPages:       3039728 kB
Mapped:           484524 kB
Shmem:            461724 kB
Slab:             403348 kB
SReclaimable:     364000 kB
SUnreclaim:        39348 kB
KernelStack:        8672 kB
PageTables:        56436 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5994580 kB
Committed_AS:   10062328 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      402616 kB
VmallocChunk:   34359284892 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1466368 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      159644 kB
DirectMap2M:     8957952 kB
DirectMap1G:     4194304 kB
----------------------

But as I said above, this is reproducible on other machines (unless they both tried on the same laptop as I have: Lenovo T440s). Could you please try to reproduce with those Qemu commandlines I provided?

Comment 5 Josh Boyer 2015-04-10 21:53:33 UTC
I tried the qemu command and see slowness.  Doing a side by side comparison of the journal after a boot for SMP and non-SMP shows that the boot is virtually identical in timing until userspace starts, so I guess that's good.

Once userspace starts, the SMP guest seems to have a varied order of things and the timings are indeed slower.  By the time you can get to a shell and look at the top output, you can see why.  The load avg is in the high 40s.  There are over 200 kworker threads sitting there waiting to run.  Over time the load average drops and the system seems just as usable.  The kworker threads are high in number on the non-SMP setup as well, but the load avg is down to 2.x by the time you can get to a shell.

If you boot a 4 CPU guest but use maxcpus=2 on the command line, the load average is still high but the timings are faster.  So whatever it is seems to scale (poorly) with the number of CPUs available.  I tried booting this via virt-manager instead of raw qemu and got the same results, so it doesn't seem dependent on the virtualized hardware or in-kernel drivers for this.

My best guess at this point is that systemd or something it starts is keying off of the number of available CPUs and starting a massive amount of work in the SMP cases.  I have an installed KVM guest running rawhide userspace that boots just fine here, so I'm guessing it is something in the live environment.  A macbook running F22 kernel and userspace is similarly fine.  I'd really like to test with either a 3.19.3 kernel and the latest 4.0-rc7 kernel to see if the kernel version really matters, but I'm not sure how to spin an otherwise identical live image with those included.

When you say this is reproducible on the baremetal machine, did you mean if you boot the live image on the machine or after install?

Comment 6 Zeeshan Ali 2015-04-10 22:51:46 UTC
(In reply to Josh Boyer from comment #5)
> I tried the qemu command and see slowness.

Glad you can reproduce too. :)
 
> When you say this is reproducible on the baremetal machine, did you mean if
> you boot the live image on the machine or after install?

booting of live ISO off a usb stick. I also tried installing in a VM from the live image and seems the installed system is peforming well so its something specific to live setup.

Comment 7 Justin M. Forbes 2015-04-13 14:40:31 UTC
Is the process to create a test compose locally documented anywhere? I haven't actually done one in a few years now. It would be good to be able to test this with a bootchart setup. Since the problem goes away on install, I will need to do a compose.

Comment 8 Justin M. Forbes 2015-04-15 21:33:32 UTC
Okay, after a lot of testing, this is a kernel issue, and it was introduced in kernel-3.20.0-0.rc0.git5.1.fc22. I will look at figuring out the actual culprit tomorrow.

Comment 9 Zeeshan Ali 2015-04-17 15:34:59 UTC
(In reply to Justin M. Forbes from comment #8)
> Okay, after a lot of testing, this is a kernel issue, and it was introduced
> in kernel-3.20.0-0.rc0.git5.1.fc22. I will look at figuring out the actual
> culprit tomorrow.

Thanks so much for digging out the culprit!

Comment 10 Justin M. Forbes 2015-04-23 21:07:10 UTC
The culprit is specifically the loop driver block-mq conversion. I can revert these patches, but it is not the correct solution.  I have tested with those reverted though and performance returns to normal.  I am looking for a better solution to the scalability issue here but worse case we can revert them for the F22 release spin if necessary until a more suitable upstream solution is found.

Comment 11 Fedora Blocker Bugs Application 2015-04-27 14:13:43 UTC
Proposed as a Freeze Exception for 22-final by Fedora user jforbes using the blocker tracking app because:

 This bug seriously impacts performance of booting the live images. We have two tested solutions, we are just waiting for clarity on upstream direction.

Comment 12 DO NOT USE account not monitored (old adamwill) 2015-04-29 21:37:25 UTC
I think it affects more than just booting, FWIW - launching anaconda from the Welcome screen takes ~30 secs with two CPUs, ~15 secs with one. Though I suppose it might be caused just by leftover work from boot still happening in the background. This caused one of our openQA tests to time out too early...

Thanks for looking into this.

I'd be +1 FE for this in the first week or so of freeze.

Comment 13 Fedora Update System 2015-04-30 12:25:03 UTC
kernel-4.0.1-300.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/kernel-4.0.1-300.fc22

Comment 14 Fedora Update System 2015-05-01 16:54:29 UTC
Package kernel-4.0.1-300.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-4.0.1-300.fc22'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-7371/kernel-4.0.1-300.fc22
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2015-05-03 17:22:33 UTC
kernel-4.0.1-300.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.