690930 – microcode_ctl loops, impossible to boot

Summary: microcode_ctl loops, impossible to boot

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	microcode_ctl
Sub Component:
Version:	rawhide
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Anton Arapov
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (4):	690928 706611 715118 717723
Depends On:
Blocks:	F16Alphas390x F16Alphappc

Reported:	2011-03-25 20:34 UTC by Pete Zaitcev
Modified:	2014-06-18 08:03 UTC
CC List:	28 users
Fixed In Version:	microcode_ctl-1.17-18.fc16
Clone Of:
Environment:
Last Closed:	2011-08-17 00:52:06 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
intel microcode.dat -> ucode converter (3.14 KB, text/plain) 2011-07-24 23:03 UTC, Kay Sievers
intel microcode.dat -> ucode converter (3.40 KB, text/plain) 2011-07-30 14:13 UTC, Kay Sievers

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	722785	0	unspecified	CLOSED	No boot-time microcode update visible after microcode_ctl 1.17-15 update	2021-02-22 00:41:40 UTC

Internal Links: 722785

Description Pete Zaitcev 2011-03-25 20:34:49 UTC

Description of problem:

The microcode_ctl loops, screen fills with the message
"CPU architecture 15 not supported" (approximately).

Version-Release number of selected component (if applicable):

kernel-2.6.39-0.rc0.git11.0.fc16.x86_64

How reproducible:

100% on given hardware

Steps to Reproduce:
1. Install kernel and reboot
  
Actual results:

Endless loop, system does not boot.

Expected results:

System boots.

Additional info:

Kernel kernel-2.6.38-0.rc4.git0.1.fc15.x86_64 works fine.

[zaitcev@niphredil ~]$ more /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 76
model name      : AMD Turion(tm) 64 Mobile Technology MK-36
stepping        : 2
cpu MHz         : 800.000
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow up rep_good nopl extd_apicid pni cx16 lahf_lm svm extapic cr8_legacy
bogomips        : 1595.96
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

Guys, this CPU is not that old! It's a 64-bit Opteron for crying out loud.
You cannot just bump it off supported list like that.

Comment 1 Andre Robatino 2011-03-25 21:10:04 UTC

The Version should be "rawhide", not "15", since it's a Rawhide (F16) kernel. I see the same thing on Rawhide with the latest 2.6.39 kernel, but not the earlier ones (which are all 2.6.38).

Comment 2 Andre Robatino 2011-03-25 21:11:52 UTC

Forgot to mention that for me, in Rawhide, it eventually (a few minutes) does come up in graphical mode, although dmesg shows the above error message continuing.

Comment 3 Andre Robatino 2011-03-25 21:50:09 UTC

After today's Rawhide updates, including microcode_ctl-1.17-13.fc16, the messages appear during initial bootup, but stop before X comes up.

Comment 4 Pete Zaitcev 2011-03-26 00:35:48 UTC

Yeah, weird. Usually after the branch systems stay with the fork, not trunk.
This must be that "nonstop Rawhide" we heard about. Changing to Rawhide.

Sorry, I did not know it would eventually boot after a few minutes.
I thought I waited that long.

Comment 5 John Ellson 2011-04-13 21:17:26 UTC

Same looping problem for me with all 2.6.39 kernels (2.6.38-0.rc7.git2.3 is the last kernel that works for me).   Both i686 and x86_64 kernels fail the same way.

The looping message on mine claims "CPU architecture 6 not supported"

Removing the microcode_ctl stops the looping -- but then there are still problems because the network doesn't come up.

This is on a kvm virtual host:

[root@fc16-64 ~]# cat /proc/cpuinfo
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 6
model		: 2
model name	: QEMU Virtual CPU version 0.13.0
stepping	: 3
cpu MHz		: 3400.026
cache size	: 512 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 4
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up nopl pni cx16 popcnt hypervisor lahf_lm abm sse4a
bogomips	: 6800.05
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Comment 6 John Ellson 2011-04-14 13:05:25 UTC

See also: Bug #690928   -- possible dup

Comment 7 John Ellson 2011-04-14 13:13:57 UTC

See also: Bug #694390   -- possible dup

Comment 8 Stefan Krüger 2011-04-26 18:40:24 UTC

Same problem here with F15 Beta (both x86_64 and i686)

$ cat /proc/cpuinfo (from F14)

processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 15
model name      : VIA Nano U3300@1200MHz
stepping        : 10
cpu MHz         : 1196.990
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush acpi mmx fxsr sse sse2 ss tm syscall nx lm constant_tsc up rep_good pni monitor vmx est tm2 ssse3 cx16 xtpr sse4_1 popcnt rng rng_en ace ace_en ace2 phe phe_en pmm pmm_en lahf_lm ida
bogomips        : 2393.98
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

When you wait a while, the F15 Live ISO actually boots into GNOME despite all the modprobe spam, but still keeps trying to load the microcode.ko kernel module after that and thus floods the console with:

microcode: no support for this CPU vendor

This *did not* stop while I was using the Live FS, so I guess this would go on forever, i.e. spawning

/sbin/microcode_ctl -Ou

a few times every second (which… sucks IMHO)

Comment 9 John Ellson 2011-04-26 19:05:22 UTC

re comment#8 -  fc15 ?    Is this with kernel-2.6.38.3-18.fc15 ?   If so this is the first report of this problem with a pre 2.6.39 kernel.

Comment 10 Stefan Krüger 2011-04-26 19:47:00 UTC

(In reply to comment #9)

uname -a reports 2.6.38.2-9.fc15.x86_64

Comment 11 Chuck Ebbert 2011-04-27 01:12:17 UTC

(In reply to comment #8)
> Same problem here with F15 Beta (both x86_64 and i686)
> 
> $ cat /proc/cpuinfo (from F14)
> 
> processor       : 0
> vendor_id       : CentaurHauls
> cpu family      : 6
> model           : 15
> model name      : VIA Nano U3300@1200MHz

I wonder if this is a udev / systemd problem? We have never supported loading microcode for those processors... if microcode loading fails, whatever is launching that should just give up.

Comment 12 Dave Jones 2011-04-27 04:13:36 UTC

I see this (also on a via machine).  I booted with init=/bin/sh and moved /sbin/microcode_ctl out of the way, which at least got me able to boot f15.

after the system has booted, running modprobe microcode, or microcode_ctl -u does not trigger the same loop.  I'm suspicious of the udev rules, though they've been unchanged since f14, so I don't understand yet why they would have broken.

the rules are..

KERNEL=="cpu[0-9]*", ACTION=="add", RUN+="/sbin/modprobe microcode"
KERNEL=="microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"

This is very odd. The second rule should never get run. The modprobe microcode returns -ENODEV before it gets a chance to register anything which would kick off a microcode event.  True enough, looking in /dev/cpu, I see a microcode devnode has been created (though I have no idea why given the -ENODEV).

The reason for the loop is that the microcode_ctl is doing on-demand loading of microcode.ko, which starts the whole thing over again.

continuing to poke at it..

Comment 13 John Ellson 2011-05-13 15:55:39 UTC

Problem still exists in today's updates:
      kernel-2.6.39-0.rc7.git3.0.fc16.x86_64
      microcode_ctl-1.17-14.fc16.x86_64


Re: comment #10  - does that fc15 system use systemd by any chance?    Most fc15 are not using systemd yet, which might explain why there are not more reports against kernel-2.6.38, and which might be suggesting that this is a systemd related problem???

Comment 14 Stefan Krüger 2011-05-16 22:08:52 UTC

(In reply to comment #13)
> Re: comment #10  - does that fc15 system use systemd by any chance?    Most
> fc15 are not using systemd yet, which might explain why there are not more
> reports against kernel-2.6.38, and which might be suggesting that this is a
> systemd related problem???

rpm -qf /sbin/init reports systemd...

Comment 15 Stefan Krüger 2011-05-19 07:00:54 UTC

JFYI Problem still exists in the 15.RC3 Fedora-15-x86_64-Live-Desktop.iso ( http://serverbeach1.fedoraproject.org/pub/alt/stage/15.RC3/Live/x86_64/ ). And with approx. 5 days till release I guess F15 will be unusable for at least VIA CPU owners...

Comment 16 Steve Bennett 2011-05-31 19:52:33 UTC

It doesn't look like anyone has updated this since F15 release...

I've just done a test upgrade from F14->F15 on a Samsung netbook:
> $ head /proc/cpuinfo 
> 
> processor       : 0
> vendor_id       : CentaurHauls
> cpu family      : 6
> model           : 15
> model name      : VIA Nano U2250@1300+MHz

I also see the repeating "error inserting microcode" message.
After 99 seconds I get dumped into a login prompt for single user mode, so it's not too hard to work around.

What works for me: I've uninstalled the microcode_ctl package - just "rpm -e microcode_ctl" - the package doesn't support VIA CPUs anyway (but maybe it should?).

It seems a bit poor that a known bug that appears to be affecting a whole vendor's worth of CPUs isn't in the "known bugs" list. Is this problem more specific than it appears? (I only have the one VIA device).

I've not looked very hard, but a cursory grep through files in /etc/udev shows no reference to microcode_ctl - I don't see where it's being called from.

Steve.

Comment 17 Persona non grata 2011-06-07 18:17:44 UTC

Still not solved? F15 Live XFCE x86_64 boots normally for me, even into graphical mode, but I also seem to be affected by this bug. CPU (VIA Nano U2250) usage is always 100% and microcode tries to load a few times per second. On F14 it also fails, but only once in the first few seconds of booting, and then it gives up and starts normally.

Comment 18 Michal Jaegermann 2011-06-18 18:39:14 UTC

See also bug 690928.

Comment 19 Anton Arapov 2011-06-24 11:45:14 UTC

*** Bug 706611 has been marked as a duplicate of this bug. ***

Comment 20 Anton Arapov 2011-06-24 12:00:29 UTC

Have no hardware to reproduce, but seems to me a kernel space issue described in bug 537697.

Comment 21 Bruno Wolff III 2011-06-24 13:21:43 UTC

It's not just VIA; old AMD CPUs (Athlon MPs) are affected as well.

Comment 22 Persona non grata 2011-06-24 17:05:28 UTC

Huh, maybe if someone changed severity to "Urgent" (yes, this is urgent as it renders distro unusable for lots of CPUs), things would go faster..

Comment 23 Pete Zaitcev 2011-06-24 18:21:04 UTC

I worked around it with rpm -e. It's not like anyone really needs this package, except under some very specific conditions with obsolete Xeons.

Comment 24 John Ellson 2011-06-24 19:29:46 UTC

Personally, I don't think the existence of a workaround that can be implemented by a few experts makes the bug any less urgent.

And my Rawhide kvm virtual hosts are suffering from this on a new AMD 6-core processor, so it's hardly only obsolete hardware that is affected.

Comment 25 Peter Lister 2011-06-28 14:39:57 UTC

I have just encountered this problem on a VIA CN10000 (C7 Esther) on a vanilla Fedora 15 upgrade.  Please, Red Hat, go buy yourselves a VIA machine for testing. They're cheap enough.

I worked round with rpm -e microcode_ctl, but most people can't do that.  This was known about for weeks and had no business being left in F15.

It is an Urgent fix for all of us who have VIAs.

$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 10
model name      : VIA Esther processor 1000MHz
stepping        : 9
cpu MHz         : 800.000
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm up pni est tm2 rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bogomips        : 1596.17
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 32 bits virtual
power management:

Comment 26 Dave Jones 2011-06-28 22:26:25 UTC

I think I found the problem.

/lib/udev/rules.d/89-microcode.rules has this line..

KERNEL=="microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"

The problem is 'microcode' gets matched on this..

KERNEL [1309299660.790342] add      /module/microcode (module)

on a system which doesn't support microcode updates, it gets followed by

KERNEL remove   /module/microcode (module)

but because we matched that add, we fire off a microcode_ctl, which finds the module isn't loaded, and when it tries to open /dev/cpu/microcode, we get another event, and end up looping.

The fix is to change the string to match on to something that only appears when we actually support microcode updates. Like this..

KERNEL=="devices/platform/microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"

Anton ?

Comment 27 Anton Arapov 2011-06-29 06:45:00 UTC

Dave, I'm not a guru in udev rules, but I agree with your thoughts and conclusion. Worse, I'm not able to test or reproduce it on any hardware I have around, and I'm still looking for some. If someone confirms that the changed rule works, I'd be happy to pull it into git.

thanks a lot!

Comment 28 Michal Jaegermann 2011-06-29 08:50:14 UTC

(In reply to comment #20)
> Have no hardware to reproduce, but seems to me a kernel space issue described
> in bug 537697.

I would be quite surprised if these two turn out to be related (but at the moment I am away for a while yet and for now I have no way to check).

Comment 29 John Ellson 2011-06-29 16:27:02 UTC

> KERNEL=="devices/platform/microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"

Works for me.

Tested: kernel-3.0-0.rc5.git0.1.fc16.i686 and kernel-3.0-0.rc5.git0.1.fc16.x86_64
on KVM virtual hosts.


(I had to fully disable selinux to get these kernels to run, but that's a different problem.   I did reverify that the microcode problem still existed with selinux fully disabled.)

Comment 30 Adam Williamson 2011-06-29 17:16:18 UTC

*** Bug 715118 has been marked as a duplicate of this bug. ***

Comment 31 Adam Williamson 2011-06-29 17:16:52 UTC

propose as f16alpha blocker, as dupe was so proposed.

Comment 32 Joshua Covington 2011-06-29 18:28:20 UTC

> 
> KERNEL=="devices/platform/microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"
>

This also fixed the problem for me. According to https://bugzilla.kernel.org/show_bug.cgi?id=35522#c8 the microcode driver has been fixed so that it doesn't load when an unsupported processor family is found but this triggers other events that end up in an endless loop (https://bugzilla.redhat.com/show_bug.cgi?id=690930#c26)

Tested with kernel-3.0-0.rc5.git0.1.fc16.x86_64 on a fully updated f15 (I also had to disable selinux but this is unrelated to this bug).

Comment 33 David Smith 2011-06-29 19:01:41 UTC

> KERNEL=="devices/platform/microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"

Works for me also in my 3.0-0.rc5.git0.1.fc16.x86_64 vm.  That kernel wouldn't even boot for me without this fix.

Comment 34 Dave Jones 2011-06-29 19:42:34 UTC

*** Bug 717723 has been marked as a duplicate of this bug. ***

Comment 35 Fedora Update System 2011-06-30 06:04:30 UTC

microcode_ctl-1.17-15.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/microcode_ctl-1.17-15.fc15

Comment 36 Anton Arapov 2011-06-30 06:06:18 UTC

rawhide build: http://koji.fedoraproject.org/koji/taskinfo?taskID=3171088
fedora15 build: http://koji.fedoraproject.org/koji/taskinfo?taskID=3171092

Thank you, Dave.

Comment 37 Horst H. von Brand 2011-06-30 18:39:44 UTC

With microcode_ctl-1.17-15.fc16.x86_64 the kernel 3.0-0.rc5.git0.1.fc16.x86_64 booted fine on an old Athlon MP processor (family 15).

Thanks all.

Comment 38 Fedora Update System 2011-06-30 18:56:31 UTC

Package microcode_ctl-1.17-15.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing microcode_ctl-1.17-15.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/microcode_ctl-1.17-15.fc15
then log in and leave karma (feedback).

Comment 39 Fedora Update System 2011-07-01 19:01:16 UTC

microcode_ctl-1.17-15.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 40 Peter Lister 2011-07-04 16:08:37 UTC

Tried microcode_ctl-1.17-15.fc15 on VIA C7

I now get one message "no support for this CPU vendor" and boot as normal.

Thanks for the fix.

Comment 41 Anton Arapov 2011-07-21 10:46:04 UTC

It seems this update breaks the microcode update routine for Intel CPUs.
The updated udev rule doesn't fix the problem; it just doesn't work.

See bug 722785

Comment 42 Adam Williamson 2011-07-21 16:14:58 UTC

that's a different bug, so it was correct for it to be filed separately. this bug is fixed. the fix caused a new bug.

Comment 43 Dave Jones 2011-07-21 20:04:40 UTC

I'm not sure reverting that change as an update and reintroducing a "won't boot" situation was a good choice, as opposed to just fixing 722785.

If that update goes to stable, we're just going to reopen this one, and not booting is way more serious than microcode not being updated.

Comment 44 Dave Jones 2011-07-21 20:09:39 UTC

adding Kay to the Cc, who may spot the right fix.

Kay, for some reason the rule in comment #26 isn't getting matched, even though we see an event..

KERNEL[1311278368.768885] add      /devices/platform/microcode (platform)

any ideas ?

Comment 45 Kay Sievers 2011-07-21 20:46:03 UTC

The KERNEL== rules key expects a simple file name, the name of the device,
not a path to the device. The rule match should probably be limited by adding SUBSYSTEM=="platform", which will match the /sys/devices/platform/microcode
device.

The module event has SUBSYSTEM=="module" so it should not match this rule.
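
The matching behaviour described in this comment can be sketched in a few lines of Python. This is only a toy model, not udev's implementation: `Event`, `rule_matches` and the dictionary-based rule format are hypothetical names invented for illustration, and the sketch assumes that KERNEL is glob-matched against the last component of the devpath while SUBSYSTEM and ACTION are matched against the event fields as-is.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from posixpath import basename


@dataclass(frozen=True)
class Event:
    """A simplified uevent: only the fields the rules in this bug look at."""
    action: str
    devpath: str
    subsystem: str


def rule_matches(rule: dict, event: Event) -> bool:
    """Return True if every key of the (hypothetical) rule dict matches."""
    values = {
        # Assumption: KERNEL sees the device name, not the full path.
        "KERNEL": basename(event.devpath),
        "SUBSYSTEM": event.subsystem,
        "ACTION": event.action,
    }
    return all(fnmatchcase(values[key], pattern) for key, pattern in rule.items())


# The two "add" events seen in this bug.
module_add = Event("add", "/module/microcode", "module")
platform_add = Event("add", "/devices/platform/microcode", "platform")

# The three rule variants discussed in the thread.
original = {"KERNEL": "microcode", "ACTION": "add"}
path_style = {"KERNEL": "devices/platform/microcode", "ACTION": "add"}
with_subsystem = {"KERNEL": "microcode", "SUBSYSTEM": "platform", "ACTION": "add"}

for name, rule in [("original", original),
                   ("path_style", path_style),
                   ("with_subsystem", with_subsystem)]:
    print(name, rule_matches(rule, module_add), rule_matches(rule, platform_add))
```

Under these assumptions the original rule matches both events, the path-style rule matches neither (which would explain why it stopped the loop but also stopped Intel updates), and only the rule with SUBSYSTEM=="platform" singles out the platform device.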

Comment 46 Dave Jones 2011-07-21 21:09:54 UTC

thanks. indeed, this rule seems to do the right thing on intel for me..

KERNEL=="microcode", SUBSYSTEM=="platform", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu" 

[30221.936924] microcode: CPU0 sig=0x1067a, pf=0x80, revision=0xa04
[30221.954265] microcode: CPU1 sig=0x1067a, pf=0x80, revision=0xa04
[30221.975521] microcode: Microcode Update Driver: v2.00 <tigran.co.uk>, Peter Oruba
[30222.058081] microcode: CPU0 updated to revision 0xa0b, date = 2010-09-28
[30222.067079] microcode: CPU1 updated to revision 0xa0b, date = 2010-09-28

Comment 47 Joshua Covington 2011-07-21 21:44:48 UTC

Modifying the rule as in comment #46 again resulted in endless microcode_ctl loops complaining that "CPU architecture 15 not supported". In the end I couldn't boot. This is not a solution for the unsupported AMD CPUs.

Comment 48 Dave Jones 2011-07-22 02:50:46 UTC

ok, it looks like AMD uses a completely different mechanism, and doesn't need the microcode_ctl invocation at all.

this might need kernel changes.  Kay, should arch/x86/kernel/microcode_core.c only be registering the platform device for Intel ? It looks like the request_firmware stuff that the microcode_amd.c code uses would work without that unless I'm missing something..

In fact, it should skip it on everything but Intel afaics.

Comment 49 Joshua Covington 2011-07-22 05:38:51 UTC

Just an idea: can microcode_ctl somehow ask and see that the microcode wasn't loaded because the machine doesn't support updates, so that it can just give up?

Comment 50 Borislav Petkov 2011-07-22 07:15:05 UTC

Right,

so from looking at the kernel side, microcode_ctl should be getting
-EINVAL from the ->write method of the /dev/cpu/microcode node and
microcode_ctl should handle that correctly (in theory).

But, Intel wanted to move to the request_firmware method long time
ago: http://lwn.net/Articles/189377/ and it seems they do support
both methods now for loading ucode. Assuming I'm not missing some
peculiarity with Intel's ucode loading, switching Fedora to simply

modprobe microcode

and dropping the microcode_ctl method should fix all issues, no?

Comment 51 Kay Sievers 2011-07-22 11:12:36 UTC

As far as I know, both Intel and AMD use the request_firmware()
interface from inside the kernel now and do not need an external
tool to load the microcode:

const char *fw_name = "amd-ucode/microcode_amd.bin";
if (request_firmware(&fw, fw_name, device))

sprintf(name, "intel-ucode/%02x-%02x-%02x",
  c->x86, c->x86_model, c->x86_mask);
if (request_firmware(&firmware, name, device))

I don't know all the details, or whether there are cases where the tool
is still useful, but I have at least seen the firmware requests
happening on Intel boxes.

I think part of the problem was suspend, where the microcode needs
to be re-applied and which should be simpler to hook up with
request_firmware than with an existing device.
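
As an illustration of the "intel-ucode/%02x-%02x-%02x" naming quoted above, here is a minimal Python sketch that derives the file name from a CPUID signature such as the sig=0x1067a printed in comment 46. The function name `intel_ucode_name` is hypothetical, and the family/model decoding is an assumption based on the usual x86 convention (extended family added when the base family is 0xF, extended model added for family 6 and above); it is not copied from the kernel source.

```python
def intel_ucode_name(signature: int) -> str:
    """Build an intel-ucode/FF-MM-SS style name from a CPUID signature.

    Assumed bit layout: stepping in bits 0-3, model in bits 4-7,
    family in bits 8-11, extended model in bits 16-19 and
    extended family in bits 20-27.
    """
    stepping = signature & 0xF
    model = (signature >> 4) & 0xF
    family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    if family == 0xF:
        family += ext_family
    if family >= 0x6:
        model += ext_model << 4

    return f"intel-ucode/{family:02x}-{model:02x}-{stepping:02x}"


print(intel_ucode_name(0x1067A))  # intel-ucode/06-17-0a
print(intel_ucode_name(0xF0A))    # intel-ucode/0f-00-0a
```

The same decoding applied to the signatures 0xf0a and 0xf25 gives 0f-00-0a and 0f-02-05, which is how such files are usually named.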

Comment 52 Dave Jones 2011-07-22 16:58:45 UTC

AIUI, Intel uses request_firmware for intel-ucode/ style updates, and the old microcode_ctl tool for using microcode.dat.  The problem is afaik they've never issued any intel-ucode/ updates, so we still need to support the old method there.

Comment 53 Tim Flink 2011-07-22 21:54:52 UTC

Discussed at the 2011-07-22 blocker bug review meeting. We agreed that the ability for AMD systems to boot takes precedence over microcode updates for intel chips and accepted it as a Fedora 16 alpha blocker under the following alpha release criteria:

When booting a system installed without a graphical environment, or when using a correct configuration setting to cause an installed system to boot in non-graphical mode, the system should boot to a state where it is possible to log in through at least one of the default virtual consoles.

If no fix for both issues is available for the alpha release, please re-apply the broken fix in order for AMD systems to boot.

Comment 54 John Ellson 2011-07-23 18:30:28 UTC

Damn!      

Yes - make AMD systems bootable again!!!!!


Some update (presumably microcode_ctl) just bricked my two Rawhide virtual hosts, and I no longer have the kernel on the systems that gave me a way to boot without tickling the bug!.    This bug was supposedly fixed!!!


This was a known problem!   Why was it reintroduced without testing!   Doesn't anyone test anything any more before releasing on the world?


Now I have to reinstall from scratch .... but I guess there is no point for a while because Rawhide DOESN'T FUCKING BOOT!

Comment 55 Kay Sievers 2011-07-23 18:41:18 UTC

I guess we need to find out what's really going on here.

It might be that the driver refuses to stay in the kernel at
modprobe time, but that microcode_ctl, which is about to
run, is accessing some device node and triggers the
autoload of the module again and again, when called from
the event.

Can someone please run:
  udevadm monitor
and modprobe the microcode module, and paste the output
of the looping events.

Please add:
  KERNEL=="microcode", SUBSYSTEM=="platform"
to the rules file:
  /lib/udev/rules.d/89-microcode.rules

Please make sure that there is only a single rule and not other
file trying the same, or some left-over from older packages:
  grep microcode_ctl /lib/udev/rules.d/*.rules /etc/udev/rules.d/*.rules

Comment 56 Kay Sievers 2011-07-23 18:44:12 UTC

(In reply to comment #54)
> Some update (presumably microcode_ctl) just bricked my two Rawhide virtual
> hosts, and I no longer have the kernel on the systems that gave me a way to
> boot without tickling the bug!.
> Now I have to reinstall from scratch .... but I guess there is no point for a
> while because Rawhide DOESN'T FUCKING BOOT!

Until it's fixed, just remove the microcode package from the VM, there
is nothing really to update in a VM anyway.

Or comment out the rule in:
  /lib/udev/rules.d/89-microcode.rules

Comment 57 Joshua Covington 2011-07-23 22:38:38 UTC

(In reply to comment #55)
> I guess we need to find out what's really going on here.
> 
> It might be that the driver refuses to stay in the kernel at
> modprobe time, but that microcode_ctl, which is about to
> run, is accessing some device node and triggers the
> autoload of the module again and again, when called from
> the event.
> 
> Can someone please run:
>   udevadm monitor
> and modprobe the microcode module, and paste the output
> of the looping events.
> 
> Please add:
>   KERNEL=="microcode", SUBSYSTEM=="platform"
> to the rules file:
>   /lib/udev/rules.d/89-microcode.rules
> 
> Please make sure that there is only a single rule and not other
> file trying the same, or some left-over from older packages:
>   grep microcode_ctl /lib/udev/rules.d/*.rules /etc/udev/rules.d/*.rules

This is the result (this repeats all the time, so this is just a cut off of it):

UDEV  [1311460575.956189] remove   /module/microcode (module)
UDEV  [1311460575.957040] add      /devices/platform/microcode (platform)
UDEV  [1311460575.961022] remove   /devices/platform/microcode (platform)
KERNEL[1311460576.039548] add      /module/microcode (module)
KERNEL[1311460576.042260] add      /devices/platform/microcode (platform)
UDEV  [1311460576.042779] add      /module/microcode (module)
KERNEL[1311460576.044529] remove   /devices/platform/microcode (platform)
KERNEL[1311460576.048731] remove   /module/microcode (module)
UDEV  [1311460576.051100] remove   /module/microcode (module)
UDEV  [1311460576.057081] add      /devices/platform/microcode (platform)
UDEV  [1311460576.061156] remove   /devices/platform/microcode (platform)
KERNEL[1311460576.141368] add      /module/microcode (module)
KERNEL[1311460576.142321] add      /devices/platform/microcode (platform)
KERNEL[1311460576.143501] remove   /devices/platform/microcode (platform)
UDEV  [1311460576.145257] add      /module/microcode (module)
KERNEL[1311460576.150730] remove   /module/microcode (module)
UDEV  [1311460576.153918] remove   /module/microcode (module)
UDEV  [1311460576.156773] add      /devices/platform/microcode (platform)
UDEV  [1311460576.160349] remove   /devices/platform/microcode (platform)
KERNEL[1311460576.238278] add      /module/microcode (module)
KERNEL[1311460576.240271] add      /devices/platform/microcode (platform)
UDEV  [1311460576.241964] add      /module/microcode (module)
KERNEL[1311460576.243261] remove   /devices/platform/microcode (platform)
KERNEL[1311460576.246708] remove   /module/microcode (module)
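
For anyone who wants to quantify a loop like this from a saved `udevadm monitor` capture, the following Python sketch parses lines of the shape shown above and counts how often the kernel announced a fresh load of the module. The names `parse_monitor` and `count_module_loads` are hypothetical, and the regular expression is an assumption derived only from the lines pasted in this comment.

```python
import re

# Assumed line shape: SOURCE[timestamp] action devpath (subsystem)
LINE_RE = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<time>\d+\.\d+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)$"
)


def parse_monitor(text: str) -> list:
    """Return one dict per recognised event line; other lines are skipped."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def count_module_loads(events: list, module: str = "microcode") -> int:
    """Count kernel-side 'add' events for /module/<module>."""
    return sum(
        1
        for event in events
        if event["source"] == "KERNEL"
        and event["action"] == "add"
        and event["devpath"] == f"/module/{module}"
    )


# A few lines taken from the capture in this comment.
SAMPLE = """\
KERNEL[1311460576.039548] add      /module/microcode (module)
KERNEL[1311460576.042260] add      /devices/platform/microcode (platform)
UDEV  [1311460576.042779] add      /module/microcode (module)
KERNEL[1311460576.044529] remove   /devices/platform/microcode (platform)
KERNEL[1311460576.048731] remove   /module/microcode (module)
KERNEL[1311460576.141368] add      /module/microcode (module)
"""

events = parse_monitor(SAMPLE)
print(len(events), "events,", count_module_loads(events), "module loads")
```

In the sample, the two kernel-side module loads are roughly a tenth of a second apart, which fits the "a few times every second" observation in comment 8.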

Comment 58 Kay Sievers 2011-07-24 16:49:49 UTC

Thanks! I guess what happens is:
 modprobe microcode
  platform_device_register_simple("microcode");
   sysdev_driver_register(.., &mc_sysdev_driver);
    mc_sysdev_add();
     microcode_init_cpu(cpu);
      collect_cpu_info(cpu);
       collect_cpu_info_amd(cpu, ...)
        --> not supported
         platform_device_unregister(microcode_pdev);
          module init fails

As soon as the microcode platform_device_register() happens, we run
microcode_ctl(8) from udev rules. At that point the microcode module
is likely already gone from the kernel, because it has failed to
load.

The executed microcode_ctl(8) accesses /dev/cpu/microcode, which
triggers the kernel module auto-loader to run 'modprobe microcode',
and the loop starts.

The platform device is needed as the parent device for the
request_firmware() calls also for AMD CPUs, so we can not really
remove it.

I see a couple of options:

1. Ask Intel to provide ucode files, or come up with a 'script' that
splits the microcode.dat into properly named ucode files.

2. If 1. isn't realistic, we can teach the kernel's Intel microcode
driver to request the microcode.dat file via firmware request, parse
it itself, like it is doing the ucode files, instead of relying on
microcode_ctl to do that.

For 1. + 2. we could remove the entire /dev/cpu/microcode infrastructure
from the kernel, and get rid of the microcode_ctl package that way.

3. We send out a KOBJ_CHANGE event when the microcode CPU driver gets
bound to the CPU, and hook udev into that event instead of the
platform device. More like a hack, but could work.

4. The .collect_cpu_info from 'struct microcode_ops' is able to tell
if a microcode update is possible for a specific CPU. Maybe we
should call it once for CPU0, before the platform device is registered?
Sub-optimal, we should probably not rely on anything in a udev rule,
that is created during modprobe of the same module.
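
The feedback loop and the effect of option 4 can be mimicked with a small toy model in Python. It is not kernel or udev code: `count_modprobes` and its parameters are hypothetical, the iteration cap exists only so the sketch terminates, and the model assumes exactly the sequence described in this comment (module add event, optional platform device add event, rule fires, microcode_ctl opens /dev/cpu/microcode, autoloader runs modprobe again if the module is gone).

```python
from typing import Optional


def count_modprobes(cpu_supported: bool,
                    rule_subsystem: Optional[str],
                    check_before_register: bool = False,
                    cap: int = 5) -> int:
    """Count 'modprobe microcode' runs until things settle or the cap is hit.

    rule_subsystem is None for a rule without a SUBSYSTEM match,
    or "platform" for the rule from comment 46.
    """
    runs = 0
    modprobe_pending = True  # the first modprobe comes from the cpu rule
    while modprobe_pending and runs < cap:
        runs += 1
        modprobe_pending = False

        # Events emitted while the module initialises.
        events = [("add", "module")]
        if cpu_supported or not check_before_register:
            # Without the early check, the platform device is registered
            # (and announced) before the CPU turns out to be unsupported.
            events.append(("add", "platform"))

        module_stays_loaded = cpu_supported
        rule_fired = any(
            action == "add" and rule_subsystem in (None, subsystem)
            for action, subsystem in events
        )
        # microcode_ctl opens /dev/cpu/microcode; if the module is gone,
        # the autoloader runs modprobe again.
        if rule_fired and not module_stays_loaded:
            modprobe_pending = True
    return runs


print(count_modprobes(True, "platform"))         # supported CPU
print(count_modprobes(False, "platform"))        # unsupported CPU, loops to cap
print(count_modprobes(False, "platform", True))  # unsupported CPU, option 4
```

In this model the loop only stops on an unsupported CPU when the early check is combined with the SUBSYSTEM=="platform" rule; with a rule lacking the SUBSYSTEM match, the module add event alone keeps the cycle going.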

Comment 59 Michal Jaegermann 2011-07-24 17:58:50 UTC

*** Bug 690928 has been marked as a duplicate of this bug. ***

Comment 60 Kay Sievers 2011-07-24 23:03:27 UTC

Created attachment 514937 [details]
intel microcode.dat -> ucode converter

Here is a small program that converts microcode.dat to individual
intel-ucode/XX-XX-XX files. I didn't test anything, just looked
at the output of the program. Just in case someone wants to play
with it ...

$ gcc -Wall -o intel-microcode2ucode intel-microcode2ucode.c
$ rm intel-ucode/*; ./intel-microcode2ucode
/lib/firmware/microcode.dat: 432128(422k) bytes, 108032 integers

intel-ucode/0f-00-0a
signature: 0xf0a
flags:     0x02
revision:  0x15
date:      2002-08-21
size:      2048

intel-ucode/0f-02-05
signature: 0xf25
flags:     0x10
revision:  0x2c
date:      2004-08-26
size:      2048
...

$ ls intel-ucode/
06-03-02  06-06-0d  06-08-0a  06-0d-01  06-0f-0b  0f-03-02  0f-06-02
06-05-00  06-07-01  06-09-05  06-0d-06  06-0f-0d  0f-03-03  0f-06-04
06-05-01  06-07-02  06-0a-00  06-0e-04  0f-00-07  0f-03-04  0f-06-05
...
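
The fields printed by the converter can be illustrated with a short Python sketch of the header decoding. This is not the attached C program. The function names are hypothetical, and the layout is an assumption based on the commonly documented Intel update header: 32-bit words in the order header version, revision, date, signature, checksum, loader revision, processor flags, data size, total size, followed by three reserved words, with the date stored as BCD 0xMMDDYYYY and a total size of 2048 bytes implied when the data size is 0. The sample header is synthetic: it was built to reproduce the values shown for 0f-00-0a above, and its checksum word is a placeholder.

```python
import re

HEADER_WORDS = 12          # assumed 48-byte header
DEFAULT_TOTAL_SIZE = 2048  # assumed size when the data size field is 0


def parse_dat_words(text: str) -> list:
    """Extract hex words from microcode.dat-style text, skipping comment lines."""
    words = []
    for line in text.splitlines():
        if line.lstrip().startswith("/"):
            continue
        words.extend(int(token, 16) for token in re.findall(r"0x[0-9a-fA-F]+", line))
    return words


def decode_header(words: list) -> dict:
    """Decode the leading header words of one update (assumed layout)."""
    _hdrver, revision, date, signature, _cksum, _ldrver, flags, datasize, totalsize = words[:9]
    month, day, year = (date >> 24) & 0xFF, (date >> 16) & 0xFF, date & 0xFFFF
    return {
        "signature": signature,
        "flags": flags,
        "revision": revision,
        "date": f"{year:04x}-{month:02x}-{day:02x}",  # BCD digits print as hex
        "size": totalsize if datasize else DEFAULT_TOTAL_SIZE,
    }


def iter_headers(words: list):
    """Yield the header of every update found in a flat list of words."""
    offset = 0
    while offset + HEADER_WORDS <= len(words):
        header = decode_header(words[offset:offset + HEADER_WORDS])
        yield header
        if header["size"] < HEADER_WORDS * 4:
            break  # malformed size, stop rather than loop forever
        offset += header["size"] // 4


# Synthetic header; the checksum (fifth word) is a placeholder.
SAMPLE = """\
/* synthetic example header */
0x00000001, 0x00000015, 0x08212002, 0x00000f0a,
0x00000000, 0x00000001, 0x00000002, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000,
"""

for header in iter_headers(parse_dat_words(SAMPLE)):
    print(header)
```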

Comment 61 Joshua Covington 2011-07-28 21:15:28 UTC

Any developments here?

Comment 62 Anton Arapov 2011-07-29 09:30:35 UTC

I still haven't had a chance to look into it. :( But in parallel I'm checking with Intel whether there is any development towards use of the linux-firmware infrastructure. I know they have it in their plans...

Comment 63 Joshua Covington 2011-07-29 11:46:03 UTC

I see. Can you please in the meantime re-apply the broken fix (just for f16) so that the AMD machines can test the nightly builds? This bug is tagged as f16blocker.

Comment 64 Anton Arapov 2011-07-29 12:21:46 UTC

(In reply to comment #63)
Yeah, apologies. The comment is valid. I've just "re-"reverted the changes.

Comment 65 Joshua Covington 2011-07-29 12:50:42 UTC

Thanks, I just saw that f15 is also moving to the 2.6.40 kernel. So maybe you should port back this to f15?

Comment 66 Peter Lister 2011-07-29 17:19:31 UTC

As the current workaround for non-Intel machines is to just delete the RPM, may I please request one of the following: that yum only installs the RPM on machines which are known Genuine Intel; or that there is absolutely no possibility whatsoever of the rules being installed on a non-Intel system; or that the code checks the processor type first and bails out if it doesn't recognise the EXACT vendor / model etc.

Don't rely on udev rules until this is known fixed AND TESTED on every known non-Intel x86. 

Please don't tell AMD and VIA users to just remove it - it is NOT our fault that we chose another processor vendor. If the package is this brittle and the bug has consequences this bad, there must be no way it can run.
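
The "check the processor type first and bail out" idea could look roughly like the following Python sketch. It is only an illustration of the request, not what the package does: `cpu_vendor` and `should_run_microcode_ctl` are hypothetical names, and treating GenuineIntel as the only vendor that needs microcode_ctl is an assumption taken from the discussion in this thread.

```python
from typing import Optional


def cpu_vendor(cpuinfo_text: str) -> Optional[str]:
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def should_run_microcode_ctl(cpuinfo_text: str,
                             supported=("GenuineIntel",)) -> bool:
    """Bail out unless the vendor is on the (assumed) supported list."""
    return cpu_vendor(cpuinfo_text) in supported


# vendor_id lines as reported earlier in this bug, plus an Intel one.
amd = "processor       : 0\nvendor_id       : AuthenticAMD\ncpu family      : 15\n"
via = "processor       : 0\nvendor_id       : CentaurHauls\ncpu family      : 6\n"
intel = "processor\t: 0\nvendor_id\t: GenuineIntel\ncpu family\t: 6\n"

for text in (amd, via, intel):
    print(cpu_vendor(text), should_run_microcode_ctl(text))
```

On a live system the text would come from reading /proc/cpuinfo; an unknown or missing vendor_id makes the check fail closed, so nothing is run.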

Comment 67 Dave Jones 2011-07-29 17:56:30 UTC

The ideal way forward, as I see it, is that until Intel provide split-up firmware files, we use the converter Kay provided in comment #60 in the microcode_ctl package to split them up ourselves in %post, and then remove the microcode_ctl binary completely.

However.. I just tried this, and although the microcode module gets loaded, it doesn't seem to do the request_firmware call, so we might need some kernel changes too.

Even hitting the sysfs knobs at /sys/devices/system/cpu/cpu*/microcode/reload doesn't seem to cause a fw request to happen.  Kay ?

Comment 68 Michal Jaegermann 2011-07-29 23:13:25 UTC

(In reply to comment #67)
> 
> However.. I just tried this, and although the microcode module gets loaded, it
> doesn't seem to do the request_firmware call, so we might need some kernel
> changes too.

AFAICS, adding the 'microcode' module to the modprobe blacklist seems to provide a workaround.  Maybe there is a reasonable way to recognize on which platforms
/etc/modprobe.d/blacklist-microcode.conf would surely not be needed?  Automatic removal of such a file, where such an action is sane, would possibly provide a way to tide us over the current mess.

Yes, I realize that this is not the right way to deal with the issue in the long run.

Comment 69 Edward Sheldrake 2011-07-30 08:18:08 UTC

(In reply to comment #67)
> However.. I just tried this, and although the microcode module gets loaded, it
> doesn't seem to do the request_firmware call, so we might need some kernel
> changes too.
> 
The Fedora kernel has the neuter_intel_microcode_load.patch which disables the intel new style firmware loading (dating back to some bug causing a long delay on missing firmware files).

I'm not sure the intel-microcode2ucode program is quite correct, because I've been successfully using a "vanilla" kernel and a ucode file I converted manually named "06-17-0a", but intel-microcode2ucode doesn't create any file with that name.

microcode: CPU0 sig=0x1067a, pf=0x80, revision=0xa07
microcode: CPU0 updated to revision 0xa0b, date = 2010-09-28

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     T6400  @ 2.00GHz
stepping	: 10

Comment 70 Borislav Petkov 2011-07-30 10:48:10 UTC

(In reply to comment #69)
> (In reply to comment #67)
> > However.. I just tried this, and although the microcode module gets loaded, it
> > doesn't seem to do the request_firmware call, so we might need some kernel
> > changes too.
> > 
> The Fedora kernel has the neuter_intel_microcode_load.patch which disables the
> intel new style firmware loading (dating back to some bug causing a long delay
> on missing firmware files).

That can be "fixed" by allowing the microcode driver to be built only as
a module so that there's userspace when it loads so that it can look for
ucode patches in /lib/firmware/...

> I'm not sure the intel-microcode2ucode program is quite correct, because I've
> been successfully using a "vanilla" kernel and a ucode file I converted
> manually named "06-17-0a", but intel-microcode2ucode doesn't create any file
> with that name.
> 
> microcode: CPU0 sig=0x1067a, pf=0x80, revision=0xa07
> microcode: CPU0 updated to revision 0xa0b, date = 2010-09-28
> 
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model  : 23
> model name : Intel(R) Core(TM)2 Duo CPU     T6400  @ 2.00GHz
> stepping : 10

So, provided you have split the ucode image files properly,
you should be able to load those with microcode.ko using the
request_firmware interface (after having backed out the neuter... patch,
of course).

Comment 71 Kay Sievers 2011-07-30 13:42:34 UTC

(In reply to comment #69)
> I'm not sure the intel-microcode2ucode program is quite correct, because I've
> been successfully using a "vanilla" kernel and a ucode file I converted
> manually named "06-17-0a", but intel-microcode2ucode doesn't create any file
> with that name.
> 
> microcode: CPU0 sig=0x1067a, pf=0x80, revision=0xa07
> microcode: CPU0 updated to revision 0xa0b, date = 2010-09-28

Ah, that's useful information. Thanks!

The signature is in the microcode.dat file:
  intel-ucode/06-07-0a
  signature: 0x1067a
  flags:     0xa0
  revision:  0xa0b
  date:      2010-09-28
  size:      8192


Maybe the reversing of the signature in the headers of the ucode sections
to $family-$model-$stepping needs a fix. I'll look into that.

Comment 72 Kay Sievers 2011-07-30 14:13:22 UTC

Created attachment 515975 [details]
intel microcode.dat -> ucode converter

I missed adding the extended bits, which add to the model for some CPUs.
Updated tool attached.
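
For reference, the mapping from a signature to the $family-$model-$stepping file name can be sketched as follows. This is a minimal Python illustration and not the attached converter; the function name is made up, and the decoding assumes the usual CPUID convention, in which the extended family is added only for family 0xf and the extended model is folded in only for families 6 and 0xf.

```python
def ucode_filename(signature: int) -> str:
    """Map a CPUID signature to a family-model-stepping name (hypothetical helper)."""
    stepping = signature & 0xF
    model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    family = base_family
    # The extended family is only added when the base family is 0xf.
    if base_family == 0xF:
        family += ext_family
    # The extended model supplies the high nibble for families 6 and 0xf.
    if base_family in (0x6, 0xF):
        model |= ext_model << 4

    return f"{family:02x}-{model:02x}-{stepping:02x}"


if __name__ == "__main__":
    for sig in (0xF25, 0x1067A, 0x10676):
        print(hex(sig), "->", ucode_filename(sig))
```

With the extended model bits included, signature 0x1067a maps to 06-17-0a, the name Edward reported in comment #69, rather than 06-07-0a.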

Comment 73 Kay Sievers 2011-07-30 14:50:54 UTC

(In reply to comment #67)
> The ideal way forward as I see it, is that until Intel provide split-up
> firmware files, we use the converter Kay provided in comment #60 in the
> microcode_ctl package to split them up ourself in %post, and then remove the
> microcode_ctl binary completely.

Sounds good to me. We should check a few boxes where microcode_ctl is known
to be able to update the CPUs. It seems none of the boxes I have at home need that.

> However.. I just tried this, and although the microcode module gets loaded, it
> doesn't seem to do the request_firmware call, so we might need some kernel
> changes too.
> 
> Even hitting the sysfs knobs at /sys/devices/system/cpu/cpu*/microcode/reload
> don't seem to cause a fw request to happen.  Kay ?

I can confirm, like Edward mentioned, that a kernel.org kernel works
fine when loading microcode.ko or writing to
/sys/devices/system/cpu/cpu0/microcode/reload. Both trigger the ucode
firmware requests.
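
For anyone who wants to repeat this test from a script, the reload can be triggered roughly as below. This is a Python sketch under stated assumptions: it uses the per-CPU sysfs path quoted above, it assumes that writing "1" is what triggers the reload (as with `echo 1 > .../reload`), and it needs root. The function name and the `cpu_root` parameter are illustrative.

```python
from pathlib import Path

# Per-CPU reload knobs, relative to the CPU sysfs directory.
RELOAD_GLOB = "cpu[0-9]*/microcode/reload"


def trigger_reload(cpu_root: str = "/sys/devices/system/cpu") -> list:
    """Write "1" to every per-CPU microcode reload knob found under cpu_root."""
    written = []
    for knob in sorted(Path(cpu_root).glob(RELOAD_GLOB)):
        knob.write_text("1")
        written.append(str(knob))
    return written


if __name__ == "__main__":
    try:
        paths = trigger_reload()
    except OSError as err:
        print("could not trigger reload (root required?):", err)
    else:
        for path in paths:
            print("requested reload via", path)
        if not paths:
            print("no per-CPU reload knobs found")
```

Whether a firmware request then shows up still depends on the kernel, as discussed above for kernels carrying neuter_intel_microcode_load.patch.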

Comment 74 Joshua Covington 2011-07-31 19:35:01 UTC

(In reply to comment #64)
> (In reply to comment #63)
> Yeah, apologizes. Comment is valid. I've just "re-"reverted the changes.

Please revert the changes also in fc15. kernel-2.6.40-4 has reached the update-testing repo and as you can see some testers have been confronted with the same error: https://admin.fedoraproject.org/updates/kernel-2.6.40-4.fc15

When the kernel reaches the public this will turn into a long topic in the mailing lists and the forums.

Comment 75 Ernesto Manríquez 2011-08-02 02:45:00 UTC

*** Bug 727391 has been marked as a duplicate of this bug. ***

Comment 76 Ernesto Manríquez 2011-08-02 02:48:25 UTC

This has reached the public, and I'm crashing into this. I filed bug 727391 as a dupe of this, but for Fedora 15. Please change the priority to URGENT and fix it ASAP for Fedora 15, especially if the fix is a mere udev rule line.

Comment 77 Ernesto Manríquez 2011-08-02 12:56:36 UTC

Bug 727391 fixed! Discussion can continue :)

Comment 78 Adam Williamson 2011-08-02 18:07:18 UTC

joshua: the changes and reversions and re-reversions happen in microcode_ctl package, not kernel package.

the initial fix - which made booting okay on AMD, but broke microcode updates on Intel - happened in 1.17-15.fc15. That was reverted in 1.17-16.fc15, which fixed microcode updates on Intel but broke booting on AMD again. That was *re*-reverted - i.e. the 'fix' was put back - in 1.17-17.fc15, which again makes booting okay on both vendors' CPUs, but stops anyone getting microcode updates for now.

The kernel package is not involved at all, so all that discussion on the kernel update is out of place.

tl;dr summary: you want microcode_ctl-1.17-17.fc15 if you're on AMD.

Comment 79 Joshua Covington 2011-08-02 18:17:15 UTC

My comment on the update states clearly:

joshuacov - 2011-07-31 09:43:54
No problems so far. 

If you accuse me of putting all those "Anonymous Tester" comments, I DEMAND A PUBLIC APOLOGY.

Here I just asked for the broken microcode_ctl to be pushed for f15 as well, nothing else.

I really have no idea what you meant with:

>
>The kernel package is not involved at all, so all that discussion on the kernel
>update is out of place.
>
>tl;dr summary: you want microcode_ctl-1.17-17.fc15 if you're on AMD.

I'm just waiting for a solution like everybody else. Period.

Comment 80 Adam Williamson 2011-08-02 18:35:11 UTC

joshua: er, I did not accuse you of anything. You linked to the kernel update and pointed out the discussion there, so I clarified that the important update is microcode_ctl and the kernel doesn't really have anything to do with it. I made no suggestion as to the source of any of the discussion on the kernel update thread, simply mentioned that it was out of place.

Comment 81 Edward Sheldrake 2011-08-02 18:43:18 UTC

If we want to switch to the new-style Intel firmware loading method, neuter_intel_microcode_load.patch needs to be dropped from the kernel.

Comment 82 Dave Jones 2011-08-02 19:21:00 UTC

thanks for the reminder Edward. I'll drop that in the next kernel update if we can agree that shipping the broken out files is what we're going to do.

Anton ?

Comment 83 Anton Arapov 2011-08-03 12:43:56 UTC

  Dave, I'm not sure what "broken out" files means... But I've tested the kernel with the patch reverted, and it works flawlessly. I'd rather retire microcode_ctl and switch to linux-firmware, but I'm not sure what the correct way to maintain the latest microcode would be then. I can take the maintenance though...

Comment 84 Joshua Covington 2011-08-03 14:08:31 UTC

What about the option in Comment #67:
>The ideal way forward as I see it, is that until Intel provide split-up
>firmware files, we use the converter Kay provided in comment #60 in the
>microcode_ctl package to split them up ourself in %post, and then remove the
>microcode_ctl binary completely.

As Dave suggested, maybe microcode_ctl should just deliver the microcode and break it down into the ucode files. The rest is done in the kernel itself. The binary is not needed then.

Comment 85 Anton Arapov 2011-08-03 19:37:44 UTC

Works for me as a temporary workaround, especially within the f16 time-frame.

Comment 86 Dave Jones 2011-08-03 21:45:47 UTC

ok. let's do this in f16/, get it right there, and then we can fix it up in f15 once we know it works.

If you sort out microcode_ctl, I'll drop that diff from the kernel once I see your build show up.  (AIUI, if I drop it now, we introduce a different bug where we stall during boot.. ugh what a mess).

Comment 87 Anton Arapov 2011-08-04 08:14:02 UTC

build for rawhide:
  http://koji.fedoraproject.org/koji/taskinfo?taskID=3251457

Take a look, test, if no objections or problems I will push it to f16/ .
Do we want this change in f15/ as well?

Comment 88 Anton Arapov 2011-08-04 08:23:29 UTC

I've tested the package with the kernel built without the neuter_intel_microcode_load.patch, it works for my laptop. :)

# rpm -q microcode_ctl
microcode_ctl-1.17-18.fc15.x86_64

# rpm -ql microcode_ctl
/lib/firmware/amd-ucode
/lib/firmware/amd-ucode/microcode_amd.bin
/lib/firmware/intel-ucode
/lib/firmware/intel-ucode/06-03-02
[snip]
/lib/firmware/intel-ucode/0f-06-08
/lib/udev/rules.d/89-microcode.rules
/sbin/intel-microcode2ucode
/sbin/microcode_ctl
/usr/share/doc/microcode_ctl-1.17
/usr/share/doc/microcode_ctl-1.17/INSTALL.microcode_amd
/usr/share/doc/microcode_ctl-1.17/LICENSE.microcode_amd
/usr/share/doc/microcode_ctl-1.17/README.microcode_amd
/usr/share/man/man8/microcode_ctl.8.gz

# date
Thu Aug  4 10:16:30 CEST 2011

# grep -i microcode /var/log/messages
Aug  4 10:15:49 b kernel: microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60c
Aug  4 10:15:49 b kernel: microcode: CPU0 updated to revision 0x60f, date = 2010-09-29
Aug  4 10:15:49 b kernel: microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60c
Aug  4 10:15:49 b kernel: microcode: CPU1 updated to revision 0x60f, date = 2010-09-29
Aug  4 10:15:49 b kernel: microcode: Microcode Update Driver: v2.00 <tigran.co.uk>, Peter Oruba
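
To compare results like these across several machines, the "updated to revision" lines can be pulled out of the log with a small script. A rough Python sketch; the regular expression is an assumption based only on the message format shown above, and the function name is illustrative.

```python
import re

# Matches lines such as:
#   microcode: CPU0 updated to revision 0x60f, date = 2010-09-29
UPDATED = re.compile(
    r"microcode: CPU(\d+) updated to revision (0x[0-9a-fA-F]+), "
    r"date = (\d{4}-\d{2}-\d{2})"
)


def updated_cpus(log_text: str) -> dict:
    """Return {cpu_number: (revision, date)} for each 'updated to revision' line."""
    result = {}
    for match in UPDATED.finditer(log_text):
        cpu, revision, date = match.groups()
        result[int(cpu)] = (int(revision, 16), date)
    return result


if __name__ == "__main__":
    sample = (
        "Aug  4 10:15:49 b kernel: microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60c\n"
        "Aug  4 10:15:49 b kernel: microcode: CPU0 updated to revision 0x60f, date = 2010-09-29\n"
        "Aug  4 10:15:49 b kernel: microcode: CPU1 updated to revision 0x60f, date = 2010-09-29\n"
    )
    for cpu, (rev, date) in sorted(updated_cpus(sample).items()):
        print(f"CPU{cpu}: revision {rev:#x} ({date})")
```

If a CPU appears more than once in the log (for example after a resume), the last matching line wins.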

Comment 89 Joshua Covington 2011-08-04 09:00:12 UTC

(In reply to comment #87)
> build for rawhide:
>   http://koji.fedoraproject.org/koji/taskinfo?taskID=3251457
> Take a look, test, if no objections or problems I will push it to f16/ .
> Do we want this change in f15/ as well?

I saw in the build log that you build the microcode_ctl executable and then remove it. Wouldn't it be easier to just remove/not build the microcode_ctl executable at all? It's not needed anymore, is it? Or did I get the idea wrong?

Comment 90 Anton Arapov 2011-08-04 09:07:21 UTC

I didn't remove it, it's still there. I don't want to remove it until Intel produces a "proper" microcode update package.

Comment 91 Joshua Covington 2011-08-04 09:16:41 UTC

We know that the error resulted from /sbin/microcode_ctl trying to load the firmware. I haven't had a chance to test the code yet, but I think that as long as the "right" udev rules are there (not the modified one in comment #26 that disabled the microcode update on the Intel CPUs), the very same /sbin/microcode_ctl will trigger the endless loop on the AMD machines.

That's the reason I think it should be removed/renamed so that it cannot run.

Comment 92 Joshua Covington 2011-08-04 09:20:37 UTC

I agree with your concerns about the proper intel-ucode files, but we have them with the converter. That's why I think that only the ucode/microcode should be shipped and left on the user's machine. The converter isn't needed either once it has split the Intel file. The udev rules and the executable (the userspace part) are obsolete now that the loading is done in the kernel.

Comment 93 Anton Arapov 2011-08-04 09:24:50 UTC

I do want to put every bit we use to produce the results into the binary package, it's just handy to have it. I've also removed the line from the udev rules that invokes /sbin/microcode_ctl.

Keep in mind we have this package as a workaround solution. I've pinged the Intel folks on this issue, and I know they have related development towards linux-firmware. I will update this bug once I have more info.

Comment 94 Anton Arapov 2011-08-04 10:00:38 UTC

for the interested parties: 
  kernel-2.6.40-4 with the neuter_intel_microcode_load.patch reverted, for the test: http://koji.fedoraproject.org/koji/taskinfo?taskID=3251512

Comment 95 Joshua Covington 2011-08-04 17:10:02 UTC

I tested this with the kernel in comment #94 on fc15 and the processor in https://bugzilla.kernel.org/show_bug.cgi?id=35522#c0 . This is an AMD CPU family 0xf processor that doesn't need updated firmware. The kernel booted fine without any loop messages. The only trace in dmesg says: 

microcode: CPU0: family 15 not supported

So it works for me.

The only thing that bothers me is that I usually got 2 messages in dmesg, one for CPU0 and the other for CPU1 (output from 2.6.38.6):
microcode: microcode: CPU0: AMD CPU family 0xf not supported
microcode: microcode: CPU1: AMD CPU family 0xf not supported
microcode: Microcode Update Driver: v2.00 <tigran.co.uk>, Peter Oruba

This time it tried to load the firmware only for CPU0. What about CPU1? Should the kernel try to load the firmware for it, too?

Comment 96 Dave Jones 2011-08-04 17:30:15 UTC

f16 kernel build with the neuter patch removed will be done later today

I've removed the diff from f15 too, but that won't be seeing an update until next week sometime. (We only just pushed one update, so want to batch up a bunch of other fixes to push at most 1 update a week).  There will be interim builds in koji for those who want to test.

Comment 97 Anton Arapov 2011-08-05 06:52:45 UTC

(In reply to comment #95)
> This time it tried to load the firmware only for CPU0. What about CPU1? Should
> the kernel try to load the firmware for it, too?
  What is the observation that led you to this conclusion? :)

  We load the microcode by loading the microcode module. We need it loaded only once on *boot* and on *restore* from hibernate, and the current udev rule works fine AFAICT.
  Below, 8:35 is system boot and 8:45 is wake up from hibernate for Intel Core2 Duo CPU P8600  @2.40GHz:
> Aug  5 08:35:01 b kernel: microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60c
> Aug  5 08:35:01 b kernel: microcode: CPU0 updated to revision 0x60f, date = 2010-09-29
> Aug  5 08:35:01 b kernel: microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60c
> Aug  5 08:35:01 b kernel: microcode: CPU1 updated to revision 0x60f, date = 2010-09-29
> Aug  5 08:35:01 b kernel: microcode: Microcode Update Driver: v2.00
> Aug  5 08:45:49 b kernel: microcode: CPU0 updated to revision 0x60f, date = 2010-09-29
> Aug  5 08:45:49 b kernel: microcode: CPU1 updated to revision 0x60f, date = 2010-09-29

Comment 98 Fedora Update System 2011-08-05 07:11:36 UTC

microcode_ctl-1.17-18.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/microcode_ctl-1.17-18.fc16

Comment 99 Fedora Update System 2011-08-05 07:20:35 UTC

microcode_ctl-1.17-18.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/microcode_ctl-1.17-18.fc15

Comment 100 Anton Arapov 2011-08-05 07:23:34 UTC

(In reply to comment #96)
> f16 kernel build with the neuter patch removed will be done later today
  The microcode_ctl build is in bodhi and needs karma++ to hit the stable repo, so I'd prefer the karma to be added only after the patched kernel is out.
 
> I've removed the diff from f15 too, but that won't be seeing an update until
> next week sometime. (We only just pushed one update, so want to batch up a
> bunch of other fixes to push at most 1 update a week).  There will be interim
> builds in koji for those who want to test.
  Same for f15, though it needs more karma. I'll be watching that it doesn't hit stable before the patched kernel.

thanks,

Comment 101 Joshua Covington 2011-08-05 07:28:39 UTC

(In reply to comment #97)
> (In reply to comment #95)
> > This time it tried to load the firmware only for CPU0. What about CPU1? Should the kernel try to load the firmware for it, too?
>   What is the observation that led you to this conclusion? :)
>   We load the microcode by loading microcode module. We need it loaded only
> once on *boot* and *restore* from hibernate and current udev rule works fine
> AFAICT.
>   Below, 8:35 is system boot and 8:45 is wake up from hibernate for Intel Core2
> Duo CPU P8600  @2.40GHz:
> > Aug  5 08:35:01 b kernel: microcode: CPU0 sig=0x10676, pf=0x80, revision=0x60c
> > Aug  5 08:35:01 b kernel: microcode: CPU0 updated to revision 0x60f, date = 2010-09-29
> > Aug  5 08:35:01 b kernel: microcode: CPU1 sig=0x10676, pf=0x80, revision=0x60c
> > Aug  5 08:35:01 b kernel: microcode: CPU1 updated to revision 0x60f, date = 2010-09-29
> > Aug  5 08:35:01 b kernel: microcode: Microcode Update Driver: v2.00
> > Aug  5 08:45:49 b kernel: microcode: CPU0 updated to revision 0x60f, date = 2010-09-29
> > Aug  5 08:45:49 b kernel: microcode: CPU1 updated to revision 0x60f, date = 2010-09-29


Maybe I explained this incorrectly:
As you can see from your output, the microcode driver tries to update the code on both cores of the Core2 Duo CPU P8600  @2.40GHz: CPU0 and CPU1.

My processor is also dual core, an AMD Turion 64 X2 TL-60, but in my case I see only one attempt (at boot), for core0:

microcode: CPU0: family 15 not supported

So the driver never tries to update the firmware on core1 (CPU1). Why?

Comment 102 Borislav Petkov 2011-08-05 08:24:21 UTC

Well,

one explanation could be that if your kernel has f4203e3032e5ae74c3e89df85a5a6d96022d0c49 from upstream, then when we return an error while checking whether your CPU is supported by the driver, we unload the module and don't check the remaining cores.

HTH.

Comment 103 Kay Sievers 2011-08-05 09:25:26 UTC

Looks all promising. Thanks!

If that all works out as planned, I think, we should entirely drop the udev
rule and just place a file:
  /usr/lib/modules-load.d/microcode.conf
that unconditionally instructs 'init' to load the microcode.ko once
during bootup.

We currently call modprobe for every CPU on the system, which should not
be needed.

Comment 104 Borislav Petkov 2011-08-05 10:08:26 UTC

(In reply to comment #103)
> Looks all promising. Thanks!
> 
> If that all works out as planned, I think, we should entirely drop the udev
> rule and just place a file:
>   /usr/lib/modules-load.d/microcode.conf
> that unconditionally instructs 'init' to load the microcode.ko once
> during bootup.

Yep, actually every distro should do that. At least until we've
rewritten the whole microcode thing on x86 to load early during boot
from the bootloader and drop the module completely - then it'll all work
completely transparently for userspace, obviating any need for the latter
to do anything. Oh well, one fine day...

> We currently call modprobe for every CPU on the system, which should
> not be needed.

Yes.

Comment 105 Anton Arapov 2011-08-05 11:05:01 UTC

(In reply to comment #103)
> If that all works out as planned, I think, we should entirely drop the udev
> rule and just place a file:
>   /usr/lib/modules-load.d/microcode.conf
> that unconditionally instructs 'init' to load the microcode.ko once
> during bootup.
  Kay, would this work well for the hibernate->restore cycle? Will the module be loaded again?

Comment 106 Joshua Covington 2011-08-05 14:34:42 UTC

(In reply to comment #102)
> Well,
> one explanation could be that if your kernel has
> f4203e3032e5ae74c3e89df85a5a6d96022d0c49 from upstream, then when we return an
> error while checking whether your CPU is supported by the driver, we unload the
> module and don't check the remaining cores.
> HTH.

I googled for commit f4203e3032e5ae74c3e89df85a5a6d96022d0c49 but couldn't find anything. I think the reason is what you explained but can you give me a link to this commit?

Comment 107 Anton Arapov 2011-08-05 14:40:26 UTC

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f4203e3032e5ae74c3e89df85a5a6d96022d0c49

Comment 108 Joshua Covington 2011-08-05 16:23:30 UTC

(In reply to comment #107)
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f4203e3032e5ae74c3e89df85a5a6d96022d0c49

Thanks, fedora has this commit which explains the behavior. So everything is working fine now.

Comment 109 Adam Williamson 2011-08-05 16:51:36 UTC

so, I see a microcode_ctl update just submitted for f16, but no corresponding kernel update. will a kernel change also be required for this? we're in the middle of Alpha test composes, so please only submit a complete update that's really expected to work for both CPU types...thanks!

Comment 110 Kay Sievers 2011-08-05 18:16:10 UTC

(In reply to comment #105)
> > If that all works out as planned, I think, we should entirely drop the udev
> > rule and just place a file:
> >   /usr/lib/modules-load.d/microcode.conf
> > that unconditionally instructs 'init' to load the microcode.ko once
> > during bootup.
> 
> Kay would this work well for the hibernate->restore cycle? Will the module be
> loaded again?

If possible, we should try to avoid setups that require unloading kernel
modules. 

Usually drivers do request_firmware() again at resume time, by hooking
into the .resume callback of the power management infrastructure. The
microcode code has something like that already. I have not tested if it works,
but that's the way to go, I expect.

Comment 111 Fedora Update System 2011-08-05 20:06:52 UTC

Package microcode_ctl-1.17-18.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing microcode_ctl-1.17-18.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/microcode_ctl-1.17-18.fc16
then log in and leave karma (feedback).

Comment 112 Dave Jones 2011-08-06 05:12:15 UTC

> If that all works out as planned, I think, we should entirely drop the udev
> rule and just place a file:
>   /usr/lib/modules-load.d/microcode.conf
> that unconditionally instructs 'init' to load the microcode.ko once
> during bootup.

If we're going to unconditionally load it, we could just make it non-modular, and avoid all the userspace modprobing/removing entirely.  It's only 11Kb of code.

Comment 113 Borislav Petkov 2011-08-06 08:54:41 UTC

(In reply to comment #112)
> > If that all works out as planned, I think, we should entirely drop the udev
> > rule and just place a file:
> >   /usr/lib/modules-load.d/microcode.conf
> > that unconditionally instructs 'init' to load the microcode.ko once
> > during bootup.
> 
> If we're going to unconditionally load it, we could just make it non-modular,
> and avoid all the userspace modprobing/removing entirely.  It's only 11Kb of
> code.

I don't think you can do that when you're using the request_firmware() interface because you need userspace to load the firmware blob - it has to be a module.

Comment 114 Kay Sievers 2011-08-06 14:14:13 UTC

(In reply to comment #113)
> > If we're going to unconditionally load it, we could just make it non-modular,
> > and avoid all the userspace modprobing/removing entirely.  It's only 11Kb of
> > code.
> 
> I don't think you can do that when you're using the request_firmware()
> interface because you need userspace to load the firmware blob - it has to be
> module.

Right, we need to delay the init of microcode.ko until userspace is ready to
fulfill firmware requests. As with other drivers which need to request
firmware, it simplifies the setup when they are loadable modules.

The simplest fix today is to force-load the module after udev is running.
In the future, I expect, we will be able to add specific CPU aliases to
microcode.ko, which will allow us to drop the force-loading entry.

Comment 115 Fedora Update System 2011-08-09 00:04:58 UTC

kernel-3.0.1-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.0.1-3.fc16

Comment 116 Adam Williamson 2011-08-15 17:18:31 UTC

I think we agreed at the last blocker review meeting to drop this from the list. It was on the list for observation to make sure we didn't wind up back in the state where AMDs didn't boot. For the record, Alpha will ship in the 'broken-but-working' state where microcode updates don't work, but everything boots. (Not messing with the secondary arch blockers, they can look after themselves).

Comment 117 Peter Hollenbeck 2011-08-15 19:14:27 UTC

Is there a simple fix that can be applied by a person with intermediate Linux skills?
I would like to run Fedora 15 on my Via Nano.
Thanks, Peter

Comment 118 Michal Jaegermann 2011-08-15 19:54:53 UTC

(In reply to comment #117)
> Is there a simple fix that can be applied by a person with intermediate Linux
> skills?
One possible way, if you really need it: drop a file named, say, blacklist-microcode.conf (the .conf suffix is important) into /etc/modprobe.d/ with the following line in it:

blacklist microcode

That should block attempts to load that module. If you cannot boot, this can be done after booting into "rescue" mode from some suitable media.
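
The workaround above can also be scripted, for example when preparing several machines from a rescue environment. A small Python sketch; the file name and contents follow the suggestion above, while the function name and the `root` parameter (for pointing at a mounted system tree) are illustrative assumptions.

```python
import os
import sys


def write_microcode_blacklist(root: str = "/") -> str:
    """Create etc/modprobe.d/blacklist-microcode.conf under root; return its path."""
    conf_dir = os.path.join(root, "etc", "modprobe.d")
    os.makedirs(conf_dir, exist_ok=True)
    path = os.path.join(conf_dir, "blacklist-microcode.conf")
    with open(path, "w") as f:
        # Single modprobe directive that blocks loading of microcode.ko.
        f.write("blacklist microcode\n")
    return path


if __name__ == "__main__":
    # Only act when exactly one existing directory is given as the target root.
    if len(sys.argv) == 2 and os.path.isdir(sys.argv[1]):
        print("wrote", write_microcode_blacklist(sys.argv[1]))
    else:
        print("usage: pass the root directory of the installed system")
```

Deleting the file again re-enables loading of the module once fixed packages are installed.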

Comment 119 Joshua Covington 2011-08-16 11:53:55 UTC

microcode_ctl-1.17-18.fc16 in comment#111 DOES fix the problem. This is not a 'broken-but-working' patch. kernel-3.0.1-3.fc16 in comment#115 also has the needed kernel part.

Unfortunately, Alpha was shipped with the still-broken packages. I don't see why those two packages cannot be updated. This is definitely a blocker and a proven fix already exists.

Where's the problem???

Comment 120 Fedora Update System 2011-08-16 12:46:40 UTC

kernel-2.6.40.3-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.3-0.fc15

Comment 121 Adam Williamson 2011-08-16 17:46:02 UTC

joshua: we're not taking a hugely different kernel at this point in the process. The bug that 'microcode updates don't get loaded on boot' is not a blocker bug. 'AMD systems don't boot' would be a blocker bug, but we've made sure that isn't the case for Alpha. You can get the builds that correctly fix things so microcode gets loaded with your first post-install update.

Comment 122 Fedora Update System 2011-08-17 00:51:58 UTC

microcode_ctl-1.17-18.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 123 Joshua Covington 2011-08-17 05:52:24 UTC

(In reply to comment #121)
> joshua: we're not taking a hugely different kernel at this point in the
> process. The bug that 'microcode updates don't get loaded on boot' is not a
> blocker bug. 'AMD systems don't boot' would be a blocker bug, but we've made
> sure that isn't the case for Alpha. You can get the builds that correctly fix
> things so microcode gets loaded with your first post-install update.

[OFFTOPIC]

This is just an interpretation.

kernel-3.0.1 is a bug-fix for 3.0.0 and microcode_ctl-1.17-18 is a bug-fix for -17. Where is this "hugely different kernel"?

Fedora should either get used to questions like "Is there a simple fix that can be applied by a person with intermediate Linux skills? (and the answer is: use the update button)" or continue with the current process of "download 4GB of install media and you'll have to download 2GB more of already tested zero-day updates". It's simple, isn't it?

I do accept your position of "hugely different package at this point in the
process" but we both are talking about approved and tested bug-fixes that don't introduce any new features. Those should get in before the final release, but I think this is not the right place to discuss this issue.

Comment 124 Adam Williamson 2011-08-17 08:30:08 UTC

They will get in before the final release. We're talking about the *Alpha* here.

Comment 125 Peter Hollenbeck 2011-08-17 16:48:42 UTC

I downloaded Fedora-15-x86_64-DVD.iso (3.6 GB) August 11.
The target is a Via Nano system. It is off the grid, with a slow satellite internet connection, so I can do only limited online updating. I am leaving for this location in two days. I would like to take all DVDs necessary to install and fix the microcode bug.

Should I fetch a new copy of Fedora-15? Where is the 2GB update referred to in comment 123? Can it be downloaded and installed subsequent to the primary install?

Should I be using Fedora 14, rather than 15?

Thanks for your reply(s),
Peter
Intermediate Linuxician (like intermediate skier)

Comment 126 Joshua Covington 2011-08-17 21:28:43 UTC

The fix consists of two packages (for 15):

1. microcode_ctl-1.17-18.fc15
2. kernel-2.6.40.3-0.fc15

Those should already be in the stable repos. If not, you can download them from http://koji.fedoraproject.org/koji/. You don't need anything else.

Comment 127 Fedora Update System 2011-08-18 02:29:35 UTC

kernel-2.6.40.3-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 128 Fedora Update System 2011-08-22 14:50:23 UTC

kernel-3.0.1-3.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 129 Fedora Update System 2011-08-22 15:26:01 UTC

microcode_ctl-1.17-18.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 130 Peter Hollenbeck 2011-09-05 19:37:57 UTC

Can questions still be asked or does "Status: Closed Errata" mean this bug is closed to further questions or comments?
Peter

Comment 131 Adam Williamson 2011-09-07 23:06:32 UTC

you can always ask questions.
