476981 – Do not load powernow-k8 cpufreq driver on AMD Xen systems

Summary: Do not load powernow-k8 cpufreq driver on AMD Xen systems

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Prarit Bhargava
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-12-18 13:45 UTC by Prarit Bhargava
Modified:	2008-12-18 13:48 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-12-18 13:48:08 UTC
Target Upstream Version:
Embargoed:
Dependent Products:


Description Prarit Bhargava 2008-12-18 13:45:11 UTC

Description of problem: Patching and repatching led to cases where the powernow-k8 driver oopsed when booting the RHEL5 Xen kernel.


Version-Release number of selected component (if applicable): -125.el5


How reproducible: 5% - 10%


Steps to Reproduce:
1. Boot Xen kernel

  
Actual results:
Checking for hardware changes [  OK  ]
Unable to handle kernel paging request at ffff8800000ce000 RIP: 
 [<ffffffff8020bbb1>] memcmp+0x8/0x22
PGD f5f067 PUD f60067 PMD f61067 PTE 0
Oops: 0000 [1] SMP 
last sysfs file: /class/net/eth0/address
CPU 0 
Modules linked in: powernow_k8 freq_table dm_multipath scsi_dh scsi_mod
parport_pc lp parport xennet pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod
xenblk ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 817, comm: modprobe Not tainted 2.6.18-126.el5xen #1
RIP: e030:[<ffffffff8020bbb1>]  [<ffffffff8020bbb1>] memcmp+0x8/0x22
RSP: e02b:ffff88001d365bf0  EFLAGS: 00010206
RAX: 0000000000000041 RBX: 0000000000000000 RCX: 000000000000000a
RDX: 000000000000000a RSI: ffffffff881760fd RDI: ffff8800000ce000
RBP: ffff88001dd394c0 R08: 0000000000000001 R09: ffff880000098e00
R10: 0000000000000003 R11: 0000000000000000 R12: ffff8800000ce000
R13: 0000000000000000 R14: ffff880000098e00 R15: 00000000fffffff4
FS:  00002ae0b8bc7240(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process modprobe (pid: 817, threadinfo ffff88001d364000, task ffff88001f64b820)
Stack:  ffffffff88174a3c  000000001d365c78  0000000000000003  0000000000000001 
 ffff88001fdb78c0  0000000000000001  ffff88000001fa70  0000000000000001 
 0000000000000000  ffff88000001fa68 
Call Trace:
 [<ffffffff88174a3c>] :powernow_k8:powernowk8_cpu_init+0x55c/0xdec
 [<ffffffff802855c8>] __wake_up_common+0x3e/0x68
 [<ffffffff8028816d>] __cond_resched+0x1c/0x44
 [<ffffffff80263a0d>] _spin_lock_irq+0x9/0x14
 [<ffffffff80262099>] wait_for_completion+0xa1/0xaa
 [<ffffffff80263a0d>] _spin_lock_irq+0x9/0x14
 [<ffffffff8026349f>] __down_write_nested+0x35/0x9a
 [<ffffffff804043f3>] cpufreq_add_dev+0x174/0x57f
 [<ffffffff8021a69c>] vsnprintf+0x559/0x59e
 [<ffffffff802639f9>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80217548>] release_console_sem+0x1b1/0x205
 [<ffffffff8028b9f5>] vprintk+0x308/0x329
 [<ffffffff80261ead>] thread_return+0x96/0x113
 [<ffffffff80286bc9>] task_rq_lock+0x3f/0x71
 [<ffffffff8028830a>] set_cpus_allowed+0xb2/0xbf
 [<ffffffff8028ba68>] printk+0x52/0xc6
 [<ffffffff8039fb09>] sysdev_driver_register+0x61/0xbd
 [<ffffffff80403423>] cpufreq_register_driver+0xb9/0x194
 [<ffffffff802a01a7>] sys_init_module+0xaf/0x1e8
 [<ffffffff8025f106>] system_call+0x86/0x8b
 [<ffffffff8025f080>] system_call+0x0/0x8b

In -127.el5, a patch backed out part of the changes for commit
091abe5d3909330bb46200e58239c92ca415df5e. However, stale code remains in the powernowk8_init() function and must be removed.

Expected results:  Panic no longer occurs after -126.el5 ... but we should clean up the code.


Additional info: The attempted method of aborting the module load will not work on newer AMD processors, because on those processors cpu_family != CPU_OPTERON.  Just go back to the original code.
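
To illustrate the concern, here is a minimal user-space sketch. It assumes the abort check was gated on the Opteron family, which this report implies but does not show. Every name in it (cpu_family_kind, FAMILY_OPTERON, FAMILY_HW_PSTATE, family_gated_abort, unconditional_abort) is hypothetical and is not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's CPU family classification. */
enum cpu_family_kind {
    FAMILY_OPTERON,   /* older parts */
    FAMILY_HW_PSTATE  /* newer parts, i.e. anything that is not Opteron */
};

/*
 * ASSUMPTION: the abort is only evaluated when the family is Opteron.
 * On a newer processor the condition is false, so the load proceeds
 * even though we are running under Xen.
 */
static bool family_gated_abort(enum cpu_family_kind family, bool under_xen)
{
    return under_xen && family == FAMILY_OPTERON;
}

/* An abort decision that does not depend on the processor family. */
static bool unconditional_abort(bool under_xen)
{
    return under_xen;
}
```

With under_xen set, family_gated_abort() returns false for FAMILY_HW_PSTATE while unconditional_abort() still returns true, which is the gap described above.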

Comment 1 Prarit Bhargava 2008-12-18 13:48:08 UTC

Uh.  Never mind.  -128.el5 looks correct:

static int __cpuinit powernowk8_init(void)
{
        unsigned int i, supported_cpus = 0;

#ifdef CONFIG_XEN
        if (!is_initial_xendomain()) {
                /* Xen PV domU's can't possibly do powersaving; bail */
                return -EPERM;
        }
#endif

        for_each_online_cpu(i) {
                if (check_supported_cpu(i))
                        supported_cpus++;
        }

        if (supported_cpus != num_online_cpus())
                return -ENODEV;

        /* AMD provides AGESA library modules for use in their BIOS. The
           default AGESA code creates the _PSD with the assumption the APICs
           are numbered per the BKDG HOWEVER, there is a callback
           (ibvPSDApicIDtoNode) to set the APIC ID to node translation for _PSD
           dependency domains if the system numbers the APICs differently.

           It looks like HP did not follow spec on both fronts (it numbered
           differently from the BKDG as well as did not implement the callback
           to set the domains properly).

           AMD reports that HP is the only vendor to implement CPU enumeration
           this way. */
        if (preregister_acpi_perf == 1 && cpu_family == CPU_OPTERON) {
                char * dmi_data = dmi_get_system_info(DMI_BIOS_VENDOR);
                if (dmi_data && !strncmp(dmi_data, "Hewlett-Packard", 15)) {
                        /* Disable preregistering ACPI data for HP AMD Opteron
                           systems */
                        preregister_acpi_perf = 0;
                }
        }
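
The early-exit order in the excerpt above can be modelled in user space as a rough sketch. The model_system struct and model_powernowk8_init() are hypothetical names for illustration only: the struct fields stand in for CONFIG_XEN, is_initial_xendomain(), and check_supported_cpu(), and the HP DMI handling is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the init function inspects. */
struct model_system {
    bool xen_kernel;           /* stands in for CONFIG_XEN being set */
    bool initial_domain;       /* stands in for is_initial_xendomain() */
    size_t online_cpus;        /* number of online CPUs */
    const bool *cpu_supported; /* one flag per online CPU */
};

/* Mirrors the order of checks in the excerpt: Xen domU first, then CPUs. */
static int model_powernowk8_init(const struct model_system *sys)
{
    size_t i, supported = 0;

    /* A Xen guest that is not the initial domain bails out immediately. */
    if (sys->xen_kernel && !sys->initial_domain)
        return -EPERM;

    /* Count the online CPUs that the driver supports. */
    for (i = 0; i < sys->online_cpus; i++) {
        if (sys->cpu_supported[i])
            supported++;
    }

    /* Every online CPU must be supported, otherwise refuse to load. */
    if (supported != sys->online_cpus)
        return -ENODEV;

    return 0;
}
```

In this model a domU gets -EPERM before any CPU is examined, so the per-CPU init path seen in the call trace is never reached for such a guest.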
