Bug 583623 - wlan: p54pci kernel oops about freeing non dma memory + random hangs
Summary: wlan: p54pci kernel oops about freeing non dma memory + random hangs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Hans de Goede
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-04-19 09:18 UTC by Hans de Goede
Modified: 2010-08-07 05:54 UTC (History)
8 users (show)

Fixed In Version: kernel-2.6.33.3-79.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-08-07 05:54:59 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
excerpt from /var/log/messages with the oops (3.80 KB, text/plain)
2010-04-19 09:18 UTC, Hans de Goede
no flags Details
p54pci.ko for comment #4 oops (237.67 KB, application/octet-stream)
2010-04-21 15:43 UTC, Hans de Goede
no flags Details
p54pci.ko used with comment #4 and comment #12 oops (278.04 KB, application/octet-stream)
2010-04-21 18:43 UTC, Hans de Goede
no flags Details
verbose test patch (1.02 KB, patch)
2010-04-21 19:12 UTC, chunkeey
no flags Details | Diff
fix const ring index in p54p_check_tx_ring (502 bytes, patch)
2010-04-21 19:41 UTC, chunkeey
no flags Details | Diff
All patches (5.42 KB, patch)
2010-04-22 13:34 UTC, chunkeey
no flags Details | Diff
patch missing from the p54pci.ko from attachment in comment #9 (5.51 KB, patch)
2010-04-22 17:19 UTC, Hans de Goede
no flags Details | Diff

Description Hans de Goede 2010-04-19 09:18:01 UTC
Created attachment 407516 [details]
excerpt from /var/log/messages with the oops

I get a kernel oopses about the p54pci driver using dma memory free
functions on memory not allocated by the dma subsys (see attached file).

And when the wlan card in question (cardbus card) is present in the machine
I experience random hangs.

Note that as is visible in the logs I'm using the card with older firmware then recommended as this model does not work with the latest advised firmware (the firmware I'm using is from the windows driver which shipped with the card).

Comment 1 Hans de Goede 2010-04-19 09:27:47 UTC
Update: I tried with the new / recommended firmware as it was a while I last tried that and it might work with the new firmware now. I managed to get associated with the AP before it hung. It also seems to hang much much quicker / more often with the new firmware. It still is giving the same dma error.

Comment 2 Hans de Goede 2010-04-19 15:08:30 UTC
Note going through git trees and google found me:
https://patchwork.kernel.org/patch/53004/

And:
http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=commitdiff;h=5988f385b4cffa9ca72c5be0188e5f4c9ef46d82

Which partially overlap. I'm going to build a kernel from src and try these patches, I'll let you know how it goes.

Comment 3 chunkeey 2010-04-19 15:32:09 UTC
These bugs could be fixed by:
 * p54pci: handle dma mapping errors
 * p54pci: move tx cleanup into tasklet
 * p54pci: revise tx locking
 * p54pci: prevent stuck rx-ring on slow system (+ sparse patch)

all but the last one have been merged into 2.6.34-rc.
However everything is present in wireless-testing or compat-wireless.

Comment 4 Hans de Goede 2010-04-19 21:23:31 UTC
Ok,

So after building the F-13 2.6.33.2-47 kernel + all of the patches from comment #3 (handle dma mapping errors is already in that kernel) I got the following kernel panic instead of the hang I had before:

BUG: unable to handle kernel paging request at 6b6b6b6b                         
IP: [<e122284a>] p54p_check_tx_ring+0x84/0xb1 [p54pci]                          
*pde = 00000000                                                                 
Oops: 0000 [#1] SMP                                                             
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0e
Modules linked in: ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv]
                                                                                
Pid: 0, comm: swapper Tainted: G        W  2.6.33.2-47.fc13.i686 #1 0744/Evo N6 
EIP: 0060:[<e122284a>] EFLAGS: 00010286 CPU: 0                                  
EIP is at p54p_check_tx_ring+0x84/0xb1 [p54pci]                                 
EAX: 6b6b6b6b EBX: df10b170 ECX: 00000003 EDX: 00000001                         
ESI: dc471500 EDI: d8acaeb0 EBP: c098be9c ESP: c098be84                         
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068                                   
Process swapper (pid: 0, ti=c098a000 task=c09ccfe0 task.ti=c098a000)            
Stack:                                                                          
 00000015 d8aca450 00000014 d8acb5a0 df10b080 d8aca450 c098bebc e1222b02        
<0> df10b080 00000020 d8acb5a0 d8acb51c c0abdb44 00000000 c098bed4 c0440568     
<0> d8acb520 00000018 0000000b 00000006 c098bf00 c0440ed3 00000000 c0abd740     
Call Trace:                                                                     
 [<e1222b02>] ? p54p_tasklet+0xaa/0xb5 [p54pci]                                 
 [<c0440568>] ? tasklet_action+0x78/0xcb                                        
 [<c0440ed3>] ? __do_softirq+0xbc/0x173                                         
 [<c0440fc5>] ? do_softirq+0x3b/0x5f                                            
 [<c0441108>] ? irq_exit+0x3a/0x6d                                              
 [<c04047a4>] ? do_IRQ+0x8b/0x9f                                                
 [<c0403975>] ? common_interrupt+0x35/0x3c                                      
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c06279a3>] ? acpi_idle_enter_simple+0x11e/0x151                              
 [<c0701ef5>] ? cpuidle_idle_call+0x77/0xca                                     
 [<c04025be>] ? cpu_idle+0x9b/0xb5                                              
 [<c07a051b>] ? rest_init+0x67/0x69                                             
 [<c0a35928>] ? start_kernel+0x378/0x37d                                        
 [<c0a35099>] ? i386_start_kernel+0x99/0xa0                                     
Code: 8b 13 e8 9a f9 ff ff 85 f6 c7 03 00 00 00 00 c7 43 04 00 00 00 00 66 c7 4 
EIP: [<e122284a>] p54p_check_tx_ring+0x84/0xb1 [p54pci] SS:ESP 0068:c098be84    
CR2: 000000006b6b6b6b                                                           
---[ end trace ef906c2fe354d22c ]---                                            
Kernel panic - not syncing: Fatal exception in interrupt                        
Pid: 0, comm: swapper Tainted: G      D W  2.6.33.2-47.fc13.i686 #1             
Call Trace:                                                                     
 [<c07b17f3>] ? printk+0x14/0x19
 [<c07b1727>] panic+0x3e/0xf6                                                   
 [<c07b5537>] oops_end+0x97/0xa6                                                
 [<c0422366>] no_context+0x115/0x11f                                            
 [<c0422468>] __bad_area_nosemaphore+0xf8/0x100                                 
 [<c07b6816>] ? do_page_fault+0x0/0x337                                         
 [<c0422482>] bad_area_nosemaphore+0x12/0x15                                    
 [<c07b69d6>] do_page_fault+0x1c0/0x337                                         
 [<c07b6816>] ? do_page_fault+0x0/0x337                                         
 [<c07b4b24>] error_code+0x78/0x80                                              
 [<e122284a>] ? p54p_check_tx_ring+0x84/0xb1 [p54pci]                           
 [<e1222b02>] p54p_tasklet+0xaa/0xb5 [p54pci]                                   
 [<c0440568>] tasklet_action+0x78/0xcb                                          
 [<c0440ed3>] __do_softirq+0xbc/0x173                                           
 [<c0440fc5>] do_softirq+0x3b/0x5f                                              
 [<c0441108>] irq_exit+0x3a/0x6d                                                
 [<c04047a4>] do_IRQ+0x8b/0x9f                                                  
 [<c0403975>] common_interrupt+0x35/0x3c                                        
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c06279a3>] ? acpi_idle_enter_simple+0x11e/0x151                              
 [<c0701ef5>] cpuidle_idle_call+0x77/0xca                                       
 [<c04025be>] cpu_idle+0x9b/0xb5                                                
 [<c07a051b>] rest_init+0x67/0x69                                               
 [<c0a35928>] start_kernel+0x378/0x37d                                          
 [<c0a35099>] i386_start_kernel+0x99/0xa0

Comment 5 Hans de Goede 2010-04-19 21:27:32 UTC
So I tried again this time only adding the
 * p54pci: prevent stuck rx-ring on slow system (+ sparse patch)

Changes to the 2.6.33.2-47 kernel, and I now managed to browse the web, with the new firmware. Then I tried a suspend / resume cycle, but the suspend never completed.

Upon powering off the laptop, and powering it back up, I know get the following
lockdep report:

=============================================                                   
[ INFO: possible recursive locking detected ]                                   
2.6.33.2-47.fc13.i686 #1                                                        
---------------------------------------------                                   
phy0/672 is trying to acquire lock:                                             
 (&(&priv->lock)->rlock){-.-...}, at: [<e1222b84>] p54p_tx+0x2f/0x11b [p54pci]  
                                                                                
but task is already holding lock:                                               
 (&(&priv->lock)->rlock){-.-...}, at: [<e122289f>] p54p_interrupt+0x2a/0xe9 [p5]
                                                                                
other info that might help us debug this:                                       
5 locks held by phy0/672:                                                       
 #0:  ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<c044ecf1>] worker_thread+02
 #1:  ((&ifmgd->work)){+.+.+.}, at: [<c044ecf1>] worker_thread+0x15d/0x262      
 #2:  (&ifmgd->mtx){+.+.+.}, at: [<e1160b2d>] ieee80211_sta_work+0xd0/0xcf3 [ma]
 #3:  (&priv->conf_mutex){+.+.+.}, at: [<e119fe19>] p54_bss_info_changed+0x25/0]
 #4:  (&(&priv->lock)->rlock){-.-...}, at: [<e122289f>] p54p_interrupt+0x2a/0xe]
                                                                                
stack backtrace:                                                                
Pid: 672, comm: phy0 Tainted: G        W  2.6.33.2-47.fc13.i686 #1              
Call Trace:
 [<c07b17f3>] ? printk+0x14/0x19                                                
 [<c046382b>] __lock_acquire+0x83f/0xb89                                        
 [<c04575eb>] ? sched_clock_cpu+0x125/0x12d                                     
 [<c0463c08>] lock_acquire+0x93/0xb1                                            
 [<e1222b84>] ? p54p_tx+0x2f/0x11b [p54pci]                                     
 [<c07b3b41>] _raw_spin_lock_irqsave+0x37/0x6a                                  
 [<e1222b84>] ? p54p_tx+0x2f/0x11b [p54pci]                                     
 [<e1222b84>] p54p_tx+0x2f/0x11b [p54pci]                                       
 [<e119edac>] p54_tx_pending+0x16c/0x174 [p54common]                            
 [<e119ee3a>] p54_tx_qos_accounting_free+0x86/0xd0 [p54common]                  
 [<c071f231>] ? skb_unlink+0x3c/0x41                                            
 [<e119f99f>] p54_free_skb+0x26/0x33 [p54common]                                
 [<e1222826>] p54p_check_tx_ring+0x60/0xaf [p54pci]                             
 [<e1222915>] p54p_interrupt+0xa0/0xe9 [p54pci]                                 
 [<c0488f0d>] handle_IRQ_event+0x4b/0xf6                                        
 [<c048a970>] handle_level_irq+0x6a/0xb9                                        
 [<c0404ef0>] handle_irq+0x40/0x4c                                              
 [<c040475f>] do_IRQ+0x46/0x9f                                                  
 [<c0403975>] common_interrupt+0x35/0x3c                                        
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c07b41b8>] ? _raw_spin_unlock_irqrestore+0x45/0x4d                           
 [<c071f268>] skb_queue_tail+0x32/0x37                                          
 [<e119f474>] p54_tx+0x13/0x1d [p54common]                                      
 [<e119e795>] p54_scan+0x2ce/0x30b [p54common]
 [<e119ff78>] p54_bss_info_changed+0x184/0x1d9 [p54common]                      
 [<e1158752>] ieee80211_bss_info_change_notify+0x140/0x149 [mac80211]           
 [<e119fdf4>] ? p54_bss_info_changed+0x0/0x1d9 [p54common]                      
 [<e1160360>] ieee80211_rx_mgmt_assoc_resp+0x77d/0x7fa [mac80211]               
 [<e1160e4c>] ieee80211_sta_work+0x3ef/0xcf3 [mac80211]                         
 [<c05c7fe7>] ? debug_object_deactivate+0x9a/0xc2                               
 [<c04613bd>] ? lock_release_holdtime+0x31/0xd6                                 
 [<c044ed33>] worker_thread+0x19f/0x262                                         
 [<c044ecf1>] ? worker_thread+0x15d/0x262                                       
 [<e1160a5d>] ? ieee80211_sta_work+0x0/0xcf3 [mac80211]                         
 [<c04524c8>] ? autoremove_wake_function+0x0/0x34                               
 [<c044eb94>] ? worker_thread+0x0/0x262                                         
 [<c045214c>] kthread+0x6f/0x74                                                 
 [<c04520dd>] ? kthread+0x0/0x74                                                
 [<c0403982>] kernel_thread_helper+0x6/0x10                                     
BUG: spinlock lockup on CPU#0, phy0/672, d86d3534                               
Pid: 672, comm: phy0 Tainted: G        W  2.6.33.2-47.fc13.i686 #1              
Call Trace:                                                                     
 [<c07b17f3>] ? printk+0x14/0x19                                                
 [<c05c7341>] do_raw_spin_lock+0xfd/0x123                                       
 [<c07b3b60>] _raw_spin_lock_irqsave+0x56/0x6a                                  
 [<e1222b84>] p54p_tx+0x2f/0x11b [p54pci]                                       
 [<e119edac>] p54_tx_pending+0x16c/0x174 [p54common]                            
 [<e119ee3a>] p54_tx_qos_accounting_free+0x86/0xd0 [p54common]
 [<c071f231>] ? skb_unlink+0x3c/0x41                                            
 [<e119f99f>] p54_free_skb+0x26/0x33 [p54common]                                
 [<e1222826>] p54p_check_tx_ring+0x60/0xaf [p54pci]                             
 [<e1222915>] p54p_interrupt+0xa0/0xe9 [p54pci]                                 
 [<c0488f0d>] handle_IRQ_event+0x4b/0xf6                                        
 [<c048a970>] handle_level_irq+0x6a/0xb9                                        
 [<c0404ef0>] handle_irq+0x40/0x4c                                              
 [<c040475f>] do_IRQ+0x46/0x9f                                                  
 [<c0403975>] common_interrupt+0x35/0x3c                                        
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c07b41b8>] ? _raw_spin_unlock_irqrestore+0x45/0x4d                           
 [<c071f268>] skb_queue_tail+0x32/0x37                                          
 [<e119f474>] p54_tx+0x13/0x1d [p54common]                                      
 [<e119e795>] p54_scan+0x2ce/0x30b [p54common]                                  
 [<e119ff78>] p54_bss_info_changed+0x184/0x1d9 [p54common]                      
 [<e1158752>] ieee80211_bss_info_change_notify+0x140/0x149 [mac80211]           
 [<e119fdf4>] ? p54_bss_info_changed+0x0/0x1d9 [p54common]                      
 [<e1160360>] ieee80211_rx_mgmt_assoc_resp+0x77d/0x7fa [mac80211]               
 [<e1160e4c>] ieee80211_sta_work+0x3ef/0xcf3 [mac80211]                         
 [<c05c7fe7>] ? debug_object_deactivate+0x9a/0xc2                               
 [<c04613bd>] ? lock_release_holdtime+0x31/0xd6                                 
 [<c044ed33>] worker_thread+0x19f/0x262                                         
 [<c044ecf1>] ? worker_thread+0x15d/0x262                                       
 [<e1160a5d>] ? ieee80211_sta_work+0x0/0xcf3 [mac80211]                         
 [<c04524c8>] ? autoremove_wake_function+0x0/0x34                               
 [<c044eb94>] ? worker_thread+0x0/0x262                                         
 [<c045214c>] kthread+0x6f/0x74                                                 
 [<c04520dd>] ? kthread+0x0/0x74                                                
 [<c0403982>] kernel_thread_helper+0x6/0x10                                     
sending NMI to all CPUs:                                                        
<dead>

Comment 6 Hans de Goede 2010-04-19 21:51:06 UTC
Ok,

I did some more tests with 2.6.33.2-47 +
 * p54pci: prevent stuck rx-ring on slow system (+ sparse patch)

And it seems to work well. I've been unable to re-trigger the lockdep
issue (+ subsequent hang)

Note that with this combi, I still get the oops from the original report, but the random system hangs seem gone.

I would also like to point out once more that the following 2 commits:
 * p54pci: move tx cleanup into tasklet
 * p54pci: revise tx locking

(Not tried out separately only as set) caused a kernel panic (I've not checked if it is reproducable). I'm doing all this with a cardbus card with an isl3886 on a compaq ev0 n600c laptop (a bit old, but 1Ghz PIII + 512 MB still work fine with F-13), the advantage of using such an old laptop is that it comes with a serial port which is how I've been able to capture the lockdep hang and kernel panic backtraces,

Let me know if / what sort of other tests I can do to help pin this issue down.

Regards,

Hans

Comment 7 chunkeey 2010-04-20 09:59:38 UTC
Hmm, this is unfortunate, because
 * p54pci: move tx cleanup into tasklet
 * p54pci: revise tx locking
have fixed a bug which is described in

https://bugzilla.kernel.org/show_bug.cgi?id=11386

But back to bug from comment #4:

BUG: unable to handle kernel paging request at 6b6b6b6b                         
IP: [<e122284a>] p54p_check_tx_ring+0x84/0xb1 [p54pci]                          
*pde = 00000000                                                                 
Oops: 0000 [#1] SMP                                                             
last sysfs file:
/sys/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0e
Modules linked in: ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables
Pid: 0, comm: swapper Tainted: G        W  2.6.33.2-47.fc13.i686 #1 0744/Evo N6 
EIP: 0060:[<e122284a>] EFLAGS: 00010286 CPU: 0                                  
EIP is at p54p_check_tx_ring+0x84/0xb1 [p54pci]                                 
EAX: 6b6b6b6b EBX: df10b170 ECX: 00000003 EDX: 00000001                         
ESI: dc471500 EDI: d8acaeb0 EBP: c098be9c ESP: c098be84                         
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068                                   
Process swapper (pid: 0, ti=c098a000 task=c09ccfe0 task.ti=c098a000)            
Stack:                                                                          
 00000015 d8aca450 00000014 d8acb5a0 df10b080 d8aca450 c098bebc e1222b02        
<0> df10b080 00000020 d8acb5a0 d8acb51c c0abdb44 00000000 c098bed4 c0440568     
<0> d8acb520 00000018 0000000b 00000006 c098bf00 c0440ed3 00000000 c0abd740     
Call Trace:                                                                     
 [<e1222b02>] ? p54p_tasklet+0xaa/0xb5 [p54pci]                                 
 [<c0440568>] ? tasklet_action+0x78/0xcb                                        
 [<c0440ed3>] ? __do_softirq+0xbc/0x173                                         
 [<c0440fc5>] ? do_softirq+0x3b/0x5f                                            
 [<c0441108>] ? irq_exit+0x3a/0x6d                                              
 [<c04047a4>] ? do_IRQ+0x8b/0x9f                                                
 [<c0403975>] ? common_interrupt+0x35/0x3c                                      
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c06279a3>] ? acpi_idle_enter_simple+0x11e/0x151                              
 [<c0701ef5>] ? cpuidle_idle_call+0x77/0xca                                     
 [<c04025be>] ? cpu_idle+0x9b/0xb5                                              
 [<c07a051b>] ? rest_init+0x67/0x69                                             
 [<c0a35928>] ? start_kernel+0x378/0x37d                                        
 [<c0a35099>] ? i386_start_kernel+0x99/0xa0                                     
Code: 8b 13 e8 9a f9 ff ff 85 f6 c7 03 00 00 00 00 c7 43 04 00 00 00 00 66 c7 4 
EIP: [<e122284a>] p54p_check_tx_ring+0x84/0xb1 [p54pci] SS:ESP 0068:c098be84    
CR2: 000000006b6b6b6b                                                           
---[ end trace ef906c2fe354d22c ]---                                            
Kernel panic - not syncing: Fatal exception in interrupt

The kernel oopses _somewhere_ inside p54p_check_tx_ring,
because it is trying to access 0x6b6b6b6b. Or in other words,
We dereference a pointer which was stored inside
a recently freed container...

Can you please run that oops through ksymoops and friends
in order to trace down at which "code" line the bug happens,
or simply attach the p54* modules and I'll take a look later.

Comment 8 Hans de Goede 2010-04-21 15:42:27 UTC
Hi,

I'm afraid I've not kept the p54pci.ko from when I hit the trace in comment #4, as said I reverted the following patches:

 * p54pci: move tx cleanup into tasklet
 * p54pci: revise tx locking
 * p54pci: prevent stuck rx-ring on slow system (+ sparse patch)
   (partial, only the moving of the p54p_check_tx_ring functions, as that
    seemed related to the above 2).

And then did "make modules" again. I've just re-applied them and did "make modules" again. I'll attach the resulting p54pci.ko which should be the same but I cannot 100% guarantee that. I also tried reproducing the oops, but it did not happen a second time :(

p.s.

All docs I can find tell me that ksymoops should not be used with 2.6 kernels, can you give me a quick howto on how to translate things like:
"p54p_check_tx_ring+0x84/0xb1" into a line number inside the function, I've googled but not found anything helpful. I would like to help debug this further and knowing how to go to a line number would certainly help me do this.

Thanks,

Hans

Comment 9 Hans de Goede 2010-04-21 15:43:10 UTC
Created attachment 408116 [details]
p54pci.ko for comment #4 oops

Comment 10 Hans de Goede 2010-04-21 15:46:46 UTC
Note that as said I tried to produce the panic from comment #4 with a p54pci.ko with all patches from wireless-next applied, but I've failed, I did get a lockdep oops followed by a hung machine like in comment #5, here is the oops output (from the serial console again):

=============================================                                   
[ INFO: possible recursive locking detected ]                                   
2.6.33.2-47.fc13.i686 #1                                                        
---------------------------------------------                                   
phy0/664 is trying to acquire lock:                                             
 (&(&priv->lock)->rlock){-.-...}, at: [<e17efb84>] p54p_tx+0x2f/0x11b [p54pci]  
                                                                                
but task is already holding lock:                                               
 (&(&priv->lock)->rlock){-.-...}, at: [<e17ef89f>] p54p_interrupt+0x2a/0xe9 [p5]
                                                                                
other info that might help us debug this:                                       
5 locks held by phy0/664:                                                       
 #0:  ((wiphy_name(local->hw.wiphy))){+.+.+.}, at: [<c044ecf1>] worker_thread+02
 #1:  ((&ifmgd->work)){+.+.+.}, at: [<c044ecf1>] worker_thread+0x15d/0x262      
 #2:  (&ifmgd->mtx){+.+.+.}, at: [<e1160b2d>] ieee80211_sta_work+0xd0/0xcf3 [ma]
 #3:  (&priv->conf_mutex){+.+.+.}, at: [<e1423b82>] p54_conf_tx+0x27/0x7e [p54c]
 #4:  (&(&priv->lock)->rlock){-.-...}, at: [<e17ef89f>] p54p_interrupt+0x2a/0xe]
                                                                                
stack backtrace:                                                                
Pid: 664, comm: phy0 Tainted: G        W  2.6.33.2-47.fc13.i686 #1              
 * p54pci: move tx cleanup into tasklet
 * p54pci: revise tx locking
 * p54pci: prevent stuck rx-ring on slow system (+ sparse patch)
 [<e1422206>] p54_set_edcf+0x95/0x9f [p54common]                                
 [<e1423bc0>] p54_conf_tx+0x65/0x7e [p54common]                                 
 [<e1423b5b>] ? p54_conf_tx+0x0/0x7e [p54common]                                
 [<e116b4a7>] ieee80211_set_wmm_default+0xef/0x104 [mac80211]                   
 [<e11601d5>] ieee80211_rx_mgmt_assoc_resp+0x5f2/0x7fa [mac80211]               
 [<e1160e4c>] ieee80211_sta_work+0x3ef/0xcf3 [mac80211]                         
 [<c05c7fe7>] ? debug_object_deactivate+0x9a/0xc2                               
 [<c04613bd>] ? lock_release_holdtime+0x31/0xd6                                 
 [<c044ed33>] worker_thread+0x19f/0x262                                         
 [<c044ecf1>] ? worker_thread+0x15d/0x262                                       
 [<e1160a5d>] ? ieee80211_sta_work+0x0/0xcf3 [mac80211]                         
 [<c04524c8>] ? autoremove_wake_function+0x0/0x34                               
 [<c044eb94>] ? worker_thread+0x0/0x262                                         
 [<c045214c>] kthread+0x6f/0x74                                                 
 [<c04520dd>] ? kthread+0x0/0x74                                                
 [<c0403982>] kernel_thread_helper+0x6/0x10                                     
BUG: spinlock lockup on CPU#0, phy0/664, d7d93534                               
Pid: 664, comm: phy0 Tainted: G        W  2.6.33.2-47.fc13.i686 #1              
Call Trace:                                                                     
 [<c07b17f3>] ? printk+0x14/0x19                                                
 [<c05c7341>] do_raw_spin_lock+0xfd/0x123                                       
 [<c07b3b60>] _raw_spin_lock_irqsave+0x56/0x6a                                  
 [<e17efb84>] p54p_tx+0x2f/0x11b [p54pci]                                       
 [<e1422dac>] p54_tx_pending+0x16c/0x174 [p54common]                            
 [<e1422e3a>] p54_tx_qos_accounting_free+0x86/0xd0 [p54common]                  
 [<c071f231>] ? skb_unlink+0x3c/0x41                                            
 [<e142399f>] p54_free_skb+0x26/0x33 [p54common]                                
 [<e17ef826>] p54p_check_tx_ring+0x60/0xaf [p54pci]                             
 [<e17ef915>] p54p_interrupt+0xa0/0xe9 [p54pci]                                 
 [<c0488f0d>] handle_IRQ_event+0x4b/0xf6                                        
 [<c048a970>] handle_level_irq+0x6a/0xb9                                        
 [<c0404ef0>] handle_irq+0x40/0x4c                                              
 [<c040475f>] do_IRQ+0x46/0x9f                                                  
 [<c0403975>] common_interrupt+0x35/0x3c                                        
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c07b41b8>] ? _raw_spin_unlock_irqrestore+0x45/0x4d                           
 [<c071f268>] skb_queue_tail+0x32/0x37                                          
 [<e1423474>] p54_tx+0x13/0x1d [p54common]                                      
 [<e1422206>] p54_set_edcf+0x95/0x9f [p54common]                                
 [<e1423bc0>] p54_conf_tx+0x65/0x7e [p54common]                                 
 [<e1423b5b>] ? p54_conf_tx+0x0/0x7e [p54common]                                
 [<e116b4a7>] ieee80211_set_wmm_default+0xef/0x104 [mac80211]                   
 [<e11601d5>] ieee80211_rx_mgmt_assoc_resp+0x5f2/0x7fa [mac80211]               
 [<e1160e4c>] ieee80211_sta_work+0x3ef/0xcf3 [mac80211]                         
 [<c05c7fe7>] ? debug_object_deactivate+0x9a/0xc2                               
 [<c04613bd>] ? lock_release_holdtime+0x31/0xd6                                 
 [<c044ed33>] worker_thread+0x19f/0x262                                         
 [<c044ecf1>] ? worker_thread+0x15d/0x262                                       
 [<e1160a5d>] ? ieee80211_sta_work+0x0/0xcf3 [mac80211]                         
 [<c04524c8>] ? autoremove_wake_function+0x0/0x34                               
 [<c044eb94>] ? worker_thread+0x0/0x262                                         
 [<c045214c>] kthread+0x6f/0x74                                                 
 [<c04520dd>] ? kthread+0x0/0x74                                                
 [<c0403982>] kernel_thread_helper+0x6/0x10                                     
sending NMI to all CPUs:

Comment 11 Hans de Goede 2010-04-21 15:51:41 UTC
Note, that I'm still seeing the original oops I attached when filing this bug, with all variants of some / all / no patches applied every time the driver loads.

And the comment #4 oops also happened directly when the driver loaded (where as the lockdep oopses are sometime later). I'm not sure if in the case of the comment #4 oops, I also got the original report oops, I don't remember I'm afraid it might very well be I did not get it that time around.

I'm mentioning this because I believe the comment #4 oops and the original oops are related. It seems my card reports a spurious tx complete immediately upon loading the driver, and p54p_check_tx_ring then starts doing "wrong" stuff. Usually resulting in the pci_unmap_single oops from the original report, but maybe with some bad luck also sometimes triggering the comment #4 panic ?

Just a hunch I have :)

More after dinner (and putting the children in bed).

Comment 12 Hans de Goede 2010-04-21 18:41:40 UTC
Please ignore comments 8-11, I made a brown paper bag mistake :|

When trying to reproduce the panic from comment #4, after re-applying the patches I had reverted to work around comment #4, I copied over p54usb rather then p54pci to the laptop (I'm doing the compiles on a faster machine). So comment 8-11 was still using the old p54pci with the patches reverted. This also explains why it
was showing the same lockdep warning and subsequent hang as before.

So now I've tested again with the proper p54pci.ko and I can reproduce #4, here is the new panic:

BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [<e09f484a>] p54p_check_tx_ring+0x84/0xb1 [p54pci]
*pde = 00000000 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
Modules linked in: arc4 ecb p54pci p54common mac80211 cfg80211 rfkill vfat fat ]

Pid: 0, comm: swapper Tainted: G        W  2.6.33.2-47.fc13.i686 #1 0744/Evo N6 
EIP: 0060:[<e09f484a>] EFLAGS: 00010286 CPU: 0
EIP is at p54p_check_tx_ring+0x84/0xb1 [p54pci]
EAX: 6b6b6b6b EBX: dcafc0d4 ECX: 00000003 EDX: 00000001
ESI: dc7a1c00 EDI: dcaa8e80 EBP: c098be90 ESP: c098be78
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c098a000 task=c09ccfe0 task.ti=c098a000)
Stack:
 00000008 dcaa8420 00000007 dcaa9570 dcafc080 dcaa8420 c098beb0 e09f4b02        
<0> dcafc080 00000020 dcaa9570 dcaa94ec c0abdb44 00000000 c098bec8 c0440568     
<0> dcaa94f0 00000018 0000000b 00000006 c098bef4 c0440ed3 00000000 c0abd740     
Call Trace:                                                                     
 [<e09f4b02>] ? p54p_tasklet+0xaa/0xb5 [p54pci]                                 
 [<c0440568>] ? tasklet_action+0x78/0xcb                                        
 [<c0440ed3>] ? __do_softirq+0xbc/0x173                                         
 [<c0440fc5>] ? do_softirq+0x3b/0x5f                                            
 [<c0441108>] ? irq_exit+0x3a/0x6d                                              
 [<c04047a4>] ? do_IRQ+0x8b/0x9f                                                
 [<c0403975>] ? common_interrupt+0x35/0x3c                                      
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c045f492>] ? tick_nohz_stop_sched_tick+0x375/0x3a1                           
 [<c0402597>] ? cpu_idle+0x74/0xb5                                              
 [<c07a051b>] ? rest_init+0x67/0x69
 [<c0a35928>] ? start_kernel+0x378/0x37d                                        
 [<c0a35099>] ? i386_start_kernel+0x99/0xa0                                     
Code: 8b 13 e8 9a f9 ff ff 85 f6 c7 03 00 00 00 00 c7 43 04 00 00 00 00 66 c7 4 
EIP: [<e09f484a>] p54p_check_tx_ring+0x84/0xb1 [p54pci] SS:ESP 0068:c098be78    
CR2: 000000006b6b6b6b                                                           
---[ end trace e0812f25241c25d9 ]---                                            
Kernel panic - not syncing: Fatal exception in interrupt                        
Pid: 0, comm: swapper Tainted: G      D W  2.6.33.2-47.fc13.i686 #1             
Call Trace:                                                                     
 [<c07b17f3>] ? printk+0x14/0x19                                                
 [<c07b1727>] panic+0x3e/0xf6                                                   
 [<c07b5537>] oops_end+0x97/0xa6                                                
 [<c0422366>] no_context+0x115/0x11f                                            
 [<c0422468>] __bad_area_nosemaphore+0xf8/0x100                                 
 [<c07b6816>] ? do_page_fault+0x0/0x337                                         
 [<c0422482>] bad_area_nosemaphore+0x12/0x15                                    
 [<c07b69d6>] do_page_fault+0x1c0/0x337                                         
 [<c07b6816>] ? do_page_fault+0x0/0x337                                         
 [<c07b4b24>] error_code+0x78/0x80                                              
 [<e09f484a>] ? p54p_check_tx_ring+0x84/0xb1 [p54pci]                           
 [<e09f4b02>] p54p_tasklet+0xaa/0xb5 [p54pci]                                   
 [<c0440568>] tasklet_action+0x78/0xcb                                          
 [<c0440ed3>] __do_softirq+0xbc/0x173                                           
 [<c0440fc5>] do_softirq+0x3b/0x5f
 [<c0441108>] irq_exit+0x3a/0x6d                                                
 [<c04047a4>] do_IRQ+0x8b/0x9f                                                  
 [<c0403975>] common_interrupt+0x35/0x3c                                        
 [<c046007b>] ? debug_mutex_wake_waiter+0x9b/0xd2                               
 [<c045f492>] ? tick_nohz_stop_sched_tick+0x375/0x3a1                           
 [<c0402597>] cpu_idle+0x74/0xb5                                                
 [<c07a051b>] rest_init+0x67/0x69                                               
 [<c0a35928>] start_kernel+0x378/0x37d                                          
 [<c0a35099>] i386_start_kernel+0x99/0xa0                                       

I'll also attach the *right* p54pci.ko to use with this oops.

Comment 13 Hans de Goede 2010-04-21 18:43:05 UTC
Created attachment 408147 [details]
p54pci.ko used with comment #4 and comment #12 oops

Comment 14 Hans de Goede 2010-04-21 19:06:18 UTC
Ok, I've only been staring at the code so far, not run any tests yet, but these
bits definitely are very suspicious:

First of all, it seems our tx code only uses the txdata ring, not the txmgmt ring:

static void p54p_tx(struct ieee80211_hw *dev, struct sk_buff *skb)
{
        <snip>
        device_idx = le32_to_cpu(ring_control->device_idx[1]);
        idx = le32_to_cpu(ring_control->host_idx[1]);

        <snip>

        desc = &ring_control->tx_data[i];

        <snip>

        ring_control->host_idx[1] = cpu_to_le32(idx + 1);

        <snip>
}

Notice all the hardcoded [1] array indexes here.

Then when checking the tx rings, we do:

                p54p_check_tx_ring(dev, &priv->tx_idx_mgmt,
                                   3, ring_control->tx_mgmt,
                                   ARRAY_SIZE(ring_control->tx_mgmt),
                                   priv->tx_buf_mgmt);

                p54p_check_tx_ring(dev, &priv->tx_idx_data,
                                   1, ring_control->tx_data,
                                   ARRAY_SIZE(ring_control->tx_data),
                                   priv->tx_buf_data);

Notice the 3th argument to p54p_check_tx_ring is the ring_index:

static void p54p_check_tx_ring(struct ieee80211_hw *dev, u32 *index,
        int ring_index, struct p54p_desc *ring, u32 ring_limit,
        void **tx_buf)
{

But we are not using it, instead we are always looking at ring 1, the
tx_data ring:

        i = (*index) % ring_limit;
        (*index) = idx = le32_to_cpu(ring_control->device_idx[1]);
        idx %= ring_limit;

So we are comparing the priv->tx_idx_mgmt with ring_control->device_idx[1],
where we should be comparing it with priv->tx_idx_mgmt[3].

I think this causes us to think there are completed skbs in the ring,
while in reality there are not, and then we start looking at bogus data,
this explains the pci_unmap_single oops, and maybe also the change of that
to the panic with the new move tx to tasklet patches.

I wonder if we need to call p54p_check_tx_ring for tx_mgmt at all as we are not
using tx_mgmt, but we definitely need to fixup the ring_control->device_idx
indexing in p54p_check_tx_ring. I'm going to take a break now, and then run
some tests with this fixed.

Regards,

Hans

Comment 15 chunkeey 2010-04-21 19:12:49 UTC
Created attachment 408149 [details]
verbose test patch

Thanks for your extensive notes.

From the p54pci.ko file it looks like the skb will be freed before
the pci sublayer has a chance to undo the DMA mapping. I've added
a barrier to prevent gcc from doing this optimization.

BTW, if you can speed up the module built a little bit by using
make M=drivers/net/wireless/p54/ .

Also, if you are still interested on how to get get _affected_ C
code line from the assembler output you should take a look at
the last paragraph in Documentation/BUG-HUNTING 
(under "Fixing the bug")

Regards,
   Chr

Comment 16 chunkeey 2010-04-21 19:41:26 UTC
Created attachment 408154 [details]
fix const ring index in p54p_check_tx_ring

>First of all, it seems our tx code only uses the txdata ring, not the txmgmt
>ring:

txmgmt fragments are processed in front of txdata fragments by the firmware.
One could say, they have a higher priority. But other than that, there's
no difference. In theory, the txmgmt ring _could_ be used for canceling
stuck frames and other critical stuff. However this functionality is not
implemented. 

>static void p54p_tx(struct ieee80211_hw *dev, struct sk_buff *skb)
>{
>        <snip>
>       device_idx = le32_to_cpu(ring_control->device_idx[1]);
>        idx = le32_to_cpu(ring_control->host_idx[1]);
>
>        <snip>
>
>        desc = &ring_control->tx_data[i];
>
>        <snip>
>
>        ring_control->host_idx[1] = cpu_to_le32(idx + 1);
>
>        <snip>
>}
>
>Notice all the hardcoded [1] array indexes here.
>
>jup
>Then when checking the tx rings, we do:
>
>                p54p_check_tx_ring(dev, &priv->tx_idx_mgmt,
>                                   3, ring_control->tx_mgmt,
>                                   ARRAY_SIZE(ring_control->tx_mgmt),
>                                   priv->tx_buf_mgmt);
>
>                p54p_check_tx_ring(dev, &priv->tx_idx_data,
>                                   1, ring_control->tx_data,
>                                   ARRAY_SIZE(ring_control->tx_data),
>                                   priv->tx_buf_data);
>
>Notice the 3th argument to p54p_check_tx_ring is the ring_index:
>
>static void p54p_check_tx_ring(struct ieee80211_hw *dev, u32 *index,
>        int ring_index, struct p54p_desc *ring, u32 ring_limit,
>        void **tx_buf)
>{
>
>But we are not using it, instead we are always looking at ring 1, the
>tx_data ring:
>
>        i = (*index) % ring_limit;
>        (*index) = idx = le32_to_cpu(ring_control->device_idx[1]);
>        idx %= ring_limit;
>
>So we are comparing the priv->tx_idx_mgmt with ring_control->device_idx[1],
>where we should be comparing it with priv->tx_idx_mgmt[3].
>
>I think this causes us to think there are completed skbs in the ring,
>while in reality there are not, and then we start looking at bogus data,
>this explains the pci_unmap_single oops, and maybe also the change of that
>to the panic with the new move tx to tasklet patches.

you are right, the hardcoded index in p54p_check_tx_ring is wrong.
I've attached a patch to fix this issue. Let me know if this
has any effect, because I won't be able to test these changes
until friday.

Comment 17 Hans de Goede 2010-04-22 11:41:03 UTC
I spend a couple hours of printk debugging this this morning and I now have it nailed.

The problem is the innocent looking moving of the tx processing to after the rx processing in the tasklet. Quoting from the changelog:

This patch does the same way, except that it also prioritize
rx data processing, simply because tx routines *can* wait.

This is causing an issue with us referencing already freed memory, because some
skb's we transmit, we immediately receive back, such as those for reading the eeprom (*) and getting stats.

What can happen because of the moving of the tx processing to after the rx processing is that when the tasklet first runs after doing a special skb tx (such as eeprom) we've already received the answer to it.

Then the rx processing ends up calling p54_find_and_unlink_skb to find the matching tx skb for the just received special rx skb and frees the tx skb.

Then after the processing of the rx skb answer, and thus freeing the tx skb,
we go process the completed tx ring entires, and then dereference the free-ed skb, to see if it should free free-ed by p54p_check_tx_ring().

Putting the p54p_check_tx_ring calls back in front of the p54p_check_rx_ring
calls fixes things for me, and makes my p54pci card completely stable (**).

*)  I've seen the comment #4 panic also a couple of times when still reading
    the eeprom.
**) Except that it is not working after suspend, but that is a different issue.

Regards,

Hans

Comment 18 chunkeey 2010-04-22 13:34:23 UTC
Created attachment 408325 [details]
All patches

Hello Hans,

I've prepared the following patches. Any comments?
If not, I'll post them to linux-wireless
today, or on friday.

Regards, Chr

Comment 19 Hans de Goede 2010-04-22 13:42:13 UTC
Hi,

(In reply to comment #18)
> Created an attachment (id=408325) [details]
> All patches
> 
> Hello Hans,
> 
> I've prepared the following patches.

Thanks.

> Any comments?

The barrier is not needed, the use of free memory which we were
seeing comes solely from the moving of the tx ring processing. I've conducted
a number of tests without the barrier, but with the tx ring processing
moved back to before the rx ring processing and all was well.

Also gcc really should not be moving code around over function calls,
and I don't believe it is doing that here.

Other then that the 2 patches look fine.

Regards,

Hans

Comment 20 chunkeey 2010-04-22 17:01:20 UTC
comment #19:

>The barrier is not needed, the use of free memory which we were
>seeing comes solely from the moving of the tx ring processing. I've conducted
>a number of tests without the barrier, but with the tx ring processing
>moved back to before the rx ring processing and all was well.
>
>Also gcc really should not be moving code around over function calls,
>and I don't believe it is doing that here.

Wait a sec... Here is an extract from p54pci.ko comment #9 (generated by objdump -r -d):

>000007c6 <p54p_check_tx_ring>:
> 7c6:	55                   	push   %ebp
> 7c7:	89 e5                	mov    %esp,%ebp
> 7c9:	57                   	push   %edi
> 7ca:	56                   	push   %esi
> 7cb:	53                   	push   %ebx
> 7cc:	83 ec 0c             	sub    $0xc,%esp
> 7cf:	e8 fc ff ff ff       	call   7d0 <p54p_check_tx_ring+0xa>
>			7d0: R_386_PC32	mcount
> 7d4:	89 45 ec             	mov    %eax,-0x14(%ebp)
> 7d7:	8b 78 24             	mov    0x24(%eax),%edi
> 7da:	89 d1                	mov    %edx,%ecx
> 7dc:	8b 02                	mov    (%edx),%eax
> 7de:	31 d2                	xor    %edx,%edx
> 7e0:	f7 75 0c             	divl   0xc(%ebp)
> 7e3:	89 55 f0             	mov    %edx,-0x10(%ebp)
  i = (*index) % ring_limit;

> 7e6:	8b 87 a8 06 00 00    	mov    0x6a8(%edi),%eax
> 7ec:	31 d2                	xor    %edx,%edx
> 7ee:	8b 40 14             	mov    0x14(%eax),%eax
 (*index) = idx = le32_to_cpu(ring_control->device_idx[ring_index]);

> 7f1:	89 01                	mov    %eax,(%ecx)
> 7f3:	f7 75 0c             	divl   0xc(%ebp)
> 7f6:	89 55 e8             	mov    %edx,-0x18(%ebp)
 idx %= ring_limit;

> 7f9:	eb 6a                	jmp    865 <p54p_check_tx_ring+0x9f> 
       this is where we jump into the while (i != idx) loop {

> 7fb:	8b 55 10             	mov    0x10(%ebp),%edx
> 7fe:	8b 45 f0             	mov    -0x10(%ebp),%eax
> 801:	6b 5d f0 0c          	imul   $0xc,-0x10(%ebp),%ebx
> 805:	03 5d 08             	add    0x8(%ebp),%ebx
> 808:	8d 34 82             	lea    (%edx,%eax,4),%esi
                desc = &ring[i];

                skb = tx_buf[i];

> 80b:	8b 16                	mov    (%esi),%edx
                tx_buf[i] = NULL;

------> And now the most interesting bits <------

> 80d:	85 d2                	test   %edx,%edx
> 80f:	74 15                	je     826 <p54p_check_tx_ring+0x60>
> 811:	8b 82 ac 00 00 00    	mov    0xac(%edx),%eax
> 817:	66 81 38 01 80       	cmpw   $0x8001,(%eax)
> 81c:	75 08                	jne    826 <p54p_check_tx_ring+0x60>
> 81e:	8b 45 ec             	mov    -0x14(%ebp),%eax
> 821:	e8 fc ff ff ff       	call   822 <p54p_check_tx_ring+0x5c>
>			822: R_386_PC32	p54_free_skb

     if (skb && FREE_AFTER_TX(skb))
         p54_free_skb(dev, skb);

> 826:	c7 06 00 00 00 00    	movl   $0x0,(%esi)
> 82c:	8b 87 64 06 00 00    	mov    0x664(%edi),%eax
> 832:	0f b7 4b 08          	movzwl 0x8(%ebx),%ecx
> 836:	6a 01                	push   $0x1
> 838:	8b 13                	mov    (%ebx),%edx
> 83a:	e8 81 f9 ff ff       	call   1c0 <pci_unmap_single>

      
         pci_unmap_single(priv->pdev, le32_to_cpu(desc->host_addr),
                          le16_to_cpu(desc->len), PCI_DMA_TODEVICE);

> 83f:	31 d2                	xor    %edx,%edx

> 841:	c7 03 00 00 00 00    	movl   $0x0,(%ebx)
> 847:	c7 43 04 00 00 00 00 	movl   $0x0,0x4(%ebx)
> 84e:	66 c7 43 08 00 00    	movw   $0x0,0x8(%ebx)
> 854:	66 c7 43 0a 00 00    	movw   $0x0,0xa(%ebx)
                desc->host_addr = 0;
                desc->device_addr = 0;
                desc->len = 0;
                desc->flags = 0;


> 85a:	8b 45 f0             	mov    -0x10(%ebp),%eax
> 85d:	5e                   	pop    %esi
> 85e:	40                   	inc    %eax
> 85f:	f7 75 0c             	divl   0xc(%ebp)
> 862:	89 55 f0             	mov    %edx,-0x10(%ebp)
                i++;
                i %= ring_limit;


> 865:	8b 45 e8             	mov    -0x18(%ebp),%eax
> 868:	39 45 f0             	cmp    %eax,-0x10(%ebp)
> 86b:	75 8e                	jne    7fb <p54p_check_tx_ring+0x35>
 that's the  i != idx comparison of our while loop

 }

> 86d:	8d 65 f4             	lea    -0xc(%ebp),%esp
> 870:	5b                   	pop    %ebx
> 871:	5e                   	pop    %esi
> 872:	5f                   	pop    %edi
> 873:	5d                   	pop    %ebp
> 874:	c3                   	ret    
[end]

p54_free_skb is already _done_ before pci_unmap_single had a
chance to destroy the mapping. I'm not sure why gcc did that,
I've never seen this before either and I also thought that
GCC wouldn't be so daft to do that....
but now, it really looks like gcc did at least "once".

Regards,
   Chr

Comment 21 Hans de Goede 2010-04-22 17:19:27 UTC
Created attachment 408393 [details]
patch missing from the p54pci.ko from attachment in comment #9

(In reply to comment #20)
> comment #19:
> 
> >The barrier is not needed, the use of free memory which we were
> >seeing comes solely from the moving of the tx ring processing. I've conducted
> >a number of tests without the barrier, but with the tx ring processing
> >moved back to before the rx ring processing and all was well.
> >
> >Also gcc really should not be moving code around over function calls,
> >and I don't believe it is doing that here.
> 
> Wait a sec... Here is an extract from p54pci.ko comment #9 (generated by
> objdump -r -d):
> 

<snip>

> I also thought that GCC wouldn't be so daft to do that....

The problem is not GCC being daft, but me being daft, as explained in comment #12 I accidentally did the testing in comments 8-11 with the wrong version (and also attached the wrong version.

The p54pci.ko from comment #9 does not include the attached patch and thus indeed has the order of the free and the map the other way around, as this missing patch is the one which puts them in the right order (amongst other doing things).

I'm sure if you disassemble the correct p54pci.ko I attached later the order will be correct.

Sorry about the confusion.

Comment 22 chunkeey 2010-04-22 17:51:48 UTC
>(In reply to comment #20)
>> I also thought that GCC wouldn't be so daft to do that.
>
>The problem is not GCC being daft, but me being daft, as explained in comment
>#12 I accidentally did the testing in comments 8-11 with the wrong version (and
>also attached the wrong version.
>
>The p54pci.ko from comment #9 does not include the attached patch and thus
>indeed has the order of the free and the map the other way around, as this
>missing patch is the one which puts them in the right order (amongst other
>doing things).
>
>I'm sure if you disassemble the correct p54pci.ko I attached later the order
>will be correct.
>
>Sorry about the confusion.

hmm, there's only one p54pci.ko from comment #9, comment #12 added p54usb.ko
(which is named p54pci.ko?).

Anyway, that's not important ;-)

I've removed the barrier(); and send it to linux-wireless.

Regards,
   Chr

Comment 23 Hans de Goede 2010-04-22 19:16:45 UTC
(In reply to comment #22)
> hmm, there's only one p54pci.ko from comment #9, comment #12 added p54usb.ko
> (which is named p54pci.ko?).
> 

Comment #12 added a p54usb.ko ? I must have mis-clicked when attaching ...

> Anyway, that's not important ;-)
> 

Agreed :)

> I've removed the barrier(); and send it to linux-wireless.

Looks good, thanks!

Regards,

Hans

p.s.

Linville, any chance we could get the patches from comment #3 + the 2 new ones chunkeey just posted into the Fedora kernel ?

Comment 24 Hans de Goede 2010-04-22 19:17:42 UTC
(In reply to comment #23)
> Linville, any chance we could get the patches from comment #3 + the 2 new ones
> chunkeey just posted into the Fedora kernel ?    

Note I have commit access to the Fedora kernel pkg cvs (for the webcam driver work I do), so I can do this myself if doing so is ok with you.

Comment 25 John W. Linville 2010-04-22 19:24:49 UTC
Fine by me... :-)

Comment 26 Hans de Goede 2010-04-22 20:57:17 UTC
(In reply to comment #25)
> Fine by me... :-)    

Done, the necessary changes are in pkg cvs and will get picked up by the next F-13 kernel build.

Comment 27 Fedora Update System 2010-04-28 18:38:01 UTC
kernel-2.6.33.3-72.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/kernel-2.6.33.3-72.fc13


Note You need to log in before you can comment on or make changes to this bug.