469669 – Different MAC address for the rt61 card on every boot

Summary: Different MAC address for the rt61 card on every boot

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	10
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	John W. Linville
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-11-03 12:25 UTC by Alex Betis
Modified:	2009-01-20 18:52 UTC
CC List:	6 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-01-20 18:52:22 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Good MAC script output (7.43 KB, text/plain) 2008-12-04 19:58 UTC, Alex Betis	no flags
Bad MAC script output (7.43 KB, text/plain) 2008-12-09 18:54 UTC, Alex Betis	no flags
Increase EEPROM delay (739 bytes, patch) 2008-12-13 23:07 UTC, Ivo van Doorn	no flags

Description Alex Betis 2008-11-03 12:25:27 UTC

Description of problem:
On every boot my WiFi card (Edimax EW-7128G, rt61 driver) has a different MAC address.
For users with a MAC filter configured in their access point, it will be impossible to connect.
My guess is that the NVRAM is not read properly, or something like that, since the differences between the MACs are always a matter of a few bits.

Version-Release number of selected component (if applicable):


How reproducible:
Reboot and run ifconfig to see the MAC address.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Different MAC every boot.

Expected results:
The card should be up with the same MAC address all the time.

Additional info:

Comment 1 Brennan Ashton 2008-11-04 05:09:37 UTC

Please post the output of these commands:
uname -r
/sbin/lspci | grep Network

Comment 2 Nicolas Chauvet (kwizart) 2008-11-04 07:59:42 UTC

Reassigned to the kernel component.

Comment 3 Alex Betis 2008-11-04 19:42:14 UTC

[alex@htpc ~]$ uname -r
2.6.27.4-68.fc10.i686
[alex@htpc ~]$ /sbin/lspci | grep Network
05:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI

Comment 4 Bug Zapper 2008-11-26 04:40:59 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 Nicolas Chauvet (kwizart) 2008-12-02 15:26:43 UTC

Does this problem still appear?

Comment 6 Alex Betis 2008-12-02 15:52:08 UTC

I had to configure multiple entries in my access point to allow the connection, so I will not notice it.
I upgraded my kernel a few days ago. I will remove the duplicate entries from the access point and report back. This might take some time since I don't reboot the system very often.

Comment 7 Ivo van Doorn 2008-12-02 21:44:29 UTC

Does the following message appear in your logfile:
EEPROM recovery MAC: xx:xx:xx:xx:xx:xx

If so, the EEPROM contains invalid data, and each time the driver loads it performs a recovery operation by generating a random MAC address.
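One way to check both the current kernel ring buffer and the saved log for this message is sketched below. It assumes the kernel log is written to /var/log/messages (the usual Fedora location) and only matches the text "EEPROM recovery", since the exact prefix the driver prints may differ between kernel versions. The helper name has_eeprom_recovery is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if any of the given log files
# contains the "EEPROM recovery" text. Defaults to /var/log/messages.
has_eeprom_recovery() {
    if [ "$#" -eq 0 ]; then
        set -- /var/log/messages
    fi
    # -q: only set the exit status; -s: ignore missing/unreadable files.
    grep -qs "EEPROM recovery" "$@"
}

# Check the live kernel ring buffer first, then the saved log.
if dmesg 2>/dev/null | grep -q "EEPROM recovery" || has_eeprom_recovery; then
    echo "EEPROM recovery message found"
else
    echo "no EEPROM recovery message"
fi
```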

Comment 8 Alex Betis 2008-12-03 05:54:49 UTC

The MAC is not random; it's very similar to the real one (the one I see in Windoze or the one I saw in Fedora 8) but differs in 1 or 2 bits. The changed bits are not constant; they move all over the 6 MAC address bytes every time.
I don't have access to my access point at the moment; I will post the MAC addresses I had to add in order to allow my Linux machine to connect.

Comment 9 Ivo van Doorn 2008-12-03 13:49:06 UTC

So that means you don't get the "EEPROM recovery" messages?

Comment 10 Ivo van Doorn 2008-12-03 13:50:05 UTC

P.S. Could you run the script:
http://kernel.org/pub/linux/kernel/people/ivd/tools/rt2x00_regdump.sh

This creates a dump of your registers and shows which MAC value is stored in the device.

Comment 11 Alex Betis 2008-12-03 20:32:08 UTC

Card MAC address as reported by Windows: 
00:0E:2E:D5:52:40

The entries I had to add in order to connect (that means I saw FC10 boot with those addresses):
00:0E:2E:D5:52:40
00:0F:2E:D5:52:40
00:0E:27:D1:52:40
00:0E:2E:C7:52:40
There was one more boot with yet another MAC, but I had no time, so I just rebooted the machine (which means there are more combinations).

I've removed the fake addresses from my AP list and will see whether it happens again in the next few days. So far, 2 boots were normal.
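To quantify how far each of these addresses is from the one Windows reports, one can XOR the two addresses byte by byte and count the set bits. A minimal POSIX sh sketch; the function name mac_bit_diff is hypothetical:

```shell
#!/bin/sh
# Count how many bits differ between two colon-separated MAC addresses
# by XOR-ing them byte by byte and counting the set bits.
mac_bit_diff() {
    total=0
    i=1
    while [ "$i" -le 6 ]; do
        # Extract the i-th hex byte of each address.
        byte_a=$(printf '%s\n' "$1" | cut -d: -f"$i")
        byte_b=$(printf '%s\n' "$2" | cut -d: -f"$i")
        x=$(( 0x$byte_a ^ 0x$byte_b ))
        # Add up the set bits of the XOR result.
        while [ "$x" -ne 0 ]; do
            total=$(( total + (x & 1) ))
            x=$(( x >> 1 ))
        done
        i=$(( i + 1 ))
    done
    echo "$total"
}

good=00:0E:2E:D5:52:40
for mac in 00:0F:2E:D5:52:40 00:0E:27:D1:52:40 00:0E:2E:C7:52:40; do
    echo "$mac differs from $good in $(mac_bit_diff "$good" "$mac") bit(s)"
done
```

For the three addresses that differ from the Windows one, this reports 1, 3 and 2 differing bits respectively.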

Comment 12 Alex Betis 2008-12-03 20:36:07 UTC

The script gave me:
debugfs not mounted!

For now I'm going by the ifconfig output:
wlan0     Link encap:Ethernet  HWaddr 00:0E:2E:D5:52:40

My current kernel is:
2.6.27.5-117.fc10.i686

Comment 13 Ivo van Doorn 2008-12-03 22:07:41 UTC

Please mount debugfs and try again.
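A minimal sketch for checking whether (and where) debugfs is mounted before running the script. It reads /proc/mounts, where field 2 is the mount point and field 3 the filesystem type; the function name debugfs_mountpoint is hypothetical. /sys/kernel/debug is the conventional mount point, but it is an assumption that rt2x00_regdump.sh looks there.

```shell
#!/bin/sh
# Print the first debugfs mount point listed in a mounts table
# (default: /proc/mounts). Exits non-zero if debugfs is not mounted.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}

if ! debugfs_mountpoint; then
    # Mounting requires root.
    echo "debugfs not mounted; as root run: mount -t debugfs none /sys/kernel/debug"
fi
```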

Comment 14 Alex Betis 2008-12-04 19:57:18 UTC

Attaching output of the script when system booted with good MAC address.

Comment 15 Alex Betis 2008-12-04 19:58:02 UTC

Created attachment 325738
Good MAC script output

Comment 16 Alex Betis 2008-12-04 20:51:44 UTC

No "EEPROM recovery" was found in any of the logs

Comment 17 Ivo van Doorn 2008-12-04 21:22:05 UTC

OK, that's interesting.

When you get a different MAC address again, please post the new script output so I can compare them.

Comment 18 Alex Betis 2008-12-09 18:53:42 UTC

It happened again today. This time the card came up with the 00:06:2E:D5:52:40 MAC address.
Compare:
00:06:2E:D5:52:40 - bad
00:0E:2E:D5:52:40 - good

1 bit is missing.
I couldn't find "EEPROM" or "recovery" in any of the /var/log files.
I ran 'grep -nir "EEPROM" *' and 'grep -nir "recovery" *'.

Attaching output of the script when there was a problem.

Comment 19 Alex Betis 2008-12-09 18:54:50 UTC

Created attachment 326380
Bad MAC script output

Comment 20 Ivo van Doorn 2008-12-13 18:53:23 UTC

Well, looking at the logs, I fear that there is a hardware problem.
Just a snippet from your 2 EEPROM dumps:
Attachment 325738:
0 :0x2561
1 :0x0100
2 :0x0e00
3 :0xd52e
4 :0x4052
5 :0x0301
6 :0x1814

Attachment 326380:
0 :0x2561
1 :0x0100
2 :0x0600
3 :0xd52e
4 :0x4052
5 :0x0301
6 :0x080c

You see that words 2 and 6 are changing. Word 2 is part of your MAC address and word 6 is the vendor identifier (which must be 0x1814). This means that each time the driver reads the EEPROM from the device, it gets different values.
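To see how the dumped words map to the MAC address: words 2 to 4 each hold two MAC bytes, low byte first, which reproduces the address Windows reports from the good dump. The sketch below decodes them and also checks word 6 against the Ralink vendor ID 0x1814. The function names eeprom_words_to_mac and check_vendor_word are hypothetical; the byte order is inferred from the dumps above.

```shell
#!/bin/sh
# Decode 16-bit EEPROM words (low byte first) into a colon-separated MAC.
eeprom_words_to_mac() {
    mac=""
    for word in "$@"; do
        lo=$(( $word & 0xff ))
        hi=$(( ($word >> 8) & 0xff ))
        mac="$mac$(printf '%02X:%02X:' "$lo" "$hi")"
    done
    # Drop the trailing colon.
    printf '%s\n' "${mac%:}"
}

# Succeed if the given word equals the Ralink PCI vendor ID 0x1814.
check_vendor_word() {
    [ $(( $1 )) -eq $(( 0x1814 )) ]
}

eeprom_words_to_mac 0x0e00 0xd52e 0x4052   # good dump: 00:0E:2E:D5:52:40
eeprom_words_to_mac 0x0600 0xd52e 0x4052   # bad dump:  00:06:2E:D5:52:40
check_vendor_word 0x080c || echo "word 6 is not 0x1814"
```

A single flipped bit in word 2 (0x0e00 vs 0x0600) is exactly the 00:0E to 00:06 change reported above.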

This sounds like a hardware problem rather than a driver problem. Is the card correctly inserted in the PCI slot? Does it work on other operating systems without changing MAC addresses? Does it work in a different computer?

Comment 21 Alex Betis 2008-12-13 20:28:38 UTC

As I wrote already, it works perfectly in Windows XP and worked without MAC address problems on Fedora 8, which was upgraded a month ago. All of that on the same machine without touching the hardware, so I doubt that it's a hardware problem. If the card weren't seated correctly in its slot, I would expect many other problems besides the MAC address issue, such as system lockups or WiFi not working at all. Even when the MAC is not correctly read from the EEPROM, the system works perfectly for days (I switch it off when I don't need it).

Since I can clearly see that the card/driver/firmware behavior changed (in a good way), which resolved some WiFi performance issues as well as collisions with other WiFi devices on the network, I assume that some major changes were made somewhere between the Fedora 8 release and the current Fedora 10. Those changes might also have affected reading from the EEPROM. I don't know what interface is used to read the EEPROM in this case, but generally there are delays between accesses to a hardware bus; to me the problem sounds like there was not enough delay between some of the hardware reads/writes.

What does that script do? Does it read from the driver cache or from the EEPROM directly?
If there were a way to read the EEPROM directly (through the driver, of course), I would expect to see different readings once in a while if I ran that script many times.

Comment 22 Ivo van Doorn 2008-12-13 23:07:04 UTC

The script reads the driver cache. This cache is initialized once and only read afterwards. At no time will it write data back to the EEPROM.

You could rmmod && insmod rt61pci several times to see if it reads the EEPROM differently each time.

The driver which handles the EEPROM reading is eeprom_93cx6, but that driver has had no code changes since 2006. That module waits 450 ns between reads, which is well above what the rt61 specs require (they indicate delays of <100 ns).

I'll attach a patch which increases the timeout, let me know if it helps.

Comment 23 Ivo van Doorn 2008-12-13 23:07:42 UTC

Created attachment 326840 [details]
Increase EEPROM delay

Comment 24 Alex Betis 2008-12-14 07:41:08 UTC

Ok, reloading the driver is a good idea. I'll do some tests to determine how often the problem appears.

I'll also try lowering the delay to 200 ns to find out whether it's really a timing issue.

I will report the results when I have them.
Thanks.

Comment 25 Alex Betis 2008-12-18 21:11:21 UTC

I wrote the following script and ran it about 125 times with the default drivers:
==============================
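#!/bin/sh
# Reload rt61pci (and the eeprom_93cx6 module it uses) so the EEPROM is
# read again, then save the resulting MAC address and register dump.

TIME=$(date +%s)

# Unload the driver and the EEPROM helper module.
rmmod rt61pci
rmmod eeprom_93cx6
sleep 2
# Load the driver again; modprobe also loads eeprom_93cx6 as a dependency.
modprobe rt61pci
sleep 5

# Save the interface list (including HWaddr) and the register dump.
ifconfig > "MAC$TIME.txt"
/media/storage/Download/rt2x00_regdump.sh > "MAC$TIME.log"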

==============================

In all cases the MAC addresses were the same. Is there anything wrong with my script? Or is there anything that can affect EEPROM reading at boot time, when the CPU is probably pretty busy?

I have an Intel Core 2 E6600.

How reliable is the ndelay function used in the driver? Could the delay be much longer than 450 ns when the CPU is busy?

The kernel provided by Fedora 10 doesn't compile for some reason; I'll try to find time to compile a kernel from kernel.org and try a longer delay.

Or maybe there is something else I can do to reproduce the problem?

Comment 26 Ivo van Doorn 2008-12-18 23:29:51 UTC

No, the script should be fine.
The ndelay comes directly from the rt61pci chipset specifications; however, the Ralink implementation used mdelay(), so they did decide to use a bigger interval (no reason for this timeout is given).
But the patch attached to this bug report changes the timeout to exactly the same value as the Ralink implementation, so perhaps that will help.

I don't have experience with kernels from Fedora myself, but did you try the kernel configuration which is located in /proc/config.gz?

Comment 27 Alex Betis 2008-12-19 07:56:09 UTC

I still can't compile the kernel that came from yum updates. There is no /proc/config.gz; I found the kernel configuration file in /boot, probably installed together with the kernel.

You wrote that their implementation used mdelay (milliseconds), while your change increases the delay to 1000 nanoseconds, yet you also wrote that the timeout will be the same. Is that a typo in your comment? What value should be used?

I've compiled 2.6.27.8 custom kernel with udelay(1) (1 microSec) just in case ndelay has some problems. Will update if the problem comes back.

Comment 28 Alex Betis 2008-12-28 18:43:02 UTC

After trying to back up my system over the WiFi card, moving around 100 GB of data, I noticed that many files had 1-bit differences in several places, the same symptom as the MAC address issue. I ran extensive memtests, including test #9 (bit fading), but no memory problems were detected.
Although memtest didn't find any problem, I replaced the memory modules with new ones in the hope that it would solve my problems. It's too early to draw conclusions, but the system looks stable so far.

I'll report back later on whether the problem is gone so we can close this issue.

Comment 29 Nicolas Chauvet (kwizart) 2009-01-20 15:32:17 UTC

Does the problem still appear?

Comment 30 Alex Betis 2009-01-20 18:52:22 UTC

No.
Apparently failed memory chips caused the problem.

The issue can be closed.

Sorry for the noise.
