Description of problem:
On every boot my WiFi card (Edimax EW-7128G, rt61 driver) has a different MAC address. Users who have a MAC filter configured in their access point will be unable to connect. My guess is that the NVRAM is not read properly, or something like that, since the differences between the MACs are always a matter of a few bits.
How reproducible:
Reboot and run ifconfig to see the MAC address.
Actual results:
Different MAC every boot.
Expected results:
The card should come up with the same MAC address every time.
Please post the results of these commands:
    uname -r
    /sbin/lspci | grep Network
re-assigned to the kernel
[alex@htpc ~]$ uname -r
2.6.27.4-68.fc10.i686
[alex@htpc ~]$ /sbin/lspci | grep Network
05:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Does this problem still appear ?
I had to configure multiple entries in my access point to allow the connection, so I would not notice it now. I upgraded my kernel a few days ago. I will remove the duplicate entries from the access point and report back. It might take some time since I don't reboot the system very often.
Does the following message appear in your logfile:
    EEPROM recovery MAC: xx:xx:xx:xx:xx:xx
If so, the EEPROM contains invalid data, and each time the driver loads it will perform a recovery operation by generating a random MAC address.
The MAC is not random, it's very similar to the real one (the one I see in Windoze or the one I saw in Fedora 8), but differs in 1 or 2 bits. The changed bits are not constant, they move all over the 6 MAC address bytes every time. I don't have access to my access point right now; I will post here the MAC addresses I had to add in order to allow my Linux machine to connect.
So that means you don't get the "EEPROM recovery" messages?
P.S. Could you run the script http://kernel.org/pub/linux/kernel/people/ivd/tools/rt2x00_regdump.sh? It will create a dump of your registers and will indicate what MAC value is stored in the device.
Card MAC address as reported by Windows: 00:0E:2E:D5:52:40
The entries I had to add in order to connect (that means I saw FC10 boot with those addresses):
    00:0E:2E:D5:52:40
    00:0F:2E:D5:52:40
    00:0E:27:D1:52:40
    00:0E:2E:C7:52:40
There was one more boot with yet another MAC, but I had no time, so I just rebooted the machine (which means there are more combinations). I've removed the fake addresses from my AP list and will see if it happens again in the next few days. So far 2 boots were normal.
The script gave me:
    debugfs not mounted!
For now I'm going by the ifconfig output:
    wlan0 Link encap:Ethernet HWaddr 00:0E:2E:D5:52:40
My current kernel is: 2.6.27.5-117.fc10.i686
Please mount debugfs and try again.
Attaching output of the script when system booted with good MAC address.
Created attachment 325738: Good MAC script output
No "EEPROM recovery" was found in any of the logs
OK, that's interesting. When you get a different MAC address again, please post the new script output so I can compare them.
Happened again today. This time it woke up with the MAC address 00:06:2E:D5:52:40. Compare:
    00:06:2E:D5:52:40 - bad
    00:0E:2E:D5:52:40 - good
1 bit is missing. I couldn't find "EEPROM" or "recovery" in any of the /var/log files. I ran 'grep -nir "EEPROM" *' and 'grep -nir "recovery" *'. Attaching the output of the script from when the problem was present.
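The "1 bit is missing" observation can be checked mechanically by XOR-ing the two addresses byte by byte and counting the set bits. The following is only a minimal sketch; the function name mac_diff_bits is made up for illustration and is not part of any tool mentioned in this report.

```sh
#!/bin/sh
# Count how many bits differ between two MAC addresses (aa:bb:cc:dd:ee:ff).
# mac_diff_bits is a hypothetical helper name used only for this sketch.
mac_diff_bits() {
    a=$(printf '%s' "$1" | tr -d ':')
    b=$(printf '%s' "$2" | tr -d ':')
    total=0
    i=1
    while [ "$i" -le 11 ]; do
        # Take one byte (two hex digits) from each address.
        ba=$(printf '%s' "$a" | cut -c "$i-$((i + 1))")
        bb=$(printf '%s' "$b" | cut -c "$i-$((i + 1))")
        x=$(( 0x$ba ^ 0x$bb ))
        # Count the set bits in the XOR of the two bytes.
        while [ "$x" -gt 0 ]; do
            total=$(( total + (x & 1) ))
            x=$(( x >> 1 ))
        done
        i=$(( i + 2 ))
    done
    echo "$total"
}

# Good vs. bad address from the comment above; prints 1.
mac_diff_bits 00:0E:2E:D5:52:40 00:06:2E:D5:52:40
```

The same function can be run against each of the addresses that had to be whitelisted in the access point to see how many bits each one is off by.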
Created attachment 326380: Bad MAC script output
Well, looking at the logs I fear that there is a hardware problem. Just a snippet from your 2 EEPROM dumps:
    Word   Attachment 325738   Attachment 326380
    0      0x2561              0x2561
    1      0x0100              0x0100
    2      0x0e00              0x0600
    3      0xd52e              0xd52e
    4      0x4052              0x4052
    5      0x0301              0x0301
    6      0x1814              0x080c
You see that words 2 and 6 are changing. Word 2 is part of your MAC address and word 6 is the vendor identifier (which must be 0x1814). This means that each time the driver reads the EEPROM from the device it is getting different values. This sounds like a hardware problem rather than a driver problem. Is the card correctly inserted in the PCI slot? Does it work on other operating systems without changing MAC addresses? Does it work in a different computer?
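To see how the dump words map onto the MAC address, each 16-bit word can be split into its low and high byte, low byte first. This is a sketch under an assumption inferred from the two dumps (words 2 to 4 hold the MAC, low byte first), not a statement about the driver's code; eeprom_words_to_mac is a hypothetical name.

```sh
#!/bin/sh
# Turn three 16-bit EEPROM words into a MAC address, low byte first.
# Assumption: words 2..4 of the dump hold the MAC in this order
# (inferred from the dumps, not taken from the driver source).
eeprom_words_to_mac() {
    mac=""
    for w in "$@"; do
        lo=$(( $w & 0xff ))
        hi=$(( ($w >> 8) & 0xff ))
        mac="$mac$(printf '%02x:%02x:' "$lo" "$hi")"
    done
    # Strip the trailing colon.
    echo "${mac%:}"
}

eeprom_words_to_mac 0x0e00 0xd52e 0x4052   # words 2..4 of attachment 325738
eeprom_words_to_mac 0x0600 0xd52e 0x4052   # words 2..4 of attachment 326380
```

With this byte order, the first call yields the address Windows reports and the second yields the bad address seen on that boot, which is consistent with word 2 being the only MAC word that changed.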
As I wrote already, it works perfectly in Windows XP and worked without MAC address problems on Fedora 8, which was upgraded a month ago. All that on the same machine without touching the hardware. So let me doubt the assumption that it's a hardware problem. If the card wasn't correctly seated in its slot I would expect many other problems besides the MAC address issue, such as system lockups, WiFi not working at all and so on. Even when the MAC is not read correctly from the EEPROM the system works perfectly for days (I switch it off when I don't need it).
Since I can clearly see that the card/driver/firmware behavior changed (in a good way), which resolved some WiFi performance issues as well as collisions with other WiFi devices on the network, I can assume that some major changes were made somewhere between the Fedora 8 release and the current Fedora 10. Those changes might also have affected reading from the EEPROM. I don't know what interface is used to read the EEPROM in this case, but generally there are some delays placed between accesses to the hardware bus, and it sounds to me as if there was not enough delay between several hardware reads/writes.
What does that script do? Does it read from the driver cache or from the EEPROM directly? If there were a way to read from the EEPROM directly (using the driver, of course) I would expect to see different readings once in a while if I ran that script many times.
The script reads the driver cache. This cache is initialized once and only read afterwards. At no time will it write data back to the EEPROM. You could rmmod && insmod rt61pci several times to see if it reads the EEPROM differently each time. The driver which handles the EEPROM reading is eeprom_93cx6, but that driver has had no code changes since 2006. There is a delay of 450ns between reads within that module, which is well above the rt61 specs (which indicate <100ns delays). I'll attach a patch which increases the delay, let me know if it helps.
Created attachment 326840: Increase EEPROM delay
OK, reloading the driver is a good idea. I'll run some tests to determine how often the problem appears. I'll also try lowering the delay to 200ns to see whether it's really a timing issue. I'll report the results when I have them. Thanks.
I wrote the following script and ran it about 125 times with the default drivers:
==============================
#!/bin/sh
# Reload the driver, then save the ifconfig output and the register dump for this run.
TIME=`date +%s`
rmmod rt61pci
rmmod eeprom_93cx6
sleep 2
modprobe rt61pci
sleep 5
ifconfig 1> MAC$TIME.txt
/media/storage/Download/rt2x00_regdump.sh > MAC$TIME.log
==============================
In all cases the MAC addresses were the same. Is there anything wrong with my script? Or is there anything that can affect EEPROM reading at boot time, when the CPU is probably pretty busy? I have an Intel Core2 E6600. How reliable is the ndelay function that is used in the driver? Can it be that when the CPU is busy the delay will be much longer than 450 ns?
The Fedora 10 provided kernel doesn't compile for some reason; I'll try to find time to compile a kernel from kernel.org and try a longer delay. Or maybe there is something else I can do to reproduce the problem?
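After many runs, the saved MAC<timestamp>.txt files can be summarised instead of being read one by one. A minimal sketch, assuming the interface is called wlan0 and that ifconfig prints the address after "HWaddr" as in the output quoted earlier in this report; tally_macs is a hypothetical helper name.

```sh
#!/bin/sh
# Summarise the wlan0 hardware addresses captured in MAC<timestamp>.txt files.
# Assumptions: the interface is wlan0 and ifconfig prints "HWaddr <mac>".
# tally_macs is a hypothetical helper name used only for this sketch.
tally_macs() {
    dir=${1:-.}
    cat "$dir"/MAC*.txt |
        sed -n 's/^wlan0 .*HWaddr \([0-9A-Fa-f:]*\).*/\1/p' |
        sort | uniq -c
}
```

Running tally_macs in the directory holding the captures prints one line per distinct address with the number of runs that produced it, so a single line means every reload read the same MAC.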
No, the script should be fine. The ndelay value comes directly from the rt61pci chipset specifications; however, their implementation used mdelay(), so they did decide to use a bigger interval (the reason for this timeout is not provided). The patch attached to this bug report changes the timeout to exactly the same value as the Ralink implementation, so perhaps that is useful. I don't have experience with Fedora kernels myself, but did you try the kernel configuration which is located in /proc/config.gz?
I still can't compile the kernel that came from the yum updates. There is no /proc/config.gz; I found a kernel configuration file in /boot, probably installed together with the kernel. You wrote that their implementation used mdelay (milliseconds), while your change was to increase the delay to 1000 nanoseconds; you also wrote that the timeout will be the same. Is that a typo in your comment? What value should be used? I've compiled a custom 2.6.27.8 kernel with udelay(1) (1 microsecond) just in case ndelay has some problems. I will update if the problem comes back.
After trying to back up my system over the WiFi card and moving around 100 GB of data, I noticed that many files had 1-bit differences in several places, the same symptom as the MAC address issue. I ran extensive memtests, including test #9 (bit fading), but no memory problems were detected. Although memtest didn't find any problem, I replaced the memory chips with new ones in the hope that it will solve my problems. It's too early to draw conclusions, but the system looks stable so far. I'll update later on whether the problem is gone so that we can close this issue.
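One way to confirm that a copy differs from its source by isolated bit flips is to count the differing bits between the two files. The sketch below relies on cmp -l, which prints one line per differing byte with the offset and both byte values in octal; file_diff_bits is a hypothetical name, and the sketch assumes both files have the same length.

```sh
#!/bin/sh
# Count the differing bits between two files of equal length.
# cmp -l prints one line per differing byte: offset, then both values in octal.
# file_diff_bits is a hypothetical helper name used only for this sketch.
file_diff_bits() {
    cmp -l "$1" "$2" | {
        total=0
        while read -r offset a b; do
            # The leading 0 makes the shell read the values as octal.
            x=$(( 0$a ^ 0$b ))
            while [ "$x" -gt 0 ]; do
                total=$(( total + (x & 1) ))
                x=$(( x >> 1 ))
            done
        done
        echo "$total"
    }
}
```

A result that is small compared with the file size, spread over a few offsets, matches the single-bit corruption pattern described above; a result of 0 means the files are identical.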
Does the problem still appear ?
No. Apparently failed memory chips caused the problem. The issue can be closed. Sorry for the noise.