2009-04-23

Ethernet Card Issue When Using AMD64 Architecture of Linux

i would first like to give a shout out to fellow members of the Slugnet Mailing List - especially Patrick Haller - who helped me to diagnose and figure out the root cause of the problem. It was a real community effort! :-)

Previously, i was running Ubuntu Intrepid Ibex (8.10) i386 and did not encounter any networking issue. When the release candidate for Jaunty Jackalope became available, i decided to try it out (with a fresh install), opting for the AMD64 architecture so as to fully utilise the 4 GB of RAM that i have on my system.

Installation went smoothly, but after starting up, the NetworkManager applet reported that a network connection could not be established (i.e. it could not get a DHCP lease from my router).

Thereafter, a series of diagnostic and troubleshooting steps ensued and took up the best part of the weekend:

1. ifconfig showed that the ethernet card got detected (eth0), listing the correct HWaddr (MAC address). But of course, there was no IP address for the interface since it failed to get a lease via DHCP.

2. Networking worked fine with the i386 architecture of Jaunty Jackalope (tried using live CD). Furthermore, the NetworkManager settings on the AMD64 run were the same as those on the i386 run.

3. Doing a grep on dmesg showed that the ethernet card was correctly detected, and that the link became ready. The relevant lines were also the same across the AMD64 and the i386 runs.

4. Doing a tcpdump (while the NetworkManager applet was trying to establish a connection) showed that DHCP request packets were being correctly sent out, but with no offer coming back.

5. Statically setting the IP address, netmask, gateway and default route (instead of relying on DHCP) did not work either, regardless of whether they were set via the NetworkManager applet or via the ifconfig and route commands. i still could not reach my router by ping, by telnet, or through the browser (router configuration web page). Furthermore, running arp -an after an unsuccessful ping to the router (at 192.168.1.254) gave the response (192.168.1.254) at <incomplete> on eth0, meaning that the router's MAC address was never resolved.

6. The same issue surfaced when using the x86_64 architecture of Fedora 10 (again, tried with the live CD).
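The ARP check from step 5 is easy to automate. Below is a minimal Python sketch, assuming arp -an prints entries in the form quoted in step 5; the function name unresolved_neighbours and the second sample line (with a made-up MAC address) are hypothetical and only there for illustration.

```python
import re

# Matches entries such as: (192.168.1.254) at <incomplete> on eth0
# The optional "[ether]"-style tag after the hardware address is skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>[0-9.]+)\) at (?P<hw>\S+)(?: \[\w+\])? on (?P<iface>\S+)"
)

def unresolved_neighbours(arp_output):
    """Return (ip, interface) pairs whose MAC address was never resolved."""
    unresolved = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("hw") == "<incomplete>":
            unresolved.append((match.group("ip"), match.group("iface")))
    return unresolved

# Constructed sample text, not a capture from a real system.
sample = (
    "? (192.168.1.254) at <incomplete> on eth0\n"
    "? (192.168.1.10) at 00:11:22:33:44:55 [ether] on eth0\n"
)
print(unresolved_neighbours(sample))
```

An unresolved entry for the router means that ARP requests went out but no reply was ever received, so the failure sits below the IP layer and is not specific to DHCP.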

At that point, suspicions had started to narrow towards the ethernet card itself and/or its driver. In a subsequent post to the mailing list, i mentioned the model of the motherboard that i was using (Asus M2A-MVP) as well as its onboard LAN (Marvell 88E8001). There were apparently problems reported when using it with Ubuntu 7.04 x86_64 (http://hardware4linux.info/component/5811/).

This led Patrick to his winning entry: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131965 (in particular, this comment - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131965/comments/19). Booting up my system with mem=3G gave me a working network connection, which showed that it was the same issue that Patrick had identified. Final confirmation came with Anton's contribution of http://kerneltrap.org/mailarchive/linux-netdev/2009/2/10/4944484.
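Some arithmetic may help show why capping memory could make a difference. This is only a sketch under an assumption that this post does not confirm: that the failure involves network buffers ending up at physical addresses at or above the 4 GiB boundary, which a 32-bit address cannot express. It also assumes that mem=3G keeps all usable RAM below the 3 GiB mark. The names DMA_32BIT_LIMIT and fits_32bit_dma are hypothetical.

```python
GIB = 2 ** 30

# Assumption: the device can only address memory with 32-bit values,
# so 2**32 is the first physical address it cannot reach.
DMA_32BIT_LIMIT = 2 ** 32

def fits_32bit_dma(phys_addr, length):
    """True if the buffer [phys_addr, phys_addr + length) lies wholly below 4 GiB."""
    return phys_addr + length <= DMA_32BIT_LIMIT

# The limit is exactly 4 GiB.
print(DMA_32BIT_LIMIT // GIB)

# A 4 KiB buffer at the very top of a 3 GiB cap is still reachable.
print(fits_32bit_dma(3 * GIB - 4096, 4096))

# A 4 KiB buffer starting at the 4 GiB boundary is not.
print(fits_32bit_dma(4 * GIB, 4096))
```

On many boards with 4 GB installed, part of the RAM is mapped above the 4 GiB boundary to leave room for device address space, which would explain how a 4 GB machine can run into this at all; whether that applies to this particular board is likewise an assumption.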

So that was that! In hindsight i might have done better by searching for reported issues with the particular model of network card, but i suppose i was thrown off the hardware / device driver trail when tcpdump told me that the ethernet card was sending out DHCP requests correctly. For now, i am running my system on 3 GB of memory, but i will probably get a new ethernet card soon.

1 comment:

RM said...

i got here from the ubuntu forums;

i'm going to check if booting with mem=3g works... i have the same problem, since i upgraded my ram from 2 to 6 gb and re installed all my OSes.

linux reports same problem with skge driver, i have same mainboard m2a-mvp.

vista64/win7-64 detect it correctly but can't dhcp either... winxp-32 runs fine though.

of all my livecds, the only one that worked was an i386 version of Freesbie; not even i386 ubuntu/debian/opensuse/gentoo did the trick. =/

I believe it's a memory issue > 4gb regardless of the OS, and the root of the problem is the crap lan chipset from marvell.

so did you go to the store and buy a new one? i'm thinking about it... lol

btw, couldnt get the kerneltrap.org link to work here.