Bug #1850

Realtek 8111F fails at high traffic 're0 Watchdog Timeout Error'

Added by owling - over 2 years ago. Updated about 1 month ago.

Status: 3rd party to resolve
Start date:
Priority: Expected
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Seen in:

Description

When using an ASUS C60M1-I motherboard with a Realtek 8111F interface and copying a lot of data via NFS, I lose the connection after roughly 20 minutes. The console prints:

"re0 Watchdog Timeout Error" 

When I start a shell from the console, I can't ping anything except my loopback interface.
Before the permanent timeout occurs, it loses the connection for a short time but continues after 4 seconds. When this happens, 're0: watchdog timeout' is also printed to the console.

At both the short and the permanent timeouts, /var/log/messages shows this:

Oct 21 15:39:37 stor kernel: re0: watchdog timeout
Oct 21 15:39:37 stor kernel: re0: link state changed to DOWN
Oct 21 15:39:41 stor kernel: re0: link state changed to UP
Oct 21 15:43:37 stor kernel: re0: watchdog timeout
Oct 21 15:43:37 stor kernel: re0: link state changed to DOWN
Oct 21 15:43:41 stor kernel: re0: link state changed to UP
Oct 21 15:43:46 stor kernel: re0: watchdog timeout
Oct 21 15:43:46 stor kernel: re0: link state changed to DOWN
Oct 21 15:43:50 stor kernel: re0: link state changed to UP
Oct 21 15:45:07 stor kernel: re0: watchdog timeout
Oct 21 15:45:07 stor kernel: re0: link state changed to DOWN
Oct 21 15:45:11 stor kernel: re0: link state changed to UP
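
To put a number on how often the interface resets, the timeout lines can be counted per interface. This is a minimal sketch, assuming a POSIX shell and a syslog-style file such as the /var/log/messages excerpt above; count_watchdog is a hypothetical helper name, not something FreeNAS ships:

```sh
# Count watchdog timeouts for one interface in a syslog-style file.
# Usage: count_watchdog re0 /var/log/messages
count_watchdog() {
    iface=$1
    logfile=$2
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so the exit status is ignored and only the printed count is kept.
    grep -c "${iface}: watchdog timeout" "$logfile" || true
}
```

Running it before and after a configuration change (for example a different MTU) gives a rough way to compare the two, provided the load is similar.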

This is what happens with the connection a few times before it becomes permanently lost:

64 bytes from 192.168.1.2: icmp_req=1536 ttl=64 time=0.118 ms
64 bytes from 192.168.1.2: icmp_req=1537 ttl=64 time=7998 ms
64 bytes from 192.168.1.2: icmp_req=1538 ttl=64 time=6989 ms
64 bytes from 192.168.1.2: icmp_req=1539 ttl=64 time=5989 ms
64 bytes from 192.168.1.2: icmp_req=1540 ttl=64 time=4989 ms
64 bytes from 192.168.1.2: icmp_req=1544 ttl=64 time=989 ms
64 bytes from 192.168.1.2: icmp_req=1545 ttl=64 time=0.249 ms
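
In the ping output above, replies 1537 to 1540 come back with round-trip times that fall by about one second per packet (7998, 6989, 5989, 4989 ms). That is consistent with the replies being held for roughly eight seconds and then delivered together. Sequence numbers 1541 to 1543 never arrive. A small sketch for spotting such gaps in a saved ping log, assuming a POSIX shell with awk; ping_gaps is a hypothetical helper name:

```sh
# Print ranges of missing sequence numbers found in ping output on stdin.
# Accepts both the icmp_req= and icmp_seq= spellings used by different pings.
# Usage: ping_gaps < ping.log
ping_gaps() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^icmp_(req|seq)=[0-9]+$/) {
                split($i, kv, "=")
                seq = kv[2] + 0
                # A jump of more than one means replies were lost in between.
                if (seen && seq > last + 1)
                    printf "missing %d-%d\n", last + 1, seq - 1
                last = seq
                seen = 1
            }
        }
    }'
}
```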

Eventually it loses the connection:
64 bytes from 192.168.1.2: icmp_req=2144 ttl=64 time=0.121 ms
From 192.168.1.92 icmp_seq=2235 Destination Host Unreachable

Running ifconfig re0 down followed by ifconfig re0 up has no effect. tcpdump captures 0 packets on re0 after the connection is lost.

When I reboot, the connection is restored.

I have "Enable autotune" off in my configuration and have not added any "Tuneables". I have recreated this 4 times.

if_re.ko - Compiled module for 64-bit FreeNAS (351 KB) Martin Bailey, 12/18/2014 09:15 PM

rtl_bsd_drv_v188.tgz - Realtek source code 1.88 (2014/10/31) (80.8 KB) Martin Bailey, 12/18/2014 09:15 PM

History

#1 Updated by mjboerma - almost 2 years ago

I upgraded to FreeNAS 9.x and I am still having issues under heavy load. Also using Asus C60M1-I, Realtek 8111. The first couple of times the re0: watchdog message appears, the network drops but recovers. Eventually it doesn't recover and requires a reboot. I ordered an Intel Gigabit CT PCI-E Network Adapter EXPI9301CTBLK to see if that fixes the problem.

re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP

Build FreeNAS-9.1.0-BETA-e5ef238-x64
Platform AMD C-60 APU with Radeon(tm) HD Graphics
Memory 7767MB

ifconfig

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
    ether 60:a4:4c:3f:db:68
    inet 192.168.37.103 netmask 0xffffff00 broadcast 192.168.37.255
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128 
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa 
    inet 127.0.0.1 netmask 0xff000000 
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

pciconf -lv | grep -C 4 re0

none2@pci0:1:0:0:    class=0x028000 card=0x84b61043 chip=0x817810ec rev=0x01 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8188CE 802.11b/g/n WiFi Adapter'
    class      = network
re0@pci0:4:0:0:    class=0x020000 card=0x85051043 chip=0x816810ec rev=0x09 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
    class      = network
    subclass   = ethernet

netstat -m

259/1031/1290 mbufs in use (current/cache/total)
257/523/780/262144 mbuf clusters in use (current/cache/total/max)
257/511 mbuf+clusters out of packet secondary zone in use (current/cache)
0/237/237/131072 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/65536 9k jumbo clusters in use (current/cache/total/max)
0/0/0/32768 16k jumbo clusters in use (current/cache/total/max)
578K/2251K/2830K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
97 requests for I/O initiated by sendfile
0 calls to protocol drain routines

#2 Updated by Greyproc - almost 2 years ago

I looked for a release of FreeNAS 9.x, but all I see is an alpha version with a big warning about how it's not for production use. I'm using my FreeNAS for production use: do I understand correctly that no one cares about a release that's barely been out 3 months at this point?

Is there a significantly improved version of the re driver in 9.x?

#3 Updated by William Grzybowski almost 2 years ago

Nobody is giving any attention to re0 and FreeBSD 8.3.

You might have better luck with FreeNAS 9.X.

#4 Updated by Greyproc - almost 2 years ago

I've been experiencing this, also (Board: Asus C60M1-I, Realtek 8111E).

Something which might be worth noting is one of the first things I did when installing FreeNAS was to set mtu 9000. I'm going to remove that, and see if it helps, as well as any features I don't think I need (such as WOL_MAGIC) and try the above suggested -tso.

Regarding mtu, though, the Realtek specification explicitly states it can handle it; maybe BSD's driver isn't doing something correctly? I haven't noticed any pattern; I've only been using CIFS; turned off NFS, FTP, etc, and was only accessing with one system, if that helps narrow it down. (However, was doing large copy operations, so heavy load.)

System information:

Build: FreeNAS-8.3.1-RELEASE-p2-x64 (r12686+b770da6_dirty)
Platform: AMD C-60 APU with Radeon(tm) HD Graphics
Memory:    8126MB

dmesg | grep re0

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet> port 0xd000-0xd0ff mem 0xfe004000-0xfe004fff,0xfe000000-0xfe003fff irq 17 at device 0.0 on pci4
re0: Using 1 MSI-X message
re0: Chip rev. 0x48000000
re0: MAC rev. 0x00000000

ifconfig:

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
 options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
 ether 30:85:a9:3e:10:c7
 inet 192.128.0.200 netmask 0xffffff00 broadcast 192.128.0.255
 media: Ethernet autoselect (1000baseT <full-duplex>)
 status: active

pciconf -lv | grep -C 4 re0:

re0@pci0:4:0:0:  class=0x020000 card=0x85051043 chip=0x816810ec rev=0x09 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
    class      = network
    subclass   = ethernet

netstat -m:

258/1407/1665 mbufs in use (current/cache/total)
0/404/404/262144 mbuf clusters in use (current/cache/total/max)
0/384 mbuf+clusters out of packet secondary zone in use (current/cache)
0/54/54/131072 4k (page size) jumbo clusters in use (current/cache/total/max)
256/818/1074/65536 9k jumbo clusters in use (current/cache/total/max)
0/0/0/32768 16k jumbo clusters in use (current/cache/total/max)
2368K/8737K/11106K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
17 requests for I/O initiated by sendfile
0 calls to protocol drain routines

#5 Updated by Anker - about 2 years ago

I also have the same problem with 8.3.1 p2. It happens while extracting files with WinRAR, or if two processes use the shared drive simultaneously. Speeds get slower over time, but are fine after a reboot.

NIC is On-board Realtek RTL8111E.

I get this output:

May 14 00:36:37 freenas kernel: bridge0: Ethernet address: 02:29:2f:29:79:00
May 14 00:36:37 freenas kernel: epair0a: Ethernet address: 02:f1:6e:00:0b:0a
May 14 00:36:37 freenas kernel: epair0b: Ethernet address: 02:f1:6e:00:0c:0b
May 14 00:36:37 freenas kernel: epair0a: link state changed to UP
May 14 00:36:37 freenas kernel: epair0b: link state changed to UP
May 14 00:36:37 freenas kernel: epair0a: promiscuous mode enabled
May 14 00:36:37 freenas kernel: re0: promiscuous mode enabled
May 14 00:36:37 freenas kernel: re0: link state changed to DOWN
May 14 00:36:40 freenas kernel: re0: link state changed to UP
May 14 00:39:31 freenas kernel: re0: watchdog timeout
May 14 00:39:31 freenas kernel: re0: link state changed to DOWN
May 14 00:39:34 freenas kernel: re0: link state changed to UP
May 14 00:40:11 freenas kernel: re0: watchdog timeout
May 14 00:40:11 freenas kernel: re0: link state changed to DOWN
May 14 00:40:14 freenas kernel: re0: link state changed to UP
May 14 00:43:39 freenas kernel: re0: watchdog timeout
May 14 00:43:39 freenas kernel: re0: link state changed to DOWN
May 14 00:43:42 freenas kernel: re0: link state changed to UP
May 14 00:44:14 freenas kernel: re0: watchdog timeout
May 14 00:44:14 freenas kernel: re0: link state changed to DOWN
May 14 00:44:17 freenas kernel: re0: link state changed to UP
May 14 00:44:32 freenas kernel: re0: watchdog timeout
May 14 00:44:32 freenas kernel: re0: link state changed to DOWN
May 14 00:44:35 freenas kernel: re0: link state changed to UP
May 14 00:45:24 freenas kernel: re0: watchdog timeout
May 14 00:45:24 freenas kernel: re0: link state changed to DOWN
May 14 00:45:27 freenas kernel: re0: link state changed to UP

Image: http://i43.tinypic.com/2w4juqh.png

#6 Updated by Phillip Marshall over 2 years ago

I'm getting the same issue on my ASRock B75 Pro3, which has the Realtek 8111E, transferring a bunch of data over AFP. Oddly, it didn't break a sweat running a 5TiB replication over SSH when I first installed it yesterday.

EDIT: Rather than worrying about this, I just dropped $25 on an Intel NIC, as apparently Realtek ones are trash: http://www.newegg.com/Product/Product.aspx?Item=N82E16833106121

#7 Updated by stuom - over 2 years ago

This bug also applies to 8.3.0-RELEASE. Tested with a CIFS share under high load (three clients reading data from the share and one client writing).

"Enable autotune" is disabled and no "Tuneables" have been configured.

#8 Updated by chris3955 - over 2 years ago

I am seeing the same issue with my system.
MB: MSI E350IS-E45
NIC: RealTek 8111E

This issue is reproduced during large transfers over CIFS protocol.

#9 Updated by Vadim - over 2 years ago

The built-in Realtek driver in 8.3.0 works correctly. My problem was due to the router, which blocked a virtual machine :)

#10 Updated by kingcharles - over 2 years ago

I am also using an ASUS C60M1-I, as in the original bug report, and can also reproduce this just by copying large files with NFS. The C60M1 only has one expansion slot, which is used for a storage board in my setup, so adding another NIC is not possible for me.
Using 8.3.0 Release.

#11 Updated by Moritz - over 2 years ago

I have the exact same error with the same hardware and the same FreeNAS release.
It also occurs under high load: writing to the RaidZ1 array via a CIFS share at roughly 37 mb/s while simultaneously streaming SD video material.

I also do not like the idea of putting another NIC into the one expansion slot, as I might want to put another SATA controller there in the future.

Are there any fixes to be hoped for in future releases of FreeNAS, or is this a general problem with the "bad" onboard NIC of the C60M1?

#12 Updated by inaki_mtz - over 2 years ago

Hi everyone,
same problem using an ASUS C60M1-I board. I only have FTP enabled for file transfers and MiniDLNA for streaming media. If I'm uploading or downloading big files from the NAS, I get "re0: watchdog timeout" and MiniDLNA stops.
Currently using FreeNAS 8.3.0 - P1 x64.

#13 Updated by cflemm - over 2 years ago

Same problem here with:

MB: Biostar A681-350 Deluxe
CPU: AMD Fusion 350D
NIC: On-board Realtek 8111F

Under heavy load I get intermittent and increasing amounts of "re0: watchdog timeout" errors until the connection fails altogether. Hard reboot of server required to get the NIC working again.

EDIT: I seem to be having some success with the following suggestion. I still get watchdog errors occasionally; however, they are no longer so severe as to lead to a failed connection.
http://www.thewebernets.com/2011/06/20/freenas-re0-watchdog-timeout-error/

#14 Updated by owling - over 2 years ago

Replying to [comment:9 cflemm]:

EDIT: I seem to be having some success with the following suggestion. I still get watchdog errors occasionally; however, they are no longer so severe as to lead to a failed connection.
http://www.thewebernets.com/2011/06/20/freenas-re0-watchdog-timeout-error/

(As stated in the original ticket) I have also tried disabling "Enable autotune" and "Tuneables", but it didn't work.

#15 Updated by William Grzybowski over 2 years ago

Long shot, but try to disable TSO before the problem happens:

ifconfig re0 -tso
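
Before and after running that command it helps to check whether TSO is actually active on the interface. The ifconfig options lines posted elsewhere in this thread list no TSO flag, so on those systems the setting may already have been off. A minimal sketch, assuming a POSIX shell; tso_state is a hypothetical helper that only inspects ifconfig output and changes nothing:

```sh
# Report whether a TSO capability (TSO, TSO4 or TSO6) appears in the
# options line of ifconfig output read from stdin.
# Usage: ifconfig re0 | tso_state
tso_state() {
    if grep 'options=' | grep -Eq '[<,]TSO[46]?[,>]'; then
        echo "tso enabled"
    else
        echo "tso disabled"
    fi
}

# Typical sequence on the NAS (run as root):
#   ifconfig re0 | tso_state     # state before the change
#   ifconfig re0 -tso            # the suggestion from this comment
#   ifconfig re0 | tso_state     # should now print "tso disabled"
```

The change made by ifconfig lasts only until the next reboot or interface reconfiguration, so it has to be applied before the transfer that triggers the timeouts.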

#16 Updated by Anonymous over 2 years ago

For the record, we (iX) have a Gigabyte and now ASRock board that uses this CPU/NIC chipset combo and have never seen any 'watchdog timeout' messages while testing the Gigabyte board under NAS load. The RTL8111E chip tends to just hang if there are Layer 1 Ethernet issues, so check your cabling/switch/etc. and ensure you are using quality Cat 6 cables and your switch is operating normally.

#17 Updated by cflemm - about 2 years ago

My issue is still there, not solved. It is very noticeable under traffic from more than one source. I can copy files to my FreeNAS server at 70MB/s and never get any timeout errors, but if somebody else tries to access the server simultaneously, the timeout errors begin and both connections are interrupted.

#18 Updated by Djef - about 2 years ago

Same here with 8.3.1 p2... Many watchdog errors; the link goes down and comes back up many times.

Motherboard : C60M1-i

#19 Updated by Antonio Bugan over 1 year ago

owling - wrote:

When using an ASUS C60M1-I motherboard with a Realtek 8111F interface and copying a lot of data via NFS, I lose the connection after roughly 20 minutes. The console prints:

I have the same issue with the "ASUS C60M1-I" (GB 6* WD Red 3TB)

System Information

Hostname FreeNas.local
Build FreeNAS-8.3.1-RELEASE-p2-x64 (r12686+b770da6_dirty)
Platform AMD C-60 APU with Radeon(tm) HD Graphics
Memory 7773MB
System Time Tue Sep 03 13:39:34 CEST 2013
Uptime 1:39PM up 20 mins, 0 users
Load Average 0.82, 0.87, 0.76

Should I try another NIC?

#20 Updated by Jordan Hubbard over 1 year ago

  • Status changed from Unscreened to 3rd party to resolve
  • Seen in set to

Assuming realtek driver still sucks, this one is going to be a FreeBSD problem to fix. Just not on our roadmap.

#21 Updated by Martin Bailey 5 months ago

Jordan Hubbard wrote:

Assuming realtek driver still sucks, this one is going to be a FreeBSD problem to fix. Just not on our roadmap.

I felt it would be wasteful to buy an Intel NIC when the onboard RTL8111 works flawlessly in other operating systems, so I found a solution. After tweaking every possible setting with nothing resolving the constant watchdog timeout issue, I realized Realtek publishes its own driver for FreeBSD 9. It definitely looks like some binary firmware code is embedded in the source file, but I'm glad to report it works flawlessly. I can finally transfer files with 9KB MTU at 99% network utilization without a single hiccup.

A better solution would be to compare the registers being set for both the open source and binary FreeBSD drivers to understand how to fix the open source driver, but in the meantime, here's how to make your Realtek network work.

Download the attached module or, alternatively, compile the driver yourself: set up a FreeNAS jail with FreeBSD ports. Download the latest source code from the Realtek site or use the tgz attachment. Extract the source files to /usr/src/sys/dev/re and the Makefile to /usr/src/sys/modules/re. From the latter folder, type 'make'. Copy the if_re.ko module file outside the jail.

Move the if_re.ko module to your /boot/kernel folder. Add if_re_load="YES" to your loader config. Reboot and confirm you see "re0: version:1.88" in the 'dmesg' boot output.

Enjoy!
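
The install and verification steps above can be sketched as a pair of shell functions. This is an illustration under stated assumptions, not a tested FreeNAS procedure: install_re_module and re_driver_loaded are hypothetical names, and /boot/loader.conf is only an assumed default for "your loader config" (FreeNAS may manage loader settings elsewhere, so pass a different path if needed). Building the module inside the jail is not covered here.

```sh
# Copy a prebuilt if_re.ko into the kernel module directory and make the
# loader pick it up at boot.
# Usage (as root): install_re_module ./if_re.ko [kernel_dir] [loader_conf]
install_re_module() {
    module=$1
    kernel_dir=${2:-/boot/kernel}        # folder named in the comment above
    loader_conf=${3:-/boot/loader.conf}  # assumption: standard FreeBSD path

    cp "$module" "$kernel_dir/if_re.ko" || return 1

    # Add the loader line only once, so re-running does not duplicate it.
    if ! grep -q '^if_re_load="YES"' "$loader_conf" 2>/dev/null; then
        echo 'if_re_load="YES"' >> "$loader_conf"
    fi
}

# Succeeds if boot messages on stdin show the version string the comment
# says to look for.
# Usage after the reboot: dmesg | re_driver_loaded && echo "Realtek driver active"
re_driver_loaded() {
    grep -q 're0: version:1.88'
}
```

Passing the directories as arguments makes it possible to try the function against a scratch directory first, before touching /boot on a machine that holds real data.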

#22 Updated by Tom B 2 months ago

Old bug... surprised to find it's still causing issues in the 9.3x builds.
Driver issue... Not a bug?

Found I was getting the same errors when pushing through large files via ftp and wget between two FreeNAS Servers.

Martin... Thanks for your information, using the driver you provided worked and has actually made the interface far more responsive than it was in the past.

#23 Updated by Kamal Soor about 1 month ago

I too am having this problem since building my server, about 6 months ago.

I'm relatively new to FreeNas, so I'm not sure how to save a log to submit, sorry.

I access my FreeNAS server from a Mac and use Apple AFP shares. I use the FreeNAS server mainly to store files and to run the Plex server.

I've had failures when I try to copy files from one volume to another via the Mac (drag to copy), although I have no problems using a Unix copy when I log on using Terminal and ssh. I'm not sure, but I think the FreeNAS network link has also gone down when I've tried to copy a very large amount of data. Sometimes it works and other times I have issues.

Just a few minutes ago I tried to copy two streams at the same time, got a failure, and had to reboot my FreeNAS server.

I see there is a solution by Martin Bailey (post #21), but I'm not comfortable with fiddling and patching the system.

I had hoped that the FreeNAS or BSD community would have fixed this issue considering how long it's been known. This doesn't give me a very good feeling; I have over 20TB of data on my server, and it makes me nervous.
K

My system is an Asus Z87-A motherboard and yes, it uses the Realtek 8111GR gigabit LAN controller
FreeNAS-9.3-STABLE-201503200528
Platform Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Memory 16230MB
