Random Network Outages

05-14-2015, 09:30 AM
A post published on the petri.co.il forum (https://www.petri.com/forums/forum/networking/general-networking/68754-random-network-outages).

_____________________________________________________________

We have a networking issue that I hope someone may be able to help me with.

We run a Windows domain - single domain, single subnet. Three domain controllers: Windows 2012 R2 Standard, Windows 2008 Standard (32-bit) and Windows Server 2003 Standard. The Win 2008 DC holds all the FSMO roles. DNS is set up on the 2008 and 2012 DCs, with the 2008 DC having a secondary DNS server installation. DHCP is set up on the 2008 DC. The 2003 server only functions as a DC - no other roles are installed on it, and it will be demoted very soon.

Two member servers: Windows Storage Server 2008 (64-bit) and Windows 2012 Standard. The Storage server hosts 99% of our data, with the other 1% on the 2008 DC. There are two email server installations (not Exchange) - one on the storage server and one on the 2012 R2 member server. NPS Routing and Remote Access is configured on the 2012 member server to handle VPN connections.

35 client PCs: 34 run Windows 7 and one runs Vista.

We have two Draytek routers on the network - one acts as the gateway to the Internet and the other provides wireless coverage. There are two networked printers - a Ricoh MFD 'workgroup' printer and a small mono Brother.

Network shares are accessed via DFS. The servers, the Ricoh printer and the routers have static IPs; most of the clients and the Brother printer use DHCP.

All cables terminate at one of two patch panels which then feed to one or more switches. Small desktop switches are used to expand the network where needed. The network is divided into two segments, hence two patch panels, but both run under the same 192.168.0.xxx/ subnet.

Before we upgraded all our computers to Windows 7, the network was fine. The present network was built using CAT5 cabling in 2004 (we'd used BNC before that). We rarely had any network issues, and when we did, they affected all clients. When I first introduced Windows 7, it was on three PCs, and one or more of them would randomly have problems accessing the network and the Internet.

Since I upgraded all our machines to Windows 7, we have seen one or more machines experience network problems most days.

What happens is that the affected computer (it can be any of them) suddenly stops. Applications accessing files across the network, e.g. email and Access, hang and report as (Not Responding), and the 'busy' cursor appears. If I try to save an Office document, the busy cursor appears and the application hangs. The Start menu is not accessible: for example, I always have my Taskbar hidden, and when this happens on my machine, moving the mouse to the bottom of the screen does nothing; the Taskbar stays hidden. Sometimes the Taskbar may appear, but nothing happens when the Start button is clicked. When I try to access shared folders via Computer, the green bar slowly moves through the Address Bar and after a while it reports that the share is not accessible. The affected computers are also unable to open any web pages.

The hang lasts anywhere from 20 seconds to a minute or more, after which the computer continues operating normally. On very rare occasions the computer does not recover after 10 minutes or more and I have to force a shutdown, but I assume this is not directly related to the issues I am seeing.

The short outages are completely random and leave no trace of a problem in the System or Application Logs. When a machine does not recover (which is very rare) the System Log reports that a DNS server could not be reached.

This will happen on a single PC while the others are fine. But it may affect several PCs over the course of a day.

When the outage happens, I can press Winkey+R, open cmd.exe and successfully ping the DNS servers by IP and by name.
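Ping only shows that ICMP is getting through; opening a share or a web page also needs a working name lookup and a TCP connection. A minimal sketch in Python, assuming the file server is reached over SMB on TCP port 445 (the function name `probe` and the host name mentioned below are hypothetical), that times the two steps separately so it is clear which one stalls while a PC is hung:

```python
import socket
import time


def probe(host: str, port: int = 445, timeout: float = 5.0) -> dict:
    """Resolve `host`, then open a TCP connection to `port`.

    Returns the time taken by each step in milliseconds, or the step
    that failed. Port 445 (SMB file sharing) is an assumed default.
    """
    result = {"host": host, "port": port, "dns_ms": None, "tcp_ms": None, "error": None}

    # Step 1: name resolution through the system resolver.
    # getaddrinfo has no timeout argument; a stuck lookup blocks until
    # the OS resolver gives up.
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"dns: {exc}"
        return result
    result["dns_ms"] = (time.perf_counter() - start) * 1000

    # Step 2: TCP handshake to the service port, bounded by `timeout`.
    family, socktype, proto, _, addr = infos[0]
    start = time.perf_counter()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:
        result["error"] = f"tcp: {exc}"
        return result
    result["tcp_ms"] = (time.perf_counter() - start) * 1000
    return result
```

Running something like `probe("storage-server")` from the command prompt that still works during an outage, and comparing it with the same call on an unaffected PC, would show whether it is the lookup or the connection that hangs.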

All the systems are up to date. I ensure that Windows Updates are installed on the Servers when they are released and the clients are updated the next day via WSUS.

I am pretty sure that this is also affecting Active Directory. I am seeing transient errors on the Domain Controllers where, for example, a Global Catalog cannot be contacted. Both DCs are GCs, and when I run nltest to test the connection to a GC within 30 minutes of the error being reported, the test passes. I assume the servers are also experiencing these random outages.

The problem I have is that, because the outages are random and because, on the clients at least, no errors are logged, I cannot reproduce the problem and have no idea what may be causing it.
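Since the short outages leave nothing in the logs, one way to get a record is to run a check every few seconds on each PC and keep only the changes of state. A minimal sketch, assuming samples are collected as `(timestamp, ok)` pairs by whatever probe or monitoring tool is used (the sample format and the name `outage_windows` are hypothetical), that turns them into outage windows whose times can be matched against the transient Global Catalog errors on the DCs:

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[datetime, bool]      # (time of check, did the check succeed?)
Outage = Tuple[datetime, datetime]  # (first failed check, first good check after it)


def outage_windows(samples: Iterable[Sample]) -> List[Outage]:
    """Collapse periodic up/down samples into (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one. If the data ends mid-outage, the time of the last
    sample is used as the end.
    """
    windows: List[Outage] = []
    start: Optional[datetime] = None
    last_time: Optional[datetime] = None
    for when, ok in sorted(samples):
        if not ok and start is None:
            start = when                     # transition: up -> down
        elif ok and start is not None:
            windows.append((start, when))    # transition: down -> up
            start = None
        last_time = when
    if start is not None and last_time is not None:
        windows.append((start, last_time))   # still down when the data ends
    return windows
```

With checks every 10 seconds, a PC that fails at 09:00:10 and 09:00:20 and succeeds again at 09:00:30 yields a single window from 09:00:10 to 09:00:30; the real start and end are only known to within one sampling interval.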

The problem is not a general network issue, as it randomly affects one client at a time - if it was a general network problem I would expect all the clients to lose connectivity.
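Once each PC has a list of outage windows, the "one client at a time" observation can be checked rather than assumed. A minimal sketch, assuming the PCs' clocks agree (domain members normally take their time from the domain hierarchy) and using hypothetical host names and a hypothetical function name `shared_outages`, that lists every pair of hosts whose outages overlap in time:

```python
from datetime import datetime
from itertools import combinations
from typing import Dict, List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one outage on one host


def shared_outages(windows_by_host: Dict[str, List[Outage]]) -> List[Tuple[str, str, datetime, datetime]]:
    """Return (host_a, host_b, overlap_start, overlap_end) for every pair
    of hosts whose outage windows overlap in time."""
    shared = []
    # Sorting by host name gives a stable, readable order of pairs.
    for (host_a, wins_a), (host_b, wins_b) in combinations(sorted(windows_by_host.items()), 2):
        for start_a, end_a in wins_a:
            for start_b, end_b in wins_b:
                # Two intervals overlap if each one starts before the other ends.
                if start_a < end_b and start_b < end_a:
                    shared.append((host_a, host_b, max(start_a, start_b), min(end_a, end_b)))
    return shared
```

If the outages really are isolated, the result stays empty or nearly so; many overlapping windows, especially ones that also line up with the DC errors, would point back at something shared such as a switch, a DC or DNS.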

Has anyone else seen a similar issue, and do you know what the cause was, please?


05-14-2015, 09:33 AM
An answer from a user on that forum.

Hi, I've played with the free version of Capsa before. It's quite nice. It will show you in great detail what's happening with the network traffic. It can't hurt to give it a try. http://www.colasoft.com/download/products/capsa_free.php

Edit: Another great monitoring program that I can recommend is PRTG; this one is better for monitoring endpoints. It's free for 30 days, or there is a limited freeware version: https://www.paessler.com/prtg/download

Both will take a bit of playing with to get used to but hopefully they can help to shed some light on what's happening.