TCP/IP Troubleshooting: Part 1: An Introduction

Written by David Noel-Davies
Friday, 10 August 2007

This is the first in a series of articles on TCP/IP troubleshooting; future articles will focus on the key issues highlighted here.

What do you think of when you hear the phrase "TCP/IP troubleshooting"? People who are visually imaginative may see a flowchart. More linear-minded types may see a series of numbered steps. Others (far too many) may feel a sense of inadequacy and frustration.

TCP/IP troubleshooting should be simple, right? After all, it's just a protocol: a series of steps to transfer bits over the network. But what a protocol: four layers, and multiple protocols at each layer.

The Traditional Approach

Some years ago when I first learned about TCP/IP networking, I was taught a simple follow-these-steps approach to troubleshooting problems. The method went something like this:
I call this the "brain-dead approach" because it's so methodical you can basically turn off your brain and just follow the steps. It's also somewhat inefficient, because it assumes that your problem most likely starts with your own computer and is more likely to be close to you (your network card, your computer's IP address configuration, your local subnet) than further away (other subnets). And it's a method that was probably developed before the Internet really took off, that is, before DNS became ubiquitous for name resolution and before firewalls and VPNs became a fact of life for most corporate networks.

What I mean is this: one of your users says "I can't connect to the server right now." What could be the problem? It helps to dissect this simple sentence to understand the issues that may be involved. For example:
"I": Is this the only user who has called in reporting network problems? If there are others, do they have similar issues? If so, then right away it's clear you don't need to take a brain-dead approach and begin your troubleshooting at the user's computer. Instead, the issue is most likely "out there" somewhere. Maybe your DNS server is offline or your DNS provider is experiencing difficulty. Or maybe a router on your internal network is going crazy and dropping packets. Or maybe the server your users are trying to connect to has crashed.

You should also stop and think about what the users who are having problems have in common. For example, are their machines all on the same subnet? If so, then maybe the default gateway for that subnet is misconfigured or the router has crashed. Or maybe a contractor working in the plenum crawlspace has accidentally cut a network cable connecting the subnet's workgroup switch to the department's main Ethernet backbone switch. Or maybe someone malicious has installed a rogue DHCP server on that subnet and it's capturing machines as their leases come up for renewal and assigning them unroutable addresses to create a denial-of-service condition.

If it's only that one user who has the problem, though, then it's probably time to play brain-dead and start asking questions like "OK, is your computer turned on? Is the network cable securely attached at the back of your machine?" and so on.
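The "are their machines all on the same subnet?" question can be answered quickly once you have the addresses of the users who called in. The following is only a minimal sketch: the IP addresses are made up for illustration, and the function name and the assumption that every subnet uses a /24 prefix are mine, not part of the original method.

```python
import ipaddress
from collections import Counter

def count_reports_by_subnet(reporter_ips, prefix_len=24):
    """Count problem reports per subnet, assuming every subnet uses prefix_len."""
    counts = Counter()
    for ip in reporter_ips:
        # strict=False accepts a host address and returns its enclosing network
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts

# Hypothetical addresses of users who reported problems to the Help Desk
reports = ["192.168.10.21", "192.168.10.57", "192.168.10.113", "192.168.20.8"]

# List subnets from most to fewest reports
for subnet, count in count_reports_by_subnet(reports).most_common():
    print(subnet, count)
```

If most of the reports cluster in one subnet, that subnet's gateway, switch uplink, or DHCP service is a more promising place to start than any single user's machine.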
"can't connect": A good question to ask this user is "What do you mean by connect?" That's because "connect" is a technical-sounding word that users often use to impress the Help Desk and show they know what they're talking about. Well, they usually don't. Why? Because there are different kinds of connectivity, including MAC-level communications, TCP sessions, password authentication, access rights and privileges, NAT-traversal connectivity, firewall pass-through, application-level sessions, and so on.

What kind of connectivity problem are they actually having? What are they actually trying to do when they say they want to "connect to" the server? Are they trying to access a share on that server? Do they get an "Access denied" message when they do this? Are they getting a login box prompting them for credentials? Is it rejecting their credentials? Are they having trouble finding the share in Active Directory? Is it a mapped drive they are having problems with? Are they trying to browse to find the server in My Network Places? And so on.

And is it just that server they're having trouble connecting to, or are they having problems connecting to anything on the network? Determining the scope of the problem here is important: is connectivity failing in just one way or in many ways?
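One way to see that "connect" covers several distinct things is to test two of them separately: name resolution and the TCP session. The sketch below is an illustration under my own assumptions (the function name `probe` and its return strings are hypothetical); it says nothing about authentication, access rights, or application-level sessions, which would need their own checks.

```python
import socket

def probe(host, port, timeout=3.0):
    """Report which stage fails: 'dns_failed', 'tcp_failed', or 'ok'."""
    try:
        # Stage 1: name resolution (a literal IP address passes trivially)
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    # Stage 2: try a TCP connection to each resolved address in turn
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, timed out, or unreachable: try the next address
            continue
    return "tcp_failed"
```

For example, `probe("fileserver.example.com", 445)` would check whether a hypothetical file server's name resolves and whether its SMB port accepts a TCP connection. A result of "ok" combined with an "Access denied" message on the user's screen points away from the network and toward credentials or permissions.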
"the server": You've got this user over here, and this server over there, and the network between. They can't connect. Why? Well, where exactly is that server anyway? Is it on the user's subnet? On an adjacent subnet? In a different department? On a different floor? In a different building? On a different continent?

What kind of network connects the user with that particular server? A wired Ethernet LAN? A wireless LAN (WLAN)? A fractional T1 line? Frame Relay? A VPN tunnel over the Internet? A dial-up modem connection? Cable modem or DSL? First determine the type of connection (possibly several types) between the user and the server, and then ponder where things might break down. Maybe the CSU/DSU has gone wonky; try cycling its power, or contact your service provider, who should be monitoring it. Maybe the janitor was cleaning the server room and bumped a power bar, and an Ethernet switch has gone offline. Check for an alert message from your network management software, assuming you're using managed switches. Maybe there's been a power blackout at the remote branch office where that server is located. Call them on the phone and see what's happening.

And is it server or servers? Is the user having trouble connecting to only that server or to other servers as well? Are others having problems connecting to other servers also? What are the commonalities (if any) between all the servers being affected? (Or apparently being affected. Remember, the problem may be with the users' computers or, more likely, with the network infrastructure itself.)
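The first of the "where is that server?" questions, whether it sits on the user's own subnet, can be checked from the user's IP configuration alone. This is a rough sketch with made-up addresses and a hypothetical function name; it only tells you whether traffic should go directly to the server or through the default gateway, not what lies along the path.

```python
import ipaddress

def server_is_on_link(user_interface, server_ip):
    """True if server_ip lies inside the subnet of user_interface, e.g. '10.0.5.20/24'."""
    iface = ipaddress.ip_interface(user_interface)
    return ipaddress.ip_address(server_ip) in iface.network

# Hypothetical user address and subnet mask, and two hypothetical servers
print(server_is_on_link("10.0.5.20/24", "10.0.5.200"))  # True: same subnet, no router involved
print(server_is_on_link("10.0.5.20/24", "10.0.8.14"))   # False: traffic goes via the default gateway
```

When the answer is False, the default gateway and every link beyond it join the list of suspects.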
"right now": The time element is crucial in troubleshooting. Did the problem just start happening? When was the last time you successfully connected to the server? How long has it been going on? Is it continuous or intermittent? Intermittent network problems involving unreliable WAN links and other issues can be difficult to troubleshoot, especially if they're transient, i.e., brief and occasional.

Time can also help you relate the problem to other circumstances that might be impacting your network. Did the problem start this morning at 10 am? What else happened on your network around then? Were patches applied by a WSUS server? Did scheduled maintenance on a domain controller occur? Was a construction crew in the building compound using a backhoe to repair a water main break?

A Structured Approach

My own approach to TCP/IP troubleshooting is structured around three critical areas:
Conclusion

Troubleshooting TCP/IP networks can be frustrating, but it can also be fun. In future articles we'll zoom in on the troubleshooting steps and tools you need to solve the issues that might arise on your network. Until then, stay connected!