5. The basics of networking
5.1. Motivation and plan
Although computer science and software engineering courses often spend much of their time on theoretical topics such as CPU scheduling algorithms, a large share of an operating system kernel’s code has to do with networking and its drivers.
Networking software deals with a very imperfect world: network communications can drop out, stutter, come back or not come back, and get attacked by malicious parties. It has taken several decades to sort out how to do this well.
We will start by discussing the basic networking commands.
5.2. Concepts before we start
Kaley Martinez has written a very good tutorial with nice pictures depicting a networking setup from a physical point of view. It would be good to look at those slides if you don’t have a feeling for how networks look.
We will focus here on running simple commands to see what the networking setup is on our computer.
Two important topics are worth investigating: the idea of abstraction (how the operating system takes the big complex topic of network connections and turns it into a set of commands and behaviors), and the idea of layers (the layers from physical up to application defined in the standard networking model). FIXME: I should put references to good articles on those topics.
Finally: many types of networks have existed and some still exist today, but we will only worry about the internet (formerly ARPANET). This is defined by a set of protocols called TCP (transmission control protocol) and IP (internet protocol), so internet networking is often called TCP/IP networking.
5.3. Terminology
Our usual philosophy on tech terminology is even more important in the world of networking. You have to deal with a flood of acronyms and other terms, and not feel that you’re lost in it. Here are some acronyms that we will use.
- LAN
local area network
- WAN
wide area network
- TCP
transmission control protocol
- IP
internet protocol
- DNS
domain name system (maps names like wikipedia.org to their IP address, which today is 208.80.153.224; see the host command)
You may now forget their precise meaning, but remember that they exist, they relate to computer networking, and that you can look them up when you need to delve into them.
5.4. Devices and interfaces
You can connect to the internet through a few different types of devices:
Ethernet cards.
Radio links (wi-fi).
Modems with PPP or SLIP.
Virtual interfaces for when you don’t have physically separate machines. This comes up when you have virtual machines or containers.
Virtual interfaces for when you have physically separate machines, but you want to bridge them as if they were in a single virtual network. This comes up when you have virtual private networks.
Loopback interface: this just comes back to the computer itself. It is used for a variety of tests.
A computer can have several interfaces, but we will look at the situation where you have a single ethernet or wifi interface, since that’s what most desktop and laptop users will have.
The command we will use for much of this chapter is called ip. Old timers (like me) are more accustomed to the older ifconfig, but the world has now transitioned to using the ip command.
To see which network interfaces exist you can type:
$ ip link
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 80:ee:73:b1:0f:4d brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DORMANT group default qlen 1000
link/ether d8:fc:93:81:b4:2e brd ff:ff:ff:ff:ff:ff
[...]
There is a lot of information in that output, so part of the skill you need to develop here (and elsewhere as well) is to ignore much of the output and zero in on what’s important. In this case that’s anything that starts with eth (ethernet) or en (ethernet) or wl (wireless LAN). And within those entries you can ignore much of it and focus on whether the interface is UP.
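As a rough illustration of "zeroing in", here is a minimal Python sketch that pulls the interface name and state out of text shaped like the output above. The sample text is copied from that output; the function names and the regular expression are my own assumptions, not part of the ip command.

```python
import re

# Sample text copied from the `ip link` output shown above.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 80:ee:73:b1:0f:4d brd ff:ff:ff:ff:ff:ff
3: wlp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DORMANT group default qlen 1000
link/ether d8:fc:93:81:b4:2e brd ff:ff:ff:ff:ff:ff
"""

# A header line looks like "2: eth0: <FLAGS> ... state UP ...".
HEADER = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)[^:]*:\s+<[^>]*>.*?\bstate\s+(?P<state>\S+)"
)


def interface_states(text):
    """Return {interface name: state} for every header line in the text."""
    states = {}
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            states[match.group("name")] = match.group("state")
    return states


def is_physical(name):
    """Hypothetical helper: keep names starting with eth, en or wl."""
    return name.startswith(("eth", "en", "wl"))


for name, state in interface_states(SAMPLE).items():
    if is_physical(name):
        print(name, state)
```

On a real machine you would feed it the text printed by ip link instead of the sample string.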
5.5. Addresses
Note: I will only discuss IPv4 (traditional addresses) here. Although IPv6 is interesting, it is not yet necessary to learn about it, since most of the world still stumbles along with IPv4.
There are two addresses associated with your computer: the IP address (which is given to you by your network administrator, and can change), and the hardware address (also called MAC address or ethernet address), which is meant to be unique and is built into your ethernet card.
An IP address is how you specify which machine you want to talk to. IP addresses look like 129.49.21.102, or 128.165.112.17, or 192.168.1.211, or 10.0.0.185, or anything else with 4 numbers, each in the range 0 to 255.
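To make the "four numbers" rule concrete, here is a small sketch using Python's standard ipaddress module. Each of the four numbers fits in one byte, so the valid values run from 0 to 255. The function name parse_ipv4 is made up for this example.

```python
import ipaddress


def parse_ipv4(text):
    """Return the four numbers of a dotted address, or None if it is not valid."""
    try:
        address = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return None
    # The packed form is exactly four bytes, one per number.
    return list(address.packed)


for candidate in ["192.168.1.211", "10.0.0.185", "255.255.255.255", "10.0.0.256", "1.2.3"]:
    print(candidate, parse_ipv4(candidate))
```

The last two candidates are rejected: one has a number above 255 and the other has only three numbers.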
To find out your own address you can use the ip address command. I get this:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 80:ee:73:b1:0f:4d brd ff:ff:ff:ff:ff:ff
inet 192.168.1.211/24 brd 192.168.1.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::8a3f:464d:668:9835/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: wlp3s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether d8:fc:93:81:b4:2e brd ff:ff:ff:ff:ff:ff
[...]
Once again let’s focus on what’s important: the IP address, which looks like 4 numbers with . between them, and the MAC address, which looks like six 2-digit hexadecimal numbers with : in between them. (Sometimes they will have - instead of : to separate them).
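Since MAC addresses show up with either separator, it can help to bring them to one form before comparing them. The following is a sketch under my own assumptions (the pattern and the name normalize_mac are not from any tool): it accepts six 2-digit hexadecimal groups with one consistent separator and returns the lower-case form with colons.

```python
import re

# Six groups of two hexadecimal digits, all separated by ":" or all by "-".
MAC_PATTERN = re.compile(
    r"^[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$"
)


def normalize_mac(text):
    """Return the address in lower case with ":" separators, or None if invalid."""
    if not MAC_PATTERN.match(text):
        return None
    return text.lower().replace("-", ":")


print(normalize_mac("80-EE-73-B1-0F-4D"))
print(normalize_mac("80:ee:73:b1:0f"))
```

The first call gives the same string that ip address printed for eth0; the second returns None because one group is missing.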
The loopback interface lo always has address 127.0.0.1. We don’t pay much attention to that.
The eth0 interface is my link to the world, and it has IP address 192.168.1.211. If another computer is on my network and wants to talk to my machine, it will say “I want to talk to 192.168.1.211”.
Other parts of that information are seldom used.
Note that if you were on a laptop with a wi-fi card you would probably have a wl interface instead of eth or en.
So users work with the IP address, while the MAC address (in my case 80:ee:73:b1:0f:4d) is used under the hood to identify my machine on the local network.
A final thing to mention is the idea of “unrouted IP addresses”.
There are some devices (most, in fact!) which do not need to be visible to the outside network. Nobody is going to come to a web page on your thermostat or your printer from outside.
So the IP addresses given out in a typical home network are not visible to the outside world, and there are specific ranges of IP addresses that are designated just for this purpose. Addresses that start with 10 or with 192.168 are of that type (172.16 through 172.31 is a third such range).
The fact you don’t “dial in” to these addresses means that they do not need to be unique, and in fact every household or office can give out a lot of addresses in the 10 and 192.168 ranges. As you saw, my home machine is a 192.168 address.
To better understand these “special ranges” you can look at the “Reserved IP addresses” article on Wikipedia.
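If you want to check whether an address falls in one of these ranges, a short sketch with Python's ipaddress module could look like this. The list includes the 10 and 192.168 ranges from the text plus 172.16.0.0/12, the third block set aside for private networks by RFC 1918. The name is_unrouted is my own choice, echoing the term used above.

```python
import ipaddress

# The three blocks set aside for private networks (RFC 1918).
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_unrouted(text):
    """Return True if the address lies in one of the private blocks."""
    address = ipaddress.ip_address(text)
    return any(address in block for block in PRIVATE_BLOCKS)


for candidate in ["192.168.1.211", "10.0.0.185", "129.49.21.102", "208.80.153.224"]:
    print(candidate, is_unrouted(candidate))
```

The first two addresses are private, and the last two are ordinary public addresses.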
5.6. Routing
What if you are going to a host that is not on your local network? Like a host across the world?
New devices are put on the internet every few seconds, and their addresses can change a lot, as well as the path of connected routers to get there. This means that there is no hope of your host knowing the path to any possible device in the world.
What you need to know and do is:
I can find any device on my local network - the local network is aware of them and passes my network packets along.
I can then tell my router about any non-local addresses, and the router will pass them on to a trusted next step.
Eventually a next step in routing will be one of those special routers which actually keep up to date information on various networks around the world, and they will take care of it. This step is deliberately fuzzily stated because there is a lot of detailed work that happens there.
How do you find out what the routing is on your host and your network?
The ip route command gives information about the “first step”.
Here is what mine shows:
$ ip route
default via 192.168.1.1 dev eth0 proto dhcp metric 100
169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.18.0.0/16 dev br-6dbe8ffac78b proto kernel scope link src 172.18.0.1 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.211 metric 100
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
Yours is probably a bit simpler. The important thing to notice in this mass of output is that:
All addresses that start with 192.168.1 are to be reached through device eth0, or something like wlp2s0 for wifi, or enp2s0 for more recent wired ethernet.
You can ignore the lines with 169.254 and 172 addresses - they are for special virtual machines that I run on my machine.
If an address is not listed, then you pass those packets on to the address designated by default.
The default is 192.168.1.1, which is the router in my house. It passes the packets on to the internet service provider, and they can deal with it after that.
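The reading of the routing table described above can be sketched in a few lines of Python. The idea is that the most specific matching entry wins, and default (written here as 0.0.0.0/0) matches everything else. The table is copied from my ip route output; the function name pick_route is made up, and the sketch ignores details such as metric and linkdown that the real kernel also takes into account.

```python
import ipaddress

# (destination network, device, gateway) taken from the `ip route` output above.
# "default" is written as 0.0.0.0/0, which matches every IPv4 address.
ROUTES = [
    ("0.0.0.0/0", "eth0", "192.168.1.1"),
    ("169.254.0.0/16", "virbr0", None),
    ("172.17.0.0/16", "docker0", None),
    ("172.18.0.0/16", "br-6dbe8ffac78b", None),
    ("192.168.1.0/24", "eth0", None),
    ("192.168.122.0/24", "virbr0", None),
]


def pick_route(destination):
    """Return (device, gateway) of the most specific route that matches."""
    address = ipaddress.ip_address(destination)
    matches = []
    for net, device, gateway in ROUTES:
        network = ipaddress.ip_network(net)
        if address in network:
            matches.append((network.prefixlen, device, gateway))
    # The longest prefix is the most specific match.
    _prefixlen, device, gateway = max(matches, key=lambda match: match[0])
    return device, gateway


print(pick_route("192.168.1.50"))     # a machine on my local network
print(pick_route("208.80.153.224"))   # a machine far away
```

A gateway of None means the destination is reached directly on that device; otherwise the packet is handed to the gateway.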
To trace the route between your computer and another you could use the traceroute command:
$ traceroute --resolve-hostnames en.wikipedia.org
traceroute to en.wikipedia.org (208.80.153.224), 30 hops max, 60 byte packets
1 _gateway (192.168.1.1) 0.314 ms 0.437 ms 0.531 ms
2 96.120.1.161 (96.120.1.161) 9.683 ms 9.719 ms 16.250 ms
3 po-302-1216-rur02.santafe.nm.albuq.comcast.net (96.110.17.113) 16.194 ms 16.299 ms 16.337 ms
4 be-2-ar02.albuquerque.nm.albuq.comcast.net (68.86.182.221) 42.389 ms 42.463 ms 42.502 ms
5 be-33654-cr02.losangeles.ca.ibone.comcast.net (68.86.166.65) 63.547 ms 65.250 ms 63.745 ms
6 be-1302-cs03.losangeles.ca.ibone.comcast.net (96.110.39.225) 63.144 ms 55.929 ms 56.521 ms
7 be-2301-pe01.losangeles.ca.ibone.comcast.net (96.110.44.106) 56.154 ms 62.974 ms 62.909 ms
8 50.242.151.62 (50.242.151.62) 63.154 ms 63.184 ms 63.213 ms
9 ae-9.r24.lsanca07.us.bb.gin.ntt.net (129.250.3.235) 71.075 ms 71.200 ms 75.202 ms
10 ae-3.r24.dllstx09.us.bb.gin.ntt.net (129.250.7.68) 74.460 ms 60.323 ms 57.871 ms
11 ae-0.a01.dllstx04.us.bb.gin.ntt.net (129.250.4.178) 61.039 ms 61.089 ms 59.615 ms
12 * * *
13 * * *
[...]
The later hops in that chain did not give useful information: at some point the replies stop and you only see * * *. This is because routers are not obliged to answer the probes that traceroute sends. But you get to see a bit of how your packets propagate on their way to wikipedia.
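If you want to look at the timings rather than the host names, a small sketch like the following can pick them out of the traceroute text. The sample lines are copied from the output above; the function name hop_times and the pattern are assumptions for this example, and hops that only show * * * come back with an empty list.

```python
import re

# A few lines copied from the traceroute output above.
SAMPLE = """\
1 _gateway (192.168.1.1) 0.314 ms 0.437 ms 0.531 ms
2 96.120.1.161 (96.120.1.161) 9.683 ms 9.719 ms 16.250 ms
12 * * *
"""

# A timing looks like "0.314 ms".
TIME = re.compile(r"(\d+\.\d+) ms")


def hop_times(text):
    """Return [(hop number, [times in ms])]; silent hops get an empty list."""
    hops = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header line and anything else
        times = [float(value) for value in TIME.findall(line)]
        hops.append((int(fields[0]), times))
    return hops


for hop, times in hop_times(SAMPLE):
    print(hop, times if times else "no reply")
```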
There is now a pleasant modern alternative called mtr:
$ mtr en.wikipedia.org
5.7. Name service
We don’t usually think in terms of IP addresses, since they would be hard to remember. So the internet offers something called the domain name system, or DNS, which maps names to IP addresses.
To find the IP address you can use the host command:
$ host en.wikipedia.org
en.wikipedia.org is an alias for dyna.wikimedia.org.
dyna.wikimedia.org has address 208.80.153.224
dyna.wikimedia.org has IPv6 address 2620:0:860:ed1a::1
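Programs do the same kind of lookup through the operating system's resolver. Here is a minimal sketch using Python's standard socket module; the function name lookup is my own. Note that the resolver may consult local files such as /etc/hosts as well as DNS, so this is not an exact equivalent of the host command.

```python
import socket


def lookup(name):
    """Return the sorted IPv4 and IPv6 addresses the resolver gives for a name."""
    results = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each result ends with a socket address whose first item is the IP address.
    return sorted({sockaddr[0] for *_ignored, sockaddr in results})


# Example (needs a working network connection):
#     print(lookup("en.wikipedia.org"))
```

The addresses you get back will depend on where you are and on the day you run it, just as with host.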