==========================
 The basics of networking
==========================

Motivation and plan
===================

Although computer science and software engineering courses often put
much of their time into very theoretical algorithms, such as CPU
scheduling algorithms, a large share of an operating system kernel's
code has to do with networking and its drivers.

Networking software deals with a very imperfect world: network
communications can drop out, stutter, come back or not come back, and
get attacked by malicious parties. It has taken several decades to
sort out how to do this well.

We will start by discussing the basic networking commands.

Concepts before we start
========================

Kaley Martinez has written a very good tutorial with nice pictures
depicting a networking setup from a physical point of view. It would
be good to look at those slides if you don't have a feeling for how
networks look.

We will focus here on running simple commands to see what the
networking setup is on our computer.

Two important topics are worth investigating: the ideas of
*abstraction* (how the operating system takes the big complex topic of
network connections and turns it into a set of commands and
behaviors), and *layers* (the layers from physical up to application
defined in the standard networking model).

FIXME: I should put references to good articles on those topics.

Finally: many types of networks have existed and some still exist
today, but we will only worry about the *internet* (formerly ARPANET).
This is defined by a set of protocols called TCP (transmission control
protocol) and IP (internet protocol), so internet networking is often
called TCP/IP networking.

Terminology
===========

Our usual philosophy on tech terminology is even more important in the
world of networking. You have to deal with a flood of acronyms and
other terms without feeling lost in it.

Here are some acronyms that we will use.


LAN
   local area network

WAN
   wide area network

TCP
   transmission control protocol

IP
   internet protocol

DNS
   domain name system (maps names like wikipedia.org to their IP
   address, which today is 208.80.153.224 -- see the ``host`` command)

You may now forget their precise meaning, but remember that they
exist, they relate to computer networking, and that you can look them
up when you need to delve into them.

Devices and interfaces
======================

You can connect to the internet through a few different types of
devices:

* Ethernet cards.
* Radio links (wi-fi).
* Modems with PPP or SLIP.
* Virtual interfaces for when you don't have physically separate
  machines. This comes up when you have virtual machines or
  containers.
* Virtual interfaces for when you have physically separate machines,
  but you want to bridge them as if they were in a single virtual
  network. This comes up when you have virtual private networks.
* Loopback interface: this just comes back to the computer itself. It
  is used for a variety of tests.

A computer can have several interfaces, but we will look at the
situation where you have a single ethernet or wifi interface, since
that's what most desktop and laptop users will have.

The command we will use for much of this chapter is called ``ip``.
Old timers (like me) are more accustomed to the older ``ifconfig``,
but the world has now transitioned to using the ``ip`` command.

To see which network interfaces exist you can type:

.. code-block:: console

   $ ip link
   [...]
   2: eth0: mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
       link/ether 80:ee:73:b1:0f:4d brd ff:ff:ff:ff:ff:ff
   3: wlp3s0: mtu 1500 qdisc noqueue state DOWN mode DORMANT group default qlen 1000
       link/ether d8:fc:93:81:b4:2e brd ff:ff:ff:ff:ff:ff
   [...]

There is a lot of information that comes out of that, so part of the
skill you need to develop here (and elsewhere as well) is to ignore
much of the output and zero in on what's important.
In this case that's anything that starts with *eth* (ethernet) or *en*
(ethernet) or *wl* (wireless LAN). And within those entries you can
ignore much of it and focus on whether the interface is ``UP``.

Addresses
=========

Note: I will only discuss IPv4 (traditional addresses) here. Although
IPv6 is interesting, it is not yet necessary to learn about it, since
most of the world still stumbles along with IPv4.

There are two addresses associated with your computer: the IP address
(which is given to you by your network administrator, and can change),
and the hardware address (also called MAC address or ethernet
address), which is unique and hard-wired in your ethernet card.

An IP address is how you specify which machine you want to talk to.
They look like 129.49.21.102, or 128.165.112.17, or 192.168.1.211, or
10.0.0.185, or anything else with 4 numbers in the range of [0, 255].

To find out your own address you can use the ``ip address`` command.
I get this:

::

   1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
       inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
       inet6 ::1/128 scope host
          valid_lft forever preferred_lft forever
   2: eth0: mtu 1500 qdisc fq_codel state UP group default qlen 1000
       link/ether 80:ee:73:b1:0f:4d brd ff:ff:ff:ff:ff:ff
       inet 192.168.1.211/24 brd 192.168.1.255 scope global noprefixroute eth0
          valid_lft forever preferred_lft forever
       inet6 fe80::8a3f:464d:668:9835/64 scope link noprefixroute
          valid_lft forever preferred_lft forever
   3: wlp3s0: mtu 1500 qdisc noqueue state DOWN group default qlen 1000
       link/ether d8:fc:93:81:b4:2e brd ff:ff:ff:ff:ff:ff
   [...]

Once again let's focus on what's important: the IP address, which
looks like 4 numbers with ``.`` between them, and the MAC address,
which looks like six 2-digit hexadecimal numbers with ``:`` in between
them. (Sometimes they will have ``-`` instead of ``:`` to separate
them.)

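
The two address formats described above are easy to check in a program.
Here is a minimal sketch in Python, using only the standard library. The
function names ``is_ipv4`` and ``normalize_mac`` are made up for this
illustration; they are not part of any networking tool.

```python
import ipaddress
import re

# Six 2-digit hex numbers, all separated by the same character (":" or "-").
MAC_PATTERN = re.compile(
    r"^[0-9a-fA-F]{2}([:-])(?:[0-9a-fA-F]{2}\1){4}[0-9a-fA-F]{2}$"
)


def is_ipv4(text: str) -> bool:
    """Return True if text is four dot-separated numbers, each in 0..255."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


def normalize_mac(text: str) -> str:
    """Return a MAC address in lowercase, colon-separated form.

    Accepts ":" or "-" as the separator and raises ValueError otherwise.
    (Hypothetical helper, written only for this example.)
    """
    if not MAC_PATTERN.match(text):
        raise ValueError(f"not a MAC address: {text!r}")
    return text.lower().replace("-", ":")


if __name__ == "__main__":
    print(is_ipv4("192.168.1.211"))            # True
    print(is_ipv4("192.168.1.300"))            # False: 300 is out of range
    print(normalize_mac("80-EE-73-B1-0F-4D"))  # 80:ee:73:b1:0f:4d
```

The check for IP addresses leans on the ``ipaddress`` module rather than a
hand-written pattern, since the range rule (each number at most 255) is
awkward to express as a regular expression.
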
The loopback interface *lo* always has address 127.0.0.1. We don't
pay much attention to that.

The *eth0* interface is my link to the world, and it has IP address
192.168.1.211. If another computer is on my network and wants to talk
to my machine, it will say "I want to talk to 192.168.1.211".

Other parts of that information are seldom used.

Note that if you were on a laptop with a wi-fi card you would probably
have a *wl* interface instead of *eth* or *en*.

So users always use the IP address. But the MAC address (in my case
80:ee:73:b1:0f:4d) is used under the hood to uniquely identify my
machine.

A final thing to mention is the idea of "unrouted IP addresses".
There are some devices (most, in fact!) which do not need to be
visible from the outside network. Nobody is going to come to a web
page on your thermostat or your printer from outside.

So the IP addresses given out in a typical home network are not
visible to the outside world, and there are specific ranges of IP
addresses that are designated just for this purpose. Addresses that
start with 10 or with 192.168 are of that type.

The fact you don't "dial in" to these addresses means that they do not
need to be unique, and in fact every household or office can give out
a lot of addresses in the 10 and 192.168 ranges. As you saw, my home
machine has a 192.168 address.

To better understand these "special ranges" you can look at the
"Reserved IP addresses" article in Wikipedia:
https://en.wikipedia.org/wiki/Reserved_IP_addresses

Routing
=======

What if you are going to a host that is not on your local network?
Like a host across the world?

New devices are put on the internet every few seconds, and their
addresses can change a lot, as well as the path of connected routers
to get there.

This means that there is no hope of your host knowing the path to any
possible device in the world. What you need to know and do is:

#. I can find any device on my local network - the local network is
   aware of them and passes my network packets along.
#. I can then tell my *router* about any non-local addresses, and the
   router will pass them on to a *trusted next step*.
#. Eventually a next step in routing will be one of those special
   routers which actually keep up to date information on various
   networks around the world, and they will take care of it. This
   step is deliberately fuzzily stated because there is a lot of
   detailed work that happens there.

How do you find out what the routing is on your host and your network?

The ``ip route`` command gives information about the "first step".
Here is what mine shows:

.. code-block:: console

   $ ip route
   default via 192.168.1.1 dev eth0 proto dhcp metric 100
   169.254.0.0/16 dev virbr0 scope link metric 1000 linkdown
   172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
   172.18.0.0/16 dev br-6dbe8ffac78b proto kernel scope link src 172.18.0.1 linkdown
   192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.211 metric 100
   192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown

Yours is probably a bit simpler. The important things to notice in
this mass of output are:

* All addresses that start with 192.168.1 are to be reached through
  device eth0, or something like wlp2s0 for wifi, or enp2s0 for more
  recent wired ethernet.
* You can ignore the lines with 169.254 and 172 addresses - they are
  for special virtual machines that I run on my machine.
* If an address is not listed, then you pass those packets on to the
  address designated by *default*.
* Default is 192.168.1.1. In my house this is the router, which hands
  packets to the internet service provider, and they can deal with it
  after that.

To trace the route between your computer and another you could use the
``traceroute`` command:

.. code-block:: console

   $ traceroute --resolve-hostnames en.wikipedia.org
   traceroute to en.wikipedia.org (208.80.153.224), 30 hops max, 60 byte packets
    1  _gateway (192.168.1.1)  0.314 ms  0.437 ms  0.531 ms
    2  96.120.1.161 (96.120.1.161)  9.683 ms  9.719 ms  16.250 ms
    3  po-302-1216-rur02.santafe.nm.albuq.comcast.net (96.110.17.113)  16.194 ms  16.299 ms  16.337 ms
    4  be-2-ar02.albuquerque.nm.albuq.comcast.net (68.86.182.221)  42.389 ms  42.463 ms  42.502 ms
    5  be-33654-cr02.losangeles.ca.ibone.comcast.net (68.86.166.65)  63.547 ms  65.250 ms  63.745 ms
    6  be-1302-cs03.losangeles.ca.ibone.comcast.net (96.110.39.225)  63.144 ms  55.929 ms  56.521 ms
    7  be-2301-pe01.losangeles.ca.ibone.comcast.net (96.110.44.106)  56.154 ms  62.974 ms  62.909 ms
    8  50.242.151.62 (50.242.151.62)  63.154 ms  63.184 ms  63.213 ms
    9  ae-9.r24.lsanca07.us.bb.gin.ntt.net (129.250.3.235)  71.075 ms  71.200 ms  75.202 ms
   10  ae-3.r24.dllstx09.us.bb.gin.ntt.net (129.250.7.68)  74.460 ms  60.323 ms  57.871 ms
   11  ae-0.a01.dllstx04.us.bb.gin.ntt.net (129.250.4.178)  61.039 ms  61.089 ms  59.615 ms
   12  * * *
   13  * * *
   [...]

The later hops in that chain did not give useful information: they
show only ``* * *``. This is because it is voluntary for a router to
answer the probes that traceroute sends. But you get to see a bit of
how your packets propagate on their way to Wikipedia.

There is now a pleasant modern alternative called ``mtr``:

.. code-block:: console

   $ mtr en.wikipedia.org

Name service
============

We don't usually think of IP addresses - they would be hard to
remember. So the internet offers something called the *domain name
system*, or DNS, which maps names to IP addresses.

To find the IP address you can use the ``host`` command:

.. code-block:: console

   $ host en.wikipedia.org
   en.wikipedia.org is an alias for dyna.wikimedia.org.
   dyna.wikimedia.org has address 208.80.153.224
   dyna.wikimedia.org has IPv6 address 2620:0:860:ed1a::1
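
Programs do the same name lookup through the operating system's
resolver. As a rough sketch (not how ``host`` itself is implemented),
the Python standard library exposes this through
``socket.getaddrinfo``. The function name ``resolve_ipv4`` below is
made up for this example.

```python
import socket
from typing import List


def resolve_ipv4(name: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses for a host name.

    Hypothetical helper for illustration. It asks the system resolver,
    which consults local files such as /etc/hosts as well as DNS.
    """
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr), and for
    # IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" normally resolves without touching the network.
    try:
        print(resolve_ipv4("localhost"))
    except socket.gaierror as err:
        print("could not resolve localhost:", err)
```

Calling ``resolve_ipv4("en.wikipedia.org")`` on a machine with internet
access should return one or more addresses, though not necessarily the
one shown above, since the answer depends on where and when you ask.
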