The Internet Protocol Suite consists of multiple protocol specifications that are implemented by almost all computers and other network devices. One primary abstraction available to developers for handling communication is a connection. The developer typically gets hold of a connection object and sends and receives messages through it from the application layer.
Two widely used transport protocols are available to the application developer: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). While both protocols are in general use, TCP is the more common. It is also the protocol that developers are mostly going to concern themselves with. Therefore, this post will focus on TCP.
There is abundant literature on the exact packet structure at different layers of the internet. This post will limit itself to discussing TCP connections: what they are and how to use and reason about them in your applications.
Let’s start with some definitions.
Network
An interconnected group of computers that can send information to and receive information from one another defines a network. Different network topologies define how the computers should be connected to one another. The Internet is one such network.
IP Address
An IP address is an identifier for a machine in a network. It was initially 32 bits in size, but for more than 2 decades now, a 128-bit version has also been in use. The limitation of the 32-bit address is that there are more than 2^32 (about 4.3 billion) devices connected to the internet. Mitigation strategies such as network address translation (NAT) change how these 32-bit addresses are handled and exposed.
Here’s how the 2 versions of IP addresses are expressed for this site.
- 32-bit, also called IPv4:
139.180.190.208
- 128-bit, also called IPv6:
2001:19f0:4400:78f6:5400:3ff:fe17:349c
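As a quick check of the sizes involved, here is a minimal sketch using Python's standard `ipaddress` module. It parses the two addresses listed above and prints the width of each in bits (an IPv4 address is 32 bits wide, an IPv6 address is 128 bits wide).

```python
import ipaddress

# The two addresses listed above for this site.
v4 = ipaddress.ip_address("139.180.190.208")
v6 = ipaddress.ip_address("2001:19f0:4400:78f6:5400:3ff:fe17:349c")

for addr in (v4, v6):
    # max_prefixlen is the address width in bits.
    print(f"IPv{addr.version}: {addr} is {addr.max_prefixlen} bits wide")

# Underneath the dotted or colon-separated text form, an address is one integer.
print(int(v4))
```

The text forms exist for human readers; on the wire, each address is a fixed-width number.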
Port number
This is a logical construct identifying a process in a computer.
When an application needs to send packets to or receive packets from another computer, it needs to create a binding to a port. When an application sends a packet intended for a specific application on another computer, the operating system of the target computer looks at the port number in the incoming packet and forwards the packet to the application process that is listening for packets on this port.
It is a 16-bit integer, so it ranges from 0 to 65535; port 0 is reserved and not used for normal traffic. The operating system might prevent unprivileged processes from binding to port numbers below 1024, as these are well-known ports. By convention, they are assigned to well-known services so that clients know where to find them. Examples are port 80 for web server applications and port 22 for SSH.
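The binding described above can be sketched with Python's standard `socket` module. This is only an illustration on the loopback address: the helper name `bind_to_free_port` is made up for this post, and passing port 0 to `bind` asks the operating system to choose any free port.

```python
import socket

def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening TCP socket to an OS-chosen port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 is a request to the operating system: "pick any free port for me".
    sock.bind((host, 0))
    sock.listen()
    port = sock.getsockname()[1]
    return sock, port

sock, port = bind_to_free_port()
print(f"bound to port {port}")

# While the first socket holds the port, a second bind to it is refused.
other = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    other.bind(("127.0.0.1", port))
    print("second bind unexpectedly succeeded")
except OSError as exc:
    print(f"second bind failed: {exc}")
finally:
    other.close()
    sock.close()
```

The refused second bind is what lets the operating system map an incoming port number to exactly one listening process.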
TCP and UDP
These are 2 widely used transport protocols. Under TCP, the two computers first establish a connection, and the sender then receives an acknowledgement that the target computer indeed received each packet; anything that goes unacknowledged is sent again. For this reason, TCP is called a connection-oriented protocol. UDP sets up no connection and doesn’t provide an acknowledgement, and is therefore called a connection-less protocol.
The crux of these protocols is that they prepend header metadata to the message to construct a packet (neither TCP nor UDP adds a footer). When such a packet is delivered to the network, the network takes care of forwarding the packet to the correct target computer using the addresses in the headers, and the target’s operating system uses the port numbers to pick the receiving process.
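To make the idea of "message plus header" concrete, here is a small sketch that builds a UDP packet by hand. The UDP header is the simpler of the two: four 16-bit fields (source port, destination port, length, checksum). In real applications the operating system does this for you; the function names and the port numbers 50000 and 53 are only illustrative, and the checksum is left at 0 to keep the sketch short.

```python
import struct

# UDP header layout: four unsigned 16-bit fields in network (big-endian) order.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_segment(source_port, destination_port, payload):
    """Prepend a UDP header to the payload bytes."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    checksum = 0  # assumption: checksum omitted, which UDP over IPv4 permits
    header = UDP_HEADER.pack(source_port, destination_port, length, checksum)
    return header + payload

def parse_udp_segment(segment):
    """Split a segment back into (source_port, destination_port, payload)."""
    header = segment[:UDP_HEADER.size]
    source_port, destination_port, length, _checksum = UDP_HEADER.unpack(header)
    return source_port, destination_port, segment[UDP_HEADER.size:length]

segment = build_udp_segment(50000, 53, b"hello")
print(len(segment))
print(parse_udp_segment(segment))
```

A TCP header follows the same idea but is larger (at least 20 bytes), because it also carries sequence numbers, acknowledgement numbers and flags such as SYN and ACK.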
Connection
A connection is defined as the 5-tuple - (sourceIp, sourcePort, destinationIp, destinationPort, protocol).
- sourceIp - IP address of the source in the network.
- sourcePort - Port number used by the source process.
- destinationIp, destinationPort - The same for the destination.
- protocol - TCP or UDP.
We will focus on just TCP here.
When we say that a TCP connection is established, both the source and the destination computers know about this 5-tuple. Any new packet carrying this information will be quickly forwarded to the correct process at the receiver. Any response to that packet uses the same 5-tuple, with source and destination swapped, and is forwarded to the network.
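The 5-tuple can be observed on a real connection. The sketch below opens a TCP connection over the loopback interface and prints the tuple as each side sees it. The `Connection` type and the helper names are made up for this post; `getsockname` and `getpeername` are the standard socket calls that report the local and remote address of a socket.

```python
import socket
from typing import NamedTuple

class Connection(NamedTuple):
    source_ip: str
    source_port: int
    destination_ip: str
    destination_port: int
    protocol: str

def five_tuple(sock):
    """Describe a connected TCP socket from its own point of view."""
    source_ip, source_port = sock.getsockname()[:2]
    destination_ip, destination_port = sock.getpeername()[:2]
    return Connection(source_ip, source_port, destination_ip, destination_port, "TCP")

def demo():
    """Connect to ourselves on loopback and return both views of the connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen()
        with socket.create_connection(listener.getsockname()) as client:
            server_side, _ = listener.accept()
            with server_side:
                return five_tuple(client), five_tuple(server_side)

client_view, server_view = demo()
print(client_view)
print(server_view)
```

The two printed tuples carry the same four address values; each side simply lists itself as the source.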
TCP Connection handshake
In the paragraphs below, a message is the connection 5-tuple plus some message bytes.
To establish a TCP connection, the source computer sends a message called SYN, signalling the start of connection establishment. The destination computer responds with SYN-ACK, indicating that it is ready to accept a connection. Receiving the SYN-ACK tells the source that the target IP address is valid. The source computer now sends an ACK, signalling that it will send messages using that 5-tuple from this point onwards. When the destination receives this ACK, it knows that its previous message reached the source, which validates the source IP address for the destination.
- Source : Send SYN, (IPsource, Portsource, IPdest, Portdest, TCP)
- Destination : Receive SYN, (IPsource, Portsource, IPdest, Portdest, TCP)
- Destination : Send SYN-ACK (IPdest, Portdest, IPsource, Portsource, TCP)
- Source : Receive SYN-ACK (IPdest, Portdest, IPsource, Portsource, TCP)
- Source : Send ACK (IPsource, Portsource, IPdest, Portdest, TCP)
- Source : Mark connection as established.
- Destination : Receive ACK (IPsource, Portsource, IPdest, Portdest, TCP)
- Destination : Mark connection as established.
This is called the 3-way handshake, after the three messages exchanged (SYN, SYN-ACK and ACK). It establishes that both the source and the target IP addresses are valid.
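The sequence of steps can be replayed with a toy simulation. This is not how a real TCP stack is written (the operating system performs the handshake inside calls such as `connect` and `accept`); it is a sketch with an in-memory "wire", and the state names only loosely follow those in the TCP specification.

```python
from collections import deque

def three_way_handshake():
    """Simulate SYN, SYN-ACK, ACK and return the final states and event log."""
    state = {"source": "CLOSED", "destination": "LISTEN"}
    wire = deque()  # messages in flight, as (receiver, flag) pairs
    events = []

    def send(sender, receiver, flag):
        events.append(f"{sender}: send {flag}")
        wire.append((receiver, flag))

    # Step 1: the source opens the conversation.
    send("source", "destination", "SYN")
    state["source"] = "SYN_SENT"

    while wire:
        receiver, flag = wire.popleft()
        events.append(f"{receiver}: receive {flag}")
        if receiver == "destination" and flag == "SYN" and state[receiver] == "LISTEN":
            state[receiver] = "SYN_RECEIVED"
            send("destination", "source", "SYN-ACK")
        elif receiver == "source" and flag == "SYN-ACK" and state[receiver] == "SYN_SENT":
            send("source", "destination", "ACK")
            state[receiver] = "ESTABLISHED"
            events.append("source: connection established")
        elif receiver == "destination" and flag == "ACK" and state[receiver] == "SYN_RECEIVED":
            state[receiver] = "ESTABLISHED"
            events.append("destination: connection established")
    return state, events

final_state, events = three_way_handshake()
for event in events:
    print(event)
print(final_state)
```

The printed log has the same eight steps as the list above, in the same order, and both sides end in the ESTABLISHED state.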
Security
With TCP, the destination can set up IP firewall rules to only accept packets from specific sources, as well as only send packets to specific destinations.
But this is not enough.
If the network is compromised, malicious agents can inspect the data segment in the packet. Therefore, in practice, the 3-way handshake is followed by a TLS handshake, which sets up encryption so that messages can only be read by the source and destination and not by anyone intercepting them.
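In Python, the two handshakes map onto two calls from the standard library. The sketch below is a minimal client, assuming the server presents a certificate that the system trusts; the helper names `make_tls_context` and `open_tls_connection` are made up for this post.

```python
import socket
import ssl

def make_tls_context():
    """Client-side TLS settings: verify the server certificate and hostname."""
    return ssl.create_default_context()

def open_tls_connection(host, port=443, timeout=5.0):
    """Run the TCP handshake, then the TLS handshake on top of it."""
    context = make_tls_context()
    # Step 1: TCP 3-way handshake.
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # Step 2: TLS handshake; bytes sent on the result are encrypted.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A caller could then write `with open_tls_connection("example.com") as tls: print(tls.version())`, where `example.com` is a placeholder host, to see which TLS version was negotiated.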
Cost of a connection
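Because of the multiple handshakes involved, developers should keep in mind that establishing a new connection is an expensive process. The TCP handshake costs one network round trip before any data can be sent, and the TLS handshake adds one more round trip (TLS 1.3) or two more (TLS 1.2). Setup time therefore grows with network latency, and on a slow or distant link it can reach a second or more. When the source and target are fixed, reusing a connection is of utmost importance.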
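The difference between reusing and not reusing a connection can be made visible by counting handshakes. The sketch below runs a tiny echo server on the loopback interface and sends the same three messages in two ways. All names are made up for this post, and it counts accepted connections rather than measuring time, since timings on loopback say little about a real network.

```python
import socket
import threading

def recv_exact(sock, size):
    """Read exactly `size` bytes, since recv() may return fewer."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)

def serve_echo(listener, accepted):
    """Echo bytes back, one client at a time, recording each accepted connection."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # the listener was closed
        accepted.append(1)
        with conn:
            while data := conn.recv(1024):
                conn.sendall(data)

def send_on_new_connections(address, messages):
    for message in messages:
        with socket.create_connection(address) as sock:  # handshake every time
            sock.sendall(message)
            recv_exact(sock, len(message))

def send_on_one_connection(address, messages):
    with socket.create_connection(address) as sock:  # handshake once
        for message in messages:
            sock.sendall(message)
            recv_exact(sock, len(message))

def count_connections(send, messages):
    """Return how many connections `send` opened to deliver `messages`."""
    accepted = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    # Daemon thread, so a server still waiting in accept() never blocks exit.
    threading.Thread(target=serve_echo, args=(listener, accepted), daemon=True).start()
    try:
        send(listener.getsockname(), messages)
    finally:
        listener.close()
    return len(accepted)

messages = [b"one", b"two", b"three"]
print(count_connections(send_on_new_connections, messages))
print(count_connections(send_on_one_connection, messages))
```

The first strategy pays for one handshake per message and the second pays for a single handshake in total, which is the saving that connection pools and HTTP keep-alive aim for.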
What’s next
While IP addresses identify the destination computer, it is not practical to remember the IP addresses of different machines. IP addresses have human-readable aliases called domain names. In the next post, I will try to talk about how a domain name maps to a specific IP address.