How is your app running on the internet? (part2)
It is in your best interest to know enough about the internet
Continuing from part 1
https://medium.com/@yatmanwong/apr1-placeholder-299b8173cdb4
The transport layer
The network layer figures out how to send packets from one network to another. The transport layer completes the picture by ensuring end-to-end delivery of the entire message.
Ports are introduced at this layer so that a process on one computer can connect to a process on another computer. Now we are not only delivering the letter to the right house, but also to the right person in the house.
This is why multiple processes can share the same network connection simultaneously. The transport layer also provides the reliability needed for communication to succeed. Different transport layer protocols offer different levels of error handling, such as congestion control, flow control, error detection, and retransmission.
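To make "error detection" a little more concrete, here is a minimal sketch of a 16-bit one's-complement checksum in the style of RFC 1071, the kind of checksum TCP and UDP carry in their headers. The function name and the "flip one bit" experiment are my own illustration, not how any particular network stack is implemented.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:                  # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # add each 16-bit word
    while total >> 16:                           # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")      # sample bytes from RFC 1071
checksum = internet_checksum(payload)

# The receiver recomputes the checksum over data + checksum; 0 means "looks intact".
intact = internet_checksum(payload + checksum.to_bytes(2, "big")) == 0

corrupted = bytearray(payload)
corrupted[0] ^= 0x01                             # flip one bit "in transit"
detected = internet_checksum(bytes(corrupted) + checksum.to_bytes(2, "big")) != 0
```

A checksum like this only detects corruption. What happens next (drop the segment, or ask for it again) is up to the protocol.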
Tell me more about Port
We need three basic pieces of information to locate any process on the internet: IP address, port number, and protocol.
It is clear why we need ports: to differentiate multiple processes running on the same computer. But beware, not every network protocol uses ports. ICMP (the protocol behind ping), for example, has none.
A port number is just a 16-bit unsigned integer, so it ranges from 0 to 65535. There are well-known ports, like 80 for HTTP and 443 for HTTPS.
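A quick way to see the 16-bit limit is to pack a port number the way it is laid out in a TCP or UDP header: two bytes, big-endian. This is only a sketch with Python's `struct` module, and `pack_port` is a hypothetical helper name.

```python
import struct


def pack_port(port: int) -> bytes:
    """Encode a port as 16 bits in network (big-endian) byte order."""
    return struct.pack("!H", port)


https_port = pack_port(443)      # two bytes: 0x01 0xBB
highest = pack_port(65535)       # two bytes: 0xFF 0xFF

try:
    pack_port(65536)             # one past the largest 16-bit value
    fits = True
except struct.error:
    fits = False                 # does not fit in 16 bits
```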
Will there be a problem if multiple clients connect to the same port?
When you make an HTTPS call, your browser picks a random (ephemeral) port on your computer and connects from it to port 443 on the remote server. If you open another tab and go to the same website, your browser picks another random port to connect to port 443 on the same server. Your browser is not using port 443; the server is listening on 443.
Since the clients on the same machine use different ports, they are treated as separate connections and there is no problem.
Now, if you have multiple applications on the server listening on the same port with the same protocol, you will run into a port conflict error. That’s why you cannot run two Spring applications locally on port 8080 at the same time, for example.
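You can watch this happen on your own machine. The sketch below is a loopback experiment, not a real browser: a listening socket on an OS-chosen port stands in for the server's port 443, and two client sockets play the role of two tabs.

```python
import socket

# Server socket: port 0 asks the OS for any free port (stand-in for 443).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_port = server.getsockname()[1]

# Two "browser tabs" connect to the same server port.
tab1 = socket.create_connection(("127.0.0.1", server_port))
tab2 = socket.create_connection(("127.0.0.1", server_port))

conn1, addr1 = server.accept()
conn2, addr2 = server.accept()

client_ports = {tab1.getsockname()[1], tab2.getsockname()[1]}
seen_by_server = {addr1[1], addr2[1]}
print("server port:", server_port, "client ports:", sorted(client_ports))

for s in (tab1, tab2, conn1, conn2, server):
    s.close()
```

Each client gets its own port from the operating system, so the server sees two distinct connections even though both target the same server port.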
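The conflict is easy to reproduce without Spring. In this sketch two TCP sockets try to bind the same address and port; the port is chosen by the OS, so the number itself is arbitrary.

```python
import errno
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))           # port 0: let the OS choose a free port
first.listen()
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))   # same IP, same port, same protocol
    conflict = None
except OSError as exc:
    conflict = exc.errno               # typically EADDRINUSE
finally:
    second.close()
    first.close()

print("address already in use:", conflict == errno.EADDRINUSE)
```

This is the same kind of failure behind the "port already in use" message when a second local server tries to start on a port that is already taken.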
What is the difference between a port and a socket?
It is easy to confuse them because both can be thought of as an “endpoint” of a connection. The difference is that a socket is an endpoint identified by an IP address and a port together, and a TCP connection is identified by the pair of sockets at its two ends (source IP and port, destination IP and port). A port on its own is just the number that identifies a process on one machine.
Sockets usually come up in the context of programming, where you create “socket” objects that you can open, close, read from, and write to. A socket is the interface through which the application layer invokes transport layer services.
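Here is roughly what those socket objects look like in Python. It is a loopback sketch with made-up variable names: it opens a connection, writes and reads a few bytes, and then shows that the two ends of a connection see the same pair of (IP, port) endpoints from opposite sides.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

client = socket.create_connection(listener.getsockname())   # open
server_side, _ = listener.accept()

client.sendall(b"ping")                                      # write
received = server_side.recv(4)                               # read

# Each end reports (local endpoint, remote endpoint).
client_view = (client.getsockname(), client.getpeername())
server_view = (server_side.getsockname(), server_side.getpeername())

for s in (client, server_side, listener):                    # close
    s.close()
```

The client's local endpoint is the server's remote endpoint and vice versa, which is why the pair identifies the connection.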
TCP vs UDP
The Transport layer protocols define how to provide end-to-end communication services for applications.
There are actually many others out there for niche uses. But most developers just need to know enough about the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).
UDP is connectionless, meaning it doesn’t need to establish a connection before sending data.
- has basic error detection (a checksum) and simply drops a segment if it is corrupted
- does not resend lost segments
- does not guarantee in-order delivery
- no congestion control
- hence lightweight, with a smaller header (8 bytes)
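As a small illustration of "connectionless", the sketch below sends one UDP datagram over loopback. There is no `connect()` and no handshake; the payload name `frame-1` and the 2-second timeout are arbitrary choices for the demo.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)     # UDP gives no delivery guarantee, so do not wait forever

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connection setup: the datagram is simply addressed and sent.
sender.sendto(b"frame-1", receiver.getsockname())

try:
    data, source = receiver.recvfrom(2048)
except socket.timeout:
    data, source = None, None    # a lost datagram is never resent by UDP itself

sender_port = sender.getsockname()[1]
sender.close()
receiver.close()
```

On loopback the datagram practically always arrives, but nothing in UDP itself would notice or recover if it did not; that job is left to the application.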
TCP needs to do a three-way handshake to establish a connection
- the receiver sends acknowledgements for the segments it receives
- supports retransmission of lost segments
- reorders segments to guarantee in-order delivery
- supports congestion control by slowing down transmission
- bigger header (at least 20 bytes), more overhead
Because TCP is so reliable, it is used for most common tasks, like sending text or downloading files, where we don’t mind a little more processing time in exchange for complete, error-free data.
UDP is great for anything real-time, like live video streaming or game updates, where lost segments are better than noticeable lag.
Application Layer
The application layer performs functions that are specific to applications, such as sending emails, browsing websites, or downloading files: anything where the user wants to send or consume data.
While the transport layer focuses on reliable data transport between hosts, the application layer serves as an interface between the software applications and end users.
The application layer protocol defines the rules, standards, and procedures for communication between applications.
HTTP
The Hypertext Transfer Protocol (HTTP) was created in the early 1990s to fetch HTML pages, and HTTP/1.0 was published in 1996. The methods GET, POST, PUT, and DELETE, and even the status codes, are all HTTP concepts. It also defines its own header format.
HTTP uses a TCP connection underneath for reliability (HTTP/3 is the exception, running over QUIC on top of UDP). Since HTTP/1.1, multiple requests to the same server can reuse the same TCP connection.
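To see the method, status code, and headers on the wire, the sketch below starts a tiny local server from Python's standard library and talks to it over a raw TCP socket, typing the HTTP request by hand. `HelloHandler` and the `hello` body are made up for the demo.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)                          # status code
        self.send_header("Content-Type", "text/plain")   # response headers
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                        # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# An HTTP/1.1 request is plain text: request line, headers, blank line.
request = (
    "GET / HTTP/1.1\r\n"
    f"Host: 127.0.0.1:{server.server_port}\r\n"
    "Connection: close\r\n"
    "\r\n"
)
with socket.create_connection(("127.0.0.1", server.server_port)) as sock:
    sock.sendall(request.encode("ascii"))
    raw = b""
    while chunk := sock.recv(4096):    # read until the server closes
        raw += chunk
server.shutdown()
server.server_close()

head, _, body = raw.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
print(status_line)
print(header_lines)
```

Everything before the blank line is the status line plus headers; everything after it is the body.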
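Connection reuse can be observed from the client side. In this sketch (a local test server again, with a hypothetical `KeepAliveHandler`), two requests go through one `http.client.HTTPConnection`, and the client's local port is recorded after each; if the port stays the same, the same TCP connection carried both requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # lets the server keep connections open

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # so the connection can carry another request afterwards.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
local_ports, bodies = [], []
for path in ("/first", "/second"):
    conn.request("GET", path)
    response = conn.getresponse()
    bodies.append(response.read())                  # drain before the next request
    local_ports.append(conn.sock.getsockname()[1])  # client port in use
conn.close()
server.shutdown()
server.server_close()

print("same TCP connection:", local_ports[0] == local_ports[1])
```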
Without connection reuse, as in HTTP/1.0, the connection is closed once the server has finished sending the complete response to the client.
With so many application-specific requirements, you should now see why we need a whole application layer protocol on top of the transport layer protocol.
HTTP vs HTTPS
HTTP sends data in clear text. HTTPS adds encryption. The encryption is based on Transport Layer Security (TLS), a cryptographic protocol that succeeded the now-deprecated SSL (Secure Sockets Layer). This is why we can say HTTPS is HTTP + TLS.
TLS defines the series of steps needed to securely agree on encryption keys between the client and the server. In outline, the classic handshake looks like this:
- first, notice that TLS requires a TCP connection
- second, the client needs a certificate from the server to obtain the server’s public key
- third, the client generates a session key, encrypts it with the server’s public key from the certificate, and sends it to the server (asymmetric encryption)
- finally, both sides use the session key for symmetric encryption for the rest of the communication
In the end, all messages are encrypted symmetrically, because asymmetric encryption is computationally expensive. (Newer versions such as TLS 1.3 agree on the session key with a Diffie-Hellman exchange instead of encrypting it with the server’s public key, but the overall idea is the same.)
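In application code you rarely perform these steps by hand; a TLS library does them when it wraps a TCP socket. The sketch below uses Python's `ssl` module. `tls_summary` is a hypothetical helper, and calling it needs network access to whatever host you pass in, so it is defined here but not run.

```python
import socket
import ssl

# Default client settings: verify the server certificate and its hostname.
context = ssl.create_default_context()


def tls_summary(host: str, port: int = 443) -> dict:
    """Open a TCP connection first, then run the TLS handshake on top of it."""
    with socket.create_connection((host, port), timeout=5) as tcp:
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return {
                "protocol": tls.version(),       # negotiated TLS version
                "cipher": tls.cipher()[0],       # negotiated cipher suite name
                "subject": tls.getpeercert().get("subject"),
            }
```

Calling something like `tls_summary("example.com")` would show the negotiated protocol version, the cipher suite used for the symmetric part, and who the certificate was issued to.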
WebSocket
There are actually many well-known application layer protocols besides HTTP and HTTPS. Wikipedia has a full list.
There is also WebSocket, an alternative form of communication between processes. It has a secure version too, WebSocket Secure (wss). Like HTTPS, wss uses TLS underneath for encryption.
The benefit of WebSocket is real-time data delivery. Think about the red notification icon on most websites: with HTTP, the client has to initiate a request from time to time to check whether anything new has come up. With WebSocket, once a connection is established, both sides can send and receive data until either side terminates the connection.
HTTP is a stateless protocol where each request-response is independent. WebSocket establishes a long-lived connection that remains open as long as needed. This persistent connection allows the client and server to exchange data in real-time without the overhead of establishing a new connection for each interaction.
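One detail worth knowing about how that long-lived connection gets established: a WebSocket connection starts life as an ordinary HTTP request asking to upgrade, and the server proves it understood by echoing back a value derived from the client's `Sec-WebSocket-Key` header. The sketch below computes that value as described in RFC 6455; the function name is mine, and a real server would use a WebSocket library instead of doing this by hand.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Value the server returns in the Sec-WebSocket-Accept header."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the RFC 6455 handshake example.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
print(accept)
```

After this exchange the same TCP connection stops speaking HTTP and carries WebSocket frames in both directions.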
Of course, WebSocket has its drawbacks, such as making it harder to scale the server horizontally, because the machine holding an open connection cannot simply hand that connection over to another machine.
Secure Shell (SSH)
SSH is yet another application layer protocol. If you use GitHub, you have probably set it up already. I am not interested in how it works here, but I want to find out why GitHub uses SSH when HTTPS is already available.
SSH, as the name suggests, is already secure. But it doesn’t use TLS. It does its own thing for encryption and authentication; that’s why we have to generate an SSH key pair with ssh-keygen and upload the public key to GitHub.
The reason we need SSH when there is already HTTPS is that HTTPS is mostly used for websites, while SSH is specifically designed for secure remote access and command-line management of remote systems.
When accessing a website over HTTPS, users typically enter their username and password into a login form provided by the website (unless they are using OIDC). With SSH, users are authenticated with a public key instead.
SSH is more convenient: you don’t have to enter your password every time you interact with a repo. This is one reason SSH is such a popular way to connect to GitHub.
Ending Thoughts
These are the application layer protocols I have used and seen so far. I think most developers should know enough about how these commonly used protocols work.
Now that we have a good understanding of the transport layer and application layer, we can see that many of the API architectures we use day to day are built on top of these protocols. For example, RESTful APIs are based on HTTP(S).
We also have other sophisticated ways of communicating, like Kafka and message queues, which likewise run their own application layer protocols on top of the transport layer.