Part I: Fundamental Webserver Management with InterWorx Up Part I: Fundamental Webserver Management with InterWorx Chapter 2: Definition of Terms 

1 A Brief Primer on HTTP and Apache

This chapter may be somewhat technical; if you are just looking for information about interface elements in the control panel, you might want to head to chapter 3. Since the beginning of the Internet, man has sought to deliver content across the series of tubes that moves data across the world at lightning-fast speeds. From the bowels of government-funded defense research and university high-speed communication networks, the World Wide Web and the HTTP protocol were spawned. The HTTP protocol and the HTML markup language gave birth to what most of the world understands as the internet today. For the most part, the software that runs on servers and responds to HTTP requests is the Apache Web Server, an open source, high-performance, fully modular, and widely supported HTTP and HTTPS webserver. In layman’s terms, it is the program that listens on port 80 (and port 443 for HTTPS), receives requests for specific documents or files on the webserver, and sends them back to the web browser that requested them.

1.1 A primer on HTTP

This section deals with the basics of TCP and HTTP so we are all on the same page for the rest of the guide. Feel free to skip to the next section if you already understand the OSI model and the HTTP protocol. We say lofty things like "HTTP protocol" and "TCP protocol" a lot in this guide without really explaining what they mean. If you are a server admin without a strong networking or programming background, these terms might seem somewhat vague. A protocol is essentially a contract. It dictates how two devices or two pieces of software are supposed to talk to each other so they can understand each other: when one side says something, the other side knows how to interpret it and respond appropriately.
Physical Layer
It is well known that information travels over various mediums: copper wire, fiber optic cable, wireless signals. Each medium ultimately has one goal: communicating 1’s and 0’s between devices. 1’s and 0’s are the basic unit of information for a computer, so each medium has its own protocol defining what counts as a 1 and what counts as a 0. For example, on fiber optic cable a pulse of light might be a 1 and the absence of a pulse a 0. Both devices also have to agree on the rate at which pulses arrive, so that each side of the cable knows what is a 1, what is a 0, and what is just the blank time between two consecutive 1’s.
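To make the "agreed timing" idea concrete, here is a toy decoder for the fiber example. It assumes simple on/off signaling where each bit occupies a fixed number of light-intensity samples and anything above a threshold counts as a pulse; the function name, threshold, and sample values are hypothetical, and real links use line codes and clock recovery that this sketch ignores.

```python
def decode_pulses(samples, samples_per_bit, threshold=0.5):
    """Turn light-intensity samples into bits, one bit per fixed-length time slot.

    Toy on/off-keying model: both ends agree on samples_per_bit, and that shared
    timing is what lets the receiver tell one long pulse (1, 1) from a single 1.
    """
    bits = []
    # Walk the signal one time slot at a time; a trailing partial slot is ignored.
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        # Read the middle of the slot, where the signal level is most stable.
        middle = samples[start + samples_per_bit // 2]
        bits.append(1 if middle >= threshold else 0)
    return bits
```

For example, `decode_pulses([0.9] * 4 + [0.1] * 4 + [0.8] * 4 + [0.95] * 4, samples_per_bit=4)` yields `[1, 0, 1, 1]`: the last two slots form one continuous pulse, and only the agreed slot length reveals that it carries two 1’s.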
Internet Layer
Well, this is all well and good, but many devices share the same communication pathways; heck, undersea cables carry information across the oceans for billions of devices. How do we know which 1’s and 0’s come from whom, and what each 1 and 0 means? We logically bundle a group of 1’s and 0’s together into something called a packet, and each packet is a chunk of information from one device headed to another device somewhere else. Every device is identified by an address made up of thirty-two 1’s and 0’s, known as an IP address (this describes IPv4; the newer IPv6 uses 128-bit addresses). Each packet also carries information about how large it is and other metadata. To make this work, the bits at the start of each packet are split into fields, so that each group of n bits represents a different piece of information about the packet. Your network card has circuitry that detects when a series of 1’s and 0’s is the beginning of a packet, and it starts processing the stream and loading the data into a buffer for processing by the operating system. This is known as the Internet Protocol, or IP, and it is the fundamental protocol of the internet. Every packet starts with a header that allows it to be routed along the way to its intended recipient.
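The idea that "each group of n bits represents a different piece of information" is easiest to see by unpacking a real IPv4 header layout. The sketch below builds a sample 20-byte header (no options, checksum left at 0) using the documentation-reserved addresses 192.0.2.10 and 198.51.100.20, then splits it back into fields with Python’s `struct` module; the field values are made up for illustration, and the helper names are hypothetical.

```python
import ipaddress
import struct

# Fixed 20-byte IPv4 header (no options), network byte order:
# ver/IHL, TOS, total length, ID, flags/fragment, TTL, protocol, checksum, src, dst
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

SAMPLE_HEADER = IPV4_HEADER.pack(
    0x45,    # version 4 (high 4 bits), header length 5 words (low 4 bits)
    0,       # type of service
    60,      # total packet length in bytes (header + payload)
    0x1C46,  # identification
    0x4000,  # flags/fragment offset: "don't fragment" bit set
    64,      # time to live (decremented by each router)
    6,       # protocol number 6 = TCP
    0,       # header checksum (not computed in this sketch)
    ipaddress.IPv4Address("192.0.2.10").packed,
    ipaddress.IPv4Address("198.51.100.20").packed,
)


def parse_ipv4_header(raw: bytes) -> dict:
    """Split the first 20 bytes of an IPv4 packet into named fields."""
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack(raw[:IPV4_HEADER.size])
    return {
        "version": ver_ihl >> 4,                # top 4 bits
        "header_length": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


def ipv4_bits(address: str) -> str:
    """Show an IPv4 address as the 32 bits it really is."""
    return format(int(ipaddress.IPv4Address(address)), "032b")
```

`parse_ipv4_header(SAMPLE_HEADER)` reports version 4, a 20-byte header, protocol 6 (TCP), and the two addresses, while `ipv4_bits("192.0.2.10")` returns `"11000000000000000000001000001010"`, the thirty-two 1’s and 0’s behind the familiar dotted notation.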
Transport Layer
While the internet layer provides nice things like routing and atomicity for chunks of data from a given device, the IP layer really has no ability to:
  1. Ensure that one party doesn’t overwhelm the other party with too much data if one of the devices is too slow to keep up with the rate of the other device.
  2. Ensure that data packets sent to the remote server actually arrive there and are re-sent as necessary
  3. Ensure that data packets arrive in order and are reconstructed to be in order on the destination side if necessary
  4. Modify the transmission rates of packets if high network latency or data loss is detected - i.e. “congestion control”
This is where the TCP protocol comes in. It lets us send data between two devices with very good reliability, at some cost in speed and overhead. It is the fundamental transport layer protocol for most data transmission on the internet where data integrity is important. TCP gives two devices a reliable, ordered stream of bytes, and that stream can carry actual human-readable text messages. That is how many software programs, like webservers and web browsers, talk to each other: on top of the TCP protocol, using normal human-readable text.
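Items 2 and 3 in the list above are handled in TCP by numbering every byte of the stream. The toy receiver below is a simplified sketch of that idea, not TCP’s actual state machine: it assumes non-overlapping segments given as `(sequence_number, payload)` pairs, ignores sequence-number wraparound, and uses a hypothetical function name. It delivers only the contiguous data it has and reports the next byte it is still waiting for, which is what a receiver acknowledges so the sender knows what to re-send.

```python
def reassemble(segments, initial_seq=0):
    """Rebuild an in-order byte stream from segments that may arrive out of order.

    Returns (bytes deliverable so far, next expected sequence number).
    """
    pending = {}
    for seq, payload in segments:
        # A retransmitted duplicate of a segment we already hold is ignored.
        pending.setdefault(seq, payload)

    data = bytearray()
    expected = initial_seq
    # Deliver segments only while there is no gap before them.
    while expected in pending:
        payload = pending.pop(expected)
        data += payload
        expected += len(payload)  # sequence numbers count bytes, not segments
    return bytes(data), expected
```

With segments arriving as `[(5, b"world"), (0, b"hello"), (10, b"!")]`, the result is `(b"helloworld!", 11)`. If the `(5, b"world")` segment is lost, the result is `(b"hello", 5)`: the application sees only "hello", and the receiver keeps asking for byte 5 until the sender re-sends it.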
Application Layer
Web browsers and web servers talk to each other in what looks like text messages. This is the HTTP protocol. Typically when a web browser connects to a web server, it establishes a TCP connection and sends a text message like:
GET /index.html HTTP/1.1
Host: www.example.com
Completely human-readable, right? No weird codes or symbols, just text. The first line is the browser saying “get me a document called index.html, and by the way, I am speaking HTTP version 1.1”. The second line, the Host: header, is telling the server “Oh, and the domain the user entered into the URL bar is www.example.com” (a placeholder domain here). This will be important later when understanding how multiple domains can live on the same IP address. Then your server might respond with something like:
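The request above can be produced with a few lines of string handling. One detail the listing hides: on the wire, HTTP/1.1 separates lines with a carriage return plus line feed (`\r\n`), and an empty line marks the end of the request header. This sketch uses a hypothetical helper name and the placeholder domain `www.example.com`.

```python
def build_request(path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request as the raw bytes sent over TCP."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, document path, protocol version
        f"Host: {host}",         # which domain the user typed in the URL bar
        "",                      # empty line: end of the request header
    ]
    # HTTP/1.1 uses CRLF line endings, including after the final empty line.
    return ("\r\n".join(lines) + "\r\n").encode("ascii")
```

`build_request("/index.html", "www.example.com")` returns `b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"`, which is exactly what a browser would write into the TCP connection for this simple case (real browsers add several more headers).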
HTTP/1.1 200 OK 
status: 200 OK 
version: HTTP/1.1 
content-length: 28278 
content-type: text/html; 
date: Sat, 18 Aug 2012 00:26:50 GMT 
Which is the server’s way of saying:
  1. I too speak HTTP version 1.1, I am returning status code 200 which means that I was able to locate the document you were requesting.
  2. By the way, the status was 200 OK, just in case you forgot
  3. And also again, the version of HTTP is 1.1
  4. The content length is 28278 bytes, meaning that’s how much data you should expect from me.
  5. The content type is text/html. Get ready to parse it!
  6. And here’s today’s date.
After this block, called the response header, and a blank line that marks its end, the server sends the contents of the file I requested to my browser. My browser hides the header from me, since it is meant for the browser rather than the reader, and renders the HTML according to the rules of how HTML is supposed to work.
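On the receiving side, the browser has to find where the header ends and the body begins. The sketch below shows the basic idea under simplifying assumptions: the whole response is already in memory, the body length comes from `content-length`, and chunked transfer encoding, repeated headers, and other real-world cases are not handled. The function name is hypothetical.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status line parts, headers, and body."""
    # The header ends at the first empty line (CRLF CRLF); the body follows it.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line, e.g. "HTTP/1.1 200 OK": version, numeric code, reason phrase.
    parts = lines[0].split(" ", 2)
    version, status_code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()

    # Trust content-length to know how many body bytes belong to this response.
    length = int(headers.get("content-length", len(rest)))
    return version, status_code, reason, headers, rest[:length]
```

Feeding it `b"HTTP/1.1 200 OK\r\ncontent-length: 5\r\ncontent-type: text/html\r\n\r\nhello"` gives version `"HTTP/1.1"`, status `200`, reason `"OK"`, a header dictionary with `content-type` set to `"text/html"`, and the body `b"hello"`.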

1.2 How InterWorx ships the Apache Webserver

When you install InterWorx, the installer will remove the version of Apache that may have shipped with your operating system and install the one which is built and maintained in the InterWorx repositories by InterWorx proper. We do this for two reasons:
  1. The ability to set certain build parameters to values that work with InterWorx’s docroot structure.
  2. We don’t have to rely on the OS maintainers for web server updates.

1.2.1 Multi Processing Module

Apache is highly modular, so modular in fact that you can choose how Apache handles multiple HTTP connections and requests at the same time. The module that controls this behavior is called the Multi-Processing Module, or MPM. There is a wide variety of MPMs for Apache that target different applications of the web server. Some are operating-system specific, such as the MPM used when you run Apache in a Windows environment (yes, people occasionally use Windows for webhosting). Others run Apache in a threaded or hybrid process mode, where the master Apache process spawns multithreaded children that can each handle multiple connections simultaneously. The MPM that InterWorx ships with, and is compatible with, is called prefork. This MPM uses a master supervisor Apache process that is started and run as root so it can bind to TCP ports 80 and 443. It then spawns (pre-forks) child processes ahead of time that run as the Apache user, and each child services at most one connection at a time. The benefit of this is additional stability. In a multi-threaded setup, a crash in a single thread can kill an entire process that is serving multiple connections. A single-threaded setup localizes any crash to a single process, so other users are not affected by it. In addition, certain modules are not compatible with the multi-threaded MPMs.
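The stability argument for one connection per process can be demonstrated with plain `os.fork`. This is a deliberately simplified, POSIX-only sketch of process isolation, not how Apache’s prefork MPM is implemented: it forks one child per request instead of keeping a reusable pre-forked pool, and the `handle` function with its simulated crash is hypothetical. The point it shows is that a child dying does not take down the parent or its sibling workers.

```python
import os


def handle(request_id: int) -> None:
    """Hypothetical request handler; request 2 simulates a bug that crashes its worker."""
    if request_id == 2:
        raise RuntimeError("simulated crash")


def serve(request_ids):
    """Fork one worker per request and report which workers finished cleanly."""
    workers = {}
    for request_id in request_ids:
        pid = os.fork()
        if pid == 0:
            # Child process: handle exactly one request, then exit immediately.
            exit_code = 1
            try:
                handle(request_id)
                exit_code = 0
            finally:
                # os._exit never returns, so the child cannot fall back into the loop.
                os._exit(exit_code)
        workers[pid] = request_id  # Parent: remember which child owns which request.

    results = {}
    for pid, request_id in workers.items():
        _, status = os.waitpid(pid, 0)  # Reap the child and read how it exited.
        ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        results[request_id] = "ok" if ok else "crashed"
    return results
```

Calling `serve([1, 2, 3])` returns `{1: "ok", 2: "crashed", 3: "ok"}`: the crash in request 2’s worker is confined to that one process, which mirrors why prefork keeps a misbehaving request from affecting other visitors.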

1.2.2 The NameVirtualHost System

InterWorx supports having multiple domains on the same IP address through Apache’s NameVirtualHost system. Every HTTP/1.1 request includes the domain being requested from the server, as shown back in section 1.1. The “Host:” header contains the domain entered in the URL bar, which allows Apache to select, from the domains in its configuration, the correct files to load from disk and the correct configuration settings to use. Of course, the browser first has to reach the right server: DNS must be set up and working properly so that the domain the user enters resolves to the webserver’s correct IP address. In other words, the NameVirtualHost system depends on DNS being properly set up for the public internet to visit the domain with ease. You can always edit your local computer’s HOSTS file to override the normal DNS lookup for specific domains, for example to test a site before its DNS is in place.
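The selection step described above can be sketched as a simple lookup keyed on the Host header. The virtual host list, domains, and document root paths below are hypothetical placeholders, not InterWorx’s actual layout. The fallback mirrors Apache’s documented behavior of using the first virtual host defined for an address when no `ServerName` or `ServerAlias` matches.

```python
# Hypothetical virtual hosts sharing one IP address; order matters for the fallback.
VHOSTS = [
    {"server_name": "example.com", "aliases": ["www.example.com"],
     "docroot": "/srv/www/example.com"},
    {"server_name": "example.net", "aliases": ["www.example.net"],
     "docroot": "/srv/www/example.net"},
]


def select_vhost(host_header, vhosts=VHOSTS):
    """Pick the virtual host whose name or alias matches the request's Host header."""
    # Drop an optional ":port" and normalize case; hostnames are case-insensitive.
    # (IPv6 literal hosts such as "[::1]:80" are not handled in this sketch.)
    host = host_header.split(":", 1)[0].strip().lower() if host_header else ""
    for vhost in vhosts:
        if host == vhost["server_name"] or host in vhost["aliases"]:
            return vhost
    # No match: fall back to the first virtual host defined for this address.
    return vhosts[0]
```

A request carrying `Host: WWW.Example.NET:80` is served from `/srv/www/example.net`, while an unknown name such as `unknown.test` lands on the first entry, which is why the order of virtual hosts on a shared IP matters.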

(C) 2019 by InterWorx LLC