Last Updated on February 11, 2024
What is an HTTP proxy server?
An HTTP proxy server is a software component that sits in the middle of web connections established between a client and a server. In a typical scenario, a client would send HTTP requests to a server in order to access files, web pages, or any other resources available on that server. However, when the client uses a proxy, the client communicates with the proxy, and the proxy acts as an intermediary to request these resources on behalf of the client.
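To make this concrete, here is a minimal sketch of how the same plain-HTTP GET request looks on the wire when it is sent directly to a server and when it is sent to a proxy. The function name `build_get_request` and the URL are assumptions made for illustration. The key difference is that a request sent to a proxy carries the full URL in its request line, so the proxy knows which server to contact on the client's behalf.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, via_proxy: bool) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `url`.

    Sent directly to the server, the request line holds only the path.
    Sent to a proxy, the request line holds the absolute URL.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    if via_proxy:
        target = f"{parts.scheme}://{parts.netloc}{path}"
    else:
        target = path

    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


# Hypothetical URL, used only to show the two request lines.
direct = build_get_request("http://example.com/index.html", via_proxy=False)
proxied = build_get_request("http://example.com/index.html", via_proxy=True)
```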
All computers connecting to the internet have their own unique IP addresses. These addresses allow this global network system to know how to reach these computers. Just like any other computer, a proxy server has its own IP address as well. All computers configured by the administrator to use the proxy server contact it at this IP address. Usually, these computers are given the proxy's hostname and resolve it to the IP address through the DNS protocol.
While acting as middleware, the HTTP proxy server provides a layer of security to the client by:
- Hiding some of the client information (e.g. IP address, location).
- Reducing the attack surface and blocking connections to suspicious websites (e.g. websites hosting malware or viruses).
Let’s take a look at all the benefits of using an HTTP proxy.
What are the benefits of using HTTP proxies?
HTTP proxy servers have their own public IP addresses, so they offer another layer of protection between companies’ intranet and external traffic.
HTTP proxies operate at the application layer of the OSI model (layer 7). Therefore, they can offer enriched features like content scanning and filtering.
Here is a summary of the benefits of using proxy servers:
Proxy servers act as intermediaries between clients and the internet. They provide the benefit of isolating the clients from several kinds of attack. Some proxies are able to scan the content of the HTTP responses before sending these responses back to the clients.
Furthermore, the administrator can configure the proxy server to block all connections to suspicious websites.
HTTP proxies allow users to browse the internet anonymously by hiding their IP address. They can also block some cookies or ads displayed on web pages.
In fact, many websites use the client IP address to determine the country or the city of the user. Ad providers are able to select the type of ads based on the user profile (targeted advertising). However, when the user accesses those websites through a proxy server, ad providers will only see the proxy server's IP address.
HTTP proxy servers optimize bandwidth utilization by caching web pages and files that are accessed multiple times by users.
When a user requests a web page, the HTTP proxy checks its cache to see if it contains the most recent version of the page. If the most recent version is available, the proxy server sends this version back to the client without downloading the web page a second time. This mechanism allows companies to save bandwidth and improve their network performance.
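The caching idea can be sketched in a few lines. This is a deliberately simplified model, not how a production proxy decides freshness: real proxies rely on HTTP caching headers, whereas this sketch assumes a fixed maximum age. The class name `TinyProxyCache`, the `fetch` callback, and the `max_age_seconds` parameter are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TinyProxyCache:
    """Serve a stored copy of a page while it is younger than max_age_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # function that really downloads the URL
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the cache is easy to test
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]          # cache hit: no download needed

        body = self._fetch(url)      # cache miss or stale copy: download again
        self._entries[url] = (now, body)
        return body
```

With this design, repeated requests for the same URL within the maximum age trigger a single download, which is where the bandwidth saving comes from.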
Some HTTP proxies offer additional features like content compression between the clients and the proxy server. This can improve the overall network performance as well.
NB: For content caching optimization, it is recommended to use a specialized content engine device. Content engines or caching engines are dedicated network devices that perform the caching functions of a proxy server.
Organizations are able to audit and control the internet usage of their employees by using a proxy server. This is possible only because the proxy captures and logs all outgoing web requests.
Usually, companies have internal policies preventing employees from visiting certain websites when they are in the office (e.g. inappropriate content, social media). By using a proxy server, companies are able to enforce these policies by denying access to these websites. The proxy server can even redirect users to a friendly error page for blocked websites.
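As a rough sketch of how such a policy check could work, the function below tests a requested hostname against a blocklist, including all of its parent domains. The domain names and the function name `is_blocked` are hypothetical, and real proxies usually offer much richer rule formats.

```python
from typing import AbstractSet

# Hypothetical blocklist, for illustration only.
BLOCKED_DOMAINS = frozenset({"social.example", "malware.example"})


def is_blocked(host: str, blocked: AbstractSet[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if `host` or one of its parent domains is on the blocklist."""
    labels = host.lower().rstrip(".").split(".")
    # For "www.social.example" this checks "www.social.example",
    # then "social.example", then "example".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

Matching on whole labels rather than on a plain substring avoids blocking an unrelated site whose name merely ends with the same characters.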
In addition, by analyzing the proxy logs, network administrators can gather statistics about the time users spend on the internet.
NB: Depending on the proxy server that you are using, some of these features may or may not be supported. It really depends on the implementation of the HTTP Proxy.
Different types of proxies
Let’s take a look at the 2 types of proxies: forward proxies and reverse proxies.
A forward proxy is just a traditional proxy used as an intermediary for a group of machines (or clients) accessing data on the internet. This is the type of proxy that companies use to monitor internet usage in the corporate space. Most proxies are able to hide clients’ identities (as explained in the previous section).
A forward proxy can be either a transparent proxy or a non-transparent proxy.
Transparent proxy meaning
With transparent proxies (also known as intercepting proxies), no configuration is needed on the client-side. These proxies intercept and forward data in transit without altering the data in question.
Usually, transparent proxies do not hide the clients’ IP addresses. The gateway or the router intercepts the data silently in the background. With the evolution of technology, these network devices now have enough power to run HTTP proxy software as well.
Therefore, even though routers and gateways operate at layer 3 of the OSI model (NAT, network address translation), they are able to forward the data to their own layer-7 proxy application for inspection purposes. Some people refer to this mechanism as a “transparent firewall”. In this scenario, users do not realize that their internet traffic is going through a transparent proxy.
Non-transparent proxy meaning
On the other hand, in order to use a non-transparent proxy, clients need to know how to contact the proxy server in question:
- Clients can keep the proxy settings in their configuration (e.g. browser proxy configuration). In this case, the administrator has to manually configure the client.
- Clients can automatically discover the proxy settings through the Web Proxy Auto-Discovery protocol (WPAD). On Windows, the administrator can set a non-transparent proxy configuration with a login script or a domain security policy.
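For the manual case, here is a small sketch of what an explicit proxy configuration can look like in a client program, using Python's standard `urllib.request` module. The proxy address is a placeholder and not a real server. Building the opener does not open any connection; traffic only reaches the proxy once a URL is actually requested.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
PROXY_URL = "http://proxy.example.internal:3128"

# Tell urllib which proxy to use for each URL scheme.
proxy_handler = urllib.request.ProxyHandler({
    "http": PROXY_URL,
    "https": PROXY_URL,
})

# Every request made through this opener is sent to the proxy first.
opener = urllib.request.build_opener(proxy_handler)

# Example use (needs a reachable proxy, so it is left commented out):
# with opener.open("http://example.com/", timeout=10) as response:
#     print(response.status)
```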
Non-transparent proxies are often classified based on the level of privacy that they offer:
- Anonymous proxy: this proxy identifies itself as a proxy but hides the IP address of the client.
- High anonymity proxy: this proxy does not identify itself as a proxy and hides the IP address of the client.
- Distorting proxy: this proxy identifies itself as a proxy and changes the IP address of the client. It basically provides a fake IP address for the client.
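One common way to tell these categories apart is to look at the request headers that reach the destination server. The sketch below is only a heuristic under stated assumptions: it assumes that a proxy announces itself with the `Via` header and passes a client address along in the `X-Forwarded-For` header, which many proxies do but none are obliged to. The function name `classify_proxy` and the sample addresses are invented for illustration.

```python
from typing import Mapping


def classify_proxy(headers: Mapping[str, str], client_ip: str) -> str:
    """Guess the privacy level of a proxy from the headers the server sees.

    `client_ip` is the real address of the client, which only a test
    setup would know. Assumption: the proxy uses Via and X-Forwarded-For.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    via = lowered.get("via")
    forwarded_for = lowered.get("x-forwarded-for")

    if via is None and forwarded_for is None:
        return "high anonymity"   # nothing reveals that a proxy is involved
    if forwarded_for is None:
        return "anonymous"        # proxy announces itself, client IP hidden

    first_hop = forwarded_for.split(",")[0].strip()
    if first_hop == client_ip:
        return "not anonymous"    # the real client IP is passed along
    return "distorting"           # a different (fake) client IP is passed along
```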
Non-transparent proxies in .NET Framework
In the .NET Framework, Microsoft refers to 2 varieties of proxies: adaptive and static. As explained previously, there are 2 ways of configuring a proxy on the client side. The proxy information can be configured manually or be automatically discovered by the client. The explicit configuration is better suited to scenarios where the network topology does not change frequently: this is what Microsoft refers to as static proxies in the .NET Framework. On the other hand, the term adaptive proxies refers to proxies that are discovered automatically through the Web Proxy Auto-Discovery protocol (WPAD).
In a nutshell, here is an overview of the different types of forward proxies:
A reverse proxy also acts as an intermediary between the client and the server, but the reverse proxy is installed on the server-side. It hides the IP address of the server and receives requests on behalf of the server.
A reverse proxy can be used to distribute the load between multiple servers of the same web application: this is what is called load balancing. Moreover, a reverse proxy can also leverage caching and compression mechanisms to enhance the performance of a web application.
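Load balancing can follow many strategies. As one simple illustration, the sketch below uses round-robin selection, where each new request goes to the next server in the list. Round-robin is my choice for the example and is not something a reverse proxy is required to use; the class name and backend names are hypothetical.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hand out backend servers in turn, wrapping around at the end."""

    def __init__(self, backends: Iterable[str]) -> None:
        backend_list = list(backends)
        if not backend_list:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backend_list)

    def pick(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._cycle)


# Hypothetical backend names for the same web application.
balancer = RoundRobinBalancer(["app-1.internal", "app-2.internal", "app-3.internal"])
first_four = [balancer.pick() for _ in range(4)]
```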
What is an HTTP Proxy then? – Is it a forward or a reverse proxy?
In the majority of use cases, an HTTP proxy is simply a forward proxy. In other words, when you see the term “HTTP Proxy” on the internet, it usually refers to an HTTP forward proxy.
What about the term “webproxy”?
A web proxy is a proxy server that offers a web page through which clients are able to browse the internet. Basically, the user needs to manually browse to the proxy page first. This page contains a text box where the user can enter the address of the website they want to visit.
In some documentation, the term web proxy has been used interchangeably with the term HTTP proxy, which can be confusing since HTTP proxies do not have a landing page where users enter the URL of the website they want to visit.
Nowadays, most web proxies also offer HTTP proxy features out of the box, thus allowing users to configure their system to connect to the proxy server without going through the proxy web page.
Can HTTP proxies handle HTTPS requests?
The term “HTTP proxy” makes many people think that HTTP proxies are not able to handle HTTPS requests, which is a false assumption. Most HTTP proxies are able to forward HTTPS requests as well. However, they do it by opening a TCP tunnel to the specified server. Basically, they use the HTTP CONNECT method documented in RFC 2817 (and now specified in RFC 9110).
When processing HTTPS requests, the HTTP proxy cannot leverage some features like caching and content inspection because it is a blind man-in-the-middle. Basically, the proxy is not involved in the HTTPS encryption/decryption process. However, the HTTP proxy can still block connections to malicious and inappropriate HTTPS websites, because the CONNECT request tells it which server the client wants to reach.
Here is an overview of this HTTP CONNECT process:
Let’s go through the steps in this sequence diagram.
Step 1
Here we suppose that the web browser is already set to use a proxy server and the user enters the URL https://siakaserver.com. Because the URL entered in the browser starts with HTTPS, the browser will send an HTTP CONNECT request to the proxy server. The goal of this request is to ask the proxy server to open a TCP connection with the server known as siakaserver.com.
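For reference, the CONNECT request itself is short. The sketch below builds the bytes that a client would send to the proxy for the example above; the helper name `build_connect_request` is made up, and real browsers typically add a few more headers (for example a user agent or proxy credentials).

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Return a minimal HTTP CONNECT request asking for a tunnel to host:port."""
    authority = f"{host}:{port}"
    request = (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    )
    return request.encode("ascii")


connect_bytes = build_connect_request("siakaserver.com")
```

Note that the request line contains only the host name and port, not a path: the proxy learns which server to reach, but nothing about the pages that will be requested inside the tunnel.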
Update: Between Step 1 and Step 2 there is a DNS resolution process that was omitted from the diagram on purpose to keep things simple. Fundamentally, the proxy server needs to find the IP address of the requested server (siakaserver.com in this example) before opening a TCP connection with the server. This process of mapping a domain name to an IP address is called the DNS resolution. To learn more about the DNS protocol, you can read this article on the topic.
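As a small illustration of that lookup, this sketch uses the standard `socket.getaddrinfo` call to turn a host name into the list of IP addresses a proxy could connect to. The function name `resolve` is hypothetical, and an actual proxy may use its own resolver and cache the results.

```python
import socket
from typing import List


def resolve(host: str, port: int = 443) -> List[str]:
    """Return the distinct IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]              # sockaddr is (ip, port, ...) for IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# Example use (needs working DNS, so it is left commented out):
# print(resolve("siakaserver.com"))
```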
Step 2, 3, and 4
The proxy server does the famous TCP 3-way handshake (SYN, SYN-ACK, and ACK) to open a TCP connection with the server siakaserver.com.
Step 5
The proxy server replies to the HTTP CONNECT request with an HTTP 200 (OK) response. This lets the browser know that the external tunnel is open.
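Seen from the proxy's side, handling the CONNECT request boils down to reading the target out of the request line and, once the TCP connection to that target is open, answering with a 200 response. The sketch below covers only the parsing and the reply bytes, with hypothetical names, and leaves out the socket handling and any access checks.

```python
from typing import Tuple

# Reply sent to the client once the connection to the target is open.
CONNECT_OK = b"HTTP/1.1 200 OK\r\n\r\n"


def parse_connect_line(request_line: str) -> Tuple[str, int]:
    """Extract (host, port) from a line like 'CONNECT host:443 HTTP/1.1'."""
    method, target, _version = request_line.strip().split(" ")
    if method != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")

    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {target!r}")
    return host, int(port)
```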
Step 6, 7, 8, and 9
Once the TCP tunnel is open, the proxy acts as a dumb byte forwarder. The browser initiates the TLS handshake, which includes a key exchange mechanism, in order to open an encrypted tunnel between the browser and the web server. There is more back-and-forth communication between the client and the web server during the TLS handshake. This diagram shows only a simplified view of the process. We will explain the TLS handshake in depth in another post. Once the TLS handshake is over, the secured tunnel is established.
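The forwarding role can be sketched as a loop that copies whatever arrives on one socket to the other, without looking at the content. This is a minimal sketch with a made-up function name `relay`; it stops as soon as either side closes its end, whereas a production proxy would also deal with timeouts, half-closed connections, and many tunnels at once.

```python
import select
import socket


def relay(a: socket.socket, b: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes between two connected sockets until one side closes."""
    peer = {a: b, b: a}
    while True:
        # Wait until at least one of the two sockets has data to read.
        readable, _, _ = select.select([a, b], [], [])
        for source in readable:
            data = source.recv(bufsize)
            if not data:
                return                    # the sender closed its side
            peer[source].sendall(data)    # forward the bytes unchanged
```

Because the bytes are forwarded as they are, the same loop works whether they carry a TLS handshake or encrypted application data.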
Step 10 and 11
The browser sends an HTTP GET request to the server to download the content of the website. This request is sent through the TLS tunnel, so the data is encrypted by the client, and the proxy server just forwards the bytes to the webserver without knowing the content.
Step 11 and 12
The web server sends the content of the website to the client through the TLS tunnel as well.
All subsequent data exchange between the client and the server will be done through the TLS tunnel until one of the parties involved decides to close the tunnel.
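From a client program's point of view, the whole sequence can be delegated to a library. As a hedged example, Python's standard `http.client` module offers `set_tunnel`, which sends the CONNECT request to the proxy and then runs TLS through the resulting tunnel. The proxy host and port below are placeholders, and the helper name is hypothetical.

```python
import http.client

# Hypothetical proxy address, used only for illustration.
PROXY_HOST = "proxy.example.internal"
PROXY_PORT = 3128


def open_tunnelled_connection(target_host: str, target_port: int = 443) -> http.client.HTTPSConnection:
    """Prepare an HTTPS connection to target_host that goes through the proxy."""
    # The connection object points at the proxy...
    conn = http.client.HTTPSConnection(PROXY_HOST, PROXY_PORT, timeout=10)
    # ...and is told to ask the proxy for a CONNECT tunnel to the real target.
    conn.set_tunnel(target_host, target_port)
    return conn


# Example use (needs a reachable proxy, so it is left commented out):
# conn = open_tunnelled_connection("siakaserver.com")
# conn.request("GET", "/")
# print(conn.getresponse().status)
# conn.close()
```

Nothing is sent over the network until the first request is made; at that point the library connects to the proxy, issues the CONNECT request, and performs the TLS handshake with the target server.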
What about TLS interception?
Some HTTP proxies are able to decrypt the TLS connection, which lets them view the content of the data in transit. This procedure is called TLS interception and requires that the client accepts the proxy certificate. This topic is more complex and requires an article of its own.
I will explain the concept of TLS interception in another article. I hope you enjoyed this one!