What is CGI?

by Matthew Flood
January 14, 2007

This page defines CGI, and shows a complete example of a web-server invoking a CGI application.

Please send any comments or questions that would improve this page to matt @ rudeserver.com

The acronym CGI stands for Common Gateway Interface. Unfortunately, that revelation is pretty useless. Originally, CGI was thought of as a way to invoke a command-line program through a web-server instead of invoking it directly, hence the term Gateway. The web-server acts as the go-between for a web-client and a server-side application. Years later, the use of CGI applications has become transparent: web-browsers do not need to know that they are invoking a command-line program. The results web-clients get from a CGI application look the same as they would for any static page. So, the term Gateway is not helpful unless you use the old mindset. Nowadays, CGI is a fuzzy term that is basically synonymous with the term Dynamic.

You can get closer to the core meaning of CGI if you interpret it like this:

Common = An industry standard
Gateway = Web server
Interface = A way to send data to an executable program and use the output of that program as the content to be delivered

CGI is the industry's preferred way for a WebServer to make WebClient/Browser data available to an executable program
and use the output of that program as the content to be delivered back to the WebClient/Browser.

Or, in layman's terms:

CGI is when a web-server runs a local program to generate content,
instead of just reading a file and spitting it out byte for byte.

When a web server receives a request, it can send back one of three different things:

1. The contents of a file on the system, byte for byte

2. Content that it generates on the fly, such as a File Not Found response

3. The output of an executable program that the webserver invokes (This is always CGI)

How is a CGI application different than any other command line application?

First, a CGI application is supplied with information not normally available to ordinary CLI (Command Line Interface) applications. This information includes the IP address and port of the connecting client, cookies and other information supplied by the client to the web server, information about the web server itself, and information necessary for the CGI application to access any form data that was sent by the web-client. This information is supplied to the CGI application by the web server via environment variables.

Second, a CGI application is responsible for specifying the kind of content it is outputting. Kinds of content include HTML, plain text, JPEG images, PDF, etc. The CGI application specifies the content type by outputting a Content-Type response header before it sends any actual content. A valid Content-Type header would look like this: Content-Type: text/html

Everything else is the same. There are no special compiler flags that need to be set in order to turn an application into a CGI application. As such, CGI is simply an agreement or contract that says, "Hey developer, here is how you get the form data and information about the web client, and here is what I need you to include in your output."

Basically, as long as an application spits out a Content-Type header followed by a blank line,
it can be considered a CGI application.

CGI in action

The following steps represent the invocation of a CGI application, in the order they occur. The main players involved are a web browser and a web server.
Also involved is the executable program hello_world.exe (the source code of which is displayed later).

Each step is carried out by one of three participants: WEB BROWSER/CLIENT, WEB SERVER, or hello_world.exe.

The webserver at rudeserver.com is running, and is waiting for a connection on port 80

The user types in the following URL:

http://rudeserver.com/cgi-bin/hello_world.exe?color=red

The browser parses the URL and decides that it needs to connect to rudeserver.com on port 80

The webserver accepts the connection from the web browser, and waits for the web browser to issue an HTTP request

Having connected successfully, the web browser generates and sends the following HTTP request to the server:

GET /cgi-bin/hello_world.exe?color=red HTTP/1.0
Host: rudeserver.com

The webserver receives the request and parses it based on the HTTP specification. It examines the path that was supplied:

/cgi-bin/hello_world.exe?color=red

The webserver needs to convert the path into a meaningful local path. It ignores the question mark and everything after it. It knows that
/cgi-bin is represented locally by the path:

/var/www/cgi-bin

So, the resulting path to the resource is

/var/www/cgi-bin/hello_world.exe

The webserver also knows that anything located in /var/www/cgi-bin/ is supposed to be a CGI program that needs to be executed.

Before invoking hello_world.exe, the webserver sets the following environment variables:

REQUEST_METHOD = GET
QUERY_STRING = color=red

The webserver then invokes hello_world.exe

hello_world.exe is invoked.
It can access the environment variables REQUEST_METHOD and QUERY_STRING by using the getenv() function provided by the C standard library.
It knows the names of the environment variables because of the CGI specification. In this case, the program ignores the environment variables because it does not need them. The program simply sends the following string to standard output and exits successfully:

Content-Type: text/plain

hello world!

The webserver captures the output of the program. It makes sure that the program exits successfully, and ensures that the output it received includes a Content-Type HTTP response header. It then completes the HTTP response header, and sends it and the rest of the program's output to the web browser:

HTTP/1.1 200 OK
Date: Mon, 15 Jan 2007 06:02:34 GMT
Server: Apache/2.2.2 (Fedora)
Connection: close
Content-Type: text/plain

hello world!

The web browser receives the HTTP response and closes its connection. Based on the Content-Type header, the browser recognizes that the content of the response is a plain-text file. It displays the plain text content on the screen:

hello world!

The webserver logs the request, and continues processing other requests.

In C and C++, the source code for hello_world.exe would look something like this:

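#include <stdio.h>

int main(void)
{
    /* The Content-Type header tells the web server what kind of content follows. */
    printf("Content-Type: text/plain\n");
    /* A blank line separates the header from the content. */
    printf("\n");
    printf("hello world!");
    return 0;
}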