\
Donate profile for Rudi at Stack Overflow, Q&A for professional and enthusiast programmers

29/02/2015

6575 views

HTTP Basics


'http://www' typewritten out on old paper

It was recently brought to my attention that I really don't know the first thing about HTTP, what it means, how it works, or what the implications are for me as a developer. Having had a read around it seems to me that a lot of developers are in a similar situation. So I set out to learn and to write this, a missive on the points I found most pertinent. I like to think that I can get into a fair amount of depth whilst keeping it simple.


First things first, what does HTTP mean? HTTP is the acronym for Hypertext Transfer Protocol. Now interpreting those three words could seem fairly complicated on the face of things but it really does say everything we need to know. Hypertext Transfer Protocol is an official procedure (protocol) for transferring (sending & receiving) hypertext (which is just text with links to other text). Not so hard right? So, what's the procedure then?


Well, essentially it boils down to this. The goal is to allow a client (a browser, or whatever) to request text from a web server and have that server send the correct text back to the client. Alright, no problem; create a request with the salient details in it.


GET /index.html HTTP/1.1
Host: www.rudikershaw.com

And now we have an HTTP Request. It's just a first line which says what we want to do (i.e. GET the /index.html document), and a single header labelled Host which is the domain we want to get our document from. There are just three terms to remember in this syntax. GET is our method/verb and it is on the first line. There are other methods/verbs, including POST, OPTIONS, DELETE, and a few others. And Host is our header. Host is the only essential header in HTTP 1.1, but there are many others. Once the host server has received this HTTP request it sends us a response.


HTTP/1.1 200 OK
Date: Mon, 29 Feb 2015 20:09:12 GMT
Content-Length: 77

<html>
<head>
    <title>Title</title>
</head>
<body>
    ....
</body>
<html>

The above is an incredibly simple HTTP Response. It has a first line which, aside from 'HTTP/1.1', has the status code 200 followed by OK. The status codes are essentially the web server's way of telling you how your request went. 200 shows us, surprise surprise, that everything went OK. Ever search for a page and get the error 404 (page not found)? Well, that's an HTTP Response status code. Below the first line we have two headers. Technically speaking there are no essential headers for an HTTP response but those two at least are generally expected. After two new lines we have the document we requested in the body of the HTTP Response, in this case our index.html.


Now that's all well and good but why do we, as developers, need to know this? The part that affects us most is the HTTP Request's verb. If you're familiar with submitting HTML form data to another page (for example a PHP page) then you will likely recognise two of our verbs. GET and POST. That's because the form submits this data to the next page using an HTTP Request with one of those methods. So, what's the difference?


<form action="page.php" method="GET">
<form action="page.php" method="POST">

Okay. Let's first start with our GET method and imagine that we have the following basic HTML form.


<form action="page.php" method="GET">
    <input type="text" name="data" />
    <input type="submit" value="Submit" />
<form/>

When we hit the submit button the value in our data text input is sent to the web server that we are requesting the page.php document from. It's sent in our HTTP Request using the GET method. But the GET method is for GETting stuff, so how does it send data? Well it already sends some data, the location of our requested document. So that's where the data goes. In the URL.


GET /page.php?data=yourdata HTTP/1.1
Host: www.rudikershaw.com

This means that the data we're sending is plainly visible in the URL, in the address bar of our browser. It's also stored in your history as well as your browser cache and probably the logs of the web server you sent the request to. And because it's in the URL the data can be bookmarked and revisited. Also, because Internet Explorer and a few web servers put restrictions on the size of URLs (no more than 2000 characters) there is an upper limit to how much data you can transfer. It can also be said that GET is often marginally faster because in certain situations two separate requests need to be sent when using POST. So, if you're making trivial transfers of data between one page and another, and you want your user to be able to navigate backwards and forwards between those pages without having to resubmit data, or bookmark the resulting page then GET is for you. A word of caution though, don't use GET to transfer any sensitive information because it is logged everywhere and presents a real security risk.


Now let's submit the same HTML form but change the method to an HTTP POST Request instead.


<form action="page.php" method="POST">
    <input type="text" name="data" />
    <input type="submit" value="Submit" />
<form/>

This time we submit our data to the web server using the POST method, which is specifically for sending data to your recipient. So what's the difference? Well, this time our data is not transferred in the URL. This time our data has a designated space in the body of the HTTP Request, two new lines below the last header.


POST /page.php HTTP/1.1
Host: www.rudikershaw.com

data=yourdata

So what's so great about this? Well your data doesn't get logged anywhere. Not in the browsers history, cache or bookmarks, and not on any web server. Plus, it doesn't show up in the URL. As such transferring sensitive data like this isn't a problem. It also doesn't suffer from the character restriction on the URL, so you can send as much data as you like each time. There are two caveats, the first of which is that under certain circumstances HTTP POST Requests are slightly slower. The second caveat is that many web servers will not receive POST requests that were referred from outside of their domain from a browser side programming language, at least not without a fight. This restriction is called the same-origin policy. It will work fine with submitting an HTML form between domains like in the example above, but sending the request with JavaScript will take some more work.


To finish up; in order for this all to sink in I would advise downloading Fiddler4 in order to take a look at all the HTTP Requests being made as you browse around the web. This will give you an indication of when they are made and the types and varieties of headers and methods that are used. Thanks for reading. If you have any criticism, corrections, objections, or (Gods forbid) praise just drop me a comment under the article.


Rudi Kershaw

Web & Software Developer, Science Geek, and Research Enthusiast