In this assignment, you are going to
develop a simple HTTP client. This client repeatedly reads URLs from a
file and retrieves the web pages corresponding to the URLs. Each retrieved
web page is stored locally in a separate file. The file name of a web
page is the MD5 hash of the URL. The file should contain just the HTML
content of the web page (i.e., it should not contain the HTTP headers).
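As an illustration of keeping only the HTML content, here is a minimal Python sketch that splits a raw HTTP/1.x response into status code, headers, and body. The function name split_http_response is hypothetical, and the sketch assumes the whole response has already been read into memory and that the body is not sent with chunked transfer encoding (which would need extra decoding).

```python
def split_http_response(raw):
    """Split a raw HTTP/1.x response (bytes) into (status_code, headers, body)."""
    # Headers end at the first blank line, i.e. CRLF CRLF.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Only `body` would be written to the local file.
    return status_code, headers, body
```

The returned status code is also the value that the log file asks for.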
The input file for your program will be in the following format: The first
line indicates the number of URLs in the file. Each subsequent line
contains one URL. You can make the following assumptions:
1. The file contains exactly the number of URLs stated in the first line.
2. All the URLs are for static web pages.
3. Cookies are not employed.
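The input format described above could be read along these lines. This is only a sketch in Python; the names parse_url_list and read_url_list are hypothetical, and it relies on assumption 1 (the count on the first line matches the number of URLs that follow).

```python
def parse_url_list(text):
    """Return the URLs from input text whose first line is the URL count."""
    lines = text.splitlines()
    count = int(lines[0].strip())
    # Take exactly `count` lines after the first one, one URL per line.
    return [line.strip() for line in lines[1:1 + count]]


def read_url_list(path):
    """Read the input file at `path` and return its list of URLs."""
    with open(path, encoding="utf-8") as f:
        return parse_url_list(f.read())
```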
Your client should parse the HTML content and retrieve any embedded
images (no need to support other kinds of embedded content). Each
image needs to be stored in a separate file as well.
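One possible way to find embedded images is to collect the src attribute of every img tag and resolve it against the URL of the page. The sketch below uses Python's standard html.parser and urllib.parse modules; the class name ImageCollector and the function find_image_urls are hypothetical, and the sketch assumes the page does not use a base tag to change how relative URLs are resolved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute URLs of all <img src=...> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the URL of the page.
            self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html_text, base_url):
    collector = ImageCollector(base_url)
    collector.feed(html_text)
    return collector.image_urls
```

Each URL returned this way can then be fetched and stored like a web page, with the image bytes written to its own file.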
You should also instrument the code to measure the time required for
retrieving each web page. Be careful to measure only the time required
to retrieve the page (i.e., the measured time should not include
anything else). Your program should create a log file that contains the
following information (separated by whitespace) for each URL in the
input file:
a. URL
b. Success/failure in retrieving the web page.
c. The code returned by the server indicating success/failure.
d. In case of successful retrieval, the local file name where the file is stored.
e. In case of successful retrieval, the time taken for retrieval.
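The timing and log requirements above might be sketched as follows in Python. The names timed_call and format_log_line, the words "success" and "failure", and the six-decimal seconds format are all assumptions made for illustration; the assignment does not fix them. The timer wraps only the retrieval call, so parsing, hashing, and file writing stay outside the measured interval.

```python
import time


def timed_call(fetch, url):
    """Call fetch(url) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fetch(url)  # only the retrieval itself is timed
    elapsed = time.perf_counter() - start
    return result, elapsed


def format_log_line(url, status_code, filename=None, elapsed=None):
    """Build one whitespace-separated log line for a URL."""
    ok = filename is not None
    fields = [url, "success" if ok else "failure", str(status_code)]
    if ok:
        # File name and time are logged only for successful retrievals.
        fields += [filename, f"{elapsed:.6f}"]
    return " ".join(fields)
```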
The assignment is due on 02/21/2011. The assignment is to be done individually.
This web page from UMBC lists MD5 implementations in various languages. You will find the C implementation here.
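For comparison with the implementations mentioned above, deriving a file name from a URL takes only a few lines with Python's standard hashlib module. This is an illustrative sketch, not the referenced C implementation; the function name local_file_name is hypothetical, and using the 32-character hexadecimal digest of the UTF-8 encoded URL as the file name is an assumption.

```python
import hashlib


def local_file_name(url):
    """Return the MD5 hash of `url` as a 32-character hex string."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()
```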
Submission Instructions
To submit your assignment, you should use the submit command on odin.
The syntax is: submit <directory name> csx780
Here, <directory name> is the name of the directory containing your code. Your first project should be in a directory named
PA1_<your last name followed by your first initial>. When you run this command, you MUST be in the parent directory of the PA1 directory.