Project 2: HTTP download using "Range"
Type of Project: Individual only
Deadline: 2012-02-16, 11:59pm
Language: Java
Points: Max 20 points
Submission Guidelines: Submit through nike.cs.uga.edu, as usual. Name the project directory "LastName_FirstName-http_downloader"
(e.g., "Perdisci_Roberto-http_downloader"). Submit ONLY the source code.
Name the source code file containing the main as "HttpDownloader.java"
(use inner classes, if you need more than one class to develop your
program). Copy "HttpDownloader.java" into the directory "LastName_FirstName-http_downloader", then run:
$ submit LastName_FirstName-http_downloader cs4760
NOTE: project submissions that do not follow these guidelines risk being discarded without consideration (i.e., 0 points).
Project Description:
In this project, you are required to write a program that takes as
input (on the command line) the URL of an object to be downloaded and
the number of connections through which different parts of the object
will be retrieved using the HTTP "Range" header. The downloaded parts
must then be stitched back together to recreate the original file.
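As a starting point, the two command-line arguments can be turned into the pieces needed later (host, port, request path, output file name, number of connections). The sketch below is only one possible approach: the class name DownloadTarget and its fields are hypothetical, and it assumes a plain "http://" URL whose last path segment is the desired output file name.

```java
import java.net.URI;

// Hypothetical helper (not part of the assignment): holds the parsed command-line input.
final class DownloadTarget {
    final String host;
    final int port;
    final String path;      // request target sent in the request line, including any query string
    final String fileName;  // last path segment, used as the output file name
    final int connections;

    private DownloadTarget(String host, int port, String path, String fileName, int connections) {
        this.host = host;
        this.port = port;
        this.path = path;
        this.fileName = fileName;
        this.connections = connections;
    }

    static DownloadTarget fromArgs(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: java HttpDownloader <url> <connections>");
        }
        URI uri = URI.create(args[0]);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host: " + args[0]);
        }
        // getPort() returns -1 when the URL has no explicit port; assume the HTTP default.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        // Derive the file name before appending the query string.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        int connections = Integer.parseInt(args[1]);
        if (connections < 1) {
            throw new IllegalArgumentException("number of connections must be at least 1");
        }
        return new DownloadTarget(uri.getHost(), port, path, fileName, connections);
    }
}
```

Calling DownloadTarget.fromArgs(args) at the top of main would fail early on malformed input, before any thread or socket is created.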
For example (notice that the line below is an actual example of how your program must be launched):
$ java HttpDownloader http://www.cs.uga.edu/images/template/cs_template_r2_c2.gif 5
will spawn 5 threads. Each thread will open one TCP connection to
www.cs.uga.edu on port 80 and retrieve one part of the .gif file. The
5 parts must be of approximately equal length. Finally, the program
will put the parts together and write the output into a file called
"cs_template_r2_c2.gif". You should name the files containing the parts
of downloaded content "part_i", where i is an index. In the example
above, the program will output the parts into 5 different files called
"part_1", "part_2", ..., "part_5", along with the reconstructed
"cs_template_r2_c2.gif" file. DO NOT delete the "part_i" files after
you are done recomposing the original file.
Save all downloaded files into the same directory from which the program is launched (do not create any subdirs).
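The recomposition step described above amounts to concatenating "part_1" through "part_n" in order. A minimal sketch follows, assuming every part file already holds only body bytes (no HTTP headers); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: concatenates part_1 .. part_n into the output file.
final class PartStitcher {
    static void stitch(Path dir, int parts, String outputName) throws IOException {
        try (OutputStream out = Files.newOutputStream(dir.resolve(outputName))) {
            for (int i = 1; i <= parts; i++) {
                // Appends the bytes of part_i to the output; the part file itself is left in place.
                Files.copy(dir.resolve("part_" + i), out);
            }
        }
    }
}
```

For the example above, a call such as PartStitcher.stitch(Paths.get("."), 5, "cs_template_r2_c2.gif") would write the result into the directory the program was launched from.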
To make sure your software downloads and correctly reassembles objects from the web, you can use md5sum to compare your result with the original file downloaded using a browser (or a tool such as wget or curl).
NOTE:
You don't need to worry about handling Server "errors" (e.g.,
redirections, unavailable Range option, etc.). I will only test your
software on objects retrieved from websites that support the Range
option, and for which no special error handling is required. Of course,
make sure to use HTTP/1.1 and that your HTTP requests are correctly
formatted; otherwise they will fail even the simplest tests.
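For reference, a correctly formatted HTTP/1.1 request needs a request line, a mandatory "Host" header, CRLF ("\r\n") line endings, and an empty line that terminates the header section. The sketch below builds such a request for one byte range. The class and method names are hypothetical, and the "Connection: close" header is an assumption made here so that the end of the response can be detected by the server closing the connection; it is not required by the assignment.

```java
// Hypothetical helper: builds the text of a GET request for bytes first..last (inclusive).
final class RangeRequest {
    static String get(String host, String path, long first, long last) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"                        // mandatory in HTTP/1.1
             + "Range: bytes=" + first + "-" + last + "\r\n"   // both offsets are inclusive
             + "Connection: close\r\n"                         // assumption: read until the server closes
             + "\r\n";                                         // empty line ends the headers
    }
}
```

The resulting string can be written to the socket as US-ASCII bytes. The server's reply begins with its own status line and headers, so only the bytes after the first empty line belong in the "part_i" file.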
HINT:
To divide the object to be retrieved into approximately equal parts,
you first need to retrieve the length of the object without retrieving
the object itself. One way to do this is using a HEAD request and
parsing the Content-Length field in the response. An alternative
(optional) way to do this is by using a GET request with a particular
value in the Range field (I will leave it to you to figure this out, if
you decide to use this second option).
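The HEAD-based approach can be sketched in two small steps: pull Content-Length out of the response headers, then cut the length into inclusive byte ranges. The class and method names below are hypothetical, the header text is assumed to have been read from the socket already, and the split shown (the first few parts get one extra byte) is just one way to satisfy "approximately equal".

```java
// Hypothetical helper: parses Content-Length and plans the byte ranges.
final class RangePlanner {
    /** Returns the Content-Length value found in raw response headers, or -1 if absent. */
    static long contentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            // Header field names are case-insensitive.
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Long.parseLong(line.substring(colon + 1).trim());
            }
        }
        return -1;
    }

    /** Returns ranges[i] = {firstByte, lastByte}, both inclusive, covering 0..length-1. */
    static long[][] split(long length, int parts) {
        if (parts < 1 || length < parts) {
            throw new IllegalArgumentException("need at least one byte per part");
        }
        long base = length / parts;
        long extra = length % parts;   // the first 'extra' parts are one byte longer
        long[][] ranges = new long[parts][2];
        long start = 0;
        for (int i = 0; i < parts; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges[i][0] = start;
            ranges[i][1] = start + size - 1;
            start += size;
        }
        return ranges;
    }
}
```

For instance, a 10-byte object split into 3 parts yields the ranges 0-3, 4-6, and 7-9, so part sizes differ by at most one byte.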
TESTING YOUR CODE
You can use the following URLs to test your code:
http://www.cs.uga.edu/~perdisci/CSCI4760-S12/Project2-TestFiles/topnav-sport2_r1_c1.gif
http://www.cs.uga.edu/~perdisci/CSCI4760-S12/Project2-TestFiles/Uga-VII.jpg
http://www.cs.uga.edu/~perdisci/CSCI4760-S12/Project2-TestFiles/story_hairydawg_UgaVII.jpg
Make sure that when you recompose the output from the different parts, the md5sums match the following ones:
04e1f00315854f382d00311c599a05f8 story_hairydawg_UgaVII.jpg
0592796fa4f06f806e5d979d7e016774 topnav-sport2_r1_c1.gif
9dc5407cc368aaaa33c6cc2c284ab5c4 Uga-VII.jpg
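If md5sum is not at hand, the same check can be done from Java with the standard MessageDigest class. This is an optional convenience sketch, not something the assignment requires; the class name Md5Check is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: prints digests in the same lowercase hex form that md5sum uses.
final class Md5Check {
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        // Reads the whole file into memory; fine for small test objects like the ones above.
        return md5Hex(Files.readAllBytes(file));
    }
}
```

Comparing Md5Check.md5Hex(Paths.get("Uga-VII.jpg")) against the value listed above would confirm that the recomposed file is byte-for-byte identical to the original.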
I suggest that you also test your code on other URLs of your own choosing,
and try to determine whether there are any websites that support the Range
option but cause problems for your code.
MAILING LIST: If you have questions about the projects, the best place to ask is the course mailing list.
PROJECT EVALUATION
I will run your program on 5 different inputs (URL and number of
requested parts). For each of the 5 runs, you will be assigned 4 points
if the md5sum of the final recomposed file matches the md5sum of the
original object. Therefore, you will get max points if all the md5s
match correctly.
Note that I will also verify that the "part_i" files exist and that their
sizes respect the criteria outlined above (parts of approximately equal
length); otherwise you may be penalized.