Http Multi-thread Downloading

Some web servers support multi-threaded downloading through the "Range" header. Range is a standard HTTP request header that asks the server to transfer only the part of the object specified in the header value (for example, "Range: bytes=0-499" for the first 500 bytes). Because the header is only a request, a server is free to ignore it and send the whole object instead.

Our implementation handles this situation as follows:

1. Check whether the web server supports the "Range" header by sending a test request containing a "Range: bytes=1-2" header.

2. Check the response code:

  A. If the response code is 206 (Partial Content), the web server supports the Range header and we can use multi-threaded download.

  B. If the response code is 200 (OK), the web server ignored the Range header and sent the whole object, so we have to download the content with a single thread.
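The check above can be sketched with Python's standard urllib module. This is a minimal sketch under stated assumptions, not quickget's code: build_range_probe and supports_range are hypothetical names, and the probe asks for bytes 1-2 to mirror the test request described above. It uses 206, the standard Partial Content status code.

```python
import urllib.request

# Status codes relevant to the range probe (HTTP semantics, RFC 9110).
HTTP_OK = 200
HTTP_PARTIAL_CONTENT = 206


def build_range_probe(url: str) -> urllib.request.Request:
    """Build a hypothetical probe request asking only for bytes 1-2 of the object."""
    return urllib.request.Request(url, headers={"Range": "bytes=1-2"}, method="GET")


def supports_range(status: int) -> bool:
    """Interpret the probe's status code.

    206 means the server honoured the Range header, so multi-threaded
    download is possible; 200 means it ignored the header and sent the
    whole body, so we must fall back to a single thread.
    """
    if status == HTTP_PARTIAL_CONTENT:
        return True
    if status == HTTP_OK:
        return False
    raise ValueError(f"unexpected status for range probe: {status}")
```

A real probe would be sent with urllib.request.urlopen(build_range_probe(url)) and judged with supports_range(resp.status). Note that urlopen raises HTTPError for error statuses such as 416 (Range Not Satisfiable), which a very small object can trigger.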

Difficulties in implementing Http download

HTTP is a very complex protocol, and in implementing quickget we face the following difficulties:

1. HTTP redirection (forwarding)

  Sometimes the object we want to get has been moved to another location. In that case the server returns a redirect response (a 3xx status code) whose "Location" header gives the object's current location. Our implementation must handle this and automatically download the object from the new location.
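Redirect handling can be sketched as a loop that follows the "Location" header until a non-redirect response arrives. This is a hypothetical sketch: resolve_final_url and the injected fetch callable (returning a status code and a header dict) are illustrative names, not quickget's API, and the set of redirect codes is an assumption based on the standard 3xx codes that carry a Location header.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

# Redirect status codes whose Location header points at the object's new place.
REDIRECT_CODES = {301, 302, 303, 307, 308}

# Hypothetical hook: fetch(url) -> (status_code, response_headers).
Fetch = Callable[[str], Tuple[int, Dict[str, str]]]


def resolve_final_url(url: str, fetch: Fetch, max_hops: int = 10) -> str:
    """Follow redirects until a non-redirect response, returning the final URL."""
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status not in REDIRECT_CODES:
            return url
        location = headers.get("Location")
        if not location:
            raise ValueError(f"redirect {status} without a Location header")
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"too many redirects (more than {max_hops})")
```

In practice, Python's urllib follows these redirects automatically through its default HTTPRedirectHandler, and the final address is available from the response's geturl() method; a manual loop like this matters only when the client is configured not to follow redirects.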

2. Content-length header

  Content-Length is a response header that specifies the length of the requested object in bytes. In our implementation it is the basis for multi-threaded downloading: we use the content length to build the internal map that tracks download progress. Unfortunately, some web servers and applications do not send the Content-Length header; in that case we cannot learn the object's length and cannot use multi-threaded downloading to speed up the transfer.
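One way to turn the content length into a progress map is to split the object into contiguous byte ranges, one per thread, and record how many bytes of each range have arrived. The sketch below assumes that layout; split_ranges, new_progress_map and is_complete are hypothetical names, and quickget's internal map may be structured differently.

```python
from typing import Dict, List, Tuple

# Inclusive byte range (start, end), matching the Range header convention.
ByteRange = Tuple[int, int]


def split_ranges(content_length: int, threads: int) -> List[ByteRange]:
    """Split [0, content_length) into at most `threads` inclusive byte ranges."""
    if content_length <= 0:
        raise ValueError("content length must be positive")
    threads = max(1, min(threads, content_length))
    base, extra = divmod(content_length, threads)
    ranges = []
    start = 0
    for i in range(threads):
        # The first `extra` ranges get one extra byte, so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def new_progress_map(ranges: List[ByteRange]) -> Dict[ByteRange, int]:
    """Map each range to the number of bytes downloaded so far (initially 0)."""
    return {r: 0 for r in ranges}


def is_complete(progress: Dict[ByteRange, int]) -> bool:
    """True once every range has received all of its bytes."""
    return all(done == end - start + 1 for (start, end), done in progress.items())
```

For example, split_ranges(10, 3) yields [(0, 3), (4, 6), (7, 9)].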

3. Range http request header

  The Range request header is the other key ingredient that makes multi-threaded downloading possible. As described in the previous section, we use the test request and its response code to decide which download method to use.
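Each download thread asks for its own range with a header such as "Range: bytes=0-99", and a server that honours it answers 206 with a "Content-Range" header such as "bytes 0-99/1000". The sketch below formats the request header and checks that a response covers exactly the requested range; the function names are hypothetical, and checking Content-Range is an added safety measure rather than something the text above specifies.

```python
import re
from typing import Optional, Tuple

# Matches a Content-Range value such as "bytes 0-99/1000" (total may be "*").
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as a Range header value."""
    return f"bytes={start}-{end}"


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse "bytes start-end/total" into (start, end, total or None)."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed Content-Range: {value!r}")
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def check_chunk_response(status: int, content_range: Optional[str],
                         start: int, end: int) -> bool:
    """True if the response is a 206 covering exactly [start, end]."""
    if status != 206 or content_range is None:
        return False
    got_start, got_end, _ = parse_content_range(content_range)
    return (got_start, got_end) == (start, end)
```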

4. Head method and Get method

  HEAD and GET are two basic methods supported by web servers. A HEAD request returns only the response headers for an object, with no body, so it is the best way to query the Content-Length and to test support for the Range header. Unfortunately, some web servers do not support the HEAD method, so we use HEAD when possible and fall back to GET when HEAD is not supported.

All in all, HTTP support is fairly complex. Fortunately, the HTTP client library shields us from the low-level details of the protocol.
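The HEAD-then-GET fallback can be sketched as follows. This is a hypothetical sketch: query_object, content_length and the injected fetch(method, url) callable are illustrative names, and treating only 405 (Method Not Allowed) and 501 (Not Implemented) as "HEAD not supported" is an assumption; some servers signal it differently.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical hook: fetch(method, url) -> (status_code, response_headers).
Fetch = Callable[[str, str], Tuple[int, Dict[str, str]]]

# Status codes a server typically uses to reject an unsupported method.
METHOD_REJECTED = {405, 501}


def query_object(url: str, fetch: Fetch) -> Tuple[str, int, Dict[str, str]]:
    """Query object metadata with HEAD, falling back to GET if HEAD is rejected.

    Returns (method_used, status, headers).
    """
    status, headers = fetch("HEAD", url)
    if status in METHOD_REJECTED:
        # With GET the server starts sending the body; a real client would
        # read only the headers and then close the connection.
        status, headers = fetch("GET", url)
        return "GET", status, headers
    return "HEAD", status, headers


def content_length(headers: Dict[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if it is absent or invalid."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        length = int(value)
    except ValueError:
        return None
    return length if length >= 0 else None
```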

Pseudocode for the current implementation

Because HTTP is so complex, we provide three methods for downloading an object:

a. Multi-threaded download: requires two preconditions: 1. the response includes a "Content-Length" header; 2. the server supports the "Range" request header.

b. Single-threaded download: when the server does not support the "Range" header, we cannot fetch partial content for each thread, so we can only download with a single thread.

c. Undetermined-length download: when we cannot get the exact file length from the "Content-Length" response header, we can only download the object in this mode, and reaching EOF marks the end of the download.
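The three methods differ mainly in how the body is read. The sketch below assumes a hypothetical fetch_range(start, end) hook that stands in for a GET request carrying "Range: bytes=start-end": download_multi runs one worker per range and writes each chunk into its place in a shared buffer (method a), while copy_until_eof reads one stream sequentially until EOF (method c), optionally checking a known length at the end (method b). Buffering the whole object in memory is a simplification; a real downloader would write to a file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, List, Optional, Tuple

# Hypothetical hook returning the bytes of an inclusive range.
FetchRange = Callable[[int, int], bytes]


def download_multi(length: int, ranges: List[Tuple[int, int]],
                   fetch_range: FetchRange, workers: int = 4) -> bytes:
    """Method (a): fetch each range in a worker thread and assemble the object."""
    buffer = bytearray(length)

    def worker(rng: Tuple[int, int]) -> None:
        start, end = rng
        data = fetch_range(start, end)
        if len(data) != end - start + 1:
            raise IOError(f"short chunk for range {start}-{end}")
        # Ranges are disjoint and the assignment never resizes the buffer.
        buffer[start:end + 1] = data

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(worker, ranges))
    return bytes(buffer)


def copy_until_eof(src: BinaryIO, dst: BinaryIO,
                   expected_length: Optional[int] = None,
                   chunk_size: int = 8192) -> int:
    """Methods (b) and (c): copy one stream sequentially until EOF."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals EOF
            break
        dst.write(chunk)
        total += len(chunk)
    if expected_length is not None and total != expected_length:
        raise IOError(f"expected {expected_length} bytes, got {total}")
    return total
```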

In the current implementation, we select the method as follows:

1. Send a HEAD (or GET) request to the object's URL.

2. Check whether the response contains a "Content-Length" header.

  2.a) If it does, send another request containing "Range: bytes=1-2" to test whether the server supports the Range header.

    2.a.1) If the response code is 206 (Partial Content), select multi-threaded download.

    2.a.2) If the response code is 200 (OK), select single-threaded download.

  2.b) If it does not contain a "Content-Length" header, we can only use the undetermined-length download method.
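The selection steps can be sketched as a small decision function. The names DownloadMode and select_mode are hypothetical, and probe_range is an assumed callable that sends the range test request and returns its status code; it is only called when a usable Content-Length is present, matching the order of the steps. Treating any status other than 206 as "single-threaded" is a deliberate safe fallback added in this sketch.

```python
from enum import Enum
from typing import Callable, Dict


class DownloadMode(Enum):
    MULTI_THREADED = "multi-threaded"
    SINGLE_THREADED = "single-threaded"
    UNDETERMINED = "undetermined-length"


def has_content_length(headers: Dict[str, str]) -> bool:
    """True if the headers carry a valid, non-negative Content-Length."""
    value = headers.get("Content-Length")
    try:
        return value is not None and int(value) >= 0
    except ValueError:
        return False


def select_mode(headers: Dict[str, str], probe_range: Callable[[], int]) -> DownloadMode:
    """Choose a download mode from the step-1 headers and the range probe."""
    if not has_content_length(headers):
        return DownloadMode.UNDETERMINED  # step 2.b: read until EOF
    status = probe_range()  # step 2.a: test Range support
    if status == 206:
        return DownloadMode.MULTI_THREADED  # step 2.a.1
    return DownloadMode.SINGLE_THREADED  # step 2.a.2 (200 or anything unexpected)
```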

In our implementation, the prefix "FixLengthHttp***" identifies classes used for multi-threaded and single-threaded download, and the prefix "VarLengthHttp***" identifies classes used for undetermined-length download.