Learn how to download the contents of a URL using Java's URL class
Introduction
In some Java applications you'll want to download the contents of a URL across a network. As an example, we've written two applications that do this regularly.
The first application is a customized Java web robot. This robot downloads the contents of certain URLs every day, and creates an HTML page of all the anchor tags it finds on those URLs. We use this Java program to get all of the headlines we want delivered to our doorsteps any time we want - with no ads and no waiting - because the robot has already done the work for us.
The second program is a Java application we call ServerStress. This program uses Java's URL class to download the contents of a list of URLs we've created, pulling the entire list from a web server as fast as it can.
The purpose of this Java application, as you might guess from its name, is to stress-test the web server. It's a great way of throwing a mind-numbing number of client requests against a web server in a short time, and measuring the response of the server.
In this article we'll take a look at the procedure necessary to download the contents of a URL in a Java application. In a future article we'll discuss our Java ServerStress program, so you can see how this method is used in a real-world application.
Let's go straight to the code
There are approximately five steps required to download and print the contents of a URL. I say "approximately", because (a) it all depends on what you consider a step, and (b) it depends on how you handle the possible exceptions that can occur.
The code in Listing 1 shows the entire DnldURL.java program. When I look back at this program, I realize that the process of dealing with the possible exceptions that can occur is more time-consuming than creating the networking aspect of the code. One thing I've learned with Java - the networking aspect of the code is pretty easy.
//------------------------------------------------------------//
//  DnldURL.java: download and print the contents of a URL.   //
//------------------------------------------------------------//
import java.io.*;
import java.net.*;

public class DnldURL {
   public static void main (String[] args) {
      // Declare the objects we'll need.
      URL u;
      InputStream is = null;
      DataInputStream dis;
      String s;
      try {
         // Create the URL (put your own URL here).
         u = new URL("http://200.210.220.1:8080/index.html");
         // Open an input stream from the URL.
         is = u.openStream();         // throws an IOException
         // Buffer the stream for speed; DataInputStream supplies readLine().
         // Note: DataInputStream.readLine() is deprecated, but still compiles.
         dis = new DataInputStream(new BufferedInputStream(is));
         // Read each line of the input stream and print it.
         while ((s = dis.readLine()) != null) {
            System.out.println(s);
         }
      } catch (MalformedURLException mue) {
         System.out.println("Ouch - a MalformedURLException happened.");
      } catch (IOException ioe) {
         System.out.println("Oops- an IOException happened.");
      } finally {
         // Close the InputStream, if it was ever opened.
         try {
            if (is != null) is.close();
         } catch (IOException ioe) {
            // nothing more we can do if the close fails
         }
      } // end of 'finally' clause
   } // end of main
} // end of class definition
Listing 1: The DnldURL.java program shows how easy it is to open an input stream to a URL, and then read the contents of the URL.
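Listing 1 relies on DataInputStream.readLine(), which has been deprecated since JDK 1.1 because it does not convert bytes to characters properly. As a rough sketch of how the same steps might look with a BufferedReader and try-with-resources, consider the code below. The class name DnldUrlReader, the helper readLines(), and the choice of UTF-8 are assumptions made for this illustration; they are not part of the original program. (Recent JDKs also flag the URL constructor as deprecated, but it still compiles.)

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DnldUrlReader {

    // Reads every line from the stream, decoding the bytes as UTF-8.
    // UTF-8 is an assumption; a real page may declare another charset.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the stream) for us.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Same placeholder address as Listing 1; substitute your own URL.
        URL u = new URL("http://200.210.220.1:8080/index.html");
        for (String line : readLines(u.openStream())) {
            System.out.println(line);
        }
    }
}
```

Keeping the reading logic in its own method means it can be exercised against any InputStream, not just one that comes from a network connection.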
Discussion
Because this Java source code is well documented, I won't add much to the description. First, I'll point out the obvious - you'll want to put your own URL in place of the "http://200.210.220.1:8080/index.html" in this code. This is just a TCP/IP address we use on our internal LAN during testing.
Of course, instead of hard-wiring the URL into the Java code, it would be even better to read the URL as a command-line argument, creating the URL object "u" with a statement like this:

   u = new URL(args[0]);
Another item to point out is that we typically generated a MalformedURLException whenever we botched the actual URL, doing things like mistyping "http" as "htp", which leaves Java with a protocol it does not recognize. On the other hand, we generated an IOException whenever we typed the URL syntax properly, but mistyped a filename.
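To see the two failure modes side by side, here is a small sketch that separates URL construction from opening the stream. The class name UrlErrorDemo and the helper tryOpen() are hypothetical names invented for this example. One detail worth knowing: MalformedURLException is a subclass of IOException, which is why Listing 1 has to catch it first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

public class UrlErrorDemo {

    // Returns a short label describing what happened when the URL was opened.
    static String tryOpen(String spec) {
        URL u;
        try {
            // The URL syntax and protocol are checked here, before any I/O.
            u = new URL(spec);
        } catch (MalformedURLException mue) {
            return "malformed: " + mue.getMessage();
        }
        // The resource itself is only contacted when the stream is opened.
        try (InputStream in = u.openStream()) {
            return "ok";
        } catch (IOException ioe) {
            return "io error: " + ioe.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java UrlErrorDemo <url>");
            return;
        }
        System.out.println(tryOpen(args[0]));
    }
}
```

Running it with a URL that starts with "htp://" should report a malformed URL, while a well-formed URL that points at a missing file should report an I/O error.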
Compiling and running the program
To compile the program (after you've downloaded it), just type this command:

   javac DnldURL.java

Then run the compiled class with this command:

   java DnldURL
If the program runs properly, the HTML code from the URL you've targeted will be printed to your screen. If you'd like to save the output of the program (i.e., the contents of the URL) to a file, simply redirect the output of the command like this (for DOS and UNIX systems; the file name output.html is just an example):

   java DnldURL > output.html
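As an alternative to shell redirection, the contents of a URL can also be written to a file from inside Java. The sketch below is one possible approach using Files.copy(); the class name DnldUrlToFile, the helper save(), and the two command-line arguments are assumptions for this example, not part of the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DnldUrlToFile {

    // Copies the bytes behind the URL into the target file, replacing any
    // existing file, and returns the number of bytes written.
    static long save(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DnldUrlToFile <url> <output-file>");
            return;
        }
        long bytes = save(new URL(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + bytes + " bytes to " + args[1]);
    }
}
```

Because the bytes are copied without being decoded, this version also works for content that is not text, such as images.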
Download our Java source code
We hope you enjoyed this article. Creating network code with Java is one of our favorite topics. If you'd like to download the Java source code shown in Listing 1, just follow these steps:
Click here to download the DnldURL.java source code to your computer. After the source code appears in your browser, simply save the code to your local filesystem by selecting the File | Save As... option of your browser.
Copyright 1998-2008 DevDaily Interactive, Inc.
All Rights Reserved.