Full review of PDF link checkers


The problems

Checking links in a website is very important. Serif X2 and X4 have a built-in link checker which is very helpful. There are many free websites that will check a site. But they all suffer from one major problem: they mainly check html pages. Very few of them will check other file types such as Word Doc files and, crucially, pdf files. Yet pdf files are the backbone of any academic site and many business sites.


I thought this could be a job for a webspider program, or a webwhacker, such as Internet Researcher (www.zylox.com - which may have gone out of business) or Teleport Pro (www.tenmax.com/teleport/pro/home.htm). These programs give you the choice of selectively downloading part or all of a site, or ‘spidering’ a site to explore the structure and links. Once again, when I checked, even Teleport Pro, which is probably one of the more sophisticated and well maintained products, does not check the pdf files. Now, I may be wrong - there are over 50 such products out there, and possibly one of them does the job, and may even check offline files and sites, but I needed a specific tool for the job.


In addition, there is a need for a program which will check the links in a pdf file which is stored on a computer before it is put online, or check the links of a downloaded file. Ideally the link checking should cover both the internal links within a pdf file and the external links.


In short, we need a program which will check the internal and external links of pdf files, and do so for online or offline files. After over ten hours of hunting on the internet I finally found one program which will do this: Web Link Validator (www.relsoftware.com/wlv/). I will discuss it in more detail later, but first, the competition.


Link Tiger (www.linktiger.com)

The free version is no more. One small site like this used to be free. Now they want 32.45 Euros for a maximum of 250 links which are checked once a week for a year. As an idea, a small site like this has around 200 links. It is time that the site hosters provided this service for free! But Linktiger is an easy service to use, and it will send regular email reports as to what it has found.


WebCeo free version

There is a free version of WebCeo. This has recently disappeared from their website (www.websiteceo.com) but it can still be found on various sites such as cnet. The version you want is 8.1, for which I found a direct download link in February 2011.


But the free version is no longer being maintained and the professional version is a whopping 500 dollars, so you can see what sort of market they are now aiming at. To understand the program you have to view it as a package of several modules, some of which have subscription extras. In the free version the site checking part, which does check the external links of pdf files, is likely to work for the immediate future, but already I have found that other parts such as the site links are deteriorating. The program no longer checks the page links from Google. This could well be because the program relies on ‘updates’ and the company are using these to inject code which disables the program.


Since the free version, including modules such as keyword optimisation, may still work for a while yet, it may be worth looking at.


Web Link Validator www.relsoftware.com/wlv/

This program really does the job. It will extremely rapidly go through a website or a file or a group of files and produce a very detailed report - too detailed to be shown here.


The trial edition is limited to 500 links, and you have a whole month to experiment. There are no other restrictions. This is generous and ample. Unfortunately the help section is very limited, which is a great shame because the program is excellent. The company makes up for it, though, with fast, courteous and helpful technical support. The cheapest version checks a maximum of 3000 links per attempt, which ought to be more than enough for most amateur sites. Presumably if you hit a big site you can analyse it page by page, or download the page, divide it, and analyse the parts separately.


1. The settings

One of the first things I do after installation of any program is to check that it does not load in startup (it does not) and to use Startup Inspector if necessary to block this.

The first thing to appreciate about Web Link Validator is that it has two modes: Site Mode and Links Mode. You see this by looking at the top bar on the left. They are labelled “Website Verification” and “Link List”. What is confusing is that the button on the right, which you use to change between these modes, carries the name of the opposite mode!






Next, check the settings; you need to do this because of the defaults. View > Options (F11). The interface languages possible are English, German, Spanish, Italian, French, Swedish, and Dutch. It is possible to spellcheck in many languages. More dictionaries can be downloaded from http://www.addictivesoftware.com/addict3/dicts-extern.htm

Unzip the dictionary to the Web Link Validator's Dictionaries folder (e.g. C:/Program Files/Web Link Validator/Dictionaries/) then restart the program.




The bookmarks option requires you to specify the paths for Firefox and Opera. It seems to find the bookmarks for Internet Explorer without help. The program will check these bookmarks, which is a useful extra I was not expecting.



The next step is to begin a new profile: File > New Profile.




The first setting to study and modify is Script Analysis.


Make sure that ALL the filetypes you are interested in are listed. Now, since you can save profiles under different names this provides great flexibility if you want to focus in on types of error or types of files.


Under verify links, make sure that external files is checked. The directory index allows you to stop indexing and site map files from being checked since this merely duplicates information in the site. The limits option is very important. These are the defaults:



As you can see, non-html files, by default, are limited to a size of only 500k, and this should be increased to something more reasonable. The maximum number of links in practice is set by the program licence - 3000 for the cheapest version. The page optimisation has the settings for some value judgements. Here are the defaults:



Orphan analysis compares files on the server with files on a computer. Obviously this requires you to specify servers, passwords etc, and to specify the location of your website on your computer. This all supposes, as I do, that you keep your own website as a directory tree on your computer. The orphans can be discovered on the server or on your hard drive. Orphan analysis does not run automatically; you have to ask for it when needed.


The report settings allow you to name a report and select the details you want to include. They are best played with after you have analysed a file or site.


At any time the settings in the existing profile can be accessed by clicking on the Profile Icon.


2. Analysing a website

So, I set to work to analyse this site just before I put this review and other changes on the site. In Website verification mode, File > New Profile > Add a URL .... > Start. In less than one minute it had checked this site and produced the following summary report.



Upon closing this summary there is a summary reports tree on the left which can be expanded. Click on any page listed on the right and more detail is provided underneath, under various tabs.



Click on Report and you will be provided with the full selection of features to include or exclude, and an export format choice of html or xls. You can also specify the file name and location.




  


The report is far too detailed to publish here. It ran to over 20 pages for this site! It found all the redirects, and for html pages detailed all the images in them with their sizes. You really need to try it out to see how detailed this reporting is. Of course, if you do not want the detail you only need to deselect from the selection boxes.



3. Analysing a file

To test this I had downloaded a rather old pdf file: Mike Palmquist, “The potential of plagiarism detection software”, http://tilt.colostate.edu/twt/palmquist_plagiarismsoftware.pdf


Select: File > New Profile > Add

Usually you type in a URL under Enter starting URL, but, notice the “browse” button on the right.


You can choose a file to analyse:

Notice how the right syntax of file:/// is automatically added. This gave the summary:


The details can be studied and a report made. The file was over 5 MB but the verification took only 5 seconds.
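To give a feel for what checking the external links of a local pdf file involves, here is a rough Python sketch. It is not how Web Link Validator works internally, which I do not know. The function names extract_uris and check_url are my own, and the approach is deliberately crude: it only finds links stored as plain "/URI (...)" entries, so it will miss links held in compressed object streams or written as hex strings, and it does not look at internal links at all.

```python
import re
import urllib.error
import urllib.request

# Matches "/URI (...)" entries; backslash-escaped characters are allowed inside the string.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def extract_uris(pdf_bytes):
    """Return link targets found as plain /URI entries, in order, without duplicates."""
    found = []
    for match in URI_PATTERN.finditer(pdf_bytes):
        # Undo simple escapes such as \( and \) inside the PDF string.
        raw = re.sub(rb"\\(.)", rb"\1", match.group(1))
        uri = raw.decode("latin-1")
        if uri not in found:
            found.append(uri)
    return found


def check_url(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a dead link
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, unsupported scheme, etc.
```

To use it, read the pdf with open(path, "rb").read(), pass the bytes to extract_uris, and call check_url on each result. Some servers reject HEAD requests, so a failure here is a prompt to look at the link by hand rather than proof that it is dead.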


4. Analysing a list of files

The program can process a list of files, saved either in html or txt format. However, you cannot use the above method to add a list of files. The way to do it is:

File > New Profile, but this time click on the unlabelled button to the left of the Add button.



From this choose “List of URLs”  and find the file you want and analyse it. This works much better than the recommended list file mode.


The problem is how to create a list of files. This is a two-step process.

1. Open a command line in the relevant directory.

Type dir /on /b >list.txt where list.txt is any filename of choice. This will create a list of files (/b gives a bare listing of names only, and /on sorts them by name).


2. Open the list in your favourite text editor. Wordpad will do. Delete the files you do not want to include. Then type in the extra file location and path information in front of the first file name. In my case it was

file:///k:/web/work/plagiarism/

Notice the direction of the slashes. Once you have done this for one file it is a simple matter of copy and paste to add the same prefix to the other file names.


There is an easier solution, if you have the program. WebCacheIlluminator (www.nstarsolutions.com/wci/index.html) will make an html index file of a folder which can be used.
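If you would rather script the two manual steps than edit the list by hand, a short Python sketch along the following lines should do it. The function names build_url_list and write_url_list are my own, and the choice to keep only pdf files by default is an assumption. Note that Python percent-encodes spaces and other special characters in the file names; I have not tested whether Web Link Validator accepts such names.

```python
from pathlib import Path


def build_url_list(folder, extensions=(".pdf",)):
    """Return file:/// URLs for the matching files in folder, sorted by name."""
    root = Path(folder).resolve()  # as_uri() needs an absolute path
    wanted = {ext.lower() for ext in extensions}
    files = [p for p in root.iterdir() if p.is_file() and p.suffix.lower() in wanted]
    files.sort(key=lambda p: p.name.lower())  # name order, like dir /on
    return [p.as_uri() for p in files]  # as_uri() uses forward slashes


def write_url_list(folder, list_file, extensions=(".pdf",)):
    """Write one URL per line to list_file and return how many were written."""
    urls = build_url_list(folder, extensions)
    Path(list_file).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)
```

For example, write_url_list("k:/web/work/plagiarism", "list.txt") should produce a list.txt with one file:/// entry per pdf file in that folder, which can then be loaded through “List of URLs” as described above.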


Conclusions

Web Link Validator is extremely useful and informative. It is a pity that the help files and the information on the website are incredibly brief. I am also surprised how difficult it was to find this program and I hope the company can make progress in publicising it. I also think the price is steep for the amateur and I hope they will produce a cheaper version, for instance 1000 links for 50 dollars. But the core problem is that there simply is no competitor. Web Link Validator is the ONLY program I can find which will check a wide range of files including pdf, either onsite or saved on your computer.