LinkScan for Windows. Reference Manual
![]()
Note: This Reference Manual is divided into multiple documents for ease and speed of navigation. However, the contents are also available as a single document suitable for searching and/or printing as the Single Document LinkScan Reference Manual.
![]()
LinkScan is an industrial-strength link checking and website management tool. It saves time and money by automating the quality assurance testing of virtually any website or web-based application.
LinkScan is built around applicable open systems standards. Hence it integrates easily with many other content development, management and testing applications as well as general purpose computer tools. It operates on all Microsoft Windows and Unix/Linux platforms and is professionally supported.
LinkScan users include Fortune 1000 companies such as Hewlett Packard, government agencies like NASA, as well as many smaller businesses.
New users will find that LinkScan is extremely simple to install, configure and use. And the more experienced user will appreciate the vast array of customization features built into the system. Together, these attributes make LinkScan ideal for:
Small and medium sized websites
LinkScan can be configured to scan simple websites in a few seconds. Yet it rapidly analyses the site and accurately identifies 100 different types of problem. Affordable licenses are available from as little as $750.
Large and very large websites
LinkScan offers unparalleled performance and scalability. It can handle massive sites with 2,000,000 and more web pages. One of its many performance features is the ability to navigate a website via direct file system access to static documents, thereby avoiding the latency and other overheads associated with network access. The LinkScan database incorporates features that enable different content managers and workgroups to selectively view the results for their own data, and even to send reports and alarms via e-mail.
Complex sites with dynamic content
LinkScan incorporates many features specifically designed for sites containing complex dynamic content. That includes sites and applications built with tools such as Active Server Pages (ASP), Cold Fusion pages (CFM), Java Server Pages (JSP) and other high-end publishing systems such as those from Broadvision and Vignette.
LinkScan is available in five different editions all based upon the same core technology:
LinkScan Workstation is a single-user implementation designed for individual content developers in large enterprises, and for organizations having smaller websites with up to 500 unique documents. It will check an unlimited number of external links.
LinkScan Server is a multi-user implementation and includes LinkScan/Dispatch. LinkScan Server will analyze a single website of up to 5,000 unique documents and an unlimited number of external links. Reports may be viewed with web browsers and/or distributed via e-mail.
LinkScan ServerPro is a multi-user implementation and includes LinkScan/Dispatch. LinkScan ServerPro will analyze a single website of up to 15,000 unique documents and an unlimited number of external links. Reports may be viewed with web browsers and/or distributed via e-mail.
LinkScan Enterprise is the full multi-team product and it will scan up to 50,000 unique documents and an unlimited number of external links on up to ten physical computers that are owned or leased by you at one Location. If you wish to scan more than 10 computers, you will have to purchase one or more additional LinkScan Enterprise Licenses. You may buy licenses to scan as many unique documents as you wish and to scan multiple locations as described below.
Document Blocks (DocBlocks) - If you wish to scan more than 50,000 unique documents with a copy of LinkScan Enterprise, you must purchase additional Document Blocks (DocBlocks), each of which allows you to scan an additional 50,000 unique documents.
Location Blocks (LocBlocks) - If you wish to scan computers at more than one location, you must purchase new LinkScan Enterprise licenses for those locations or if you want to scan more locations using one copy of LinkScan Enterprise, you may purchase additional Location Blocks (LocBlocks).
LinkScan Unlimited - will scan an unlimited number of unique web pages (documents) on any number of physical computers that are owned or leased by you.
The above descriptions are neither complete nor comprehensive. You must read the LinkScan License Agreement for a complete definition of the products and your other rights and obligations.
The steps involved in using LinkScan include:
Each of these steps is described in this Reference Manual. However, we recommend that new users get a fast start by jumping to one of the following pages:
![]()
This section introduces some important concepts and terms that are used throughout the remainder of this Reference Manual. These are:
![]()
LinkScan is able to scan multiple websites. You may also scan the same website multiple times with different configuration options. In each case, LinkScan creates a unique and corresponding LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.
Users/administrators are required to select a Project when scanning, if multiple projects are defined. And, users must select a Project when viewing the results.
Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.
For additional information concerning Projects, how to create them and how to scan them, see Basic Scanning.
![]()
Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned between Owners in a variety of ways:
The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility. LinkScan Owners are defined via the LinkScan Configuration Files, discussed below. By default, LinkScan will create and assign Owners as follows:
This enables users to browse the results selectively so that the reports are smaller and more relevant to their needs. They're also produced more rapidly.
![]()
LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.
When activated, users may be required to log in to the LinkScan system using a pre-defined LinkScan Username and associated password. The Username defines the Projects and Owners that an individual user is permitted to access.
Those wishing to enable these access control features should see LinkScan Access Controls.
![]()
LinkScan supports three different scanning methods:
Network (HTTP) Scanning, which uses HTTP requests to check links on your site
File System Scanning, which bypasses the network when scanning internal links and reads the documents via direct access to your computer's file system
Import Scanning, which is used to import lists of documents or links for validation
Network (HTTP) Scanning is generally the best mode to use for sites with a large amount of dynamic content (.jsp files, .asp files, etc.). The File System Scanning method enables tracking of "orphaned" files (files to which nothing currently links) and is more appropriate for sites with limited dynamic content.
![]()
The LinkScan software, and this document, both maintain a strong distinction between Documents and Links.
A Link refers to a pointer to any arbitrary file or URL.
A Document refers to a file or URL that contains a number of Links.
Hence an HTML file is a Document containing Links. Dynamically generated web pages, PDF and Flash Files as well as Import Files may also be considered Documents since LinkScan can examine those files for the presence of Links. Images (such as .gif and .jpg files) are not considered documents.
References to sites other than the one being scanned (External Links) are not documents either, since LinkScan does not examine the content of those files for the presence of Links.
![]()
The LinkScan system is made up of a number of different file types:
In a basic LinkScan installation these files are organized within the following directory structure:
linkscan/ Contains all of the executable files including some diagnostics and utilities together with a number of configuration and control files including the linkscan.sys file and the Global Configuration File, linkscan.cfg (discussed below)
linkscan/docs/ Contains this documentation in HTML format together with a number of image files used by the LinkScan Menus and Reports. You may, optionally, move the contents of this directory to another location on your server if, for example, you do not wish to install the LinkScan directory under "www root"
linkscan/default/ Contains some additional configuration files including the Project Configuration File, linkscan.cfg.
linkscan/default/data/ This directory (and the subdirectories within it) are created during execution and contain the results of the scan; the LinkScan database.
linkscan/utils/ This directory contains a number of supporting utility programs.
linkscan/weblint/ This directory contains the weblint HTML syntax checking software.
![]()
LinkScan's operation is controlled by a number of different configuration files. When running LinkScan via the Windows Graphical User Interface, these files are largely hidden from view. However, they still control the execution of the program and you may find it useful to view the raw configuration files from time to time. On Unix systems, these files represent the primary method of configuring LinkScan. All of the files are formatted in plain ASCII text and may be viewed and modified using the editor of your choice (e.g. Windows Notepad or Unix vi, emacs, pico, nedit).
The most important configuration files are:
linkscan.sys: This file (there is only one) resides in the main LinkScan directory. This file contains the basic information concerning LinkScan and your computer. That includes the LinkScan License details and information that controls how LinkScan interfaces with other systems and services on your computer.
linkscan.mas: This file (there is only one) resides in the main LinkScan directory. This file contains a simple list of the available LinkScan Projects.
linkscan.cfg: Multiple copies of this file may reside within a single LinkScan installation. One copy, known as the Global Configuration File, resides in the main LinkScan directory. An additional linkscan.cfg file, known as the Project Configuration File resides within each LinkScan Project subdirectory.
LinkScan always reads the Global Configuration File and the Project Configuration File (in that order). Hence it is important to understand how all of the commands are processed. Each command is defined as either single-valued or multi-valued; see the LinkScan Command Summary. Single-valued commands are overwritten each time they are read, so the last value read is the significant value. Multi-valued commands are cumulative; all are added to the list of values for that command. Note that in some cases, the order in which multi-valued commands are read may impact the manner in which they are subsequently processed (this is noted where appropriate).
This approach provides tremendous flexibility. It means you can establish Global Settings in the Global Configuration File that apply to all Projects. And you may override (single-valued) settings or supplement (multi-valued) settings with additional commands in the Project Configuration File(s); these being Project-specific.
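The read order described above can be illustrated with a minimal Python sketch. It is not LinkScan's own code: it assumes the settings have already been parsed into (command, value) pairs, and the command names and values used (Homefile as single-valued, Exclude as multi-valued) are hypothetical examples chosen only to show the rule. Consult the LinkScan Command Summary for the actual classification of each command.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union

Setting = Tuple[str, str]


def merge_settings(
    global_cfg: Iterable[Setting],
    project_cfg: Iterable[Setting],
    multi_valued: Set[str],
) -> Dict[str, Union[str, List[str]]]:
    """Apply Global settings first, then Project settings, in the order read."""
    merged: Dict[str, Union[str, List[str]]] = {}
    for command, value in list(global_cfg) + list(project_cfg):
        if command in multi_valued:
            # Multi-valued: every value is kept, in the order it was read.
            merged.setdefault(command, []).append(value)
        else:
            # Single-valued: each new value overwrites the previous one.
            merged[command] = value
    return merged


# Hypothetical settings, for illustration only.
global_cfg = [("Homefile", "index.html"), ("Exclude", "^tmp/")]
project_cfg = [("Homefile", "home.html"), ("Exclude", "^drafts/")]
merged = merge_settings(global_cfg, project_cfg, multi_valued={"Exclude"})
print(merged)
```

In this sketch the Project value replaces the Global value for the single-valued command, while both values of the multi-valued command survive in the order they were read.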
Some additional configuration/control files are discussed elsewhere in this manual. They are used by LinkScan (i.e. do not delete them!) but it is rarely necessary for users to examine or modify them.
All of the configuration files include extensive comments. Comments are signified by the pound sign like this:
# This line contains only a comment
Realcommand = 1 # This comment could describe Realcommand
![]()
LinkScan incorporates a vast array of customization features many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions on Unix systems, see man perlre. HTML versions are available at many locations including:
http://www.perl.com/doc/manual/html/pod/perlre.html
We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E.F. Friedl, and published by O'Reilly [ISBN: 1-56592-257-3].
![]()
We make extensive reference to these terms in the customization sections of this manual and they are introduced here for your convenience.
Let us assume that we are scanning the website:
http://www.example.com/
An individual document within that website might be:
http://www.example.com/products/widget.html
LinkScan will refer to that page using its relative-path, which in this case, is:
products/widget.html
A relative-path-expression is a Perl Regular Expression that matches a relative-path. For example, all of the following will match our widget page:
products/widget.html # Also matches products/widgetXhtml
products/widget\.html$ # Does not match anything else
(|.*/)widget\.html$ # Matches widget.html in any directory
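The three expressions can be tried out with a short Python sketch. Python's re module accepts the same syntax for these simple patterns, but this is only an approximation of LinkScan's Perl-based matching. In particular, the sketch assumes that an expression is applied from the start of the relative-path (hence re.match); the matches helper is a hypothetical name used for illustration.

```python
import re


def matches(expression: str, relative_path: str) -> bool:
    # Assumption: the expression is applied from the start of the path.
    return re.match(expression, relative_path) is not None


WIDGET = "products/widget.html"
EXPRESSIONS = [
    r"products/widget.html",    # unescaped "." matches any character
    r"products/widget\.html$",  # literal "." and anchored at the end
    r"(|.*/)widget\.html$",     # optional directory prefix
]

for expression in EXPRESSIONS:
    print(expression, matches(expression, WIDGET))

# The unescaped dot is why the first expression is looser than the second.
print(matches(EXPRESSIONS[0], "products/widgetXhtml"))
print(matches(EXPRESSIONS[1], "products/widgetXhtml"))
```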
![]()
This section describes the pre-requisites for LinkScan and leads into step-by-step instructions for performing a new installation.
![]()
LinkScan is supported on a wide variety of platforms including:
We do not recommend Windows 95/98/ME for scanning large websites of more than 5000 documents. Although LinkScan has been tested on websites of significantly greater size, performance and stability will be much improved when running under operating systems with a true multi-processing implementation such as Windows NT/2000/XP/Vista or Linux/Unix.
Disk and memory requirements depend almost exclusively on the size and nature of the website(s) to be analyzed. However, the following guidelines are intended to assist users with their capacity planning needs:
Memory: We recommend 64 Mbytes of RAM (or more) for scanning websites up to 5,000 documents. 128 Mbytes is generally sufficient for sites of up to 50,000 documents. Some experimentation is generally essential when considering very large sites beyond 50,000 documents.
Disk Space: With a default configuration the LinkScan Database will require around 5 Mbytes of disk storage per 1000 documents scanned.
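The guidelines above reduce to simple arithmetic, sketched here in Python for planning purposes. The function names are hypothetical, the figures are the rough guidelines quoted above for a default configuration, and the results should be treated as estimates rather than guarantees.

```python
from typing import Optional


def estimate_disk_mbytes(documents: int, mbytes_per_1000: float = 5.0) -> float:
    """Approximate LinkScan Database size for a default configuration."""
    return documents / 1000 * mbytes_per_1000


def suggested_ram_mbytes(documents: int) -> Optional[int]:
    """Return the guideline RAM figure, or None where experimentation is needed."""
    if documents <= 5_000:
        return 64
    if documents <= 50_000:
        return 128
    return None  # beyond 50,000 documents the manual recommends experimenting


for size in (5_000, 50_000, 2_000_000):
    print(size, estimate_disk_mbytes(size), suggested_ram_mbytes(size))
```

For example, a site of 50,000 documents would need roughly 250 Mbytes of disk space for the database under these guidelines.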
![]()
To successfully install and configure LinkScan on your computer you must have:
An appropriate version of Perl Version 5 installed on your computer. You may download a version suitable for your system via:
A copy of the LinkScan software and a LinkScan License Key. Both are available from:
![]()
We recommend that new users get a fast start by jumping to one of the following pages:
![]()
This section describes how to upgrade an existing LinkScan installation to LinkScan Version 12.0.
In view of the dramatic enhancements since LinkScan 9.0, we strongly recommend that you perform a clean installation into a brand new folder; C:\LinkScan10\ is the suggested default.
Once you are completely satisfied with the new setup, you may manually delete the old LinkScan folder and all of its contents to remove the prior version and recover that disk space.
Simply install LinkScan 12.0 on top of your existing LinkScan files (typically under C:\LinkScan10\).
![]()
This section describes how to create, configure and scan a LinkScan Project.
![]()
From the Main LinkScan Window, click New.
You will be prompted for a Project Name and Description. You may elect to create a brand new (empty) Project or to create the Project by cloning/copying an existing Project.
![]()
From the Main LinkScan Window, select an existing Project from the displayed list of Projects and click Plan.
On the Plan Project Dialog you must:
Scanning Method: We recommend that you use the Network (HTTP) Scanning method, at least initially. This method is frequently the most appropriate and is also the simplest to configure. Optionally, you may also configure LinkScan to check for Orphaned Files but this requires a more detailed knowledge of your server environment and again we suggest you defer this until you are more familiar with LinkScan.
Review the status of the Case Sensitive Pathnames checkbox. This tells LinkScan whether to treat index.html and INDEX.HTML, for example, as a single file or two different files. In general, this box should be checked when scanning websites hosted on Unix servers and unchecked when scanning websites hosted on Windows servers.
Also note the status of the Onlyinclude/Onlyfollow setting. Typically, this will be blank. However, if you enter a URL such as:
http://www.example.com/Products/index.html
LinkScan will automatically enter Products/ in the text box below. You may use the associated Radio Buttons to control the scope of the scan.
Select Full Site (default) to scan the entire site.
Select Onlyfollow if you wish to completely scan the Products/ directory. LinkScan will validate all of the links leading to other directories within the site. However, it will not follow them and scan those other areas of the website.
Select Onlyinclude if you wish to scan the Products/ directory without following, or even checking, those links that lead to other areas of the website.
Click OK to save the settings or Cancel to discard them.
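The difference between the three scope options can be summarized as a decision about each internal link: is it checked, and is it followed? The Python sketch below expresses that decision under two simplifying assumptions that are not taken from the LinkScan implementation: the scope is a plain directory prefix such as Products/ (LinkScan itself may treat it as an expression), and the mode names are hypothetical strings.

```python
from typing import Tuple


def link_policy(
    mode: str,
    scope_prefix: str,
    relative_path: str,
    case_sensitive: bool = True,
) -> Tuple[bool, bool]:
    """Return (check, follow) for an internal link under the chosen scope."""
    if mode == "full":
        return True, True  # Full Site: check and follow everything
    path, prefix = relative_path, scope_prefix
    if not case_sensitive:
        # Mirrors an unchecked Case Sensitive Pathnames box.
        path, prefix = path.lower(), prefix.lower()
    in_scope = path.startswith(prefix)
    if mode == "onlyfollow":
        return True, in_scope  # out-of-scope links are validated, not followed
    if mode == "onlyinclude":
        return in_scope, in_scope  # out-of-scope links are ignored entirely
    raise ValueError(f"unknown mode: {mode}")


print(link_policy("onlyfollow", "Products/", "Support/index.html"))
print(link_policy("onlyinclude", "Products/", "Support/index.html"))
```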
![]()
From the Main LinkScan Window, select an existing Project from the displayed list of Projects and click Scan.
LinkScan will display the Scanning Panel which enables you to monitor progress as the scan proceeds.
On completion of the scan, the Cancel button will change to an OK button and the system will beep. Press the OK button to dismiss the Scanning Dialog box.
You have now completed a scan of the website and LinkScan has created a Database for that Project. Next you will want to examine the findings by following the steps described in Examining the Results.
![]()
LinkScan supports automation and scans may be initiated from the DOS prompt, BAT files and other scripting languages, via system schedulers and even from the Windows APIs. See Scheduling LinkScan. When executed in this manner, the following command line options are available.
C:\LinkScan10> perl linkscan.pl -help
LinkScan Version 12.0 Windows
Copyright 1997-2008 Electronic Software Publishing Corporation
USAGE: linkscan.pl {-help} {-alllinks} {-fast} {-home pathname} {-http}
{-newproject name} {-noexternal} {-noorphans} {-project name}
{-quiet} {-remote URL} {-retest}
-help Displays this message
-alllinks Check all external links [Override: Maxgoodhours etc]
-fast Use larger number of processes to speed testing
-home pathname Specify starting page [Override: Homefile in linkscan.cfg]
-http Use HTTP navigation [Equiv: Execute .* and -noorphans]
-newproject name Create a new LinkScan Project
-noexternal Test internal links only [Default: Internal and External]
-noorphans Disable checking for orphaned files
-project name Select a LinkScan Project
-quiet Reduce verbosity of progress/status messages
-remote URL Specify Remote Site [Equiv: -http; Override: Homeurl/Homefile]
-retest Repeat last test, rechecking only those links that failed
Detailed Help [Y/N]:n
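As one way of driving these options from a script, the following Python sketch assembles a command line from the flags listed above and runs it with the standard subprocess module. It is an illustration rather than a supported interface: the helper names and the project name "default" are assumptions, the installation folder is the one shown in the prompt above, and the meaning of the exit code is not documented here, so it is returned unchanged.

```python
import subprocess
from typing import List


def build_scan_command(
    project: str,
    retest: bool = False,
    no_external: bool = False,
    quiet: bool = True,
) -> List[str]:
    """Assemble a linkscan.pl command line from documented options."""
    command = ["perl", "linkscan.pl", "-project", project]
    if retest:
        command.append("-retest")      # recheck only links that failed last time
    if no_external:
        command.append("-noexternal")  # test internal links only
    if quiet:
        command.append("-quiet")       # reduce progress/status messages
    return command


def run_scan(command: List[str], install_dir: str = r"C:\LinkScan10") -> int:
    """Run the scan from the LinkScan folder and return the raw exit code."""
    return subprocess.run(command, cwd=install_dir, check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call run_scan(...) on a machine with LinkScan.
    print(" ".join(build_scan_command("default", retest=True)))
```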
![]()
Once a Project has been scanned and a database created, a wide range of different reports are available.
This document describes those reports and how to view them interactively using a simple web browser-based interface. Note that a batch command-line interface is also available. See Section 12 of this manual.
To view the reports interactively:
From the main LinkScan Window, select an existing Project from the displayed list of Projects and click Exam. This will activate the LinkScan Web Browser and send it to:
http://127.0.0.1:83/LinkScan/linkscan.cgi
The first time you access the results, you will be presented with the LinkScan Login and Preferences Menu. Simply click Login Now. No username is required unless you later decide to enable various LinkScan security features.
If you prefer, you may tell LinkScan to display the results in your normal Windows default browser. On the main LinkScan Window, click Options and select the Display Tab.
Once you have logged in, you will be presented with the LinkScan Main Menu.
You must select one of the individual Reports and submit the form by pressing Select Report.
A help page is available for each type of LinkScan Report. You may view the appropriate help page at any time by using the Help option on the context-sensitive LinkScan Toolbar. You may also use the [?] links on the LinkScan Main Menu, or the links provided in the summary table below.
The most frequently used reports have been organized in the left hand column; we suggest new users start there. Also, many of the reports incorporate hyperlinks to other reports. This means you can use a drill-down paradigm to view more detail associated with a specific problem or document. For example, some users may never explicitly select a LinkScan/QuickCheck Report. But they will likely view reports of that type by following the [Src] links from other reports.
Summary of Available Reports
| Project Summary Report: Summary statistics for the current project | Summary of All Projects Report: Summary statistics for all configured projects |
| Problem Documents Report: List documents containing potential problems | Selected Status Codes Report: List errors of specific types |
| Document Detail Report: List all/selected documents | All Pages Linking To ... Report: Find pages that link to... |
| Critical Errors Report: List most critical errors | Orphaned Files Report: List orphaned files |
| Detailed Errors Report: List all/selected errors | External History Report: View history of an external link |
| Changed Documents Report: Compare two scans of the current project | Redirections Report: List a summary of redirections |
| Search Documents Report: Ad hoc searching: document-centric | System Configuration Report: Display current LinkScan configuration settings |
| Search Links Report: Ad hoc searching: link-centric | LinkScan/QuickCheck: View source code and detailed analysis of a document |
| SiteMap Report: Display LinkScan SiteMap | LinkScan/TapMap: Display LinkScan TapMap |
The LinkScan Main Menu may include an Owner Selection Box. If enabled, this option will allow you to select a sub-set of the website to which subsequent reports will apply.
In a default configuration, the Owner Selection Box will include entries for each top-level directory scanned, in addition to the special entry "All". This will be the default selection and subsequent reports will apply to the entire website scanned.
Note however, that the LinkScan Administrator may configure and customize the manner in which Owners are created. Hence your installation may appear and behave somewhat differently from that described herein.
In many cases, when you submit the form by pressing Select Report you will be presented with a second menu of options. Initially, we suggest you accept the default options which have been carefully designed to produce excellent results in the vast majority of situations. However, to learn more, you may use the context-sensitive Help button on the LinkScan Toolbar at any time.
Each of the LinkScan Menus and Reports includes a common LinkScan Toolbar. It contains a number of links:
| Main Menu | Preferences | Advanced | Help | Reference | HowTo | Card |
The Main Menu link will always return you to the LinkScan Main Menu.
The Preferences link will always take you to the LinkScan Login and Preferences Menu.
The Advanced link appears when appropriate and it will cause the current menu to be redrawn with additional options.
The Help link will display an appropriate section of the LinkScan Documentation depending upon the current context.
The Reference link will display the table of contents for the LinkScan Reference Manual.
The HowTo link will display a brief How To Guide with instructions for completing certain Common Tasks.
The Card link will display the LinkScan Quick Reference Card.
![]()
The following section describes each of the LinkScan Error and Status Codes. Each Status Code is assigned to one of six Severities:
| Symbol | Code | Severity | Explanation |
| | 0 | Unknown | LinkScan has not tested or was unable to test this link |
| | 1 | Error | LinkScan found a hard error on this link |
| | 2 | Possible Error | There may be a problem with this link. It should be retested at a later time |
| | 3 | Warning | LinkScan found something unusual about this link. Manual inspection highly recommended |
| | 4 | Advisory | This link is probably ok, but manual inspection recommended |
| | 5 | No Error | This is a good link |
The Severity associated with any specific Error or Status Code may be customized by the LinkScan Administrator through the use of the Statuscode option.
Status codes in the range 0-99 are generated exclusively by LinkScan and generally refer to the status of local links (HTML files, Non-HTML files, etc.).
Status codes in the range 100-699 are defined exclusively by the HyperText Transfer Protocol.
Status codes in the range 800-3099 are generated exclusively by LinkScan and generally refer to Networking Problems (Failed DNS lookups, failure to connect to a remote server or timeouts) as well as some other LinkScan detected warning or advisory messages.
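When post-processing LinkScan results in a script, the severities and code ranges described above can be captured in a small lookup, as in this Python sketch. The names are hypothetical, and codes that fall outside the three ranges described here are reported as such rather than guessed at.

```python
# Severity codes and names as listed in the severity table.
SEVERITIES = {
    0: "Unknown",
    1: "Error",
    2: "Possible Error",
    3: "Warning",
    4: "Advisory",
    5: "No Error",
}


def status_code_origin(code: int) -> str:
    """Say which of the documented ranges a status code belongs to."""
    if 0 <= code <= 99:
        return "LinkScan (local links)"
    if 100 <= code <= 699:
        return "HTTP"
    if 800 <= code <= 3099:
        return "LinkScan (network and other)"
    return "not described"


for code in (2, 404, 800, 750):
    print(code, status_code_origin(code))
```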
![]()
Explanation: This object has not been tested.
Action: Inspect this link manually.
![]()
Explanation: This HTML document was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to an HTML file that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: This non-HTML file was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to a non-HTML file that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: The corresponding <a name=> tag was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to an <a name=> tag that does not exist within the target document.
Action: Create/restore the missing tag or correct the erroneous reference.
![]()
Explanation: This HTML file cannot be reached (directly or indirectly) from your home page.
Action: Check whether this is intentional or an error.
![]()
Explanation: This non-HTML file cannot be reached (directly or indirectly) from your home page.
Action: Check whether this is intentional or an error.
![]()
Explanation: This server-side Imagemap file was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to a server-side Imagemap file that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: This mailto: link appears valid based on an examination of the tag and E-mail address syntax.
Action: None required.
![]()
Explanation: This mailto: link appears invalid based on an examination of the tag and E-mail address syntax.
Action: Inspect this link manually.
![]()
Explanation: This link is almost certainly missing a trailing "/". LinkScan was able to validate the link by adding the "/".
Action: Add a "/" character to the end of the existing URL. This omission, although not normally fatal, may cause problems or delays for visitors who try to follow the link.
![]()
Explanation: LinkScan identified but did not process this Server Side Include (SSI). If you are scanning the website via Network (HTTP) Access, your server failed to process the SSI and the served document may be incomplete!
Action: Inspect this Server Side Include manually.
![]()
Explanation: This PDF document was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to a PDF document that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: LinkScan found a tag of the form <A HREF=...> with no corresponding </A> tag. This check is not enabled in a default configuration.
Action: Correct the markup. Mismatched tags may cause problems with some or all browsers. If very large numbers of these errors "clog" the LinkScan database, this check may be disabled via the Closeatag setting.
![]()
Explanation: This link uses a scheme that LinkScan did not recognize as valid. LinkScan validates various schemes (http:, https:, ftp:, ldap:, mailto:). It is aware of, but does not validate, other common schemes (e.g. gopher:, news:) and these are stored with No Status. This link uses an unknown scheme. It may be caused by a typographical error.
Note: links using the file: scheme are always marked with an Invalid Scheme Error. The use of the file: scheme is rarely desirable (or intended) in published documents and generally indicates an oversight.
Action: Inspect/correct this link manually. In rare cases, when the use of the file: scheme is actually intended, use an Exclude or Substitute command to modify the LinkScan behavior as appropriate.
![]()
Explanation: The Referring document contains an IMG SRC tag without the ALT, HEIGHT and/or WIDTH attributes.
Action: Adjust the specified IMG SRC tag.
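To locate the affected tags before editing a page, a check of this kind can be approximated outside LinkScan. The Python sketch below uses the standard html.parser module to list IMG SRC tags that lack ALT, HEIGHT or WIDTH. It is an independent illustration, not LinkScan's own test, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

REQUIRED = ("alt", "height", "width")


class ImgAttributeChecker(HTMLParser):
    """Collect IMG SRC tags that lack ALT, HEIGHT and/or WIDTH."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: List[Tuple[str, List[str]]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        present = dict(attrs)
        if "src" not in present:
            return
        missing = [name for name in REQUIRED if name not in present]
        if missing:
            self.problems.append((present["src"], missing))


def find_incomplete_images(html: str) -> List[Tuple[str, List[str]]]:
    checker = ImgAttributeChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


sample = '<IMG SRC="a.gif" ALT="A" WIDTH="10" HEIGHT="10"><img src="b.gif" alt="B">'
print(find_incomplete_images(sample))
```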
![]()
Explanation: This Flash document was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to a Flash document that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: This Text document was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to a Text document that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: This Javascript document was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to a Javascript document that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: This XML document was found OK.
Action: None required.
![]()
Explanation: The Referring document is linked to an XML document that does not exist on your server.
Action: Create/restore the missing file or correct the erroneous reference.
![]()
Explanation: An HTML Syntax Error was found.
Action: Correct the HTML markup.
![]()
Explanation: This HTTP Status Code will not normally arise with LinkScan.
Action: Inspect this link manually.
![]()
Explanation: This HTTP Status Code will not normally arise with LinkScan.
Action: Inspect this link manually.
![]()
Explanation: LinkScan found a good (external) URL.
Action: None required.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: The target server requires a language selection before serving the applicable document.
Action: Add a command to the linkscan.cfg file such as:
Extraheader Accept-Language: en
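To see what such a header looks like on the wire, or to test a server's response by hand, the same request header can be set from a short script. The Python sketch below uses the standard urllib.request module to build a request carrying Accept-Language: en. It only constructs the request; the build_request helper is a hypothetical name and nothing here reflects how LinkScan sends its own requests.

```python
import urllib.request
from typing import Dict


def build_request(url: str, extra_headers: Dict[str, str]) -> urllib.request.Request:
    """Build an HTTP request that carries additional request headers."""
    return urllib.request.Request(url, headers=extra_headers)


request = build_request("http://www.example.com/", {"Accept-Language": "en"})
# urllib normalizes header names to "Accept-language" internally.
print(request.header_items())

# To send it manually: urllib.request.urlopen(request, timeout=10)
```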
![]()
Explanation: This URL has moved permanently.
Action: Update this link as soon as possible. The redirection instruction may expire shortly, making it harder for you to find the new location.
![]()
Explanation: The URL of the page retrieved is different from the URL of the page requested. This is a design feature of the referenced server. According to the HTTP specifications, you should continue using the existing URL. However, in our experience, such links should be inspected manually. Some servers report redirections to temporary URLs that are specific to the current user session. It would clearly be undesirable to modify your existing hyperlinks in these situations. But other servers return a 302 Status Code when the URL has in fact been moved "permanently".
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: This link must be accessed via a proxy server.
Action: Inspect this link manually and contact your LinkScan Administrator.
![]()
Explanation: This status code is no longer used and is reserved.
Action: Inspect this link manually and contact the Web Server Administrator.
![]()
Explanation: This link is temporarily redirected.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: The remote server reported that you are not authorized to access the requested object. You may be able to access it manually if you supply a valid username and password.
Action: Inspect this link manually.
![]()
Explanation: The remote server reported that you are not authorized to access the requested object. You may be able to access it manually if you supply a valid username and password.
Action: Inspect this link manually.
![]()
Explanation: The remote server understood the request but refused to fulfill it. Supplying a username and password will not help.
Action: Inspect this link manually.
![]()
Explanation: The remote server reported that the requested object does not exist. This condition is probably (but not necessarily) permanent.
Action: Inspect the link manually. A very small number of servers report a "Not Found" error when there is, in fact, no problem. In some cases, the server may display a "Moved" message even though it did not supply a "Moved" header.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: The Proxy Server requires authentication.
Action: Review the LinkScan Proxy Server configuration settings or contact your LinkScan Administrator.
![]()
Explanation: The Request timed out.
Action: Check the link manually. This URL is currently unavailable. LinkScan was not able to establish whether the situation is temporary or permanent. You may wish to probe the site using the standard ping and traceroute utilities.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: The remote server reported that the requested object does not exist. The condition is permanent and no forwarding address is known.
Action: Inspect the link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: The connection to the remote server timed out.
Action: Check the link manually. This URL is currently unavailable. LinkScan was not able to establish whether the situation is temporary or permanent. You may wish to probe the site using the standard ping and traceroute utilities.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: An unusual error occurred.
Action: Inspect this link manually.
![]()
Explanation: This link was skipped because it has been tested recently. See How to control the testing of external links.
Action: None required.
![]()
Explanation: This link was skipped because an excessive number of other links to the same server appeared broken. The server is probably down, either temporarily or permanently. See How to control the testing of external links.
Action: Retest this link later and/or manually inspect the links to this server.
![]()
Explanation: This link was skipped because the limit on the number of FTP links to any one server was exceeded. See How to control the testing of external links.
Action: Manually inspect this link and/or increase the Maxftp setting.
![]()
Explanation: This link was skipped because the limit on the number of times LinkScan checks the same CGI with different queries was exceeded. This avoids the possibility of LinkScan checking the same URL with a potentially infinite number of automatically generated query strings. See How to control clusters of links.
Action: Manually inspect this link and, if appropriate, increase the Maxcgi setting.
![]()
Explanation: LinkScan was unable to locate the requested server.
Action: Check the link manually. This server may no longer exist. Or, it is possible that the remote site's Domain Name Server (DNS) was temporarily unavailable at the time LinkScan tried to access it. You may wish to probe the site using the standard nslookup utility.
![]()
Explanation: LinkScan was unable to complete a DNS lookup.
Action: Check the link manually. This URL is currently unavailable. LinkScan was not able to establish whether the situation is temporary or permanent. You may wish to probe the site using the standard ping and traceroute utilities.
![]()
Explanation: LinkScan was unable to establish a TCP/IP connection to the remote server. Most likely, the remote server is currently rejecting connections.
Action: Check the link manually. This URL is currently unavailable. LinkScan was not able to establish whether the situation is temporary or permanent. You may wish to probe the site using the standard ping and traceroute utilities.
![]()
Explanation: A timeout arose while attempting to connect() to the remote server.
Action: Check the link manually. This URL is currently unavailable. LinkScan was not able to establish whether the situation is temporary or permanent. You may wish to probe the site using the standard ping and traceroute utilities.
![]()
Explanation: This link is almost certainly missing a trailing "/". LinkScan was able to validate the link by adding the "/".
Action: Add a "/" character to the end of the existing URL. This omission, although not normally fatal, may cause problems or delays for visitors who try to follow the link.
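The correction described above can be sketched in Python. This is only an illustration of the fix, not LinkScan's own logic, and the function name add_trailing_slash is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url):
    """Return url with "/" appended to its path when the path lacks one."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path += "/"
    # Rebuild the URL so that any query string or fragment is preserved.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Only apply such a change to links that LinkScan has flagged. A URL that points at a file rather than a directory should not receive a trailing "/".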
![]()
Explanation: The remote server did not supply a valid HTTP header, but it did appear to serve up a valid HTML document.
Action: Inspect this link manually.
![]()
Explanation: This link uses a numeric IP address. These addresses are much more likely to change than conventional server addresses referenced via the Domain Name Service (DNS).
Action: We recommend that you use a conventional URL if at all possible.
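If you want to search your own link lists for this condition, a check along the following lines may help. It is a minimal sketch using the Python standard library. The function name uses_numeric_ip is hypothetical, and the test LinkScan itself applies is not documented here.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_numeric_ip(url):
    """Return True when the host portion of url is a literal IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        # ip_address() accepts IPv4 and IPv6 literals and rejects host names.
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```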
![]()
Explanation: This URL appeared to be subject to multiple redirections. LinkScan will follow up to five redirections. It then generates a 907 error rather than continue in a potentially infinite loop.
Action: We recommend that you inspect your server redirections (often defined in a .htaccess file).
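The redirection limit described above can be pictured with a short Python sketch. The redirections are modelled as a simple dictionary (redirect_map, a hypothetical stand-in for real HTTP responses), and the limit of five is taken from the explanation above. This is not LinkScan's implementation.

```python
MAX_REDIRECTS = 5  # limit stated in the explanation above


def follow_redirects(url, redirect_map, max_redirects=MAX_REDIRECTS):
    """Follow redirections listed in redirect_map.

    Returns (final_url, hops), or (None, hops) when the limit is reached
    while further redirections are still pending.
    """
    hops = 0
    while url in redirect_map:
        if hops == max_redirects:
            # Give up rather than continue in a potentially infinite loop.
            return None, hops
        url = redirect_map[url]
        hops += 1
    return url, hops
```

A pair of pages that redirect to each other will therefore stop after five hops instead of looping indefinitely.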
![]()
Explanation: This link is almost certainly missing a trailing "/". LinkScan was able to validate the link by adding the "/".
Action: Add a "/" character to the end of the existing URL. This omission may cause significant problems for some users that access the web via proxy servers.
![]()
Explanation: This error typically results when a remote server disconnects a TCP/IP connection prematurely.
Action: Inspect this link manually. If problems persist, please contact LinkScan Technical Support at linkscan@elsop.com.
![]()
Explanation: The server attempted to redirect the request to a different URL using an HTTP "Location" header but failed to supply an absolute URL as required by the HTTP specifications.
Action: Check the HTTP server configuration files and/or any CGI scripts that generate HTTP "Location" headers and ensure they transmit an absolute URL on redirections.
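When reviewing scripts that generate "Location" headers, a check like the following may be useful. It is a sketch only: the function names are hypothetical, and resolve_location simply shows how a lenient client might interpret a relative value.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_url(location):
    """Return True when location carries both a scheme and a host."""
    parts = urlsplit(location)
    return bool(parts.scheme and parts.netloc)


def resolve_location(request_url, location):
    """Resolve a Location value against the URL that was requested."""
    return urljoin(request_url, location)
```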
![]()
Explanation: This link contains an 'unsafe' character; probably a control character or a non-encoded space (spaces in URL's should be written as "%20"). Different browsers will interpret this link differently.
Links written with a leading query, such as <A HREF="?Something">, will also be flagged with a 911 Error. Although strictly legal, we have found that different browsers process the tag in a wildly inconsistent manner. Include some or all of the pathname to avoid this problem and eliminate the error.
Action: We recommend that you inspect and correct this link.
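The following Python sketch shows one way to find and percent-encode such characters. The exact set of characters LinkScan treats as 'unsafe' is not listed here, so the sketch assumes ASCII control characters and the space character. The names UNSAFE, has_unsafe_chars and encode_unsafe are hypothetical.

```python
import re
from urllib.parse import quote

# Assumption: control characters (0x00-0x1F, 0x7F) and the space character.
UNSAFE = re.compile(r"[\x00-\x20\x7f]")


def has_unsafe_chars(url):
    """Return True when url contains a character from the assumed unsafe set."""
    return UNSAFE.search(url) is not None


def encode_unsafe(url):
    """Percent-encode each unsafe character, leaving the rest of url untouched."""
    return UNSAFE.sub(lambda match: quote(match.group(0), safe=""), url)
```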
![]()
Explanation: LinkScan was able to establish a TCP/IP connection to the specified port (Default: 443) on the specified server. LinkScan does not natively support SSL/HTTPS on Unix platforms and did not validate the pathname portion of the URL.
Action: We recommend that you inspect this link manually using a browser with SSL support if you wish to validate the complete URL.
![]()
Explanation: LinkScan processed a Redirect directive in the linkscan.cfg file.
Action: Check this link manually.
![]()
Explanation: LinkScan detected a redirection specified using a <META HTTP-EQUIV REFRESH> tag.
Action: This construct is not supported by all clients. We recommend that you at least insert a regular hyperlink in this document that will be visible to someone viewing the page.
![]()
Explanation: LinkScan detected a redirection specified using a <META HTTP-EQUIV REFRESH> tag. Furthermore, the target location was specified using a relative URL.
Action: This construct is not supported by all clients. We recommend that you specify the REFRESH using an Absolute URL and insert a regular hyperlink in this document that will be visible to someone viewing the page.
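To locate such tags in your own documents, something like the following Python sketch may serve. It uses the standard html.parser module. The class and function names are hypothetical, and the parsing of the CONTENT attribute is a simplification rather than LinkScan's actual method.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


class MetaRefreshFinder(HTMLParser):
    """Collect the target URL of every META HTTP-EQUIV REFRESH tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # CONTENT normally looks like "5; URL=target".
        match = re.search(r"url\s*=\s*['\"]?([^'\"\s;]+)", attr.get("content") or "", re.I)
        if match:
            self.targets.append(match.group(1))


def find_meta_refresh(html):
    """Return (target, is_absolute) pairs for each refresh tag found in html."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return [(target, bool(urlsplit(target).scheme)) for target in finder.targets]
```

A result whose second element is False corresponds to the relative-URL case described above.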
![]()
Explanation: LinkScan was able to establish a TCP/IP connection to the specified port (Default: 389) on the specified server. LinkScan does not natively support LDAP and did not validate the query portion of the URL.
Action: We recommend that you inspect this link manually using a browser with LDAP support if you wish to validate the complete URL.
![]()
Explanation: LinkScan connected to the remote server but did not receive any HTTP response headers.
Action: We recommend that you inspect this link manually.
![]()
Explanation: A timeout arose after LinkScan established a connection to the remote server and during the exchange of HTTP Request and Response Headers.
Action: Check the link manually. This URL is currently unavailable. LinkScan was not able to establish whether the situation is temporary or permanent. You may wish to probe the site using the standard ping and traceroute utilities.
![]()
Explanation: A timeout arose after LinkScan established a connection and exchanged HTTP Request and Response Headers but during the transmission of the document body. Typically this arises when LinkScan attempts to download a very large document (e.g. a multi-megabyte PDF file) over a limited bandwidth connection.
Action: Check the link manually.
![]()
Explanation: A timeout arose but no other details are available.
Action: Check the link manually.
![]()
Explanation: LinkScan downloaded an incomplete document body because the size exceeded the Maxdownload parameter.
Action: Check the link manually.
![]()
Explanation: LinkScan was not able to create a socket (network connection) while testing this link. This indicates an internal problem with LinkScan and/or your operating system.
Action: Contact LinkScan Technical Support at linkscan@elsop.com.
![]()
Explanation: The Windows Internet Library was not able to access this URL. The remote server may have an invalid or unrecognized security certificate.
Action: Inspect this link manually.
![]()
Explanation: A data file referenced by a LinkScan multipart POST command was not found.
Action: Correct the POST command and/or supply the missing data.
![]()
Explanation: LinkScan was not able to establish the status of this link. This error tends to arise with approximately 0.1 percent of servers on the Web. Generally, the remote server is completely non-compliant with the HTTP specifications or refused to accept TCP/IP connections from your current IP address.
Action: Inspect this link manually.
![]()
Explanation: LinkScan failed to receive a satisfactory response from this FTP server. The error description reflects the actual message returned by the FTP server.
Action: Inspect this link manually.
![]()
Explanation: This mailto tag appears to contain an e-mail address with an invalid syntax.
Action: Inspect this link manually.
![]()
Explanation: This mailto tag appears to refer to an invalid address. The SMTP server associated with this address reported that it did not recognize the username.
Action: Inspect this link manually.
![]()
Explanation: This mailto tag appears to point at a valid e-mail address. The SMTP server associated with that address reported the mailbox was full.
Action: Inspect this link manually.
![]()
Explanation: This mailto address is suspect. LinkScan was unable to obtain a satisfactory response from the SMTP server associated with that address.
Action: Inspect this link manually.
![]()
Explanation: This link resulted in a redirection to a URL matching the user-specified Errordoc pattern (probably a custom error page).
Action: Inspect this link manually.
![]()
Explanation: This document contained a string matching the user-specified Errorbody pattern. The document probably contains a human-readable error message even though the document was served with a 200 OK HTTP status code.
Action: Inspect this link manually.
![]()
Explanation: This document contained a string matching the user-specified Profiler pattern.
Action: Inspect this link manually.
![]()
LinkScan is compatible with virtually any existing Windows scheduling utility.
Using Notepad or a similar editor, simply edit the file linkscan.bat which is automatically installed in the LinkScan folder. This basic Windows BATCH file must set the current working directory to the LinkScan folder and execute LinkScan for each required Project.
REM Set current working directory
CD /D C:\LinkScan10\
REM Execute LinkScan Phase 1
call perl linkscan.pl -project myproject -manual
REM Execute LinkScan Phase 2
call perl linkscan2.pl -project myproject
REM Execute LinkScan/Dispatch (if required)
call perl dispatch.pl -project myproject -options
REM Execute command line reports (if required)
REM Must set environment variable for these
call set linkscan=linkscan
call perl linkscan.cgi -project myproject -options
See the following for a summary of the available command line switches/options:
Please note the following points:
You must explicitly set the current working directory to the LinkScan folder before executing LinkScan.
You must specify the Project name on the command line to prevent LinkScan from prompting the (absent) user to select a Project.
You must run linkscan.pl with the -manual switch and then run linkscan2.pl from the BATCH file. If you omit the -manual switch, linkscan.pl will automatically execute linkscan2.pl but the BATCH script will execute the next command without waiting for linkscan2.pl to complete execution.
You must set the environment variable linkscan before executing linkscan.cgi via a DOS prompt or script.
Finally, configure your Windows Scheduler to execute the file:
C:\LinkScan10\linkscan.bat
according to the required schedule. LinkScan is compatible with almost all Windows Schedulers -- for example, the one you use to scan your system for viruses. Windows 2000 users may wish to use the standard system scheduler which works rather well. See Control Panel | Scheduled Tasks.
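If your scheduler can launch a Python script rather than a BATCH file, the same two-phase sequence might be sketched as follows. This is an illustrative alternative, not a supported LinkScan interface. The folder and the command lines are those of the linkscan.bat example, the function names are hypothetical, and the use of check=True assumes that the Perl scripts signal failure through a non-zero exit code, which this manual does not confirm.

```python
import subprocess

LINKSCAN_DIR = r"C:\LinkScan10"  # LinkScan folder from the example above


def build_commands(project):
    """Return the Phase 1 and Phase 2 command lines for one Project."""
    return [
        # -manual stops linkscan.pl from launching linkscan2.pl by itself.
        ["perl", "linkscan.pl", "-project", project, "-manual"],
        ["perl", "linkscan2.pl", "-project", project],
    ]


def run_scan(project, cwd=LINKSCAN_DIR):
    """Run both phases in order, waiting for each one to finish."""
    for command in build_commands(project):
        # cwd sets the working directory to the LinkScan folder, as required.
        subprocess.run(command, cwd=cwd, check=True)
```

Because subprocess.run waits for each command to complete, Phase 2 cannot start before Phase 1 has finished.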
![]()
LinkScan incorporates the ability to examine the files on your local hard drive and interpret them in a manner very similar to a web server. This capability has two major applications:
It can dramatically accelerate the scanning of large numbers of static HTML documents.
It enables the identification of Orphaned Files.
Configuration is significantly more complex than for normal HTTP Scanning. In particular, you must configure the following items:
![]()
From the main LinkScan Window, select a Project and click Plan.
On the Basic Settings Tab of the Project Planning property sheet, select HTTP Scanning with Orphans. [Screenshot]
Select the Root Tab and use the Find button to navigate to the folder that corresponds to the root of the website you wish to scan.
Click OK to save.
If and only if you have different URL's mapped to different File System Folders, you will need to select the Aliases Tab and configure the additional mappings.
![]()
From the main LinkScan Window, select a Project and click Plan.
On the Basic Settings Tab of the Project Planning property sheet, select File System Scanning. [Screenshot]
Select the Root Tab and use the Find button to navigate to the folder that corresponds to the root of the website you wish to scan.
Select the Files Tab to review and, if necessary, modify the list of HTML file extensions and default pages.
Click OK to save.
If and only if you have different URL's mapped to different File System Folders, you will need to select the Aliases Tab and configure the additional mappings.
![]()
In some cases, the file system directories containing the web site may reside on a physically different computer from LinkScan. In these cases, LinkScan will support Network Shares (subject to any locally imposed security controls).
In other cases, the file system of the remote system may not be visible via the network, quite possibly for security reasons. LinkScan will be unable to scan the remote computer using the File System Scanning Method. You must use HTTP Scanning.
However, it is still possible to enable Orphaned File checking. In summary, you will need to execute a small, self-contained Perl program on the remote computer. It will assemble a "picture" of the file system and save it as a simple ASCII file. That file may be transferred to the LinkScan computer using FTP (or any other more secure technique) and used to perform the orphan analysis in lieu of direct access to the remote server.
Fully configure the selected Project as described in HTTP Scanning with Orphaned Files Detection above. However, when setting the Website Root Folder use the pathname applicable to the remote server.
Set the Imported Orphans Data File to the pathname of a file on your local computer. For example:
Orphanfile = C:/LinkScan/someproject/orphans.txt
Transfer the following files to the remote server:
C:/LinkScan10/lsfind.pl
C:/LinkScan10/someproject/linkscan.cfg
On the remote server, execute the lsfind.pl program:
perl lsfind.pl orphans.txt
Transfer the orphans.txt file back to the LinkScan machine.
Initiate a scan of the target website in the normal manner. LinkScan will use the orphans.txt file from the remote server in lieu of scanning the file system on the local server.
![]()
The LinkScan Import function may be used to:
Validate a list of Links exported from some arbitrary data source (e.g. a database management system).
Validate a list of Documents (e.g. an arbitrary sub-set of pages from a web site) and all the links contained within them. This might include the most critical/popular pages perhaps extracted from an HTTP logfile analysis program. This could also represent an arbitrary user session including a sequence of form submissions with specific data values. Such sequences may be easily captured with the LinkScan Recorder.
When processing a list of Links each URL is checked in turn and its status stored in the LinkScan database. When processing a list of Documents, each document and every link within that document is checked and its status stored.
The import function offers enormous flexibility. To use this feature, carry out the following steps:
Prepare the Import File
LinkScan will import a simple ASCII file of the following format:
URL ... one or more tab characters ... URL-Description
URL's may be absolute, or relative to the Home URL for the current server. The URL-Description is imported and carried through to the LinkScan Reports for identification purposes. You may use any ASCII string, for example a database record number.
Import files may also include URL's using the extended LinkScan conventions for form submissions (GET, POST and Multi-Part POST). See How to Submit Forms.
An alternative field separator may be specified by including a special command as the first line of the file:
## \s+
The command starts with '##' in column one followed by a Perl expression that specifies the field delimiter. In the example above, '\s+' means one or more whitespace characters (tab or space).
Lines with a '#' in column one, and blank lines, are ignored as comments.
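To check an import file before handing it to LinkScan, the format rules above can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the function name parse_import_file is hypothetical, and the delimiter expression is interpreted by Python's re module, which agrees with Perl for simple patterns such as '\s+' but not necessarily for every expression.

```python
import re


def parse_import_file(text):
    """Return a list of (url, description) pairs from import-file text."""
    delimiter = re.compile(r"\t+")  # default: one or more tab characters
    entries = []
    for number, line in enumerate(text.splitlines()):
        if number == 0 and line.startswith("##"):
            # Special first line: the rest is the field delimiter expression.
            delimiter = re.compile(line[2:].strip())
            continue
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments are ignored
        fields = delimiter.split(line.strip(), maxsplit=1)
        description = fields[1] if len(fields) > 1 else ""
        entries.append((fields[0], description))
    return entries
```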
From the main LinkScan Window, select a Project and click Plan. [Screenshot]
Configure the Project Plan
Select the Import Tab and then select from:
Import Links to Import a list of links
Import Documents to Import a list of documents
Import Documents (no cache) to Import a list of documents with caching disabled
Use the Find button to navigate to the prepared ASCII import file.
When using Import Documents, LinkScan will by default check each document listed in the Import File, but it will not follow the links in those documents to scan the entire site. Optionally, you may set Maxclicks on the Scope Tab to force LinkScan to execute a deeper scan. For example, with Maxclicks = 3, LinkScan will check the Import File, the documents listed in the Import File, and the children (but not the grandchildren) of those documents.
Click OK to save.
Special Considerations
LinkScan de-duplicates the list of links within an Import Document list. This means that LinkScan will validate each unique URL within the list only one time.
However, you may force LinkScan to process an Import Sequence so that the same URL or document is checked more than once. This may be achieved by adjusting the URL's to make them appear unique. Note that this also provides a means by which to differentiate the test results for each step. Simply edit the URL's to make them unique by adding dummy name-value pairs to the query string of the URL's:
http://www.example.com/cookie_sensitive?dummyseq=1
[...]
http://www.example.com/set_cookie
[...]
http://www.example.com/cookie_sensitive?dummyseq=2
If the URL's already include a query string, simply append the additional parameter to the existing query and change:
http://www.example.com/foo?name=value
to:
http://www.example.com/foo?name=value&dummyseq=1
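When the import list is generated by a program, the dummy parameter can be appended automatically. The following Python sketch handles both cases shown above. The function name add_dummy_sequence is hypothetical, and dummyseq is simply the parameter name used in the examples.

```python
from urllib.parse import urlsplit, urlunsplit


def add_dummy_sequence(url, sequence, name="dummyseq"):
    """Append name=sequence to the query string of url."""
    parts = urlsplit(url)
    pair = f"{name}={sequence}"
    # Extend an existing query with "&", otherwise start a new one.
    query = f"{parts.query}&{pair}" if parts.query else pair
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```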
Normally, LinkScan maintains the status of each link in a cache while it scans a site. This dramatically improves performance since LinkScan does not need to re-check commonly used images and other components over and over. However, caching may be undesirable with some stateful sequences, for example when the same URL produces a completely different result before and after a cookie is set.
In those situations, you may use a special option (Import = 3) which will force LinkScan to flush its cache after each imported document has been validated.
![]()
LinkScan incorporates many powerful customization features described below.
Hint: We strongly recommend that you read Essential LinkScan Concepts before studying this section of the Reference Manual.
![]()
You may use any combination of the following commands to include or exclude specific areas of the target website.
Exclude relative-path-expression
Exclude absolute-url-expression
Nofollow relative-path-expression
Onlyfollow relative-path-expression
Onlyinclude relative-path-expression
Maxlevels depth
Maxclicks depth
Exclude: The Exclude command may be used to completely ignore specific links. You may supply a relative-path-expression to exclude Internal Links, or an absolute-url-expression to exclude External Links.
Nofollow: The Nofollow command may be used to provide even finer control over LinkScan's behavior. When LinkScan encounters a link matching a Nofollow command, it will validate the link (and check for any <a name = ... > tags if appropriate). However, it will not test any links that lead from the target document.
For greater flexibility and completeness, the Onlyinclude and Onlyfollow commands are also supported.
Onlyinclude: is logically equivalent to "Exclude everything except".
Onlyfollow: is logically equivalent to "Nofollow everything except".
Maxlevels: A command such as Maxlevels = 3 will limit the depth of the scan to three directory levels under the server root.
Maxclicks: A command such as Maxclicks = 3 will limit the depth of the scan based on the number of clicks from the start of the scan. In order to more closely model the real user experience, LinkScan does not include clicks that result from following framesets or redirections.
The following rules of precedence apply when using multiple commands in combination:
Example 1:
Exclude http://www.domain.com/
Exclude test/
All links to "http://www.domain.com/" and all files in the local "test/" subdirectory will be ignored by LinkScan.
Example 2:
Nofollow user2/
LinkScan will check the links to files in the "user2/" directory, but it will not examine the content of any documents within the "user2/" directory or test any of the links contained within them.
Example 3:
Onlyfollow user1/
LinkScan will check the documents in the local "user1/" subdirectory and test the links to files in other local directories. However, LinkScan will not examine the content of any documents that lie outside of the local "user1/" directory or test any of the links contained within them.
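The difference between Exclude and Nofollow in Examples 1 and 2 can be pictured with a small Python model. It is only a sketch and rests on two assumptions that this section does not confirm: that each expression is matched against the start of the relative path, and that Exclude is evaluated before Nofollow. The function name link_action and the returned labels are hypothetical.

```python
import re


def link_action(relative_path, exclude=(), nofollow=()):
    """Return "ignore", "validate only", or "validate and follow"."""
    # Assumption: Exclude takes precedence over Nofollow.
    if any(re.match(pattern, relative_path) for pattern in exclude):
        return "ignore"
    if any(re.match(pattern, relative_path) for pattern in nofollow):
        return "validate only"
    return "validate and follow"
```

With exclude=["test/"] and nofollow=["user2/"], the path "test/page.html" is ignored, "user2/index.html" is validated but its links are not tested, and "index.html" is handled normally.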
On websites that incorporate a high proportion of dynamic content, it may not be productive to test every script with a large number of query parameters or other variations. The following controls are provided.
Maxcgi: The maximum number of times any single URL should be probed with different query parameters. This prevents LinkScan from trying to validate a CGI script or dynamic page with a potentially infinite number of query parameters.
[Default: Maxcgi = 100]
Taglimit: The Taglimit command may be used to provide even finer control over the number of times clusters of URL's are probed. Syntax and example:
Syntax:
Taglimit relative-path-expression maxnumber
Example:
Taglimit scripts/DatabaseLookup.asp 20
LinkScan will only attempt to parse 20 documents matching the pattern "scripts/DatabaseLookup.asp". Any further links matching the specified pattern will be completely ignored.
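The idea behind Maxcgi can be illustrated with a simple counter keyed on the URL without its query string. The Python sketch below is a hypothetical model, not LinkScan's code. The class name QueryLimiter is invented for illustration, and the sketch does not try to recognise repeated identical queries.

```python
from collections import Counter
from urllib.parse import urlsplit


class QueryLimiter:
    """Cap the number of query variations probed for any one script."""

    def __init__(self, maxcgi=100):  # 100 is the documented Maxcgi default
        self.maxcgi = maxcgi
        self.seen = Counter()

    def allow(self, url):
        """Return True if url may be probed, False once the cap is reached."""
        parts = urlsplit(url)
        if not parts.query:
            return True  # plain URLs are not subject to the cap
        key = (parts.netloc, parts.path)
        if self.seen[key] >= self.maxcgi:
            return False
        self.seen[key] += 1
        return True
```

Each script is counted separately, so reaching the cap for one script does not affect links to another.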
![]()
Many websites include some form of access control or user authentication features. These are:
In the case of HTTP or NTLM Authentication, when a user attempts to access a protected area, their browser will present a challenge in the form of a pop-up dialog box that requires a username and password to be entered. In the case of cookie-based arrangements, the user is normally required to login by filling out an HTML form and submitting it.
For sites that require HTTP Authentication, you must configure LinkScan with an appropriate Auth command:
Syntax:
Auth server-name "realm-name" username password
Examples:
Auth www.example.com "" guestuser xxxxxx
Auth app.example.com "Controlled Access" guestuser xxxxxx
You must include a realm-name (enclosed in double-quotes) but it may be empty. In that case, LinkScan will use the configured username and password for any realm on the target server. This is the recommended approach unless your server uses multiple realms with different access control rules for different portions of the website.
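For background, the following Python sketch shows how a client typically turns a username and password into an Authorization request header. It assumes the HTTP Basic scheme. This manual does not state which schemes the Auth command covers, and the function name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"
```

Base64 is an encoding, not encryption, so credentials sent in this way are only as private as the connection that carries them.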
Some Intranet websites utilize the proprietary and undocumented Microsoft NTLM protocol to authenticate users. LinkScan (on Windows systems only) may be configured to scan such sites.
Note: This may result in other minor artifacts in the results of the scan since LinkScan will use the Microsoft Windows implementation of the HTTP protocol versus the (stricter) native LinkScan implementation.
HTTP access to some sites is controlled via authentication schemes requiring Cookies. For more information regarding Cookies see the Netscape Cookie Specification at http://wp.netscape.com/newsref/std/cookie_spec.html.
LinkScan will automatically accept and return all valid cookies received during the course of a scan. However, to gain access to the site, you may need to configure LinkScan to ensure that the appropriate cookies are set. This may be achieved by one of two techniques:
The submission of a login form may be configured using the Extrahome command (described in the next section). However, you may optionally initialize LinkScan's collection of stored cookies (aka Cookie Jar) with one or more permanent Cookies by using the Cookie command:
Syntax:
Cookie server-name cookiename=cookievalue
Example:
Cookie www.elsop.com LinkScan=cookie_value;
Note: Do not enter space characters around the '=' character.
The server-name is the name of the server to be tested. For security reasons and in compliance with the applicable standards, LinkScan will only send the cookie when the specified server-name exactly matches the hostname portion of the requested URL. In this context, server names and their corresponding IP addresses are considered to be different (consistent with all major browsers). The cookie names and values must be reverse engineered from your server code or "discovered" via your browser, either by enabling the "Prompt before accepting cookies" option or by examining the cookies stored on disk.
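The exact-match rule can be expressed in a few lines of Python. This is a sketch of the rule as described above, not LinkScan's code. The function name should_send_cookie is hypothetical, and treating host names as case-insensitive is an assumption.

```python
from urllib.parse import urlsplit


def should_send_cookie(cookie_server, url):
    """Return True only when the URL's hostname equals the configured server-name."""
    host = urlsplit(url).hostname or ""  # urlsplit lower-cases the hostname
    return host == cookie_server.lower()
```

Note that a cookie configured for www.elsop.com would not be sent to elsop.com, nor to the numeric IP address of the same server.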
Hint 1: Sites with especially complex schemes (multiple levels of access control, subscription expirations etc.) might consider configuring their server and/or scripts to recognize a "super-user-cookie" specifically for testing purposes. This approach may also be used to trigger test points within server-based scripts and greatly improve the meaningful testability of complex dynamic content.
Hint 2: HTTP Authentication and Cookie related transactions are logged by LinkScan during the course of the scan. You may examine the following file to view the log: .../LinkScan/Projectname/data/linkscan.red
![]()
You may configure LinkScan to examine additional documents that would not normally be found during the scan and might otherwise be reported as orphaned files. The same technique may be used to submit forms on your website with specific data values for testing purposes. This is achieved with the Extrahome command:
Syntax:
Extrahome relative-path-expression
Examples:
Extrahome somedir/staticdoc.html
Extrahome cgi-bin/getscript.cgi?Var1=aaa&Var2=bbb
The second example above includes a query string and is therefore equivalent to a FORM submission using the GET method. In addition, LinkScan includes support for special conventions that allow users to specify FORM submission operations using the POST method, including the Multi-Part POST, frequently used to upload files from a client to the server.
Examples:
Extrahome cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret
Extrahome upload.cgi???(postedfile;C:\LinkScan10\post\test.jpg;image/jpeg)
Extrahome upload.cgi???Name1=Val1&(postedfile;/usr/home/test/test.jpg;image/jpeg)&Name2=Val2
The '??' convention is used to designate a POST operation.
The '???' convention is used to designate a Multi-Part POST operation.
The name-value pairs are delimited using the '&' character, in the normal manner.
The query strings must not contain any space characters; they must be percent-encoded according to the standard conventions.
The option to POST the contents of a client-side data file uses three parameters delimited with semi-colons and wrapped in parentheses:
Hint 1: Use the LinkScan Recorder to automatically capture the correctly constructed URL's.
Hint 2: When using the Extrahome command to submit a login form to provide access to a site, you may also need to configure LinkScan so that it doesn't immediately "click" any LOGOUT button which would invalidate the newly created session.
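The '?', '??' and '???' conventions can be summarised with a small Python sketch that classifies an Extrahome expression. It is an illustration of the notation only. The function name parse_extrahome and the method labels are hypothetical, and the sketch does not interpret the parenthesised file parameters.

```python
def parse_extrahome(expression):
    """Split an Extrahome expression into (method, path, data)."""
    path, marker, data = expression.partition("?")
    if not marker:
        return "GET", path, ""  # no query string at all
    if data.startswith("??"):
        return "MULTIPART POST", path, data[2:]  # '???' convention
    if data.startswith("?"):
        return "POST", path, data[1:]  # '??' convention
    return "GET", path, data  # ordinary query string
```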
![]()
LinkScan may be configured to interpret the contents of drop-down lists as links to other pages. The HTML specification does not define a standard method for indicating that a drop-down list contains hyperlinks (as opposed to regular data). Hence LinkScan needs some other "cue" and may be triggered by pattern matching of attributes within the SELECT tag. Consider, for example, the following:
<select name="URLLIST">
<option value="/products/" Selected> Relative URL to Products
<option value="http://www.mydomain.com/services/"> Absolute URL to Services
</select>
To instruct LinkScan to treat the contents of the drop-down list as URLs, use the following command:
Selecturl URLLIST
LinkScan will examine all SELECT tags and look for a Regular Expression match on the NAME attribute. If the match is successful (URLLIST in this example) LinkScan will treat each OPTION tag within the list as a hyperlink and validate it accordingly.
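The matching rule described above can be approximated in Python as follows. This is an illustrative sketch only, not LinkScan's implementation: the class name SelectLinkExtractor is hypothetical, and it assumes the pattern is applied as an unanchored regular expression search against the NAME attribute.

```python
import re
from html.parser import HTMLParser


class SelectLinkExtractor(HTMLParser):
    """Collect OPTION values from SELECT tags whose NAME matches a pattern."""

    def __init__(self, name_pattern):
        super().__init__()
        self.name_re = re.compile(name_pattern)
        self.in_matching_select = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "select":
            name = attributes.get("name") or ""
            self.in_matching_select = bool(self.name_re.search(name))
        elif tag == "option" and self.in_matching_select:
            if attributes.get("value"):
                self.links.append(attributes["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_matching_select = False


html = """<select name="URLLIST">
<option value="/products/" Selected> Relative URL to Products
<option value="http://www.mydomain.com/services/"> Absolute URL to Services
</select>"""

extractor = SelectLinkExtractor("URLLIST")
extractor.feed(html)
print(extractor.links)
```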
LinkScan includes the ability to validate links contained within JavaScript code. A relatively simple pattern matching technique is used -- LinkScan does not contain a full JavaScript interpreter. This means that LinkScan may "miss" some links or find "false positive errors" especially if the code creates the hyperlink references dynamically at run-time. The following Scriptmatch and Scriptnomatch commands give excellent results in most cases. However, you can customize the matching rules by changing these expressions and/or adding new ones.
Scriptmatch = (\w+://\S+|\S+/$|\S+\?\S+|\S+\.([a-z]{2,3}|[js]?html?|Z)$)
Scriptnomatch = .*([\(\)\[\]\{\}\']|document\.\S+|\.(src|com)$)
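To get a feel for what these two expressions accept and reject, they can be tried against sample strings in Python. The sketch below assumes that a candidate string is treated as a link when it matches Scriptmatch and does not match Scriptnomatch, and that both are applied as unanchored searches; LinkScan's actual evaluation order is not documented here.

```python
import re

SCRIPTMATCH = re.compile(
    r"(\w+://\S+|\S+/$|\S+\?\S+|\S+\.([a-z]{2,3}|[js]?html?|Z)$)"
)
SCRIPTNOMATCH = re.compile(
    r".*([\(\)\[\]\{\}\']|document\.\S+|\.(src|com)$)"
)


def looks_like_link(candidate):
    """Assumed rule: must match Scriptmatch and must not match Scriptnomatch."""
    if not SCRIPTMATCH.search(candidate):
        return False
    return SCRIPTNOMATCH.search(candidate) is None


for sample in [
    "http://www.example.com/page",
    "products/index.html",
    "image.src",
    "window.open('page.html')",
]:
    print(sample, looks_like_link(sample))
```

Trying your own script fragments this way can help when tuning the expressions before running a full scan.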
Some JavaScript constructs may still produce false errors. You may force LinkScan to ignore complete script blocks that match a specified pattern. For example:
Scriptexclude function\s+ZoomWindow
The above command will force LinkScan to ignore script blocks that contain a definition for the ZoomWindow function.
Many websites are constructed with special user-friendly error pages, sometimes known as "custom-404 documents". Some servers will deliver the error document directly whereas others may force a redirection to a specific error document. In either case, an issue arises if your server delivers the error document with a 200 OK response code. LinkScan (or any other link checker) would not be able to detect the error condition.
A similar issue arises with some dynamically generated documents. For example, a Java applet may encounter a run-time error condition after it has already sent a 200 OK response code to the client.
Hence LinkScan supports two special commands that may be used to detect such conditions and force a 404 Not Found error, regardless of the HTTP response code produced by the server/application. The first is used with servers that force a redirection by pattern matching on the HTTP Location: header. The second operates by pattern matches on the document bodies.
Syntax:
Errordoc pattern
Errorbody pattern
Examples:
Errordoc special/notfound\.html
Errorbody (?i).*runtime\serror
In the Errordoc example, LinkScan will report as 404 Not Found any URL that is redirected to http://your.server/special/notfound.html. In the Errorbody example, LinkScan will report as 404 any document that contains the string runtime error in the document body. Note the (?i) makes the pattern match case-insensitive.
Hint: The Errorbody pattern match is carried out on the entire document, including comments. Developers might consider including a standard error string within comment tags that may be used to trigger the Errorbody match.
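The effect of the two commands can be sketched in Python. The function effective_status is a hypothetical name used for illustration; it assumes Errordoc is tested against the Location header of a redirect and Errorbody against the full document body, as described above.

```python
import re

ERRORDOC = re.compile(r"special/notfound\.html")
ERRORBODY = re.compile(r"(?i).*runtime\serror")


def effective_status(status, location=None, body=""):
    """Return 404 when either pattern fires, else the server's own status."""
    is_redirect = 300 <= status < 400
    if is_redirect and location and ERRORDOC.search(location):
        return 404
    if ERRORBODY.search(body):
        return 404
    return status


print(effective_status(302, "http://your.server/special/notfound.html"))
print(effective_status(200, body="<!-- RUNTIME ERROR --><p>Sorry</p>"))
print(effective_status(200, body="<p>Welcome</p>"))
```

The second call shows the hint in practice: an error marker placed inside an HTML comment is enough to trigger the body match.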
One of the most powerful (and complex) customization features of LinkScan concerns the real-time manipulation of links during the course of the scan. This is typically used to control the testing of sites with complex dynamic content. The basic commands available are:
Sessionmatch expression
Substitute relative-path-expression expression
Substituteraw relative-path-expression expression
Substitutescript relative-path-expression expression
The Sessionmatch command is used to manipulate Session numbers. The Substitute command is used to perform transformations on resolved links. The Substituteraw is used to perform transformations on unresolved links (i.e. the raw contents of a tag or tag attribute). The Substitutescript is used to perform transformations of blocks of JavaScript code.
We shall consider a number of examples which may be adapted according to your specific needs.
Consider a site that produces links such as:
http://www.example.com/page1.asp
http://www.example.com/page1.asp?Print
It is entirely possible that page1.asp has been designed in such a manner that it delivers the same basic content with minor variations in formatting depending upon the presence or absence of the Print query string. One might configure LinkScan with:
Substitute (.*\.asp)\?Print $1
Whenever LinkScan encounters a link matching the specified pattern it will make the substitution indicated before it tries to validate or follow that link. In this example, a link to:
http://www.example.com/page1.asp?Print
will immediately be transformed to:
http://www.example.com/page1.asp
Note, however, this is not the same as Excluding links which contain the Print query string; that would cause LinkScan to simply ignore the link. In this case, LinkScan will process the link but transform it on-the-fly during the scan.
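The transformation can be reproduced in Python to check a pattern before adding it to a configuration. The helper substitute below is hypothetical; it assumes the expression is applied as an unanchored Perl-style substitution and simply translates $1-style references into Python's syntax.

```python
import re


def substitute(pattern, replacement, link):
    """Apply a Substitute-style rule, translating $1 into Python's \\g<1>."""
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, link, count=1)


print(substitute(r"(.*\.asp)\?Print", "$1", "http://www.example.com/page1.asp?Print"))
print(substitute(r"(.*\.asp)\?Print", "$1", "http://www.example.com/page1.asp"))
```

Both calls yield the same URL, which is the point of the rule: the two variants collapse into one document for the purposes of the scan.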
Next we will consider a significantly more complex scenario.
Sessionmatch .*&token=([^&]+)
Substitute (.*&token=)[^&]*(.*)$ $1!S$2
In this case, we use the special Sessionmatch command to capture and save the first value of the query parameter token that LinkScan sees. This is most likely some kind of session number assigned by the target server immediately following the submission of a login form. The Substitute command then instructs LinkScan to replace all subsequent values of token with the saved value (represented by the special parameter !S).
In this scenario, LinkScan ensures that the value of token can never change during the course of the scan from the originally assigned value.
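A rough Python model of this capture-and-replace behavior is shown below. The class SessionPinner is a hypothetical name; the sketch assumes the first matching link supplies the saved value and that !S is replaced by that value in every later link.

```python
import re

SESSIONMATCH = re.compile(r".*&token=([^&]+)")
SUBSTITUTE = re.compile(r"(.*&token=)[^&]*(.*)$")


class SessionPinner:
    """Remember the first token seen and force it onto later links."""

    def __init__(self):
        self.saved = None

    def process(self, link):
        if self.saved is None:
            match = SESSIONMATCH.match(link)
            if match:
                self.saved = match.group(1)
        if self.saved is None:
            return link  # nothing captured yet, leave the link alone
        return SUBSTITUTE.sub(
            lambda m: m.group(1) + self.saved + m.group(2), link, count=1
        )


pinner = SessionPinner()
first = pinner.process("http://www.example.com/home.asp?x=1&token=abc123")
second = pinner.process("http://www.example.com/next.asp?x=2&token=zzz999&y=3")
print(first)
print(second)
```

Note that both expressions require an '&' before token, so a link in which token is the first query parameter would not be matched by either command.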
Next we'll consider a JSP site that produces URLs with the following structure:
http://www.example.com/content.jsp?A=123&B=456&C=789&D=XYZ
It may not be productive or efficient for LinkScan to scan all of the pages using every combination and permutation of values for the parameters A, B, C, D, etc. We can control that by manipulating the individual name-value pairs during the scan. For example:
Substitute (content\.jsp\?.*)&B=[^&]*(.*) $1&B=456$2
Substitute (content\.jsp\?.*)&C=[^&]*(.*) $1$2
Taglimit content\.jsp\?.*&D= 20
The first command fixes the value of B=456. Whatever value the parameter B takes on during the scan, LinkScan will force the value back to 456. The second command deletes any references to the C parameter from every link that it finds. We have also included the third Taglimit command; this will cause LinkScan to completely ignore the twenty-first and subsequent links that include a D parameter. In other words, in this case, we only want to test a representative sample (20) of links that include a D parameter.
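The combined effect of the three commands can be sketched in Python. This is an illustration under stated assumptions: each value pattern is written as [^&]* so that the whole parameter value is consumed, the rules are applied in order, and the TagLimit class is a hypothetical stand-in for the counting that Taglimit performs.

```python
import re


def rewrite(link):
    """Pin B to 456, then drop the C parameter entirely."""
    link = re.sub(
        r"(content\.jsp\?.*)&B=[^&]*(.*)", r"\g<1>&B=456\g<2>", link, count=1
    )
    link = re.sub(r"(content\.jsp\?.*)&C=[^&]*(.*)", r"\g<1>\g<2>", link, count=1)
    return link


class TagLimit:
    """Allow only the first `limit` links that match the pattern."""

    def __init__(self, pattern, limit):
        self.pattern = re.compile(pattern)
        self.limit = limit
        self.seen = 0

    def allow(self, link):
        if not self.pattern.search(link):
            return True
        self.seen += 1
        return self.seen <= self.limit


result = rewrite("http://www.example.com/content.jsp?A=123&B=999&C=789&D=XYZ")
print(result)

limiter = TagLimit(r"content\.jsp\?.*&D=", 20)
allowed = [limiter.allow(f"content.jsp?A=1&D={i}") for i in range(25)]
print(allowed.count(True))
```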
For our next example, we shall consider a site that generates pages containing some links with the following structure:
http://www.example.com/cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F
Rather than linking directly to Yahoo!, this page links to a script that generates a frameset that includes the referenced page. In a default configuration, LinkScan will happily follow the link, validating the frameset and the ultimate link to Yahoo!. However, it may not be productive to do that for potentially thousands of links. Furthermore, in the (extremely unlikely) event that the link to http://www.yahoo.com/ was broken, the error would appear in one of the GenerateFrame documents and not the original referring document. In order to repair that link, one would have to backtrack through the frameset to locate the original source of the trouble.
Hence we can apply more Substitute magic:
Substitute cgi-bin/GenerateFrame.*&Link=([^&]+).* !U$1
This command will extract the value of the Link= parameter, and the special !U token instructs LinkScan that the string needs to be un-encoded. So the original link:
http://www.example.com/cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F
is transformed on-the-fly to:
http%3A%2F%2Fwww.yahoo.com%2F
and then decoded to:
http://www.yahoo.com/
And this means LinkScan can validate the link to Yahoo! directly without checking the GenerateFrame script many, many times. Furthermore, any errors will be flagged against the original document (and not one or more steps removed).
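The extract-and-decode step can be checked in Python with the standard library. The function unwrap is a hypothetical name; the sketch assumes the matched link is replaced as a whole by the decoded value of the first capture group.

```python
import re
from urllib.parse import unquote

PATTERN = re.compile(r"cgi-bin/GenerateFrame.*&Link=([^&]+).*")


def unwrap(link):
    """Return the percent-decoded Link= target, or the link unchanged."""
    match = PATTERN.search(link)
    if match is None:
        return link
    return unquote(match.group(1))


wrapped = (
    "http://www.example.com/cgi-bin/GenerateFrame"
    "?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F"
)
print(unwrap(wrapped))
```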
For our final example, we include for illustration the complete configuration for a real-world large and very complex dynamic site:
# Set the CGI limit to be very large
# Include all file types on the Map
Maxcgi = 10000
Mapinclude .*
# Force &A=B and insert it immediately after the '?'
Substitute (cgi-bin.*[&\?])A=[^&=]*&*(.*) $1$2
Substitute (cgi-bin.*\?)(.*) $1A=B&$2
# Discard null and undefined values
Substitute (cgi-bin.*)&B=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&C=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&D=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&R=(null|undefined)(.*) $1$3
# For 'category', take the &C= if present, otherwise the &B=
Substitute (cgi-bin/bv/scripts/category.*\?A=B).*?(&C=[^&=]*).* $1$2
Substitute (cgi-bin/bv/scripts/category.*\?A=B).*?(&B=[^&=]*).* $1$2
# For 'content', take the &D= or &R= if present (call it &D=). Otherwise take the &B=
Substitute (cgi-bin/bv/scripts/content.*\?A=B).*?&[DR]=([^&=]*).* $1&D=$2
Substitute (cgi-bin/bv/scripts/content.*\?A=B).*?(&B=[^&=]*).* $1$2
# For 'frame', take the &D= or &R= if present (call it &D=). Otherwise take the &B=
Substitute (cgi-bin/bv/scripts/frame.*\?A=B).*?&[DR]=([^&=]*).* $1&D=$2
Substitute (cgi-bin/bv/scripts/frame.*\?A=B).*?(&B=[^&=]*).* $1$2
# For 'mailing...', take the &R=
Substitute (cgi-bin/bv/scripts/mailing.*\?A=B).*?(&R=[^&=]*).* $1$2
# For 'contact', take the &B=, &C= and &Comments
Substitute (cgi-bin/bv/scripts/contact.*\?A=B).*?(&B=[^&=]*).*?(&C=[^&=]*).*?(&Comments=[^&=]*).* $1$2$3$4
# Mark redirects to Error page as 404
# Mark documents containing 'Error Code:' as 404
Errordoc cgi-bin/bv/scripts/error.jsp
Errorbody Error\s+Code:[^\n<]*
# Hide some frequent arising errors
Noforms = 1
Exclude images/arrow.gif
Next we will consider a reference to a JavaScript function:
<a href="javascript:MyFunction(4,5,6);">
The following Substitutescript command:
Substitutescript .*:MyFunction\((\d+),(\d+),(\d+)\) '/somepage.jsp?Par1=$1&Par2=$2&Par3=$3'
will transform the function call into the following link which will then be validated/processed by LinkScan.
/somepage.jsp?Par1=4&Par2=5&Par3=6
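The same rewrite can be tried in Python, where the $1-style references become \1. The function script_to_link is a hypothetical name, and the sketch assumes the matched call is replaced as a whole by the expanded template.

```python
import re

PATTERN = re.compile(r".*:MyFunction\((\d+),(\d+),(\d+)\)")
TEMPLATE = r"/somepage.jsp?Par1=\1&Par2=\2&Par3=\3"


def script_to_link(href):
    """Turn a javascript:MyFunction(a,b,c) reference into a plain link."""
    match = PATTERN.match(href)
    if match is None:
        return None
    return match.expand(TEMPLATE)


print(script_to_link("javascript:MyFunction(4,5,6);"))
```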
The Substitute commands may be used to modify existing links on-the-fly. However, a variation of this, the Insertlink command, may be used to insert additional links into specified documents in order to achieve a specific test coverage. Again, it is best illustrated by example:
Insertlink .*complex\.jsp\?.*SPVAR= -
Insertlink (.*complex\.jsp\?.*) /$1&ALTMODE=1 +
As each document is scanned, LinkScan will process all Insertlink commands (in the order specified). The URL of the scanned document is matched against the first parameter of each Insertlink command. In the case of the first example above, a link to:
complex.jsp?VAR=1&SPVAR=2
will match the expression and LinkScan will abort all Insertlink processing for this document (signified by the minus character).
However, a link to:
complex.jsp?VAR=1
does not match the expression. Processing will continue to the second command. This does match the expression and LinkScan will insert a link into this document (signified by the plus character). Hence, when LinkScan processes:
complex.jsp?VAR=1
It will insert into that document, the following link:
complex.jsp?VAR=1&ALTMODE=1
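The ordered rule processing described above can be modelled in Python. This is a sketch under assumptions that the manual does not spell out: patterns are matched from the start of the document URL, a minus rule stops processing for that document, and processing continues to later rules after a plus rule. The function name inserted_links is hypothetical, and the result keeps the leading '/' from the rule's template.

```python
import re

# (pattern, template, action) in the order the commands are specified
RULES = [
    (r".*complex\.jsp\?.*SPVAR=", None, "-"),
    (r"(.*complex\.jsp\?.*)", r"/\1&ALTMODE=1", "+"),
]


def inserted_links(document_url):
    """Return the extra links that would be inserted for one document."""
    links = []
    for pattern, template, action in RULES:
        match = re.match(pattern, document_url)
        if match is None:
            continue
        if action == "-":
            break  # abort all further Insertlink processing
        links.append(match.expand(template))
    return links


print(inserted_links("complex.jsp?VAR=1&SPVAR=2"))
print(inserted_links("complex.jsp?VAR=1"))
```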
Hint: Clearly, the Substitute command requires a good working knowledge of Perl Regular Expressions. If you need assistance, the LinkScan engineers will be happy to help. Please write to linkscan@elsop.com describing, in as much detail as possible, the transformations you are seeking to achieve.
Most web browsers advertise their identity by including a User-Agent header with every request that they make. LinkScan also sends a User-Agent header. For example, the versions of Netscape Navigator, Microsoft Internet Explorer and LinkScan installed on the writer's computer send, respectively:
User-Agent: Mozilla/4.08 [en] (WinNT; I ;Nav)
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
User-Agent: LinkScan Enterprise/12.0 Windows
Some websites are constructed in a manner that is browser sensitive. They may, for example, deliver customized pages depending on the user's browser type. Hence LinkScan may be customized to emulate different browser types using the Extraheader command:
Syntax:
Extraheader literal-header-string
Example:
Extraheader User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
In this example, LinkScan will advertise itself as Microsoft Internet Explorer version 5.5 running under Windows 2000.
In fact, the Extraheader command may be used to add any arbitrary HTTP headers to every request that LinkScan sends. A common application involves those servers which look for a language preference in the HTTP headers in order to deliver pages in the appropriate language. For example, the following command instructs LinkScan to include an English Language preference header with each request:
Extraheader Accept-Language: en
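To see what such headers look like on an outgoing request, the following Python sketch attaches the two header strings used above to a request object from the standard library. It does not send anything over the network, and build_request is a hypothetical helper, not a LinkScan function.

```python
import urllib.request

EXTRAHEADERS = [
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)",
    "Accept-Language: en",
]


def build_request(url):
    """Create a request carrying every configured literal header string."""
    request = urllib.request.Request(url)
    for line in EXTRAHEADERS:
        name, _, value = line.partition(":")
        request.add_header(name.strip(), value.strip())
    return request


request = build_request("http://www.example.com/")
# urllib stores header names capitalized, e.g. "User-agent".
print(request.header_items())
```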
Sometimes a single website may contain links such as:
http://www.example.com/
http://www2.example.com/
Here www.example.com and www2.example.com resolve to the same host IP address. However, LinkScan would consider www2.example.com to be an External Link and not part of the www.example.com Project. Hence the Hostalias command may be used to assign more than one name to the current server. Syntax and example:
Syntax:
Hostalias from-server-url to-server-url
Example:
Hostalias http://www2.example.com/ http://www.example.com/
A similar issue arises when scanning development or staging servers. For example, you may wish to scan the site:
http://staging.example.com/
but the site may contain one or more absolute links to http://www.example.com/. In this case, you can use the Mirrorurl command.
Syntax:
Mirrorurl absolute-url
Example:
Homeurl = http://www.example.com/
Mirrorurl = http://staging.example.com/
In this case, LinkScan will resolve all links as if it were scanning http://www.example.com/. However, all actual HTTP requests will be directed to http://staging.example.com/. This provides a convenient mechanism for scanning development and staging copies of a production website.
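The distinction between the URL that is reported and the URL that is actually requested can be sketched in Python. The function names resolve and request_url are hypothetical, and the sketch assumes simple prefix replacement for both Hostalias and Mirrorurl.

```python
from urllib.parse import urljoin

HOMEURL = "http://www.example.com/"
MIRRORURL = "http://staging.example.com/"
HOSTALIASES = {"http://www2.example.com/": "http://www.example.com/"}


def resolve(base, href):
    """Resolve a link and map any aliased host onto the canonical name."""
    url = urljoin(base, href)
    for alias, canonical in HOSTALIASES.items():
        if url.startswith(alias):
            url = canonical + url[len(alias):]
    return url


def request_url(resolved):
    """Direct requests for the home site to the mirror instead."""
    if resolved.startswith(HOMEURL):
        return MIRRORURL + resolved[len(HOMEURL):]
    return resolved


reported = resolve(HOMEURL, "http://www2.example.com/a/b.html")
print(reported)
print(request_url(reported))
```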
You may define the ownership of any given document or file in one of several ways. Ownership directives are evaluated in the order specified with the last match taking precedence. Note that the file ownership attribute is case sensitive.
By the Unix File System ownership attribute. Note: this is not supported on Windows systems.
By the Defaultowner command. The syntax for the Defaultowner command is:
Defaultowner owner-name
By pattern matching with one or more Owner commands. The syntax for the Owner command is:
Owner relative-path-expression owner-name
OR
Ownerq relative-path-expression owner-name
The Owner command operates on the pathname portion of the URL and does not process any query string (following a "?" character). The Ownerq command operates on the entire URL including any query string.
LinkScan also supports a special variation of the Owner command. This will automatically assign every file an owner-name based on the name of the directory in which it resides. The syntax is:
Owner *integer
The default setting (Owner *1) will assign each document to an Owner based on the top-level directory name (i.e. under "www root"). A setting of Owner *2 will cause LinkScan to assign Ownership based on the first two directory names. For example:
http://www.example.com/first/second/third/index.html
With Owner *2, this document will be assigned to the Owner first_second.
By using preexisting META tags in your HTML documents. For example, if your existing documents already contain tags of the form:
<META NAME="CONTENT_OWNER" CONTENT="Malcolm Hoar">
You may set the Owner to 'Malcolm Hoar' by configuring a suitable pattern. e.g.:
Ownertags = ^meta\s+name\s*=\s*"content_owner"\s+content\s*=\s*"([^"]+)
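The directory-based Owner *integer form described in the list above can be illustrated in Python. The function owner_from_directories is a hypothetical name; it assumes directory names are joined with an underscore, as in the first_second example, and it returns an empty string for a file directly under the root, a case the text above does not cover.

```python
from urllib.parse import urlparse


def owner_from_directories(url, depth=1):
    """Derive an owner name from the first `depth` directory names."""
    # Drop the leading empty segment and the trailing file name.
    directories = urlparse(url).path.split("/")[1:-1]
    return "_".join(directories[:depth])


url = "http://www.example.com/first/second/third/index.html"
print(owner_from_directories(url, 1))
print(owner_from_directories(url, 2))
```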
Finally, once an Owner has been assigned to the file or document, you may manipulate the Owner string with a simple pattern substitution:
Owneralias .*?([a-zA-Z0-9]+)[\s\.\)]*$ \L$1
This example would take the string 'Malcolm Hoar' and convert the ownership to 'hoar'. This technique may be used to deal with synonyms such as 'M. Hoar.', 'Malcolm C Hoar '.
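The substitution can be verified in Python. Perl's \L (lowercase what follows) has no direct equivalent in Python replacement strings, so this sketch uses a function that lowercases the captured group; alias is a hypothetical helper name.

```python
import re

OWNERALIAS = re.compile(r".*?([a-zA-Z0-9]+)[\s\.\)]*$")


def alias(owner):
    """Keep the last word of the owner string, lowercased."""
    return OWNERALIAS.sub(lambda m: m.group(1).lower(), owner, count=1)


for name in ["Malcolm Hoar", "M. Hoar.", "Malcolm C Hoar "]:
    print(repr(name), "->", alias(name))
```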
Example:
Defaultowner elsop             # Set default
Owner *1                       # Assign Owner based on top level dir
...
Owner wrc/humor/ humor         # But, make this subdir look like top-level
Owner .*\.cgi$ webmaster       # And give all *.cgi files to webmaster
When using LinkScan Dispatch to create reports for delivery by Electronic mail, you may define associations between Owners and Addresses with the Mailalias command. The syntax is:
Mailalias expression list-of-addresses
list-of-addresses may be a comma separated list of addressees if you wish to distribute the report to multiple recipients. Use Mailalias owner-name null to skip a specific Owner.
Example:
Defaultowner elsop             # Set default
Owner *1                       # Assign Owner based on top level dir
...
Owner wrc/humor/ humor         # But, make this subdir look like top-level
Owner .*\.cgi$ webmaster       # And give all *.cgi files to webmaster
Mailalias elsop malch@elsop.com, ken@elsop.com
Mailalias links ken@elsop.com
Mailalias linkscan malch@elsop.com
Mailalias wrc ken@elsop.com
Mailalias humor ken@elsop.com
Mailalias test null
If no Mailaliases are defined, Dispatch will address the reports to Ownername@Mailhost.
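The lookup from Owner to addresses might be modelled as below. This is a sketch with several assumptions that the manual does not confirm: the expression must match the whole owner name, the first matching Mailalias wins, and an owner with no matching alias receives no report. The mail host example.com and the function name recipients are hypothetical.

```python
import re

MAILHOST = "example.com"  # hypothetical mail host
MAILALIASES = [
    ("elsop", "malch@elsop.com, ken@elsop.com"),
    ("humor", "ken@elsop.com"),
    ("test", "null"),
]


def recipients(owner, aliases=MAILALIASES):
    """Return the list of addresses that would receive an owner's report."""
    if not aliases:
        return [f"{owner}@{MAILHOST}"]  # documented fallback
    for expression, addresses in aliases:
        if re.fullmatch(expression, owner):
            if addresses == "null":
                return []  # this owner is skipped
            return [address.strip() for address in addresses.split(",")]
    return []  # assumption: unmatched owners get no report


print(recipients("elsop"))
print(recipients("test"))
print(recipients("webmaster", aliases=[]))
```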
Facilities are provided to extract additional data from each document scanned, store those data in the LinkScan database and create various reports. The additional data are typically taken from the META tags in each HTML document.
Supported commands are provided for data extraction, substitution/manipulation and formatting:
# Userdata [123] match-expression expression
# Userdatafmt [123] [DHLTX] integer[LRC] caption
# D=date; H=hot links; L=link; T=truncate to format; X=normal
# Userdatasub [123] expression expression
The following example illustrates the use of these commands to extract and process an employee badge number from document META tags:
Userdata 1 (?i)<meta\s[^>]*employee\s*=\s*"\s*(#?\d+)\s*" $1
Userdatasub 1 #?(\d+) $1
Userdatafmt 1 X 6R Badge-Number
In the above example, we use the first of the three available userdata fields. The first command extracts the badge number from the document META tag. The second command performs a substitution on the matched data to remove an optional pound symbol from the badge number. The third command defines the formatting attributes; X defines a simple text field; 6R specifies a six-character, right-adjusted layout and Badge-Number defines a simple caption.
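The three steps (extract, substitute, format) can be walked through in Python. The sample META tag and the function name badge_field are hypothetical; the sketch assumes 6R means the value is right-adjusted in a six-character field.

```python
import re

USERDATA = re.compile(r'(?i)<meta\s[^>]*employee\s*=\s*"\s*(#?\d+)\s*"')
USERDATASUB = re.compile(r"#?(\d+)")


def badge_field(html):
    """Extract the badge number, strip any '#', and right-adjust to 6 chars."""
    match = USERDATA.search(html)
    if match is None:
        return None
    value = USERDATASUB.sub(r"\1", match.group(1), count=1)
    return value.rjust(6)


sample = '<META NAME="author" EMPLOYEE="#1234">'
print(repr(badge_field(sample)))
```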
During the course of the scan, the employee badge numbers are extracted from each document and stored in the LinkScan database. In fact, the userdata fields are stored in a separate file:
PATH-TO-LINKSCAN/Project-name/data/linkscan.usr
This means that it is relatively simple to post-process the data before creating reports. For example, in this case, one might translate the badge numbers to employee names via a lookup on an employee database. T