LinkScan

Planning a LinkScan Project

 


Introduction

All basic LinkScan operations are carried out on Projects. The essential steps are:

  1. Create and Plan a Project: This step provides LinkScan with the basic specification and definition of the test you wish to perform. You will always need to supply the URL of the website you are seeking to test. Often you will want to define the test conditions more precisely by selecting from the large number of available options and customizations.

  2. Scan a Project: During this step, LinkScan actually executes the test scenario defined by the Project Plan.

  3. Examine the Results of a Project: Finally you will wish to examine and analyze the results of the test.

These basic steps correspond to the Plan, Scan, and Exam buttons on the main LinkScan window.

In addition, New and Remove buttons are provided to create a new Project or permanently delete an old project that is no longer needed.

When creating a new Project, you may either create a brand new (empty) Project, or create a new Project based upon (cloned from) an existing Project. The latter technique provides a simple method to define multiple test scenarios with minor variations between them.

The remainder of this document describes the options available on the Project Planning property sheet dialog. You may press the property sheet Help button at any time to display the applicable section of this document.

Basic Settings Tab

Scope Tab

The settings on this tab control the scope of the scan. These rules are applied in addition to any Onlyinclude/Onlyfollow rules you define on the Basic Settings Tab.

See Regular Expressions for a discussion of the pattern matching rules and their syntax together with some common examples.

Root Tab

The Root Tab is enabled when using File System Scanning.

Alias Tab

The Alias Tab is enabled when using File System Scanning.

You must configure the mapping between the Root URL of the website and the File System on the Root Tab. You may optionally configure additional mappings via the Alias Tab.

Files Tab

The Files Tab is only enabled when using File System Scanning. It routes different file types (as defined by their file extensions) to the appropriate parser/processor. For example, files with a .htm or .html extension are routed to the HTML parser.

Note: when using HTTP Scanning, the Internet Standards dictate that files are routed according to their MIME or Content-Type and not based on their file extension.
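The difference between the two routing schemes can be sketched as follows. This is a minimal illustration, not LinkScan's implementation: the table names, function names, and parser label are hypothetical, and the only mapping taken from this document is .htm/.html to the HTML parser.

```python
import os

# Hypothetical routing tables. Only the .htm/.html -> HTML mapping is
# stated in the text; the text/html entry is its assumed MIME counterpart.
EXTENSION_ROUTES = {".htm": "html", ".html": "html"}
CONTENT_TYPE_ROUTES = {"text/html": "html"}


def route_by_extension(path):
    """File System Scanning: choose a parser from the file extension."""
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_ROUTES.get(ext)  # None means "no parser configured"


def route_by_content_type(header_value):
    """HTTP Scanning: choose a parser from the Content-Type header."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_ROUTES.get(media_type)
```

Under HTTP Scanning the file name plays no part: a URL ending in .html that is served with a different Content-Type would be routed according to that header.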

The following mappings are established by default:

You may wish to establish additional mappings; the following are commonly used:

Note that the LinkScan Text parser is an extremely generic implementation and it attempts to extract hyperlinks from any file type that is routed to it. In particular, it may be used to extract links from various Microsoft Office file types (e.g. .doc, .ppt, .xls) as well as .url files as used within the Internet Explorer Favorites folder.

The lower half of the Files Tab is used to define which files to look for when a reference to a directory (without any explicit filename) is found, typically index.html.

A checkbox controls whether or not to permit a directory listing to be created on-the-fly when a link to a directory is found but no index.html (or similar) file is present.

Mimes Tab

The Mimes Tab routes different file types (as defined by their MIME or Content-Type header) to the appropriate parser/processor.

The following mappings are established by default:

You may wish to establish additional mappings; the following are commonly used:

Enabling the PDF option may slow scans significantly, because many PDF documents are large and take time to download.

Note that the LinkScan Text parser is an extremely generic implementation and it attempts to extract hyperlinks from any file type that is routed to it. In particular, it may be used to extract links from various Microsoft Office file types (e.g. .doc, .ppt, .xls) as well as .url files as used within the Internet Explorer Favorites folder.

Import Tab

The LinkScan Import function may be used to:

When processing a list of Links each URL is checked in turn and its status stored in the LinkScan database. When processing a list of Documents, each document and every link within that document is checked and its status stored.

To use the import function, carry out the following steps:

  1. Prepare the Import File

    LinkScan will import a simple ASCII file of the following format:

    URL ... one or more tab characters ... URL-Description

    URLs may be absolute, or relative to the Home URL for the current server. The URL-Description is imported and carried through to the LinkScan Reports for identification purposes. You may use any ASCII string, for example a database record number.

    An alternative field separator may be specified by including a special command as the first line of the file:

    ## \s+

    The command starts with '##' in column one, followed by a Perl regular expression that specifies the field delimiter. In the example above, '\s+' means one or more whitespace characters (tab or space).

    Lines with a '#' in column one, and blank lines, are ignored as comments.

  2. Select the import mode by changing the Import setting. Valid selections are:

    Import links
    Import documents
    Import documents with caching disabled

  3. Supply the pathname to the ASCII Import File.
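To check an Import File before handing it to LinkScan, the format described in step 1 can be parsed with a short script. The sketch below is an illustration only: `parse_import_file` is a hypothetical helper, not part of LinkScan, and it uses Python's `re` module in place of Perl, which behaves the same for simple delimiters such as `\s+` but may differ for more elaborate expressions.

```python
import re


def parse_import_file(text):
    """Parse Import File text into a list of (url, description) pairs."""
    delimiter = re.compile(r"\t+")  # default: one or more tab characters
    entries = []
    for index, line in enumerate(text.splitlines()):
        if index == 0 and line.startswith("##"):
            # Special first-line command: the rest is the delimiter pattern.
            delimiter = re.compile(line[2:].strip())
            continue
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and '#' comment lines are ignored
        parts = delimiter.split(line.strip(), maxsplit=1)
        description = parts[1] if len(parts) > 1 else ""
        entries.append((parts[0], description))
    return entries
```

Splitting at most once keeps any later delimiter characters inside the description, so a description such as "record 42" survives a `\s+` delimiter intact.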

Special Considerations

LinkScan de-duplicates the list of links within an Import Document list. This means that LinkScan will validate each unique URL within the list only once.

However, you may force LinkScan to process an Import Sequence so that the same URL or document is checked more than once. To do so, make each occurrence of the URL appear unique by adding a dummy name-value pair to its query string. This also provides a means by which to differentiate the test results for each step:

http://www.example.com/cookie_sensitive?dummyseq=1
[...]
http://www.example.com/set_cookie
[...]
http://www.example.com/cookie_sensitive?dummyseq=2

If the URLs already include a query string, simply append the additional parameter to the existing query and change:

http://www.example.com/foo?name=value

to:

http://www.example.com/foo?name=value&dummyseq=1
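When the Import File is generated by a script, the dummy parameter can be appended programmatically. The following is a minimal sketch using Python's standard `urllib.parse` module; `add_dummy_param` is a hypothetical helper name, and `dummyseq` is simply the parameter name used in the examples above.

```python
from urllib.parse import urlsplit, urlunsplit


def add_dummy_param(url, seq, name="dummyseq"):
    """Append name=seq to the query string of url, keeping existing pairs."""
    parts = urlsplit(url)
    extra = f"{name}={seq}"
    # Use '&' only when the URL already carries a query string.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))
```

Working on the parsed query rather than the raw string keeps any fragment identifier at the end of the URL, where it belongs.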

Normally, LinkScan maintains the status of each link in a cache while it scans a site. This dramatically improves performance since LinkScan does not need to re-check commonly used images and other components over and over. However, caching may be undesirable with some stateful sequences, for example when the same URL produces a completely different result before and after a cookie is set.

In those situations, you may use a special option (Import Nocache) which will force LinkScan to flush its cache after each imported document has been validated.
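The effect of flushing the cache can be pictured with a small model. This is a conceptual sketch only and says nothing about how LinkScan stores its data: the `StatusCache` class and the `check` callable are hypothetical.

```python
class StatusCache:
    """Remember the status of each URL so that it is checked only once."""

    def __init__(self, check):
        self._check = check    # callable taking a URL and returning a status
        self._statuses = {}

    def status(self, url):
        # Only URLs not seen since the last flush trigger a real check.
        if url not in self._statuses:
            self._statuses[url] = self._check(url)
        return self._statuses[url]

    def flush(self):
        """Forget every stored status, forcing fresh checks afterwards."""
        self._statuses.clear()
```

In this model, calling `flush()` after each imported document means that a URL checked before a cookie was set is checked again afterwards, at the cost of repeating checks on shared components such as images.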

Login Tab

HTTP access to some sites is controlled via authentication schemes requiring Cookies.

LinkScan will automatically accept and return all valid cookies received during the course of a scan. However, to gain access to the site, you may need to configure LinkScan to ensure that the appropriate cookies are set. This may be achieved by one of two techniques:

Auth Tab

The Auth Tab may be used to specify HTTP authentication credentials. Servers that require HTTP Authentication cause web browsers to challenge the user with a popup dialog demanding a username and password.

Note that this is a completely separate mechanism from cookie-based schemes that require users to enter their credentials on an HTML-based form (see the Login Tab for details).
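For readers unfamiliar with HTTP Authentication, the sketch below shows what a client sends once it has the credentials, assuming the server uses the Basic scheme. This document does not say which schemes LinkScan supports, other schemes (such as Digest) work differently, and `basic_auth_header` is a hypothetical helper used purely for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for the HTTP Basic scheme."""
    # Basic credentials are "username:password", base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because the credentials travel in a request header rather than in a cookie or form submission, the scheme needs no Login Tab configuration.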

Owners Tab

By default, LinkScan assigns each document scanned to an Owner based on the top-level directory name. You may use the Spin Button to create Owner names based on the topmost 0-5 directory levels.

Use the Owners tab to specify additional ownership rules. For example:

# The default (automatic) rules would assign all documents under the
# products/ and services/ directories to their respective owners.

Owner products/   products
Owner services/   services

# This can be enhanced by adding additional rules such as:

Owner products/consumer/   btoc
Owner products/business/   btob
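The intent of such rules can be sketched in a few lines. This document does not state how LinkScan resolves overlapping rules, so the sketch assumes that the most specific (longest) matching prefix wins; `assign_owner` and its parameters are hypothetical names.

```python
def assign_owner(path, rules, default_levels=1):
    """Return the owner of a document path given relative to the site root."""
    matches = [(prefix, owner) for prefix, owner in rules
               if path.startswith(prefix)]
    if matches:
        # Assumption: the longest matching prefix takes precedence.
        return max(matches, key=lambda rule: len(rule[0]))[1]
    # No explicit rule: fall back to the topmost directory levels.
    directories = path.split("/")[:-1]
    return "/".join(directories[:default_levels])


# The rules from the example above, as (prefix, owner) pairs.
RULES = [
    ("products/", "products"),
    ("services/", "services"),
    ("products/consumer/", "btoc"),
    ("products/business/", "btob"),
]
```

With these rules, `assign_owner("products/consumer/tv.html", RULES)` yields `btoc`, while a document directly under products/ still belongs to `products`.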

Advanced Tab

This tab may be used to enter or modify some rarely used commands not otherwise available via the graphical interface. See the LinkScan Quick Reference Card for a complete list.

Notes Tab

Unlike old-fashioned configuration files, which normally accept comment lines, graphical user interfaces do not generally allow you to make notes (for example, the reasons for a change to the configuration, when it was made, and by whom). We miss that nice feature!

Hence you may use the Notes tab to annotate a Project with your own comments. These are saved as part of the Project Plan.

Other Options Tab

LinkScan Version 12.3
© Copyright 1997-2012 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
