LinkScan for Unix. Reference Manual. Section 2. Essential LinkScan Concepts

This section introduces some important concepts and terms that are used throughout the remainder of this Reference Manual. These are:

LinkScan Projects
LinkScan Owners
LinkScan Usernames
Scanning Methods
Documents and Links
LinkScan Directory and File Structure
LinkScan Configuration Files
Perl Regular Expressions
relative-path and relative-path-expression

2.1 LinkScan Projects

LinkScan is able to scan multiple websites. You may also scan the same website multiple times with different configuration options. In each case, LinkScan creates a unique and corresponding LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.

If multiple Projects are defined, users and administrators must select a Project when scanning. Users must also select a Project when viewing the results.

Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.

For additional information on Projects, including how to create and scan them, see Basic Scanning.

2.2 LinkScan Owners

Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned to Owners in a variety of ways:

By the Unix File System ownership attribute
By subdirectories within the website
By pattern matching on directory and file names
By Meta Tags inserted in individual documents

The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility. LinkScan Owners are defined via the LinkScan Configuration Files, discussed below. By default, LinkScan will create and assign Owners as follows:

Owner: All containing all documents within the Project
Owner: toplevel containing all documents in the root directory of the website scanned
One owner for each subdirectory of the root directory, containing all documents in or under that subdirectory

This enables users to browse the results selectively, so the reports are smaller, more relevant to their needs, and produced more rapidly.
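
The default Owner assignment described above can be sketched in a few lines of Python. This is only an illustration, not LinkScan's own code: the function name default_owners is hypothetical, and naming each subdirectory Owner after its subdirectory is an assumption, since the text above does not state how those Owners are named.

```python
def default_owners(relative_path):
    """Return the default Owners a document would belong to (illustrative only)."""
    owners = ["All"]  # every document in the Project belongs to "All"
    parts = relative_path.lstrip("/").split("/")
    if len(parts) == 1:
        # No directory component: the document sits in the site's root directory.
        owners.append("toplevel")
    else:
        # Assumption: the Owner is named after the first-level subdirectory.
        owners.append(parts[0])
    return owners


print(default_owners("index.html"))
print(default_owners("products/specs/widget.html"))
```

A document several levels deep still maps to the Owner of its first-level subdirectory, which mirrors the "in or under that subdirectory" rule.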

2.3 LinkScan Usernames

LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.

When activated, users may be required to log in to the LinkScan system using a pre-defined LinkScan Username and associated password. The Username defines the Projects and Owners that an individual user is permitted to access.

Those wishing to enable these access control features should see LinkScan Access Controls.

2.4 Scanning Methods

LinkScan supports three different scanning methods:

Network (HTTP) Scanning, which uses HTTP requests to check links on your site
File System Scanning, which bypasses the network when scanning internal links and reads the documents via direct access to your computer's file system
Import Scanning, which is used to import lists of documents or links for validation

Network (HTTP) Scanning is generally the best method for sites with a large amount of dynamic content (.jsp files, .asp files, etc.). File System Scanning enables tracking of "orphaned" files (files that are not currently linked to) and is more appropriate for sites with limited dynamic content.

2.5 Documents and Links

The LinkScan software, and this document, both maintain a strong distinction between Documents and Links.

A Link refers to a pointer to any arbitrary file or URL.
A Document refers to a file or URL that contains a number of Links.

Hence an HTML file is a Document containing Links. Dynamically generated web pages, PDF and Flash Files, and Import Files may also be considered Documents, since LinkScan can examine those files for the presence of Links. Images (such as .gif and .jpg files) are not considered Documents.

References to sites other than the one being scanned (External Links) are not Documents either, since LinkScan does not examine the content of those files for the presence of Links.

2.6 LinkScan Directory and File Structure

The LinkScan system is made up of a number of different file types:

Executable program files
Executable CGI scripts
Configuration files
HTML files (this documentation)
Image files (used by the LinkScan Reports)
Data and control files generated during execution

In a basic LinkScan installation these files are organized within the following directory structure:

linkscan/ Contains all of the executable files including some diagnostics and utilities together with a number of configuration and control files including the linkscan.sys file and the Global Configuration File, linkscan.cfg (discussed below)
- linkscan/docs/ Contains this documentation in HTML format together with a number of image files used by the LinkScan Menus and Reports. You may, optionally, move the contents of this directory to another location on your server if, for example, you do not wish to install the LinkScan directory under "www root"
- linkscan/default/ Contains some additional configuration files including the Project Configuration File, linkscan.cfg.
  - linkscan/default/data/ This directory and the subdirectories within it are created during execution and contain the results of the scan (the LinkScan database).
- linkscan/utils/ This directory contains a number of supporting utility programs.
- linkscan/weblint/ This directory contains the weblint HTML syntax checking software.

2.7 LinkScan Configuration Files

LinkScan's operation is controlled by a number of different configuration files. When running LinkScan via the Windows Graphical User Interface, these files are largely hidden. However, they still control the execution of the program and you may find it useful to view the raw configuration files from time to time. On Unix systems, these files are the primary method of configuring LinkScan. All of the files are plain ASCII text and may be viewed and modified using the editor of your choice (e.g. Windows Notepad, Unix vi, emacs, pico, nedit, etc.).

The most important configuration files are:

linkscan.sys: This file (there is only one) resides in the main LinkScan directory. This file contains the basic information concerning LinkScan and your computer. That includes the LinkScan License details and information that controls how LinkScan interfaces with other systems and services on your computer.
linkscan.mas: This file (there is only one) resides in the main LinkScan directory. This file contains a simple list of the available LinkScan Projects.
linkscan.cfg: Multiple copies of this file may reside within a single LinkScan installation. One copy, known as the Global Configuration File, resides in the main LinkScan directory. An additional linkscan.cfg file, known as the Project Configuration File resides within each LinkScan Project subdirectory.

LinkScan always reads the Global Configuration File and the Project Configuration File (in that order). Hence it is important to understand how all of the commands are processed. Each command is defined as either single-valued or multi-valued; see the LinkScan Command Summary. Single-valued commands are overwritten each time they are read, so the last value read is the significant value. Multi-valued commands are cumulative; all are added to the list of values for that command. Note that in some cases, the order in which multi-valued commands are read may impact the manner in which they are subsequently processed (this is noted where appropriate).

This approach provides tremendous flexibility. You can establish Global Settings in the Global Configuration File that apply to all Projects, and then add Project-specific commands to each Project Configuration File to override single-valued settings or supplement multi-valued settings.
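
The processing order described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not LinkScan's implementation: the command names Timeout and Exclude are hypothetical placeholders, and the real list of single-valued and multi-valued commands is given in the LinkScan Command Summary.

```python
def merge_config(global_cmds, project_cmds, multi_valued):
    """Combine commands read from the Global file, then the Project file."""
    merged = {}
    # The Global Configuration File is read first, then the Project file.
    for name, value in list(global_cmds) + list(project_cmds):
        if name in multi_valued:
            # Multi-valued: every value is kept, in the order it was read.
            merged.setdefault(name, []).append(value)
        else:
            # Single-valued: the last value read overwrites earlier ones.
            merged[name] = value
    return merged


# Hypothetical command names, used only for illustration.
global_cmds = [("Timeout", "30"), ("Exclude", r"\.bak$")]
project_cmds = [("Timeout", "60"), ("Exclude", "^private/")]

settings = merge_config(global_cmds, project_cmds, multi_valued={"Exclude"})
print(settings)
```

In this model the Project value of the single-valued command replaces the Global one, while both values of the multi-valued command survive in read order.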

Some additional configuration and control files are discussed elsewhere in this manual. LinkScan uses them, so do not delete them, but it is rarely necessary for users to examine or modify them.

All of the configuration files include extensive comments. A comment begins with the pound sign (#), like this:


# This line contains only a comment

Realcommand = 1   # This comment could describe Realcommand
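
As a rough illustration of the comment convention shown above, the following Python sketch splits a configuration line into a command name and a value. It assumes the "name = value" layout of the example and treats every # as the start of a comment. LinkScan's own parser may differ (for instance, in how it handles a # inside a value), so treat parse_line as a hypothetical helper only.

```python
def parse_line(line):
    """Return (name, value) for a command line, or None if there is no command."""
    # Assumption: everything from the first '#' onward is a comment.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None  # blank line or comment-only line
    name, sep, value = text.partition("=")
    if not sep:
        return None  # not in "name = value" form; ignored in this sketch
    return name.strip(), value.strip()


print(parse_line("# This line contains only a comment"))
print(parse_line("Realcommand = 1   # This comment could describe Realcommand"))
```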

2.8 Perl Regular Expressions

LinkScan incorporates a vast array of customization features, many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions, see man perlre on Unix systems. HTML versions are available at many locations including:

http://perldoc.perl.org/perlre.html

We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E.F. Friedl, and published by O'Reilly [ISBN: 1-56592-257-3].

2.9 relative-path and relative-path-expression

We make extensive reference to these terms in the customization sections of this manual and they are introduced here for your convenience.

Let us assume that we are scanning the website:

http://www.example.com/

An individual document within that website might be:

http://www.example.com/products/widget.html

LinkScan will refer to that page using its relative-path, which in this case, is:

products/widget.html
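
The relationship between a document's URL and its relative-path can be sketched in Python as follows. This is a minimal illustration that assumes the site root ends with a slash and that the relative-path is simply the remainder of the URL after that root. The function name relative_path is hypothetical.

```python
def relative_path(site_root, url):
    """Return the part of url that follows site_root, or None for other sites."""
    # Assumption: site_root ends with "/", e.g. "http://www.example.com/".
    if not url.startswith(site_root):
        return None  # the URL is outside the scanned website
    return url[len(site_root):]


root = "http://www.example.com/"
print(relative_path(root, "http://www.example.com/products/widget.html"))
```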

A relative-path-expression is a Perl Regular Expression that matches a relative-path. For example, all of the following will match our widget page:


products/widget.html      # Also matches products/widgetXhtml
products/widget\.html$    # Does not match anything else
(|.*/)widget\.html$       # Matches widget.html in any directory
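
The three expressions above can be tried out with a short Python sketch. Python's re module is used here as a stand-in for Perl, since these particular patterns behave the same way in both. The sketch assumes each expression is applied as an unanchored search, in the manner of a Perl m// match. How LinkScan applies a given expression is described in the customization sections.

```python
import re

# The three relative-path-expressions from the example above.
expressions = [
    r"products/widget.html",
    r"products/widget\.html$",
    r"(|.*/)widget\.html$",
]


def matches(expression, path):
    """True if the expression is found anywhere in path (unanchored search)."""
    return re.search(expression, path) is not None


for expression in expressions:
    print(expression, matches(expression, "products/widget.html"))

# The unescaped dot in the first expression matches any single character.
print(matches(expressions[0], "products/widgetXhtml"))
print(matches(expressions[1], "products/widgetXhtml"))
```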

LinkScan Version 12.3
© Copyright 1997-2012 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
