LinkScan for Unix -- Common Tasks
Delete the LinkScan installation directory and everything within it.
If you receive an Invalid/Corrupt/Expired License Key Error, please mail the exact error message to [email protected].
The Include and Exclude rules take advantage of powerful Perl Regular Expressions. Comprehensive documentation for Perl Regular Expressions is widely available on-line. For example, see: http://perldoc.perl.org/perlre.html. Some simple examples are described below.
You may add various rules to the Project configuration file, linkscan.cfg. Windows users can click Edit and then Advanced to open the appropriate linkscan.cfg file in a Notepad window.
Each rule or command must be entered on a new line starting in column one. Lines starting with a pound sign ("#") are treated as comments and ignored. The position of a line within the linkscan.cfg file does not matter; we suggest making additions near the top of the file so you can find them again quickly.
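The layout rules above can be checked before a scan with a short script. The following is a minimal Python sketch, not part of LinkScan: the function name check_cfg is hypothetical, and reporting indented lines as "suspect" is an assumption, since the text above does not say how LinkScan itself treats a line that does not start in column one.

```python
def check_cfg(cfg_text):
    """Split linkscan.cfg-style text into rule lines and suspect lines.

    Lines starting with '#' are comments. Lines that start with
    whitespace do not begin in column one, so they are reported as
    suspect (an assumption of this sketch, not documented behavior).
    """
    rules, suspect = [], []
    for line in cfg_text.splitlines():
        if not line.strip():
            continue                      # blank line
        if line.startswith("#"):
            continue                      # comment line
        if line[0].isspace():
            suspect.append(line)          # does not start in column one
            continue
        rules.append(line.rstrip())
    return rules, suspect


SAMPLE = "# my additions\nExclude test/\n  Exclude indented/\n\nNofollow archive/\n"
rules, suspect = check_cfg(SAMPLE)
print(rules)
print(suspect)
```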
To exclude all files in the test/ directory:
http://your.server.name/test/
Add the following command to the linkscan.cfg file:
Exclude test/
Note that you must not include the http://your.server.name/ part of the URL.
To exclude multiple test/ directories such as:
http://your.server.name/products/test/
http://your.server.name/services/test/
Exclude .*/test/
To make the match case insensitive to exclude:
http://your.server.name/products/TEST/
Enter:
Exclude (?i).*/test/
The dollar sign may be used to indicate an end of string. For example:
Exclude (?i).*\.htm$
Will exclude all .htm files but not .html files.
Note that certain characters have special meanings within a Perl Regular Expression. Therefore, if you wish to match one of these characters literally, you must escape it by preceding it with a backslash. The reserved characters include:
. * + ? $ | \ ( ) { } [ ]
Hence the correct way to exclude a URL containing a query string is:
Exclude foo.asp\?Value=1
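The Exclude patterns above can be tried out before adding them to linkscan.cfg. The following is a minimal sketch in Python, whose re module accepts the same syntax for these particular patterns. It assumes that LinkScan matches each pattern against the URL with the http://your.server.name/ part removed, anchored at the start of the string; the need for .*/test/ to reach nested directories suggests this, but it is an assumption. The function name is_excluded is hypothetical.

```python
import re


def is_excluded(url_path, patterns):
    """Return True if url_path (relative to the server root) matches any pattern.

    re.match anchors at the start of the string, which is this sketch's
    assumption about how the Exclude rule is applied.
    """
    return any(re.match(p, url_path) for p in patterns)


# Plain directory rule: matches test/ at the top level only.
print(is_excluded("test/a.html", ["test/"]))
print(is_excluded("products/test/a.html", ["test/"]))

# Leading .* reaches nested test/ directories; (?i) ignores case.
print(is_excluded("products/test/a.html", [".*/test/"]))
print(is_excluded("products/TEST/a.html", ["(?i).*/test/"]))

# $ anchors the end of the string, so .html is not matched.
print(is_excluded("a.html", [r"(?i).*\.htm$"]))

# The query character must be escaped to be matched literally.
print(is_excluded("foo.asp?Value=1", [r"foo.asp\?Value=1"]))
```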
In addition to the basic Exclude rule, LinkScan supports some powerful variations:
The Nofollow rule.
Nofollow test/
The Exclude test/ rule instructs LinkScan to completely ignore all links that point to the test/ directory. In contrast, the Nofollow test/ rule instructs LinkScan to validate any links that point to the test/ directory. It will, however, ignore all of the links contained within the files under the test/ directory. In other words, it will validate the links leading into the test/ directory and ignore the links leading out of the test/ directory.
Other variations include:
Onlyinclude test/
Onlyfollow test/
The Onlyinclude rule tells LinkScan to completely ignore all links that do not point at the test/ directory. The Onlyfollow rule tells LinkScan to validate every link leading out of the test/ directory but not to follow those links. Hence the rule Onlyfollow test/ is a simple and effective way to completely validate every document in the test/ directory and ignore the rest of the site.
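The four rules differ in two decisions: whether a link is validated, and whether the links inside the target document are processed in turn. The following Python sketch encodes one reading of the descriptions above as a small decision function. It is an interpretation for illustration only, not LinkScan's implementation; the function name decide and the start-anchored matching are assumptions.

```python
import re


def decide(rule, pattern, path):
    """Return (validate, parse_links) for a URL path relative to the server root.

    validate:    should the link to this path be checked at all?
    parse_links: should the links found inside this document be processed?
    """
    hit = re.match(pattern, path) is not None
    if rule == "Exclude":        # ignore matching links completely
        return (not hit, not hit)
    if rule == "Nofollow":       # check links in, ignore links out
        return (True, not hit)
    if rule == "Onlyinclude":    # ignore everything that does not match
        return (hit, hit)
    if rule == "Onlyfollow":     # check every link, parse only matching pages
        return (True, hit)
    raise ValueError("unknown rule: " + rule)


for rule in ("Exclude", "Nofollow", "Onlyinclude", "Onlyfollow"):
    print(rule, decide(rule, "test/", "test/a.html"), decide(rule, "test/", "docs/b.html"))
```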
Each link validated by LinkScan is assigned a specific LinkScan Error or Status Code, and every Status Code is associated with a Severity. You may customize the Severity associated with any Status Code by using the Statuscode command. The command syntax is:
Statuscode statuscode, severitycode
The following Severity codes are valid:
Code | Severity | Explanation
0 | Unknown | LinkScan has not tested or was unable to test this link
1 | Error | LinkScan found a hard error on this link
2 | Possible Error | There may be a problem with this link. It should be retested at a later time
3 | Warning | LinkScan found something unusual about this link. Manual inspection highly recommended
4 | Advisory | This link is probably ok, but manual inspection recommended
5 | No Error | This is a good link
Examples:
Statuscode = 301,3 # 301 (Moved Permanently) from Error to Warning
Statuscode = 7,4 # 7 (Orphaned HTML File) to Advisory
Statuscode = 8,4 # 8 (Orphaned non-HTML File) to Advisory
The above commands will downgrade all 301 status codes from Errors to Warnings, and all Orphaned Files from Warnings to Advisories.
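To review a set of Statuscode overrides before a scan, the commands can be read into a simple mapping. The following Python sketch is illustrative only. It assumes that a trailing "#" comment is permitted (as the examples above show) and that the "=" sign is optional (the syntax line omits it, the examples include it). The function name parse_statuscode is hypothetical.

```python
import re

# Severity codes and names as listed in the table above.
SEVERITY = {
    0: "Unknown",
    1: "Error",
    2: "Possible Error",
    3: "Warning",
    4: "Advisory",
    5: "No Error",
}


def parse_statuscode(lines):
    """Return {status_code: severity_code} for every Statuscode line found."""
    overrides = {}
    for line in lines:
        body = line.split("#", 1)[0]      # drop a trailing comment, if any
        m = re.match(r"Statuscode\s*=?\s*(\d+)\s*,\s*(\d+)\s*$", body)
        if not m:
            continue                      # not a Statuscode command
        status, severity = int(m.group(1)), int(m.group(2))
        if severity not in SEVERITY:
            raise ValueError("invalid severity code: %d" % severity)
        overrides[status] = severity
    return overrides


overrides = parse_statuscode([
    "Statuscode = 301,3 # 301 (Moved Permanently) from Error to Warning",
    "Statuscode = 7,4 # 7 (Orphaned HTML File) to Advisory",
    "Statuscode = 8,4 # 8 (Orphaned non-HTML File) to Advisory",
])
for status, severity in sorted(overrides.items()):
    print(status, "->", SEVERITY[severity])
```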
Your Project configuration file, linkscan.cfg should look something like this:
Homedir = /usr/www/htdocs/
Homeurl = http://www.example.com/
Mirrorurl =
Homefile = index.html
[...]
Http = 0
[...]
Htmlfiles = html, shtml, htm
Mapfiles = map
Pdffiles =
Flashfiles = swf
Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm
Indexoptions = 0
Note the following points:
Homedir: must point at the root directory for the site. Links of the form <A HREF="/"> will be mapped to this directory.
Homeurl: must be supplied -- even if you use something completely fictional such as http://anything/. Note, however, that the Homeurl parameter is used to localize any absolute links. Hence a link to http://anything/foo/ will be localized to foo/.
Homefile: must be relative to Homedir (File System view) and Homeurl (HTTP view).
Http: must be zero (off).
Htmlfiles: files with these extensions are parsed as HTML.
Defaultpages: links that point at a directory are processed by searching for a file matching these names (in the order specified).
Indexoptions: If enabled, and there are no matches on Defaultpages, LinkScan will create a synthetic page containing links to every file in the directory.
Aliases: If your file system is laid out with Server Aliases, you will need to tell LinkScan about these. See File System Scanning and Orphaned Files for some examples.
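Two of the points above, Homeurl localization and the Defaultpages search order, can be illustrated with a few lines of Python. This is a sketch of the behavior as described, not LinkScan's code; the function names localize and default_page are hypothetical, and the prefix comparison is a simplifying assumption.

```python
def localize(url, homeurl):
    """Strip the Homeurl prefix from an absolute link.

    Returns the path relative to Homeurl, or None when the link points
    somewhere else.
    """
    if url.startswith(homeurl):
        return url[len(homeurl):]
    return None


def default_page(files_in_dir, defaultpages):
    """Return the first Defaultpages name present in the directory, else None."""
    for name in defaultpages:             # order matters: first match wins
        if name in files_in_dir:
            return name
    return None


DEFAULTPAGES = ["index.html", "index.shtml", "index.htm",
                "home.html", "home.shtml", "home.htm"]

print(localize("http://anything/foo/", "http://anything/"))
print(default_page({"home.html", "index.htm", "logo.gif"}, DEFAULTPAGES))
```

When default_page returns None, the Indexoptions setting described above decides whether a synthetic index page is created instead.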
See Import Scanning.
Many websites include some form of access control or user authentication. In general, these arrangements use one of two mechanisms defined by the HTTP protocols, and LinkScan supports both. They are HTTP Authentication and cookie-based authentication.
In the case of HTTP Authentication, when a user attempts to access a protected area, their browser will present a challenge in the form of a pop-up dialog box that requires a username and password to be entered. In the case of cookie-based arrangements, the user is normally required to login by filling out an HTML form and submitting it.
For sites that require HTTP Authentication, you must configure LinkScan with an appropriate Auth command:
Syntax:
Auth server-name "realm-name" username password
Examples:
Auth www.example.com "" guestuser xxxxxx
Auth app.example.com "Controlled Access" guestuser xxxxxx
You must include a realm-name (enclosed in double-quotes) but it may be empty. In that case, LinkScan will use the configured username and password for any realm on the target server. This is the recommended approach unless your server uses multiple realms with different access control rules for different portions of the website.
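The realm rule described above (an empty realm-name applies to any realm on the server) can be sketched as a lookup. The following Python is illustrative only and makes no claim about how LinkScan stores or selects credentials; parse_auth and credentials_for are hypothetical names.

```python
import shlex


def parse_auth(line):
    """Parse: Auth server-name "realm-name" username password."""
    parts = shlex.split(line)             # honors the double-quoted realm
    if len(parts) != 5 or parts[0] != "Auth":
        raise ValueError("not a valid Auth command: " + line)
    _, server, realm, username, password = parts
    return {"server": server, "realm": realm,
            "username": username, "password": password}


def credentials_for(rules, server, realm):
    """Return (username, password) for a challenge, or None if no rule applies.

    A rule with an empty realm matches any realm on its server.
    """
    for rule in rules:
        if rule["server"] == server and rule["realm"] in ("", realm):
            return rule["username"], rule["password"]
    return None


rules = [
    parse_auth('Auth www.example.com "" guestuser xxxxxx'),
    parse_auth('Auth app.example.com "Controlled Access" guestuser xxxxxx'),
]
print(credentials_for(rules, "www.example.com", "Any Realm"))
print(credentials_for(rules, "app.example.com", "Other Realm"))
```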
HTTP access to some sites is controlled via authentication schemes requiring Cookies.
LinkScan will automatically accept and return all valid cookies received during the course of a scan. However, to gain access to the site, you may need to configure LinkScan to ensure that the appropriate cookies are set. This may be achieved by one of two techniques: submitting a login form, or preloading one or more cookies.
The submission of a login form may be configured using the Extrahome command (described in the next section). Alternatively, you may initialize LinkScan's collection of stored cookies (the Cookie Jar) with one or more permanent cookies by using the Cookie command:
Syntax:
Cookie server-name cookiename=cookievalue
Example:
Cookie www.elsop.com LinkScan=cookie_value;
Note: Do not enter space characters around the '=' character.
The server-name is the name of the server to be tested. For security reasons and in compliance with the applicable standards, LinkScan will only send the cookie when the specified server-name exactly matches the hostname portion of the requested URL. In this context, server names and their corresponding IP addresses are considered to be different (consistent with all major browsers). The cookie names and values must be reverse engineered from your server code or "discovered" via your browser, either by enabling the "Prompt before accepting cookies" option or by examining the stored cookies on disk.
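The exact-hostname rule above explains why a preloaded cookie may never be sent if the scan addresses the server by IP address or by a different name. The following Python sketch illustrates the comparison; it is not LinkScan's code. The jar layout, the function name cookies_for, and the cookie name and value (session=abc123) are hypothetical.

```python
from urllib.parse import urlsplit


def cookies_for(url, jar):
    """Return the cookies to send with a request for url.

    jar maps a server name to its cookies. A cookie is sent only when the
    server name equals the hostname portion of the URL; no domain or
    IP-address equivalence is applied.
    """
    host = urlsplit(url).hostname         # hostname only, without port
    return jar.get(host, {})


# Hypothetical preloaded cookie for one server.
jar = {"www.example.com": {"session": "abc123"}}

print(cookies_for("http://www.example.com/members/", jar))
print(cookies_for("http://192.0.2.10/members/", jar))
print(cookies_for("http://example.com/members/", jar))
```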
Hint 1: Sites with especially complex schemes (multiple levels of access control, subscription expirations etc.) might consider configuring their server and/or scripts to recognize a "super-user-cookie" specifically for testing purposes. This approach may also be used to trigger test points within server-based scripts and greatly improve the meaningful testability of complex dynamic content.
Hint 2: HTTP Authentication and Cookie related transactions are logged by LinkScan during the course of the scan. You may examine the following file to view the log: .../LinkScan/Projectname/data/linkscan.red
LinkScan may be configured to submit a form using either the GET or POST methods. Pages that require the GET method are specified with a normal URL and query string. Pages that require the POST method are specified in a similar manner except that the query character (?) is replaced with a double-query (??).
Syntax:
Extrahome relative-path-expression
Example:
Extrahome login.jsp??Name=Malcolm%20Hoar&Password=secret
Hint 1: Use the LinkScan Recorder to automatically capture the correctly constructed URLs.
Hint 2: When using the Extrahome command to submit a login form to provide access to a site, you may also need to configure LinkScan so that it doesn't immediately "click" any LOGOUT button which would invalidate the newly created session. For example:
Extrahome login.jsp??Name=Malcolm%20Hoar&Password=secret
Exclude .*logout.jsp
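The single-query versus double-query convention for Extrahome can be made concrete with a small parser. The following Python sketch only shows how such an expression breaks down into a method, a path, and decoded form fields; it sends nothing over the network and is not how LinkScan is implemented. The function name parse_extrahome and the search.jsp example are hypothetical.

```python
from urllib.parse import parse_qsl


def parse_extrahome(expr):
    """Split an Extrahome expression into (method, path, fields).

    A double query (??) marks a POST submission; a single query (?)
    marks a GET. Field values are URL-decoded.
    """
    if "??" in expr:
        path, _, query = expr.partition("??")
        method = "POST"
    else:
        path, _, query = expr.partition("?")
        method = "GET"
    return method, path, parse_qsl(query)


print(parse_extrahome("login.jsp??Name=Malcolm%20Hoar&Password=secret"))
print(parse_extrahome("search.jsp?q=links"))
```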
First, you must configure the LinkScan to Email Interface. Once this has been completed:
To send Email LinkScan Reports via a browser interface, edit the file linkscan.sys and change the following setting:
Mailto = 1
A simple mailto form will be added to the foot of each LinkScan report that you access from a browser.
To create and Email a LinkScan Report from the command line or a shell/batch script, see: Creating Reports from the Command Line.
To create and Email LinkScan Reports to individual Owners see LinkScan Dispatch. Note that LinkScan Dispatch is not available with LinkScan Workstation.
LinkScan was designed from the outset to be a highly open system. Hence it is a straightforward matter to export portions of the LinkScan database into other database management systems for further analysis.
For many users, the simplest method of achieving this is via LinkScan Excel. Once a table of data has been imported into a LinkScan Excel spreadsheet, the data can easily be pushed into another relational database management system (RDBMS) such as Microsoft Access, Microsoft SQL Server or Oracle.
Others may wish to access the LinkScan database structures directly via their own program code. It is a relatively simple programming task to extract the required data using most programming languages including Perl, C, C++, Java or Visual Basic. Those users will wish to study a brief description of the LinkScan File Formats. Note that small changes in the file formats may arise if and when you install new versions of LinkScan. Such changes are generally minor and infrequent.
Activate the LinkScan Profiler.
LinkScan Version 12.4
© Copyright 1997-2013
Electronic Software Publishing Corporation (Elsop)
LinkScan and Elsop are Trademarks of Electronic Software Publishing Corporation