Version History
New in LinkScan 12.4
LinkScan 12.4 is a minor maintenance release.
We have incorporated a number of minor enhancements, bug fixes and performance improvements.
New in LinkScan 12.3
LinkScan 12.3 is a significant enhancement release.
We have removed all references to a deprecated Perl library (flush.pl).
We have added full support for IPv6.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 12.2
LinkScan 12.2 is a consolidation of several minor bug fixes and enhancements.
We have corrected some compiler issues with the Windows GUI.
We have improved link extraction from text files.
We have addressed a cross-site scripting vulnerability.
We have addressed expiration issues with a LinkScan cookie.
We have fixed a bug in TapMap.
We have improved link extraction from PDF files.
New in LinkScan 12.1
LinkScan 12.1 is a significant maintenance release that corrects several small errors and refines a number of existing features.
LinkScan 12.1 has been fully tested on Microsoft Windows 7, including 64-bit versions of Windows 7.
We have provided a brand new installer for Windows systems that is faster, cleaner, and more efficient.
We have fixed several minor problems with the HTML and JavaScript parser and implemented several other improvements as well.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 12.0
LinkScan 12.0 is a significant maintenance release that corrects several small errors and refines a number of existing features.
We have provided the option to use an external link extractor on Flash (SWF) files. To use this you must first obtain a copy of the Adobe Search Engine SDK via http://www.adobe.com/licensing/developer/search/faq/.
Simply copy the Adobe "swf2html" executable to the LinkScan installation folder.
Link extraction from Flash files represents a significant challenge. The "swf2html.exe" program created by Macromedia/Adobe probably represents the very best option available anywhere. Once installed, LinkScan will route all Flash files to this program and then process all of the hyperlinks that it is able to identify.
We have made several improvements to the JavaScript link extraction.
We have added several improvements to the handling of encoded characters including UTF-8.
We have improved the accuracy of the page weight computations.
We have fixed a compatibility problem with Net::SSLeay that arises on some UNIX systems.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.7
We have introduced a new licensing option: LinkScan Unlimited. This is a license to scan an unlimited number of unique web pages (documents) on any number of physical computers that are owned or leased by you. See Ordering Information.
We have tested LinkScan 11.7 with Windows Vista.
We have made several significant improvements to the PDF file parser (link extractor). Customers who scan significant numbers of PDF documents are strongly encouraged to install this new release.
We have enhanced the Relaxanchor command to make the checking of named anchors a little more relaxed, consistent with the latest browsers.
We have enhanced the Excludehidden option to ignore <link ...> tags. This was done by popular demand because several common authoring tools including Microsoft Office tend to insert invalid, albeit harmless, link tags in the documents they create.
We have enhanced LinkScan to handle <image...> tags exactly like <img...> tags.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.6
We have added an option to exclude (ignore) "hidden" links. That is, links with an empty anchor such as:
<A HREF="link.html"></A>
On UNIX systems this may be activated by adding the Excludehidden directive to the linkscan.cfg file.
On Windows systems this may be activated via a checkbox on the Scope Tab of the Project Planning Property Sheet.
This avoids false errors with links that have been temporarily hidden with null anchors.
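The Excludehidden rule can be pictured with a minimal Python sketch. It assumes that a "hidden" link is simply an <A HREF> element whose anchor contents are empty or whitespace only; the regular expressions and the function name `split_hidden_links` are illustrative approximations, not LinkScan's actual parser.

```python
import re

# Illustrative approximation: an <A ...>...</A> element, contents in group 1.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)

def split_hidden_links(html):
    """Return (visible_hrefs, hidden_hrefs) for the given HTML."""
    visible, hidden = [], []
    for match in ANCHOR_RE.finditer(html):
        href = HREF_RE.search(match.group(0))
        if href is None:
            continue  # named anchors without an HREF are not links
        # Empty (or whitespace-only) anchor contents mark a hidden link.
        target = hidden if not match.group(1).strip() else visible
        target.append(href.group(1))
    return visible, hidden
```

An anchor that wraps only an <IMG> tag still has non-empty contents, so this sketch treats it as visible.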
We have added an option that enables users to scan only the first "N" pages of a website.
On UNIX systems this may be activated by adding the Maxdocs directive to the linkscan.cfg file.
On Windows systems this may be activated via the Max Docs control on the Scope Tab of the Project Planning Property Sheet.
This option helps LinkScan users to more quickly debug or fine-tune new LinkScan configurations and test scenarios.
We have enhanced LinkScan with a powerful new parser or link extractor. Previously, LinkScan was able to extract links from documents of the following types:
- HTML documents
- JavaScript files
- Shockwave/Flash files
- PDF documents
- ASCII text files
- Microsoft Office documents
The new parser will allow link extraction from additional file types although it has been designed and implemented principally for XML files.
The new parser means that LinkScan can now be used to quickly and accurately extract links from XML and similarly formatted data files. See XML Documents.
An existing LinkScan feature (Collectmeta) will cause all HTML META tags to be saved to an ASCII file for subsequent analysis by the user. The new command:
Xmeta <metadata[^>]*>(.*)</metadata>
will cause the contents of any METADATA tag to be included in that file.
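To see what the Xmeta pattern above captures, here is a small Python sketch. It assumes the pattern is applied as an ordinary regular expression with no extra flags (Python's `re` accepts this particular Perl pattern unchanged); the function name `xmeta_values` is hypothetical.

```python
import re

# The Xmeta pattern quoted above, compiled without extra flags (assumption).
XMETA = re.compile(r"<metadata[^>]*>(.*)</metadata>")

def xmeta_values(document):
    """Return the text captured between <metadata ...> and </metadata>."""
    return [m.group(1) for m in XMETA.finditer(document)]
```

Because `.` does not match a newline in either Perl or Python by default, a METADATA element that spans several lines is not captured by this pattern as written. The match is also case sensitive.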
We have made other small improvements and enhancements to SSL Proxy support, PDF document parsing, LinkScan SiteMaps, LinkScan Dispatch, and the Google SiteMap feature.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.5
We have enhanced LinkScan to automatically create an XML Sitemap file in a format suitable for submission to Google Sitemaps. For more background, see Google Webmaster Help Center.
More details of this new feature are described in the Google Sitemaps Application Note.
We have added a percent completion display to the title bar of the Windows interface when a scan is in progress. When the window is minimized, the percentage is shown in the Windows Task Bar.
We have implemented some improvements to the handling of bad characters in URLs.
We have made an addition to the Diagnostic Trace. When a URL is dissected and the hostname resolved, the IP address is logged. This has proven useful in investigating problems associated with round-robin DNS environments.
We have enhanced the LinkScan Pinger with several new options including the ability to send more succinct e-mail notifications (especially useful for sending text message alarms to cellphones).
We have enhanced the LinkScan checking of Fragments and Anchors. First, <DIV ID="string"> tags are recognized exactly like <A NAME="string"> tags. Second, a new option (Relaxanchor = 1) will make the anchor checks less strict. Although this is not in accordance with the HTML standards, it is consistent with most modern browsers. Specifically, with Relaxanchor enabled, the Fragment/Anchor check is made case insensitive, and superfluous '#' characters are ignored.
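The difference between the strict and relaxed anchor checks can be sketched in Python as follows. The function `anchor_matches` is hypothetical, and "superfluous '#' characters" is assumed to mean extra leading '#' characters in the fragment (as in page.html##Top).

```python
def anchor_matches(fragment, anchors, relaxed=False):
    """Check a URL fragment against the anchor names found in a page.

    fragment -- the part of the link after the first '#'
    anchors  -- names collected from <A NAME=...> and <DIV ID=...> tags
    relaxed  -- approximates the Relaxanchor = 1 behavior (assumption)
    """
    if not relaxed:
        return fragment in anchors  # strict: exact, case-sensitive match
    # Relaxed: drop extra leading '#' characters and ignore case.
    wanted = fragment.lstrip("#").lower()
    return any(name.lower() == wanted for name in anchors)
```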
We have fixed a (rare) problem with the LinkScan Profiler.
We have made several small fixes and enhancements to LinkScan Dispatch.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.4
We have made several improvements to the processing of JavaScript constructs in complex documents. This results in improved test coverage and accuracy on websites that make extensive use of JavaScript.
We have added the Substitutescript command which allows users to perform complex transformations on certain JavaScript and Dynamic HTML constructs. These transformations may be used by more advanced users to more effectively test functions invoked by complex JavaScript/DHTML function calls.
We have added the new Ownerq command. This new option gives users even more flexibility and control over the ability to assign specific areas of web site content to specific Owners (content developers).
We have improved some error checking and reporting functions to better detect and explain certain configuration or environmental errors and anomalies.
We have added a new Autoencspace option. This will cause LinkScan to automatically compensate for certain HTML/HTTP errors that result when content developers fail to properly encode certain characters in a URL. Most commonly this arises when authors fail to write space characters as "%20".
By default, LinkScan reports a 911 Unsafe Character Error when it encounters links containing improperly encoded characters. With the Autoencspace option, LinkScan will automatically perform the encoding for you, mirroring the behavior of Microsoft Internet Explorer. We do not recommend the use of this option (since it masks real errors in the HTML documents) but it has been provided in response to user requests.
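A minimal Python sketch of the two behaviors, assuming only literal spaces are considered; the real option may handle other unsafe characters too. The function `prepare_link` is hypothetical; 911 is the status code named above.

```python
UNSAFE_CHARACTER = 911  # status code named in the text above

def prepare_link(url, autoencspace=False):
    """Return (url_to_fetch, error_code_or_None) for a link.

    Only literal spaces are handled in this sketch.
    """
    if " " not in url:
        return url, None
    if autoencspace:
        # Compensate for the author's error by encoding spaces as %20.
        return url.replace(" ", "%20"), None
    # Default: report the improperly encoded link as an error.
    return url, UNSAFE_CHARACTER
```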
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.3
We have improved several reports, especially the Search Links Report and its sort options.
We have made several small enhancements to the LinkScan Orphaned File detection.
We have made several enhancements to the LinkScan SiteMap.
We have improved the handling/reporting of certain (rare) link redirection scenarios.
We have improved the speed and accuracy with which LinkScan validates FTP links.
We have improved the processing of JavaScript code to maximize link extraction and minimize false matching on complex structures.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.2
We have made significant enhancements to the LinkScan user interface on Windows systems. The sorted order of the main Project List is now saved when exiting LinkScan and restored the next time the program is launched.
We have improved the integrated LinkScan web browser on Windows systems. The loading and rendering of pages and updating of the Address Bar operates more smoothly. JavaScript error dialogs are suppressed (where possible). New options have been added to the menus including Open, Save As, Print, Page Setup, Copy, Paste, Find In Page, Increase/Decrease Font Size, View Source and Internet Options. In addition, Control-C and Control-V keyboard accelerators may be used within web pages and forms. Support for the Internet Favorites has also been enhanced.
We have made numerous enhancements to the low-level link checking methodologies. These include improved timeout-retry algorithms, additional status codes, more detailed information concerning DNS lookup, timeout, connect and other networking errors as well as improvements to the reporting of multiple redirection problems.
We have added support for Multi-Part Form Submissions using the POST method. This mechanism is typically used when uploading data files from a client to a server. See How To Submit Forms.
The maximum length of a normal URL remains at 4096 bytes (or thereabouts, due to encoding effects). However, we have eliminated all arbitrary size restrictions on special URLs using the "??" and "???" conventions indicating FORM submissions using the POST method.
We have enhanced the LinkScan SiteMap and TapMap Reports. Each node of the Map includes a counter indicating the total number of child nodes below the current node.
We have added a new Maxdocbyte option to control the maximum size of document body that will be downloaded. This can save considerable time when checking large numbers of PDF documents over relatively slow network connections.
We have adjusted the algorithm used to extract TITLE tags from a document. It now uses the first pair of TITLE tags rather than the last. This is more consistent with the majority of common web browsers.
We have reorganized the Search Links Report and included significant performance enhancements.
We have improved the options for adding custom headers and footers to the LinkScan reports.
We have added more Orphaned File information to the Project Summary Reports.
We have improved some of the internal diagnostic tools in order that Elsop's engineers may better support users.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 11.1
We have introduced the LinkScan Pinger: a small self-contained utility that may be used to periodically check a list of URLs and raise e-mail alarms if certain error conditions arise. See: LinkScan Pinger.
We have enhanced and improved the layout of the directory-order SiteMap to improve the visualization of the website structure.
We have made several adjustments to the LinkScan general purpose Text File Parser. In general LinkScan will extract more hyperlinks from text files, Microsoft Office documents and similar file types with fewer false matches.
We have enhanced LinkScan with the ability to record the timing for each HTTP transaction. This means LinkScan may be used in performance related studies. The transaction times are logged to a simple tab-delimited ASCII file which may easily be imported directly into Microsoft Excel (or other tools) for further analysis.
It is very simple to move this into Excel with:
Data | Get External Data | Import Text File
See description of linkscan.tim in LinkScan File Formats.
We have added support for the Real Time Streaming Protocol (RTSP). The software will:
- Check http://... links to .rm files
- Extract the rtsp://... and pnm://... links from those .rm files
- Validate the rtsp://... and pnm://... links
Users upgrading from LinkScan 11.0 or earlier should add the following directive to their linkscan.cfg file:
Mimetypes audio/x-pn-realaudio T # Default at 11.1
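The extraction step in the list above might look like the following Python sketch. It assumes the rtsp:// and pnm:// URLs appear as plain ASCII byte runs inside the fetched file; the pattern and the function name `stream_links` are hypothetical.

```python
import re

# rtsp:// or pnm:// followed by printable, non-space ASCII bytes (assumption).
STREAM_URL = re.compile(rb"(?:rtsp|pnm)://[\x21-\x7e]+")

def stream_links(rm_bytes):
    """Return the rtsp:// and pnm:// URLs embedded in a fetched file."""
    return [m.group(0).decode("ascii") for m in STREAM_URL.finditer(rm_bytes)]
```

Each extracted URL would then be validated as a separate link.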
We have added support for <NOINDEX> tags.
If the Project configuration contains the directive Noindex = 1 then any links contained within an HTML <NOINDEX></NOINDEX> block are ignored, unless the link refers to a new URL (i.e. one that has not thus far been "seen" by LinkScan).
The <NOINDEX> tag is supported by various search engines and is typically used to prevent the indexing of document fragments that are used repeatedly (e.g. site navigation menus/tools). Excluding these regions from LinkScan and search engine indexes helps users and authors focus their attention on the most critical content.
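The Noindex rule can be expressed as a short Python filter. The input layout (pairs of URL and an "inside NOINDEX" flag) and the function name `links_to_follow` are assumptions made for illustration.

```python
def links_to_follow(links, seen, noindex=True):
    """Filter extracted links according to the Noindex rule described above.

    links -- iterable of (url, inside_noindex_block) pairs, in document order
    seen  -- set of URLs already encountered during the scan; updated in place
    """
    kept = []
    for url, in_noindex in links:
        if noindex and in_noindex and url in seen:
            continue  # repeated link inside <NOINDEX>: ignore it
        seen.add(url)
        kept.append(url)
    return kept
```

A navigation menu repeated on every page is therefore processed only the first time its links are seen.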
We have significantly improved support for Japanese character sets. When scanning sites that contain (in whole or in part) Japanese pages, include the following directives in the Project configuration file (on Windows systems, via the Advanced Tab of the Project Planning Property Sheet):
Jisencode = 1
Displaylang = EUC-JP
Pages containing JIS, Shift-JIS and/or EUC-JP encoded Japanese characters will be normalized to EUC-JP. This means, for example, that the TITLE tags extracted from different documents may be combined in a single summary document (e.g. the LinkScan SiteMap) even though the original pages were constructed with different encodings.
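The normalization idea can be illustrated with Python's standard codecs (`iso2022_jp` for JIS, `shift_jis`, `euc_jp`). This is a sketch only. It assumes the source encoding of each page is already known, and the function name `to_euc_jp` is hypothetical.

```python
def to_euc_jp(raw, source_encoding):
    """Decode a page in its original encoding and re-encode it as EUC-JP.

    source_encoding -- a Python codec name such as "iso2022_jp" (JIS),
    "shift_jis" or "euc_jp"; detecting it is outside this sketch.
    """
    return raw.decode(source_encoding).encode("euc_jp")
```

After normalization, the same title taken from a Shift-JIS page and a JIS page yields identical EUC-JP bytes, so the titles can be combined in one report.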
The encoding type of each document is stored in the LinkScan database together with the MIME type (Content-Type). The Search Documents Report may be used to search/display this data and help enforce consistent encoding standards across mixed language sites.
We have added an option that will permit LinkScan to test web servers that require proprietary Microsoft NTLM Authentication.
LinkScan includes native support for HTTP Basic Authentication. However, some Intranet environments utilize the proprietary and undocumented Microsoft NTLM protocol to authenticate users. We have added the ability to scan such sites.
- Add the directive Winhttp = 1 to the Project configuration on the Advanced Tab of the Project Planning Property Sheet.
- Using the integrated LinkScan web browser or a copy of Microsoft Internet Explorer, access the target site and authenticate prior to initiating a scan.
Note: This may result in other minor artifacts in the results of the scan since LinkScan will use the Microsoft Windows implementation of the HTTP protocol versus the (stricter) native LinkScan implementation.
We have made significant performance improvements to the LinkScan Profiler. As well as making it generally much faster, we have eliminated some pathologically poor performance on certain (rare) types of documents.
We have incorporated workarounds to some platform-specific Perl problems that (rarely) lead to fatal errors:
- HP/UX with certain Perl 5.003 Builds
- Solaris 9 (64-bit) with Perl 5.6.1
We have improved the formatting of the System Configuration Report, the Cookie Log and the Diagnostic Trace for better usability.
New in LinkScan 11.0
LinkScan 11.0 is a major new release built upon a new internal database engine. This results in dramatically faster reports, especially on larger websites.
In comparative tests, the time required to select, sort and display most of the commonly used reports is significantly reduced. On small websites (say 500 documents) the reports are displayed in approximately half the time. On large websites (say 40,000 documents) the reports are displayed approximately 10 times faster.
Despite the use of some new binary indexing files, all of the raw data is still available to other applications via simple ASCII text files. See LinkScan File Formats. We have also conducted tests to ensure it is a simple matter to load some of these tables into Relational Database Management Systems such as MySQL and SQL Server.
We have incorporated new options for HTML Syntax Checking. LinkScan/QuickCheck continues to offer seamless integration with the Weblint program. But now integration with other programs is also possible. In particular, QuickCheck integrates with OpenSP or Jim Clark's SP program and this means users may perform a full SGML validation against a specific Document Type Definition (DTD). The LinkScan distribution includes a small sample of the most common DTDs and, on Windows systems, a copy of the OpenSP program. Unix users will need to download the OpenSP sources and compile them but this is extremely simple and straightforward. See LinkScan QuickCheck.
We have enhanced the Search Documents Report with the ability to display documents that use (or do not use) specific tag types (e.g. APPLET, FORM, META, SCRIPT, etc).
The default Owner *1 for automatically assigning documents to Owners based on the top-level directory name has been generalized to operate on multiple levels if required. For example, Owner *2 will cause the link http://www.example.com/first/second/third/index.html to be assigned to Owner first_second. On Windows systems, this may be selected via a spin button on the Owners Tab of the Project Planning Property Sheet.
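The Owner *N naming rule from the example above can be reproduced with a few lines of Python. The function `owner_for` is hypothetical. How LinkScan names an owner when a URL has fewer than N directory levels is not stated, so this sketch simply uses whatever levels exist.

```python
from urllib.parse import urlsplit

def owner_for(url, levels):
    """Join the first `levels` directory names of the URL path with '_'."""
    path = urlsplit(url).path
    # Keep directory components only: drop the file name and empty pieces.
    directories = [part for part in path.split("/")[:-1] if part]
    return "_".join(directories[:levels])
```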
An existing feature provides for the optional display of a form at the foot of each report. This form permits users to e-mail a copy of the current report to a specific address. We have added an optional Comments box so that annotations may be included in the header of the e-mail message. To enable the comments box, set Mailto=2 in linkscan.sys.
We have discovered that tags of the form:
<A HREF="?Something">
tend to cause wildly erratic results. Different web browsers resolve such links relative to different bases. In our view, the use of such constructs is extremely unsafe. Hence tags of this form (with a leading query character) are flagged with a 911 Unsafe Character Error.
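The ambiguity can be seen in the URI specifications themselves. Resolving "?y" against the base http://a/b/c/d;p?q gave http://a/b/c/?y under RFC 2396 but gives http://a/b/c/d;p?y under RFC 3986, which Python's `urljoin` follows. The Python sketch below resolves the link and flags the leading '?'. The function name `check_href` is hypothetical.

```python
from urllib.parse import urljoin

UNSAFE_CHARACTER = 911  # status code named in the text above

def check_href(base, href):
    """Resolve href against base and flag a leading '?' as unsafe.

    Returns (resolved_url, status) where status is None or 911.
    """
    resolved = urljoin(base, href)  # RFC 3986 resolution
    status = UNSAFE_CHARACTER if href.startswith("?") else None
    return resolved, status
```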
We have included a new Maxredir command which enables users to control the maximum number of HTTP redirections LinkScan will follow when fetching a given URL. The default value of 5 is unchanged and appropriate for the vast majority of users. But those that need to customize that behavior will now have that option.
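A redirection limit of this kind can be sketched in Python as follows. To keep it self-contained, the network is replaced by a hypothetical dictionary that maps each URL to its redirect target. The function name `follow_redirects` is also hypothetical.

```python
def follow_redirects(url, redirects, maxredir=5):
    """Follow a chain of HTTP redirections, giving up after maxredir hops.

    redirects -- hypothetical stand-in for the network: maps a URL to the
    Location it redirects to; URLs absent from the map are final.
    Returns (final_url, hops) or raises RuntimeError past the limit.
    """
    hops = 0
    while url in redirects:
        if hops == maxredir:
            raise RuntimeError(f"more than {maxredir} redirections")
        url = redirects[url]
        hops += 1
    return url, hops
```

The limit also stops redirection loops, such as a URL that redirects to itself.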
We have added a new Retry External option. When enabled, LinkScan will track all External links that appear to fail due to network related errors (e.g. DNS, connect and timeout errors). These links will be retested at the end of the scan. This tends to reduce the number of transient errors reported but the scan may require a little more time to complete. The feature may be activated via the Other Tab of the Project Planning Property Sheet on Windows systems, or by setting Retryext=1 in linkscan.cfg.
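The retry strategy amounts to a deferred second pass. In the Python sketch below, `check` is a hypothetical callable that returns "ok" or an error label. The network error labels are also assumptions.

```python
NETWORK_ERRORS = {"dns", "connect", "timeout"}  # hypothetical error labels

def scan_external(urls, check):
    """Check external links, retesting network failures once at the end.

    check -- hypothetical callable returning "ok" or an error label.
    Returns a dict mapping each URL to its final result.
    """
    results, retry = {}, []
    for url in urls:
        results[url] = check(url)
        if results[url] in NETWORK_ERRORS:
            retry.append(url)  # possibly transient: defer a second attempt
    for url in retry:  # second pass after the main scan completes
        results[url] = check(url)
    return results
```

Errors that are not network related (for example a 404) are not retried, which keeps the extra time small.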
The behavior of the Reload/Refresh button on the integrated Web Browser has been improved to ensure that locally cached copies of the page are not used.
We have incorporated a number of other minor enhancements, bug fixes and performance improvements.
New in LinkScan 10.0
LinkScan 10.0 comes equipped with a brand new and highly functional Graphical User Interface on Windows systems. See Screenshot.
We have increased the maximum length of a URL from 1024 to 4096 characters.
We have enhanced LinkScan with support for additional file types. In addition to the existing interpreters (HTML, JavaScript, PDF and Shockwave/Flash) we have added a new, general purpose TEXT interpreter. This will seek to extract plain text URLs (without any HTML markup) from simple ASCII files. However, it is also highly effective for finding and validating hyperlinks in many other file types including Microsoft Office documents (.doc, .xls, .ppt files) and .url files as used in the Microsoft Internet Explorer Favorites folder.
Use the Textfiles command to specify which file types should be routed through the TEXT parser when scanning via the File System. Use the Mimetypes command to route documents to the TEXT parser when using HTTP scanning. For example:
Textfiles txt, doc, xls, ppt, url
Mimetypes application/msword T
On Windows systems these features are available via the Mimes and Files tabs of the Project Planning Property Sheet.
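Plain-text URL extraction of the kind the TEXT interpreter performs can be approximated with a regular expression. The following Python sketch is only an illustration: the pattern, the schemes it recognizes and the trailing-punctuation rule are all assumptions, and the function name `extract_text_urls` is hypothetical.

```python
import re

# Illustrative approximation: a scheme followed by non-space URL characters.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s<>\"']+")

def extract_text_urls(text):
    """Return URLs found in plain text, trimming trailing punctuation."""
    return [m.group(0).rstrip(".,;:!?)") for m in URL_RE.finditer(text)]
```

Trimming trailing punctuation avoids a common false match, where a sentence-ending period or comma is taken as part of the URL.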
The Critical Errors, Detailed Errors and Selected Errors Reports have all been enhanced with a new First Reference Only option. When selected, LinkScan will only display one example reference to each broken/suspect link.
We have enhanced the System Parameters Report with an option to display the contents of the linkscan.red file. This file contains an audit trail of each cookie encountered during the course of the scan. Optionally, it may contain a full diagnostic trace of all the HTTP request and response headers (enabled with Probe = 1).
The LinkScan Profiler has been enhanced with a new $nearish operator. The original $near operator looks for a proximity match with no more than two "tokens". The new $nearish operator is more general, looking for a proximity of no more than five "tokens". In general, a "token" approximates to a single word but the actual implementation is rather more complex since the matching algorithms seek to discount a certain amount of intervening HTML markup.
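The following Python sketch gives a simplified picture of a token-proximity test. It assumes tokens are runs of word characters after HTML tags are stripped. It also treats "proximity" as the distance between token positions, so a window of 2 approximates $near and a window of 5 approximates $nearish. The function `near` is hypothetical, and the real matching algorithms are more elaborate.

```python
import re

def near(text, first, second, window):
    """True if `first` and `second` occur within `window` token positions.

    Simplified: HTML tags are discarded and tokens are runs of word
    characters, compared case-insensitively.
    """
    plain = re.sub(r"<[^>]*>", " ", text)
    tokens = [t.lower() for t in re.findall(r"\w+", plain)]
    a = [i for i, t in enumerate(tokens) if t == first.lower()]
    b = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= window for i in a for j in b)
```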
We have added the Qhttp and Qnow settings to linkscan.sys. These will force LinkScan QuickCheck to use HTTP Access (versus file system access) and Realtime link checking (versus database).
New in LinkScan 9.0
We have added support for the Wireless Application Protocol (WAP) and Wireless Markup Language (WML). This allows LinkScan to validate wireless sites via an HTTP gateway. Typically, you will need to add some configuration commands to linkscan.cfg. For example:
Extraheader User-Agent: Nokia7110/1.0 (04.80)
Mimetypes text/vnd.wap.wml H
This will cause LinkScan to send an appropriate User-Agent header with each request and to parse/follow documents with a MIME/Content-Type of text/vnd.wap.wml.
We have added a new method for controlling the depth of a scan. The new Maxclicks command complements the existing Maxlevels command.
Whereas Maxlevels controls the depth of the scan based on an examination of the URL and the number of directory levels within it, the new Maxclicks command controls the depth of the scan based on the number of clicks required to reach the link from the starting (home) page.
The click level is normally incremented each time LinkScan follows a link. However, in order to more closely resemble real-world scenarios, the click level is not incremented when following links of these types:
- HTTP 301/302 redirects
- META Refresh redirects
- FRAME SRC links
Hence you may control the depth of a scan based on Maxclicks, Maxlevels or a combination of both.
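Click levels with "free" hops form a graph whose edges cost 0 or 1, so a 0-1 breadth-first search computes them. The Python sketch below models the site as a hypothetical dictionary of (target, link type) pairs and uses hypothetical link-type labels for the three exceptions listed above. It is not LinkScan's actual crawler.

```python
from collections import deque

# Hypothetical labels for the link types that do not add a click.
FREE_HOPS = {"redirect", "meta-refresh", "frame"}

def click_levels(start, links, maxclicks):
    """Compute click levels with a 0-1 breadth-first search.

    links -- hypothetical site graph: URL -> list of (target, link_type)
    Returns a dict mapping each URL within maxclicks to its click level.
    """
    levels = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target, kind in links.get(url, []):
            level = levels[url] + (0 if kind in FREE_HOPS else 1)
            if level > maxclicks or levels.get(target, level + 1) <= level:
                continue  # too deep, or already reached at this level or less
            levels[target] = level
            if kind in FREE_HOPS:
                queue.appendleft(target)  # same level: process first
            else:
                queue.append(target)
    return levels
```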
A number of webmasters have told us about a new and increasing problem with their external links. Users are finding that working (200 OK) links are suddenly pointing at pages with "inappropriate" (e.g. adult) content. This has become quite an issue with large numbers of domains changing hands or, in some cases, being hijacked through exploits in the Internet Domain Name System (DNS). We have experienced the problem ourselves.
We have, therefore, implemented a range of special profiling techniques that may be used to automate the detection of these situations without the need to manually inspect each link on a periodic basis. The profiling options include user written profiles, pre-configured profiles available on request, and integration with third party content filtering products and services such as firewalls and proxies. See the LinkScan Profiler for details. [Not available in LinkScan Workstation]
We have incorporated a new Problem Documents Report. This report provides a summary of documents which:
- Contain at least one broken link
- Have missing Title tags
- Exceed a specified page weight
- Exceed a specified depth
- Exceed a specified age
- Exceed a specified size
We have greatly enhanced LinkScan Dispatch which now includes options to create and/or e-mail a range of different reports. LinkScan Dispatch supports a completely new series of command-line switches. However, for existing users, backwards compatibility with the pre-9.0 options has been preserved. See LinkScan Dispatch.
To improve ease of use, we have renamed and reorganized some reports and provided more context-sensitive help.
We have made numerous other small changes and enhancements to the LinkScan reports. We highly recommend that existing users who use the command line reporting update their linkscan.rep file(s) based on the new template.
We have enhanced LinkScan to save and store the MIME/Content-Type associated with each internal link. These data are available via the Search Documents and Changed Documents Reports.
We have enhanced the Windows Graphical User Interface to provide more control over the "scope" of a scan based on the Onlyinclude and Onlyfollow commands. See screenshot.
We have added several new Status Codes. Errors generated via the Errordoc (redirect match) command are displayed with the 3000 Status Code to differentiate them from regular 404's. Similarly, errors generated via the Errorbody (body match) command are displayed with the 3001 Status Code.
The 3002 Status Code is used by the new LinkScan Profiler described above.
We have added the Excludecookie command to filter/reject specific cookies.
We have added the Proxymatch command to provide more flexibility for those with complex network environments that require the use of different proxy servers for different hosts/domains.
New in LinkScan 8.2
In LinkScan 8.2 we have consolidated several minor bug fixes and a large number of customer generated suggestions for improvements and enhancements. We thank all of those users who contributed suggestions. Some of the highlights include:
We have added a new Changed Document Report. This allows users to compare the summary data from two different scans of the same website/project. The report displays lists of new documents added, documents removed and documents changed. Document changes are detected based on one or more of the following data items: document size in bytes, document title, document date/time modified (if available) and/or additional user specified data collected from META tags as described below. Benefits include:
- Enhanced management information.
- Work flow management -- do the changes correlate with the approved Change Requests?
- Quality Assurance -- the report provides the data necessary for Regression Testing.
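The core comparison behind the Changed Document Report can be sketched in Python. It assumes each scan is summarized as a dictionary keyed by URL whose values hold the tracked attributes (size, title, date modified, user data). This layout and the function name `compare_scans` are hypothetical.

```python
def compare_scans(old, new):
    """Compare two scans of the same project keyed by URL.

    old, new -- dicts mapping URL -> dict of tracked attributes, e.g.
    {"size": ..., "title": ..., "modified": ...} (hypothetical layout).
    Returns (added, removed, changed) as sorted lists of URLs.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(url for url in old.keys() & new.keys() if old[url] != new[url])
    return added, removed, changed
```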
We have added an option which, when enabled, will allow users viewing any LinkScan Report to send a copy of that report to a specified e-mail address (in HTML or TEXT format). This improves work flow; for example, a supervisor viewing a report of bad link(s) may rapidly mail it to someone else for action.
We have added two new reporting capabilities with forms -- Search Documents and Search Links. These may be used to perform arbitrary ad-hoc queries on the LinkScan Database with a flexible array of sort/select/display options. For example, one might use such a query to produce a report listing every document that contains one or more <FORM> tags.
These reports permit almost any query on the database, making virtually the entire database searchable.
We have added a new control (Maxlevels) that provides a fast and easy way to configure limits on the depth of a scan.
We have added the ability to collect additional user specified data from each document scanned. Typically this is used to extract document attributes from META tags although the feature is not limited to META data. The data may also be manipulated via Perl Regular Expressions prior to storage in the LinkScan database (e.g. to normalize formatting). The collected data may also be post-processed by external programs to carry out more complex transformations.
User data collected could include the name of a person responsible for a document or an expiration date by which a document must be reviewed or updated. This feature enables the user to integrate LinkScan with their work flow tools and procedures.
We have noticed that a significant proportion of web pages include vast amounts of totally redundant, bandwidth-consuming whitespace. In our view, many website operators have an opportunity to improve page load times and reduce their bandwidth cost. We have, therefore, enhanced LinkScan to report a summary of the Whitespace-Bytes versus Total-Bytes consumed during the course of a scan.
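A whitespace summary of this kind could be computed as in the following Python sketch. The definition of "redundant" whitespace used here is an assumption: every byte of a whitespace run beyond the first. LinkScan's own definition may differ. The function name `whitespace_summary` is hypothetical.

```python
import re

def whitespace_summary(body):
    """Return (redundant_whitespace_bytes, total_bytes) for one page body.

    Assumption: each whitespace run could be collapsed to a single byte,
    so the remaining bytes of the run count as redundant.
    """
    redundant = sum(len(run) - 1 for run in re.findall(rb"\s+", body))
    return redundant, len(body)
```

Summing both figures over every page gives the scan-wide Whitespace-Bytes versus Total-Bytes ratio.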
We have added a summary of inline image data to the LinkScan QuickCheck reports. This report now displays just about everything that LinkScan knows about a given document.
We have introduced an option (Mapext) to include external links on the LinkScan SiteMap and TapMap.
We have made several small but significant adjustments to the low-level HTTP and HTTPS drivers for improved accuracy and greater performance. In particular, we have incorporated some improved timeout/retry algorithms to enhance accuracy and throughput on slower links. The handling of DNS timeouts has also been improved.
We have incorporated several improvements to the HTML and JavaScript parsers. These should benefit all users but the enhancements are especially significant on sites using IBM/Lotus Domino.
We have rewritten the Portable Document Format (PDF) drivers for improved accuracy and performance and to better handle the latest versions of the PDF file formats.
We have enhanced our MailVet technology to improve the speed and accuracy of the LinkScan active mailto: checking.
We have improved the speed at which all of the LinkScan reports are generated.
New in LinkScan 8.1
At LinkScan 8.1 we have consolidated several minor bug fixes and a large number of customer generated suggestions for improvements and enhancements. Although each individual change is relatively minor in scope, the aggregate of them all represents a significant improvement to the product. We thank all of those users who contributed suggestions and urge customers to install this greatly improved release at the earliest opportunity. In total, we have made approximately 60 changes and enhancements. Some of the highlights include:
Several enhancements to the LinkScan Reports for improved management of user preferences and system security, additional/improved cross-linking between various reports, and a number of improvements to the report layouts.
A number of new error checks and improved error messages.
Various improvements to the LinkScan Webserver.
Numerous improvements to LinkScan Dispatch including:
- Ability to customize the e-mail headers (e.g. for Content-Type)
- Improved interface to sendmail
- Much improved sendmail emulator for Windows users
- Options to control the sort order of the Dispatch reports
Various enhancements to our MailVet technology to improve the speed and accuracy of the active mailto link checking.
Various enhancements to LinkScan Excel -- including an option to import all META tags. Note: To use this feature, a scan must be completed with the Collectmeta option in linkscan.cfg enabled.
CPU times as well as wall clock times are recorded for each scan in the file linkscan.dbg.
Somewhat simplified configuration of Orphaned Files checking.
Added ability to direct documents with specific MIME (Content-Type) headers to an appropriate interpreter (HTML, PDF, Shockwave/Flash and JavaScript options currently supported). For example, to check the contents of included JavaScript files use:
Mimetypes application/x-javascript J
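Conceptually, the Mimetypes routing is a lookup from the media type in the Content-Type header (ignoring parameters such as charset) to an interpreter code. The following Python sketch is a hypothetical illustration only, not LinkScan's implementation: the function names are invented, and only the code J for JavaScript comes from the example above.

```python
def parse_mimetypes(lines):
    """Build a media type -> interpreter code map from Mimetypes directives.

    Hypothetical parser: assumes each directive reads 'Mimetypes <type> <code>'.
    """
    routes = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0].lower() == "mimetypes":
            routes[parts[1].lower()] = parts[2]
    return routes


def interpreter_for(content_type, routes):
    """Return the interpreter code for a Content-Type header, or None."""
    # Drop parameters such as '; charset=utf-8' and normalize the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return routes.get(media_type)


routes = parse_mimetypes(["Mimetypes application/x-javascript J"])
print(interpreter_for("application/x-javascript; charset=utf-8", routes))  # J
print(interpreter_for("image/png", routes))  # None
```

Documents whose type has no route would simply be handled by the default rules.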
Added ability to insert synthetic links into selected documents on-the-fly, for controlling test coverage on complex dynamic content.
Various corrections, clarifications and improvements to the LinkScan Documentation.
New in LinkScan 8.0
We have made very substantial internal changes to improve the performance, scalability and reliability of LinkScan. These changes should result in significant storage savings with a (typical) 50 percent reduction in database size. Some of the changes establish new foundations on which other enhancements will be built over the coming months and years.
We have significantly enhanced the Windows Graphical User Interface.
On Unix Systems we have added a direct interface to the OpenSSL package for scanning sites that use the Secure Sockets Layer (SSL) or https://... protocol.
We have substantially restructured and rewritten the LinkScan documentation.
We have enhanced several of the LinkScan Reports.
We have introduced the first release of LinkScan Excel.
We have added several new options/commands that may be used to optimize performance when scanning very large (100,000 or more documents) websites.
We have included the new Noforms command. When enabled, this will prevent LinkScan from testing links found in <FORM ACTION=...> tags. Attempting to test those links without submitting some associated data values may lead to 500 Server Errors on many sites. In general, this indicates inadequate error checking and recovery in the target scripts, but we have nevertheless provided an option to prevent such errors from cluttering the reports.
We have included a detailed audit trail of all cookie transactions processed during a scan. The log is maintained in the file ...linkscan/project/data/linkscan.red.
We have made the list of unsafe characters a user-configurable option. This means, for example, that users may control whether or not the use of a backslash character in URLs generates a 911 Unsafe Character warning. Note that using a backslash instead of a forward slash is indeed unsafe, but some sites use it anyway.
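A configurable unsafe-character check can be sketched in a few lines of Python. The default character set and the function names below are hypothetical; only the 911 Unsafe Character warning and the backslash example come from the text above.

```python
# Hypothetical default set of characters treated as unsafe in URLs.
DEFAULT_UNSAFE = set(' "<>\\^`{|}')


def unsafe_characters(url, unsafe=DEFAULT_UNSAFE):
    """Return the sorted unsafe characters found in url (empty if none)."""
    return sorted(set(url) & set(unsafe))


def check_url(url, unsafe=DEFAULT_UNSAFE):
    """Return a 911-style warning if url contains any configured character."""
    found = unsafe_characters(url, unsafe)
    if found:
        return f"911 Unsafe Character: {''.join(found)}"
    return "OK"


url = "http://www.example.com/docs\\index.html"
print(check_url(url))                          # flags the backslash
print(check_url(url, DEFAULT_UNSAFE - {"\\"}))  # backslash permitted: OK
```

Removing a character from the configured set is all it takes to stop it from being reported.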
New in LinkScan 7.4
The LinkScan Recorder is a Windows application that interfaces with Microsoft Internet Explorer. It may be used to capture real web browsing sessions, such as a complex order entry sequence. The captured recording includes all of the data entered into any associated forms. LinkScan may then be configured to replay the recording on demand, validating every link on each form and results page in the sequence.
We have greatly enhanced the LinkScan Import feature which now includes two separate functions:
Import Links: May be used to validate a simple list of URL's that is derived from some external source such as an SQL database or spreadsheet export.
Import Documents: May be used to validate a list of documents, including all of the links within each document. Such sequences may be generated with the LinkScan Recorder or derived from some other source. See the LinkScan Import Function.
We have enhanced LinkScan to parse ShockWave/Flash files and extract any embedded hyperlinks.
We have enhanced LinkScan with the ability to add customized hyperlinks at various points throughout the reports. This provides a flexible means to integrate the LinkScan Reports with other applications. For example, these links may be configured to activate functions within a content management or other database management system.
Some web servers are configured in a manner that may mask serious errors from end users and link checkers alike. This typically arises when the server responds to an invalid request by delivering a user-friendly error page with a 200 OK status code rather than a 404 Not Found. In some cases, the server will issue a redirect to a custom error document such as:
http://www.example.com/notfound.html
In other cases, server-side application code will simply deliver a valid document that contains a description of the error or exception.
We have enhanced LinkScan with directives that may be used to force a 404 Not Found Error in either of these situations. For example:
- Errordoc = notfound.html
- Errorbody (?i).*<title>Server\s+Error</title>
In the former case, any links that result in a redirection to the URL "/notfound.html" will be reported as 404.
In the latter case, any links that return a document body with content matching the specified expression will be reported as 404.
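The decision made by these two directives can be modelled as a small function that takes the observed status, the URL reached after any redirection, and the returned body. This Python sketch is an assumption about how the checks combine, not LinkScan's code; the function and parameter names are hypothetical.

```python
import re


def effective_status(status, body, redirected_to=None,
                     errordoc=None, errorbody=None):
    """Return the status to report, forcing 404 for masked errors (sketch)."""
    # Errordoc: a redirection to a custom error document is reported as 404.
    if redirected_to and errordoc and re.search(errordoc, redirected_to):
        return 404
    # Errorbody: a "200 OK" page whose body describes an error is a 404.
    if status == 200 and errorbody and re.search(errorbody, body):
        return 404
    return status


server_error = r"(?i).*<title>Server\s+Error</title>"
print(effective_status(200, "<p>Welcome</p>",
                       redirected_to="http://www.example.com/notfound.html",
                       errordoc="notfound.html"))                  # 404
print(effective_status(200, "<TITLE>Server  Error</TITLE>",
                       errorbody=server_error))                    # 404
print(effective_status(200, "<title>Home</title>",
                       errorbody=server_error))                    # 200
```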
We have enhanced the link status information displayed on the LinkScan Reports. The LinkScan database now includes an additional extended status information field which is used to display supplementary information about certain link types.
We have incorporated additional locking protections so that multiple Projects may safely be scanned simultaneously. Note that any attempt to scan a Project that is currently being scanned by another user/process will be refused.
However, we do urge some caution. Scanning multiple Projects in parallel may consume significant processor, memory and/or network resources. If the available system resources are saturated, the overall impact on LinkScan's throughput may prove negative. Users should be prepared to monitor system resources using the available tools applicable to the operating system and make adjustments if necessary.
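One common way to refuse a second concurrent scan of the same Project is a lock file created atomically. The Python sketch below illustrates that technique under stated assumptions: the class name and the lock file name scan.lock are hypothetical, and LinkScan's actual locking mechanism is not described here.

```python
import os


class ProjectLock:
    """Refuse to scan a Project that another process is already scanning.

    Hypothetical sketch: the lock is a file created atomically with
    O_CREAT | O_EXCL inside the Project directory.
    """

    def __init__(self, project_dir, name="scan.lock"):
        self.path = os.path.join(project_dir, name)
        self.fd = None

    def acquire(self):
        """Return True if the lock was obtained, False if already held."""
        try:
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False  # another user/process is scanning this Project
        os.write(self.fd, str(os.getpid()).encode())  # record the owner PID
        return True

    def release(self):
        """Close and delete the lock file so the Project can be scanned again."""
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```

A scan would call acquire() before starting and release() when it finishes; a real implementation would also need to handle stale locks left behind by a crashed process.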
New in LinkScan 7.3
We have enhanced LinkScan for Windows (not Unix) to support the Secure Sockets Layer (SSL) automatically and transparently, that is, URL's that start with https://.... Note that you must have Microsoft Internet Explorer 5.0 or later installed on your computer.
We have enhanced the various LinkScan Menus and Reports with a completely new "look and feel". Major improvements include a new Critical Errors Report, a more comprehensive Summary Statistics Report, context-sensitive help, and more convenient preferences/options. All reports are available in Rich, Standard or Text formats. The Rich format makes extensive use of HTML tables which produce an easy to use layout. However, all major browsers tend to encounter memory problems when rendering very large tables with many thousands of cells. If a selected report is likely to exceed 1000 rows, LinkScan will automatically use Standard format to avoid these problems.
We have completely eliminated the dependency on the operating system sort utility.
We have improved still further LinkScan's analysis of JavaScript and ASP constructs and incorporated several significant performance enhancements.
We have added a new check and Status Code for <A HREF=...> tags with no corresponding </A> tag. This may be enabled or disabled with the Closeatag option in linkscan.cfg.
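The Closeatag check can be illustrated with Python's standard html.parser module. This is a simplified, hypothetical sketch rather than LinkScan's parser: since anchors cannot be nested in HTML, an <a href> that is still open when the next <a> starts (or when the document ends) is treated as unclosed.

```python
from html.parser import HTMLParser


class UnclosedAnchorFinder(HTMLParser):
    """Report <a href=...> tags that are never closed with </a> (sketch)."""

    def __init__(self):
        super().__init__()
        self.open_anchor = None   # (line, href) of the currently open <a href>
        self.unclosed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # A new <a> while another is open means the earlier one was not closed.
        if self.open_anchor is not None:
            self.unclosed.append(self.open_anchor)
            self.open_anchor = None
        href = dict(attrs).get("href")
        if href is not None:
            self.open_anchor = (self.getpos()[0], href)

    def handle_endtag(self, tag):
        if tag == "a":
            self.open_anchor = None

    def close(self):
        super().close()
        # An anchor still open at the end of the document is also unclosed.
        if self.open_anchor is not None:
            self.unclosed.append(self.open_anchor)
            self.open_anchor = None


finder = UnclosedAnchorFinder()
finder.feed('<p><a href="one.html">One</a>\n'
            '<a href="two.html">Two\n'
            '<a href="three.html">Three</a></p>')
finder.close()
print(finder.unclosed)  # [(2, 'two.html')]
```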
We have added a new Followext option to linkscan.cfg. If enabled, LinkScan will attempt to follow redirections when testing external links (versus simply noting the redirection).
We have added a new Errordoc option to linkscan.cfg. This feature is useful when scanning servers that automatically redirect bad requests to a Custom Error Document. If such a page is served with a 200 OK Status, serious errors may be masked. A command such as:
Errordoc notfound\.html$
will force LinkScan to report a 404 Not Found error for any URL that is redirected to a URL that matches the pattern specified with the Errordoc parameter.
We have enhanced the Substitute command. This command is used to manipulate URL's as they are processed by LinkScan. We now support separate Substituteraw and Substitute commands. The former operates on URL's as they are extracted from the raw HTML tags. The latter operates on URL's after they have been normalised relative to the then current base URL.
We have enhanced the Substitute command, but not Substituteraw, with the special token !U. For example:
Substitute (.*) !U$1
This will cause LinkScan to decode any %-encoding within the URL. For example:
Substitute cgi-bin/redirect\?.*?&Link=([^&]+).* XX$1
Substitute XX(.*) !U$1
Hence a link to:
cgi-bin/redirect?Type=1&Link=http%3A%2F%2Fwww%2Eexample%2Ecom%2F
will be translated to:
XXhttp%3A%2F%2Fwww%2Eexample%2Ecom%2F
and then to:
http://www.example.com/
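The two-step translation can be reproduced with a small Python model of a Substitute rule. It assumes Perl-style s/// semantics (first match only, $n referring to capture group n) and assumes that a leading !U %-decodes the expanded replacement text; the function name is hypothetical. The redirect pattern contains a single capture group, so the sketch refers to it as $1.

```python
import re
from urllib.parse import unquote


def substitute(url, pattern, replacement):
    """Apply one Substitute-style rule to url (hypothetical sketch).

    $n in the replacement refers to capture group n. A leading !U means the
    expanded replacement text is %-decoded before it is inserted.
    """
    decode = replacement.startswith("!U")
    template = replacement[2:] if decode else replacement

    def expand(match):
        # Replace each $n with the text captured by group n ('' if unset).
        text = re.sub(r"\$(\d+)",
                      lambda ref: match.group(int(ref.group(1))) or "",
                      template)
        return unquote(text) if decode else text

    # Like Perl's s/// without /g, replace only the first match.
    return re.sub(pattern, expand, url, count=1)


link = "cgi-bin/redirect?Type=1&Link=http%3A%2F%2Fwww%2Eexample%2Ecom%2F"
step1 = substitute(link, r"cgi-bin/redirect\?.*?&Link=([^&]+).*", "XX$1")
step2 = substitute(step1, r"XX(.*)", "!U$1")
print(step1)  # XXhttp%3A%2F%2Fwww%2Eexample%2Ecom%2F
print(step2)  # http://www.example.com/
```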
We have added a new Tagonce command to linkscan.cfg. If enabled, LinkScan will process any link that matches the specified pattern only once; all subsequent references to that link will be completely ignored. This option may be used to eliminate the excessive storage associated with tracking thousands of references to the same frequently used URL, for example, links associated with toolbars and other navigation aids that are included in every document on a large website.
We have incorporated the ability to check for Orphaned Files on remote servers without the requirement to use NFS or a local mirror copy of the target website. We supply a script which may be executed on the remote machine to collect a recursive file listing that may subsequently be imported into LinkScan in lieu of direct file system access.
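The Tagonce behaviour amounts to remembering which matching links have already been processed. The Python sketch below is a hypothetical model: the class name and the example pattern /nav/ are invented for illustration.

```python
import re


class TagOnceFilter:
    """Process links matching a Tagonce-style pattern only once (sketch)."""

    def __init__(self, patterns):
        self.patterns = [re.compile(p) for p in patterns]
        self.seen = set()

    def should_process(self, url):
        if not any(p.search(url) for p in self.patterns):
            return True      # not covered by Tagonce: always process
        if url in self.seen:
            return False     # later references are ignored entirely
        self.seen.add(url)   # first reference: process and remember it
        return True


tagonce = TagOnceFilter([r"/nav/"])   # hypothetical toolbar directory
refs = ["http://www.example.com/nav/home.html"] * 3 + \
       ["http://www.example.com/page.html"] * 2
print([u for u in refs if tagonce.should_process(u)])
# ['http://www.example.com/nav/home.html',
#  'http://www.example.com/page.html', 'http://www.example.com/page.html']
```

Only the set of already-seen matching URLs is stored, which is where the storage saving comes from.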
New in LinkScan 7.2
We have enhanced LinkScan Enterprise so that two or more hosts may be scanned within a single Project. For details see LinkScan Enterprise Extensions. This capability is not available in LinkScan Workstation, Server or ServerPro since they are limited to only one computer.
We have simplified the testing of password protected sites and links. The Auth command may be configured with a blank Realm. LinkScan will use the specified username and password for any Realm on the specified server. You do not need to specify a Realm unless you need LinkScan to use multiple username and password combinations for different Realms on the same server. For example:
Auth www.example.com "" username password
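The credential selection described above can be sketched as a lookup in which an exact Realm match takes precedence and a blank Realm acts as a wildcard for the server. This Python model and its names are hypothetical; the precedence order is an assumption consistent with the description, not documented LinkScan behaviour.

```python
def find_credentials(auth_entries, host, realm):
    """Pick the username/password for an authentication challenge (sketch).

    auth_entries holds (host, realm, username, password) tuples taken from
    Auth commands. An exact realm match wins; a blank realm ("") matches any
    realm on that host.
    """
    wildcard = None
    for entry_host, entry_realm, user, password in auth_entries:
        if entry_host.lower() != host.lower():
            continue
        if entry_realm == realm:
            return user, password
        if entry_realm == "" and wildcard is None:
            wildcard = (user, password)
    return wildcard


entries = [("www.example.com", "", "username", "password"),
           ("www.example.com", "Admin", "admin", "secret")]
print(find_credentials(entries, "www.example.com", "Members"))
# ('username', 'password')
print(find_credentials(entries, "www.example.com", "Admin"))
# ('admin', 'secret')
```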
We have enhanced support for Cookies. LinkScan accepts all cookies received during a scan and tracks them in a cookie jar. The cookie jar may be initialized with additional cookies by using the existing Cookie command in linkscan.cfg.
We have enhanced LinkScan to optionally check all <IMG SRC> tags for ALT, HEIGHT and/or WIDTH attributes. To enable this feature, add the following command to the linkscan.cfg file:
Imgtags = AHW # Flag all IMG SRC tags without Alt, Height, Width
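A check equivalent to Imgtags = AHW can be sketched with Python's html.parser. The class name is hypothetical, and treating an empty alt="" as present is an assumption of this sketch.

```python
from html.parser import HTMLParser

# Imgtags letters mapped to the attribute each one requires.
FLAG_ATTRS = {"A": "alt", "H": "height", "W": "width"}


class ImgTagChecker(HTMLParser):
    """Collect <img src> tags missing the attributes selected by the flags."""

    def __init__(self, flags="AHW"):
        super().__init__()
        self.required = [FLAG_ATTRS[f] for f in flags.upper() if f in FLAG_ATTRS]
        self.problems = []   # (line, src, [missing attributes])

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        present = {name for name, _ in attrs}
        if "src" not in present:
            return
        missing = [a for a in self.required if a not in present]
        if missing:
            self.problems.append((self.getpos()[0], dict(attrs)["src"], missing))


checker = ImgTagChecker("AHW")
checker.feed('<img src="logo.gif" alt="Logo" width="80">\n'
             '<img src="spacer.gif" height="1" width="1" alt="">')
checker.close()
print(checker.problems)  # [(1, 'logo.gif', ['height'])]
```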
We have implemented additional controls which may be used to prevent unnecessary scanning of very large sites, especially those using dynamic content. The new Taglimit command may be used to limit the number of documents scanned that match a specified pattern. For example, the following command may be added to linkscan.cfg:
Taglimit scripts/DatabaseLookup.asp 20
This will limit the number of times that LinkScan will probe the DatabaseLookup.asp script with different query parameters. In this case, LinkScan will probe only the first 20 references to this script. Note that the Taglimit and Maxcgi limits are both checked for each document.
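The Taglimit counting rule can be modelled as a per-pattern counter. In this hypothetical Python sketch a document is scanned only if every matching pattern is still under its limit; the Maxcgi check is not modelled.

```python
import re
from collections import Counter


class TagLimiter:
    """Scan at most N documents matching each Taglimit-style pattern (sketch)."""

    def __init__(self, limits):
        # limits: (regex, max_documents) pairs, e.g. parsed from Taglimit lines.
        self.limits = [(re.compile(p), n) for p, n in limits]
        self.counts = Counter()

    def allow(self, url):
        matched = [(p, n) for p, n in self.limits if p.search(url)]
        # Refuse if any matching pattern has already reached its limit.
        if any(self.counts[p.pattern] >= n for p, n in matched):
            return False
        for p, _ in matched:
            self.counts[p.pattern] += 1
        return True


limiter = TagLimiter([(r"scripts/DatabaseLookup\.asp", 20)])
urls = [f"http://www.example.com/scripts/DatabaseLookup.asp?id={i}"
        for i in range(50)]
scanned = [u for u in urls if limiter.allow(u)]
print(len(scanned))  # 20
```

Checking all matching patterns before incrementing any counter keeps the counts accurate when several patterns apply to one URL.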
We have further refined the default JavaScript pattern matching algorithms to improve coverage and reduce false matches.
We have made several enhancements to some of the LinkScan Reports including a complete rewrite of the Selected Status Codes Report.
New in LinkScan 7.1
We have enhanced the Summary Detail Report with a completely new Slowest Pages First option to help webmasters examine page load times, especially over slow (e.g. dial-up) connections.
We have improved the algorithms for the identification of JavaScript embedded hyperlinks to increase the percentage of links found and reduce false positives.
We have made several other small improvements especially relating to reliability under Windows 95/98.
New in LinkScan 7.0
LinkScan users with Unix systems may now scan remote systems via HTTP. Please see the LinkScan End-User License Agreement for permitted use. The following command will initiate such a scan:
perl linkscan.pl -remote http://www.example.com/ -project example
We have enhanced LinkScan with support for JavaScript. Links may be extracted from JavaScript code using (customizable) pattern matching techniques.
We have added the capability to specify additional URL's that must be scanned, whether or not LinkScan encounters links to those URL's in other documents. This includes the ability for LinkScan to submit specific forms with specified data values. Forms may be submitted using either the GET or POST methods.
We have included our MailVet technology that can verify, with a high degree of accuracy, whether an e-mail address will or will not bounce mail. MailVet will probe up to 500 unique "mailto" tags without actually sending any mail.
We have provided additional controls to specify document ownership. In particular, owner names may be extracted from document META tags and subsequently manipulated via Regular Expressions.
We have added limited support for ldap://... links. LinkScan will attempt to establish a connection to Port 389 of the specified server. It does not currently validate the query and the status will be reported as an Advisory: "LDAP Server Connected - Query Not Checked".
We have added additional support for SSL (https://) secure server proxies.
We have provided powerful facilities to manipulate specific links via Regular Expressions. This feature may, for example, be used to remove or manipulate SESSIONID's that are added dynamically by your HTTP server. It can also be helpful in controlling test conditions for sites that use mainly dynamic content.
New in LinkScan 6.1
We have enhanced LinkScan with the ability to import a simple list of links for validation. This feature may be used to validate large numbers of links that have, for example, been exported from a database management system or other application program.
We have simplified the flexible (but confusing) array of options associated with LinkScan/QuickCheck. QuickCheck will now always attempt to retrieve the page status information from an existing LinkScan database (very fast). If this fails, QuickCheck will fetch the document via HTTP and validate the links in real-time (slower). When the results are based on the database, an option is provided to perform a new real-time check. In addition, QuickCheck will warn the user if the date-time-modified stamp on the source file is later than the date-time-modified stamp on the database. This alerts the user to the fact that the database status may be out of date.
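The out-of-date warning reduces to comparing two modification times. The Python sketch below assumes the source file and the database are ordinary files on the same file system; the function name is hypothetical.

```python
import os


def database_is_stale(source_path, database_path):
    """True if the source file was modified after the database (sketch)."""
    return os.path.getmtime(source_path) > os.path.getmtime(database_path)
```

When this returns True, the report would show the database results together with a warning and the option to run a real-time check.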
We have enhanced LinkScan/QuickCheck to display the HTTP Request and Response Headers associated with document retrieval.
We have improved the performance of DNS lookups associated with all HTTP requests. This may cause problems on a very small number of installations (as far as we have been able to tell, systems running certain older Linux distributions). This problem normally presents as a series of 900 (DNS), 903 (Timeout) or 999 (Unknown) errors or, rarely, as a core dump. In the unlikely event that you experience these symptoms, simply add the following entry to linkscan.sys:
Nodnsalarm = 1
We have greatly improved the support for validating hyperlinks embedded in Adobe Portable Document Format (PDF) documents. To enable this feature, you must set the following parameter in linkscan.cfg:
Pdffiles = pdf
We have enhanced LinkScan to recognize and validate links of the form:
<script src="foo">
We have added support for the special NULL token in the Htmlfiles parameter. This may be used to tell LinkScan to process files with no file extension as if they were HTML documents.
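The effect of the NULL token can be shown with a small classification function. In this hypothetical Python sketch, htmlfiles stands for the list of extensions configured with Htmlfiles, and the special entry NULL matches names that have no extension at all.

```python
import os


def is_html_file(path, htmlfiles):
    """Decide whether path should be parsed as HTML (hypothetical sketch).

    htmlfiles lists extensions from an Htmlfiles-style setting; the special
    token NULL stands for "no file extension".
    """
    _, ext = os.path.splitext(path)
    if ext == "":
        return "NULL" in htmlfiles
    return ext[1:].lower() in {e.lower() for e in htmlfiles if e != "NULL"}


print(is_html_file("docs/README", ["html", "htm", "NULL"]))  # True
print(is_html_file("docs/README", ["html", "htm"]))          # False
print(is_html_file("logo.gif", ["html", "NULL"]))            # False
```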
We have changed LinkScan so that it now assumes there is an implied <a name="top"></a> in each HTML document. This means that all references to <a href = "#top"> are considered valid, consistent with all common web browsers.
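The implied top anchor can be modelled by seeding the set of valid fragment targets with "top" before parsing the document. This Python sketch is hypothetical and, for simplicity, collects only <a name> targets (browsers also honour id attributes, which the sketch ignores).

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect <a name="..."> targets, plus the implied "top" anchor."""

    def __init__(self):
        super().__init__()
        self.targets = {"top"}   # implied <a name="top"></a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                self.targets.add(name)


def fragment_is_valid(html, fragment):
    """True if a reference to #fragment resolves within html."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return fragment in collector.targets


print(fragment_is_valid("<p>No anchors here</p>", "top"))    # True
print(fragment_is_valid('<a name="intro"></a>', "intro"))    # True
print(fragment_is_valid("<p>No anchors here</p>", "intro"))  # False
```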
We have improved LinkScan's processing of references containing %encoded characters.
We have enhanced LinkScan with a new Extraheader command. Adding this command to linkscan.cfg will force LinkScan to send the additional header with each HTTP request. For example, to set a preferred language, use:
Extraheader = Accept-Language: en
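To illustrate what such a header looks like on an outgoing request, the following Python sketch builds (but does not send) a request with the standard urllib.request module. The helper name is hypothetical, and the split on the first colon is an assumption about the Extraheader value format.

```python
import urllib.request


def build_request(url, extra_headers):
    """Build a GET request carrying Extraheader-style headers (sketch).

    extra_headers holds raw 'Name: value' strings such as 'Accept-Language: en'.
    """
    request = urllib.request.Request(url)
    for line in extra_headers:
        name, _, value = line.partition(":")
        request.add_header(name.strip(), value.strip())
    return request


req = build_request("http://www.example.com/", ["Accept-Language: en"])
# urllib stores header names capitalized ('Accept-language'); HTTP header
# names are case-insensitive, so the server sees the same header.
print(req.get_header("Accept-language"))  # en
```

Passing the request to urllib.request.urlopen() would send the header with the GET.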
We have enhanced LinkScan to prevent simple HTML errors resulting in the creation of databases for phantom Owners. For example, a hyperlink with a missing "http://" such as:
<a href="www.example.com">
will no longer result in the creation of a "www.example.com" Owner.
We have enhanced LinkScan so that the following linkscan.sys parameters may be overridden in the per-Project linkscan.cfg files:
- Timeout1
- Timeout2
- Dprocs
- Nprocs
- Masterport
New in LinkScan 6.0
LinkScan 6.0 includes some significant changes to the scanning modules. For Windows users:
- Multi-tasking HTTP navigation of the site being scanned is supported.
- Multi-tasking validation of External links is supported.
- The timeout/retry logic has been greatly improved when checking slow or hung links.
These changes eliminate prior restrictions due to limitations of the Perl implementation for Windows and can greatly improve performance.
For Unix users:
- Multi-tasking HTTP navigation of the site being scanned is supported.
- When validating external links with multiple processes, the memory requirements are significantly reduced.
New in LinkScan 5.5
LinkScan 5.5 is an exciting new release.
The Graphical User Interface supplied with LinkScan for Windows incorporates numerous enhancements to simplify installation and configuration.
LinkScan for Windows includes a basic HTTP server, the LinkScan WebServer. Users may install the LinkScan Server automatically or elect to integrate LinkScan with an existing HTTP server such as Apache or Microsoft IIS.
Existing LinkScan users should note that the configuration file formats have changed significantly at LinkScan 5.5 to simplify system administration and maintenance. We have supplied a tool to automate the conversion of your existing configuration.
The configuration file format changes are summarized below:
The file linkscan.mas has been simplified. This file now contains a simple list of configured Project directories. Project Descriptions are now stored in the corresponding linkscan.cfg file.
The file linkscan.usr has been eliminated. These options, used to provide access controls to the LinkScan CGI scripts, have been integrated into linkscan.sys.
The file linkscan.ign has been eliminated. The LinkScan customization commands are now stored in the file linkscan.cfg.
The file linkscan.alt has been eliminated. The SiteMap customization commands are now stored in the file linkscan.cfg.
The linkscan.cfg templates have been "normalized". A global linkscan.cfg is always required in the main LinkScan directory. The settings in this file establish defaults for all configured Projects. The project-specific linkscan.cfg files in the individual project directories have been greatly simplified with far fewer items to configure. However, any default setting in the global linkscan.cfg file may be overridden by pasting the appropriate command into the linkscan.cfg file for an individual Project.
We have found that these changes greatly simplify system configuration and administration in complex multi-Project scenarios. The automatic conversion script will attempt to normalize the global and project-specific linkscan.cfg files. However, users may find they can achieve further simplification with a few minutes of manual inspection and editing.
New in LinkScan 5.4
LinkScan 5.4 is primarily a maintenance release that consolidates several minor bug fixes and enhancements:
At LinkScan 5.4 we introduced the new LinkScan Server and LinkScan Workstation products.
At LinkScan 5.4 we introduced a Graphical User Interface (GUI) for LinkScan on the Windows NT and Windows 98 platforms.
We have also included infrastructure to support new upcoming enhancements.
New in LinkScan 5.3
At LinkScan 5.3 we have improved the processing of Server Side Include (SSI) tags when using File System navigation. SSI Include tags are fully expanded by LinkScan provided that Expandssi is enabled in linkscan.cfg. SSI tags that require scripts to be executed (CGI/EXEC) are not processed. When using HTTP Navigation, all SSI's (including executables) are processed by the HTTP server.
At LinkScan 5.3 you may optionally tell LinkScan to check your HTTP server access logs and include the per-document page impressions on the SiteMap reports. To enable this feature, be sure to set the Httpdlogfile parameter in linkscan.cfg.
At LinkScan 5.3, we have incorporated an audit trail of site scans. Each execution of linkscan.pl will append a record to the file .../linkscan/project_name/data/linkscan.sum. This tab delimited file may be imported into spreadsheets and other applications for management reports. See linkscan.sum for the specification of the file format.
At LinkScan 5.3, when scanning via HTTP, LinkScan can submit an arbitrary cookie to your server. This makes it easier to validate those sites that use Cookie-based user authentication schemes.
We have added support for the Onlyorphans command to provide finer control over which directories on your server should and should not be checked for orphaned files.
We have made several cosmetic improvements to the SiteMap and TapMap reports.
We have made several small improvements to the treatment of pathnames containing non-standard (e.g. %encoded) characters.
We have inserted code to detect/correct several common configuration errors.
New in LinkScan 5.2
At LinkScan 5.2 we have improved HTTP navigation (the Execute command) for validating dynamic content (CGI scripts, Server Side Includes etc.), enhanced several of the LinkScan Reports and added some completely new reporting options. Some of the specific enhancements include:
The LinkScan Reports no longer require the use of Cookies for storing individual user preferences. The system will use cookies if available - otherwise it will maintain current settings by passing them via the URL. This avoids random problems that some users have reported with certain browser installations.
The Summary/Detail Report has been enhanced with an option to display all documents older than "N" days.
The Summary/Detail Report has been enhanced with an option to sort the documents by the number of "Inline Bytes". The Byte Count includes the document itself, any inline images (<img src> but not <img lowsrc> tags), background images and image buttons. Each unique image is only counted once - we assume that the client will cache multiple references to the same image within the same document. In-line image references to remote servers are also counted (assuming LinkScan can reach them via HTTP and that the server will return a size header without having to download the entire file).
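The Inline Bytes arithmetic can be written out as a short function. In this hypothetical Python sketch, the caller supplies the inline image references (already excluding <img lowsrc>) and a map of known sizes; repeated references count once, and images whose size could not be determined are skipped.

```python
def inline_bytes(document_size, inline_refs, sizes):
    """Total "Inline Bytes" for one document (hypothetical sketch).

    inline_refs: URLs of inline images, background images and image buttons,
    possibly repeated. sizes: URL -> byte count (e.g. from a size header).
    """
    total = document_size
    for url in set(inline_refs):   # each unique image is counted once
        size = sizes.get(url)
        if size is not None:       # unknown sizes are left out
            total += size
    return total


refs = ["logo.gif", "photo.jpg", "logo.gif"]
sizes = {"logo.gif": 2_500, "photo.jpg": 40_000}
print(inline_bytes(12_000, refs, sizes))  # 12000 + 2500 + 40000 = 54500
```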
The Summary Statistics Report displays separate tables for Internal and External links.
The Summary Statistics Report error counts are hyperlinked to the corresponding Detailed Report.
The All Pages Linking Report displays separate tables for Links To: and Links From:.
We have added the new Redirections Report to summarize all local redirections including the missing "/" on directory references, <META HTTP-EQUIV REFRESH> tags and actual HTTP redirects.
Several Reports provide for Include and Exclude expressions that may be matched on Referer or Target. Include/Exclude expressions may now be matched on Referer, Target or either.
When scanning for Orphaned Files, users may control the depth of the scan in terms of directory levels with the new Maxdirlevels configuration option in linkscan.cfg.
We have added the Noorphans command option. This will exclude all files matching the specified expression from the Orphans Report without affecting any other Reports.
We have added the new Autohttp configuration command to linkscan.cfg. When navigating the Website via File System navigation, LinkScan can automatically attempt HTTP access when file system access fails to locate a specific file. This may be used to eliminate the requirement to configure server aliases and redirections but with some loss of performance. Note: file system access is typically 5 to 10 times faster than HTTP access.
We have improved the detection of, and recovery from, several rare exception conditions. Additional diagnostic capabilities have been incorporated to facilitate problem investigation and resolution in conjunction with Elsop's Technical Support personnel.
New in LinkScan 5.1
LinkScan 5.0 was a major new release. At LinkScan 5.1 we have consolidated several minor bug fixes and a number of improvements designed to further simplify LinkScan administration. The following items are worthy of note:
We have improved the default placement of output files from command-line generated reports (linkscan.cgi and dispatch.pl). Users must define the pathname to the default directory in the file linkscan.sys with the Reportsdir setting.
Some servers require that the LinkScan CGI scripts be installed in a special directory (often cgi-bin). In these situations the scripts need to know where to find the remainder of the LinkScan files. In the past, this was achieved by setting a special variable ($LS::Lsdir) in the header of each script. At LinkScan 5.1, we have eliminated that special variable, and the full pathname to the LinkScan directory must now be defined in the hidden file called .linkscan. We have updated the LinkScan Configurator accordingly to make this change transparent to users installing LinkScan via that method.
We have enhanced the SiteMap customization features to make it easier to include or exclude different files from the LinkScan SiteMap and TapMap.
We have enhanced LinkScan to validate URL's contained within drop-down lists.
We have improved the error detection and recovery logic associated with various system interfaces to ensure that any misconfiguration errors or exceptions are more clearly detected and reported.
New in LinkScan 5.0
We have significantly reduced LinkScan's virtual memory usage on large web sites. Virtual memory usage will depend to some extent on the Operating System, Perl version, malloc() implementation and the nature of the site being scanned. However, in studies, we have found that 1 MByte of virtual memory per 1,000 HTML documents is a reasonable rule-of-thumb. (This compares with 5-10 MBytes per 1,000 documents at LinkScan 3.x/4.x).
We have made many other changes to the internal code and data structures to improve performance, reliability and maintainability as well as providing a platform for future enhancements.
The previous implementation of multiple Projects has been changed. The new model introduces several new concepts which are defined below:
- Projects
- Owners
- Usernames
A Project is defined as a distinct LinkScan configuration. In general, you will only need to create one such configuration for each domain or virtual host on your server. You may optionally create multiple configurations for a single domain or virtual host. However, you must create at least one Project for each domain or virtual host.
Within a given Project you may define multiple Owners. Each file within the Project may be assigned to one of an arbitrary list of Owners by any or all of the following means:
- A Defaultowner command in linkscan.ign
- The Unix file system ownership attribute
- Pattern Matching on pathname in linkscan.ign
- Meta tags inserted in the document body
- In addition, we have added a command which will automatically create an Owner for each top-level directory under the Home Directory
LinkScan creates (mainly) separate databases for each Owner. This facilitates user-selective queries and greatly improves performance. By default, LinkScan also creates an All Owners database for each Project.
Usernames are used to:
- Optionally, provide per-user access control to the LinkScan reports
- Optionally, control which users may view which Project databases
- Optionally, control which users may view which Owner databases
- Optionally, control which users may access specific reporting options
By default, LinkScan will set the default Owner selection to the current Username.
We have enhanced the LinkScan SiteMap and TapMap. SiteMaps and TapMaps based on Link Ordering are provided for each Project. In addition, SiteMaps and Tapmaps based on Directory Structure are provided for each Project and each Owner within that Project.
Orphaned File listings have been removed from all of the previous reports and we have added a new Orphaned Files Report to the Main Menu.
We have enhanced the All Pages Linking To ... Report. In previous versions you could only view the first "N" referring pages where "N" was limited to the Maxgoodint setting in linkscan.cfg. From the Summary/Detail Overview you may now select a complete list of referring pages.
We have enhanced many other reports with new and more consistent options including:
- More control over sort ordering
- New selection options
- More facilities for including/excluding specific references
- The ability to include/exclude on the target URL or the referring URL
- More options to customize the headers and footers of the LinkScan Menus and Reports
We have also improved the formatting options. Reports may be created in any of the following formats:
- Full HTML with hyperlinks and graphics
- Full HTML with hyperlinks and no graphics
- Basic text without graphics. These reports do not include hyperlinks although they do make some limited use of HTML constructs (mainly <br> and <hr>) where they improve browser based views and facilitate the parsing of the reports by user-written post processors
- Pure ASCII text suitable for viewing on a dumb terminal (command line interface only)
We have similarly enhanced the command line reporting options. The linkscan.rep file format has been extended and you may now define specific default parameters for each report type.
We have updated and improved all of the LinkScan documentation and added the LinkScan Quick Reference Card.
We have provided the capability to relocate the LinkScan documentation and images directory to any URL on your server. You may also control what files the [Help] and [Status Code] hyperlinks on the reports will link to so that you can integrate local site-specific documentation more easily.
We have made several small error corrections and numerous other minor enhancements in response to customer feedback.
New in LinkScan 4.2
At LinkScan 4.2, we have focused on enhancements to the various reporting modules with both new and more consistent options.
We made the new Summary --> Detail Report the default selection with options to sort the report (ascending or descending) on the Number of Errors in the document, Document URL, or Document Age. It includes hyperlinks to LinkScan/QuickCheck which may be used to display all of the potential problems with a selected document.
We improved LinkScan/QuickCheck with many new features including Simple and Advanced Options Menus and the ability to configure default options for it in linkscan.sys.
QuickCheck "remembers" individual user preferences by setting a Cookie in the user's browser.
We have also added Source Code Line Numbers to the LinkScan reports where it will be useful in diagnosing and correcting errors in a document.
In addition, QuickCheck integrates with Weblint. Weblint performs rigorous HTML syntax checking of the source document. This optional feature may be used to show all of the HTML syntax errors and broken links in a single report together with the HTML source code.
The menus for the various LinkScan CGI scripts may be customized by creating the files linkhead.txt and linkfoot.txt in the LinkScan directory.
When using custom headers and footers with SiteMap and TapMap, LinkScan displays a discreet version stamp and copyright notice at the bottom of each page.
The LinkScan documentation has been restructured and supplemented with a new LinkScan User Guide. This new guide is directed at the needs of Content Managers and Developers. The LinkScan Reference Manual (this document) is directed at the needs of Systems Administration personnel.
We added significant performance and accuracy enhancements when validating FTP links.
We added greater flexibility when creating and configuring multiple Projects.
We added a "-quiet" option to allow for more succinct progress displays during scanning. LinkScan also displays a total error count on completion of a scan.
We fixed several minor bugs and incorporated numerous other small changes requested by customers.
New in LinkScan 4.1
The following changes and enhancements were incorporated in LinkScan version 4.1:
LinkScan 4.1 is significantly faster at scanning internal links. In tests, CPU usage was reduced by 30-50 percent
Added LinkScan/QuickCheck
Added the ability to validate FTP links. The FTP protocol is older and less consistently implemented than HTTP. You may, therefore, find that LinkScan produces some false errors when checking links to certain servers. If you discover any such examples, please Email the URL to <[email protected]> and we will seek to address the issue in the next release
Added syntax checking of mailto links. LinkScan does not probe or send Email to those destinations
Added the "All Pages Linking To ..." Report to the Main Menu of reporting options. This report helps webmasters quickly identify the impact of removing a document or file by listing all of the pages that link to it
Added support for server-side image maps
Added support for the HTTP Proxy-Authenticate feature
Added the additional status code Location Header Not Absolute
Added the additional status code URL Contains Unsafe Character
Numerous enhancements to LinkScan/Dispatch, including the addition of the Defaultowner and Mailalias commands to linkscan.ign and the Ownertags command to linkscan.cfg. The dispatch.cfg file has been eliminated and those parameters are now defined in linkscan.sys/linkscan.cfg
Numerous enhancements to the LinkScan Configurator
Several minor bug fixes and improvements
New in LinkScan 4.0
The following changes and enhancements were incorporated in LinkScan version 4.0:
Added the LinkScan/Dispatch module
Added the Indexoptions directive and the ability for LinkScan to create virtual pages based on a directory listing if no default page exists in that directory
Added the Statuscode directive and the ability to customize the severity of any or all LinkScan Error and Status Codes
Several minor bug fixes and improvements
New in LinkScan 3.2
The following changes and enhancements were incorporated in LinkScan version 3.2:
The LinkScan configurator will copy CGI files to a 'cgi-bin' directory and update the '$Lsdir' parameter automatically.
LinkScan automatically creates a template for new Projects.
Added new 'Noprojectlist' directive to linkscan.sys file.
Added new 'Hostalias' directive to linkscan.ign file for use with servers that have multiple identities.
LinkScan database is created in a temporary working directory so that previous reports remain available during scanning
Added new !HOME expression to 'Alias' directive in linkscan.ign.
Added support for a new Global linkscan.ign file
Several minor bug fixes and improvements
New in LinkScan 3.1
The following changes and enhancements were incorporated in LinkScan version 3.1:
Added the ability to check links embedded within Adobe PDF files. To enable this capability, simply add the 'pdf' suffix to the list of Htmlfiles in linkscan.cfg
LinkScan now checks <a name=...> tags in documents that are defined as 'NoFollow'.
Enhanced TapMap so that users can create hyperlinks from regular documents to a specific TapMap at the appropriate position and level.
Added specific support for the <!--#echo var="DOCUMENT_URI" --> Server Side Include
The LinkScan Configurator automatically updates the "#!/usr/local/bin/perl" headers in all of the LinkScan executable files
Added a case-sensitive search option to the LinkScan History Report
Added new Hidelinkprefix option to linkscan.cfg.
Several minor bug fixes and improvements
New in LinkScan 3.0
The following changes and enhancements were incorporated in LinkScan version 3.0:
Redesigned Multi-site Manager for simplified configuration management.
New reporting option to display full system configuration parameters
Significant performance improvements (CPU time and memory) to the LinkScan Reports - linkscan.cgi
Overview by Web Page Report now includes a hyperlink to an Error Report for each page
Added various new controls to govern how frequently external links are tested.
Randomized the order in which external links are tested to avoid load peaks on remote servers
Added controls to automatically purge/expire the History file, linkscan.hst
The file linkscan.red now includes a listing of the URLs for all pages on your site for easy submission to search engines. Infoseek will accept an Email submission containing all the links on your website. In a test submission of 313 pages from one of our websites, Infoseek indexed about 280 of them in about 10 days.
The Noproxy option was changed to work with a partial (versus exact) match.
Improved the Multi-Site Manager and provided for the definition of a default configuration.
<img src=...> tags within <input....> tags are now tested correctly
Added option to disable the TapMap options.
Various minor improvements to the SiteMap/TapMap HTML tags including additional optimization for the Lynx browser family
Several minor bug fixes
New in LinkScan 2.1
The following changes and enhancements were incorporated at LinkScan version 2.1:
Added the ability to emulate server aliases and redirections.
Added the ability to selectively execute CGI scripts and Server Side Includes, parse their output and validate any links that are generated.
Redesigned the capability for validating links to pages that require authentication. Username/password combinations are defined on the basis of server and "realm" rather than specific URL.
Added option to disable orphan checking.
Improved the TapMap navigation tools
Various other minor enhancements and bug fixes
New in LinkScan 2.0
The following changes and enhancements were incorporated at LinkScan version 2.0:
Major restructuring to increase performance and reduce virtual memory utilization especially when scanning large websites with thousands of documents.
Improved Multi-Site Manager to simplify the testing of partial websites and/or sub-sites.
Added "Noproxy" option to selectively disable proxy access on specified servers.
Modified definition of Internal and External links for greater flexibility.
Extended the Hide command to accept Regular Expressions.
Restructured the LinkScan Reference Manual
Various other minor enhancements and bug fixes
New in LinkScan 1.2
The following changes and enhancements were incorporated at LinkScan version 1.2:
Numerous enhancements to the HTML parser
Additional SiteMap and TapMap options. In particular, the incorporation of a Target option to simplify the creation of SiteMaps and TapMaps for use on websites that make use of "frames"
Various other minor enhancements and bug fixes
New in LinkScan 1.1
The following changes and enhancements were incorporated at LinkScan version 1.1:
Addition of the LinkScan Configurator and LinkScan Startup Guide
Initial Release of TapMap
Various other minor enhancements and bug fixes