LinkScan for Unix. Reference Manual. Section 29.
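LinkScan incorporates several functions that relate to electronic mail. These include mailing LinkScan reports to a selected address, e-mail reports sent from LinkScan Dispatch, and Active Mailto Checking.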
Some or all of the following parameters must be configured in order to use these functions:
Sendmailpath = perl utils/sendmail.pl
Smtphost = smtp.example.com
Hostname = www.example.com
Mailfrom = [email protected]
Nameservers = [...]
Mailto = 1
Sendmailpath: The pathname to the sendmail.pl utility that is installed in the LinkScan utils/ folder.
Smtphost: The full hostname of an SMTP mail server that you are authorized to use.
Hostname: The full hostname of the computer on which LinkScan is installed. This is used for the SMTP HELO. For sending LinkScan reports via email, a hostname of localhost may work, depending on your SMTP server. For Active Mailto Checking, an accurate hostname (matching the reverse DNS) is required.
Mailfrom: The From: address, used for sending LinkScan reports and Active Mailto Checking.
Nameservers: Leave blank unless running with Active Mailto Checking enabled and LinkScan reports nameserver errors.
Mailto: When enabled, all LinkScan reports include an option to mail the current report to a selected address.
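To make the roles of these settings more concrete, the following Python sketch shows how values such as Smtphost, Hostname and Mailfrom typically map onto an SMTP session. It is an illustration only and not the sendmail.pl utility itself; the function names build_report_message and send_report are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_report_message(mail_from, mail_to, subject, body):
    """Build a plain-text message; mail_from plays the role of Mailfrom."""
    msg = EmailMessage()
    msg["From"] = mail_from
    msg["To"] = mail_to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_report(msg, smtphost, hostname):
    """Send msg via smtphost, announcing hostname in the HELO/EHLO greeting."""
    # local_hostname is the name smtplib presents in HELO/EHLO,
    # which corresponds to the Hostname setting described above.
    with smtplib.SMTP(smtphost, local_hostname=hostname) as smtp:
        smtp.send_message(msg)
```

A call such as send_report(msg, "smtp.example.com", "www.example.com") would only succeed against a mail server that you are authorized to use.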
Sendmailpath = /usr/lib/sendmail -t
Smtphost =
Hostname = www.example.com
Mailfrom = [email protected]
Nameservers = [...]
Mailto = 1
Sendmailpath: The absolute pathname to the sendmail executable on your server. The -t switch is required.
Smtphost: This parameter is ignored on Unix systems.
Hostname: The full hostname of the computer on which LinkScan is installed. This is used for the SMTP HELO. For Active Mailto Checking, an accurate hostname (matching the reverse DNS) is required.
Mailfrom: The From: address, used for sending LinkScan reports and Active Mailto Checking.
Nameservers: Leave blank unless running with Active Mailto Checking enabled and LinkScan reports nameserver errors.
Mailto: When enabled, all LinkScan reports include an option to mail the current report to a selected address.
For completeness, we address two related settings in the linkscan.cfg file:
Mailhost = example.com
Checkmailto = 0
Mailhost: This setting is used exclusively for sending e-mail reports from LinkScan Dispatch. By default, e-mail reports are sent to Owner@Mailhost.
Checkmailto: This parameter enables Active Mailto Checking. It is disabled by default. Note that this feature requires that the Perl module Net::DNS be installed on your computer. The Net::DNS module is available from http://www.net-dns.org/.
LinkScan includes support for the Wireless Application Protocol (WAP) and Wireless Markup Language (WML). This allows LinkScan to validate wireless sites via an HTTP gateway. Typically, you will need to add the following configuration commands to linkscan.cfg:
Extraheader User-Agent: Nokia7110/1.0 (04.80)
Mimetypes text/vnd.wap.wml H
This will cause LinkScan to send an appropriate User-Agent header with each request and to parse/follow documents with a MIME/Content-Type of text/vnd.wap.wml.
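As a rough illustration of what these two directives amount to at the HTTP level, the Python sketch below builds a request carrying the extra User-Agent header and tests whether a response's Content-Type names WML. The helper names build_wap_request and is_wml are hypothetical, and ignoring Content-Type parameters such as charset is an assumption, not documented LinkScan behavior.

```python
import urllib.request

WML_TYPE = "text/vnd.wap.wml"


def build_wap_request(url, user_agent="Nokia7110/1.0 (04.80)"):
    """Return a Request that carries the extra User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def is_wml(content_type):
    """True if a Content-Type header value names WML, ignoring parameters."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == WML_TYPE
```

The request could then be passed to urllib.request.urlopen, and the Content-Type header of the response checked with is_wml before the document is parsed for links.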
LinkScan may be configured to test websites hosted on secure servers running the Secure Sockets Layer (SSL), i.e. sites with URLs of the form https://www.example.com/.
On the Microsoft Windows platforms, you need only specify the URL of the site to be scanned. LinkScan includes native support for the Secure Sockets Layer.
On Unix systems, you will need to install additional software to handle the SSL encryption. The required packages are:
OpenSSL available from http://www.openssl.org/
Perl Module Net::SSLeay available from http://search.cpan.org/search?module=Net::SSLeay
At the time of writing, LinkScan has been tested with OpenSSL version 0.9.6 and Net::SSLeay version 1.05.
Installation of both packages is very straightforward if you have root access:
cd $HOME/openssl-0.9.6
./config
make
make test
make install     # See Note 1
cd $HOME/Net_SSLeay.pm-1.05
perl Makefile.PL
make
make test        # See Note 2
make install     # See Note 1
Note 1: The make install steps may fail if you do not have root access. In that case, you may install and run these packages from a user directory by using something like this:
cd $HOME/openssl-0.9.6
./config --openssldir=$HOME/myopenssl
make
make test
make install
cd $HOME/Net_SSLeay.pm-1.05
perl Makefile.PL $HOME/myopenssl
make
make test
mv ./blib/lib/Net/ /usr/www/linkscan/
mv ./blib/lib/auto/ /usr/www/linkscan/
Note 2: The make test on Net::SSLeay will produce a number of errors. In general, you can safely ignore them.
Once the module Net::SSLeay has been successfully installed, LinkScan will be able to scan https://... sites without any additional configuration changes.
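One way to confirm that Perl can actually load a required module is to ask Perl for the module's version. The Python sketch below wraps that check; it assumes a perl executable on the PATH, and the helper name perl_module_version is hypothetical. The same check could be applied to Net::DNS.

```python
import subprocess


def perl_module_version(module):
    """Return the module's $VERSION as a string, or None if it cannot be loaded."""
    script = "print $%s::VERSION" % module
    try:
        result = subprocess.run(
            ["perl", "-M" + module, "-e", script],
            capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None  # perl itself is missing or did not respond
    if result.returncode != 0:
        return None  # perl could not load the module
    return result.stdout.strip()
```

For example, perl_module_version("Net::SSLeay") returns None when the module is not visible to the perl on the PATH. A module installed in a user directory, as in Note 1, may not be visible to this check even though LinkScan can load it from its own directory.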
Each of the above referenced programs (with the exception of LinkScan) is maintained by parties other than Electronic Software Publishing Corporation. You are solely responsible for your use of those products and your compliance with any applicable software license agreements. Several of the referenced products contain encryption algorithms, the distribution and use of which may be subject to various laws and regulations. You are solely responsible for compliance.
When scanning sites that contain (in whole or in part) Japanese pages, include the following directives in the Project configuration file (on Windows systems, via the Advanced Tab of the Project Planning Property Sheet):
Jisencode = 1
Displaylang = EUC-JP
Pages containing JIS, Shift-JIS and/or EUC-JP encoded Japanese characters will be normalized to EUC-JP. This means, for example, that the TITLE tags extracted from different documents may be combined in a single summary document (e.g. the LinkScan SiteMap) even though the original pages were constructed with different encodings.
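The idea of normalizing mixed encodings to EUC-JP can be sketched in Python as follows. This is a naive trial-decoding heuristic written for illustration, not LinkScan's actual detection logic; the function name normalize_to_euc_jp and the order in which encodings are tried are assumptions, and some byte sequences are valid in more than one encoding.

```python
# Encodings are tried in this order; the first one that decodes cleanly wins.
CANDIDATE_ENCODINGS = ("iso2022_jp", "euc_jp", "shift_jis")


def normalize_to_euc_jp(raw):
    """Return (detected_encoding, euc_jp_bytes) for raw page bytes."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return encoding, text.encode("euc_jp")
    raise ValueError("not valid JIS, EUC-JP or Shift-JIS data")
```

Once every title has been converted to the same byte encoding, strings taken from differently encoded pages can be placed side by side in a single summary document.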
The encoding type of each document is stored in the LinkScan database together with the MIME type (Content-Type). The Search Documents Report may be used to search/display this data and help enforce consistent encoding standards across mixed language sites.
LinkScan automatically creates an XML Sitemap file in a format suitable for submission to Google Sitemaps. For more background, see Google Webmaster Help Center.
The XML Sitemap file is created automatically. The file name is sitemap.xml and it resides in the Project subdirectory of the LinkScan installation directory.
The file is formatted in compliance with the Google Sitemaps Protocol. However, Google recommends that the file be compressed using gzip. The gzip utility is standard on most UNIX systems. Windows users may download a free command line implementation of gzip from http://www.gzip.org/.
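Where the gzip command line utility is not convenient, the same compression can be done with a few lines of Python. The sketch below is one possible approach; the helper name gzip_file is hypothetical and the path to sitemap.xml depends on your installation.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None):
    """Compress src_path to dst_path (default: src_path + '.gz'); return dst_path."""
    if dst_path is None:
        dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Calling gzip_file("sitemap.xml") from the Project subdirectory writes sitemap.xml.gz alongside the original file and leaves the original in place.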
LinkScan produces the sitemap.xml file with the following Google-defined fields for each web page listed:
changefreq: Valid options are "always", "hourly", "daily", "weekly", "monthly", "yearly" or "never". LinkScan sets changefreq to "weekly" by default. This may be changed by adding a Gsmchangefreq command to the Project linkscan.cfg file [Windows users: add this command via the Advanced Tab of the Project Planning Property Sheet].
lastmod: LinkScan uses the last-modified date/time data it collects. With File System scanning, this is taken from the server's file system attributes. With HTTP scanning, this is taken from the Last-Modified HTTP header (if present). If no specific date/time stamp is available, LinkScan supplies the date/time of the last scan.
priority: This is assigned automatically by LinkScan, based on the document level within the LinkScan Link Order SiteMap. In summary, pages that are one or two clicks from the home page (start of scan) are assigned a high priority. Pages that are many clicks from the starting page are assigned a lower priority.
In addition, LinkScan will optionally limit the scope of the Google Sitemap to the first "N" levels (as defined by the LinkScan Link Order SiteMap). This may be defined by adding a Gsmlevels command to the Project linkscan.cfg file [Windows users: add this command via the Advanced Tab of the Project Planning Property Sheet].
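The following Python sketch shows how the three fields and the level limit might fit together in a sitemap file. It is not LinkScan's implementation. The priority formula, the convention that the start page is level 0, the treatment of max_levels as the deepest level kept, and the sitemaps.org namespace are all assumptions made for illustration; the changefreq and max_levels parameters stand in for the Gsmchangefreq and Gsmlevels settings.

```python
import xml.etree.ElementTree as ET

# Assumption: the current sitemaps.org namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def priority_for_level(level):
    """Hypothetical mapping: 1.0 at level 0, minus 0.1 per click, floor 0.1."""
    return max(1, 10 - level) / 10


def build_sitemap(pages, changefreq="weekly", max_levels=None):
    """pages: iterable of (url, lastmod, level) tuples. Returns XML text."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod, level in pages:
        if max_levels is not None and level > max_levels:
            continue  # page lies deeper than the configured level limit
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = "%.1f" % priority_for_level(level)
    return ET.tostring(urlset, encoding="unicode")
```

With max_levels=2, a page five clicks from the start of the scan is omitted, while the start page itself is listed with a priority of 1.0.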
At version 11.6, LinkScan is able to parse and extract links from the following document types:
The following paragraphs describe how to use LinkScan to scan XML (or other similarly formatted) documents. Activating and configuring the XML parser involves two basic steps.
First, LinkScan must be told to route documents of the appropriate type to the XML parser for analysis. On UNIX systems this may be done with the Mimetypes and Filetypes directives in the linkscan.cfg file.
Mimetypes text/xml X
Filetypes xml X
On Windows systems, these options may be set via the Mimes and Files Tabs of the Project Planning Property Sheet.
The former is used with HTTP Scanning and will route all documents with a Content-Type: text/xml header to the XML parser. The latter is used with File System Scanning and will route all files with a .xml file extension to the new XML parser.
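The routing step can be pictured as two lookup tables, one keyed by MIME type and one by file extension. The Python sketch below is a simplified model under that assumption; the function names are hypothetical, and the case-insensitive matching and stripping of Content-Type parameters are assumptions rather than documented LinkScan behavior.

```python
import os

# Parser code taken from the directives in this section: X = XML parser.
MIMETYPES = {"text/xml": "X"}
FILETYPES = {"xml": "X"}


def parser_for_http(content_type):
    """Pick a parser code from a Content-Type header value (HTTP scanning)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MIMETYPES.get(media_type)


def parser_for_file(path):
    """Pick a parser code from a file extension (File System scanning)."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return FILETYPES.get(extension)
```

Both functions return None for types that have no entry, which in this model means the document is not routed to the XML parser.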
Second, LinkScan must be told how to extract links from the XML document. This is done via Regular Expressions and is best illustrated by example. Suppose we have an XML document organized like this:
<?xml version="1.0" encoding="ISO-8859-15"?>
<link>
  <linkUrl>http://www.elsop.com/</linkUrl>
  <linkText>LinkScan</linkText>
  <linkTarget>_blank</linkTarget>
  <linkRef>000012345678</linkRef>
</link>
We construct an Xmlmatch directive and add it to the linkscan.cfg file:
Xmlmatch = <linkUrl>([^<]+)</linkUrl>.*?<linkText>([^<]+)</linkText> $1 $2
LinkScan will now extract the link (http://www.elsop.com/) and the associated caption (LinkScan) from that XML file.
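The regular expression in the Xmlmatch directive can be tried out independently of LinkScan. The Python sketch below applies the same pattern to the sample document; the use of re.DOTALL, which lets .*? span line breaks between the two elements, is an assumption about how the pattern should behave on multi-line input, and extract_links is a hypothetical helper name.

```python
import re

XMLMATCH = re.compile(
    r"<linkUrl>([^<]+)</linkUrl>.*?<linkText>([^<]+)</linkText>",
    re.DOTALL,  # assumption: allow .*? to cross line breaks
)

SAMPLE = """<?xml version="1.0" encoding="ISO-8859-15"?>
<link>
  <linkUrl>http://www.elsop.com/</linkUrl>
  <linkText>LinkScan</linkText>
  <linkTarget>_blank</linkTarget>
  <linkRef>000012345678</linkRef>
</link>"""


def extract_links(document):
    """Return a list of (url, caption) pairs, i.e. ($1, $2) for each match."""
    return XMLMATCH.findall(document)


print(extract_links(SAMPLE))  # prints [('http://www.elsop.com/', 'LinkScan')]
```

The first capture group corresponds to $1 (the link) and the second to $2 (the caption) in the directive.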
The new parser means that LinkScan can now be used to quickly and accurately extract links from XML and similarly formatted data files.
At version 12.3, LinkScan provides full support for IPv6. The IPv6 standard was designed to dramatically increase the number of Internet addresses available following the exhaustion of the entire IPv4 address pool. An overview of IPv6 is available at Wikipedia.
Using LinkScan with IPv6 on UNIX systems requires:
Using LinkScan with IPv6 on Windows systems requires:
A new setting, IPv6Prefs, provides user control over LinkScan's affinity for IPv6 versus IPv4 connections. At version 12.3, this setting applies to LinkScan on UNIX systems only. Valid values are:
IPv6Prefs=4             Use only IPv4 connections
IPv6Prefs=6             Use only IPv6 connections
IPv6Prefs=46            Use IPv4 connections if available and IPv6 if not
IPv6Prefs=64            Use IPv6 connections if available and IPv4 if not
IPv6Prefs=0 or blank    Inherit the system preferences
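The effect of each value can be modelled as a filter and ordering applied to the addresses resolved for a host. The Python sketch below is an interpretation of the table above and not LinkScan's code; the function name order_addresses and the representation of addresses as (family, address) pairs are assumptions.

```python
import socket


def order_addresses(prefs, addresses):
    """Filter and order (family, address) pairs according to an IPv6Prefs value."""
    v4 = [a for a in addresses if a[0] == socket.AF_INET]
    v6 = [a for a in addresses if a[0] == socket.AF_INET6]
    if prefs == "4":
        return v4
    if prefs == "6":
        return v6
    if prefs == "46":
        return v4 + v6  # IPv4 first, IPv6 only as a fallback
    if prefs == "64":
        return v6 + v4  # IPv6 first, IPv4 only as a fallback
    if prefs in ("0", ""):
        return list(addresses)  # keep whatever order the system returned
    raise ValueError("unknown IPv6Prefs value: %r" % prefs)
```

In practice the input list could be built from the results of socket.getaddrinfo, and a connection attempted to each address in the returned order until one succeeds.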
LinkScan for Unix. Reference Manual. Section 29. LinkScan Application Notes
LinkScan Version 12.4
© Copyright 1997-2013
Electronic Software Publishing Corporation (Elsop)
LinkScan and Elsop are Trademarks of Electronic Software Publishing Corporation