Indexing is not a fast process: the speed ranges from about 30 KB to 250 KB per second, depending on index size and computer power. The indexer should not be started too often; how often depends on how frequently the web site is updated. For static sites, a single run of the indexer is enough.
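To get a feel for these figures, the following Python sketch estimates how long a run might take. It assumes the quoted speeds are in kilobytes per second and uses a hypothetical 100 MB site; the function name is illustrative only.

```python
def indexing_time_seconds(total_kb, kb_per_second):
    """Estimated indexing time for total_kb of content at a given speed."""
    return total_kb / kb_per_second


site_kb = 100 * 1024  # hypothetical 100 MB site, not a figure from this manual
slowest = indexing_time_seconds(site_kb, 30)    # lower bound of the quoted speed
fastest = indexing_time_seconds(site_kb, 250)   # upper bound of the quoted speed
print(f"{fastest / 60:.0f} to {slowest / 60:.0f} minutes")
```

Under these assumptions the run takes somewhere between a few minutes and about an hour, which is why frequent restarts are discouraged.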
During the indexing process, three files are created:
The indexer can also create a statistics file, stats.log, which can be processed right after the server has been indexed in order to store the information in a database.
Two indexing modes are available:
To start the indexer, run searchctl(.exe) with the following options:
Example
Windows | Unix, Linux |
---|---|
C:\searchctl.exe localhost or C:\searchctl.exe --config=D:\www\ disk | ./searchctl name_of_task or ./searchctl.exe --config=/home/www/ disk |
All indexer settings are stored in the 'search.conf' file. The file has the following structure:
[Job name_of_task]
[Action1]
Parameter1 Value1
Parameter2 Value2
Parameter3 Value3
[Action2]
Parameter1 Value1
Parameter2 Value2
Parameter3 Value3
The configuration file must not contain empty lines or comments.
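A configuration with this structure can also be generated by a script. The Python sketch below renders one job and rejects entries that could introduce an empty line. The helper name `render_job` is hypothetical, and writing the action header as `[Index]` is an assumption based on the `[Action1]` placeholder; this manual does not confirm the exact spelling.

```python
def render_job(job_name, actions):
    """Render one job as search.conf text.

    actions: list of (action_name, [(parameter, value), ...]) pairs.
    The "[Index]" style action header is an assumption, not a confirmed syntax.
    """
    lines = [f"[Job {job_name}]"]
    for action_name, params in actions:
        lines.append(f"[{action_name}]")
        for name, value in params:
            name, value = str(name).strip(), str(value).strip()
            if not name or not value or "\n" in name + value:
                # An empty or multi-line entry could introduce a blank line.
                raise ValueError(f"invalid parameter: {name!r} {value!r}")
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


config_text = render_job(
    "localhost",
    [("Index", [("URL", "http://www.novgorod.ru/frisbee/"),
                ("Path", "/home/www/novgorod"),
                ("MaxFiles", 50)])],
)
print(config_text, end="")
```

The output contains one line per header or parameter and nothing else, so it meets the rule about empty lines and comments.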
Action Index - index a site. This action starts the indexing system. At least one parameter must be specified in HTTP indexing mode, and at least two in local drive indexing mode.
More about parameters:
URL url
An address starting with 'http://...' in HTTP mode, or a local path in local drive mode.
Example:
For HTTP: URL http://www.novgorod.ru/frisbee/
For disk (Windows): URL c:/pub/home/frisbee/
For disk (Unix): URL /pub/home/frisbee/
Extensions ext1,ext2,ext3
Sets the list of file extensions to be indexed. Used in local drive mode only; ignored in HTTP indexing mode. Extensions are separated by "," (comma).
Example:
Extensions htm,html,shtml,shtm
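The effect of an Extensions list can be illustrated with a short Python sketch. It is not the indexer's own code: the function name is hypothetical, and the case-insensitive comparison is an assumption.

```python
from pathlib import PurePosixPath


def has_listed_extension(file_name, extensions_value):
    """Return True if file_name ends with one of the comma-separated extensions."""
    allowed = {ext.strip().lower()
               for ext in extensions_value.split(",") if ext.strip()}
    # Assumption: extensions are compared without regard to letter case.
    suffix = PurePosixPath(file_name).suffix.lstrip(".").lower()
    return suffix in allowed


extensions = "htm,html,shtml,shtm"
print(has_listed_extension("/pub/home/frisbee/index.html", extensions))
print(has_listed_extension("/pub/home/frisbee/logo.gif", extensions))
```

With the list from the example, the first file would be picked up and the second skipped.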
Path path
Specifies the working directory. Index files and the log file are saved to this directory.
Example:
Path c:\www\novgorod
or
Path /home/www/novgorod
CharSet cset
Sets how the character encoding of the files to be indexed is identified. Possible values:
Example:
CharSet ByHTTPHeader
MaxFiles num
Sets the maximum number of files to be indexed; the default is 10000. Choose the value carefully, because many servers contain a huge number of links, for example http://news.novgorod.ru/
Example:
MaxFiles 50
Statistic stat
Sets how reports are saved. Reports are generated at the end of the Index action and are saved to the file stats.log. Available options:
Example:
Statistic Append
Exclude excl1,excl2,excl3
Sets a list of words to exclude. Addresses containing at least one of the excluded words are not added to the indexing queue. Words are separated by "," (comma).
Example:
Exclude editpost.php?,reply.php?,admin/
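The rule described for Exclude amounts to a substring check on each address. The Python sketch below shows that check under the assumption that matching is a plain, case-sensitive substring test; the function name and the sample addresses are hypothetical.

```python
def is_excluded(url, exclude_value):
    """Return True if url contains at least one of the comma-separated words."""
    words = [word for word in exclude_value.split(",") if word]
    # Assumption: a plain case-sensitive substring test, as the text suggests.
    return any(word in url for word in words)


exclude = "editpost.php?,reply.php?,admin/"
print(is_excluded("http://www.codenet.ru/forum/reply.php?id=5", exclude))
print(is_excluded("http://www.codenet.ru/index.html", exclude))
```

An address for which the function returns True would not be added to the indexing queue.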
AddOption opt
Sets the indexing method. Used in HTTP indexing mode only. The following values are available:
Example:
AddOption SubPages
Language lang
Sets the language. If this parameter is specified, an 'Accept-Language' field is included in the HTTP header. This setting may affect document content on some sites.
Example:
Language ru
AFrom path
Sets the substring that will be replaced in the URL by the string specified in the ATo parameter.
Example:
AFrom /home/dir/mysite/
ATo http://search.codenet.ru/
ATo url
Sets the substring that replaces AFrom in the URL. Used together with AFrom.
Example:
AFrom http://127.0.0.1/
ATo http://www.codenet.ru/
or
AFrom c:/documents/www/www.codenet.ru/
ATo http://www.codenet.ru/
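The AFrom and ATo pair describes a simple string substitution on each indexed address. A minimal Python sketch of that substitution follows; the function name and the sample file path are hypothetical, and replacing only the first occurrence is an assumption, since the manual does not say how repeated matches are handled.

```python
def rewrite_address(address, afrom, ato):
    """Replace the AFrom substring in an address with the ATo string."""
    # Assumption: only the first occurrence is replaced.
    return address.replace(afrom, ato, 1)


local_file = "c:/documents/www/www.codenet.ru/docs/index.html"  # hypothetical path
public_url = rewrite_address(
    local_file,
    "c:/documents/www/www.codenet.ru/",
    "http://www.codenet.ru/",
)
print(public_url)
```

This is how a site indexed from a local drive or from 127.0.0.1 can still show public addresses in search results.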
StartWord word
Sets the starting word. The page description is composed of the words that follow the starting word, which makes it possible to exclude menus and similar elements from the description. The starting word is obligatory.
Example:
StartWord about
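The idea behind StartWord can be sketched in Python as follows. This is an illustration only: the function name, the `max_words` limit, the case-insensitive match, and the fallback to the first words when the starting word is absent are all assumptions, not documented behaviour.

```python
def describe(text, start_word, max_words=20):
    """Build a description from the words that follow start_word."""
    words = text.split()
    lowered = [word.lower() for word in words]
    try:
        # Assumption: the starting word is matched without regard to case.
        start = lowered.index(start_word.lower()) + 1
    except ValueError:
        start = 0  # assumption: fall back to the first words of the document
    return " ".join(words[start:start + max_words])


page_text = "Home News Contacts About Frisbee club of Novgorod"  # hypothetical page
print(describe(page_text, "about"))
```

In this hypothetical page, the menu words before "About" are left out of the description.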
MetaDescription yesno
Sets the page description method. The description can be displayed in search results using the special symbol %E. Available values are "Yes" and "No"; the default is "No". If "Yes" is used, the system attempts to take the description from the '<META name="description...' tag. If the tag cannot be found, or the value is "No", the description is composed of the first words in the document (see StartWord).
Example:
MetaDescription Yes
MetaRobots yesno
If the parameter is set to "No", the '<META name="robots"...' tag is ignored; otherwise the tag is analysed for the presence of NOINDEX, NOFOLLOW, and NONE. More details can be found in the section Use of "Robots" META-tags. The default value is "Yes".
Example:
MetaRobots No
UseRobotsTxt yesno
If set to "Yes", indexing rules are taken from the 'robots.txt' file stored in the web server root directory. The default value is "No". More information about working with 'robots.txt' is available in the section robots.txt - Exclusions Standard for Robots. The robot's name is "CNSearch".
Example:
UseRobotsTxt yes
Starting with version 0.91, the indexer can work through a proxy server. Four new directives were added: ProxyServer, ProxyPort, ProxyLogin, and ProxyPassword.
ProxyServer server
Specifies the proxy server. By default, the indexer connects directly. Works with ProxyPort.
Example:
ProxyServer proxy.domain.ru
ProxyPort port
Sets the proxy port. Works with ProxyServer.
Example:
ProxyPort 8080
ProxyLogin login
Sets the proxy login. Used only if the proxy server requires authorization. Works with ProxyPassword.
Example:
ProxyLogin alex
ProxyPassword password
Sets the proxy password. Used only if the proxy server requires authorization. Works with ProxyLogin.
Example:
ProxyPassword qwerty
Runner is used to execute external applications. An external application can, for example, process a log file and store its contents in a database, or copy index files.
Filename file
Sets the name of the file to execute.
Example:
Filename /home/alex/parser.pl
Params params
Sets the command line parameters for Filename.
Example:
Params --user=root --password=jfiekf
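How Filename and Params might combine into a single command can be sketched in Python. The function name is hypothetical, and splitting Params on whitespace with shell-style quoting is an assumption; the manual does not describe how the indexer parses this value.

```python
import shlex


def build_command(filename, params=""):
    """Combine a Filename value and a Params value into an argument list."""
    # Assumption: Params is split like a POSIX shell command line.
    return [filename, *shlex.split(params)]


command = build_command("/home/alex/parser.pl", "--user=root --password=jfiekf")
print(command)
```

A list built this way could be passed to `subprocess.run(command, check=True)` to start the external application after indexing finishes.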