What is ht://Dig?
ht://Dig is a fast, highly customizable search engine that can index both your HTML and PDF documents. It does two things:
- Index web content into one or more "catalogs"
- Provide a method to search those catalogs for specific content and return a list of "pages" which contain that content
How can I use ht://Dig on my Website?
There are two main ways for you to use ht://Dig.
- Download the software, compile it, and manually configure it to your liking. This requires knowledge of UNIX, compilers, etc. Primus does not provide tech support for customers who install and compile their own custom ht://Dig installations.
- Use the simple ht://Dig administration tool provided by Primus to configure ht://Dig to your liking - this is the recommended option as all of the hard work has been done for you!
Primus' ht://Dig admin tool is supported on the media3.magma.ca, media4.magma.ca, media5.magma.ca and media6.magma.ca servers, plus all Hybrid Hosting servers. First, you need to request that ht://Dig support be enabled for your website. Please send an email to requesting this. Once ht://Dig support is enabled, you can use the ht://Dig admin interface available on myaccount.magma.ca to create and administer your ht://Dig catalogs and the associated search engine(s).
How do I use the ht://Dig Administration Tool on my site?
The administration tool is easy to use. First, log in using your VWS userid and password. Once you are logged in, you will see all of the ht://Dig "catalogs" created for your VWS. You can create new catalogs, delete catalogs, and edit the HtDig (indexer) and HtSearch (search-engine) configurations for each catalog. When Primus enables ht://Dig support for your VWS, a catalog named "default" is created for you, which indexes your entire public_html document tree. The following sample HTML FORM code searches your web content using this default catalog.
<FORM ACTION="/cgi-bin/htsearch" METHOD="POST">
Search for:
<INPUT TYPE="TEXT" NAME="words" SIZE="20">
<INPUT TYPE="HIDDEN" NAME="config" VALUE="/www.yourdomain.com/default/conf/default">
<INPUT TYPE="SUBMIT" VALUE="Search">
</FORM>
The "config" hidden variable is the path to the config file. It should always start with /www.yourdomain.com (replace yourdomain.com with your actual domain name). The remainder of the path is the location of your configuration file (see below for more details on the file structure) except that you do not specify .conf at the end of the pathname.
ht://Dig and your VWS
The following directories are created in your home directory (~$HOME) when ht://Dig support is enabled by Primus. As you can see, nothing is placed in your public_html directory. It is up to you to build the page which contains the HTML FORM to search your catalog(s) (see above).
~$HOME/htdig/catalogs/default/common/
~$HOME/htdig/catalogs/default/conf/default.conf
~$HOME/htdig/catalogs/default/db/
~$HOME/htdig/catalogs/default/rundig_default
~$HOME/public_html/cgi-bin/htsearch
The directories and files are used as follows:
- The common directory contains the HTML templates used to display the search results. The HTML file search_form_example.html is a sample HTML FORM which you can use to search the "default" catalog. The files header.html, long.html, and footer.html are all used when matches are found. The file nomatch.html is used when a search finds no matches.
- The conf directory contains the configuration file. This one file is used by both htdig and htsearch. Do not edit this file manually.
- The db directory is where the databases reside. Do not touch any of the files in this directory.
- The script rundig_default is the script used to re-create the indexes. Do not edit this script.
If you create another catalog named "mysite", the following new directory structure would be created.
~$HOME/htdig/catalogs/mysite/common
~$HOME/htdig/catalogs/mysite/conf/mysite.conf
~$HOME/htdig/catalogs/mysite/db
~$HOME/htdig/catalogs/mysite/rundig_mysite
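As an illustration, a search form for the "mysite" catalog might look like the sketch below. It assumes the "config" value follows the same pattern as the default catalog (the /www.yourdomain.com prefix, then the catalog's conf path without the .conf suffix). This page does not spell that out for additional catalogs, so verify the path against what the admin tool shows.

```html
<!-- Sketch only: the "mysite" config path is inferred from the default catalog's pattern -->
<FORM ACTION="/cgi-bin/htsearch" METHOD="POST">
Search for:
<INPUT TYPE="TEXT" NAME="words" SIZE="20">
<!-- Replace yourdomain.com with your actual domain name; note there is no .conf suffix -->
<INPUT TYPE="HIDDEN" NAME="config" VALUE="/www.yourdomain.com/mysite/conf/mysite">
<INPUT TYPE="SUBMIT" VALUE="Search">
</FORM>
```

Only the "config" value differs from the sample form for the default catalog.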
How often will the catalogs which I create be re-built to pick up new content?
Each catalog which you create will have a "catalog rebuild" scheduled weekly, on a randomly chosen day and time. You can manually force a rebuild using the administration interface.
Can I customize the HTML and graphics used on the search interface?
Yes! You can completely customize both the input form and the result page(s). You can embed the HTML FORM which calls the ht://Dig CGI on any of your webpages. The search result pages are built using HTML templates stored in the $CATALOG/common/ directory. Make sure you only edit the HTML components in those templates. The templates also contain "variables" which should not be changed. You should make a backup of the templates before you attempt to edit them in case you make a mistake and the template stops working.
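To show what editing "only the HTML components" means, here is a minimal sketch of a result-entry template in the style of long.html. The placeholder names URL, TITLE and EXCERPT come from the standard ht://Dig distribution. The templates Primus installs may use different or additional variables, and the DIV wrapper and CLASS name are purely illustrative.

```html
<!-- Sketch only: change the surrounding HTML freely, but leave the dollar-sign placeholders exactly as they are -->
<DIV CLASS="result">
<STRONG><A HREF="$(URL)">$(TITLE)</A></STRONG><BR>
$(EXCERPT)<BR>
<EM>$(URL)</EM>
</DIV>
```

Any comment left in a result template is repeated in the output for every match, so you may prefer to remove it once the template works.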
Do I need to create multiple catalogs if I want to have different HTML FORMs search different directory structures?
No, you do not. Take a look at the sample search form on the front page of www.htdig.org (at the bottom of the page). It uses an additional form element named "restrict" which limits the search to particular directories, depending on what the user wants to search. The benefit of having fewer catalogs is that less disk space is needed to store the index files.
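A form using "restrict" could be sketched as follows. The /docs/ and /news/ directories are hypothetical examples, and the sketch assumes the default catalog indexes your whole site. Each option's value is a URL pattern that matching results must contain, and an empty value applies no restriction.

```html
<!-- Sketch only: /docs/ and /news/ are hypothetical directories on your site -->
<FORM ACTION="/cgi-bin/htsearch" METHOD="POST">
<INPUT TYPE="HIDDEN" NAME="config" VALUE="/www.yourdomain.com/default/conf/default">
Search
<SELECT NAME="restrict">
<OPTION VALUE="">the entire site</OPTION>
<OPTION VALUE="http://www.yourdomain.com/docs/">documentation only</OPTION>
<OPTION VALUE="http://www.yourdomain.com/news/">news only</OPTION>
</SELECT>
for:
<INPUT TYPE="TEXT" NAME="words" SIZE="20">
<INPUT TYPE="SUBMIT" VALUE="Search">
</FORM>
```

With this approach, a single catalog can serve several search forms, each narrowing the results to a different part of the site.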
Notes from Primus:
- The ht://Dig administration tool supplied by Primus allows you to customize almost all available ht://Dig parameters. This admin tool expects your configuration files to be stored in the directory structure specified above. You should not attempt to edit any of the configuration files manually, otherwise the admin tool may not work for your installation anymore.
- The admin tool creates a "cron" job on the server to schedule the weekly catalog rebuilds. You should not attempt to edit or change the cron job configuration, otherwise the admin tool may not work for your installation anymore.
- Please do not create ht://Dig catalogs which you do not need or which you will not use. Each ht://Dig catalog uses up some of your disk space (quota), and the weekly catalog-rebuild jobs add to the CPU load on the server.