
Glossary

This product is legacy software that is no longer maintained or supported by Rietta Inc. This page is preserved for historical purposes. See the listing of Rietta’s legacy software for the complete list.

This glossary provides further explanation of the terms used in this documentation. Common terms for which definitions are readily available elsewhere are not redefined here; the reader is assumed to be familiar with terms such as FTP, ASCII, and directories.

Document Root

The document root is the server path that is accessible to web visitors and robots. While it can be the server root, it is commonly a subdirectory such as public_html. There is nothing special about this name, and the directory to which the document root is set varies from web server to web server. Please consult your systems administrator, virtual domain host, or server documentation if you are not sure what your document root is.
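
As an illustration only (the directory names /home/example and public_html are assumptions for this sketch; actual names vary by host), a typical account layout might look like this:

    /home/example/                  (server path for the account, not web accessible)
        logs/
        public_html/                (document root, web accessible)
            index.html
            ROBOTS.TXT              (robot exclusion file goes here)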

Disallow List

RoboGen specific: the list on the left side of the main window that displays the agent rules for the currently selected robot.

Robot Exclusion Protocol

The standard used for robot exclusion files. It defines the syntax and location of ROBOTS.TXT and how web robots are to parse that file. Each robot has a user-agent, which serves as its handle, and must follow all directives listed under the section for that user-agent. If there is no section specific to its user-agent, the robot follows the directives under the universal user-agent (denoted by an asterisk). The robot exclusion file must be named ROBOTS.TXT and reside in the server’s document root.
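
For example, the following exclusion file gives one set of directives to a robot whose user-agent is WebCrawler and another set to all other robots through the universal user-agent (the agent name and paths are purely illustrative):

    User-agent: WebCrawler
    Disallow: /private/

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /temp/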

ROBOTS.TXT

The file name used by the robot exclusion protocol. Web robots download this file from the server’s document root and parse it for instructions on what to index and what not to index. The case of the file name does not matter, but the file must exist in the document root.
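
As a rough sketch of how a robot consumes this file, the Python standard library includes a robots.txt parser; the site URL and the ExampleBot user-agent below are hypothetical:

    from urllib.robotparser import RobotFileParser

    # Point the parser at the exclusion file in the site's document root.
    parser = RobotFileParser("http://www.example.com/robots.txt")
    parser.read()  # download and parse the robot exclusion file

    # Ask whether a robot identifying itself as "ExampleBot" may fetch a page.
    if parser.can_fetch("ExampleBot", "http://www.example.com/private/page.html"):
        print("Allowed to retrieve the page")
    else:
        print("Excluded by the robot exclusion file")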

Web Robot

Also known as a Web Wanderer or Web Spider, it is a program that automatically traverses the Internet by retrieving a document and then recursively retrieving all documents that it references. Robots can perform any number of functions, but the most common uses are indexing, validating HTML, validating links, “What’s New” monitoring, and site mirroring.