Indexing Module (tmg_gui)


tmg_gui is a graphical user interface for "Text to Matrix Generator" (TMG). It can be used to create or update term-document matrices (TDMs) or to create term-query matrices.
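In a term-document matrix, row i corresponds to a dictionary term, column j corresponds to a document, and entry (i, j) holds the number of times term i occurs in document j. The following Python sketch only illustrates that layout. It is not TMG code (TMG itself is a MATLAB toolbox). The function name term_document_matrix is hypothetical, and the sketch splits on whitespace without any of the filtering, weighting or normalization options described below.

```python
from collections import Counter


def term_document_matrix(docs):
    """Hypothetical helper: return (dictionary, A) with A[i][j] = count of term i in doc j."""
    tokenized = [doc.lower().split() for doc in docs]       # naive whitespace tokenizer
    dictionary = sorted({t for toks in tokenized for t in toks})
    counts = [Counter(toks) for toks in tokenized]          # one Counter per document
    # Rows are terms, columns are documents; Counter returns 0 for absent terms.
    A = [[c[term] for c in counts] for term in dictionary]
    return dictionary, A
```

For example, term_document_matrix(["apple banana apple", "banana cherry"]) gives the dictionary ['apple', 'banana', 'cherry'] and the matrix [[2, 0], [1, 1], [0, 1]].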


For complete, up-to-date documentation, visit the TMG website:

http://scgroup20.ceid.upatras.gr:8000/tmg/

Field Name | Default | Description
Input File/Directory | - | Files to be parsed; the documents within each file are separated by "Delimiter". Alternatively, each file in the input directory contains a single document.
Create New tdm | - | Checked if a new tdm is to be created (default checked).
Create Query Matrix | - | Checked if a new query matrix is to be created (default checked).
Update tdm | - | Checked if an existing tdm is to be updated with new documents. Alternatively, checked if an existing tdm is to be updated using different options (by changing update_struct).
Downdate tdm | - | Checked if an existing tdm is to be downdated, i.e. the documents listed in the "Document Indices" field are removed.
Dictionary | - | Name of the .mat file or workspace variable containing the dictionary to be used by the tmg_query function when the "Create Query Matrix" radio button is checked.
Global Weights | - | Name of the .mat file or workspace variable containing the vector of global weights to be used by the tmg_query function when the "Create Query Matrix" radio button is checked.
Update Struct | - | Name of the .mat file or workspace variable containing the structure to be updated or downdated by the tdm_update (or tdm_downdate) function when the "Update tdm" or "Downdate tdm" radio button is checked.
Document Indices | - | Name of the .mat file or workspace variable containing the indices of the documents marked for deletion when the "Downdate tdm" radio button is checked.
Delimiter | emptyline | The string that separates documents within a file. Possible values are 'emptyline', 'none_delimiter' (treats each file as a single document), or any other string.
Line Delimiter | - | Checked if the "Delimiter" occupies a whole line of text.
Stoplist | - | Name of the file containing stopwords, i.e. common words not used in indexing.
Min Length | 3 | Minimum term length.
Max Length | 30 | Maximum term length.
Min Local Frequency | 1 | Minimum local term frequency.
Max Local Frequency | inf | Maximum local term frequency.
Min Global Frequency | 1 | Minimum global term frequency.
Max Global Frequency | inf | Maximum global term frequency.
Local Term Weighting | Term Frequency | Local term weighting function. Possible values: 'Term Frequency', 'Binary', 'Logarithmic', 'Alternate Log', 'Augmented Normalized Term Frequency'.
Global Term Weighting | None | Global term weighting function. Possible values: 'None', 'Entropy', 'Inverse Document Frequency (IDF)', 'GfIdf', 'Normal', 'Probabilistic Inverse'.
Database Name | - | Name of the folder (under the 'data' directory) where data are to be saved (currently supported only by the 'Create New tdm' module).
Store in MySQL | - | Checked if results are to be saved into MySQL (currently supported only by the 'Create New tdm' module).
use Normalization | - | Normalization method. Possible values: 'None', 'Cosine'.
use Stemming | - | Indicates whether stemming is to be applied. The only algorithm currently supported is Porter's.
Display Results | - | Checked if results are to be displayed in the command window.
Remove Numbers | - | Checked if the dictionary should exclude numeric words.
Remove Alphanumerics | - | Checked if the dictionary should exclude alphanumeric words.
Parse All Subdirectories | - | Checked if subdirectories should be parsed in batch mode (recommended for large collections to avoid repeated user interaction).
Continue | - | Apply the selected operation.
Reset | - | Reset the window to its default values.
Exit | - | Close the window.
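The "Delimiter" and "Line Delimiter" options control how one input file is cut into documents. The Python sketch below is one plausible reading of those options, not TMG's parser. The function names are hypothetical, and treating whitespace-only lines as empty lines under 'emptyline' is an assumption.

```python
def _split_on_separator_lines(text, is_separator):
    """Group lines into documents, starting a new document at each separator line."""
    docs, current = [], []
    for line in text.splitlines():
        if is_separator(line):
            if current:
                docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        docs.append("\n".join(current))
    return docs


def split_documents(text, delimiter="emptyline", line_delimiter=False):
    """Hypothetical helper: split one file's text into documents."""
    if delimiter == "none_delimiter":
        docs = [text]  # the whole file is a single document
    elif delimiter == "emptyline":
        # Assumption: whitespace-only lines also count as empty lines.
        docs = _split_on_separator_lines(text, lambda line: line.strip() == "")
    elif line_delimiter:
        # The delimiter occupies a whole line of its own.
        docs = _split_on_separator_lines(text, lambda line: line.strip() == delimiter)
    else:
        docs = text.split(delimiter)  # the delimiter may appear mid-line
    # Drop surrounding whitespace and any empty pieces.
    return [d.strip() for d in docs if d.strip()]
```

For instance, split_documents("x\n###\ny", "###", line_delimiter=True) returns ["x", "y"], while split_documents("a\n\nb", "none_delimiter") keeps the whole text as one document.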
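The term-selection fields (Stoplist, Min/Max Length, Min/Max Local Frequency, Min/Max Global Frequency, Remove Numbers, Remove Alphanumerics) act as filters on the dictionary. The Python sketch below shows one way such filters can combine. Several details are assumptions rather than TMG behavior: the tokenizer (lowercased runs of ASCII letters and digits), the reading of "local frequency" as a term's count within one document, and the reading of "global frequency" as the number of documents containing the term. All names are hypothetical.

```python
import re
from collections import Counter


def tokenize(doc):
    # Assumed tokenizer: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", doc.lower())


def keep_token(tok, stopwords, min_len=3, max_len=30,
               remove_numbers=False, remove_alphanumerics=False):
    """Token-level filters: stoplist, length, numbers, alphanumerics."""
    if tok in stopwords or not (min_len <= len(tok) <= max_len):
        return False
    if remove_numbers and tok.isdigit():
        return False
    is_alphanumeric = any(c.isdigit() for c in tok) and not tok.isdigit()
    if remove_alphanumerics and is_alphanumeric:
        return False
    return True


def build_dictionary(docs, stopwords=frozenset(), min_local=1, max_local=float("inf"),
                     min_global=1, max_global=float("inf"), **token_opts):
    """Hypothetical helper: return (dictionary, per-document term counts)."""
    counts = []
    for doc in docs:
        c = Counter(t for t in tokenize(doc) if keep_token(t, stopwords, **token_opts))
        # Local filter (assumed): drop a term from a document if its count is out of range.
        counts.append({t: f for t, f in c.items() if min_local <= f <= max_local})
    # Global filter (assumed): number of documents that still contain the term.
    df = Counter(t for c in counts for t in c)
    dictionary = sorted(t for t, d in df.items() if min_global <= d <= max_global)
    kept = set(dictionary)
    counts = [{t: f for t, f in c.items() if t in kept} for c in counts]
    return dictionary, counts
```

With the default Min Length of 3, two-character tokens such as "on" or "42" never reach the dictionary, regardless of the other options.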
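The "Local Term Weighting" options map the raw count f of a term in one document to a weight. The Python sketch below uses the textbook definitions common in the latent semantic indexing literature. The base-2 logarithm and the 0.5 constants of the augmented scheme are assumptions; TMG's exact formulas may differ. The function name local_weight is hypothetical.

```python
import math


def local_weight(f, scheme="Term Frequency", max_f=None):
    """Hypothetical helper: local weight of raw count f within one document.

    max_f is the largest term count in the same document; it is needed
    only by the augmented normalized scheme.
    """
    if f == 0:
        return 0.0  # absent terms get zero weight under every scheme
    if scheme == "Term Frequency":
        return float(f)
    if scheme == "Binary":
        return 1.0
    if scheme == "Logarithmic":
        return math.log2(1 + f)
    if scheme == "Alternate Log":
        return 1 + math.log2(f)
    if scheme == "Augmented Normalized Term Frequency":
        return 0.5 + 0.5 * f / max_f
    raise ValueError(f"unknown local weighting scheme: {scheme}")
```

For example, [local_weight(f, "Logarithmic") for f in [0, 1, 3]] gives [0.0, 1.0, 2.0]. The logarithmic schemes damp the influence of terms that repeat many times within a single document.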
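The "Global Term Weighting" options assign one weight per term, computed across the whole collection. A weighted matrix entry is then commonly formed as the local weight times the global weight of the term. The Python sketch below uses standard definitions from the LSI literature with base-2 logarithms. Returning 0 for 'Probabilistic Inverse' when a term occurs in every document is a hypothetical choice, because the formula is undefined there. TMG's exact formulas may differ.

```python
import math


def global_weights(F, scheme="None"):
    """Hypothetical helper: one global weight per term (row) of raw-count matrix F.

    Assumes every term occurs in at least one document and, for 'Entropy',
    that the collection has at least two documents.
    """
    n = len(F[0])  # number of documents
    weights = []
    for row in F:
        df = sum(1 for f in row if f > 0)  # documents containing the term
        gf = sum(row)                      # occurrences in the whole collection
        if scheme == "None":
            w = 1.0
        elif scheme == "Entropy":
            h = sum((f / gf) * math.log2(f / gf) for f in row if f > 0)
            w = 1 + h / math.log2(n)
        elif scheme == "Inverse Document Frequency (IDF)":
            w = math.log2(n / df)
        elif scheme == "GfIdf":
            w = gf / df
        elif scheme == "Normal":
            w = 1 / math.sqrt(sum(f * f for f in row))
        elif scheme == "Probabilistic Inverse":
            # Undefined when df == n; this sketch uses 0 in that case.
            w = math.log2((n - df) / df) if df < n else 0.0
        else:
            raise ValueError(f"unknown global weighting scheme: {scheme}")
        weights.append(w)
    return weights
```

Under 'Entropy', a term spread evenly over all documents gets weight 0, and a term concentrated in a single document gets weight 1.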
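Selecting 'Cosine' under "use Normalization" conventionally scales every document vector (matrix column) to unit Euclidean length, so that document length does not dominate similarity scores. The Python sketch below shows that operation on a plain list-of-rows matrix. The function name is hypothetical, and leaving all-zero columns unchanged is an assumption.

```python
import math


def cosine_normalize(A):
    """Hypothetical helper: copy of term-by-document matrix A with unit-length columns."""
    n_terms, n_docs = len(A), len(A[0])
    out = [row[:] for row in A]
    for j in range(n_docs):
        norm = math.sqrt(sum(A[i][j] ** 2 for i in range(n_terms)))
        if norm > 0:  # leave empty documents as all-zero columns
            for i in range(n_terms):
                out[i][j] = A[i][j] / norm
    return out
```

For example, the column (3, 4) has length 5 and becomes (0.6, 0.8).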
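When "Create Query Matrix" is checked, the "Dictionary" and "Global Weights" fields supply what is needed to represent queries in the same term space as an existing tdm. The Python sketch below shows the general idea under stated assumptions. Query terms absent from the dictionary are ignored, the local weight is the raw term frequency, each entry is multiplied by the term's global weight, and columns are optionally cosine-normalized. This is not the tmg_query implementation, and the function name query_matrix is hypothetical.

```python
import math
from collections import Counter


def query_matrix(queries, dictionary, global_weights, normalize=True):
    """Hypothetical helper: term-by-query matrix aligned with an existing dictionary."""
    index = {term: i for i, term in enumerate(dictionary)}
    Q = [[0.0] * len(queries) for _ in dictionary]
    for j, q in enumerate(queries):
        for term, f in Counter(q.lower().split()).items():
            i = index.get(term)
            if i is not None:  # terms outside the dictionary are ignored
                Q[i][j] = f * global_weights[i]
        if normalize:
            norm = math.sqrt(sum(Q[i][j] ** 2 for i in range(len(dictionary))))
            if norm > 0:
                for i in range(len(dictionary)):
                    Q[i][j] /= norm
    return Q
```

Reusing the collection's dictionary and global weights, rather than recomputing them from the queries, keeps query vectors directly comparable with the document columns of the tdm.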
