| Field Name | Default | Description |
|---|---|---|
| Input File/Directory | - | File or directory to be parsed. Within a file, documents are separated by the "Delimiter"; alternatively, each file in the input directory contains a single document. |
| Create New tdm | ● | Checked if a new term-document matrix (tdm) is to be created (default). |
| Create Query Matrix | - | Checked if a new query matrix is to be created. |
| Update tdm | - | Checked if an existing tdm is to be updated with new documents, or updated using different options (by changing the update_struct). |
| Downdate tdm | - | Checked if an existing tdm is to be downdated, i.e. the documents listed in the "Document Indices" field are removed. |
| Dictionary | - | Name of the .mat file or workspace variable containing the dictionary to be used by the tmg_query function when the "Create Query Matrix" radio button is checked. |
| Global Weights | - | Name of the .mat file or workspace variable containing the vector of global weights to be used by the tmg_query function when the "Create Query Matrix" radio button is checked. |
| Update Struct | - | Name of the .mat file or workspace variable containing the structure to be updated or downdated by the tdm_update (or tdm_downdate) function when the "Update tdm" or "Downdate tdm" radio button is checked. |
| Document Indices | - | Name of the .mat file or workspace variable containing the indices of the documents to be deleted when the "Downdate tdm" radio button is checked. |
| Delimiter | emptyline | The delimiter that separates documents within a file, as seen by tmg. Possible values are 'emptyline', 'none_delimiter' (treats each file as a single document), or any other string. |
| Line Delimiter | ● | Checked if the "Delimiter" occupies a whole line of text. |
| Stoplist | - | Name of the file containing stopwords, i.e. common words not used in indexing. |
| Min Length | 3 | Minimum term length (in characters). |
| Max Length | 30 | Maximum term length (in characters). |
| Min Local Frequency | 1 | Minimum local term frequency. |
| Max Local Frequency | inf | Maximum local term frequency. |
| Min Global Frequency | 1 | Minimum global term frequency. |
| Max Global Frequency | inf | Maximum global term frequency. |
| Local Term Weighting | Term Frequency | Local term weighting function. Possible values: 'Term Frequency', 'Binary', 'Logarithmic', 'Alternate Log', 'Augmented Normalized Term Frequency'. |
| Global Term Weighting | None | Global term weighting function. Possible values: 'None', 'Entropy', 'Inverse Document Frequency (IDF)', 'GfIdf', 'Normal', 'Probabilistic Inverse'. |
| Database Name | - | Name of the folder (under the 'data' directory) where the data are to be saved (currently supported only by the "Create New tdm" module). |
| Store in MySQL | - | Checked if the results are to be saved into MySQL (currently supported only by the "Create New tdm" module). |
| use Normalization | - | Normalization method. Possible values: 'None', 'Cosine'. |
| use Stemming | - | Indicates whether stemming is to be applied. Currently only Porter's stemming algorithm is supported. |
| Display Results | ● | Checked if the results are to be displayed in the command window. |
| Remove Numbers | - | Checked if numeric words are to be excluded from the dictionary. |
| Remove Alphanumerics | - | Checked if alphanumeric words are to be excluded from the dictionary. |
| Parse All Subdirectories | - | Checked if subdirectories are to be parsed in batch mode (recommended for large collections to avoid repeated user interaction). |
| Continue | - | Apply the selected operation. |
| Reset | - | Reset the window to its default values. |
| Exit | - | Exit the window. |
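The "Delimiter" and "Line Delimiter" options decide how the text of one input file is cut into documents. The Python sketch below illustrates that behaviour under stated assumptions; it is not TMG's own (MATLAB) code, `split_documents` is a hypothetical helper, and both treating runs of blank lines as one 'emptyline' separator and discarding empty documents are assumptions.

```python
import re


def split_documents(text, delimiter="emptyline", line_delimiter=True):
    """Split the contents of one input file into documents (hypothetical helper)."""
    if delimiter == "none_delimiter":
        # The whole file is treated as a single document.
        docs = [text]
    elif delimiter == "emptyline":
        # One or more blank (or whitespace-only) lines separate documents.
        docs = re.split(r"\n\s*\n", text)
    elif line_delimiter:
        # The delimiter must occupy a whole line; surrounding whitespace is ignored.
        docs, current = [], []
        for line in text.splitlines():
            if line.strip() == delimiter:
                docs.append("\n".join(current))
                current = []
            else:
                current.append(line)
        docs.append("\n".join(current))
    else:
        # The delimiter may appear anywhere, even in the middle of a line.
        docs = text.split(delimiter)
    # Assumption: documents that are empty after stripping are dropped.
    return [d.strip() for d in docs if d.strip()]
```

For example, `split_documents("first\n<END>\nsecond\n<END>\n", "<END>")` yields the two documents `"first"` and `"second"`, while a line such as `"a <END> b"` is not split because the delimiter does not fill the whole line.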
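The "Min Length", "Max Length", "Stoplist", "Remove Numbers" and "Remove Alphanumerics" options prune candidate dictionary terms. The sketch below shows one way such filters compose. It assumes lower-casing and a tokenizer that splits on any non-alphanumeric character (TMG's parser may tokenize differently), takes a numeric word to contain only digits and an alphanumeric word to mix letters and digits, and uses the hypothetical names `tokenize` and `filter_terms`.

```python
import re


def tokenize(text):
    # Assumption: lower-case the text and keep runs of ASCII letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_terms(tokens, min_length=3, max_length=30, stoplist=frozenset(),
                 remove_numbers=False, remove_alphanumerics=False):
    """Keep only tokens that pass the length, stoplist and digit filters."""
    kept = []
    for term in tokens:
        if not min_length <= len(term) <= max_length:
            continue  # outside the Min Length / Max Length bounds
        if term in stoplist:
            continue  # common word listed in the stoplist file
        has_digit = any(ch.isdigit() for ch in term)
        has_alpha = any(ch.isalpha() for ch in term)
        if remove_numbers and has_digit and not has_alpha:
            continue  # purely numeric word, e.g. "2004"
        if remove_alphanumerics and has_digit and has_alpha:
            continue  # mixed letters and digits, e.g. "mp3"
        kept.append(term)
    return kept
```

With the defaults, two-letter words such as "in" are dropped by the minimum length of 3 even when they are not in the stoplist.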
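The local and global frequency bounds filter terms at two levels. The sketch below builds a terms-by-documents count matrix under two loudly stated assumptions: a term whose frequency within a document falls outside [Min Local Frequency, Max Local Frequency] is dropped from that document only, and the "global frequency" of a term is the number of documents that still contain it. TMG may define either bound differently; `build_tdm` is a hypothetical name.

```python
import math
from collections import Counter


def build_tdm(docs_terms, min_local=1, max_local=math.inf,
              min_global=1, max_global=math.inf):
    """Return (dictionary, counts) where counts[i][j] is the frequency of term i in document j."""
    # Local filter: keep a term in a document only if its in-document
    # frequency lies within [min_local, max_local].
    local = []
    for terms in docs_terms:
        freqs = Counter(terms)
        local.append({t: f for t, f in freqs.items() if min_local <= f <= max_local})
    # Assumption: global frequency = number of documents containing the term.
    doc_freq = Counter(t for freqs in local for t in freqs)
    dictionary = sorted(t for t, df in doc_freq.items() if min_global <= df <= max_global)
    counts = [[freqs.get(t, 0) for freqs in local] for t in dictionary]
    return dictionary, counts
```

The defaults (1 and inf) keep every term, matching the defaults listed in the table.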
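The five "Local Term Weighting" choices can be illustrated with their common textbook definitions, written here with base-2 logarithms, where `f` is the frequency of a term in a document and `max_f` the largest term frequency in that document. This is a sketch, not TMG's implementation, which may differ in details such as the logarithm base; `local_weight` and `weight_document` are hypothetical names.

```python
import math


def local_weight(f, scheme, max_f=1):
    """Local weight of a term that occurs f times in a document.

    max_f is only used by 'Augmented Normalized Term Frequency'.
    """
    if f == 0:
        return 0.0  # every scheme below maps an absent term to zero
    if scheme == "Term Frequency":
        return float(f)
    if scheme == "Binary":
        return 1.0
    if scheme == "Logarithmic":
        return math.log2(1 + f)
    if scheme == "Alternate Log":
        return 1 + math.log2(f)
    if scheme == "Augmented Normalized Term Frequency":
        return 0.5 * (1 + f / max_f)
    raise ValueError(f"unknown local weighting scheme: {scheme}")


def weight_document(column, scheme):
    """Apply the local weighting to one document's vector of raw term frequencies."""
    max_f = max(column) if column else 0
    return [local_weight(f, scheme, max_f) for f in column]
```

For instance, `weight_document([2, 0, 4], "Augmented Normalized Term Frequency")` gives `[0.75, 0.0, 1.0]`: the most frequent term gets weight 1 and every other present term lands between 0.5 and 1.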
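The "Global Term Weighting" choices assign one weight per term, and a weighted entry is then the local weight times the global weight of its row. The sketch below uses common textbook definitions with base-2 logarithms, where `n` is the number of documents, `gf` a term's total frequency and `df` the number of documents containing it. Giving a term that occurs in every document a 'Probabilistic Inverse' weight of 0 (the formula is undefined there) is an assumption, as is the function name `global_weights`; TMG's implementation may differ.

```python
import math


def global_weights(counts, scheme):
    """Global weight for each term (row) of a terms-by-documents count matrix.

    Assumes every row has at least one nonzero entry.
    """
    n = len(counts[0])  # number of documents
    weights = []
    for row in counts:
        gf = sum(row)                      # total occurrences of the term
        df = sum(1 for f in row if f > 0)  # documents containing the term
        if scheme == "None":
            g = 1.0
        elif scheme == "Entropy":
            if n < 2:
                raise ValueError("entropy weighting needs at least two documents")
            h = sum((f / gf) * math.log2(f / gf) for f in row if f > 0)
            g = 1 + h / math.log2(n)
        elif scheme == "Inverse Document Frequency (IDF)":
            g = math.log2(n / df)
        elif scheme == "GfIdf":
            g = gf / df
        elif scheme == "Normal":
            g = 1 / math.sqrt(sum(f * f for f in row))
        elif scheme == "Probabilistic Inverse":
            # Assumption: weight 0 for a term present in every document.
            g = math.log2((n - df) / df) if df < n else 0.0
        else:
            raise ValueError(f"unknown global weighting scheme: {scheme}")
        weights.append(g)
    return weights
```

Both 'Entropy' and 'Inverse Document Frequency (IDF)' give weight 0 to a term spread evenly over every document, which is exactly the kind of term that carries little discriminating information.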
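With 'Cosine' normalization, each document vector (column) of the weighted matrix is scaled to unit Euclidean length, so that the dot product of two document vectors equals the cosine of the angle between them. A minimal sketch on a list-of-rows matrix; `cosine_normalize` is a hypothetical name, and leaving all-zero columns unchanged is an assumption.

```python
import math


def cosine_normalize(matrix):
    """Scale each column (document) of a terms-by-documents matrix to unit length."""
    n_terms, n_docs = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]  # copy, so the input is left untouched
    for j in range(n_docs):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_terms)))
        if norm > 0:  # assumption: empty documents stay all-zero
            for i in range(n_terms):
                out[i][j] = matrix[i][j] / norm
    return out
```

A column `[3, 4]` becomes `[0.6, 0.8]`, whose Euclidean length is 1.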
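When "Create Query Matrix" is checked, tmg_query uses an existing dictionary and vector of global weights, so that queries are expressed in the same term space as the collection. The sketch below shows the idea for a single query under assumptions: the query terms are already tokenized and filtered, terms absent from the dictionary are ignored, and each entry is the local weight of the in-query frequency times the stored global weight. `query_vector` is a hypothetical helper and does not reproduce tmg_query's actual signature.

```python
def query_vector(query_terms, dictionary, global_weights, local=float):
    """Weighted query vector over an existing dictionary.

    local maps a raw in-query frequency to a local weight; the default
    (float) corresponds to 'Term Frequency'.
    """
    position = {term: i for i, term in enumerate(dictionary)}
    counts = [0] * len(dictionary)
    for term in query_terms:
        i = position.get(term)
        if i is not None:  # terms missing from the dictionary are ignored
            counts[i] += 1
    return [local(f) * g for f, g in zip(counts, global_weights)]
```

Reusing the collection's global weights, rather than recomputing them from the queries, keeps query and document vectors directly comparable.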