Text to Matrix Generator - Indexing Module

Classification Module (classification_gui)

Assume we have processed a collection with tmg_gui, constructed a term-document matrix (TDM) with 6,495 documents and 21,764 terms (a single-label dataset corresponding to the well-known ModApte split of the Reuters-21578 collection), and stored the results in "TMG_HOME/data/reuters". Assume then that we want to classify the test part of the ModApte split using the k-Nearest Neighbors (kNN) classifier with the following input:

Multiple docs (file): yes
filename: sample_document/reuters.test
delimiter: </reuters>
line delimiter: yes
use stored global weights: yes
stoplist: common_words
local term weighting: Term Frequency
classification method: k Nearest Neighbors (kNN)
num. of NNs: 10
collection type: Single-Label
preprocessed by: Clustered Latent Semantic Indexing
number of factors: 100
similarity measure: Cosine
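
The settings above (10 nearest neighbors, cosine similarity, single-label collection) can be illustrated with a minimal Python sketch. It is not TMG's implementation: the function names cosine and knn_classify, the toy vectors, and the similarity-weighted vote are all assumptions made for illustration, and the sketch presumes that documents have already been projected into a reduced space (for example, the 100 factors chosen above).

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def knn_classify(query, train_vectors, train_labels, k=10):
    """Assign a single label to `query` from its k most similar training docs.

    Assumption: each neighbor votes with a weight equal to its cosine
    similarity; the voting scheme used by TMG itself is not stated here.
    """
    if not train_vectors:
        raise ValueError("training set is empty")
    # Rank training documents by similarity to the query, best first.
    scored = sorted(
        ((cosine(query, vec), label)
         for vec, label in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Accumulate similarity-weighted votes over the top k neighbors.
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes.items(), key=lambda item: item[1])[0]


# Hypothetical toy data: two reduced dimensions, two categories.
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["earn", "earn", "acq", "acq"]

predicted = knn_classify([0.8, 0.2], train_vectors, train_labels, k=3)
```

With a real collection, train_vectors would hold one reduced-dimension vector per training document and k would be set to 10 to match the input above.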
