Classification Module (classification_gui)

Assume we have processed a collection with tmg_gui, constructed a TDM with 6,495 documents and 21,764 terms (a single-label dataset corresponding to the well-known ModApte split of the Reuters-21578 collection), and stored the results in "TMG_HOME/data/reuters". Assume further that we want to classify the test part of the ModApte split using the k-Nearest Neighbors classifier with the following input:

  • Multiple docs (file): yes

  • filename: sample_document/reuters.test

  • delimiter: </reuters>

  • line delimiter: yes

  • use stored global weights: yes

  • stoplist: common_words

  • local term weighting: Term Frequency

  • classification method: k Nearest Neighbors (kNN)

  • num. of NNs: 10

  • collection type: Single-Label

  • preprocessed by: Clustered Latent Semantic Indexing

  • number of factors: 100

  • similarity measure: Cosine
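
To make the last few settings concrete, the following is a minimal Python sketch of what a single-label kNN classifier with cosine similarity does. It is not the toolbox's own code: the function names (cosine, knn_classify), the similarity-weighted voting rule, and the toy two-dimensional vectors are assumptions made for illustration. In the run described above, each vector would instead have 100 entries (one per factor) and k would be 10.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def knn_classify(train_vectors, train_labels, query, k=10):
    """Assign the label with the largest summed similarity among the
    k training vectors closest to the query (assumed voting rule)."""
    scored = [(cosine(vec, query), label)
              for vec, label in zip(train_vectors, train_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy data (hypothetical): four training documents in a 2-factor space.
train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["earn", "earn", "acq", "acq"]

predicted = knn_classify(train, labels, [0.8, 0.2], k=3)
print(predicted)
```

Because the collection type is Single-Label, the sketch returns exactly one class per document. Weighting votes by similarity rather than counting them equally is one common choice and is only assumed here.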
