Dimensionality Reduction Module (dr_gui)

Assume we have processed a collection with tmg_gui, constructed a term-document matrix (TDM) with 1,033 documents and 12,184 terms (corresponding to the well-known MEDLINE collection), and stored the results in "TMG_HOME/data/medline". Assume further that we want to construct a low-rank approximation of the TDM using the Clustered Latent Semantic Indexing (CLSI) technique with the following input:

  • compute SVD with: MATLAB (svds)

  • clustering algorithm: PDDP

  • principal directions: 1

  • maximum number of PCs: -

  • variant: basic

  • automatic determination of num. of factors from each cluster: yes

  • number of clusters: 10

  • number of factors: 100

and we want to store the results in the directory "medline".
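
To make the settings above concrete, the following is a minimal sketch of the idea behind CLSI, not the code that dr_gui runs. It is written in Python with NumPy rather than MATLAB, uses a dense SVD in place of svds, and the function names pddp and clsi are hypothetical. The rule that splits the requested number of factors across clusters in proportion to each cluster's share of the squared Frobenius norm is an assumption made for illustration; the toolbox's automatic rule may differ.

```python
import numpy as np


def pddp(A, n_clusters):
    """Basic PDDP sketch: bisect document clusters along one principal direction."""
    clusters = [np.arange(A.shape[1])]
    while len(clusters) < n_clusters:
        # Pick the cluster with the largest scatter around its centroid.
        scatter = [
            np.linalg.norm(A[:, c] - A[:, c].mean(axis=1, keepdims=True))
            for c in clusters
        ]
        idx = clusters.pop(int(np.argmax(scatter)))
        C = A[:, idx] - A[:, idx].mean(axis=1, keepdims=True)
        # Leading left singular vector of the centered block.
        u = np.linalg.svd(C, full_matrices=False)[0][:, 0]
        proj = u @ C
        left, right = idx[proj <= 0], idx[proj > 0]
        if len(left) == 0 or len(right) == 0:
            # Nothing left to split (all documents identical); stop early.
            clusters.append(idx)
            break
        clusters += [left, right]
    return clusters


def clsi(A, clusters, k):
    """CLSI sketch: project A onto leading left singular vectors of each cluster."""
    total = sum(np.linalg.norm(A[:, c]) ** 2 for c in clusters)
    blocks = []
    for c in clusters:
        Ai = A[:, c]
        # ASSUMPTION: factors per cluster follow the cluster's share of ||A||_F^2.
        # Rounding means the per-cluster counts need not sum exactly to k.
        share = np.linalg.norm(Ai) ** 2 / total
        ki = min(max(1, int(round(k * share))), min(Ai.shape))
        U = np.linalg.svd(Ai, full_matrices=False)[0]
        blocks.append(U[:, :ki])
    # Orthonormalize the concatenated basis, then project the TDM onto it.
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q, Q.T @ A


if __name__ == "__main__":
    # Small random stand-in for a terms-by-documents matrix (not MEDLINE data).
    rng = np.random.default_rng(0)
    A = rng.random((200, 60))
    Q, Y = clsi(A, pddp(A, 10), 20)
    print(Q.shape, Y.shape, np.linalg.norm(A - Q @ Y) / np.linalg.norm(A))
```

In this sketch, the arguments 10 and 20 play the roles of "number of clusters" and "number of factors" from the list above, and the product Q @ Y is the low-rank approximation of the TDM.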
