Dear Rajarshi,

I hope you are doing well.

To find the top 5 words of each cluster, please follow the steps below:

Create a for loop over the clusters in which, for each cluster, you will:
  • Make a subset of the records belonging to that cluster, i.e. assign all the records of the cluster to a variable
  • Apply TermDocumentMatrix to that subset
  • Inspect the elements in it
  • Count how often each word occurs
  • Output the Log Group, Log Count, Top Words, Word Count and Counter to a file

Log Group  -> the cluster number to which a particular word belongs.
Log Count  -> the number of rows (records) in that cluster.
Top Words  -> the top 5 words in that cluster.
Word Count -> the frequency of each top word.
Counter    -> the iteration counter of the for loop.
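
In case it helps, here is a rough sketch of the loop described above. It is written in plain Python and is only an illustration, not the exact code you need: the word counting with Counter is a simple stand-in for the TermDocumentMatrix step, and the function names, the sample records and the whitespace tokenisation are all assumptions made for the example.

```python
import csv
import sys
from collections import Counter

# Output columns, named as in the description above.
FIELDS = ["Log Group", "Log Count", "Top Words", "Word Count", "Counter"]


def top_words_per_cluster(records, top_n=5):
    """records: list of (cluster_id, text) pairs (hypothetical input shape)."""
    rows = []
    clusters = sorted({cluster for cluster, _ in records})
    for counter, cluster in enumerate(clusters, start=1):
        # Subset: all records that belong to the current cluster.
        subset = [text for c, text in records if c == cluster]
        # Stand-in for the term-document matrix: total count of each
        # word across the subset (lowercased, split on whitespace).
        counts = Counter(
            word for text in subset for word in text.lower().split()
        )
        # Keep the top_n most frequent words and emit one row per word.
        for word, freq in counts.most_common(top_n):
            rows.append({
                "Log Group": cluster,
                "Log Count": len(subset),
                "Top Words": word,
                "Word Count": freq,
                "Counter": counter,
            })
    return rows


def write_rows(rows, out):
    """Write the rows as CSV to an open file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample data, only to show the shape of the input.
sample_records = [
    (1, "disk error on disk"),
    (1, "disk full error"),
    (2, "login failed"),
    (2, "login ok"),
]

rows = top_words_per_cluster(sample_records)
write_rows(rows, sys.stdout)
```

To write to a file instead of the console, you could pass an opened file (for example one opened with newline="") to write_rows in place of sys.stdout.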