Dear Rajarshi,
I hope you are doing well.
To find the top 5 words of each cluster, please follow the steps below:
Create a for loop over the clusters in which you:
- Make a subset of the records belonging to each cluster, i.e. assign all the records of a particular cluster to a variable
- Apply TermDocumentMatrix to each subset
- Inspect its elements
- Count how often each word occurs
- Output the Log Group, Log Count, Top Words, Word Count, and Counter to a file
Log Group -> the cluster number to which a particular word belongs.
Log Count -> the number of rows (records) in that cluster.
Top Words -> the top 5 words in that cluster.
Word Count -> the frequency of each top word.
Counter -> the iteration counter of the for loop.
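The loop above could be sketched roughly as follows. This is only an illustration in Python, not the TermDocumentMatrix code itself: plain whitespace word counting with collections.Counter stands in for the term-document matrix, and the sample records, the function name top_words_per_cluster, and the CSV layout are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: (cluster number, log text) pairs.
records = [
    (1, "disk error on node a"),
    (1, "disk full error"),
    (2, "login failed for user"),
    (2, "login ok"),
]

def top_words_per_cluster(records, out, n=5):
    """Write one row per top word: Log Group, Log Count, Top Words, Word Count, Counter."""
    writer = csv.writer(out)
    writer.writerow(["Log Group", "Log Count", "Top Words", "Word Count", "Counter"])
    clusters = sorted({cluster for cluster, _ in records})
    for counter, cluster in enumerate(clusters, start=1):
        # Subset: all records that belong to the current cluster.
        subset = [text for c, text in records if c == cluster]
        # Stand-in for the term-document matrix: count words across the subset.
        counts = Counter(word for text in subset for word in text.lower().split())
        # Keep only the n most frequent words of this cluster.
        for word, freq in counts.most_common(n):
            writer.writerow([cluster, len(subset), word, freq, counter])

# Write to an in-memory buffer here; a real file handle opened with
# open(path, "w", newline="") could be passed instead.
buffer = io.StringIO()
top_words_per_cluster(records, buffer)
print(buffer.getvalue())
```

Each cluster contributes up to five rows, one per top word, so Log Group, Log Count, and Counter repeat on every row of the same cluster. In R the same shape would apply, with the counting step replaced by the row sums of the TermDocumentMatrix built from each subset.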