Quick Question
Given a corpus in R, how many commands do you need to run to clean up the irregularities (removing capital letters and punctuation)?
How many commands do you need to run to stem the document?
Explanation
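In R, you can clean up the irregularities with two lines (in tm version 0.6 and later, tolower has to be wrapped in content_transformer so that the corpus keeps its document structure):
corpus = tm_map(corpus, content_transformer(tolower))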
corpus = tm_map(corpus, removePunctuation)
And you can stem the document with one line:
corpus = tm_map(corpus, stemDocument)
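
To see the three commands working together, here is a minimal sketch on a toy corpus. It assumes the tm and SnowballC packages are installed; the two sample sentences and the variable names docs and cleaned are made up for illustration and do not come from the original exercise.

```r
library(tm)         # corpus handling and tm_map
library(SnowballC)  # Porter stemmer used by stemDocument

# Hypothetical sample documents, for illustration only
docs = c("Apple's new iPhone is AMAZING!!!",
         "I argued, he argues; we are arguing.")

# Build a corpus from the character vector
corpus = VCorpus(VectorSource(docs))

# Two commands to clean up the irregularities
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)

# One command to stem the documents
corpus = tm_map(corpus, stemDocument)

# Pull the processed text back out as a plain character vector
cleaned = unname(sapply(corpus, as.character))
print(cleaned)
```

After these steps the text should contain no capital letters or punctuation, and related word forms such as "argued", "argues", and "arguing" should share a common stem.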