5.4 Predictive Coding: Bringing Text Analytics to the Courtroom (Recitation)


Video 3: Pre-Processing

Important Note: The following video uses the "tm" package for the pre-processing steps. Because some tm functions changed after the video was recorded, you need to run the following command immediately after converting all of the words to lowercase letters (it converts every document in the corpus to the PlainTextDocument type):

corpus = tm_map(corpus, PlainTextDocument)

Then you can continue with the R commands as they are in the video.
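
As a rough illustration of where that command sits in the sequence, here is a minimal sketch. It makes several assumptions that are not from the video: the two strings in `emails` are made-up stand-ins for the course data, the corpus is built with `VCorpus` so that each document is handled individually, and the steps after the conversion (`removePunctuation`, `removeWords`) are only typical tm steps. Behavior may differ between tm versions, so treat it as a sketch rather than the exact course code.

```r
library(tm)

# Hypothetical stand-in text; in the recitation the corpus comes from the course data.
emails <- c("Please FORWARD the Contracts, thanks!",
            "Meeting at 3pm: discussing the Energy Schedules.")

# Assumption: VCorpus is used so each document is stored as its own object.
corpus <- VCorpus(VectorSource(emails))

# Step 1: convert all words to lowercase.
corpus <- tm_map(corpus, tolower)

# Step 2: run the extra command right after the lowercase step,
# turning every document back into a PlainTextDocument.
corpus <- tm_map(corpus, PlainTextDocument)
lowered <- content(corpus[[1]])

# Step 3: continue with the remaining pre-processing steps
# (assumed here for illustration).
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
```

After step 2, `inherits(corpus[[1]], "PlainTextDocument")` should be `TRUE`, which is what the later tm functions expect.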

If length(stopwords("english")) does not return 174 for you, run the line of code in the stopwords (TXT) file, which stores the standard stop words in a variable called sw. Then, when removing stop words, use tm_map(corpus, removeWords, sw) instead of tm_map(corpus, removeWords, stopwords("english")).
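
The check described above can be sketched as follows. The helper `pick_stop_words`, the variable `stop_list`, and the two-message corpus are hypothetical names and data introduced only for illustration; the sketch also assumes that `sw` has already been created by running the line in the stopwords (TXT) file whenever the built-in list does not have 174 entries.

```r
library(tm)

# Hypothetical helper: use the built-in list when it has the expected
# number of entries, otherwise fall back to the list from the stopwords file.
pick_stop_words <- function(builtin, fallback = NULL, expected_n = 174) {
  if (length(builtin) == expected_n) {
    return(builtin)
  }
  if (is.null(fallback)) {
    stop("Run the line in the stopwords (TXT) file to define sw first.")
  }
  fallback
}

# sw exists only if the line from the stopwords file has been run.
fallback <- if (exists("sw")) sw else NULL
stop_list <- pick_stop_words(stopwords("english"), fallback)

# Made-up stand-in corpus; the recitation uses the course data instead.
corpus <- VCorpus(VectorSource(c("this is the first message",
                                 "and here is another one")))
corpus <- tm_map(corpus, removeWords, stop_list)
```

Keeping the choice in one place means the removeWords call stays the same whichever list ends up being used.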