Now that we have a dataset, we would like to train a classifier. Since the files in question are scripts, we approach the problem as an NLP problem.