stat 579
Homework: week 12
Due in class, Tuesday/Thursday Nov 16/18.
Spam Detector
The file email.rdata contains 200 emails from the ENRON email database. The first 100 are regular emails, the second 100 are spam.
We want to process these emails in several steps.
- Write functions that extract the sender, subject, and date received from an email.
- Write functions that extract from a character string:
  - the ratio of upper case to lower case letters
  - true/false for the presence/absence of a key word
- Process the emails from the ENRON database with the help of your functions: first, summarize all emails by sender, subject, and date; then process the subject lines further. Think of five keywords that might allow you to distinguish between spam and regular email. Report percentages of regular email/spam for each of these keywords.
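As a starting point, the steps above might be sketched as follows. This is only a minimal sketch under assumptions that are not stated in the assignment: it assumes each email is a character string (or a vector of lines) with header lines of the form "From:", "Subject:" and "Date:", and it reads "percentages" as the share of regular and of spam subject lines that contain a keyword. All function names and the toy email are hypothetical; check the actual structure of the objects in email.rdata before relying on any of it.

```r
# ASSUMPTION: headers look like "From: ...", "Subject: ...", "Date: ...".
# The real layout of the emails in email.rdata may differ.
extract_field <- function(email, field) {
  lines <- unlist(strsplit(email, "\r?\n"))       # also works on a vector of lines
  pattern <- paste0("^", field, ":[[:space:]]*")
  hit <- grep(pattern, lines, ignore.case = TRUE)
  if (length(hit) == 0) return(NA_character_)     # header not found
  trimws(sub(pattern, "", lines[hit[1]], ignore.case = TRUE))
}

extract_sender  <- function(email) extract_field(email, "From")
extract_subject <- function(email) extract_field(email, "Subject")
extract_date    <- function(email) extract_field(email, "Date")

# Ratio of upper case to lower case letters in a single string.
# Design choice (an assumption): return NA when there are no lower case letters.
upper_lower_ratio <- function(x) {
  n_upper <- nchar(gsub("[^[:upper:]]", "", x))
  n_lower <- nchar(gsub("[^[:lower:]]", "", x))
  if (n_lower == 0) return(NA_real_)
  n_upper / n_lower
}

# TRUE/FALSE for the presence of a keyword, ignoring case (vectorized over x).
has_keyword <- function(x, keyword) {
  grepl(tolower(keyword), tolower(x), fixed = TRUE)
}

# Percentage of regular and of spam subject lines containing each keyword.
keyword_rates <- function(subjects, is_spam, keywords) {
  t(sapply(keywords, function(k) {
    hit <- has_keyword(subjects, k)
    c(regular = 100 * mean(hit[!is_spam]), spam = 100 * mean(hit[is_spam]))
  }))
}

# Toy example (made-up email, not taken from the ENRON data).
toy_email <- paste(
  "From: alice@example.com",
  "Subject: FREE offer inside",
  "Date: Mon, 15 Nov 2004 10:00:00",
  "",
  "Body text.",
  sep = "\n"
)
toy_subject <- extract_subject(toy_email)
upper_lower_ratio(toy_subject)
has_keyword(toy_subject, "free")
```

For the full data set, the class labels could be built from the ordering given above, for example `is_spam <- rep(c(FALSE, TRUE), each = 100)`, and passed to `keyword_rates()` together with the extracted subject lines and your five keywords.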
Deliverables:
Submit a commented R script of your code. Use the filename firstname-lastname-hw12-X.R, where X is either A or B depending on the section you're in (Tuesday is B, Thursday is A).
Great Answers: