Artificial Intelligence and Data Mining

Lecture 15 - February 8th, 2017

Data collected and stored at enormous speeds (GB/hour)
- remote sensors on a satellite

Other Data Sources?

Example:

Predict if an email is spam or not
Train a model
- Show the computer a bunch of emails, some of which are spam, some of which are not
- Tell the computer which words appear in each email
- Computer learns words that tend to appear with spam emails, or with not-spam emails.
Test the model
- Show the computer a new email
- Check if the computer predicts the right class
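A minimal sketch of the train/test steps above, assuming a deliberately simple word-counting scheme (this is not necessarily the model used in the lecture). The function names `train` and `predict`, the "ham" label for not-spam, and the four training emails are made up for illustration.

```python
from collections import Counter

def train(emails):
    """Count, per class, how many training emails contain each word."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in emails:
        # set() so a word counts once per email, however often it repeats
        counts[label].update(set(text.lower().split()))
    return counts

def predict(counts, text):
    """Score a new email: +1 per spam sighting of a word, -1 per ham sighting."""
    score = 0
    for word in set(text.lower().split()):
        score += counts["spam"][word] - counts["ham"][word]
    return "spam" if score > 0 else "ham"

# Hypothetical labelled training emails
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes for monday", "ham"),
    ("lunch on monday", "ham"),
]

model = train(training)

# Test the model on emails it has not seen
prediction = predict(model, "claim your free money")
other = predict(model, "monday meeting")
print(prediction, other)
```

Words the model has never seen contribute nothing to the score, so an email made entirely of new words falls back to "ham" under this scheme.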

Learning

We can think of at least three different problems....

Ex. Imagine I'm trying to predict whether my neighbour is going to drive to work, so I can ask for a ride.

Memorize vs New Data

Temp	Precip	Day	Clothes	Travel
25	None	Sat	Casual	Walk
-5	Snow	Mon	Casual	Drive
15	Snow	Mon	Casual	Walk
-5	Snow	Mon	Casual	?
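Pure memorization can be sketched as an exact-match lookup over the rows of the table. The function name `predict_by_lookup` and the unseen query are assumptions for illustration.

```python
# Training rows from the table: (temp, precip, day, clothes) -> outcome
training = {
    (25, "None", "Sat", "Casual"): "Walk",
    (-5, "Snow", "Mon", "Casual"): "Drive",
    (15, "Snow", "Mon", "Casual"): "Walk",
}

def predict_by_lookup(table, query):
    """Return the memorized outcome, or None if the exact row was never seen."""
    return table.get(query)

# The query row matches a memorized row exactly
answer = predict_by_lookup(training, (-5, "Snow", "Mon", "Casual"))

# A hypothetical new situation: memorization alone has nothing to say
unseen = predict_by_lookup(training, (10, "Rain", "Tue", "Formal"))
print(answer, unseen)
```

The lookup answers the "?" row because it repeats a stored row, but it returns nothing for new data, which is the gap that generalization has to fill.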

Temp	Precip	Day	Clothes	Travel
25	None	Sat	Casual	Walk
25	None	Sat	Casual	Walk
25	None	Sat	Casual	Drive
25	None	Sat	Casual	Drive
25	None	Sat	Casual	Walk
25	None	Sat	Casual	Walk
25	None	Sat	Casual	Walk
25	None	Sat	Casual	?

When the data is noisy, average over the observations
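For a categorical outcome, one way to read "averaging" is a majority vote over the identical rows. A small sketch using the seven labelled rows of the table:

```python
from collections import Counter

# Outcomes of the seven identical rows (25, None, Sat, Casual)
outcomes = ["Walk", "Walk", "Drive", "Drive", "Walk", "Walk", "Walk"]

counts = Counter(outcomes)
prediction, votes = counts.most_common(1)[0]  # most frequent outcome
fraction = votes / len(outcomes)              # share of rows that agree

print(prediction, votes, round(fraction, 2))
```

Here 5 of the 7 rows say Walk, so the vote predicts Walk for the "?" row, with about 71% of the observations in agreement.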

Decision Tree
- Predict by splitting on attribute values
- ID3
  - Normal procedure: top down in a recursive divide-and-conquer fashion
  - Process Stops
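The recursive divide-and-conquer procedure can be sketched as below. This is a minimal illustration, not a full ID3 implementation: it assumes every attribute is categorical (Temp is treated as a category, not a number), and it assumes two stopping rules, since the notes do not spell them out: stop when all labels agree, or when no attributes remain (then take the majority label). The training rows are the three labelled rows from the first table.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attr."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build a tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:      # assumed stop: node is pure
        return labels[0]
    if not attrs:                  # assumed stop: nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree

rows = [
    {"Temp": 25, "Precip": "None", "Day": "Sat", "Clothes": "Casual"},
    {"Temp": -5, "Precip": "Snow", "Day": "Mon", "Clothes": "Casual"},
    {"Temp": 15, "Precip": "Snow", "Day": "Mon", "Clothes": "Casual"},
]
labels = ["Walk", "Drive", "Walk"]

tree = id3(rows, labels, ["Temp", "Precip", "Day", "Clothes"])
print(tree)
```

On this tiny table the split on Temp separates the rows perfectly, which also shows how an attribute with many distinct values can look attractive simply because it memorizes the training rows.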

AI Experience (withGoogle)

Attribute Selection:

Measuring Purity with Entropy:

Entropy
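For reference, the standard definition of entropy for a set S whose classes occur with proportions p_i, followed by a worked value for the noisy table above (5 Walk, 2 Drive). The symbols S, p_i and k are notation chosen here, not taken from the slides.

```latex
H(S) = -\sum_{i=1}^{k} p_i \log_2 p_i

% Worked example: 5 Walk and 2 Drive out of 7 rows
H(S) = -\tfrac{5}{7}\log_2\tfrac{5}{7} - \tfrac{2}{7}\log_2\tfrac{2}{7}
     \approx 0.347 + 0.516
     \approx 0.863 \text{ bits}
```

A pure set (all rows in one class) has entropy 0, and an evenly split two-class set has entropy 1 bit, so lower entropy means higher purity.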

Bonus:

Trump's Twitter: