Coursera Data Science Specialization Capstone Project – thoughts

I have finally reached the capstone project, after three years of working on and off on this Coursera specialization.

The project gives you a set of text documents and asks you to mine the texts and come up with your own model. So far, I am on week 2. I haven’t dived deep enough into the project yet, so I don’t know exactly how I am going to mine the texts or what kind of model I will use. But while I was preparing last week for our 3-minute “what is your passion” presentation, for our Monday team retreat at Leadercast, I came across Maslow’s hierarchy of needs. I think it would be neat to look at the words in each level of the hierarchy and see how frequently people use words from each level in their daily blog posts, tweets, and news.

Maslow's Hierarchy

To do this, I need to:

  1. Obtain a dictionary and have all words categorized into Maslow’s hierarchy
  2. Run all words in the files against the dictionary to determine which level of the hierarchy they belong to.
    1. Calculate the frequency of each unique word
    2. Calculate the frequency of each level
  3. It would be fun to look at the frequency of each level overall, then look at the correlations between levels.
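
Just to make the steps above concrete, here is a rough sketch in Python of what the counting could look like. I don't have a real dictionary yet, so the tiny `MASLOW_LEXICON` below is made up by me purely for illustration (the word-to-level assignments are my own guesses, not from any published source), and the helper names `tokenize` and `count_levels` are hypothetical too. The actual project may well end up looking quite different.

```python
import re
from collections import Counter

# Hypothetical, hand-made lexicon mapping words to Maslow levels.
# These assignments are illustrative assumptions, not a real dictionary.
MASLOW_LEXICON = {
    "food": "physiological",
    "sleep": "physiological",
    "safe": "safety",
    "money": "safety",
    "friend": "belonging",
    "family": "belonging",
    "respect": "esteem",
    "proud": "esteem",
    "dream": "self-actualization",
    "create": "self-actualization",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_levels(documents, lexicon=MASLOW_LEXICON):
    """Count matched words and the level each one belongs to.

    Returns (word_counts, level_counts); words that are not in the
    lexicon are skipped.
    """
    word_counts = Counter()
    level_counts = Counter()
    for doc in documents:
        for token in tokenize(doc):
            level = lexicon.get(token)
            if level is not None:
                word_counts[token] += 1
                level_counts[level] += 1
    return word_counts, level_counts


# Made-up sample sentences standing in for blog posts, tweets, and news.
docs = [
    "My family and I need food and sleep.",
    "A good friend makes me feel safe and proud.",
    "I dream about food.",
]
word_counts, level_counts = count_levels(docs)
print(word_counts.most_common(3))
print(dict(level_counts))
```

For the correlation idea in step 3, one option would be to keep a separate level count per document instead of one overall total, so that each level becomes a column of numbers that can be compared against the others.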

