Books versus Movies - Data Science for User Researchers

Human Centered Design & Engineering, Master's Program

[Original] Research Questions:

  1. Do more people read a book before or after a movie based on that book has come out?
  2. Which have better reviews: books or the movies based on those books?

Class Objective:

Introduces widely-adopted programming and data science tools in order to use data to answer questions about the characteristics, behaviors, and needs of people who use a wide variety of products.

  • Write or modify a program to collect a dataset from Wikipedia or the City of Seattle’s open data portal (Data.Seattle.gov)
  • Effectively read web API documentation and write Python software to parse and understand a new and unfamiliar JSON-based web API
  • Understand database schemas and use MySQL to extract user data from relational databases
  • Use web-based data to effectively answer a substantively interesting question and to present this data effectively in the context of both a formal presentation and a written report

Process:

  • Develop intriguing initial research questions based on personal interest and existing APIs with relevant data
  • [Re]Learn Python to grab and manipulate API data in XML and JSON formats
  • Re-evaluate and scope research questions based on available API data and formatting issues
  • Pull API data from each source with Python and export it to Excel
  • Analyze data in Excel using graphs
  • Analyze data in Tableau
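The export step in the process above can be sketched as follows. This is a minimal illustration, not the script used in the project: it assumes the collected records are written to CSV (which Excel opens directly), and the column names are hypothetical, loosely based on the film fields listed under the APIs section.

```python
import csv
import io

# Hypothetical column names; the project's actual spreadsheet layout is not documented.
FIELDS = ["film", "release_date", "genre", "edit_count"]

def write_films_csv(rows, fileobj):
    """Write a list of film dicts to an open text file as CSV, header first."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Placeholder row, invented purely to show the output shape.
example_rows = [
    {"film": "Film A", "release_date": "2000-01-01", "genre": "drama", "edit_count": 10},
]

buffer = io.StringIO()
write_films_csv(example_rows, buffer)
csv_text = buffer.getvalue()
```

To produce a file for Excel, the same function can be called with a real file opened via `open("films.csv", "w", newline="", encoding="utf-8")`, where `films.csv` is an example name.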

Revised Research Questions:

Based on the limitations and restrictions of the data, along with time constraints, I had to adjust my research questions to the following:

  1. Is there a correlation between the Goodreads rating for a book that has become a movie and the Wikipedia page edits for that movie?
  2. What genres of books are most commonly adapted to films?

APIs Used:

  • Wikipedia
    • Retrieve list of films based on books
    • Gather film name, release date, genre, and edit counts for each film page
  • Goodreads
    • Retrieve book from list of film titles
    • Gather book title, average ratings for a book, number of ratings for a book, and original publishing year
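As an illustration of the Wikipedia step, here is a minimal sketch of requesting and parsing a category listing through the MediaWiki API's `categorymembers` query. It is not the project's original code: the function names are hypothetical, the sample response is hand-written to mirror the API's JSON shape, and no network request is made. The Goodreads step is not sketched here.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def category_members_url(category, limit=500, cmcontinue=None):
    """Build a MediaWiki API URL that lists the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    if cmcontinue:
        # Token returned by the previous response when more results remain.
        params["cmcontinue"] = cmcontinue
    return f"{API_URL}?{urlencode(params)}"

def parse_category_members(payload):
    """Return (article titles, continuation token or None) from a parsed response."""
    members = payload.get("query", {}).get("categorymembers", [])
    # Namespace 0 holds articles; subcategories (namespace 14) are skipped.
    titles = [m["title"] for m in members if m.get("ns") == 0]
    token = payload.get("continue", {}).get("cmcontinue")
    return titles, token

# Hand-written sample response (illustrative page ids and token).
sample = json.loads("""
{
  "continue": {"cmcontinue": "page|ABC|123", "continue": "-||"},
  "query": {"categorymembers": [
    {"pageid": 1, "ns": 0, "title": "Jaws (film)"},
    {"pageid": 2, "ns": 14, "title": "Category:Films based on American novels"},
    {"pageid": 3, "ns": 0, "title": "The Godfather"}
  ]}
}
""")

url = category_members_url("Films based on novels")
titles, token = parse_category_members(sample)
```

In a real run, the URL would be fetched repeatedly, passing each returned token back as `cmcontinue` until the response no longer contains one.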

Findings:

  1. No correlation between the Goodreads rating for a book that has become a movie and the Wikipedia page edits for that movie. See Tableau graph below.
  2. The genre of book most commonly adapted to film is drama, followed by comedy and romance. See Excel graph below.

 

tab1.PNG
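The two findings were produced in Tableau and Excel; the same checks could be sketched in Python roughly as follows. This assumes a Pearson correlation coefficient for the first question (the write-up does not say which measure was used) and a simple tally for the second. The rows are invented placeholders, not the project's data.

```python
import math
from collections import Counter

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

def genre_counts(films):
    """Return (genre, count) pairs, most common first."""
    return Counter(film["genre"] for film in films).most_common()

# Placeholder rows with hypothetical field names and made-up values.
films = [
    {"title": "Film A", "genre": "drama", "book_rating": 3.5, "page_edits": 120},
    {"title": "Film B", "genre": "comedy", "book_rating": 4.0, "page_edits": 80},
    {"title": "Film C", "genre": "drama", "book_rating": 4.5, "page_edits": 300},
]

r = pearson_r([f["book_rating"] for f in films], [f["page_edits"] for f in films])
counts = genre_counts(films)
```

A value of `r` near 0 would correspond to the "no correlation" result in the first finding; values near 1 or -1 would indicate a strong linear relationship.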

Limitations:

  • Could not verify that each matched book was actually the source of the identified film; books and films were matched by exact title only
  • Limited recent data in the Wikipedia category "Films based on novels", which was used for the base list
  • Unknown relationship between Goodreads readers and Wikipedia editors, and no data to compare the two groups
  • Few or no Wikipedia pages existed for the books / novels corresponding to the list of "Films based on novels"
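To make the first limitation concrete, here is a minimal sketch of exact-title matching. The function and the lookup structure are hypothetical, not the project's code. It shows how a film whose title differs from its source novel is missed entirely, and it cannot tell whether a same-named book is really the film's source.

```python
def match_by_exact_title(film_titles, books_by_title):
    """Pair each film with the book that has exactly the same title, if any."""
    matched, unmatched = {}, []
    for title in film_titles:
        book = books_by_title.get(title)
        if book is None:
            unmatched.append(title)
        else:
            matched[title] = book
    return matched, unmatched

# Small illustrative lookup keyed by book title.
books_by_title = {
    "Jaws": {"author": "Peter Benchley", "year": 1974},
    "Do Androids Dream of Electric Sheep?": {"author": "Philip K. Dick", "year": 1968},
}

# "Blade Runner" is based on the second book but shares no title with it.
matched, unmatched = match_by_exact_title(["Jaws", "Blade Runner"], books_by_title)
```

Here `Jaws` is matched, while `Blade Runner` ends up in `unmatched` even though its source novel is in the lookup.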