My Project 2 Experience

Automation, Predictive Modeling, and Partners? Oh my!

Usually, I’m not too thrilled to work with a partner on a project. I’ve always been the one to end up with the heaviest workload, often because my eye for detail allows me to catch mistakes when reviewing the group’s work. I’m always nervous that my partner will do a lousy job and I’ll have to put in twice the effort.

This project was far from that scenario! For this project, my partner Ryan Bunn and I analyzed a news popularity dataset, producing multiple files reporting analyses for each of the six different data channels: lifestyle, entertainment, social media, business, tech, and world. We went through the process of reading in the dataset, data manipulation and variable creation, summary table creation, data visualization, and model fitting using linear regression, random forest, and boosted tree methods. At the end of each document, a “best model” was declared.

It was a pleasure to collaborate with Ryan on this project, and I appreciate the opportunity for my view of group work to change. There is nothing that I would change about the process or the product of this project. Throughout the development process, each of our tasks as collaborators was clear, and each of us completed them in a timely manner. Communication between Ryan and me was always thorough and clear, allowing for a smooth workflow.

The most difficult part for me was getting excited about this data. During my last project, it was easy to visualize trends in the data during graph creation. However, this dataset was more complex, having many observations, interactions, and relationships between variables that were less pronounced. Exploring the variables absorbed a lot of my time because it was difficult to find variables that created an interesting scatterplot, histogram, etc.

My biggest takeaway from this project was the importance of doing model comparison on a test set. Sometimes the random forest or boosted tree model would appear to perform better on the training set we created, but one of the linear regression models would rise to the top when tested on the test set. This was surprising to me, but it exposed the value of having a test set to accurately measure the efficacy of each model.
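To make that takeaway concrete, here is a minimal sketch of the idea. It is not the code from our project: the data is synthetic, and a 1-nearest-neighbor "memorizer" stands in for a flexible model (it is not a random forest or boosted tree). All function and variable names are made up for illustration. The point is only that a model can look unbeatable on the training set and still lose to a plain linear fit on held-out data.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbor(xs, ys):
    """Stand-in for a very flexible model: predict the response of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Synthetic data (an assumption for illustration): a linear trend plus noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(1000)]
ys = [2.0 * x + 1.0 + random.gauss(0, 3.0) for x in xs]

# 70/30 train/test split.
train_x, test_x = xs[:700], xs[700:]
train_y, test_y = ys[:700], ys[700:]

models = {
    "linear": fit_linear(train_x, train_y),
    "nearest_neighbor": fit_nearest_neighbor(train_x, train_y),
}

# Record (training RMSE, test RMSE) for each model.
results = {}
for name, model in models.items():
    train_rmse = rmse(train_y, [model(x) for x in train_x])
    test_rmse = rmse(test_y, [model(x) for x in test_x])
    results[name] = (train_rmse, test_rmse)
    print(f"{name}: train RMSE = {train_rmse:.2f}, test RMSE = {test_rmse:.2f}")

best_on_train = min(results, key=lambda k: results[k][0])
best_on_test = min(results, key=lambda k: results[k][1])
print("Best on training set:", best_on_train)
print("Best on test set:", best_on_test)
```

The memorizer reproduces every training response exactly, so it always wins on the training set, but it has also memorized the noise, and that hurts it on data it has not seen. Only the test-set comparison shows which model actually generalizes.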

Overall, my experience with this project was a pleasant one and I’m grateful to have been paired with someone so easy to work with. I’m excited to tackle the next project!

Visit our Project 2 Repository here

Visit the Project 2 landing page here

Written on October 30, 2021