Data Analysis Case Problem – Week 7

Review the attached Excel spreadsheet and answer all of the Case Study questions below.

Case Problem: Grey Code Corporation

Grey Code Corporation (GCC) is a media and marketing company involved in magazine and book publishing and in television broadcasting. GCC’s portfolio of home and family magazines has long been a strength, but the company has expanded to provide a spectrum of services (market research, communications planning, website advertising, etc.) that can enhance its clients’ brands.

GCC’s relational database contains over a terabyte of data on 75 million customers. GCC uses this data to develop campaigns for new customer acquisition, customer reactivation, and the identification of cross-selling opportunities for products. For example, GCC generates separate versions of a monthly magazine issue that differ only in the advertisements they contain, and it mails each subscriber the version whose print ads the database identifies as most likely to interest that customer.

One particular problem facing GCC is how to boost the customer response rate to the renewal offers it mails to its magazine subscribers. The industry response rate is about 2%, and GCC has historically performed better than that. However, GCC must update its model to reflect recent changes. GCC’s director of database marketing, Chris Grey, wants to make sure GCC keeps its place as one of the top achievers in targeted marketing. The GCC data file contains 99 variables (columns) and 50,000 rows (one per distinct customer).

Play the role of Chris Grey and construct a classification model to identify customers who are likely to respond to a mailing. Write a report that documents the following steps:

  1. Explore the data. This includes handling any missing data and deciding how to treat each variable; some variables may need to be transformed. Also, because of the large number of variables, you must find appropriate ways to reduce the dimension of the data. In particular, it may help to filter out unnecessary and redundant variables.
  2. Partition the data into training, validation, and test sets.
  3. Experiment with various classification methods and propose a final model for identifying customers who are likely to respond to the targeted marketing.
    1. Your report should include a chart of the Class 1 and Class 0 error rates for various values of the cutoff probability.
    2. Recommend a cutoff probability value. For the test set, what is the overall error rate at this value? What are the Class 1 and Class 0 error rates at this value?
    3. If GCC sends the targeted mailing to the customers in the model’s top decile (the 10% with the highest predicted probability of responding), what is the expected response rate? How does that compare with the industry average of about 2%?
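Step 1 can be sketched in a few lines of Python. This is an illustration, not a prescribed method: it assumes the numeric columns of the GCC file have already been loaded into a dict mapping column names to lists, with None marking a missing value, and it handles only numeric variables (categorical variables would need encoding first). The helper names, the thresholds max_missing=0.5 and max_corr=0.9, and the sample column names are all hypothetical choices, not values from the case.

```python
import statistics


def impute_median(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reduce_columns(columns, max_missing=0.5, max_corr=0.9):
    """Filter and impute numeric columns (name -> list, None = missing).

    Columns are processed in order:
      1. drop a column if its share of missing values exceeds max_missing;
      2. median-impute the remaining gaps;
      3. drop constant columns, which cannot separate responders;
      4. drop a column whose |r| with an already-kept column exceeds max_corr.
    """
    kept = {}
    for name, values in columns.items():
        if sum(v is None for v in values) / len(values) > max_missing:
            continue
        filled = impute_median(values)
        if max(filled) == min(filled):
            continue
        if any(abs(pearson(filled, other)) > max_corr for other in kept.values()):
            continue
        kept[name] = filled
    return kept


# Hypothetical columns, not taken from the GCC file.
data = {
    "age": [34, None, 51, 45, 29, 62],
    "income": [60, 42, None, 55, 71, 48],
    "income_copy": [60.0, 42.0, None, 55.0, 71.0, 48.0],  # redundant duplicate
    "region_code": [3, 3, 3, 3, 3, 3],                    # constant
    "survey_score": [None, None, None, None, 7, None],    # mostly missing
}
print(sorted(reduce_columns(data)))  # ['age', 'income']
```

The correlation filter is greedy: of two redundant columns, whichever appears first is kept, so column order matters.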
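For step 2, one common choice (an assumption here, not a requirement of the case) is a 50/30/20 split into training, validation, and test sets. Because responders are a small minority, the hypothetical stratified_partition helper below splits each class separately so that the response rate is about the same in all three sets. The customers list is synthetic, with 2% responders, and only stands in for the real GCC rows.

```python
import random


def stratified_partition(rows, label, train=0.5, valid=0.3, seed=1):
    """Split rows into training/validation/test lists, stratified on label(row).

    Each class is shuffled and cut separately, so a rare outcome such as a
    mailing response appears in roughly the same proportion in every set.
    The test set receives whatever remains after training and validation.
    """
    if not (train > 0 and valid > 0 and train + valid < 1):
        raise ValueError("need train > 0, valid > 0 and train + valid < 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(label(row), []).append(row)
    train_set, valid_set, test_set = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * train)
        n_valid = round(len(members) * valid)
        train_set += members[:n_train]
        valid_set += members[n_train:n_train + n_valid]
        test_set += members[n_train + n_valid:]
    return train_set, valid_set, test_set


# Synthetic stand-in: (customer_id, responded) pairs, 1,000 responders in 50,000.
customers = [(i, int(i % 50 == 0)) for i in range(50_000)]
tr, va, te = stratified_partition(customers, label=lambda row: row[1])
print(len(tr), len(va), len(te))  # 25000 15000 10000
print(sum(r for _, r in te))      # 200
```

The validation set is used to compare methods and tune the cutoff; the test set is touched only once, to report the final model's error rates.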
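The quantities requested in step 3 can be computed from any fitted model's predicted probabilities. The sketch below assumes such probabilities already exist for the test set; the helper names error_rates and top_decile_response are hypothetical, and the 20 probabilities and outcomes are toy values, not GCC results. A record is classified as a responder (Class 1) when its probability is at or above the cutoff; the Class 1 error rate is the share of actual responders classified as 0, and the Class 0 error rate is the share of actual non-responders classified as 1. The printed table supplies the points for the cutoff chart, and dividing the top-decile response rate by the roughly 2% industry rate gives the lift.

```python
def error_rates(probs, actual, cutoff):
    """Return (overall, class 1, class 0) error rates at one cutoff.

    A record is predicted as 1 (responder) when its probability >= cutoff.
    """
    predicted = [1 if p >= cutoff else 0 for p in probs]
    n1 = sum(actual)
    n0 = len(actual) - n1
    # Class 1 errors: responders predicted as non-responders.
    miss1 = sum(1 for p, a in zip(predicted, actual) if a == 1 and p == 0)
    # Class 0 errors: non-responders predicted as responders.
    miss0 = sum(1 for p, a in zip(predicted, actual) if a == 0 and p == 1)
    return ((miss1 + miss0) / len(actual),
            miss1 / n1 if n1 else 0.0,
            miss0 / n0 if n0 else 0.0)


def top_decile_response(probs, actual):
    """Response rate among the 10% of records with the highest probabilities."""
    ranked = sorted(zip(probs, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 10)
    return sum(a for _, a in ranked[:k]) / k


# Toy test-set values: predicted response probabilities and actual outcomes.
probs = [0.92, 0.85, 0.70, 0.64, 0.55, 0.48, 0.40, 0.35, 0.30, 0.27,
         0.22, 0.18, 0.15, 0.12, 0.10, 0.08, 0.06, 0.05, 0.03, 0.02]
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print("cutoff  overall  class1  class0")
for cutoff in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
    overall, e1, e0 = error_rates(probs, actual, cutoff)
    print(f"{cutoff:.1f}     {overall:.3f}    {e1:.3f}   {e0:.3f}")

INDUSTRY_RATE = 0.02  # "about 2%" from the case description
top = top_decile_response(probs, actual)
print(f"top-decile response rate {top:.0%}, lift {top / INDUSTRY_RATE:.1f}x")
```

Lowering the cutoff catches more responders (fewer Class 1 errors) at the price of mailing more non-responders (more Class 0 errors), so the recommended cutoff depends on the relative cost of a wasted mailing versus a missed renewal.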