BUAN 201 Class project

For this project you are asked to find data that you think can answer a question of interest. You can use government data, financial market data, NGO-provided data, sports team data, census data, or any other data you can find. Plenty of data is freely available on the web, and more is available through the school (Bloomberg, CapIQ). Some websites containing large quantities of easily accessible data are listed below.

Federal Reserve Bank of St Louis


The US Government’s Open Data Initiative


World Bank Data


IMF data


CT State Data


NYC Data


I generally caution against using hand-collected data because its accuracy is difficult to check and constructing a large sample is time consuming. For complex analysis, 30 observations of every variable is often considered a minimum, and significantly more is preferred. Only hand collect data if it answers a truly unique question.
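
One way to check a data set against the 30-observation guideline is to count the rows in which every variable is present. The sketch below is only an illustration: the sample CSV text, the column names, and the helper `count_complete_rows` are made up for this example, and it assumes a missing value shows up as an empty cell.

```python
import csv
import io

# Hypothetical sample data; an empty cell marks a missing value.
SAMPLE_CSV = """year,unemployment,inflation
2018,3.9,2.4
2019,3.7,1.8
2020,8.1,
2021,5.4,4.7
"""

MIN_OBSERVATIONS = 30  # guideline quoted in the paragraph above


def count_complete_rows(csv_text):
    """Count rows in which every variable has a non-empty value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(
        1
        for row in reader
        if all(value is not None and value.strip() != "" for value in row.values())
    )


complete = count_complete_rows(SAMPLE_CSV)
enough = complete >= MIN_OBSERVATIONS
print(f"{complete} complete observations; meets the guideline: {enough}")
```

The same count can be done in Excel; the point is simply to know how many usable observations remain once rows with missing values are set aside.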

Data for various sports can be found online, but each sport generally has its own sources and formats, making it difficult to provide one link that will fit all inquiries. You are welcome to search online for more information and to use this data if it is helpful.

To complete this project you will need to submit a (short) academic-style research paper consisting of the numbered sections below. You do not need to break the paper down into the sub-portions listed under each section; these are intended to act as a guide. For a sample paper illustrating reasonable formatting, please see the Undergraduate Research Paper I posted to Blackboard. The sample research paper conducts more elaborate analysis than you are expected to for this course, but it hopefully gives you an idea of what is possible if you take further courses in analysis/metrics/quant/statistics.

If you are completing a similar paper for another course, you are expected to choose a different topic for each class! (I will check with other professors; no double dipping.)

Paper structure

Cover page including title, author and abstract (1 page)

  1. The title should indicate the question/topic being addressed
  2. Your name should be listed
  3. The abstract should be 5-10 lines long, and should summarize the importance of the question and the answer (convince the reader this paper is worth reading).
  1. Introduction (expected 2+ pages)
    1. Frame the problem, explain why the reader should be interested
    2. Describe the other work that has been done in this area, and how it relates to your research (use google scholar to find related work). You may link to non-academic articles and sources (NGO reports and news articles for example) but there should be some scholarly work. Be sure to explain how your work enhances or extends the current work you have found in this area. This is otherwise known as a literature review.
  2. Data (expected 2-3 pages)
    1. Where did you get the data you will use to answer the problem?
    2. What are the different variables? Explain how they are constructed and how they are measured (we need to understand them before we use them). What are the problems with the data (missing data? Not measuring exactly what you want? Not recent enough?)
    3. Summary Statistics of the data to help the reader understand what is being used. You may use tables, histograms, bar charts, line charts, scatter charts, etc.
  3. Methodology (length highly variable depending on analysis completed)
    1. What analysis do you propose that would help to answer the problem you set out in the introduction? Why this analysis? If you’re plotting an equation, write it into the paper! Microsoft Word has a nice equation editor (Insert > Equation).
    2. Run the analysis. For this section you can use whatever type of analysis you feel is appropriate for the data you have access to. Often we will run a regression, either a standard linear regression or a time series regression.
  4. Results (1-4 pages depending on results)
    1. Present the results of the regression/analysis in one or more clear tables. Be sure to present p-values and coefficient estimates where appropriate. Use black and white tables (see example paper for appropriate academic formatting). You may include charts as well. If there are several questions you may want to include more than one regression or piece of analysis here.
    2. Explain the output of your analysis. Point out whether it is statistically significant and whether it is economically significant (that is, big enough to matter). Show how your model might be used to predict future data if it applies to your problem. If you run a regression, explain the R squared as a measure of the predictive power of your model.
    3. Depending on the analysis run, you may wish to provide evidence of the robustness of your models. This may include residual scatterplots, for example.
  5. Conclusion (1-2 pages)
    1. What did we learn, and why is it important?
    2. If you were to get more data, have more time, and extend the work, what would you do and how would you do it to answer bigger/tougher/more important questions?
    3. If others want to work in this area, what do you think are some interesting questions in the field that should provoke further inquiry?
  6. References (1+ page)
    1. A list of all external sources you consulted before doing your analysis. Every reference in this list should have been mentioned in the text of the paper. Most of these will probably be mentioned in the literature review in the introduction.

You may place tables and charts at the end of the document rather than within the text. Label them all (Table 1, Table 2, Chart 1, Chart 2, Chart 3, etc.) and then reference them in the text. This makes it easier to format the written work in MS Word. See the sample paper for an example.

In most cases tables should be simple, black and white, with minimal ink. Charts are generally in black and white also, but some color here is acceptable if it aids in the understanding of the data.
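
A summary statistics table in this simple style, of the kind asked for in the Data section, might be produced along the following lines. This is only a sketch: the variable names and values are invented, and the helpers `summarize` and `summary_table` are hypothetical names chosen for the example.

```python
import statistics

# Hypothetical variables; replace with the columns of the real data set.
data = {
    "wage": [14.0, 17.5, 16.0, 21.0, 24.5, 26.0, 29.0, 33.5],
    "education": [10, 12, 12, 14, 16, 16, 18, 20],
}


def summarize(values):
    """Return N, mean, sample standard deviation, minimum, and maximum."""
    return (
        len(values),
        statistics.fmean(values),
        statistics.stdev(values),
        min(values),
        max(values),
    )


def summary_table(variables):
    """Build a plain-text summary table with one row per variable."""
    header = f"{'Variable':<12}{'N':>4}{'Mean':>9}{'Std dev':>9}{'Min':>9}{'Max':>9}"
    lines = [header, "-" * len(header)]
    for name, values in variables.items():
        n, mean, sd, low, high = summarize(values)
        lines.append(f"{name:<12}{n:>4}{mean:>9.2f}{sd:>9.2f}{low:>9.2f}{high:>9.2f}")
    return "\n".join(lines)


print(summary_table(data))
```

The output uses one header row, one rule, and one row per variable, which keeps the ink to a minimum and pastes cleanly into a black and white table.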

If you have questions about the assignment, I recommend stopping in to see me. I will not have time to read an 8-page paper during a visit, so please come with specific questions you would like me to help with.

For the analysis you can use Excel, or you can use software you may have learned in another class (Python, SAS, R, SPSS, MATLAB). I will not give extra credit for analysis that is more complicated than can reasonably be completed in Excel, as that would be unfair to students who have not covered other software.