**Question A**

- The *tvshows* data set from AI:FCA-1 (exercise 7.3) is shown below:

Assume that errors are reported as absolute error (i.e. each misclassified instance counts as 1). Give very brief answers to the following questions:

- What is the optimal tree with one node only? What is the associated error?
- What is the optimal tree with a depth of 2 (i.e. a root node with leaves as children)? What is the associated error? Which instances end up at each leaf?
- What is the smallest tree that correctly classifies all training instances? How will it classify a new instance described as (*Comedy*=*true*, *Doctors*=*true*, *Lawyers*=*true*, *Guns*=*true*) and another one described as (*Comedy*=*false*, *Doctors*=*false*, *Lawyers*=*true*, *Guns*=*true*)? Which of the two test instances allows us to say that the tree is able to generalize?
- If you were building the tree using information gain as the splitting criterion, what would be the root?
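As an aid for checking answers to the first and last questions, here is a minimal Python sketch of the two quantities involved: the absolute error of a single-node (majority-class) tree, and the information gain of a candidate root feature. The function names, the feature names `A` and `B`, and the toy rows are all hypothetical placeholders for illustration. They are not the *tvshows* data, which should be substituted in by hand.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    n = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def majority_error(labels):
    """Absolute error of a one-node tree that always predicts the majority class."""
    return len(labels) - max(Counter(labels).values())


# Hypothetical toy data (NOT the tvshows data set): two boolean features, one boolean target.
toy_rows = [
    {"A": True, "B": True},
    {"A": True, "B": False},
    {"A": False, "B": True},
    {"A": False, "B": False},
]
toy_labels = [True, True, False, False]

# Candidate root: the feature with the highest information gain.
best_root = max(toy_rows[0], key=lambda f: information_gain(toy_rows, toy_labels, f))
print("one-node error:", majority_error(toy_labels))
print("root by information gain:", best_root)
```

On the toy data, feature `A` separates the classes perfectly and is selected as the root, while the one-node tree misclassifies two of the four instances. Replacing `toy_rows` and `toy_labels` with the exercise's instances gives the corresponding values for the *tvshows* data.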