Tuesday, December 2, 2008

note discrepancy

I found a discrepancy between the typed-out notes and the slides, and could only speculate as to which one was correct. In the notes for "Prediction and Validation" (3rd page, last sentence above "Validation of models"), it says:
"As sample size increases and number of predictors INCREASE, these three types of R^2 grow closer..." (emphasis added).
The slides, however (slide 17, I believe), say that these three types "merge as sample size N increases and number of predictors (k) DECREASES".
It makes sense to me that k should decrease, because adding predictors takes away degrees of freedom and so decreases power. But I don't want to jump to conclusions... which is right? Thanks!!!

2 comments:

Kris said...

I believe the most important thing to consider is that as sample size increases, your R^2 values will grow closer together. In fact, if you increase both sample size and the number of predictors, you can still see a decrease in the differences among R^2, adj. R^2 and cross-validated R^2. I quickly computed this using R^2 = .62, N = 50 and k = 2, which yielded an adj. R^2 of .60 and a cross-validated R^2 of .59. When the number of predictors was increased to 4 and the sample size to 200, adj. R^2 = .61 and cross-validated R^2 = .60. If the sample size grows only a little while the number of predictors grows dramatically, the differences between these values may increase; generally, though, the larger your sample size, the closer your R^2 values will get, even as you add predictors.
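Numbers like Kris's can be checked with a minimal Python sketch. Adjusted R^2 uses the standard formula 1 - (1 - R^2)(N - 1)/(N - k - 1). The post doesn't say which cross-validated R^2 formula the class uses, so the sketch assumes Browne's (1975) estimator applied to adjusted R^2; the function names are hypothetical. Under that assumption it gives roughly .604 and .595 for N = 50, k = 2, and roughly .612 and .606 for N = 200, k = 4, within about .01 of Kris's figures.

```python
def adjusted_r2(r2, n, k):
    """Standard adjusted R^2: 1 - (1 - R^2)(N - 1)/(N - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def browne_cv_r2(r2, n, k):
    """Cross-validated R^2 via Browne's (1975) formula, fed with adjusted R^2.

    ASSUMPTION: the course may use a different cross-validation formula.
    """
    rho2 = adjusted_r2(r2, n, k)
    return ((n - k - 3) * rho2**2 + rho2) / ((n - 2 * k - 2) * rho2 + k)


# Kris's two settings, with R^2 = .62 in both.
for n, k in [(50, 2), (200, 4)]:
    adj = adjusted_r2(0.62, n, k)
    cv = browne_cv_r2(0.62, n, k)
    print(f"N={n}, k={k}: adj R^2 = {adj:.3f}, cross-validated R^2 = {cv:.3f}")
```

In both settings the ordering is cross-validated < adjusted < R^2, and both gaps shrink from the first setting to the second even though k doubles.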

Mari said...

Congrats on the eagle eye! Thanks for providing the opportunity to correct this for everyone.

The problematic sentences from the typed notes read as follows:

"As sample size increases and number of predictors increase, these three types of R2 grow closer together. Similarly, the gap between them grows as k increases and as N decreases."

Obviously we have a problem here because the two sentences contradict each other.

As N increases and k decreases, the three merge. Note that when R = 1, R = R squared = adjusted R square = cross-validated R square (all equal 1).
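A short derivation shows why the gap behaves this way, assuming the standard adjusted R^2 formula (the notes' own formula isn't quoted here). Subtracting adjusted R^2 from R^2:

```latex
\begin{aligned}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{N - 1}{N - k - 1} \\
R^2 - R^2_{\text{adj}} &= (1 - R^2)\left(\frac{N - 1}{N - k - 1} - 1\right) = (1 - R^2)\,\frac{k}{N - k - 1}
\end{aligned}
```

The gap grows with k, shrinks as N grows, and is exactly zero when R^2 = 1. This covers only adjusted R^2; cross-validated R^2 needs its own formula.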

Also, as Kris pointed out, except at very small sample sizes N has the larger effect: as N becomes large, R square approaches adjusted R square, which in turn approaches cross-validated R square.
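The claim that N usually dominates can be checked numerically. This minimal Python sketch assumes the standard adjusted R^2 formula and tabulates the gap between R^2 and adjusted R^2 for Kris's R^2 = .62. The N and k values and the helper names are hypothetical illustrations.

```python
def adjusted_r2(r2, n, k):
    # Standard adjusted R^2: 1 - (1 - R^2)(N - 1)/(N - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def shrinkage(r2, n, k):
    # Gap between R^2 and adjusted R^2, equal to (1 - R^2) * k / (N - k - 1)
    return r2 - adjusted_r2(r2, n, k)


R2 = 0.62  # Kris's example value
for n in (20, 50, 200, 1000):
    gaps = "  ".join(f"k={k}: {shrinkage(R2, n, k):.3f}" for k in (2, 4, 8))
    print(f"N={n:>4}  {gaps}")
```

At N = 20, going from k = 2 to k = 8 raises the gap from about .045 to about .276, while at N = 1000 it stays below .01 for every k shown. Kris's two settings give gaps of about .016 (N = 50, k = 2) and .008 (N = 200, k = 4), so the fourfold increase in N outweighs doubling k.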