Wednesday, December 2, 2009

Plots to test assumptions

I'm a little confused about which assumptions are being tested and with which plots. Can you correct me or give me more info on the following:
Assumptions 1 & 2: we don't look at any plots for these.
Assumption 3: we look at a partial regression plot...is it the same one as for homoscedasticity??
Assumption 4: I have no idea!
Assumption 5: Homoscedasticity...a residual plot...is this the same as a partial regression plot?
Assumption 6: Histogram and normal P-P plot?

Also, what is a bivariate scatterplot, and is it different from a partial regression plot? Do we ever use a bivariate scatterplot?

3 comments:

Grace Liu said...

Assumptions 1 and 2: correct, they are conceptual.

Assumption 3: you actually do not look at the same graph that you look at for homoscedasticity. If I am getting this right, I can see why you were thinking that, because Sung has the same bullet point on his handout: "Plot the residuals (i.e., errors) on the Y axis and the predicted values (Y hat) on the X axis." For the plot that you look at for assumption 3, you get a line indicating the relationship between each IV and the DV, so you will have a separate plot for each IV. If the points do not follow a straight line, let's say they follow a bell curve, the relationship between that IV and the DV cannot be explained by LINEAR regression, hence it violates assumption 3.

Assumption 4: it basically means that the data points of your DV should be independent of each other. Let's say your DV is marriage satisfaction, and your subjects are husbands and wives. The data that you get are not independent of each other, because husbands and wives share the same environment, and their shared environment probably affects marriage satisfaction. So the assumption is violated in that case. It is not violated when we're predicting grad school GPA from college GPA, just because all of you went to different colleges.
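To make the husband/wife example concrete, here is a minimal sketch with made-up numbers. The "shared environment" effect, the sample size, and the helper name pearson_r are all assumptions for illustration, not anything from the handout. It only shows that scores from people who share an environment tend to be correlated, while scores from unrelated people are not.

```python
import random
import statistics

def pearson_r(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(1)
n_couples = 300

# Made-up data: each couple gets one shared "environment" effect.
shared = [random.gauss(0, 1) for _ in range(n_couples)]
husbands = [5 + s + random.gauss(0, 1) for s in shared]
wives = [5 + s + random.gauss(0, 1) for s in shared]

# Unrelated people: no shared component at all.
strangers = [5 + random.gauss(0, 1.4) for _ in range(n_couples)]

r_couples = pearson_r(husbands, wives)        # should come out clearly positive
r_strangers = pearson_r(husbands, strangers)  # should come out near zero

print(f"husband-wife r = {r_couples:.2f}")
print(f"husband-stranger r = {r_strangers:.2f}")
```

If you treated the 600 spouses as 600 independent cases, that correlation between partners is exactly the dependence that assumption 4 rules out.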

Assumption 5: you actually don't look at a partial regression plot for this assumption. You look at a plot that shows the residuals scattered around the line y = 0. A residual is 0 when the predicted value is exactly the same as the actual score. So this time you really do look at the plot with "the residuals (i.e., errors) on the Y axis and the predicted values (Y hat) on the X axis." If the spread of the residuals around that line is about the same across all predicted values, the assumption holds.
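Here is a rough sketch of what goes into that residual plot, using made-up data and plain Python instead of SPSS. The function names (fit_line, residual_plot_points, spread_ratio) and the "compare the two halves" check are my own illustration, not a formal test: the idea is just that with homoscedastic data the residuals are about equally spread at low and high predicted values, and with a fan shape they are not.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_plot_points(x, y):
    """Predicted values (X axis) and residuals (Y axis) for the plot."""
    intercept, slope = fit_line(x, y)
    predicted = [intercept + slope * a for a in x]
    residuals = [b - p for b, p in zip(y, predicted)]
    return predicted, residuals

def spread_ratio(predicted, residuals):
    """SD of residuals at high predicted values divided by SD at low ones."""
    pairs = sorted(zip(predicted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)

random.seed(0)
x = [i / 10 for i in range(1, 201)]
# Made-up data: constant error spread vs. error spread that grows with x.
y_even = [2 + 0.5 * a + random.gauss(0, 1.0) for a in x]
y_fan = [2 + 0.5 * a + random.gauss(0, 0.1 * a) for a in x]

pred_even, res_even = residual_plot_points(x, y_even)
pred_fan, res_fan = residual_plot_points(x, y_fan)

ratio_even = spread_ratio(pred_even, res_even)  # roughly 1 if homoscedastic
ratio_fan = spread_ratio(pred_fan, res_fan)     # well above 1 for a fan shape
print(f"even spread ratio = {ratio_even:.2f}, fan spread ratio = {ratio_fan:.2f}")
```

On the actual plot you would eyeball this rather than compute a ratio: an even band around y = 0 is fine, a funnel or fan is a problem.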

Assumption 6: you're right! Just look at a normal probability plot (P-P or Q-Q) and a histogram of the residuals with a normal curve.
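If it helps to see what a Q-Q plot is made of, here is a small sketch. It pairs each sorted, standardized residual with the normal quantile for its rank, and uses the correlation of those pairs as a crude "how straight is the line" number. The names qq_points and straightness, the (i + 0.5) / n plotting positions, and the made-up samples are assumptions for illustration only.

```python
import random
import statistics

def qq_points(residuals):
    """Pair each sorted, standardized residual with a normal quantile."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    sd = statistics.pstdev(residuals)
    observed = sorted((r - mean) / sd for r in residuals)
    normal = statistics.NormalDist()
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed

def straightness(xs, ys):
    """Pearson correlation of the Q-Q points; 1.0 means a perfectly straight line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)
# Made-up "residuals": one normal sample, one strongly skewed sample.
normal_res = [random.gauss(0, 1) for _ in range(200)]
skewed_res = [random.expovariate(1.0) for _ in range(200)]

r_normal = straightness(*qq_points(normal_res))
r_skewed = straightness(*qq_points(skewed_res))
print(f"normal residuals: {r_normal:.3f}, skewed residuals: {r_skewed:.3f}")
```

Points hugging the diagonal (a value very close to 1 here) go with normal residuals; a curved Q-Q plot pulls the value down.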

A bivariate scatterplot does not take other independent variables into account, while a partial regression plot does, i.e., holding the other IVs constant, it shows how that one particular IV predicts the DV in multiple regression. So a bivariate scatterplot is different from a partial regression plot in multiple regression (2 or more IVs) but not in simple regression (1 IV).
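A sketch of the difference, with made-up data and two IVs (x1 and x2 are hypothetical names, and the true coefficients 2 and 3 are just chosen for the example). A partial regression plot for x1 puts "the part of x1 that x2 does not explain" on the X axis and "the part of the DV that x2 does not explain" on the Y axis. The slope through those points equals x1's coefficient in the multiple regression, while the bivariate slope ignores x2.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """The part of y that a straight line in x does not explain."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def coef_x1_multiple(x1, x2, y):
    """Coefficient of x1 when y is regressed on x1 and x2 together."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)

random.seed(3)
n = 200
# Made-up data: the two IVs are correlated and both predict the DV.
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * b + random.gauss(0, 1) for b in x2]
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Points of the partial regression plot for x1 (x2 held constant).
e_x1 = residuals(x2, x1)  # X axis
e_y = residuals(x2, y)    # Y axis

_, partial_slope = fit_line(e_x1, e_y)  # should land near the true 2
_, bivariate_slope = fit_line(x1, y)    # also picks up part of x2's effect
multiple_b1 = coef_x1_multiple(x1, x2, y)
print(f"partial {partial_slope:.2f}, multiple {multiple_b1:.2f}, bivariate {bivariate_slope:.2f}")
```

With only one IV there is nothing to hold constant, so the partial regression plot and the bivariate scatterplot show the same relationship.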

Hope this helps!

Nikki Frederick said...

Yes, that helps! My only remaining question is: when do you use a bivariate scatterplot and/or when do you use a partial regression plot? Assumption 3?

Grace Liu said...

Bingo! You use the partial regression plot to find out whether your data violate assumption #3 or not.

You get one partial regression plot for each IV that you have in your model. It shows the relationship between that particular IV and the DV, and whether the relationship goes in one direction or in two. So, let's say that you get a bell-shaped pattern in the residuals (the points, not the line/slope!) on the partial regression plot: it is obvious that the relationship between that one IV and the DV goes in two different directions, hence it violates the assumption of correct form of relationship.
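A quick sketch of what "goes in two directions" does to a straight-line fit, using made-up data with a single IV for brevity (so the partial regression plot is just the ordinary scatterplot). The inverted-U shape, the noise level, and the split into thirds are assumptions chosen for illustration. The fitted line comes out nearly flat, and the points sit above it in the middle and below it at both ends.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(4)
# Made-up data: the DV rises with the IV up to x = 5, then falls again.
x = [i / 10 for i in range(0, 101)]
y = [25 - (a - 5) ** 2 + random.gauss(0, 1) for a in x]

intercept, slope = fit_line(x, y)
resid = [b - (intercept + slope * a) for a, b in zip(x, y)]

# x is already sorted, so slicing splits the points into low/middle/high IV.
third = len(x) // 3
middle_mean = statistics.fmean(resid[third:-third])
outer_mean = statistics.fmean(resid[:third] + resid[-third:])

print(f"slope = {slope:.2f}")
print(f"mean residual in the middle = {middle_mean:.2f}, at the ends = {outer_mean:.2f}")
```

A straight line would leave residuals near zero everywhere; a systematic hump like this is the pattern that signals the wrong form of relationship.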