Sunday, December 7, 2008

Testing Assumption of Homoskedasticity

After creating my scatterplot, I would like to test the ratio of highest to lowest variance. I am not sure how to split the file: the options in the drop-down menu do not seem to include a way to separate the predictor (or is it the predicted values I should use?) into meaningful levels. Is there another step I am missing? I messed up my data file once already trying to accomplish this and had to redo everything all over again :-(

Also, is there a generally accepted cut-off point for missing data (e.g., missing data representing < 5% of the total sample) that is applied when deciding to omit participants? Predicting values has already proven to be difficult due to more missing data in the alternative variables used.

4 comments:

Mari said...

You could create a new variable by recoding your existing variable into categories (use recode --> into different variable, and then define the ranges you'd like to separate). You'll see options for LOWEST THROUGH ____, ____ to ____, and ____ THROUGH HIGHEST, or some similar wording.

Using some combination of these (say lowest through 1.5, 1.51 through 2.5, 2.51 through 3.5, and 3.51 through highest, just to draw numbers completely at random), you could create a variable that captures the distribution along the X axis of your scatterplot well, i.e., relatively equal groups, and enough groups that the spread of the data is well represented. You might therefore need several ranges in the middle of the distribution, like 2.51 to 2.75 and 2.76 to 3.50. (Note that values that fall between named values will generate missing data. For instance, in the example above, a value of 1.50001 would generate a missing value for the new variable because it is neither 1.50 nor 1.51, but between those two values. If you choose to go this route, choose your values VERY carefully.)
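For anyone who wants to check their cut points outside SPSS, here is a minimal Python sketch of the same grouping idea. It is not SPSS syntax, and the function name `assign_group` and the cut points (borrowed from the made-up numbers above) are purely illustrative. It assumes you are happy to let each group run right up to the next cut point, so that a value like 1.50001 lands in the second group instead of going missing.

```python
from bisect import bisect_left

# Hypothetical cut points, copied from the illustrative ranges in the comment above.
CUTS = [1.5, 2.5, 3.5]

def assign_group(x, cuts=CUTS):
    """Return a group number from 1 to len(cuts) + 1.

    Group 1 is "lowest through cuts[0]"; a value exactly equal to a cut
    point stays in the lower group, and anything just above it moves up,
    so no value can fall into a gap between ranges.
    """
    return bisect_left(cuts, x) + 1

if __name__ == "__main__":
    for value in (0.7, 1.5, 1.50001, 2.76, 3.51):
        print(value, "-> group", assign_group(value))
```

Because the groups share boundaries instead of listing separate start and end values, there is nothing "in between" for a score to fall into.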

Having created this new variable, you'd use the split file command to split the file on the new variable, and then use frequencies to ask for the variance of the Y axis variable...

...OR you could just look at the scatterplot and determine if you think you have an opening fan pattern or some other clear deviation.
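To make the arithmetic behind this step concrete, here is a rough Python sketch (again, not SPSS syntax) of what "split on the grouping variable, get the variance of Y in each group, and compare highest to lowest" amounts to. The function names and the tiny data set are invented for illustration, and the sketch assumes the sample variance (n - 1 denominator) is the one you want.

```python
from collections import defaultdict
from statistics import variance

def variance_by_group(groups, y):
    """Sample variance of y within each group (groups with < 2 cases are skipped)."""
    by_group = defaultdict(list)
    for g, value in zip(groups, y):
        by_group[g].append(value)
    return {g: variance(vals) for g, vals in by_group.items() if len(vals) >= 2}

def variance_ratio(groups, y):
    """Ratio of the largest to the smallest within-group variance.

    Assumes at least one group has two or more cases and that no group
    has zero variance (otherwise the division fails).
    """
    variances = variance_by_group(groups, y)
    return max(variances.values()) / min(variances.values())

if __name__ == "__main__":
    # Made-up example: two groups of three cases each.
    groups = [1, 1, 1, 2, 2, 2]
    y = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
    print(variance_by_group(groups, y))
    print(variance_ratio(groups, y))
```

In this toy example the second group is four times as variable as the first. With real data, very small groups give unstable variances, which is one more reason to aim for relatively equal group sizes.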

(I do remember that your version of SPSS is giving you problems with the scatterplot, but in this case, it might be worth using the computer lab for this one test of assumptions.)

In any case, you will want to attach your printouts and describe what you did in an accompanying paragraph. I'm mostly interested here in what you recognize as important to examine, and what steps you identify as important in making such examinations. If you have a marginal distribution, and describe it well--as well as your decisions about whether or not to transform your data--that will be absolutely fine...whether I would have transformed those variables or not if it were my study.

Mari said...

BTW, if it were me, I would go with the scatterplot. In regression, where creating the correct grouping in a new variable is at least as much of a judgment call as inspecting the scatterplot, most people use the 10:1 ratio as a visual guide rather than a tested value.

Bettyvs said...

Thanks for the clarifications. By the way, I solved the problems with generating a scatterplot so I am good to go!

Kris said...

Testing, 1, 2, 3