Wednesday, December 10, 2008

Missing Values

Mari said earlier that "Absolutely you should drop any participant who is missing both predictors and outcome."


I have 2 predictors and 1 outcome.
If a case is missing values on the outcome and one predictor, do I delete the case even though the other predictor has a value listed?

2 comments:

Mari said...

You don't have to delete the case yourself. By default, SPSS regression uses listwise deletion: it drops any observation that has missing data on any X or Y variable in the model.
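To make that default concrete, here is a minimal Python sketch of listwise deletion. It is not SPSS syntax, and the data, the variable names (x1, x2, y), and the helper is_complete are all made up for illustration. Case 3 mirrors the situation in the question: the outcome and one predictor are missing, so the case drops out even though the other predictor has a value.

```python
# Hypothetical cases: two predictors (x1, x2) and one outcome (y).
# None marks a missing value.
cases = [
    {"id": 1, "x1": 1.0, "x2": 2.0, "y": 3.0},
    {"id": 2, "x1": 2.0, "x2": None, "y": 4.0},
    {"id": 3, "x1": None, "x2": 1.0, "y": None},  # outcome and one predictor missing
    {"id": 4, "x1": 4.0, "x2": 3.0, "y": 6.0},
    {"id": 5, "x1": 5.0, "x2": 4.0, "y": 8.0},
]
model_vars = ["x1", "x2", "y"]


def is_complete(case, variables):
    """True only if the case has a value on every variable in the model."""
    return all(case[v] is not None for v in variables)


# Listwise deletion: keep a case only if it is complete on all model variables.
complete_cases = [c for c in cases if is_complete(c, model_vars)]
dropped_ids = [c["id"] for c in cases if not is_complete(c, model_vars)]

print(len(complete_cases), "complete cases; dropped ids:", dropped_ids)
```

The regression would then be run on complete_cases only, which is why there is no need to delete anything by hand.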

The bigger question is whether you should replace the missing data.

It is best to replace missing data when only a small percentage of the data are missing, and small is defined in both of the following ways.

First, only a few participants are missing the data in question. If it is a question that many people refused to answer, you should worry about the representativeness of the sample who did answer (why did they agree to answer when many or most did not?) and/or about the question itself (was it too confusing, intrusive, offensive, or personal for people to feel comfortable answering it?).

Second, I'd define small in terms of the proportion of variables in your model that need replacement. If you have three variables, and all three need some missing data replacement, that's a bit concerning. (Don't worry if you already replaced missing data on all variables; you won't be marked down for it, but read on for the concern.) Every time you replace a missing data point, you should become a bit less certain about the validity of your data. That is, the more you have to estimate values that weren't really there, the more concerned you need to be about how well your conclusions reflect the data you actually have.
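If you want to check both senses of "small" before deciding, a rough Python sketch follows. The data and the function name missing_summary are hypothetical, and I am deliberately not building in a cutoff, because how small is small enough is a judgment call rather than a fixed number.

```python
# Hypothetical data set: each variable maps to its list of values,
# with None marking a missing value.
data = {
    "x1": [1, 2, None, 4, 5, 6, 7, 8, 9, 10],
    "x2": [2, 1, 3, 4, 3, 5, 6, 5, 7, 8],
    "y": [3, None, 4, 6, None, 9, 10, 11, 13, 15],
}


def missing_summary(columns):
    """Return (share of cases missing per variable,
    share of variables that have any missing values)."""
    per_var = {
        name: sum(value is None for value in values) / len(values)
        for name, values in columns.items()
    }
    vars_affected = sum(1 for share in per_var.values() if share > 0)
    return per_var, vars_affected / len(columns)


per_var, share_of_vars = missing_summary(data)

for name, share in per_var.items():
    print(f"{name}: {share:.0%} of cases missing")
print(f"{share_of_vars:.0%} of model variables need some replacement")
```

The first set of numbers speaks to the first sense of small (how many participants are missing each variable), and the last line speaks to the second (how many of the model's variables would need replacement at all).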

There is one other thing to note. Make sure that the data you are replacing aren't meaningfully missing. For example, there were relationship variables in the data set (e.g., the degree to which the participant trusts the romantic partner) that could not meaningfully be completed by someone who had not been in a romantic relationship. It would not be appropriate to estimate those data points.

Mari said...

So, my final answer would be that I would not replace data for the participant missing 2 out of 3 variables in your equation.

(But don't worry if you did, as long as you reported what you did in the write up.)