Thursday, December 4, 2008

Final Project: Missing Values

My outcome variable is missing 8 values. I ran a multiple regression with my outcome and two other variables that are significantly related to my outcome but aren’t my variables of interest. Yet my outcome variable is still missing 8 values, because those 8 cases were also missing values for the two significantly related variables. Next, I used the regression equation (Transform --> Compute Variable) and formed a “predictedoutcome” variable as follows:

Predictedoutcome = B0 + (B1 * variable1) + (B2 * variable2)

My new predictedoutcome variable is still missing 8 values.

Where shall I go from here? Am I supposed to continue to look for additional variables that are significantly related to my outcome variable? Should I drop a case (subject) if there are missing values on all three of my variables of interest (i.e., my two predictors and outcome)?



Also, can you please explain specifically how to merge two variables in SPSS? I tried merging two variables in “Compute Variable” under the “Function group” but received an error message.

Thank you.

3 comments:

Mari said...

If your participants are missing values not only for the variable you want to use in your regression, but also for the other variables that you tried to use to predict it, SPSS cannot estimate what the missing values should be. That is, it cannot predict missing data from more missing data.
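As a rough illustration of why the predicted variable stays missing, here is a minimal sketch in Python (not SPSS syntax). The coefficients and predictor values are made up for the example; they stand in for the B values and variables in the regression equation from the question.

```python
import math

# Hypothetical coefficients, stand-ins for the B values of a fitted regression.
B0, B1, B2 = 1.0, 0.5, 2.0

NAN = float("nan")  # used here to represent a missing value


def predict(x1, x2):
    """Apply the regression equation; a missing predictor makes the result missing."""
    return B0 + B1 * x1 + B2 * x2


complete = predict(4.0, 3.0)    # both predictors observed
incomplete = predict(NAN, 3.0)  # one predictor missing

print(complete)                 # a usable predicted value
print(math.isnan(incomplete))   # True: the prediction is missing too
```

Any arithmetic that involves a missing value produces a missing result, which is the same behavior the question describes for the 8 cases.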

In this case, it would be in your best interest to determine whether or not you really think these 8 participants are worth keeping, if they have this much missing data. That is, if they didn't answer the questions you are interested in and they also didn't answer several other questions in the data set that you were trying to use to predict the missing values, why are those data points missing? Are they all about relationships, for instance, in which case the possibility exists that they haven't been in a relationship? Or are they all questions about their father, in which case they might not answer because they don't have a relationship with their father? In either of these cases, replacing their missing data would not be a good idea at all.

Absolutely you should drop any participant who is missing both predictors and outcome.
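To make that rule concrete, here is a small sketch in Python (not SPSS syntax) that drops only the cases missing the outcome and both predictors. The case data and the field names outcome, pred1, and pred2 are hypothetical.

```python
# Hypothetical cases; None marks a missing value.
cases = [
    {"id": 1, "outcome": 3.0, "pred1": 1.0, "pred2": 2.0},
    {"id": 2, "outcome": None, "pred1": None, "pred2": None},
    {"id": 3, "outcome": None, "pred1": 4.0, "pred2": 5.0},
]

KEYS = ("outcome", "pred1", "pred2")


def missing_everything(case):
    """True only when the outcome and both predictors are all missing."""
    return all(case[key] is None for key in KEYS)


# Keep every case that has at least one of the three values.
kept = [case for case in cases if not missing_everything(case)]

print([case["id"] for case in kept])
```

Case 3 is kept in this sketch because it is missing only the outcome, so its predictors could still be used to estimate it.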

Without seeing your SPSS output, I'm not sure where your error on creating a new variable is. The merge command in SPSS is used to merge two or more separate files.

To take two variables and make them into one, you'll need two transform steps. (Note that I do not have access to SPSS from this computer, so cannot give you syntax as exact as in class handouts...Check those for the correct syntax of the transform command.)

"Transform --> compute --> newvariablename = oldvariablename" is your first step.

Then "transform --> compute --> newvariablename = predictedvariablename IF missing oldvariablename" is your second step. The if command is, I believe, in the lower left corner of the popup window. Note that newvariablename is the same name used in step 1.

After your second step, you will, as you may or may not remember from class, get a message that says something like, "Change existing variable?" to which you say OK or Yes or whatever affirmative answer SPSS gives you the option for. As long as you have correctly told the computer that you only want to replace values for which the original data was missing, it will not overwrite the values copied over in step 1.
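The logic of the two steps can be sketched in Python (this is not SPSS syntax, and the values are invented for illustration). The lists original and predicted stand in for oldvariablename and predictedvariablename.

```python
# Hypothetical data; None marks a missing value.
original = [5.0, None, 7.0, None]
predicted = [5.2, 6.1, 6.8, None]

# Step 1: the new variable starts as a copy of the old one.
filled = list(original)

# Step 2: replace a value only where the original was missing.
for i, value in enumerate(original):
    if value is None:
        filled[i] = predicted[i]

print(filled)
```

Observed values are left alone, and a case whose predicted value is also missing (the last one here) stays missing, as with the 8 cases in the question.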

timothykw said...

Perfect, thanks for answering both my questions!

Mari said...

Glad it helped...