SPSS Basics - Part 2

 

·       Running Statistical Procedures

·       Managing and Exporting Output

·       SPSS Command Syntax

 

 

Tuesday, April 22, 2009

Thursday, April 24, 2009

 

329 Carman Hall

 

 

This draft document last updated April 17, 2009

 

 

This document and related materials are available at http://www.lehman.edu/faculty/john/spss/

 


Presenter:        John Dono

                        ITR

                        john@lehman.cuny.edu

                        (718) 960-8338

 

 

For the SPSS workshop series and format, see http://www.lehman.edu/docs/workshops/workshops.html

 

1.     Overview of SPSS

 

1.1.   Comprehensive collection of tools for data analysis, reporting, and data management
 

1.2.   Available versions and compatibility issues

1.3.   Licensing at Lehman

1.4.   Locations where SPSS is installed


 

2.     Opening an existing SPSS dataset (and review of Part 1)

2.1.   Sample datasets and related files may be found in the Samples folder on the desktop.

 

General Social Survey 2008 subset*               minigss8.sav
General Social Survey 2008 complete              gss2008.sav
    Complete codebook:                           GSSCodeBook.pdf
    Frequencies for subset:                      gssfreqs.pdf
                                                 gssfreqs.htm
                                                 gssfreqs.doc
                                                 gssfreqs.spv
    Variable list:                               see "Selected variables from 2008 General Social Survey" below

Height-Weight**                                  htwt.sav
    Codebook:                                    see "Codebook for HTWT dataset" below
    Frequencies:                                 htwt.spv
                                                 htwt.doc

Health***                                        health.sav
    Codebook:                                    see "Codebook for HEALTH dataset" below
    Frequencies:                                 health.spv
                                                 health.doc

 

 

*If you plan to use the GSS for serious work outside of this workshop, please visit the NORC website at http://www.norc.org and refer to the codebook, GSSCodeBook.pdf, for usage guidelines, sampling techniques, question wording, coding schemes, etc.

 

**Hypothetical data from Cody, Ronald P. and Smith, Jeffrey K. Applied Statistics and the SAS Programming Language. (p.15)

 

***Hypothetical data from Kleinbaum, David G. and Kupper, Lawrence L. Applied Regression Analysis and Other Multivariable Methods. (p. 60)

 

Other sources of high-quality data in SPSS format include ICPSR at the University of Michigan, of which CUNY is a member. Visit http://www.icpsr.umich.edu or contact William Bosworth (william.bosworth@lehman.cuny.edu, ext. 8465) for further information.

 

You can also find sample files, most of which are hypothetical and intended for instructional purposes, in the samples folder in the SPSS installation directory.  Descriptions may be obtained by searching for the phrase “sample files” in SPSS Help.


2.2.   Starting SPSS

2.3.   The Data Editor window

2.4.   Opening an existing SPSS-format data file (known as a “system file” in the old days) – htwt.sav
File > Open > Data

2.5.   .sav file extension for SPSS-format data files

2.6.   Structure of an SPSS data file – “spreadsheet-like” rectangular array or matrix with

Rows as cases (units of analysis or observations, e.g. a respondent to a survey, a company, a participant in an experiment)

Columns as variables (measurements, responses, treatments on the units)

Values for particular cases on particular variables in cells at row-column intersection

 

2.7.   Compare Data View and Variable View
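
The case-by-variable layout in 2.6 and the two views in 2.7 can be pictured with a small sketch. It is written in Python, not SPSS, and is only an illustration: the variable names come from the HTWT codebook in this handout, but the data values and the "type" attributes are made up and are not the contents of htwt.sav.

```python
# Hypothetical illustration values; these are NOT the contents of htwt.sav.
# "Data View": one row per case, one column per variable.
data_view = [
    {"ID": 1, "GENDER": "M", "HEIGHT": 68, "Weight": 155},
    {"ID": 2, "GENDER": "F", "HEIGHT": 61, "Weight": 99},
    {"ID": 3, "GENDER": "F", "HEIGHT": 63, "Weight": 115},
]

# "Variable View": one row per variable, describing its attributes.
# The "type" entries are assumptions made for this sketch.
variable_view = {
    "ID": {"type": "numeric", "label": "Identification Number"},
    "GENDER": {"type": "string", "label": "Gender"},
    "HEIGHT": {"type": "numeric", "label": "Height in inches"},
    "Weight": {"type": "numeric", "label": "Weight in lbs"},
}


def cell(rows, case_index, variable):
    """Return the value where one case (row) meets one variable (column)."""
    return rows[case_index][variable]


# The array is rectangular: every case has a slot for every variable.
is_rectangular = all(set(row) == set(variable_view) for row in data_view)

print(is_rectangular)                  # True
print(cell(data_view, 1, "HEIGHT"))    # 61
```

Data View corresponds to data_view (the values themselves); Variable View corresponds to variable_view (information about each column).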

 

 

3.     Frequencies Procedures

 

3.1.   Select statistical procedures appropriate to the type of variables you are working with, and verify that your data meet the assumptions of the procedures (e.g. normal distributions, equality of variance). Refer to reputable statistical texts and consultants if necessary.

 

3.2.   Use Frequencies to describe the distribution of discrete variables (limited number of values or categories, nominal or ordinal “level of measurement”)

 

3.3.   In Part 1 we used Frequencies for data validation purposes – identifying outliers and illegal codes.

 

3.4.   Setting some global options to make output more informative

 

3.4.1.      Select Edit > Options

3.4.2.      On the General sheet, select Display Names and Alphabetical under Variable Lists and Open only one data set at a time under Windows. Click on Apply.

 

3.4.3.      On the Output Labels sheet, select Names and Labels under Variables in item labels and select Values and Labels under Variable values in labels. Click on Apply.

 

3.4.4.      On the File Locations sheet, change Specified Folders for data and other files to point to the samples folder on your desktop.  Click OK.

 

3.5.   Run Frequencies on variables in height-weight dataset (htwt.sav)

 

3.5.1.      Select Analyze > Descriptive Statistics > Frequencies

 

3.6.   The Frequencies dialog box

 

3.6.1.      Selecting variables

3.6.2.      Selecting appropriate statistics

3.7.   SPSS output window and the SPSS viewer

3.8.   Reviewing results and navigating the SPSS viewer

3.9.   Retention of dialog box settings

3.10.  Saving output in native SPSS output format

3.11.  .spv file extension for output

3.12.  Closing output window

3.13.  Optional exercise: run Frequencies on some suitable variables from minigss8.sav (see the categorized list of selected GSS variables at the end of this document) and save the output
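
As a rough picture of what a frequency table contains, the sketch below counts each distinct value of a discrete variable and expresses the count as a percentage of all cases. It is plain Python rather than SPSS, the responses are made up, and the function name frequency_table is hypothetical; SPSS produces a fuller table (valid and cumulative percentages, handling of missing values).

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, n, 100.0 * n / total) for value, n in sorted(counts.items())]


# Made-up responses coded 1-3, standing in for a discrete variable.
responses = [1, 2, 2, 3, 2, 1, 3, 2]

for value, count, percent in frequency_table(responses):
    print(f"{value:>5} {count:>5} {percent:6.1f}")
```

Used for data validation as in 3.3, an illegal code (say a 7 where only 1 to 3 are valid) would show up as its own row with a small count.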


4.     Descriptives procedure

4.1.   Use Descriptives to describe the distribution of continuous variables (many ordered categories, interval or ratio level of measurement)

4.2.   Open health.sav and click on Variable View to display variable information

4.3.   Run Descriptives on variables in dataset

4.4.   The Descriptives dialog boxes

4.4.1.      Variable selection

4.4.2.      The Options subdialog box to select statistics

4.5.   Optional exercise: run Descriptives on age, educ and rincom06 from minigss8.sav but check frequencies on rincom06 first!

4.6.   Save and close Output window
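
The summary statistics that Descriptives typically reports for a continuous variable (N, mean, standard deviation, minimum, maximum) can be sketched as follows. This is Python, not SPSS, using made-up ages; the function name describe is hypothetical, and the standard deviation shown is the sample version with an n - 1 denominator.

```python
import statistics


def describe(values):
    """Return N, mean, sample standard deviation (n - 1), minimum and maximum."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


ages = [41, 45, 49, 53, 62]  # made-up values, not taken from health.sav
summary = describe(ages)

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

These statistics only make sense when the codes are real quantities, which is why 4.5 suggests checking the frequencies of rincom06 before running Descriptives on it.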

 

5.     Crosstabs procedure

5.1.   Use Crosstabs to examine associations among categorical variables (variables with a limited number of possible values, ordered or not)

5.2.   Open minigss8.sav (then close previously used datasets if still open) and click on Variable View to display variable information

5.3.   Run Crosstabs to examine the association between happiness and highest degree earned

                  (happy * degree or happy by degree)

5.4.   The Crosstabs dialog box

5.4.1.      Row/column variable selection procedures

5.4.2.      Cells subdialog box to specify contents of cells

5.4.2.1.            Decision regarding direction of percentaging

5.4.3.      Statistics subdialog box

5.5.   Run Crosstabs to produce the following tables

happy * agegroup

happy * sex

happy * marital

happy * health

happy * class

5.6.   Introducing additional variables into the analysis to explain or specify the bivariate relationship in a two-way table

5.7.   Run Crosstabs to produce the following table:

                              happy * marital * sex

5.8.   Optional exercise: pres04 by degree by sex
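
The "direction of percentaging" decision in 5.4.2.1 can be illustrated with a short sketch. A common convention is to percentage within the categories of the presumed independent variable (here degree, placed in the columns) and then compare percentages across those categories. The code is Python, not SPSS; the (happy, degree) pairs are made up, and the function names are hypothetical.

```python
from collections import Counter


def crosstab(pairs):
    """Count the cases in each (row_value, column_value) cell."""
    return Counter(pairs)


def column_percents(table):
    """Percentage each cell on its column total, so every column sums to 100."""
    column_totals = Counter()
    for (_row, col), n in table.items():
        column_totals[col] += n
    return {
        (row, col): 100.0 * n / column_totals[col]
        for (row, col), n in table.items()
    }


# Made-up (happy, degree) pairs; NOT taken from minigss8.sav.
cases = [
    ("happy", "college"), ("happy", "college"),
    ("happy", "college"), ("not happy", "college"),
    ("happy", "high school"), ("happy", "high school"),
    ("not happy", "high school"), ("not happy", "high school"),
]

table = crosstab(cases)
percents = column_percents(table)

for (row, col), pct in sorted(percents.items()):
    print(f"{row:<10} {col:<12} {pct:5.1f}%")
```

With these made-up cases, 75% of the college column is "happy" against 50% of the high school column; comparing those column percentages is how the table is read. Percentaging on row totals instead would answer a different question.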

 

 

6.     Correlation

6.1.   Use Correlate to examine linear association among continuous variables (ordinal with many categories, interval, ratio level of measurement)

6.2.   Correlation may be positive or negative

6.3.   Run Correlate to obtain the correlation of height and weight in the htwt.sav dataset

6.4.   Optional exercise: obtain scattergram to visualize the linear relationship

6.5.   Optional exercise: generate a correlation matrix from minigss8.sav on the following variables:

                  paeduc, maeduc, educ, rincom06

6.6.   Partial correlation procedures as analog of a three-way crosstabulation
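
For reference, the arithmetic behind 6.1, 6.2 and 6.6 can be sketched as below. The code is Python, not SPSS. It computes the Pearson correlation coefficient from its textbook definition and the first-order partial correlation from three pairwise correlations; the function names and data values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear effect of z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up, exactly linear values (NOT the contents of htwt.sav).
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 130, 140, 150]

print(round(pearson_r(heights, weights), 6))        # positive: 1.0
print(round(pearson_r(heights, weights[::-1]), 6))  # negative: -1.0
```

In partial_r, a control variable z plays the role that the third (layer) variable plays in a three-way crosstabulation: if the correlation between x and y shrinks toward zero once z is held constant, z accounts for much of the original association.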


 

7.     Some other procedures

7.1.   Comparison of means with t-tests and ANOVA

7.2.   Linear Regression



8.     Managing output

8.1.   Working in the SPSS Viewer

8.2.   Navigating in the Viewer

8.3.   Editing in the Viewer (Save first!)

8.4.   Using Save As to save a modified version of output in SPSS format

8.5.   Compatibility issues and obtaining the legacy viewer

8.6.   Export output into alternative formats for further editing, presentation, distribution, publication etc.

8.6.1.      Acrobat format (.pdf extension)

8.6.2.      Web page format (.htm)

8.6.3.      Microsoft Word format (.doc, .rtf)




9.     SPSS Command Syntax

9.1.   SPSS Viewer log

9.2.   Generating command syntax from dialog boxes using Paste

9.3.   The Syntax window

9.4.   Using the Syntax window to

9.4.1.      use options not available through dialog boxes

9.4.2.      save to rerun in current or later session

9.4.3.      edit then rerun

9.4.4.      document procedures

9.4.5.      simplify procedures when dialog boxes are too cumbersome

9.5.   Starting in the syntax window




 

 



10.    Learning more about SPSS

10.1.  http://www.spss.com

10.2.  Manuals in PDF format provided with license

10.3.  Help > Tutorial etc.

10.4.  Visit academic web sites, e.g.
 

           http://www.usc.edu/its/stats/spss/index.html


 

Codebook for HTWT dataset

 

description                     variable name

Identification Number           ID
Gender                          GENDER
Height in inches                HEIGHT
Weight in lbs                   Weight

 

 

 

 

 

Codebook for HEALTH dataset

 

description                     variable name    codes

Identification number           ID
Systolic blood pressure         SBP              9999 = missing
Quetelet index*                 QUET             9999 = missing
Age in years                    AGE              98 = 98 or more
                                                 9999 = missing
Smoking History                 SMK              0 = nonsmoker
                                                 1 = current or previous smoker
                                                 9 = missing

 

*Quetelet Index (a measure of size) = 100 * (weight/height**2)
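
The footnote formula, and the need to set aside the 9999 missing-value code before computing any statistic, can be sketched as follows. This is Python, not SPSS; the function names and data values are hypothetical, and the units of weight and height for the HEALTH dataset are not stated in this handout.

```python
import statistics

MISSING = 9999  # missing-value code listed in the HEALTH codebook


def quetelet(weight, height):
    """Quetelet index as defined in the footnote: 100 * (weight / height**2)."""
    return 100 * (weight / height ** 2)


def valid_values(values, missing_code=MISSING):
    """Drop cases carrying the missing-value code."""
    return [v for v in values if v != missing_code]


# Made-up systolic blood pressure readings, one of them coded as missing.
sbp = [120, 9999, 135]

print(statistics.mean(sbp))                # distorted by the 9999 code
print(statistics.mean(valid_values(sbp)))  # 127.5
print(round(quetelet(150, 60), 3))         # 4.167
```

In SPSS the same protection comes from declaring 9999 as a missing value for the variable (the Missing column in Variable View), so that procedures exclude it automatically.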


 

Selected variables from 2008 General Social Survey

 

 

Demographic variables

 

age

agegroup

sex

marital

 

 

Economic variables

 

class

rincom06


union

 

 

Happiness

 

hapmar

happy

 

 

Education

 

educ

degree

 

 

Family background

 

paeduc

padeg

maeduc

madeg

 

 

Political variables

 

vote04

pres04

partyid

polviews