SPSS Basics - Part 2
· Running Statistical Procedures
· Managing and Exporting Output
· SPSS Command Syntax
Tuesday, April 22, 2009
Thursday, April 24, 2009
329 Carman Hall
This draft document last updated April 17, 2009
This document and related materials are available at http://www.lehman.edu/faculty/john/spss/
Presenter: John Dono
ITR
(718) 960-8338
SPSS workshop series and format: see http://www.lehman.edu/docs/workshops/workshops.html
1. Overview of SPSS
1.1. Comprehensive collection of tools for data analysis, reporting, and data management
1.2. Available versions and compatibility issues
1.3. Licensing at Lehman
1.4. Locations where SPSS is installed
2. Opening an existing SPSS dataset (and review of Part 1)
2.1. Sample datasets and related files may be found in the Samples folder on the desktop.
General Social Survey 2008 subset*: minigss8.sav
General Social Survey 2008 complete: gss2008.sav
Complete codebook: GSSCodeBook.pdf
Frequencies for subset: gssfreqs.pdf, gssfreqs.htm, gssfreqs.doc, gssfreqs.spv
Variable list: see "Selected variables from 2008 General Social Survey" at the end of this document
Height-Weight**: htwt.sav
Codebook: see "Codebook for HTWT dataset" at the end of this document
Frequencies: htwt.spv, htwt.doc
Health***: health.sav
Codebook: see "Codebook for HEALTH dataset" at the end of this document
Frequencies: health.spv, health.doc
*If you plan to use the GSS for serious work outside of this workshop, please visit the NORC website at http://www.norc.org and refer to the codebook, GSSCodeBook.pdf, for usage guidelines, sampling techniques, question wording, coding schemes, etc.
**Hypothetical data from Cody, Ronald P. and Smith, Jeffrey K. Applied Statistics and the SAS Programming Language. (p.15)
***Hypothetical data from Kleinbaum, David G. and Kupper, Lawrence L. Applied Regression Analysis and Other Multivariable Methods. (p. 60)
Other sources of high-quality data in SPSS format include ICPSR at the University of Michigan, of which CUNY is a member. Visit http://www.icpsr.umich.edu or contact William Bosworth (william.bosworth@lehman.cuny.edu, ext. 8465) for further information.
You can also find sample files, most of which are hypothetical and intended for instructional purposes, in the samples folder in the SPSS installation directory. Descriptions may be obtained by searching for the phrase “sample files” in SPSS Help.
2.2. Starting SPSS
2.3. The Data Editor window
2.4. Opening an existing SPSS-format data file (known as a “system file” in the old days): htwt.sav
File > Open > Data
2.5. .sav file extension for SPSS-format data files
2.6. Structure of an SPSS data file: a “spreadsheet-like” rectangular array or matrix with
Rows as cases (units of analysis or observations, e.g. a respondent to a survey, a company, a participant in an experiment)
Columns as variables (measurements, responses, treatments on the units)
Values for particular cases on particular variables in the cells at each row-column intersection
2.7. Compare Data View and Variable View
3. Frequencies Procedures
3.1. Select statistical procedures appropriate to the type of variables you are working with and verify that your data meet the assumptions of the procedures (e.g. normal distributions, equality of variance). Refer to reputable statistical texts and consultants if necessary.
3.2. Use Frequencies to describe the distribution of discrete variables (limited number of values or categories, nominal or ordinal “level of measurement”)
3.3. In Part 1 we used Frequencies for data validation purposes – identifying outliers and illegal codes.
3.4. Setting some global options to make output more informative
3.4.1. Select Edit > Options
3.4.2. On the General sheet, select Display Names and Alphabetical under Variable Lists and Open only one data set at a time under Windows. Click on Apply.
3.4.3. On the Output Labels sheet, select Names and Labels under Variables in item labels and select Values and Labels under Variable values in labels. Click on Apply.
3.4.4. On the File Locations sheet, change Specified Folders for data and other files to point to the samples folder on your desktop. Click OK.
3.5. Run Frequencies on variables in height-weight dataset (htwt.sav)
3.5.1. Select Analyze > Descriptive Statistics > Frequencies
3.6. The Frequencies dialog box
3.6.1. Selecting variables
3.6.2. Selecting appropriate statistics
3.7. The SPSS output window and the SPSS Viewer
3.8. Reviewing results and navigating the SPSS Viewer
3.9. Retention of dialog box settings
3.10. Saving output in native SPSS output format
3.11. .spv file extension for output
3.12. Closing the output window
3.13. Optional exercise: run Frequencies on some suitable variables from minigss8.sav (see the categorized variable list at the end of this document) and save the output
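The arithmetic behind the Frequencies table can be sketched outside SPSS. The Python below is an illustration only, not SPSS code: the function name frequency_table and the sample codes are hypothetical, and the missing-value code 9 is borrowed from SMK in the HEALTH codebook. It assumes, as in SPSS output, that Percent is based on all cases while Valid Percent and Cumulative Percent exclude missing values.

```python
from collections import Counter


def frequency_table(values, missing=frozenset()):
    """Mimic the columns of an SPSS Frequencies table (illustration only).

    Returns one tuple per non-missing value, in sorted order:
    (value, frequency, percent, valid_percent, cumulative_percent).
    Percent uses all cases; Valid Percent and Cumulative Percent
    exclude the missing-value codes listed in `missing`.
    """
    total = len(values)
    valid = [v for v in values if v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        n = counts[value]
        percent = 100.0 * n / total
        valid_percent = 100.0 * n / len(valid)
        cumulative += valid_percent
        rows.append((value, n, percent, valid_percent, cumulative))
    return rows


# Hypothetical smoking-history codes: 0 = nonsmoker, 1 = smoker, 9 = missing
smk = [0, 1, 1, 0, 9, 1]
for row in frequency_table(smk, missing={9}):
    print(row)
```

Note how the one missing case lowers Percent but not Valid Percent, which is why the two columns differ whenever a variable has missing values.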
4. Descriptives procedure
4.1. Use Descriptives to describe the distribution of continuous variables (many ordered categories, interval or ratio level of measurement)
4.2. Open health.sav and click on Variable View to display variable information
4.3. Run Descriptives on variables in the dataset
4.4. The Descriptives dialog box
4.4.1. Variable selection
4.4.2. The Options subdialog box for selecting statistics
4.5. Optional exercise: run Descriptives on age, educ and rincom06 from minigss8.sav, but check the frequencies of rincom06 first!
4.6. Save and close the Output window
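A similar sketch for the default Descriptives statistics (N, Minimum, Maximum, Mean, Std. Deviation), again in Python purely for illustration. The function name descriptives and the blood pressure values are hypothetical; 9999 is the missing-value code from the HEALTH codebook. If such a code were not excluded, it would be averaged in like a real measurement, which is one reason to check frequencies before running Descriptives.

```python
import statistics


def descriptives(values, missing=frozenset()):
    """Default Descriptives statistics for one variable (illustration only)."""
    valid = [v for v in values if v not in missing]
    return {
        "N": len(valid),
        "Minimum": min(valid),
        "Maximum": max(valid),
        "Mean": statistics.mean(valid),
        # Sample standard deviation (divides by N - 1)
        "Std. Deviation": statistics.stdev(valid),
    }


# Hypothetical systolic blood pressure values; 9999 = missing
sbp = [120, 130, 140, 9999]
print(descriptives(sbp, missing={9999}))
# Without the missing code declared, 9999 would inflate the mean and maximum:
print(descriptives(sbp))
```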
5. Crosstabs procedure
5.1. Use Crosstabs to examine associations among categorical variables (variables with a limited number of possible values, ordered or not)
5.2. Open minigss8.sav (then close previously used datasets if they are still open) and click on Variable View to display variable information
5.3. Run Crosstabs to examine the association between happiness and highest degree earned
(happy * degree or happy by degree)
5.4. The Crosstabs dialog box
5.4.1. Row and column variable selection
5.4.2. The Cells subdialog box to specify the contents of cells
5.4.2.1. Deciding on the direction of percentaging
5.4.3. The Statistics subdialog box
5.5. Run Crosstabs to produce the following tables
happy * agegroup
happy * sex
happy * marital
happy * health
happy * class
5.6. Introducing additional variables into the analysis to explain or specify the bivariate relationship in a two-way table
5.7. Run Crosstabs to produce the following table:
happy * marital * sex
5.8. Optional exercise: pres04 by degree by sex
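The cell counts, column percentages and Pearson chi-square that Crosstabs can report are sketched below in Python, for illustration only. The function names and the short happy/degree lists are hypothetical. Percentages are computed within each column category, which assumes the column variable (here degree) is treated as the independent variable; percentaging in the other direction would divide by row totals instead.

```python
from collections import Counter


def crosstab_column_percents(row_var, col_var):
    """Cell counts plus percentages computed within each column category."""
    cells = Counter(zip(row_var, col_var))
    col_totals = Counter(col_var)
    percents = {
        (r, c): 100.0 * n / col_totals[c] for (r, c), n in cells.items()
    }
    return cells, percents


def pearson_chi_square(row_var, col_var):
    """Sum over all cells of (observed - expected)**2 / expected."""
    n = len(row_var)
    cells = Counter(zip(row_var, col_var))
    row_totals = Counter(row_var)
    col_totals = Counter(col_var)
    chi2 = 0.0
    for r in row_totals:
        for c in col_totals:
            # Expected count under independence: row total * column total / N
            expected = row_totals[r] * col_totals[c] / n
            # Counter returns 0 for empty cells, so they still contribute
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    return chi2


# Hypothetical responses for a happy * degree table
happy = ["very", "pretty", "very", "not", "pretty", "pretty"]
degree = ["hs", "hs", "ba", "hs", "ba", "ba"]
counts, pcts = crosstab_column_percents(happy, degree)
print(pcts[("pretty", "ba")])
print(pearson_chi_square(happy, degree))
```

With only six cases most expected counts fall below 5, so a real analysis would not rely on this chi-square; the data only show the mechanics.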
6. Correlation
6.1. Use Correlate to examine linear association among continuous variables (ordinal with many categories, interval, ratio level of measurement)
6.2. Correlation may be positive or negative
6.3. Run Correlate to obtain the correlation of height and weight in the htwt.sav dataset
6.4. Optional exercise: obtain scattergram to visualize the linear relationship
6.5. Optional exercise: generate a correlation matrix from minigss8.sav on the following variables:
paeduc, maeduc, educ, rincom06
6.6. Partial correlation procedures as an analog of a three-way crosstabulation
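The Pearson correlation and a first-order partial correlation (the correlation of two variables with a third held constant, the continuous-variable counterpart of adding a control variable to a crosstab) can be sketched as follows. This Python is illustrative only; the function names and the height, weight and age values are hypothetical, not taken from htwt.sav.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical heights (inches), weights (lbs) and ages (years)
height = [60, 62, 65, 68, 70, 72]
weight = [110, 125, 140, 150, 165, 180]
age = [20, 35, 30, 45, 40, 50]
print(round(pearson_r(height, weight), 3))
print(round(partial_r(height, weight, age), 3))
```

The sign of r gives the direction of the linear association (6.2); the partial correlation shows how much of it remains once the third variable is held constant.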
7. Some other procedures
7.1. Comparison of means with t-tests and ANOVA
7.2. Linear regression
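For orientation, the pooled-variance t statistic (the "equal variances assumed" case) and the least-squares line behind simple linear regression are sketched below in Python. This is an illustration under those assumptions, not a reproduction of SPSS output; the function names and sample values are hypothetical.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Independent-samples t statistic assuming equal variances (pooled)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se


def simple_regression(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical values
print(pooled_t([1, 2, 3], [4, 5, 6]))
print(simple_regression([1, 2, 3], [3, 5, 7]))
```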
8. Managing output
8.1. Working in the SPSS Viewer
8.2. Navigating in the Viewer
8.3. Editing in the Viewer (Save first!)
8.4. Using Save As to save a modified version of output in SPSS format
8.5. Compatibility issues and obtaining the legacy viewer
8.6. Export output into alternative formats for further editing, presentation, distribution, publication etc.
8.6.1. Acrobat format (.pdf extension)
8.6.2. Web page format (.htm)
8.6.3. Microsoft Word format (.doc, .rtf)
9. SPSS Command Syntax
9.1. SPSS Viewer log
9.2. Generating command syntax from dialog boxes using Paste
9.3. The Syntax window
9.4. Using the Syntax window to
9.4.1. use options not available through dialog boxes
9.4.2. save to rerun in current or later session
9.4.3. edit then rerun
9.4.4. document procedures
9.4.5. simplify procedures when dialog boxes are too cumbersome
9.5. Starting in the syntax window
10. Learning more about SPSS
10.1. http://www.spss.com
10.2. Manuals in PDF format provided with the license
10.3. Help > Tutorial, etc.
10.4. Visit academic websites, e.g. http://www.usc.edu/its/stats/spss/index.html
Codebook for HTWT dataset (variable name: description)
ID: Identification number
GENDER: Gender
HEIGHT: Height in inches
Weight: Weight in lbs
Codebook for HEALTH dataset (variable name: description)
ID: Identification number
SBP: Systolic blood pressure (9999 = missing)
QUET: Quetelet index* (9999 = missing)
AGE: Age in years (98 = 98 or more; 9999 = missing)
SMK: Smoking history (0 = nonsmoker; 1 = current or previous smoker; 9 = missing)
*Quetelet index (a measure of size) = 100 * (weight/height**2)
Selected variables from 2008 General Social Survey
Demographic variables
age
agegroup
sex
marital
Economic variables
class
rincom06
union
Happiness
hapmar
happy
Education
educ
degree
Family background
paeduc
padeg
maeduc
madeg
Political variables
vote04
pres04
partyid
polviews