SPSS Basics - Part 1
Creating, Importing, Reading and Validating SPSS Data Files
March 31, 2009
April 2, 2009
Draft: March 30, 2009
This document and related materials will be available at http://www.lehman.edu/faculty/john/spss/
1. About SPSS
a. Comprehensive collection of tools for data analysis, reporting, and data manipulation
b. SPSS workshop series and format
c. Available versions
d. Licensing at Lehman
e. Locations where SPSS is installed
2. An example of data analysis using SPSS and NORC’s 2008 General Social Survey (GSS)
a. For further information on the GSS, including a detailed codebook, visit http://www.norc.org
b. Starting SPSS for Windows
c. SPSS interface (menus, dialog boxes, and other standard Windows features)
d. HELP!
e. Opening the SPSS window for data entry and display: the Data Editor
f. Opening an existing SPSS-format data file (“system file”)
g. .sav file extension for SPSS data files
h. Data View/Variable View
i. Basic structure of an SPSS-format data file (units of analysis, cases, or observations as rows; columns as variables or measures; cells contain values)
j. Subset of variables from the 2008 General Social Survey
1) age
2) attend (religious services)
3) class
4) confinan (confidence in banks/financial institutions)
5) consci
6) degree
7) educ
8) fatalism
9) finrela
10) geomobil
11) getahead
12) happy
13) health
14) income
15) intecon (interest in economics)
16) life (exciting/dull)
17) marital
18) newsfrom
19) partyid
20) polviews
21) pres04
22) racecen1
23) region
24) relig
25) satfin
26) sex
27) trust
28) vote04
29) wrkstat
k. Running statistical procedures (Frequencies, Descriptives, Crosstabs, Means) to answer questions about the data
l. SPSS window for output: the Output Viewer window
m. Reviewing results
n. Saving output
o. .spv file extension for output
3. Creating your own SPSS data files in the Data Editor
a. The following hypothetical height/weight data will be used:
ID   GENDER   HEIGHT     WEIGHT
              (inches)   (pounds)
1    M        70         155
2    F        61
3    F        64         125
4    M                   175
5    M        72         180
6    M        69         170
7    F        65         115
8    M        77         200
9    F        68         140
10   M        70
Note that some measurements are missing.
b. Define variables in the Data Editor (Variable View)
Name              ID                      GENDER       HEIGHT             WEIGHT
Type              Numeric                 String       Numeric            Numeric
Width                                     1
Decimals                                               0                  0
Variable labels   Identification Number                Height in inches   Weight in lbs
Value labels                              M = Male
                                          F = Female
Missing values                                         99                 999
c. Enter the following data in the Data Editor (Data View)
ID   GENDER   HEIGHT     WEIGHT
1    M        70         155
2    F        61         999
3    f        64         125
4    M        99         175
5             72         180
6    m        69         170
7    F        65         115
8    M        775        200
9    F        68         140
10   M        70
d. Structure of an SPSS data file: a rectangular array or matrix with
rows as cases (units of analysis, observations),
columns as variables, and
values for particular cases on particular variables in the cells at each row-column intersection
e. System missing versus user-defined missing values
f. Saving your data file (.sav file extension)
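The distinction in items d and e can be mimicked outside SPSS. The Python sketch below is only an analogy, not how SPSS stores data: each case is a row (a dict), each variable a column, None stands in for a blank numeric cell (system-missing, which SPSS displays as a period in Data View), and the codes 99 and 999 from the Variable View definitions in step b are treated as user-defined missing. The names `cases`, `USER_MISSING`, `classify` and `valid_values` are hypothetical.

```python
# Analogy only: rows are cases, columns are variables (data as entered in step 3c).
# None marks a blank numeric cell (system-missing); "" marks a blank string cell.
cases = [
    {"ID": 1, "GENDER": "M", "HEIGHT": 70, "WEIGHT": 155},
    {"ID": 2, "GENDER": "F", "HEIGHT": 61, "WEIGHT": 999},
    {"ID": 3, "GENDER": "f", "HEIGHT": 64, "WEIGHT": 125},
    {"ID": 4, "GENDER": "M", "HEIGHT": 99, "WEIGHT": 175},
    {"ID": 5, "GENDER": "", "HEIGHT": 72, "WEIGHT": 180},
    {"ID": 6, "GENDER": "m", "HEIGHT": 69, "WEIGHT": 170},
    {"ID": 7, "GENDER": "F", "HEIGHT": 65, "WEIGHT": 115},
    {"ID": 8, "GENDER": "M", "HEIGHT": 775, "WEIGHT": 200},
    {"ID": 9, "GENDER": "F", "HEIGHT": 68, "WEIGHT": 140},
    {"ID": 10, "GENDER": "M", "HEIGHT": 70, "WEIGHT": None},
]

# User-defined missing codes taken from the Variable View definitions (step 3b).
USER_MISSING = {"HEIGHT": {99}, "WEIGHT": {999}}


def classify(variable, value):
    """Label one cell as 'system-missing', 'user-missing' or 'valid'."""
    if value is None:
        return "system-missing"
    if value in USER_MISSING.get(variable, set()):
        return "user-missing"
    return "valid"


def valid_values(variable):
    """Values of one variable with both kinds of missing value left out."""
    return [c[variable] for c in cases if classify(variable, c[variable]) == "valid"]


weights = valid_values("WEIGHT")
print(len(weights), "valid weights, mean =", sum(weights) / len(weights))
# Prints: 8 valid weights, mean = 157.5
```

If 999 were not declared as missing, it would be averaged in as a real weight: (1260 + 999) / 9 = 251 pounds, an obviously distorted result.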
4. Data validation
a. Running procedures to validate coding and data entry
b. Frequencies
c. Crosstabulation for contingent questions (not applicable here)
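Frequencies work as a validation tool because every distinct value, including typos, gets its own row. A rough Python analogue, applied to the GENDER and HEIGHT values as entered in step 3c, is sketched below. The allowed gender codes come from the value labels in step 3b; the 48-84 inch plausibility range is an assumption chosen only for illustration, and the function names are hypothetical.

```python
from collections import Counter

# GENDER and HEIGHT as entered in step 3c ("" is the blank GENDER cell of case 5).
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
gender = ["M", "F", "f", "M", "", "m", "F", "M", "F", "M"]
height = [70, 61, 64, 99, 72, 69, 65, 775, 68, 70]


def frequencies(values):
    """Print value, count and percent for each distinct value; return the counts."""
    counts = Counter(values)
    for value, n in sorted(counts.items(), key=lambda item: str(item[0])):
        print(f"{value!r:>6} {n:3d} {100 * n / len(values):6.1f}%")
    return counts


def flag(ids, values, is_valid):
    """Return the IDs of cases whose value fails the is_valid test."""
    return [i for i, v in zip(ids, values) if not is_valid(v)]


VALID_GENDER = {"M", "F"}   # codes that have value labels in step 3b
HEIGHT_MISSING = {99}       # user-defined missing code for HEIGHT
LOW, HIGH = 48, 84          # assumed plausible height range in inches

gender_counts = frequencies(gender)
bad_gender = flag(ids, gender, lambda v: v in VALID_GENDER)
bad_height = flag(ids, height, lambda v: v in HEIGHT_MISSING or LOW <= v <= HIGH)
print("Check GENDER for IDs:", bad_gender)
print("Check HEIGHT for IDs:", bad_height)
```

Here the checks point to IDs 3, 5 and 6 (lowercase or blank gender) and to ID 8, whose height of 775 looks like a mistyped 77 when compared with the table in step 3a.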
5. Creating SPSS data files from other data sources
a. Importing data from Excel
b. Variable names in the first row
c. Variable type and potential for error
d. Other sources: Access, SAS, etc.
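Item c can be illustrated with a deliberately simplified rule: treat a column as numeric only if every non-blank cell parses as a number. This is a hypothetical rule, not the algorithm SPSS actually uses when it imports a spreadsheet, but it shows how a single stray text entry can turn a numeric column into a string variable. As in item b, the first row supplies the variable names; all names and data here are made up.

```python
def infer_type(cells):
    """Hypothetical rule: numeric only if every non-blank cell parses as a number."""
    for cell in cells:
        text = str(cell).strip()
        if text == "":
            continue  # blank cells do not decide the type
        try:
            float(text)
        except ValueError:
            return "string"
    return "numeric"


# A made-up worksheet with variable names in the first row.
rows = [
    ["ID", "GENDER", "HEIGHT"],
    ["1", "M", "70"],
    ["2", "F", "61"],
    ["3", "F", "n/a"],  # one stray text entry in an otherwise numeric column
]
names, data = rows[0], rows[1:]
types = {name: infer_type([r[j] for r in data]) for j, name in enumerate(names)}
print(types)  # {'ID': 'numeric', 'GENDER': 'string', 'HEIGHT': 'string'}
```

A HEIGHT column read as string cannot be averaged, which is why it pays to check variable types right after importing.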
6. Creating SPSS data files from “ASCII” text files
a. Common non-SPSS formats: delimited or fixed-format plain text (“ASCII”) files
b. File extensions for text files: .dat, .txt
c. Using Notepad to view text files
In ASCII files, a line is generally referred to as a RECORD. The columns
assigned to a variable are collectively referred to as a FIELD. HEALTH.DAT is
an example of a fixed-format ASCII file, since the same information is coded in
the same location for every case. AGE, for example, is always found in columns 11-12
of the first and only record of a case. (Column positions may appear distorted
when the file is viewed in a proportional font in Word or another word processor.)
The use of the term column when describing the layout of an ASCII text file is
different from the use of the term column when describing the contents of the
Data Editor. A variable occupies a column in the Data Editor; a character
occupies a column in an ASCII data file. The ASCII text file is not an SPSS
data file.
d. Record layout ("Codebook") for HEALTH.DAT
Description                 Variable   Record   Columns
Identification number       ID         1        1-2
Systolic blood pressure     SBP        1        3-5
Quetelet index*             QUET       1        6-10
Age in years                AGE        1        11-12
   98 = 98 or more
   99 = missing
Respondent smokes           SMK        1        13
   0 = no
   1 = yes
*Quetelet index (a measure of size) = 100 * (weight/height**2)
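The footnoted formula can be checked with a short calculation. The sketch below applies it to three cases from the height/weight example in section 3, assuming, as in that example, that height is in inches and weight in pounds; the codebook itself does not state units, so treat the inputs as hypothetical.

```python
def quetelet(weight, height):
    """Quetelet index as footnoted in the codebook: 100 * (weight / height**2)."""
    return 100 * (weight / height ** 2)


# (ID, height, weight) for three cases from the section 3 example;
# assumed units: inches and pounds.
for case_id, height, weight in [(1, 70, 155), (3, 64, 125), (5, 72, 180)]:
    print(case_id, round(quetelet(weight, height), 3))
# Prints "1 3.163", "3 3.052" and "5 3.472" on separate lines
```

Case 1 (70 inches, 155 pounds), for example, gives 100 * 155 / 4900 ≈ 3.163.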
Fixed format file
011352.876450
021223.251410
031303.100490
041483.768520
051462.979541
061292.790471
071623.668601
081603.612481
091442.368441
101804.637641
111663.877591
121384.032511
131524.116990
141383.673560
151403.562541
161342.998501
171453.360491
18   3.024461
191353.171570
201423.401560
211503.628561
221443.751580
231373.296530
241323.210500
251493.301541
261323.017481
271202.789430
281262.956431
291613.800630
301704.132631
311523.962620
321644.010650
Source: Kleinbaum, David G. and Kupper, Lawrence L. (1978). Applied Regression Analysis and Other Multivariable Methods. Boston, Massachusetts: Duxbury Press. (p. 60)
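To make the record layout concrete, the Python sketch below cuts lines of HEALTH.DAT into fields using the column positions from the codebook. It illustrates what the Text Import Wizard or command syntax (section 7) does for you; it is not SPSS code. Assumptions: columns are 1-based and inclusive, a field containing only spaces is treated as missing, and AGE = 99 is converted to missing as the codebook specifies. `LAYOUT`, `MISSING_CODES` and `parse_record` are hypothetical names.

```python
# Record layout from the codebook: (variable, first column, last column), 1-based inclusive.
LAYOUT = [("ID", 1, 2), ("SBP", 3, 5), ("QUET", 6, 10), ("AGE", 11, 12), ("SMK", 13, 13)]
MISSING_CODES = {"AGE": {99}}  # codebook: 99 = missing


def parse_record(line):
    """Split one fixed-format record into a dict of variable -> value (None = missing)."""
    case = {}
    for name, first, last in LAYOUT:
        field = line[first - 1:last].strip()  # 1-based columns -> Python slice
        if field == "":
            case[name] = None  # blank field: no value recorded
            continue
        value = float(field) if name == "QUET" else int(field)
        case[name] = None if value in MISSING_CODES.get(name, set()) else value
    return case


# Three records copied from the listing above (case 18 has blank SBP columns).
for line in ["011352.876450", "131524.116990", "18   3.024461"]:
    print(parse_record(line))
```

Reading the whole file would just mean looping over its lines. Case 18 shows why exact column positions matter: if its three blank SBP columns were collapsed, every later field would shift.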
Comma delimited format
1,135,2.876,45,0
2,122,3.251,41,0
3,130,3.1,49,0
4,148,3.768,52,0
5,146,2.979,54,1
6,129,2.79,47,1
7,162,3.668,60,1
8,160,3.612,48,1
9,144,2.368,44,1
10,180,4.637,64,1
11,166,3.877,59,1
12,138,4.032,51,1
13,152,4.116,99,0
14,138,3.673,56,0
15,140,3.562,54,1
16,134,2.998,50,1
17,145,3.36,49,1
18,,3.024,46,1
19,135,3.171,57,0
20,142,3.401,56,0
21,150,3.628,56,1
22,144,3.751,58,0
23,137,3.296,53,0
24,132,3.21,50,0
25,149,3.301,54,1
26,132,3.017,48,1
27,120,2.789,43,0
28,126,2.956,43,1
29,161,3.8,63,0
30,170,4.132,63,1
31,152,3.962,62,0
32,164,4.01,65,0
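In the comma-delimited version, position no longer matters: fields are separated by commas, and two adjacent commas (case 18) mark an empty field. A minimal sketch using Python's standard `csv` module follows. The variable names come from the codebook and the sample lines are copied from the listing above; treating empty fields and AGE = 99 as missing is an assumption that mirrors the fixed-format codebook, and `read_delimited` is a hypothetical name.

```python
import csv
import io

NAMES = ["ID", "SBP", "QUET", "AGE", "SMK"]  # field order from the codebook


def read_delimited(text):
    """Parse comma-delimited HEALTH data into a list of dicts (None = missing)."""
    cases = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        case = {}
        for name, field in zip(NAMES, row):
            if field == "":
                case[name] = None  # empty field between two commas
            elif name == "QUET":
                case[name] = float(field)
            else:
                case[name] = int(field)
        if case.get("AGE") == 99:  # codebook missing-value code
            case["AGE"] = None
        cases.append(case)
    return cases


sample = "1,135,2.876,45,0\n13,152,4.116,99,0\n18,,3.024,46,1\n"
for case in read_delimited(sample):
    print(case)
```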
e. Read an ASCII file using the Text Import Wizard
7. Using command syntax to read a file
8. More complex input file structures
a. Multiple lines per case
b. Hierarchical files (e.g., a household record followed by one record per household member)
c. Different record types (e.g., personal data record, course records, financial data record)
d. Varying numbers of measures per unit
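A hierarchical file can be pictured with a small, entirely hypothetical example: each line begins with a record type, "H" for a household and "P" for a person who belongs to the most recent household. The sketch groups person records under their household and then flattens them into one rectangular row per person, repeating the household fields; the two households also have different numbers of members, as in the last item above. The record layout, field names and function names are invented for illustration.

```python
def read_hierarchical(lines):
    """Group hypothetical 'P' (person) records under the preceding 'H' (household) record."""
    households = []
    for line in lines:
        fields = line.split(",")
        record_type = fields[0]
        if record_type == "H":
            households.append({"household": fields[1], "region": fields[2], "members": []})
        elif record_type == "P":
            if not households:
                raise ValueError("person record before any household record")
            households[-1]["members"].append({"person": fields[1], "age": int(fields[2])})
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households


def to_person_rows(households):
    """Flatten to one rectangular row per person, repeating the household fields."""
    return [
        {"household": h["household"], "region": h["region"], **m}
        for h in households
        for m in h["members"]
    ]


sample = ["H,101,NE", "P,1,34", "P,2,31", "H,102,W", "P,1,67"]
for row in to_person_rows(read_hierarchical(sample)):
    print(row)
```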
9. Obtaining and using existing SPSS data files
a. Example of GSS 2008
b. Other data archive sites
http://www.icpsr.umich.edu
Contact William Bosworth (william.bosworth@lehman.cuny.edu) for further information.
10. Learning more about SPSS
a. Manuals in PDF format provided with the license
b. Help > Tutorial, etc.
c. Academic web sites, e.g.
http://www.usc.edu/its/stats/spss/index.html