
Index of SPSS Tips

  1. Safeguard the initial data file!
  2. Syntax 1, syntax 2, syntax 3...
  3. My Syntax Window Toolbar
  4. Changing data folder each month?
  5. How to ask a data transformation or automation question?
  6. When to use EXECUTE.
  7. About INCLUDE files
  8. How can I apply variable labels and value labels of my old sav file to my new sav file?
  9. Multi-line comments



1. Safeguard the initial data file!

Say your initial data file is named "mydata.sav". It is good practice never to modify that file by syntax. Instead, the first time you save the data file by syntax, save it under a new name such as "mydata mod.sav". This is necessary to avoid accidentally crippling your original data file with commands such as SELECT IF or AGGREGATE.
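
A minimal sketch of this habit is shown below. The folder and the variable age are hypothetical and only serve the illustration; the point is that the SAVE command never names the original file.

```spss
* Hypothetical folder and variable; adjust to your own project.
GET FILE='d:\project xy\mydata.sav'.
* SELECT IF permanently drops cases from the active file.
SELECT IF (age >= 18).
* Save under a new name so that mydata.sav stays untouched.
SAVE OUTFILE='d:\project xy\mydata mod.sav'.
```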


2. Syntax 1, syntax 2, syntax 3...

When you work on a complicated syntax, periodically save the current version under a different name.



3. My Syntax Window Toolbar

It is very useful to customize the toolbars (refer to the User Guide for instructions). This is my Syntax Window toolbar.

[Image: SPSS icones.jpg, the Syntax Window toolbar with numbered icons]

Number Description of use
1 Empty the data editor
2 Open a new syntax window
3 Open a new script window
7 List the most recently used procedures
13 Run the command on which the cursor is positioned
14 Run the selected lines of syntax
15 Run syntax from the command on which the cursor is positioned to the end of the syntax
16 Run the complete syntax in the syntax window
18 When the cursor is on a command (for example FLIP or DO REPEAT), a window containing a brief description of the syntax for the command pops up.
20 This opens the spssbase.pdf file (the electronic version of the Syntax Reference Guide). I use this all the time.
21 This has the same effect as entering an EXECUTE command. It gets rid of the "Transformation Pending" message in the status bar. Very convenient.
22 This is a "home made" icon. It is linked to the Empty Designated Output scripts. When I write or debug syntax, I click this icon before each run so that I have only fresh output in the Output window.


4. Changing data folder each month?

There are many circumstances where one needs to change the paths contained in a syntax file. For instance:
  1. you run the same syntax periodically (each week or month) but the new data is in a folder whose name identifies the data;
  2. you work both at home and at the office and it is not convenient to reproduce the drives and folders of the office at home;
  3. you apply the same syntax to different clients' files.

I handle these situations by defining a macro at the beginning of the syntax file:

define !Path1 ()'d:\project xy\my program files\'!enddefine.

define !Path2 ()'d:\project xy\my data files\'!enddefine.

GET FILE=!Path2+"data1.sav".

INCLUDE !Path1+"evaluate.sps".

*** note that evaluate.sps should refer to !Path1 and !Path2 ***.

*** do other calculations here ***.

SAVE OUTFILE=!Path2+"results.sav".

With the above method, changing paths is done only once per syntax file.

5. How to ask a data transformation or automation question?

Ideally, you should include a sample initial data file as well as the desired result file. More advanced users should (when the data file is relatively complex) do this using DATA LIST or INPUT PROGRAM, as this saves a lot of time for the person trying to answer the question. In some cases it took me as long to create the dummy data file as it took to solve the problem. The easier you make it for the potential solver, the greater your chances that he / she will devote time to help you out.

Why: It is also useful to explain why you want to do this. The purpose of the why is that there may be a better way to achieve your goal which does not require the described transformation.

Number of cases & variables: The solution when one has 100 cases or 10,000,000 cases is not always the same. Similarly the number of existing variables will affect the solution. It is therefore advisable to provide that information.

Frequency of use: Another element affecting the design of the solution is how frequently the solution will be applied. A one-time use obviously requires less automation than a syntax which will be run every night through the Production facility.

In summary, you may simplify the example to make it easier to understand, but to get a solution which really solves your problem, you should mention the elements listed above in order to describe the full context.
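
As an illustration of the advice on sample data, a question can carry its own small data set so that anyone can paste it into a syntax window and run it. The variable names and values below are made up for the example; a real posting would mimic the structure of the actual file.

```spss
* Hypothetical sample data: three cases, an id and two scores.
DATA LIST LIST /id score1 score2.
BEGIN DATA
1 10 12
2 15 11
3 8 9
END DATA.
* Show the data so that the reader can check it was read as intended.
LIST.
```

A similar block showing the desired result file makes the goal unambiguous.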

Asking the right question remains the best strategy to get the right answer...

Students or teachers (who have access to huge libraries...) often ask questions of the type "Does anybody have a syntax to calculate statistics XYZ as described in Book ABC?". This is certainly a short way of asking a complex question, but those of us who do not have ready access to a university library simply skip the question... If you have scanned pages of the formulas and could email them to interested persons, say so. Do not attach these documents to your postings.


6. When to use EXECUTE.

Understanding the following Q and A might save you a lot of problems. This was posted to the SPSSX-L list on 1999/02/04 by

David Matheson
SPSS Technical Support

Q. I run a series of transformations from syntax in SPSS and am
puzzled to find that obtaining the correct results may require
an insertion of an EXECUTE command among the transformations.
The following commands are one example.

DATA LIST FILE '/tmp/ret.dat' FIXED /da 1-15 w 19-20 .
COMPUTE return = da - LAG(da).
COMPUTE sv =(w<=1 or w>=5).
SELECT IF (sv=0).

The results for the variable RETURN were incorrect for some cases.
Correct results were obtained if an EXECUTE command was placed
somewhere between lines 2 (compute RETURN...) and 4 (SELECT IF...).
What are the rules, in this case and more generally, that dictate
when an EXECUTE command should be placed between transformation
commands?

A. The key here was to run EXECUTE before the SELECT IF. (In this
particular example, placing the EXECUTE between the 2 COMPUTES would
also work.) Otherwise, when you compute RETURN as DA - LAG(DA)
for a given case, the case that originally preceded the current case
may have already been dropped from the active file and LAG(DA) may
capture the value of DA for an unintended case.

To further illustrate the use of EXECUTE among transformations,
consider any 3 sequential cases with ID values of 1, 2, and 3.
Suppose you want to keep or drop case 2 depending on the result of
a comparison with case 1. Likewise, you wish to compare case 3 to
case 2 and keep or drop case 3 as a result. Suppose also that case 2
fails the comparison test but case 3 would pass it, i.e., its relation
to case 2 is such that you would want to keep case 3. Without an
EXECUTE (or other command that forces a data pass) before the
SELECT IF, case 2 is evaluated and dropped from the active file
before case 3 is evaluated. Therefore, case 3 is compared to case 1,
rather than case 2, and may be kept or dropped in error. Placing the
EXECUTE before the SELECT IF results in all cases being present when the
LAG function is being used. One can envision data selection tasks where
each case is compared to the last case that passed a similar
comparison - there you might leave out the EXECUTE to achieve that
result.
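
The placement recommended in the answer can be sketched as follows. The inline data is hypothetical and merely stands in for the external file of the question; the variable names are those used in the discussion.

```spss
* Hypothetical inline data replacing /tmp/ret.dat.
DATA LIST LIST /da w.
BEGIN DATA
10 3
12 1
15 3
19 4
END DATA.
COMPUTE return = da - LAG(da).
COMPUTE sv = (w <= 1 OR w >= 5).
* Force a data pass so that LAG sees every original case.
EXECUTE.
SELECT IF (sv = 0).
LIST.
```

Running the same commands with the EXECUTE line removed and comparing the two listings is an easy way to see the effect described above.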

A similar situation arises when cases are being selected by original
case number in the data set. Suppose you wanted to select every fifth
case and used the following syntax:

compute seq = $casenum.
select if (mod(seq,5) = 0).
frequencies x.
* the mod function returns the remainder when the first argument is
divided by the 2nd.

You would have no cases remaining in your frequency report. The first
case would be given a value of 1 for seq, since its $casenum would
be 1. (mod(seq,5)=0) would therefore be false and the case would be
deleted. The case that was 2nd would now become the first, so
that $casenum = 1, so seq = 1 and the case would be deleted. This would
eventually happen to the case that was originally the 5th case, as
well as the 10th, etc. The following syntax would work.

compute seq = $casenum.
execute.
select if (mod(seq,5) = 0).
frequencies x.

Adding the execute before the select allows seq to be calculated
correctly before any cases are deleted.

If you have a series of transformation commands (COMPUTE, IF, etc)
followed by a MISSING VALUES command that involves the same variables,
you will often want to place an EXECUTE statement before the
MISSING VALUES command. This is because the MISSING VALUES command
changes the dictionary before the transformations take place. For
example, consider:

IF (x = 0) y = z*2.
MISSING VALUES x (0).

The cases where x=0 would be considered user-missing on x and the
transformation of y would not occur. Placing an EXECUTE before the
MISSING VALUES allows the transformation to occur before 0 is assigned
missing status.
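
The ordering described in this paragraph would look like the sketch below, using the same variables x, y and z.

```spss
IF (x = 0) y = z*2.
* Force the data pass now, while 0 is still a valid value of x.
EXECUTE.
* The dictionary change comes only after the transformation has run.
MISSING VALUES x (0).
```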

An EXECUTE command is often necessary after you run the WRITE command
to save the data to an ASCII file, or after you use XSAVE, rather
than SAVE, to save data to an .sav file. WRITE and XSAVE are treated
like transformations. If your program ends with a write or xsave, with
no procedure to force a data pass, the file to which you had tried to
write would be empty. If the WRITE or XSAVE was part of a LOOP or
DO IF structure, the EXECUTE command would not be placed within that
structure.

Also, if a statistical procedure followed the WRITE or XSAVE commands,
then the new file would be written.
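
A minimal sketch of the XSAVE case follows. The output path and the variable region are hypothetical; what matters is that the EXECUTE comes after the END IF, not inside the structure.

```spss
* Hypothetical variable and output file.
DO IF (region = 1).
XSAVE OUTFILE='d:\project xy\region1.sav'.
END IF.
* XSAVE is a transformation, so force the data pass that writes the file.
EXECUTE.
```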

Finally, if you have such an extensive sequence of transformations
that you get an insufficient memory message when SPSS tries to
process them, you could intersperse EXECUTE commands among the
transformations to occasionally force a pass of the data and free up
memory for the next set of transformations. Don't place any of these
EXECUTE commands within transformation structures such as LOOP..END LOOP,
or between commands that define scratch variables and subsequent
commands that reference those scratch variables. If you followed an
extensive set of transformation commands with a memory-intensive
command such as CLUSTER or MANOVA, you might place an EXECUTE
command before that statistical procedure. Although the procedure
alone would force the data pass that executed the transformations,
placing the EXECUTE command before the procedure would free memory
that was needed for the transformations.

This is just a sample of cases where EXECUTE commands should be placed
among or after transformations. Placing an EXECUTE after every
compute would almost always be inefficient at best, and unworkable
at worst (e.g. in a series of transformations in a LOOP or DO IF
structure).


7. About INCLUDE files

If a syntax file is called through an INCLUDE command, it is recommended to have a comment (a line which starts with an * and ends with a period) in the first line of the syntax. This is to circumvent reported occurrences of the interpreter silently "swallowing" the first line of code.

Note that syntax files which are to be run using the INCLUDE command need to follow special rules:

  1. Commands must start in the first column
  2. Continuation lines cannot start in the first column.

For instance, the following code works in direct mode but does not work when invoked using the INCLUDE command:

    COMPUTE firstc=1.

Either of the following variations works when the code is invoked by INCLUDE:

-    COMPUTE firstc=1.


COMPUTE firstc=1.

See Execute selective portions of syntax.SPS for a typical use of the INCLUDE command. It is important to know that the processing of a file that is called by the  INCLUDE command stops as soon as an error occurs. Some warnings also stop the execution of the file.

The following 2 situations are common with INCLUDE files.

  1. Sometimes the user would like the syntax to simply continue executing the subsequent commands of the included file (this is what happens when the syntax file is run from the syntax window). See Include stops because select if results in no data.SPS for a possible solution.
  2. Sometimes the include file is very long and, because a certain condition occurs, the user would like to stop the processing of the syntax file. See Choice of include file depends on data.SPS for a possible solution to this problem.

8. How can I apply variable labels and value labels of my old sav file to my new sav file?

Assuming the variable names are the same, all you have to do is use the menu FILE > APPLY DATA DICTIONARY ...

OR using syntax:
APPLY DICTIONARY FROM='C:\Program Files\SPSS\old data file.sav'.


To apply the Variable Label and Value Labels of a given variable to other variables, see this syntax.

9. Multi-line comments

The easiest way to have multi-line comments in a syntax file is
      - to indent the second and subsequent lines
      - to avoid periods at the end of all but the last line.

That way, only a single * (or the COMMENT keyword) is needed at the beginning of the comment.
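
For instance, the two rules above give a comment such as the one in this short sketch (the COMPUTE line is only there to show where the comment ends):

```spss
* This comment runs over three lines because the second
  and third lines are indented and only the last line
  ends with a period.
COMPUTE firstc=1.
```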
