ECN No Name Newsletter: December, 1990

The ECN No Name Newsletter is no longer being published. This is an archived issue.


Using SAS

Julie Dickinson

SAS, which is now online at ECN, can be a powerful tool in data analysis. Before performing some incredible statistical procedure, though, you must understand the workings of the SAS Display Manager System and the basics of SAS programming.

SAS Display Manager System

To access SAS and begin your session, enter the command "sas". Your terminal screen should look like this:

         [Screen image not preserved: the SAS Display Manager with its Output, Log, and Program Editor windows]

The three windows that you see in front of you correspond to parts of your SAS session.

The Output Window displays the output of your program.

The Log Window displays messages from the SAS system. It also displays your SAS statements as they are executed.

The Program Editor Window is where you enter your programs and statements to be executed.

Each window contains the line "Command ===>." This is the command line. When you want to end your SAS session, enter either "BYE" or "endsas" on any of the three command lines.

Move the cursor to any command line (using the arrow keys) and enter the command "keys". You will see the Keys Window appear on the right side of the screen. In the window is a listing of certain function keys which have been predefined for the machine you are using. You can use these keys as a shortcut in performing many commonly used commands. Now type "end" on the Keys Window command line. This will remove the window.

Next enter "help" on any command line. This displays the Help Window, which provides information about SAS, such as procedures, other windows, and the syntax of commands. To scroll forward in the window, type "forward" on the command line (or use the forward key if one is listed in the Keys Window). To scroll backward, use the "backward" command. When you've seen enough, type "end" and, wow, we're back where we started.

Now move the cursor to the command line in the Program Editor window and enter the command "zoom". The window now occupies the entire screen. This comes in rather handy when you are typing a program and would actually like to see more than four lines of it at a time. Typing zoom again will "unzoom" the screen.

Programming In SAS

The SAS language is divided into DATA and PROC steps. The DATA step is a group of statements which produce a data set. PROC steps are used to analyze your data via one of the many SAS procedures.

Before we discuss these two steps, let's look at some general aspects of SAS statements. Each statement in the program MUST end in a semicolon. Where a statement starts on a line is not important. Indenting lines usually makes a program easier to read, but you can put two statements on the same line if you so desire. Spacing between words and between lines is not critical either.
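Because statements end at semicolons rather than at line breaks, the layout of a program does not change what SAS reads. The Python sketch below (not SAS itself) illustrates the idea with a hypothetical `split_statements` helper. It splits source text at semicolons, skips semicolons inside quoted strings such as titles, and collapses extra spaces and line breaks. It deliberately ignores data lines after `cards;`, which are not statements.

```python
def split_statements(program: str) -> list[str]:
    """Split SAS-like source into statements at semicolons (illustrative only).

    Semicolons inside single- or double-quoted strings are kept; runs of
    whitespace, including newlines, are collapsed to one blank. Data lines
    following CARDS are not handled.
    """
    statements: list[str] = []
    current: list[str] = []
    quote = None  # the quote character we are currently inside, if any
    for ch in program:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            text = " ".join("".join(current).split())
            if text:
                statements.append(text)
            current = []
        else:
            current.append(ch)
    return statements


messy = "proc print\n   data=grades;   title 'Test Data'; run;"
print(split_statements(messy))
# ['proc print data=grades', "title 'Test Data'", 'run']
```

However the statements are spread over lines, the same three statements come out.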

Now we will examine the DATA step.

DATA Step

Suppose that I want to analyze test scores for two sections of a class that I am teaching on, oh, the joy of SAS. I have the following information for each of my students: a five-digit identification number, a section (A or B), and a test score. One student did not take the test, but I want his number and section listed with the data anyway. A DATA step for this example would look like the following:

options  pagesize = 60  linesize = 80  nodate;
data grades;
  input IdentNum section $ score;
  cards;
12345 A 91
15678 B 78
83749 A 68
23793 B 87
23478 B 89
34293 A 78
34034 B .
88378 A 100
37492 A 82
48392 A 55
;

Let us examine each statement.

options pagesize = 60 linesize = 80 nodate;

This statement sets the page size and line size of the output. Nodate tells the computer not to print the date on the output. The settings I used (pagesize = 60 and linesize = 80) work well for output printed on 8.5" X 11" paper.
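Why do 60 lines and 80 columns suit letter paper? A little arithmetic shows it, sketched here in Python. The printer geometry is an assumption, not something the article states: a typical printer of the time printing 6 lines per vertical inch and 10 characters per horizontal inch.

```python
# Assumed printer geometry (typical of the era; not stated in the article).
LINES_PER_INCH = 6    # vertical spacing
CHARS_PER_INCH = 10   # horizontal spacing (pica)


def page_capacity(width_in: float, height_in: float) -> tuple[int, int]:
    """Return (lines per page, characters per line) that fit on the paper."""
    return int(height_in * LINES_PER_INCH), int(width_in * CHARS_PER_INCH)


lines, columns = page_capacity(8.5, 11)
print(lines, columns)  # 66 85
# pagesize = 60 and linesize = 80 leave a few lines and columns of margin.
print(60 <= lines and 80 <= columns)  # True
```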

                          data grades;

This assigns the name 'grades' to the data set I am about to create.

                 input IdentNum section $ score;

This statement tells SAS the names of the variables I will be using and how their values are arranged on the data lines. The variables should be listed in the input statement in the same order as their values appear on the data lines. There are a few rules about what variable names can look like:

  1. They must start with a letter. The remaining characters can be letters or numbers, but NOT blanks. For example, IdentNum is valid, but Ident Num is not.
  2. They must be eight characters or less.
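The two naming rules can be written as a single pattern. The Python sketch below uses a hypothetical `is_valid_name` helper. It also accepts the underscore, which SAS permits in names (its automatic variable `_N_` is one example) even though the rules above do not mention it.

```python
import re

# First character: a letter (or underscore); then up to seven letters,
# digits, or underscores. No blanks, eight characters at most.
SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_name(name: str) -> bool:
    """Check a variable name against the eight-character naming rules."""
    return SAS_NAME.fullmatch(name) is not None


for candidate in ["IdentNum", "Ident Num", "2ndTest", "section", "TestScores"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```

IdentNum and section pass. Ident Num (blank), 2ndTest (starts with a digit), and TestScores (ten characters) fail.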

If a variable's values are not made up entirely of numbers, you must place a dollar sign after the variable name. I put a dollar sign after section because A and B are not numbers. Got it? There are also ways of describing your data using column locations, or of putting several short observations on one line. I'll let you explore these options on your own.

                             cards;

This statement basically says, "Hey! The very next thing you're going to see is the beginning of my correctly formatted data lines." So make sure you put it IMMEDIATELY before your data lines.

                         the data lines

This is where you enter your data lines--in the order specified in the input statement. Each value must be separated by at least one blank. When a value is missing, we place a period where the value would have been. The data lines are not followed by a semicolon since they are not really statements. After you have entered all of your data, enter a semicolon on a line by itself. This ends the data lines.
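The rules for reading these data lines can be sketched in Python. This is an illustration, not how SAS is implemented. The hypothetical `parse_input_statement` turns the variable list of the input statement into (name, is_character) pairs, with `$` marking the variable just before it. The hypothetical `read_list_input` then splits each line on blanks and treats a lone period as a missing numeric value (`None` here). As a simplification, it expects exactly one complete observation per line.

```python
def parse_input_statement(variables: str) -> list[tuple[str, bool]]:
    """Turn a list such as "IdentNum section $ score" into (name, is_character)
    pairs; a "$" marks the variable immediately before it as character."""
    spec: list[tuple[str, bool]] = []
    for token in variables.split():
        if token == "$":
            name, _ = spec[-1]
            spec[-1] = (name, True)
        else:
            spec.append((token, False))
    return spec


def read_list_input(lines: list[str], spec: list[tuple[str, bool]]) -> list[dict]:
    """Read blank-separated data lines, one observation per line.

    Numeric values become floats and a lone "." becomes None (missing);
    character values are kept as strings.
    """
    rows = []
    for line in lines:
        values = line.split()  # one or more blanks separate the values
        if len(values) != len(spec):
            # Simplification: this sketch insists on one observation per line.
            raise ValueError(f"expected {len(spec)} values, got {len(values)}: {line!r}")
        row = {}
        for (name, is_character), raw in zip(spec, values):
            if is_character:
                row[name] = raw
            else:
                row[name] = None if raw == "." else float(raw)
        rows.append(row)
    return rows


spec = parse_input_statement("IdentNum section $ score")
rows = read_list_input(["12345 A 91", "34034 B ."], spec)
print(rows)
```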

There are many other options available with the data step, but this is the basic outline.

Submitting, Editing, and Saving a Data Set

After you have typed the above program in the Program Editor Window, enter the command "submit" on the command line. The step is now running. Look at the log window. You will see your program as well as messages from SAS. Zoom the window if you cannot read all of the messages. If SAS points out errors in your program you will need to edit it.

To do this, move the cursor to the command line of the Program Editor Window and enter the command "recall". The program will reappear in the window. If you need to change a line, simply move the cursor to the spot you need to rewrite and type directly over the old line. If you need to insert a line, type "i" on any part of the line number (over one of the 0's) of the line above where you need a new line. When you press "ENTER", a new line will be inserted.

When you have eliminated all of your errors and have decided that you want to save your program and its data lines, type "file 'filename'" on the Program Editor command line (filename being the name you wish to assign to your file). Your program is now saved, and you can use it again when you return to SAS. The file command saves whatever is in the window in which you enter the command.

To retrieve a file, type "include 'filename'" on the command line of the Program Editor Window. The contents of the file appear in the window. The file command works the same way for saving a program or output.

A Few SAS Procedures

The DATA step simply created a data set. To do anything more, we need a PROC step that uses one of the SAS procedures. This section will describe three procedures: PRINT, SORT, and MEANS.

PRINT Procedure

The PRINT procedure, logically enough, prints the values in your data set--in the order in which they were entered. Examine the following statements:

     proc print  data=grades;
        title 'Test Data';
        footnote 'first test, fall 1990';
     run;

"proc print" tells SAS that you want to run the PRINT procedure.

"data = grades" names the data set that you want it to use.

"title" prints a title on the top line of each page of output. If you do not specify a title, it will simply print the word SAS.

"footnote" prints a statement at the bottom of each page of output. If you do not use a footnote statement, no footnote will be printed. A new title or footnote statement replaces the previous one. In other words, the title (or footnote) that you choose will be used on each page of output until a different one is specified. Notice that the title and footnote are enclosed in single quotes.

"run" ends the procedure. EVERY PROCEDURE MUST END WITH A RUN STATEMENT! The output of this procedure is shown below:

                             [Screen image not preserved: PROC PRINT output of the grades data set]
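The way titles and footnotes stay in effect can be modeled in a few lines of Python. The `TitleSettings` class below is hypothetical and follows only the behavior described above: the word SAS is printed until a title statement replaces it, no footnote appears until one is set, and each setting is reused on every page until it is replaced.

```python
class TitleSettings:
    """Hypothetical model of TITLE/FOOTNOTE persistence as described above."""

    def __init__(self) -> None:
        self.title = "SAS"    # no TITLE statement yet: the word SAS is printed
        self.footnote = ""    # no FOOTNOTE statement yet: nothing is printed

    def page(self, body: list[str]) -> list[str]:
        """Lay out one page: title on top, body, then the footnote if any."""
        lines = [self.title, *body]
        if self.footnote:
            lines.append(self.footnote)
        return lines


settings = TitleSettings()
before = settings.page(["12345 A 91"])
settings.title = "Test Data"                 # title 'Test Data';
settings.footnote = "first test, fall 1990"  # footnote 'first test, fall 1990';
page1 = settings.page(["12345 A 91"])
page2 = settings.page(["15678 B 78"])        # still titled 'Test Data'
print(before, page1, page2, sep="\n")
```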

SORT Procedure

Suppose we wanted our data printed in two separate groups--one for each section. Or maybe we wanted the data printed with the identification numbers in numerical order. The solution to our dilemma is PROC SORT. We add the following lines to our program:

     proc sort  data=grades;
        by section;
     run;

The first line is similar to the first line in proc print: "proc sort" names the procedure, and "data=grades" names the data set that is to be sorted. Now give attention to the BY statement. This statement sorts the observations by section, so the data end up in two groups: all of section A, then all of section B. If we were to change the by statement to read

by section IdentNum;

SAS would first sort the data by section, then it would reorder the observations within each section in order of identification number. In general, proc SORT arranges the data in the order in which the variables are arranged in the BY statement.

PROC SORT simply reorders the data in the data set. If we want to see this newly organized form, all we have to do is add another print procedure identical to the one we used before. This will print the reordered data set.
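Sorting by a list of BY variables works like sorting on a tuple of keys. The first variable decides the order, and each later variable only breaks ties. The Python sketch below uses a hypothetical `sort_by` helper and the article's data, with `None` standing for the missing score. Sorting by score itself would need a rule for `None`, because Python cannot compare it with numbers. SAS places missing numeric values before all other values.

```python
# (IdentNum, section, score) from the article's data lines; None stands for ".".
grades = [
    (12345, "A", 91), (15678, "B", 78), (83749, "A", 68), (23793, "B", 87),
    (23478, "B", 89), (34293, "A", 78), (34034, "B", None), (88378, "A", 100),
    (37492, "A", 82), (48392, "A", 55),
]
IDENT, SECTION = 0, 1  # positions of the BY variables within each row


def sort_by(rows, *by):
    """Order rows by the first BY position, breaking ties with the next, etc."""
    return sorted(rows, key=lambda row: tuple(row[i] for i in by))


by_section = sort_by(grades, SECTION)                 # by section;
by_section_ident = sort_by(grades, SECTION, IDENT)    # by section IdentNum;
for row in by_section_ident:
    print(row)
```

With both keys, section A's identification numbers come out in increasing order, followed by section B's.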

MEANS Procedure

We are now ready to do some descriptive statistics with our data using PROC MEANS. This procedure is a nice way to summarize data with a few simple statistics. Examine the following lines which we will add to our program:

   proc means  data=grades;
      by section;
      var score;
      title 'Grade Averages in my Exciting Class';
   run;

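PROC MEANS calculates and prints summary statistics for each variable listed in the VAR statement, including the number of nonmissing values, the mean, the standard deviation, and the minimum and maximum values.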

In this case, the procedure does the calculations on the variable score.

The BY statement produces a separate analysis for each group defined by the BY variables. Here, we will have two listings of calculations--one for section A and one for section B. If I had wanted these summary calculations for the course as a whole, I would have omitted the BY statement. When you use a BY statement, the data set must already be sorted in the order of the BY variables. PROC SORT did this for us.

Examine the output of the procedure:

                          [Screen image not preserved: PROC MEANS output for sections A and B]

PROC MEANS is a very simple way to obtain a brief summary of your data set.
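To see what such a by-section summary contains, here is a Python sketch of the calculation on the article's data. The helper `means_by` is hypothetical. It computes only the statistics named above, and the missing score is left out, so N counts nonmissing values. The standard deviation uses the n - 1 divisor, which is also the SAS default. Unlike PROC MEANS with a BY statement, this dictionary-based grouping does not require the rows to be sorted first.

```python
from statistics import mean, stdev

# (IdentNum, section, score) from the article's data lines; None stands for ".".
grades = [
    (12345, "A", 91), (15678, "B", 78), (83749, "A", 68), (23793, "B", 87),
    (23478, "B", 89), (34293, "A", 78), (34034, "B", None), (88378, "A", 100),
    (37492, "A", 82), (48392, "A", 55),
]


def means_by(rows, group_index, value_index):
    """Per-group N, mean, standard deviation, minimum, and maximum."""
    groups: dict = {}
    for row in rows:
        groups.setdefault(row[group_index], []).append(row[value_index])
    summary = {}
    for key in sorted(groups):
        present = [v for v in groups[key] if v is not None]  # drop missing values
        summary[key] = {
            "N": len(present),
            "MEAN": mean(present),
            "STD": stdev(present),  # sample standard deviation (n - 1 divisor)
            "MIN": min(present),
            "MAX": max(present),
        }
    return summary


for section, stats in means_by(grades, 1, 2).items():
    print(section, stats)
```

Section A has six scores averaging 79. Section B has three nonmissing scores (78, 87, 89), because the student who missed the test is excluded.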

Printing Your Output

After we have submitted our program, we will see the output in the Output Window. Move the cursor to the command line of that window and save the output using the file command, typing "file 'grades.output'".

Now exit SAS by typing "endsas" or "BYE". Enter the command "ls" and you should see a listing of your SAS files. You can now print these files the way you normally do.

                        lpr grades.output

Although this is only a basic introduction to the workings of SAS, it should have given you a general knowledge of the SAS Display Manager System as well as the fundamentals of programming in SAS. If you would like to know more about what SAS can do, contact the Statistical Software Consultants in the basement of the Math Building.

Good luck and have fun!


webmaster@ecn.purdue.edu
Last modified: Thursday, 23-Oct-97 20:25:47 EST
