Split Train and Test data in SAS

To split a dataset into training and test sets in SAS, we will use the ranuni() function and PROC SURVEYSELECT. The split can be done in two ways: by assigning a random number to each row with the ranuni() function, or by sampling rows with PROC SURVEYSELECT. Let’s see an example of each.

  • Split train and test dataset in SAS using ranuni() Function
  • Split train and test dataset in SAS using PROC SURVEYSELECT

We will be using a table named CARS.

Split Train and Test Data set in SAS  –  ranuni() : Method 1

The ranuni() function returns a pseudo-random number drawn from the uniform (0,1) distribution, that is, a value between 0 and 1. Its argument is a seed: a positive seed such as 100 makes the sequence of values reproducible from run to run.
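As a minimal sketch of this behaviour, the step below writes a handful of ranuni() values to the SAS log. The variable names i and u and the choice of five draws are illustrative assumptions, not part of the original example.

```sas
/* Sketch: write five pseudo-random uniform(0,1) values to the log. */
/* The seed 100 matches the seed used in the example that follows.  */
data _null_;
  do i = 1 to 5;
    u = ranuni(100);   /* each call returns the next value in the stream */
    put i= u=;
  end;
run;
```

Running the step twice with the same positive seed should write the same five values, which is what makes a seeded split repeatable.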

Step 1: Assign a random value between 0 and 1 to each row, then sort by it


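/* Attach a pseudo-random value n to every row (seed 100) */
data temp;
set cars;
n=ranuni(100);
run;

/* Sort by n so that the rows end up in random order */
proc sort data=temp; by n;
run;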

Step 2: Split the data into 75% training and 25% testing


data training testing; 
set temp nobs=nobs; 
if _n_<=.75*nobs then output training; 
else output testing; 
run;
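A quick way to check the split is to compare the row counts of the two output tables. The sketch below assumes the training and testing datasets created above; the variable names n_train, n_test, and pct_train are hypothetical.

```sas
/* Sketch: report row counts and the training share in the log. */
data _null_;
  /* nobs= is filled in at compile time, so no rows are actually read */
  if 0 then set training nobs=n_train;
  if 0 then set testing  nobs=n_test;
  pct_train = n_train / (n_train + n_test);
  put n_train= n_test= pct_train= percent8.1;
  stop;
run;
```

The training share should come out close to 75%, with small deviations when the total row count is not a multiple of four.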

Training data: the training dataset holds the first 75% of the randomly ordered rows.

Testing data: the testing dataset holds the remaining 25% of the rows.

Split Train and Test Data set in SAS  –  PROC SURVEYSELECT : Method 2

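Step 1: Use PROC SURVEYSELECT and specify the sampling rate for the training data (70% in our case, which leaves 30% for testing) along with the sampling method, which is SRS (simple random sampling) in our case. The OUTALL option keeps every input row in the output table instead of only the sampled rows.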


/* samprate= is the fraction of rows to select; outall keeps the unselected rows too */
proc surveyselect data=cars samprate=0.7
out=cars_select outall
method=srs;
run;

PROC SURVEYSELECT prints a summary of the selection, including the selection method, sampling rate, and sample size.

The resulting table “cars_select” has a column named “Selected”, with the value 1 for sampled rows and 0 for all other rows.
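To see how many rows were flagged, a frequency table on the Selected column can help. This is a small sketch that assumes cars_select was created with the OUTALL option, as in the step above.

```sas
/* Sketch: count the rows with Selected = 1 and Selected = 0 */
proc freq data=cars_select;
  tables selected;
run;
```

The percentage shown for Selected = 1 should be close to the 70% sampling rate.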

Step 2: Send all rows with Selected = 1 to the train dataset and all rows with Selected = 0 to the test dataset, as shown below


data cars_train cars_test; 
set cars_select; 
if selected =1 then output cars_train; 
else output cars_test; 
run;

Training data: cars_train contains the rows with Selected = 1, about 70% of CARS.

Testing data: cars_test contains the remaining rows, those with Selected = 0.

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, he has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.