To split a dataset into train and test sets in SAS, we will use the ranuni() function and PROC SURVEYSELECT. The split can be done in two ways: by assigning a random number to each row with ranuni(), or by sampling with PROC SURVEYSELECT. Let’s see an example of each.
- Split train and test dataset in SAS using ranuni() Function
- Split train and test dataset in SAS using PROC SURVEYSELECT
We will use a table named CARS throughout.
Split Train and Test Data set in SAS – ranuni() : Method 1
The ranuni() function returns a pseudo-random number drawn from the uniform (0,1) distribution, i.e. a value between 0 and 1. Its argument is the seed: a positive seed such as 100 makes the sequence repeatable from run to run.
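As a quick illustration of what ranuni() produces, the following minimal sketch writes a few values to the SAS log. The loop and the variable names i and u are illustrative only and are not part of the split itself.

```sas
/* Print five pseudo-random values from ranuni() to the log. */
/* The seed 100 matches the one used in Method 1.            */
data _null_;
  do i = 1 to 5;
    u = ranuni(100);   /* uniform value strictly between 0 and 1 */
    put i= u=;
  end;
run;
```

Running the step again with the same positive seed should write the same five values, which is what makes the split repeatable.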
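Step 1: Assign a random value between 0 and 1 to each row and sort by it
data temp;
  set cars;
  n = ranuni(100);   /* random value in (0,1); 100 is the seed */
run;

proc sort data=temp;
  by n;              /* sorting by n shuffles the rows */
run;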
Step 2: Split the data into 75% training and 25% testing
data training testing;
  set temp nobs=nobs;
  /* first 75% of the shuffled rows go to training, the rest to testing */
  if _n_ <= .75*nobs then output training;
  else output testing;
run;
Training Data: the resulting training dataset holds the first 75% of the shuffled rows.
Testing Data: the resulting testing dataset holds the remaining 25% of the rows.
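One way to confirm the proportions is to count the rows in each output dataset. This is a minimal sketch that assumes the training and testing datasets from Step 2 already exist; the column aliases n_training and n_testing are illustrative.

```sas
/* Count the rows in each split to check the 75/25 proportion. */
proc sql;
  select count(*) as n_training from training;
  select count(*) as n_testing  from testing;
quit;
```

The two counts should add up to the number of rows in CARS, with the training count at roughly three quarters of the total.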
Split Train and Test Data set in SAS – PROC SURVEYSELECT : Method 2
Step 1: Run PROC SURVEYSELECT with the sampling rate for the train data (70% here, leaving 30% for test) and METHOD=SRS, which stands for simple random sampling
/* samprate= sets the sampling rate; outall keeps every row and flags the sampled ones */
proc surveyselect data=cars samprate=0.7 out=cars_select outall method=srs;
run;
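PROC SURVEYSELECT prints a summary of the selection, including the selection method, the sampling rate and the sample size.
The resulting table “cars_select” has a column “Selected” with the value 1 for sampled rows and 0 for all other rows.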
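Before splitting, it can help to check how many rows were flagged. The sketch below assumes cars_select was created by the PROC SURVEYSELECT step above and simply tabulates the selection flag.

```sas
/* Frequency of the selection flag: 1 = sampled (train), 0 = not sampled (test). */
proc freq data=cars_select;
  tables selected;
run;
```

With a sampling rate of 0.7, about 70% of the rows should show a value of 1.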
Step 2: Write the rows with selected = 1 to the train dataset and the rows with selected = 0 to the test dataset
data cars_train cars_test;
  set cars_select;
  if selected = 1 then output cars_train;
  else output cars_test;
run;
Training Data: the resulting cars_train dataset holds the sampled 70% of the rows.
Testing Data: the resulting cars_test dataset holds the remaining 30% of the rows.
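Unlike Method 1, the PROC SURVEYSELECT call does not fix a seed, so each run may select different rows. A reproducible variant could look like the sketch below. The seed value 12345 is an arbitrary assumption, NOPRINT only suppresses the printed summary, and dropping the selected column is optional.

```sas
/* Reproducible 70/30 split: seed= fixes the random selection. */
proc surveyselect data=cars samprate=0.7 seed=12345
                  out=cars_select outall method=srs noprint;
run;

data cars_train cars_test;
  set cars_select;
  if selected = 1 then output cars_train;
  else output cars_test;
  drop selected;   /* the flag is no longer needed after the split */
run;
```

Keeping the seed in the code means the same rows land in cars_train and cars_test on every run, which makes later model comparisons easier to repeat.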