# Split Train and Test data in SAS

To split a dataset into train and test sets in SAS, we will use the ranuni() function and the PROC SURVEYSELECT procedure. The split can be done in two ways: by assigning a random number to each row with ranuni(), or by sampling rows with PROC SURVEYSELECT. Let's see an example of each.

• Split train and test dataset in SAS using the ranuni() function
• Split train and test dataset in SAS using the PROC SURVEYSELECT procedure

We will be using the table named CARS.

#### Split Train and Test Data set in SAS  –  ranuni() : Method 1

The ranuni() function returns a pseudo-random number drawn from the uniform (0,1) distribution, so every value lies between 0 and 1. Its argument is the seed: a fixed positive seed such as 100 reproduces the same sequence of numbers on every run.
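To see what ranuni() produces before applying it to a table, the following minimal sketch writes a few values to the SAS log. The loop length of 5 and the variable names `i` and `u` are illustrative assumptions, not part of the original example.

```sas
/* Print five pseudo-random uniform(0,1) values to the log.        */
/* The seed (100) is only used on the first call; later calls      */
/* continue the same stream, so rerunning gives the same numbers.  */
data _null_;
  do i = 1 to 5;
    u = ranuni(100);
    put i= u=;
  end;
run;
```

Each value of `u` should fall strictly between 0 and 1.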

Step 1: Assign a random value between 0 and 1 to each row, then sort the rows by it

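```
/* Attach a uniform(0,1) random number to every row; seed 100 makes it reproducible */
data temp;
set cars;
n=ranuni(100);
run;

/* Sorting by the random number shuffles the rows */
proc sort data=temp; by n;
run;
```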

Step 2: Split the data into 75% Training and 25% Testing

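```
data training testing;
/* nobs= stores the total number of rows in temp */
set temp nobs=nobs;
/* the first 75% of the shuffled rows go to training, the rest to testing */
if _n_<=.75*nobs then output training;
else output testing;
run;
```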
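If an approximate 75/25 split is acceptable, the sort can be skipped. The sketch below is a single-pass variant, not the method shown above: each row lands in training independently with probability 0.75, so the proportions are close to 75/25 but not exact. It reuses the dataset names `training` and `testing` and the seed 100 from the example purely for illustration.

```sas
/* Single-pass approximate split: no temporary dataset, no sort.   */
/* Each row draws its own uniform(0,1) number and is routed by it. */
data training testing;
  set cars;
  if ranuni(100) <= 0.75 then output training;
  else output testing;
run;
```

The sorted approach gives an exact 75% cut-off, while this variant trades exactness for a single read of the data.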

In the sorted split from Steps 1 and 2, the training dataset holds the first 75% of the shuffled rows and the testing dataset holds the remaining 25%.

#### Split Train and Test Data set in SAS  –  PROC SURVEYSELECT : Method 2

Step 1: Use PROC SURVEYSELECT and specify the split ratio for train and test data (70% and 30% in our case), along with the sampling method, which is SRS (Simple Random Sampling) in our case

```
/* samprate=0.7 samples 70% of the rows; outall keeps every row and flags the sampled ones */
proc surveyselect data=cars samprate=0.7
out= cars_select outall
method=srs;
run;
```

Because of the OUTALL option, the resulting table “cars_select” contains every row of CARS plus a column “selected” with values 1 (sampled) and 0 (not sampled).
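Before splitting, it can help to confirm that the flag column has the expected proportions. A minimal sketch, assuming the `cars_select` table produced in Step 1 exists in the WORK library:

```sas
/* Count how many rows were flagged 1 (sampled) versus 0 (not sampled). */
proc freq data=cars_select;
  tables selected;
run;
```

The frequency table should show roughly 70% of rows with selected = 1.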

Step 2: Output all the rows flagged 1 to the Train data set and all the rows flagged 0 to the Test data set, as shown below

```
data cars_train cars_test;
set cars_select;
if selected =1 then output cars_train;
else output cars_test;
run;

```
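Unless a seed is supplied, PROC SURVEYSELECT draws a different sample on each run. The sketch below adds the SEED= option so the split can be reproduced. The seed value 100 and the output name `cars_select_seeded` are illustrative assumptions, not part of the original example.

```sas
/* Same 70% simple random sample, but with a fixed seed for reproducibility. */
proc surveyselect data=cars samprate=0.7
  out=cars_select_seeded outall
  method=srs seed=100;
run;
```

Rerunning this step with the same seed and the same input table should flag the same rows each time.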

The resulting training dataset, cars_train, holds about 70% of the rows, and the testing dataset, cars_test, holds the remaining 30%.

## Author

• With close to 10 years of experience in data science and machine learning, the author has worked extensively with programming languages such as R, Python (Pandas), SAS, and PySpark.