**PROC** **SURVEYSELECT** in SAS selects samples from a dataset. It supports simple random sampling and stratified sampling, and it can also be used to split a dataset into train and test sets. Let's see an example of each.

**Syntax of PROC SURVEYSELECT in SAS:**

**PROC SURVEYSELECT options;**

**STRATA variable;**

**CONTROL variable;**

**SIZE variable;**

**ID variable;**

We will use the **CARS** table in our examples.
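The CARS table itself is not listed here. To run the examples, a stand-in can be built as in the sketch below. Everything in it is an assumption made for illustration: the Model and Price columns and their values are hypothetical, and only the Luxury variable and the stratum counts (5 rows with Luxury=1, 11 rows with Luxury=0) come from the stratified examples later in this article.

```sas
/* Hypothetical stand-in for the CARS table: 16 rows in total,      */
/* 5 with Luxury=1 and 11 with Luxury=0. Model and Price are        */
/* illustrative columns, not taken from the original table.         */
data cars;
    input Model :$12. Price Luxury;
    datalines;
ModelA 52000 1
ModelB 18000 0
ModelC 21000 0
ModelD 61000 1
ModelE 17500 0
ModelF 23000 0
ModelG 58000 1
ModelH 19000 0
ModelI 20500 0
ModelJ 67000 1
ModelK 16000 0
ModelL 22000 0
ModelM 55000 1
ModelN 24000 0
ModelO 18500 0
ModelP 21500 0
;
run;
```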


**Simple Random Sampling: PROC SURVEYSELECT**

**Select N% samples**

To select a random N% of the rows, run PROC SURVEYSELECT with **method=srs** and **samprate=N**, as shown below. A value greater than 1 is read as a percentage, so samprate=60 requests 60% of the rows.

/* Type 1: select an N percent sample with PROC SURVEYSELECT */
proc surveyselect data=cars out=cars_sample_60perc method=srs samprate=60;
run;

The output dataset cars_sample_60perc contains a random 60% of the rows in CARS.

**Select N samples**

To select a fixed number N of random rows, run PROC SURVEYSELECT with **method=srs** and **sampsize=N**, as shown below.

/* Type 2: select N rows with PROC SURVEYSELECT */
proc surveyselect data=cars out=cars_sample_n method=srs sampsize=10;
run;

The output dataset cars_sample_n contains 10 randomly selected rows of CARS.
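When no seed is given, PROC SURVEYSELECT picks one itself, so each run returns a different sample. A minimal sketch of a reproducible draw, assuming the same CARS table; the output name cars_sample_seeded is hypothetical and the seed value is arbitrary:

```sas
/* Fixing seed= makes the same 10 rows come out on every run. */
proc surveyselect data=cars
                  out=cars_sample_seeded   /* hypothetical output name */
                  method=srs
                  sampsize=10
                  seed=12345;              /* any positive integer works */
run;
```

Re-running this step with the same input data and the same seed should return the same rows, which helps when results need to be reproduced or compared.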

**Simple Random Sampling with Replacement – PROC SURVEYSELECT**

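Simple random sampling with replacement in SAS is accomplished with PROC SURVEYSELECT by specifying **method=urs** (unrestricted random sampling) and **sampsize=N**, as shown below. With method=srs, rows are drawn without replacement, and rep= only sets the number of independent samples to draw, not whether a row can repeat. The **outhits** option writes one output row per draw, so a row selected twice appears twice.

/* simple random sampling with replacement - PROC SURVEYSELECT */
proc surveyselect data=cars method=urs sampsize=10 outhits seed=12345 out=cars_rep_n;
run;

The output dataset cars_rep_n contains 10 rows, and the same row of CARS may appear more than once.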
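To see which rows were drawn more than once in a with-replacement sample, one option is to use method=urs without outhits. In that case the output keeps one row per distinct selected unit and adds a NumberHits variable with the number of times it was drawn. A minimal sketch, assuming the same CARS table; the output name cars_urs_hits is hypothetical:

```sas
/* method=urs draws with replacement. Without outhits, each distinct  */
/* selected row appears once and NumberHits counts how often it was   */
/* drawn.                                                             */
proc surveyselect data=cars
                  out=cars_urs_hits   /* hypothetical output name */
                  method=urs
                  sampsize=10
                  seed=12345;
run;

/* List only the rows that were drawn more than once. */
proc print data=cars_urs_hits;
    where NumberHits > 1;
run;
```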

**Stratified Sampling in SAS: PROC SURVEYSELECT**

**Note:** PROC SURVEYSELECT expects the input dataset to be sorted by the strata variable(s).

Luxury is the strata variable. With sampsize=4, four rows are selected from each stratum (i.e. 4 rows with Luxury=1 and 4 rows with Luxury=0).

proc sort data=cars;
    by Luxury;
run;

/* sample size of 4 for each stratum */
proc surveyselect data=cars out=strat_sample_n method=srs sampsize=4;
    strata Luxury;
run;

The output dataset strat_sample_n contains 8 rows, 4 from each stratum.

**Total N Samples split proportionately according to distribution of strata**

In the example below, a total of 4 rows is sampled with Luxury as the strata variable, and the sample is allocated in proportion to the stratum sizes. Luxury=1 has 5 rows and Luxury=0 has 11 rows, so the proportional shares are 4 × 5/16 = 1.25 and 4 × 11/16 = 2.75, which round to 1 and 3. Of the 4 sampled rows, 3 will have Luxury=0 and 1 will have Luxury=1.

proc sort data=cars;
    by Luxury;
run;

/* total sample size of 4 with allocation proportional to stratum size */
proc surveyselect data=cars out=strat_sample_n method=srs sampsize=4;
    strata Luxury / alloc=proportional;
run;

The output dataset strat_sample_n then contains the 4 selected rows.
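One way to check how the sample was split across strata is to tabulate the strata variable in the output dataset. A minimal sketch, assuming strat_sample_n was just created by the proportional-allocation step above:

```sas
/* Count the sampled rows in each stratum of the output dataset. */
proc freq data=strat_sample_n;
    tables Luxury / nocum;
run;
```

The frequency column shows how many of the sampled rows fall in each Luxury group, which can be compared with the allocation worked out above.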

**Split Train and Test Data set in SAS – PROC SURVEYSELECT**

**Step 1:** Use PROC SURVEYSELECT and specify the split ratio for the train and test data with samprate= (70% and 30% in our case), along with the method, which is SRS (simple random sampling) in our case. The outall option keeps every input row in the output and flags the selected ones.

proc surveyselect data=cars samprate=0.7 out=cars_select outall method=srs;
run;

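The procedure prints a summary of the selection, including the selection method, the sampling rate and the sample size.

The resulting table **cars_select** has a column **Selected** with value 1 for the sampled rows and 0 for the rest.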

**Step 2:** Split all the 1s as Train data set and all 0s as Test data set as shown below

data cars_train cars_test;
    set cars_select;
    if selected = 1 then output cars_train;
    else output cars_test;
run;

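**Training Data:** cars_train holds the rows with Selected=1 (about 70% of CARS).

**Testing Data:** cars_test holds the remaining rows with Selected=0 (about 30% of CARS).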
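A quick way to confirm the split is to tabulate the Selected flag before the datasets are separated. A minimal sketch, assuming cars_select was created in Step 1:

```sas
/* Selected=1 rows go to cars_train, Selected=0 rows go to cars_test. */
proc freq data=cars_select;
    tables Selected / nocum;
run;
```

The percentages will only be close to 70 and 30, since the number of selected rows has to be a whole number.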