Stratified Sampling in SAS

In stratified sampling, every member of the population is assigned to a homogeneous subgroup (a stratum), and a representative sample is drawn from each subgroup. Stratified sampling in SAS can be done with PROC SURVEYSELECT or with the ranuni() function. You can select a percentage (n%) of each stratum, a fixed number of samples (N) from each stratum, or a total of N samples split across the strata in proportion to their size.

  • Stratified sampling in SAS using the ranuni() function
  • Stratified sampling in SAS using PROC SURVEYSELECT
  • Stratified samples in SAS with N samples for each stratum
  • Stratified samples with a total of N samples split in proportion to stratum size


We will use the CARS table in our examples.

[Image: CARS table]

 

 

Method 1: Stratified sampling in SAS with PROC SURVEYSELECT

Note: PROC SURVEYSELECT expects the input dataset to be sorted by the strata variable(s).

Luxury is the strata variable. 4 samples are selected from each stratum (i.e. 4 samples with Luxury=1 and 4 samples with Luxury=0).


proc sort data=cars; 
by Luxury; 
run; 
 
/** sample size of 4 for each strata */ 
proc surveyselect data=cars 
out = strat_sample_n 
method=srs  
sampsize=4; 
strata Luxury; 
run;


The resulting stratified sample, with 4 samples for each stratum, is:
[Image: stratified sample with 4 rows for each value of Luxury]
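
Because the draw is random, rerunning the step generally returns different rows. The following is a minimal sketch of a reproducible variant, assuming the same sorted cars dataset. The SEED= option fixes the random number stream, and PROC FREQ counts the sampled rows in each stratum. The seed value 12345 and the output name strat_sample_seeded are arbitrary choices for illustration and are not part of the original example.

```sas
/* reproducible draw: the same seed gives the same sample on each run */
proc surveyselect data=cars
    out=strat_sample_seeded   /* hypothetical output name */
    method=srs
    sampsize=4
    seed=12345;               /* arbitrary seed, chosen for illustration */
    strata Luxury;
run;

/* check: number of sampled rows in each stratum */
proc freq data=strat_sample_seeded;
    tables Luxury / nocum nopercent;
run;
```

With sampsize=4 and at least 4 rows in each stratum, the frequency table should report 4 rows for each value of Luxury.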

 

 

Total of N samples split in proportion to the distribution of strata:

In the example below we draw a total of 4 samples with Luxury as the strata variable, and the split is proportional to the size of each stratum. Luxury=1 has 5 entries and Luxury=0 has 11 entries, so the proportional shares of the 4 samples are 4 x 5/16 = 1.25 and 4 x 11/16 = 2.75. After rounding, 1 of the 4 samples has Luxury=1 and 3 have Luxury=0.
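
The stratum sizes and proportional shares described above can be checked directly on the data. This is a minimal sketch, assuming the cars dataset with a numeric Luxury column. The column names stratum_size, share and target_n are hypothetical labels introduced here, and the query only reports the unrounded targets, not the rounding that PROC SURVEYSELECT applies.

```sas
/* stratum sizes and the unrounded share of a total sample of 4 */
proc sql;
    select Luxury,
           count(*) as stratum_size,
           count(*) / (select count(*) from cars) as share,
           4 * count(*) / (select count(*) from cars) as target_n
    from cars
    group by Luxury;
quit;
```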


proc sort data=cars; 
by Luxury; 
run; 
 
/** total sample size of 4 with allocation proportionate to strata*/ 
proc surveyselect data=cars  
out = strat_sample_n 
method=srs 
sampsize=4; 
strata Luxury / alloc=proportional; 
run;


The resulting sample table is:
[Image: sample of 4 rows allocated in proportion to the strata]
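
PROC SURVEYSELECT can also draw a percentage of each stratum instead of a fixed count. The following is a minimal sketch, assuming the same sorted cars dataset. SAMPRATE= gives the sampling rate as a proportion, here 0.25 for 25% of each stratum. The rate and the output name strat_sample_pct are illustrative choices.

```sas
/* sample 25% of the rows in each stratum */
proc surveyselect data=cars
    out=strat_sample_pct      /* hypothetical output name */
    method=srs
    samprate=0.25;            /* sampling rate as a proportion */
    strata Luxury;
run;
```

With a sampling rate, the number of rows drawn depends on the size of each stratum, so the larger stratum contributes more rows.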

 

 

Method 2: Stratified sampling in SAS with PROC SQL and ranuni()

This is a roundabout way. First, extract a list of random rows for one stratum, say Luxury=1. Then extract a list of random rows for the other stratum, Luxury=0. Finally, append the two tables with PROC SQL to get the stratified sample.


/*strata - luxury*/ 
proc sql outobs=4; 
create table cars_lux as  
select * from cars where luxury=1 order by ranuni(10); 
quit; 
 
/*strata - nonluxury*/ 
proc sql outobs=4; 
create table cars_non_lux as  
select * from cars where luxury=0 order by ranuni(10); 
quit; 
 
/* append dataset */ 
proc sql; 
create table cars_strat_samp as  
(select * from cars_lux 
union corresponding all  
select * from cars_non_lux); 
quit;

The resulting table with the stratified sample is:

[Image: combined stratified sample from PROC SQL]
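
A quick way to confirm that the appended table is balanced is to count its rows by stratum. This is a minimal sketch, assuming cars_strat_samp was created by the steps above. The column name n_rows is a hypothetical label used only for this check.

```sas
/* check: number of rows per stratum in the combined sample */
proc sql;
    select luxury, count(*) as n_rows
    from cars_strat_samp
    group by luxury;
quit;
```

Since each query is limited by outobs=4 and both strata in CARS have more than 4 rows, each value of luxury should show 4 rows.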

 

                                                                                   

Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages like R, Python (Pandas), SAS, and PySpark.