Week 3: Muzahid --Making Data Management Decisions
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new1; set mydata.nesarc_pds;
--------------------------------------------------------------------
CHECK321 = cigarette smoking status
s2aq8a = how often drank any alcohol in last 12 months
-----------------------------------------------------------------------------
PROC FREQ of CHECK321 in the new1 table:
proc freq data=new1; tables check321 /list missing;run;
CHECK321   Frequency   Percent   Cumulative Frequency   Cumulative Percent
.          25080       58.20     25080                  58.20
1          9913        23.00     34993                  81.20
2          8078        18.75     43071                  99.95
9          22          0.05      43093                  100.00
From the codebook: 9 = unknown, and BL (blank) = NA, unknown, or never.
Data management for missing values:
DATA new1; set mydata.nesarc_pds;
if check321=9 then check321=.; /* set code 9 (unknown) to missing */
run;
proc freq data=new1; tables check321 /list missing; run;
CHECK321   Frequency   Percent   Cumulative Frequency   Cumulative Percent
.          25102       58.25     25102                  58.25
1          9913        23.00     35015                  81.25
2          8078        18.75     43093                  100.00
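The recoded table is easier to read when the codes carry labels. The following is only a sketch, run on a small made-up data set (toy) rather than mydata.nesarc_pds; the format name smokefmt and the toy values are assumptions for illustration, and the label wording follows the codebook meanings quoted in these posts.

```sas
/* Toy data standing in for mydata.nesarc_pds (hypothetical values) */
data toy;
  input check321;
  datalines;
1
2
9
.
1
;
run;

/* Hypothetical format; label wording follows the codebook meanings in the post */
proc format;
  value smokefmt
    1 = 'Smoked in past 12 months'
    2 = 'Smoked prior to last 12 months'
    . = 'Missing (blank or unknown)';
run;

data toy2;
  set toy;
  if check321 = 9 then check321 = .;  /* 9 = unknown, set to missing */
run;

/* MISSING keeps the missing level in the table and in the percentages */
proc freq data=toy2;
  tables check321 / list missing;
  format check321 smokefmt.;
run;
```

To try this on the course data, the toy data step would be replaced by the DATA new1 step above.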
----------------------------------------------
s2aq8a = how often drank any alcohol in last 12 months
A new variable, usfreq, is created from s2aq8a.
Code:
data new2; set new1;
length usfreq $27;
/* Collapse the drinking-frequency codes into two groups */
if s2aq8a in (1,2,3,4,5,6) then usfreq = 'more than 12 times per year';
else if s2aq8a in (7,8,9,10) then usfreq = 'lessthan12 time per year';
/* 99 = unknown: set to missing */
if s2aq8a=99 then s2aq8a=.;
/* Recode every missing value (blanks plus the 99s set to missing above) as 11 */
if s2aq8a=. then s2aq8a=11;
run;
proc freq data=new2;
tables   s2aq8a usfreq/list missing;run;
PROC FREQ of S2AQ8A:
S2AQ8A   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1        1865        4.33      1865                   4.33
2        1210        2.81      3075                   7.14
3        2619        6.08      5694                   13.21
4        2914        6.76      8608                   19.98
5        3261        7.57      11869                  27.54
6        3557        8.25      15426                  35.80
7        2663        6.18      18089                  41.98
8        1805        4.19      19894                  46.17
9        3210        7.45      23104                  53.61
10       3637        8.44      26741                  62.05
11       16352       37.95     43093                  100.00
PROC FREQ of the new variable usfreq:
usfreq                        Frequency   Percent   Cumulative Frequency   Cumulative Percent
(blank)                       16352       37.95     16352                  37.95
lessthan12 time per year      11315       26.26     27667                  64.20
more than 12 times per year   15426       35.80     43093                  100.00
I collapsed the responses for how often respondents drank any alcohol in the last 12 months (S2AQ8A) into a new variable, usfreq. For usfreq, less than 12 times per year is 26.26%, more than 12 times per year is 35.80%, and unknown, former drinker, or lifetime abstainer is 37.95%.
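The same collapsing can also be done without creating a new variable, by attaching a grouping format in PROC FREQ. This is a different technique from the IF statements used above, shown only as a sketch on made-up data; the data set toy, the format name drinkfmt, and the labels for 99 and missing are assumptions for illustration.

```sas
/* Toy data standing in for the s2aq8a column (hypothetical values) */
data toy;
  input s2aq8a;
  datalines;
1
6
7
10
99
.
;
run;

/* Hypothetical format: each range of codes maps to one group label */
proc format;
  value drinkfmt
    1 - 6  = 'more than 12 times per year'
    7 - 10 = 'lessthan12 time per year'
    99     = 'unknown'
    .      = 'NA (former drinker or lifetime abstainer)';
run;

/* PROC FREQ counts by formatted value, so the codes are grouped on the fly */
proc freq data=toy;
  tables s2aq8a / list missing;
  format s2aq8a drinkfmt.;
run;
```

Unlike the recode to 11 above, this keeps the unknown (99) and blank responses in separate rows, which may or may not be what the analysis needs.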
Week 2 Assignment, September 11, 2017: Running your first program - SAS
1) Program:
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new; set mydata.nesarc_pds;
LABEL TAB12MDX="Tobacco Dependence Past 12 Months"  CHECK321="Smoked Cigarettes in Past 12 Months"  S3AQ3B1="Usual Smoking Frequency"  S3AQ3C1="Usual Smoking Quantity";
/*subsetting the data to include only respondents aged 50 or younger; the past 12 month smoker filter is commented out*/
/* IF CHECK321=1; */
IF AGE LE 50;
PROC SORT; by IDNUM;
PROC FREQ; TABLES  CHECK321 S3AQ3B1 AGE;
RUN;
Log file:
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
           LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
NOTE: Libref MYDATA was successfully assigned as follows:
      Engine:        V9
      Physical Name: /courses/d1406ae5ba27fe300
61
62         DATA new; set mydata.nesarc_pds;
NOTE: Data file MYDATA.NESARC_PDS.DATA is in a format that is native to another host, or the file encoding does not match the
      session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
      performance.
63
64         LABEL TAB12MDX="Tobacco Dependence Past 12 Months"
65           CHECK321="Smoked Cigarettes in Past 12 Months"
66           S3AQ3B1="Usual Smoking Frequency"
67           S3AQ3C1="Usual Smoking Quantity";
68
69         /*subsetting the data to include only past 12 month smokers, age 18-25*/
70         /* IF CHECK321=1; */
71         IF AGE LE 50;
72
73         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
2) Frequency distributions of the CHECK321, S3AQ3B1, and AGE variables
PROC FREQ; TABLES  CHECK321 S3AQ3B1 AGE; RUN;
PROC SORT; by IDNUM;
62
NOTE: There were 26795 observations read from the data set WORK.NEW.
NOTE: The data set WORK.NEW has 26795 observations and 3008 variables.
NOTE: PROCEDURE SORT used (Total process time):
The FREQ Procedure

Smoked Cigarettes in Past 12 Months
CHECK321   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1          7151        71.47     7151                   71.47
2          2841        28.40     9992                   99.87
9          13          0.13      10005                  100.00
Frequency Missing = 16790

Usual Smoking Frequency
S3AQ3B1   Frequency   Percent   Cumulative Frequency   Cumulative Percent
1         7908        79.04     7908                   79.04
2         304         3.04      8212                   82.08
3         457         4.57      8669                   86.65
4         501         5.01      9170                   91.65
5         288         2.88      9458                   94.53
6         514         5.14      9972                   99.67
9         33          0.33      10005                  100.00
Frequency Missing = 16790

AGE   Frequency   Percent   Cumulative Frequency   Cumulative Percent
18    746         2.78      746                    2.78
19    722         2.69      1468                   5.48
20    715         2.67      2183                   8.15
21    748         2.79      2931                   10.94
22    734         2.74      3665                   13.68
23    731         2.73      4396                   16.41
24    803         3.00      5199                   19.40
25    639         2.38      5838                   21.79
26    628         2.34      6466                   24.13
27    702         2.62      7168                   26.75
28    721         2.69      7889                   29.44
29    777         2.90      8666                   32.34
30    869         3.24      9535                   35.58
31    861         3.21      10396                  38.80
32    873         3.26      11269                  42.06
33    846         3.16      12115                  45.21
34    843         3.15      12958                  48.36
35    861         3.21      13819                  51.57
36    885         3.30      14704                  54.88
37    991         3.70      15695                  58.57
38    989         3.69      16684                  62.27
39    924         3.45      17608                  65.71
40    992         3.70      18600                  69.42
41    881         3.29      19481                  72.70
42    912         3.40      20393                  76.11
43    832         3.11      21225                  79.21
44    823         3.07      22048                  82.28
45    878         3.28      22926                  85.56
46    799         2.98      23725                  88.54
47    810         3.02      24535                  91.57
48    789         2.94      25324                  94.51
49    742         2.77      26066                  97.28
50    729         2.72      26795                  100.00
3) a few sentences describing your frequency distributions in terms of the values the variables take, how often they take them, the presence of missing data, etc.
Variable 1 - CHECK321: Smoked Cigarettes in Past 12 Months (cigarette smoking status)
- After subsetting to respondents aged 50 or younger (AGE LE 50), 26,795 observations were read from the data set.
- Of these 26,795 observations, CHECK321 has 10,005 non-missing responses across three values; the remaining 16,790 are missing.
smoked cigarettes in the past 12 months (1) = 7,151 (71.47%)
smoked cigarettes prior to the last 12 months (2) = 2,841 (28.40%)
unknown (9) = 13 (0.13%)
missing = 16,790
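Because the Week 2 PROC FREQ was run without the MISSING option, the reported percentages are shares of the 10,005 non-missing responses, not of all 26,795 observations. The arithmetic can be checked with a short DATA step; this is only a sketch, and the variable names are made up for illustration while the counts are copied from the output above.

```sas
/* Recompute the percentages from the counts reported by PROC FREQ */
data _null_;
  total      = 26795;  /* observations with AGE LE 50 */
  n_missing  = 16790;  /* Frequency Missing for CHECK321 */
  nonmiss    = total - n_missing;      /* non-missing responses */
  pct_smoked = 100 * 7151 / nonmiss;   /* share of code 1 among non-missing */
  pct_miss   = 100 * n_missing / total; /* share of missing among all rows */
  put nonmiss= pct_smoked= 8.2 pct_miss= 8.2;
run;
```

The values are written to the SAS log. Adding the MISSING option to the TABLES statement would instead make PROC FREQ count the missing level in the percentages, as in the Week 3 tables.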
Variable 2 - S3AQ3B1: Usual Smoking Frequency
Usual smoking frequency for the same subset of 26,795 observations aged 50 or younger. The number of missing values is the same, 16,790, so the percentages are based on the 10,005 non-missing responses.
1   every day            7908   79.04%
2   5-6 days a week      304    3.04%
3   3-4 days a week      457    4.57%
4   1-2 days a week      501    5.01%
5   2-3 days a month     288    2.88%
6   once a month         514    5.14%
9   unknown              33     0.33%
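Variable 3 - AGE
The data set is filtered to age 50 or younger (AGE LE 50). The frequency distribution for ages 18-50 is as follows (AGE, Frequency, Percent, Cumulative Frequency, Cumulative Percent):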
18 746 2.78 746 2.78
19 722 2.69 1468 5.48
20 715 2.67 2183 8.15
21 748 2.79 2931 10.94
22 734 2.74 3665 13.68
23 731 2.73 4396 16.41
24 803 3.00 5199 19.40
25 639 2.38 5838 21.79
26 628 2.34 6466 24.13
27 702 2.62 7168 26.75
28 721 2.69 7889 29.44
29 777 2.90 8666 32.34
30 869 3.24 9535 35.58
31 861 3.21 10396 38.80
32 873 3.26 11269 42.06
33 846 3.16 12115 45.21
34 843 3.15 12958 48.36
35 861 3.21 13819 51.57
36 885 3.30 14704 54.88
37 991 3.70 15695 58.57
38 989 3.69 16684 62.27
39 924 3.45 17608 65.71
40 992 3.70 18600 69.42
41 881 3.29 19481 72.70
42 912 3.40 20393 76.11
43 832 3.11 21225 79.21
44 823 3.07 22048 82.28
45 878 3.28 22926 85.56
46 799 2.98 23725 88.54
47 810 3.02 24535 91.57
48 789 2.94 25324 94.51
49 742 2.77 26066 97.28
50 729 2.72 26795 100.00 
Data Analysis Peer grade assignment - NESARC data 
------------------------------------------------------------
42-42   CENDIV   CENSUS DIVISION
        ---------------
         2018   1. New England
         6191   2. Middle Atlantic
         6430   3. East North Central
         2561   4. West North Central
         8665   5. South Atlantic
         2658   6. East South Central
         4832   7. West South Central
         3046   8. Mountain
         6692   9. Pacific
------------------------------------------------------------
43-43   CCS      MSA TYPE
        --------
        15002   1. In MSA - in central city
        20295   2. In MSA - not in central city
         7796   3. Not in MSA
From the NESARC data I took two variables: 1. CENDIV 2. CCS
The data set has 43,093 unique ID numbers. The frequency counts for census division values 1-9 are shown above. For example, CENDIV 1, which represents New England, has 2,018 unique IDs.
CENDIV takes values 1-9.
MSA type (CCS) takes values 1-3, each with its own frequency. For example, MSA 1 (in central city) has 15,002 unique IDs.
MSA 2 (in MSA, not in central city) has 20,295 unique IDs.
MSA 3 (not in MSA) has 7,796 unique IDs.
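One way to check that each row is a distinct ID, and to tabulate both variables, is sketched below on a small made-up data set. The data set toy, its values, and the format name ccsfmt are assumptions for illustration; the CCS labels are the ones listed in the codebook excerpt above.

```sas
/* Toy data standing in for mydata.nesarc_pds (hypothetical values) */
data toy;
  input idnum cendiv ccs;
  datalines;
1 1 1
2 1 2
3 5 2
4 9 3
;
run;

/* Hypothetical format; labels copied from the codebook excerpt */
proc format;
  value ccsfmt
    1 = 'In MSA - in central city'
    2 = 'In MSA - not in central city'
    3 = 'Not in MSA';
run;

/* If n_rows equals n_ids, every row has its own ID number */
proc sql;
  select count(*) as n_rows, count(distinct idnum) as n_ids
  from toy;
quit;

proc freq data=toy;
  tables cendiv ccs / list missing;
  format ccs ccsfmt.;
run;
```

On the course data, replacing toy with mydata.nesarc_pds should give counts comparable to the codebook figures, assuming the same file is used.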