***************** INTRODUCTION TO STATA ************
***** 11 November 2022 *************

***** EXERCISE
* Import the data "01 - Body Data"
* Stata code: import delimited "C:\...\01 - Body Data.txt"
* Data set: Body Data - body and exam measurements from 300 subjects
* Describe the pulse variable: are there any extreme values?
* Find the mean, standard deviation and quartiles
* Plot the distribution of pulse rate with a histogram and a boxplot
* Describe sex
* Plot the distribution of pulse rate by sex

// IMPORT
// clear drops any data already in memory (import delimited refuses to overwrite it otherwise)
import delimited "C:\...\01 - Body Data.txt", clear
// Data Set 1: Body Data
// Body and exam measurements from 300 subjects
* from the drop-down menu: File > Import > Text data (delimited, *.csv, ...)
summarize pulse, detail
histogram pulse
graph box pulse
tabulate gender1m
graph box pulse, over(gender1m)
histogram pulse, by(gender1m)

* identify extreme values by Tukey's definition (values beyond the fences)
* Q1 = 64 and Q3 = 80 (from summarize, detail), so IQR = 80 - 64 = 16
display 80-64
* lower fence = Q1 - 1.5*IQR = 40
display 64 - 16*1.5
* upper fence = Q3 + 1.5*IQR = 104
display 80 + 16*1.5
list if pulse<40
list if pulse>104

***** DO files
* You can type into the Do-file all the same commands you would type into the Command window
* BUT the Do-file lets you save your commands
* Your Do-file should contain ALL the commands you executed (at least all the "correct" ones!)
* I recommend never using the Command window or menus to make CHANGES to data
* Saving commands in a Do-file keeps a written record of everything you have done to your data
* It allows easy replication
* It allows you to go back, re-run commands and analyses, and make modifications

******** GAUSSIAN APPROXIMATION ********************
summarize pulse
* mean = 71.76667, sd = 12.12803
histogram pulse, normal bin(10)
* Which range includes approximately the central 68% of the observations? (mean +/- 1 sd)
display 71.76667-12.12803
* 59.63864
display 71.76667+12.12803
* 83.8947
count if pulse>59.63864 & pulse<83.8947
* 205
display 205/300
* Which range includes approximately the central 95% of the observations? (mean +/- 2 sd)
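* Sketch (not part of the original exercise): rather than retyping the mean and
* sd, the results stored by -summarize- in r(mean) and r(sd) can be reused.
* The exact central 95% of a normal distribution lies within invnormal(0.975),
* about 1.96, standard deviations of the mean; the 2-sd rule is a rounded version.
* The local macro names lo and hi are illustrative only.
quietly summarize pulse
local lo = r(mean) - invnormal(0.975)*r(sd)
local hi = r(mean) + invnormal(0.975)*r(sd)
display "central 95% (normal approximation): " `lo' " to " `hi'
* proportion of all observations (_N) that fall inside the interval
count if pulse > `lo' & pulse < `hi'
display r(N)/_N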
display 71.76667+2*12.12803
* 96.02273
display 71.76667-2*12.12803
* 47.51061
count if pulse>47.51061 & pulse<96.02273
* 288
display 288/300
* .96

* find the 99th percentile
summarize pulse, detail
* empirical 99th percentile: 100
* normal approximation: mean + z(0.99)*sd
display invnormal(0.99)
* 2.3263479
display 2.3263479*12.12803+71.76667
* 99.980687

******************** WORK WITH BODY DATA (Triola et al., Appendix B, Data Set 1) ***********************
// IMPORT
import delimited "C:\...\01 - Body Data.txt", clear
// Data Set 1: Body Data
// Body and exam measurements from 300 subjects
* from the drop-down menu: File > Import > Text data (delimited, *.csv, ...)

// look at the variables in the Data Editor
edit
* description of the data
codebook
summarize
inspect

// add an identifier variable (observation number), placed before age
generate id = _n, before(age)
label variable id "Unique identifier"
* replace lets the file be overwritten when the do-file is re-run
save "C:\...bodydata.dta", replace

******* QUALITATIVE VARIABLES **********
tabulate gender1m
label define Sex 1 "Male" 0 "Female"
label values gender1m Sex
decode gender1m, generate(sex)
* slices follow the coded values, so legend key 1 is Female (0) and key 2 is Male (1)
graph pie, over(gender1m) title(Sex distribution) legend(on order(1 "Females" 2 "Males"))
graph pie, over(sex) title(Sex distribution)
graph bar, over(gender1m)
graph bar, over(sex)
* customize titles with the drop-down menu (or the Graph Editor)
graph bar, over(gender1m)
* (count) puts frequencies on the y axis
graph bar (count), over(gender1m)

******* QUANTITATIVE VARIABLES **********
tabulate bmi
* not readable: BMI has too many distinct values!
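* Sketch (not part of the original exercise): a one-way -tabulate- stores the
* number of rows of the table, i.e. the number of distinct values, in r(r).
* A large count explains why the table is unreadable and why BMI is grouped
* into classes before tabulating.
quietly tabulate bmi
display "distinct BMI values: " r(r)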
********** histogram
histogram bmi
* customize the number of classes with the drop-down menu (or the bin() option)
* create classes of width 1 (values outside [15, 60) become missing):
egen float bmi_c = cut(bmi), at(15(1)60)
tabulate bmi_c
* look at the data to see the new variable

********** Galton ogive (cumulative relative frequency curve)
sort bmi_c
* equal gives tied values the same cumulative proportion
cumul bmi_c, gen(cu_bmi_c) equal
line cu_bmi_c bmi_c

********** Boxplot
graph box bmi

********* summary statistics
summarize bmi
* inspect extreme data
list if bmi>50
summarize waist
summarize armcirc

****** plots by sex
histogram bmi, normal by(gender1m)
by gender1m, sort: summarize bmi, detail
graph box bmi, by(sex)
graph box bmi, over(sex)

**** classify by BMI categories
* each class is coded by its lower bound: 0, 18.5, 25, 30, 40
egen bmi_class = cut(bmi), at(0,18.5,25,30,40,100)
graph box bmi, over(bmi_class)
* alternative with recode (a boundary value such as 18.5 goes to the first matching rule, here 1)
//generate bmi_class2 = bmi
//recode bmi_class2 (min/18.5=1) (18.5/25=2) (25/max=3)
graph box waist, over(bmi_class)
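* Sketch (not part of the original exercise): with the icodes option,
* egen cut() returns integer codes 0, 1, 2, ... instead of the lower bounds,
* so the classes can carry value labels (value labels need integer values,
* while the lower-bound codes include 18.5). The variable bmi_cat, the label
* name BMIcat and the label texts (usual WHO cut-offs) are hypothetical choices.
egen bmi_cat = cut(bmi), at(0,18.5,25,30,40,100) icodes
label define BMIcat 0 "Underweight (<18.5)" 1 "Normal (18.5-<25)" ///
    2 "Overweight (25-<30)" 3 "Obese (30-<40)" 4 "Severely obese (40+)"
label values bmi_cat BMIcat
tabulate bmi_cat
graph box waist, over(bmi_cat)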