Lab 1 Manual - Introduction to R

Lab 1 of BM090IU introduces students to R and RStudio, focusing on basic calculations, object manipulation, and data import/export. Students will learn to run R codes, assign values to objects, and perform operations on data frames. The lab includes practical tasks such as calculating values, importing data, and analyzing patient data related to urinary tract infections.

Uploaded by

Tuan Nguyen

BM090IU - Statistics for Health Science

Lab 1: Introduction to R

LEARNING OUTCOMES

• Familiarize yourself with the R and RStudio interface.
• Perform calculations and basic object manipulation in R.
• Import and export data tables in R.

1. Introduction to R
The objective of this lab is to help you become familiar with R and RStudio. R is a freely
available computational language and environment that supports most modern statistical
procedures and is widely used in biology and medicine. The base version of R already
has numerous built-in functions, and statisticians regularly contribute new statistical
procedures in the form of downloadable packages (https://cran.r-project.org/). While the
language can be used on its own, in this course we will work only with RStudio, an
interface that makes R programming more convenient.

Upon opening RStudio, you should see three to four sections:

• The Script window (or Source) on the top left. This is where you write your code.
This window may be missing when you open RStudio without a script file. In this case,
press Ctrl + Shift + N to create a new Script window.
• The Console window on the bottom left. This is where your commands are run
and where you can see the printed outputs of most commands (except graphs).
You may also type your code directly here.
• The Environment window (or History) on the top right. This is where you can see
the data files and variables that you have.
• The Files/Plots/Packages/Help/Viewer window on the bottom right. This window
holds several different tabs, of which we will mostly work with the Files and
Plots tabs.

2. Running R code
As previously explained, R code is typed either directly into the console or into the
script window. To run a line or a selected block of code, press Ctrl + Enter or click the
Run button in the script window. Press Ctrl + Shift + Enter to run the entire script.

Occasionally, you may find lines starting with # in the code. These are comments: they
are not run as part of the script but explain the code to readers. The following are some
basic calculation operations in R. As in many programming languages, R syntax is
case-sensitive.

Syntax    Description                 Example
+         Addition                    11 + 34
sum()     Addition                    sum(11, 34)
-         Subtraction                 11 - 34
*         Multiplication              11 * 34
/         Division                    11 / 34
^         Power                       11^34
sqrt()    Square root                 sqrt(121)
log()     Natural logarithm           log(34)
exp()     Exponential (base e)        exp(11)
c()       Combine (create a vector)   c(1, 2, 3)
<-        Assignment                  a <- c(1, 2, 3)
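
As a quick illustration, the operations in the table can be typed into a script and run line by line. This is only a sketch using the table's own example values; the comments show how # lines are ignored by R and why the lowercase spelling of function names matters.

```r
# Lines starting with # are comments and are not executed.
11 + 34          # addition with an operator
sum(11, 34)      # addition with a function
11 - 34          # subtraction
11 * 34          # multiplication
11 / 34          # division
11^34            # power
sqrt(121)        # square root; Sqrt(121) would fail because R is case-sensitive
log(34)          # natural logarithm
exp(11)          # e raised to the power 11

a <- c(1, 2, 3)  # combine three numbers into a vector and store it as "a"
a                # typing the name of an object prints its contents
```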

TASK 1: Try calculating the following in R

1. $1.8 + 9.9 - 2.15^2$
2. $\sqrt[7]{9.5}$
3. $54.3 \times e^{0.99^{-3}}$
4. $\log_3(e - 1)$
5. $\sqrt{2 + \sqrt{1 + x}}$, where $x = 1.8^{1.4}$

3. R objects
a. Assigning values to objects
Objects in R can store a wide variety of data types, including numerical and categorical
data. Data can be combined and structured as a vector (one-dimensional) or a data frame
(two-dimensional, like a table). Assigning names to objects makes them convenient to
reuse later in the analysis.
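
A minimal sketch of this idea is shown below. The vector names id, group1 and group2 and all of their values are hypothetical and chosen only for illustration; they are not necessarily the ones used in the original manual.

```r
# Hypothetical example values, chosen only for illustration
id     <- c(1, 2, 3, 4, 5)             # a numerical vector
group1 <- c(5.1, 4.9, 6.2, 5.8, 5.5)   # measurements for a first group
group2 <- c(6.0, 5.2, 6.8, 6.1, 5.9)   # measurements for a second group

# Combine the three vectors into a table with one column per vector
data <- data.frame(id, group1, group2)

data     # prints the whole data frame
group1   # prints a single vector
```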

The function data.frame() combines several vectors into a table. Type the name of the
data frame (i.e. data) or of any of the vectors, run it, and observe what happens.

To access one column of the data frame, we can use the operator $. Try running
data$group2 to access the third column of the data frame. (I usually think of this operator
as ’s in data’s group2).
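
The $ operator can be tried on a small stand-in data frame. The values below are hypothetical; the only assumption carried over from the text is that group2 is the third column.

```r
# Hypothetical data frame with group2 as its third column
data <- data.frame(
  id     = 1:5,
  group1 = c(5.1, 4.9, 6.2, 5.8, 5.5),
  group2 = c(6.0, 5.2, 6.8, 6.1, 5.9)
)

data$group2   # access the column by name
data[[3]]     # the same column accessed by position

# Both forms return the same vector
same_column <- identical(data$group2, data[[3]])
same_column
```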

b. Object sizes and changing sizes


The size of a vector and data frame can be viewed with the following commands. (Note
that a single column of a data frame is also treated as a vector.)
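
The original commands are not reproduced in this copy of the manual. Base R functions commonly used for this purpose are length(), dim(), nrow() and ncol(), sketched here on a hypothetical data frame.

```r
# Hypothetical data frame used only to demonstrate the size functions
data <- data.frame(
  id     = 1:5,
  group1 = c(5.1, 4.9, 6.2, 5.8, 5.5),
  group2 = c(6.0, 5.2, 6.8, 6.1, 5.9)
)

length(data$group1)   # number of elements in one column (a vector)
dim(data)             # number of rows, then number of columns
nrow(data)            # number of rows only
ncol(data)            # number of columns only
```

Note that length() applied to a whole data frame returns its number of columns, not its number of rows.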

When handling data, you will occasionally need to calculate new variables. To add a new
column to an existing data frame, assign values to a new column name with the operator $,
for example to add group3 and group4. If you simply want to insert an empty column, NA
can be assigned to the new column instead. R treats NA as a placeholder for missing values.
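
A short sketch of both cases follows. How group3 was calculated in the original manual is not shown, so the sum of the two existing columns is used here purely as an assumed example.

```r
# Hypothetical starting data frame
data <- data.frame(
  id     = 1:5,
  group1 = c(5.1, 4.9, 6.2, 5.8, 5.5),
  group2 = c(6.0, 5.2, 6.8, 6.1, 5.9)
)

# New column calculated from existing columns (assumed example calculation)
data$group3 <- data$group1 + data$group2

# New empty column: every row holds the missing-value placeholder NA
data$group4 <- NA

data
```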

TASK 2:

a. Store the first seven prime numbers in an object named primin.
b. Store the seven largest two-digit prime numbers in an object named primax.
c. Multiply primin by primax. See what happens, then create a data frame
containing both the original numbers and the products.
d. Add comments to help other people understand your code (and to remind yourself
in case you forget).

TASK 3: Your fellow researcher has trouble running his code. Help him identify and
correct the errors.

a. Height <- (170, 171, 182, 175, 165, 149, 167.1)


b. pH <- c(7, 8, 10, 7, 4, 8. 9, 6, 6)
c. HospitalVisit <- c[1, 0, 0, 2, 0, 1, 0, 1, 1]
d. Log(Height)
e. sum(Height)/sum(ph)

4. Import data
R can import data from files with a variety of extensions. CSV and TSV files can be
read with base functions such as read.csv() and read.delim().

By default, the first row of the data is understood as the column names. If you want
the first row to be treated as data instead, include header = FALSE inside the
parentheses. Note that while Windows paths are commonly written with backslashes “\”,
R treats a single backslash as an escape character, so write paths with forward
slashes “/” (or doubled backslashes “\\”). You can also import data directly from Excel
files (.xls and .xlsx). This requires loading the package readxl, which is done with the
function library().
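
The sketch below illustrates these points under stated assumptions: the small files are created in a temporary folder only so that the example runs on its own, and the column names and values are hypothetical. In the lab you would pass the path of your own file instead.

```r
# Create a small CSV file in a temporary folder (hypothetical content)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,height,weight", "1,170,65", "2,182,80"), csv_path)

# Default: the first row becomes the column names
with_header <- read.csv(csv_path)

# header = FALSE: the first row is kept as data, columns are named V1, V2, ...
no_header <- read.csv(csv_path, header = FALSE)

# Tab-separated file read with read.delim()
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id\theight", "1\t170", "2\t182"), tsv_path)
tsv_data <- read.delim(tsv_path)

# Excel files need the readxl package (hypothetical path, not run here):
# library(readxl)
# excel_data <- read_excel("C:/Users/you/Documents/file.xlsx")
```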

A second method is to set up the working directory. In R, the working directory is the
folder on your computer that R uses by default to read and write files. Try running the
function getwd() to get the location of your current working directory.

Data files within the working directory can be browsed and imported via the Files tab
in the bottom-right window. To set a different folder as the working directory, run the
function setwd() with the path to your desired folder inside the parentheses.
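
As a sketch of how the working directory affects file access, the example below switches to R's temporary folder (an assumption made only so the code runs anywhere), reads a file by name alone, and then restores the original folder. The file name example.csv is hypothetical.

```r
old_wd <- getwd()    # remember the current working directory
old_wd

setwd(tempdir())     # use the session's temporary folder as the working directory

# Files in the working directory can be referred to by name, without a full path
writeLines(c("id,value", "1,10"), "example.csv")
example <- read.csv("example.csv")
example

file.remove("example.csv")   # clean up the hypothetical file
setwd(old_wd)                # switch back to the original folder
```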

TASK 4: Now try importing the file Lab1_Participants.csv from Blackboard. This contains
various measurements of participants in a clinical study.

1. What are the dimensions of the imported data?
2. What type of variable is each column of the data frame?
3. Extract only the SP (systolic blood pressure) measurements.
4. Calculate the Body Mass Index (BMI) of each participant and add the results as a
new column to the data frame.
$$BMI = \frac{Mass\ (kg)}{Height\ (m)^2}$$

5. Create a new column named “Group” and run the following command

Print the patient data frame to see the result. The first command labels each patient as
either “Treatment” or “Control” with equal probability. What is the proper name of this
process, and how is it relevant to the type of research we are doing?
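
The original command is not reproduced in this copy of the manual. One common way to label rows at random with equal probability is the base function sample(), sketched below; the data frame patient and its ID column are hypothetical stand-ins for the imported data.

```r
set.seed(1)   # optional: makes the random labels reproducible

# Hypothetical stand-in for the imported participant data
patient <- data.frame(ID = 1:6)

# Draw one label per row; replace = TRUE gives each row an independent 50/50 draw
patient$Group <- sample(c("Treatment", "Control"),
                        size = nrow(patient),
                        replace = TRUE)

patient
```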

6. Export data
After finishing your analysis, new data frames can be exported using the following
commands:
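
The original commands are not reproduced in this copy of the manual. A minimal sketch with the base functions write.csv() and write.table() is given below; the data frame and the output file names are hypothetical, and the files are written to a temporary folder only so that the example runs on its own.

```r
# Hypothetical data frame to export
patient <- data.frame(ID = 1:3,
                      Group = c("Treatment", "Control", "Control"))

# Comma-separated file; row.names = FALSE leaves out the row numbers
csv_out <- file.path(tempdir(), "patient.csv")
write.csv(patient, csv_out, row.names = FALSE)

# Tab-separated file; sep sets the separator, quote = FALSE drops the quotation marks
tsv_out <- file.path(tempdir(), "patient.tsv")
write.table(patient, tsv_out, sep = "\t", row.names = FALSE, quote = FALSE)
```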

TASK 5:

1. Briefly explain the difference between TSV, CSV, and TXT files.
2. Export the complete patient data as a CSV file.
3. Export the Control group’s data using a semicolon as a separator (which is
common in Europe). How would you open this file?

7. Exercise
TASK 6: Urinary tract infection (UTI) is monitored by the quantity of bacterial cells in
midstream urine. In microbiology, bacterial abundance is expressed as colony-forming
units (CFU), counted after culturing urine samples on agar. The data in this exercise
contain patient ID, gender, and CFU before and after treatment with a new antibiotic
therapy.

1. Download and import the file Lab1_UTI as a data frame. What are the dimensions?
And what does each column represent?

2. Use the summary() function to inspect important quantiles of the CFU before and
after treatment. What do you think about the distribution of the data? What types
of graphs would you use to verify your suspicion?
3. Perform a log10 transformation of the CFU values before and after treatment and
add the results as two new columns.
4. Use the summary() function again to inspect the data. What do you notice? What
can you conclude about the use of log transformation?
5. Save your new patient data as a tab-separated file and show it to the instructor.
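
The functions named in this task can be tried first on made-up numbers. The CFU values below are hypothetical and are not taken from Lab1_UTI; the sketch only shows how summary() and log10() are called.

```r
# Hypothetical CFU counts spanning several orders of magnitude
cfu <- c(1e3, 5e3, 2e4, 1e5, 3e6, 1e8)

summary(cfu)            # minimum, quartiles, mean and maximum on the original scale

log_cfu <- log10(cfu)   # base-10 logarithm of every value
summary(log_cfu)        # the same summary on the log10 scale
```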

_______________________________________________________________

This is the end of Lab 1! Have a great week!
