Exercise 1
- Create a variable called text1 and populate it with the value “The current year is 2017”.
- Create a variable called my_pattern and implement the required pattern for finding any digit in the variable text1.
- Use the function grepl to verify whether there is a digit in the string variable.
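One possible way to approach this exercise is sketched below. The character class `[0-9]` is only one of several equivalent choices (`[[:digit:]]` would also work), and the name `has_digit` is introduced here purely for illustration.

```r
# Sample string from the exercise
text1 <- "The current year is 2017"

# Pattern matching any single digit
my_pattern <- "[0-9]"

# grepl returns TRUE when the pattern occurs at least once in the string
has_digit <- grepl(my_pattern, text1)
print(has_digit)
```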
Exercise 2
- Use the function gregexpr to find all the positions in text1 where there is a digit.
- Place the results in a variable called string_position.
- Can you obtain the same result using a function from the stringr package?
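A minimal sketch of both approaches follows. It assumes that `str_locate_all` is the stringr function the exercise has in mind, which the text does not confirm; the stringr part runs only if the package is installed. The helper name `digit_starts` is illustrative.

```r
text1 <- "The current year is 2017"

# gregexpr returns a list with one element per input string; each element
# holds the start positions of all matches, plus match lengths as attributes
string_position <- gregexpr("[0-9]", text1)

# Drop the attributes to keep only the start positions
digit_starts <- as.integer(string_position[[1]])
print(digit_starts)

# Assumed stringr counterpart: str_locate_all returns a list of matrices
# with "start" and "end" columns
if (requireNamespace("stringr", quietly = TRUE)) {
  loc <- stringr::str_locate_all(text1, "[0-9]")[[1]]
  print(loc[, "start"])
}
```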
Exercise 3
- Create a variable called my_pattern and implement the required pattern for finding one digit and one uppercase alphanumeric character in the variable text1. HINT: combine predefined classes in the regex pattern.
- Use the function grepl or its stringr equivalent to verify whether the searched pattern exists in the string.
Exercise 4
- Use the function regexpr to find the position of the first space in text1.
- Place the results in a variable called first_space and use the function grepl or its stringr equivalent to verify whether the searched pattern exists in the string.
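As a hedged illustration, the first space can be located with a literal space as the pattern. The name `space_found` is hypothetical and used only in this sketch.

```r
text1 <- "The current year is 2017"

# regexpr reports only the first match: its start position, with the
# match length stored in the "match.length" attribute
first_space <- regexpr(" ", text1)
print(as.integer(first_space))

# grepl confirms that the pattern occurs somewhere in the string
space_found <- grepl(" ", text1)
print(space_found)
```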
Exercise 5
- Create a pattern that checks whether text1 contains a lowercase character, followed by any character and then by a digit.
Exercise 6
- Find the starting position of the above string. Place the result in a variable called string_pos2.
Exercise 7
- Find the following pattern: one space followed by two lowercase letters and one more space.
- Use a function that returns the starting point of the found string and place its result in string_pos3.
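The patterns asked for in Exercises 5 to 7 could be written as in the sketch below. The variable names `pattern5`, `pattern7` and `found5` are assumptions made for this illustration, and `regexpr` is one of several functions that return a starting position.

```r
text1 <- "The current year is 2017"

# Exercise 5: a lowercase letter, then any character, then a digit
pattern5 <- "[a-z].[0-9]"
found5 <- grepl(pattern5, text1)

# Exercise 6: starting position of that match
string_pos2 <- regexpr(pattern5, text1)

# Exercise 7: a space, two lowercase letters, and another space
pattern7 <- " [a-z]{2} "
string_pos3 <- regexpr(pattern7, text1)

print(as.integer(string_pos2))
print(as.integer(string_pos3))
```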
Exercise 8
- Using the sub function, replace the pattern found in the previous exercise with the string “ is not ” (including the leading and trailing spaces).
- Place the resulting string in the variable text2.
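A short sketch of the replacement, assuming the pattern from Exercise 7 is written as a space, two lowercase letters and a space:

```r
text1 <- "The current year is 2017"

# sub replaces only the first match of the pattern
text2 <- sub(" [a-z]{2} ", " is not ", text1)
print(text2)
```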
Exercise 9
- Find in text2 the following pattern: four digits at the end of the string.
- Use a function that returns the starting point of the found string and place its result in string_pos4.
Exercise 10
- Using the substr function, and according to the position of the string found in the previous exercise, extract the first two digits found at the end of text2.
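Exercises 9 and 10 can be combined in one sketch. It assumes text2 holds the result of Exercise 8, and the names `start` and `first_two` are hypothetical.

```r
# Assumed result of Exercise 8
text2 <- "The current year is not 2017"

# "$" anchors the match to the end of the string
string_pos4 <- regexpr("[0-9]{4}$", text2)

# substr takes the first and last character positions to extract
start <- as.integer(string_pos4)
first_two <- substr(text2, start, start + 1)
print(first_two)
```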
Exercise 11
- The file “LipidsData.csv” contains the values obtained in a metabolomics study of lipid concentrations in HIV patients.
- The researchers who provided us with the data for the analysis also need us to extract some information from the lipid names.
- The nomenclature is simple. We want you to extract:
  - the number of carbon atoms (the first number, before the colon),
  - the number of double bonds (the second number, after the colon), and
  - the lipid family (the last part of the name that is not a number).
- Examples:
  - “C24Cer”: 24 carbons; 0 double bonds; family name = “Cer”
  - “C24: 1Cer (a)”: 24 carbons; 1 double bond; family name = “Cer”
  - “C24: 2Cer”: 24 carbons; 2 double bonds; family name = “Cer”
- Read the file into R and prepare a script that parses the names and writes another file with the desired information.
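The parsing step might look like the sketch below. The layout of “LipidsData.csv” is not described here, so the code works on the three example names instead of the real file. The function name `parse_lipid_names`, the output column names and the temporary output file are all assumptions made for illustration.

```r
# Hypothetical helper: splits lipid names into carbons, double bonds, family
parse_lipid_names <- function(lipid) {
  # Carbon atoms: the number right after the leading "C"
  carbons <- as.integer(sub("^C([0-9]+).*$", "\\1", lipid))

  # Double bonds: the number after the colon, or 0 when there is no colon
  double_bonds <- integer(length(lipid))
  has_db <- grepl(": *[0-9]+", lipid)
  double_bonds[has_db] <-
    as.integer(sub("^[^:]*: *([0-9]+).*$", "\\1", lipid[has_db]))

  # Family: the letters that follow the numeric part of the name
  family <- sub("^C[0-9]+(: *[0-9]+)?([A-Za-z]+).*$", "\\2", lipid)

  data.frame(lipid, carbons, double_bonds, family, stringsAsFactors = FALSE)
}

# Example names taken from the exercise text. With the real data, the names
# would come from read.csv("LipidsData.csv"), in whichever column holds them.
lipid_names <- c("C24Cer", "C24: 1Cer (a)", "C24: 2Cer")
parsed <- parse_lipid_names(lipid_names)
print(parsed)

# Write the parsed table to a file (a temporary file in this sketch)
out_file <- tempfile(fileext = ".csv")
write.csv(parsed, out_file, row.names = FALSE)
```

Names that do not follow the pattern shown in the examples would need extra handling, for instance a check that flags rows where the family could not be extracted.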