Exercises on regular expressions

Exercise 1

Create a variable called text1 and assign it the value "The current year is 2017".
Create a variable called my_pattern containing a pattern that matches any digit in text1.
Use the grepl function to check whether text1 contains a digit.

Use the gregexpr function to find all the positions in text1 where there is a digit.
Place the results in a variable called string_position.
Can you obtain the same result using a function from the stringr package?
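
As a reminder of how these functions behave, here is a minimal sketch on a different string and a different pattern. The names demo and vowel_pattern are hypothetical, chosen only for illustration. The stringr calls run only if that package is installed.

```r
# Hypothetical demo string (an assumption for illustration, not text1)
demo <- "Room 42 is on floor 3"
vowel_pattern <- "[aeiou]"  # character class: any lowercase vowel

# grepl() returns TRUE when the pattern matches anywhere in the string
has_vowel <- grepl(vowel_pattern, demo)

# gregexpr() returns a list with one element per input string;
# each element holds the start positions of every match
vowel_positions <- as.integer(gregexpr(vowel_pattern, demo)[[1]])
print(vowel_positions)  # 2 3 9 12 17 18

# stringr counterparts, run only if the package is installed
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_detect(demo, vowel_pattern))
  print(stringr::str_locate_all(demo, vowel_pattern)[[1]])  # start/end matrix
}
```

Note that gregexpr returns a list even for a single string, hence the [[1]].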

Create a variable called my_pattern containing a pattern that finds one digit and one uppercase alphanumeric character in text1. HINT: combine predefined classes in the regex pattern.
Use the grepl function or its stringr equivalent to verify that the pattern exists in the string.
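
The predefined (POSIX) classes mentioned in the hint are written inside a bracket expression, for example [[:upper:]]. A small sketch on a hypothetical string, using a different combination than the exercise asks for:

```r
# Hypothetical demo string (an assumption for illustration)
demo <- "Flight BA 117 departs at noon"

# An uppercase letter, then whitespace, then a digit: matches "A 1"
upper_space_digit <- grepl("[[:upper:]][[:space:]][[:digit:]]", demo)
print(upper_space_digit)  # TRUE

# No punctuation characters occur in the demo string
has_punct <- grepl("[[:punct:]]", demo)
print(has_punct)  # FALSE
```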

Use the regexpr function to find the position of the first space in text1.
Place the result in a variable called first_space, then use the grepl function or its stringr equivalent to verify that the pattern exists in the string.
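
Unlike gregexpr, regexpr reports only the first match. A short sketch on a hypothetical string shows what it returns, including the value for a pattern that does not match:

```r
# Hypothetical demo string (an assumption for illustration)
demo <- "alpha-beta-gamma"

# regexpr() returns the start of the first match, with its length as an attribute
first_dash <- regexpr("-", demo)
dash_start <- as.integer(first_dash)               # 6
dash_length <- attr(first_dash, "match.length")    # 1

# When nothing matches, the reported position is -1
no_digit <- as.integer(regexpr("[0-9]", demo))     # -1
print(c(dash_start, dash_length, no_digit))
```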

Create a pattern that checks whether text1 contains a lowercase character, followed by any character and then by a digit.

Find the starting position of the string matched by the above pattern. Place the result in a variable called string_pos2.

Find the following pattern in text1: one space followed by two lowercase letters and one more space.
Use a function that returns the starting position of the matched string and place its result in string_pos3.

Using the sub function, replace the pattern found in the previous exercise with the string " is not ".
Place the resulting string in a variable called text2.
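
Keep in mind that sub replaces only the first match, whereas gsub replaces all of them. A minimal sketch on a hypothetical string:

```r
# Hypothetical demo string (an assumption for illustration)
demo <- "red-green-blue"

first_only <- sub("-", " and ", demo)    # replaces the first "-" only
every_match <- gsub("-", " and ", demo)  # replaces every "-"

print(first_only)   # "red and green-blue"
print(every_match)  # "red and green and blue"
```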

Find the following pattern in text2: four digits at the end of the string.
Use a function that returns the starting position of the matched string and place its result in string_pos4.

Using the substr function and the position of the string found in the previous exercise, extract the first two of the digits found at the end of text2.
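
The general idea of combining an end-of-string anchor ($) with substr can be sketched as follows, on a hypothetical string and with letters instead of digits:

```r
# Hypothetical demo string (an assumption for illustration)
demo <- "id-2024-final"

# "$" anchors the match to the end of the string:
# one or more lowercase letters that run up to the end
tail_start <- as.integer(regexpr("[a-z]+$", demo))
print(tail_start)  # 9

# substr(x, start, stop) extracts characters start..stop, both inclusive
first_two <- substr(demo, tail_start, tail_start + 1)
print(first_two)  # "fi"
```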

The file "LipidsData.csv" contains the values obtained in a metabolomics study of lipid concentrations in HIV patients.
The researchers who provided the data also need some information extracted from the lipid names, and they give us the following description.
- The nomenclature is simple. We would like you to extract:
  - the number of carbon atoms (the first number, before the colon),
  - the number of double bonds (the second number, after the colon), and
  - the lipid family (the last part of the name, which is not a number).
- Examples
  - C24Cer 24 carbons; 0 double bonds; family name = "Cer"
  - C24: 1Cer (a) 24 carbons; 1 double bond; family name = "Cer"
  - C24: 2Cer 24 carbons; 2 double bonds; family name = "Cer"
Read the file into R and prepare a script that parses the names and writes another file with the desired information.
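
One possible starting point for the parsing step is sketched below. It is not a full solution. It works on a hard-coded vector of names taken from the examples above, because the column layout of LipidsData.csv is not described here. The column name used in the comments (lipid_name) and the output file are assumptions, and names with extra suffixes such as "(a)" would need additional handling.

```r
# Names taken from the examples above; in the real script they would come from
# something like read.csv("LipidsData.csv")$lipid_name (hypothetical column name)
lipid_names <- c("C24Cer", "C24:1Cer", "C24: 2Cer")

# Carbons: the digits right after the leading "C"
carbons <- as.integer(sub("^C([0-9]+).*$", "\\1", lipid_names))

# Double bonds: the digits after the colon, or 0 when there is no colon
has_colon <- grepl(":", lipid_names, fixed = TRUE)
double_bonds <- rep(0L, length(lipid_names))
double_bonds[has_colon] <- as.integer(
  sub("^[^:]*:[[:space:]]*([0-9]+).*$", "\\1", lipid_names[has_colon])
)

# Family: whatever remains after removing "C<digits>" and an optional ":<digits>"
family <- sub("^C[0-9]+(:[[:space:]]*[0-9]+)?[[:space:]]*", "", lipid_names)

parsed <- data.frame(name = lipid_names, carbons = carbons,
                     double_bonds = double_bonds, family = family,
                     stringsAsFactors = FALSE)
print(parsed)

# Written to a temporary file here; the real script would choose its own path
out_file <- tempfile(fileext = ".csv")
write.csv(parsed, out_file, row.names = FALSE)
```

Handling the colon case separately avoids coercing unmatched names to NA, at the cost of a few extra lines.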