Loops - Data Science for Lawyers

It is often useful to apply the same operation to every element of a vector or a data frame, rather than to just one. For that we use a loop. Imagine you want to multiply each of the numbers 20 to 29 by 13. You could write out 20*13, 21*13, 22*13 … by hand, but a loop does that work for you.


# Let's first create a list of numbers from 20 to 29 (in R terms, this is a vector).

number_list <- c(20:29)

print(number_list)

 ## [1] 20 21 22 23 24 25 26 27 28 29

In this example, we want to repeat the *13 operation for all 10 elements and print each result. We thus write a for loop that reads: for each element (20, 21, 22 …) in my list, multiply that element by 13. Note that the variable “element” could have any name, e.g. “i”: for (i in number_list) {print(i*13)}.

for (element in number_list) {
  print(element*13)
}

 ## [1] 260
 ## [1] 273
 ## [1] 286
 ## [1] 299
 ## [1] 312
 ## [1] 325
 ## [1] 338
 ## [1] 351
 ## [1] 364
 ## [1] 377
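
A loop body can do more than print. As a minimal sketch, the loop below keeps a running total of the products instead of printing each one. The variable name `total` is an assumption made for this illustration and does not appear in the lesson.

```r
# Same vector as in the lesson.
number_list <- c(20:29)

# 'total' is a hypothetical name used only for this illustration.
total <- 0

for (element in number_list) {
  # Add this element's product to the running total.
  total <- total + element * 13
}

# Prints 3185, the sum of the ten products shown above.
print(total)
```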

We can also write the same loop using not the value of each element but its rank (its position, or index) in the vector. That means we perform the operation on the first element of the list, then the second, and so on. This becomes useful when we want to access the same rank in different vectors. So how many ranks does our list have?

length(number_list)

 ## [1] 10


# So we can write: for the ranks from 1 to the length of the list (here 10), perform the operation on the element at that rank.

for (rank in 1:length(number_list)) {print(number_list[rank]*13)}

 ## [1] 260
 ## [1] 273
 ## [1] 286
 ## [1] 299
 ## [1] 312
 ## [1] 325
 ## [1] 338
 ## [1] 351
 ## [1] 364
 ## [1] 377
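
Looping over ranks pays off when two vectors have to be read at the same position. The sketch below illustrates this with made-up data: `hours_worked`, `hourly_rate` and `fees` are hypothetical names and values, not part of the lesson.

```r
# Hypothetical data for illustration only: hours billed on three matters
# and the hourly rate charged on each.
hours_worked <- c(2, 5, 3)
hourly_rate <- c(300, 250, 400)

# Pre-allocate a vector to hold one fee per matter.
fees <- numeric(length(hours_worked))

# seq_along(hours_worked) gives 1, 2, 3, the same ranks as 1:length(hours_worked).
for (rank in seq_along(hours_worked)) {
  # Read both vectors at the same rank and store the product at that rank.
  fees[rank] <- hours_worked[rank] * hourly_rate[rank]
}

# The three fees are 600, 1250 and 1200.
print(fees)
```

Here seq_along() is used in place of 1:length(). For a vector with at least one element the two give the same ranks, but for an empty vector 1:length() counts 1, 0 and the loop body still runs, whereas seq_along() yields no ranks at all.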

Both approaches yield the same result but use a different syntax. You can see the difference by printing element and rank after the loops have run: each variable keeps the last value it was assigned.


# This yields the last value in number_list, i.e. 29.

print(element)

 ## [1] 29

# This yields the last position in number_list, i.e. 10.

print(rank)

 ## [1] 10

Let’s try a loop on a dataframe.


# This is just a sample dataframe with 3 rows and 4 columns:

sample_dataframe <- as.data.frame(matrix(data=1:12, nrow=3, ncol=4, byrow=FALSE))

print(sample_dataframe)

 ##   V1 V2 V3 V4
 ## 1  1  4  7 10
 ## 2  2  5  8 11
 ## 3  3  6  9 12

Now we want to know the sum of each of the three rows and each of the four columns. The loop for the rows works as follows.


# FOR the rows IN row number 1 to the total number of rows in my sample_dataframe
# PRINT the SUM of each row.

for (row in 1:nrow(sample_dataframe)) {
  print(sum(sample_dataframe[row,]))
}

 ## [1] 22
 ## [1] 26
 ## [1] 30

# To do the same thing for column sums, we just need to (1) use the number of columns (ncol) and (2) move the index to the other side of the comma. Note that I also renamed "row" to "col". This is not necessary, but it helps to understand what is being done.

for (col in 1:ncol(sample_dataframe)) {
  print(sum(sample_dataframe[,col]))
}

 ## [1] 6
 ## [1] 15
 ## [1] 24
 ## [1] 33
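
The loops above print each sum and then discard it. As a sketch of one way to keep the results, the code below stores the row sums in a vector and compares them with base R's built-in rowSums() and colSums() functions. The name `row_totals` is an assumption made for this illustration.

```r
# Same sample dataframe as in the lesson: 3 rows and 4 columns.
sample_dataframe <- as.data.frame(matrix(data = 1:12, nrow = 3, ncol = 4, byrow = FALSE))

# 'row_totals' is a hypothetical name: one slot per row of the dataframe.
row_totals <- numeric(nrow(sample_dataframe))

for (row in seq_len(nrow(sample_dataframe))) {
  # Store the sum of this row at the matching position.
  row_totals[row] <- sum(sample_dataframe[row, ])
}

# The stored row sums are 22, 26 and 30, as printed by the loop in the lesson.
print(row_totals)

# Base R's rowSums() and colSums() compute the same sums without an explicit loop.
print(rowSums(sample_dataframe))
print(colSums(sample_dataframe))
```

For simple sums the built-in functions are shorter, while the loop form remains useful when each row or column needs an operation that has no ready-made function.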

We are now all done for today. Don’t forget to complete the exercise to apply the new programming skills you have learned.

Last update May 8, 2020.