Constitutions as Data (III): Using Regular Expressions to Explore Gender Protections

Civil Society and an Independent Judiciary Protect Against Discrimination

Over the past 125 years, states have steadily updated their laws to protect against discrimination. This implies that, historically, legal systems were rife with inequities that would not be tolerated by today’s standards. The list of potential grievances is long, but discrimination on the basis of sex and gender would be high on any list.

In 1893, New Zealand became the first country in the world to extend the right to vote to all women, through its Electoral Act. This came after decades of attempts to pass such a bill through Parliament. At the time, some regions already let women vote, but this island nation in the South Pacific was the first country to extend the right to every woman. New Zealand was followed by Australia in 1902, Finland in 1906, Norway in 1913, and the USA in 1920.

If we fast-forward more than a century, 152 of 198 countries in the world now use the term “sex” in their constitution and 79 use the term “gender”. What a distance we’ve come: today, only 20 constitutions lack this language altogether.

This blog post uses text-as-data analysis to uncover patterns in gender-related protections around the world. Along the way, we share the code we used for the analysis, present the insights it uncovered, and discuss the results.

Frequency Counts

This post takes the previous post a step further. In the last post, we used the str_count() function to determine how many times firearms were mentioned in each constitution. In this post, we move beyond counts and use regular expressions to extract the context in which words appear.

We start by extracting an additional piece of metadata, the year of last revision, because discrimination provisions may have been added to constitutions after their original adoption.

# We look in the metadata for the years a constitution was adopted or revised.
# First, we extract every four-digit year from each doc_id; regmatches() returns
# a list with one character vector of years per constitution.
pattern_matching <- gregexpr("\\d{4}", constitution_texts$doc_id)
years <- regmatches(constitution_texts$doc_id, pattern_matching)

# Then we take the latest year for each constitution, which indicates the date
# of last revision (converting to numbers before taking the maximum).
constitution_texts$lastRevision <- sapply(years, function(x) max(as.numeric(x)))
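To illustrate what the year-extraction step does for each document, here is a minimal sketch on three hypothetical doc_id values. The file-name pattern (Country_Year_revYear.txt) is an assumption and may not match the corpus’s actual naming scheme; treating the earliest year as the year of adoption is likewise an assumption.

```r
# Hypothetical doc_id values; the real corpus may name its files differently
doc_ids <- c("Zimbabwe_2013_rev2017.txt", "Kenya_2010.txt", "Malta_1964_rev2016.txt")

# gregexpr() finds every four-digit run in each string; regmatches() returns
# a list with one character vector of years per document
years <- regmatches(doc_ids, gregexpr("\\d{4}", doc_ids))

# The latest year is taken as the last revision. Using the earliest year as the
# year of adoption is an assumption about the naming scheme.
last_revision <- sapply(years, function(x) max(as.numeric(x)))
adoption_year <- sapply(years, function(x) min(as.numeric(x)))

print(data.frame(doc_id = doc_ids, last_revision, adoption_year))
```

If a doc_id contained no four-digit year, max() would return -Inf with a warning, so a real corpus may need an explicit check for empty matches.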

Next, we use the str_count() function to determine how many times each constitution uses the words “gender” and “sex”.

library(stringr)

# Count how many times the words "gender" and "sex" appear in each constitution
# and assign the values to new columns in the data frame.
# These are case-sensitive substring counts, so "sex" also matches "sexual".
constitution_texts$gender <- str_count(constitution_texts$text, "gender")
constitution_texts$sex <- str_count(constitution_texts$text, "sex")
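If the counts use the plain patterns “sex” and “gender”, they are case-sensitive substring matches: “sex” also matches “sexual”, while “Sex” at the start of a sentence is missed. The sketch below, on toy sentences that are not real constitutional text, contrasts that with a whole-word, case-insensitive count using stringr’s regex() modifier, and counts how many texts mention neither term.

```r
library(stringr)

# Toy sentences standing in for constitution texts (not real constitutional text)
texts <- c("No person shall be discriminated against on the basis of sex or gender.",
           "Sexual orientation and sexual harassment are addressed by statute.",
           "Sex shall not be a ground for discrimination.",
           "Every citizen has the right to vote.")

# Plain, case-sensitive substring count: matches "sexual" but not "Sex"
plain_sex <- str_count(texts, "sex")

# Whole-word, case-insensitive count: \b marks a word boundary, so "sexual"
# is skipped while "Sex" at the start of a sentence is counted
word_sex <- str_count(texts, regex("\\bsex\\b", ignore_case = TRUE))
word_gender <- str_count(texts, regex("\\bgender\\b", ignore_case = TRUE))

print(data.frame(plain_sex, word_sex, word_gender))

# Number of texts that mention neither term as a whole word
n_neither <- sum(word_sex == 0 & word_gender == 0)
```

Which count is preferable depends on the research question: whole-word matching would exclude mentions such as “sexual orientation”, which the New Zealand discussion below treats as relevant.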

Next, we will create a smaller data frame, which makes the data easier to manage and plot. To create it, we select the columns we want from the original data frame, glue them together with the cbind() function, and turn the result into a data frame. Then, we use the colnames() function to name each column.

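# Create a smaller, more manageable data frame for presenting and plotting the data.
# "country" and "adoptionYear" are assumed column names; they are not created above.
# cbind() coerces every column to character, so convert counts with as.numeric() before sorting.
GenderperCountry <- as.data.frame(cbind(constitution_texts$sex,
                                        constitution_texts$gender,
                                        constitution_texts$country,
                                        constitution_texts$lastRevision,
                                        constitution_texts$adoptionYear))
colnames(GenderperCountry) <- c("Sex", "Gender", "Country",
                                "Last Revision", "Year of adoption")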

After some light editing in Excel, the code produces the following table, which lists the five constitutions that mention “gender” most often. Note that the year a constitution was adopted is not necessarily the year its gender provisions were added. Even so, a few patterns stand out: all of these constitutions were adopted relatively recently, and all five countries are in either Africa or South America.

Sex (# of hits)   Gender (# of hits)   Country    Last Revision   Year of adoption
3                 33                   Zimbabwe   2017            2013
3                 19                   Guyana     2016            1980
3                 16                   Kenya      2010            2010
18                15                   Ecuador    2015            2008
4                 15                   Zambia     2016            1991
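The ranking in this table was finished with some editing in Excel; the sketch below shows one way the same ordering could be done in R instead. The rows are typed in by hand from the tables in this post (as character strings, mimicking what cbind() produces), and breaking ties on “gender” by the number of “sex” hits is our assumption about how Ecuador was placed ahead of Zambia.

```r
# Rows copied by hand from the tables in this post, for illustration only
GenderperCountry <- data.frame(
  Sex     = c("3", "3", "3", "18", "4", "48", "17"),
  Gender  = c("33", "19", "16", "15", "15", "1", "0"),
  Country = c("Zimbabwe", "Guyana", "Kenya", "Ecuador", "Zambia",
              "New Zealand", "Cambodia"),
  stringsAsFactors = FALSE
)

# cbind() turns every column into character, so convert the counts back to
# numbers first; sorted as text, "4" would come before "33" in descending order
GenderperCountry$Sex <- as.numeric(GenderperCountry$Sex)
GenderperCountry$Gender <- as.numeric(GenderperCountry$Gender)

# Top five by "gender" hits, ties broken by "sex" hits (an assumed rule)
top_gender <- head(GenderperCountry[order(-GenderperCountry$Gender,
                                          -GenderperCountry$Sex), ], 5)

# Top three by "sex" hits
top_sex <- head(GenderperCountry[order(-GenderperCountry$Sex), ], 3)

print(top_gender)
print(top_sex)
```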

We can also look at the countries that use the word “sex” most frequently, shown in the following table. Here, the countries are more diverse, and their years of adoption vary more widely, ranging from New Zealand in 1852 to Bolivia in 2009. Yet all of these countries have revised their constitutions relatively recently.

We can also see that some countries use both terms: Ecuador, Sweden, and Bolivia each mention “sex” and “gender” a similar number of times.

Rank   Sex (# of hits)   Gender (# of hits)   Country       Last Revision   Year of adoption
1      48                1                    New Zealand   2014            1852
2      18                15                   Ecuador       2015            2008
3      17                0                    Cambodia      2008            1993
4      10                3                    Malta         2016            1964
5      8                 8                    Bolivia       2009            2009
-      7                 6                    Sweden        2012            1974

Sweden (last row) is not in the top five; it is included because it is used in the discussion.
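The observation that Ecuador, Sweden, and Bolivia mention both terms a similar number of times can also be checked programmatically. The sketch below uses the counts from the table above; the min/max ratio and the 0.75 cut-off are arbitrary choices for illustration, not part of the original analysis.

```r
# Counts copied from the table above
counts <- data.frame(
  Country = c("New Zealand", "Ecuador", "Cambodia", "Malta", "Bolivia", "Sweden"),
  Sex     = c(48, 18, 17, 10, 8, 7),
  Gender  = c(1, 15, 0, 3, 8, 6),
  stringsAsFactors = FALSE
)

# Ratio of the smaller count to the larger one: 1 means perfectly balanced,
# 0 means one of the terms never appears
counts$balance <- pmin(counts$Sex, counts$Gender) / pmax(counts$Sex, counts$Gender)

# Keep constitutions that use both terms at similar rates (hypothetical cut-off)
balanced <- subset(counts, balance >= 0.75)
print(balanced)
```

A ratio is used rather than a raw difference so that 18 versus 15 and 8 versus 8 are judged on the same scale.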

Adding Context through Regular Expressions

We want to go a step further and determine the context in which these words are used. To do so, we will use regular expressions, which were introduced in Lesson 3. Regular expressions are similar to keyword searches, but they match patterns rather than only specific words. In the code below, we search every constitution in the world for the word “gender”, extract the word that follows it to see in what context “gender” is used, and finally use the unique() function to eliminate duplicates.

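# gregexpr() is vectorized, so no loop is needed: it finds "gender" followed
# by whitespace and a word in every constitution at once
genderwords <- gregexpr("gender\\s[a-zA-Z]+", constitution_texts$text)
genderwordz <- regmatches(constitution_texts$text, genderwords)
# Flatten the per-constitution matches into one vector and drop duplicates
genderwordz <- unique(unlist(genderwordz))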

The output provides many interesting results that give context to the word “gender”. For instance, some provisions ensure that men and women are treated equally; the related terms found in constitutions include “gender equity” and “gender equality”. Other provisions require government institutions to include certain proportions of men and women, using terms such as “gender composition” and “gender representation”. Finally, some terminology promotes the fair treatment of all individuals, such as “gender identity”, “gender affiliation”, and prohibitions on “gender hate”. Many other results, such as “gender and” and “gender or”, need more context to interpret.
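The step above keeps only the distinct phrases via unique(). A small extension, sketched below on toy sentences that are not real constitutional text, also counts how often each phrase occurs, which can help decide which contexts are worth reading first.

```r
# Toy sentences standing in for constitution texts
texts <- c("Parliament shall promote gender equality and gender equity.",
           "The commission shall ensure gender equality in appointments.",
           "No law may discriminate on grounds of gender or age.")

# "gender" followed by whitespace and the next run of letters
matches <- regmatches(texts, gregexpr("gender\\s[a-zA-Z]+", texts))

# unique() gives the distinct phrases, in order of first appearance
distinct_contexts <- unique(unlist(matches))

# table() additionally counts each phrase; sort so the most common come first
context_counts <- sort(table(unlist(matches)), decreasing = TRUE)
print(context_counts)
```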

We also perform the same context-analysis above using the word “sex”.

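# As above, gregexpr() is vectorized: it finds "sex" followed by whitespace
# and a word in every constitution at once
sexwords <- gregexpr("sex\\s[a-zA-Z]+", constitution_texts$text)
sexwordz <- regmatches(constitution_texts$text, sexwords)
# Flatten the per-constitution matches into one vector and drop duplicates
sexwordz <- unique(unlist(sexwordz))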

Here, some provisions prohibit “sex trafficking” and the “sex trade”, and some constitutions use the term “sex offender”. Results like these help flag articles that researchers could study in more depth. However, many matches for the word “sex” need more than one following word to provide adequate context: results like “sex shall” and “sex or” offer little help to anyone hoping to analyze the text substantively. Instead, it may be worthwhile to extract more words (“sex\\s\\w+\\s\\w+”) or the word preceding “sex” (“\\w+\\ssex”), which only requires adjusting the regular expression.
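Adjusting the pattern is a one-line change. The sketch below applies both alternatives to two toy sentences that are not real constitutional text. Note that “\\w” also matches digits and underscores, and that neither pattern tolerates punctuation between the words, so a phrase such as “race, sex” would not match “\\w+\\ssex”.

```r
# Toy sentences standing in for constitution texts
texts <- c("Parliament may not enact laws that discriminate by sex or gender.",
           "Equal pay regardless of sex is guaranteed.")

# "sex" plus the next two words
after_two <- unlist(regmatches(texts, gregexpr("sex\\s\\w+\\s\\w+", texts)))

# The word immediately before "sex"
before_one <- unlist(regmatches(texts, gregexpr("\\w+\\ssex", texts)))

print(after_two)
print(before_one)
```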

Limitations: Counts Are Often Not Enough to Draw Solid Conclusions

It is important to note that quantitative findings usually need to be supplemented with qualitative research before they can be interpreted. Above, for instance, the number of mentions of gender and sex terminology suggests that New Zealand and Zimbabwe offer the best sex- and gender-based protections, respectively. However, when we compared these data with a few easily accessible sources, it became clear that the counts alone would have led us to misleading conclusions. In the discussion, we therefore situate these results in a broader context.

Discussion

Using the above data as a starting point, we now look more closely at the constitutions of New Zealand and Zimbabwe to assess the impact of their gender- and sex-based protections. This will help us determine to what extent a constitution promotes gender equality.

New Zealand

New Zealand was not only the first country to let all women vote, but it is also the country that mentions “sex” most often in its constitutional documents. There are several reasons for these frequent mentions. First, New Zealand does not have a single constitutional document, but rather a collection of documents and norms that together form its constitution. This means that New Zealand’s constitution includes a large number of documents, many of which were written recently by constitutional standards.

Second, amongst these documents is New Zealand’s Human Rights Act, which is where most mentions of the word “sex” appear. These include protections against discrimination based on sex and sexual orientation, as well as prohibitions on sexual harassment. The Act also sets limits on these protections, allowing distinctions based on sex in certain circumstances. This level of detail in spelling out what is and is not protected explains why the word “sex” is used so often.

It is reasonable to assume that these protections help promote fairness between men and women in New Zealand. According to the Rule of Law Index, the country has the 7th best legal system in the world and ranks highest in the East Asia and Pacific region. Further, according to a 2020 report by the World Economic Forum, New Zealand is the 6th most gender-equal country in the world.

As such, we may reasonably conclude that New Zealand’s constitution protects sex- and gender-based rights effectively.

Zimbabwe

Zimbabwe is the country that uses the word “gender” most often in its constitution. At first glance, one might conclude that women’s rights are therefore well protected in the country. However, it is worth taking a second look.

In 2013, Zimbabwe adopted a new constitution through a process riddled with political controversy. The new text protects fundamental human rights, including gender equality. Its provisions require the government to ensure that both genders are equally represented in all spheres of society and at every level of government. Further, the text compels the government to take positive action to rectify any discrimination and to set up the Zimbabwe Gender Commission, which evaluates complaints about gender composition and gender discrimination. While the constitution provides strong protections, there is reason to believe they are not well implemented.

According to the Rule of Law Index, Zimbabwe ranks 119th out of 128 countries, and 28th of 31 in Sub-Saharan Africa, when it comes to applying the law fairly. Thus, the constitution likely has a smaller impact than it would in a country with a stronger rule of law. That said, Zimbabwe ranks 47th out of 153 countries for gender equality in the 2020 World Economic Forum report, which is strong relative to its rule-of-law ranking. The constitution may therefore serve as an aspirational standard that Zimbabwean society wants to live up to.

Conclusion

Counts alone are often not enough to draw well-reasoned conclusions. Words often need to be put in context to be given real meaning. This includes understanding what a word means within the document itself, to a country’s civil society, and in the international context.

Here, we see that effective constitutional protections require more than simply adding provisions to a government document. They require a strong civil society to contest violations of the constitution and an independent judiciary that is willing to inconvenience the government to protect certain rights. Countries will need more than words to combat gender- and sex-based discrimination.

Rankings we used: the Rule of Law Index and the World Economic Forum’s Global Gender Gap Report 2020.

Last updated July 6, 2020.