The number of international trade agreements has increased dramatically since the early 1990s. These agreements cover ever more issues, and the average agreement text is now around ten times longer than it was 25 years ago. This growth makes it increasingly difficult to analyze the content of trade agreements and to assess their impact on international trade and welfare. Big data and text-as-data methods can help researchers, policy-makers and other stakeholders manage the growing complexity of trade agreements. The Text of Trade Agreements (ToTA) database provides a digital infrastructure for the computational analysis of international trade law.
The ToTA data contains 448 preferential trade agreement (PTA) texts notified to the WTO, and two texts for the Trans-Pacific Partnership agreement (in English and Spanish). When a PTA text is available in more than one of the official WTO languages (English, French, Spanish), we prioritise English and provide the XML in that language.
This corpus builds on the WTO Regional Trade Agreements Information System data. We gathered metadata and full texts from this source, corrected deficiencies (missing full texts or incorrect metadata), applied optical character recognition or other methods to obtain machine-readable texts, removed annexes and schedules, imposed a two-level hierarchy of treaty elements, and, finally, produced XML files.
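
As an illustration of how a two-level hierarchy can be read programmatically, the following is a minimal sketch using Python's standard library. The tag and attribute names (`treaty`, `meta`, `body`, `chapter`, `article`, `name`, `number`) and the sample text are assumptions made for this example and may not match the actual ToTA schema; check a real file from the corpus before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Tag and attribute names are assumptions
# for illustration only, not the confirmed ToTA schema.
SAMPLE = """<treaty>
  <meta><name>Example Agreement</name><language>en</language></meta>
  <body>
    <chapter name="General Provisions">
      <article number="1">Objectives of the agreement.</article>
      <article number="2">General definitions.</article>
    </chapter>
    <chapter name="Trade in Goods">
      <article number="3">National treatment.</article>
    </chapter>
  </body>
</treaty>"""


def outline(xml_text):
    """Return a list of (chapter name, [article texts]) pairs."""
    root = ET.fromstring(xml_text)
    result = []
    # First level: chapters. Second level: the articles inside each chapter.
    for chapter in root.iter("chapter"):
        articles = [(a.text or "").strip() for a in chapter.findall("article")]
        result.append((chapter.get("name"), articles))
    return result


for name, articles in outline(SAMPLE):
    print(f"{name}: {len(articles)} article(s)")
```

With a structure of this kind, each treaty reduces to a list of first-level elements that each hold their second-level texts, which is a convenient input for tokenisation or text-similarity measures.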