The Legal Māori Corpus

Search the corpus—a digitised collection of thousands of pages of legal and law-related texts in the Māori language that span 1829 to 2009.

About the corpus

The Legal Māori Corpus is a collection of Māori language law-related documents. At approximately 8 million words, it is the largest known structured corpus of the Māori language.

Start searching

To do a simple search, enter a Māori word (or two-word phrase) into the search bar above. If it appears anywhere in the corpus, you'll get a results page.

You can do an advanced search using filters in the corpus browser.

Viewing your search results

If your word or phrase appears in the corpus you will see a list of sentences (called a ‘concordance’). Your search word or phrase will be highlighted in the middle.
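
For readers curious about how a concordance is put together, the idea can be sketched in a few lines of Python. This is only an illustration, not the code behind the corpus browser: the function name `concordance`, the `width` parameter and the sample sentence are all assumptions made for the example, and the sample text is invented rather than taken from the corpus.

```python
import re

def concordance(text, term, width=30):
    """Return one keyword-in-context line per occurrence of `term`.

    Hypothetical helper for illustration; not the corpus site's own code.
    """
    lines = []
    # Match the term as a whole word; re.escape guards against special characters.
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both sides so the search term lines up in the middle column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text, used only to show the layout.
sample = "Ka whakatau te kooti i te take. Me whakatau ki te ture."
for line in concordance(sample, "whakatau", width=12):
    print(line)
```

Each printed line shows the search term in brackets with a fixed amount of context on either side, which is the same layout the results page uses.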

Some words are really common and might have tens of thousands of examples. Others may only occur once.

If you have lots of results, use the buttons at the bottom of the list to choose how many you see at once: 100, 500 or 2000. There is a limit of 2000 at a time to stop the site from being overloaded.

Sorting your results

The concordance (results list) is chronological, so you can see the oldest example first and the most recent example last.

You can also choose to view the concordance alphabetically, with the ‘right collocate’ button. The list will be sorted based on the first letter of the word to the immediate right of your search term.

The alphabetical sort lets you see the most commonly used grammatical structures associated with your search term. For example, the word ‘whakatau’ has Western legal meanings of ‘determine, adjudicate and arbitrate’. As a verb, ‘whakatau’ can take ‘i’ or ‘ki’ to its immediate right. By doing an alphabetical sort you can see which of the two follows ‘whakatau’ more often.
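
The comparison described above can also be sketched in Python for anyone working with their own copy of some results. Everything here is an assumption made for illustration: the `rows` records and the `right_collocate` helper are hypothetical, and the years and contexts are invented, not corpus data. The sketch also sorts on the whole right-hand word, which may differ from how the site itself orders results.

```python
from collections import Counter

# Hypothetical records: (year, left context, search term, right context).
rows = [
    (1865, "ka", "whakatau", "i te take"),
    (1894, "me", "whakatau", "ki te ture"),
    (1909, "kua", "whakatau", "i nga tono"),
]

def right_collocate(row):
    """Return the first word to the immediate right of the search term."""
    words = row[3].split()
    return words[0].lower() if words else ""

# Chronological view: oldest example first.
by_date = sorted(rows, key=lambda r: r[0])

# Alphabetical view: sorted() is stable, so examples that share a
# right collocate stay in date order.
by_right = sorted(by_date, key=right_collocate)

# Tally how often each word follows the search term.
counts = Counter(right_collocate(r) for r in rows)
print(counts.most_common())
```

With a sort like this, all the ‘i’ examples sit together ahead of the ‘ki’ examples, and the tally shows at a glance which one is more frequent in the results loaded.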

Use the ‘date’ button to switch back to chronological view.

A sort applies only to the results currently on your screen: the first 100, or however many you have chosen to bring up. Every time you add new results, they will be sorted by date. To sort the enlarged list alphabetically, select ‘date’ and then ‘right collocate’ again, so that the new material is sorted properly.

Finding the source information

On the left side of each example of your search term are three columns. The first is the number of the example, based on chronological order.

The second is a linked reference code. Select the link to see information about the source of the example, including further details about that source, the digital text of the source document and (where available) the original scans of that document.

You can find out more about the abbreviations used in reference codes.

The third column is the year the example was used.

Changes in Māori vocabulary and sentence structures

One of the main reasons for searching text collections like this one is to see how Māori vocabulary and sentence structures have changed since the earliest texts in the corpus were written.

For example, if you enter a search term such as ‘kooti’ (court) or ‘whakatau’ (determine, arbitrate, adjudicate) you can see how these words have been used since they first appeared in the earliest texts of the corpus. Language and vocabulary shift over time. This resource allows you to see how that change happens.
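
One simple way to look at change over time is to take the year shown beside each example and count the examples per decade. The short Python sketch below shows the idea. The list of years is invented for illustration, and the `per_decade` helper is a hypothetical name, not part of the corpus site.

```python
from collections import Counter

# Invented years, standing in for the year column of a results list.
years = [1865, 1868, 1894, 1909, 1909, 1987]

def per_decade(years):
    """Count examples by the decade in which they appear."""
    return Counter((year // 10) * 10 for year in years)

decades = per_decade(years)
first_seen = min(years)  # year of the earliest example in the list

print("First example:", first_seen)
for decade in sorted(decades):
    print(f"{decade}s: {decades[decade]}")
```

Counts like these only describe the results you have loaded, so a small results page may give a different picture from the full concordance.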

Exploring digital text documents

Some people find the digital text of the documents in the corpus dense and hard to read, with many strange codes or symbols.

That's because some texts still contain coding from when they were digitised. Eventually all documents will have this extra material deleted, making them easier to read.

If you would like to see another version of these documents, most (but not all) are also available in a more reader-friendly format through the Legal Māori Archive hosted by the New Zealand Electronic Text Collection.

Errors in the Legal Māori Corpus

The corpus is the largest structured digital collection of written Māori, containing around 8 million words. Among them are some errors and variant forms of words. The causes of these errors vary. Emma Osbourne prepared a report profiling the kinds of errors contained in the corpus, and how they came about. Download the error report.