The Tsugaru Dialect


This page contains information about the various research and projects that deal with the Tsugaru dialect.


Matsunoki Corpus

The Matsunoki treebank is a parsed corpus of the Tsugaru dialect. The data collected so far comes from audio recordings of native speakers of Tsugaru-ben reading folktales of the Tsugaru region, which are presented in the dialect. There are 26 readings in total, representing nearly four hours of spoken data.

The annotation system, including the methods of linguistic analysis and presentation on the internet, is based on the NINJAL Parsed Corpus of Modern Japanese project (NPCMJ) (2020). The audio data is transcribed using the CLAN program from TalkBank (MacWhinney, 2000), following the WAKACHI2002 (Miyata, 2018) format for Japanese text. The MOR program is then run with a version of the JMOR08 (Miyata, 2018) grammar that has been modified for the dialect, generating part-of-speech, declension, and English gloss information for the transcribed data. The transcribed data and its part-of-speech information are then passed to the Berkeley Parser 1.7, which has been trained on data from the NPCMJ, to generate parsed trees. These trees are then corrected by hand.
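The end product of this pipeline is a set of parsed trees. As a rough illustration of how such output could be inspected, here is a minimal Python sketch that counts trees and words in bracketed text. It assumes the trees are stored in a Penn Treebank-style bracketed format, one tree per top-level bracket; the sample trees, their placeholder words, and the function names are invented for illustration and are not taken from the corpus or its tooling.

```python
from typing import List, Tuple, Union

# A tree is either a string (label or word) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def tokenize(text: str) -> List[str]:
    """Split bracketed text into '(', ')' and symbol tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> List[Tree]:
    """Turn a token stream into nested lists, one per top-level tree."""
    stack: List[List[Tree]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def leaves(tree: Tree) -> List[str]:
    """Collect words, treating a [label, word] pair of strings as a preterminal."""
    if isinstance(tree, str):
        return []
    if len(tree) == 2 and isinstance(tree[0], str) and isinstance(tree[1], str):
        return [tree[1]]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child))
    return words


def count_trees_and_words(text: str) -> Tuple[int, int]:
    """Return (number of trees, number of words) for bracketed text."""
    trees = parse(tokenize(text))
    return len(trees), sum(len(leaves(t)) for t in trees)


# Hypothetical sample: labels and words are placeholders, not corpus data.
SAMPLE = """
(IP-MAT (PP (NP (N word1)) (P word2)) (VB word3))
(IP-MAT (NP (N word4)) (VB word5))
"""

if __name__ == "__main__":
    n_trees, n_words = count_trees_and_words(SAMPLE)
    print(n_trees, n_words)
```

If the actual files wrap each tree in extra metadata nodes (for example, an identifier node), those would need to be filtered out before counting words.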

As of now, twenty readings have been fully transcribed, morphologically analyzed, and converted into trees. These comprise 748 trees and 16,512 words, with just over 1,000 trees expected once the final six files are processed.

The corpus interface can be accessed at

COJADS 日本語諸方言コーパス

COJADS, also referred to as the Corpus of Japanese Dialects, is a corpus built from audio samples of dialects recorded throughout Japan in the 1970s and 1980s. Information, including transcribed .csv files of the audio, is available at

Hirosaki University Tsugaru-ben AI Project 「弘大×AI×津軽弁プロジェクト」

One of the more ambitious projects involving the dialect is the 'Hirosaki University Tsugaru-ben AI Project'. The project is a collaboration between Professor Masa Imai of the Hirosaki University Graduate School of Science and Engineering, eight other faculty members of the university, and two research collaborators. The aim of the project is to collect written and audio samples of Tsugaru-ben with Standard Japanese translations, which are crowdsourced from the public. It was reported to have collected 11,000 sample sentences by June 2020.

The website, accessible at, has a submission form for contributing samples. It also offers a searchable dictionary of Tsugaru-ben words with Standard Japanese translations. According to the website, the goal is to help facilitate communication between native Tsugaru-ben speakers, and the project aims to potentially provide translations into foreign languages in the future.


Agadamburi Dictionary

Agadamburi is a Tsugaru-ben to Standard Japanese translation dictionary by Isao Kumeta (久米田 いさお). It provides Standard Japanese translations with sample sentences for every word, along with inflection information. Details about the book can be found here:

Nihongo to Tsugaru-ben (日本語と津軽弁―方言の持つ伝統の味わいとは) (2003)

This book by Isao Ogasawara provides an overview of the Tsugaru dialect by linking it to earlier forms of the Japanese language. It is especially interesting for those who want to understand the origins of the dialect's words and sounds.