API Reference¶

pylangacq.read_chat(path: str, *, filter_files: str | Sequence[str] | None = None, filter_participants: str | Sequence[str] | None = None, cls: type = CHAT, strict: bool = True) → CHAT¶

Read CHAT data.

Parameters:

path – Path to a .zip file, a local directory containing .cha files, or a single .cha file.
filter_files – Filename(s) to keep. Regular expression matching is supported. If None, all files are included.
filter_participants – Participant code(s) to keep. Regular expression matching is supported. If None, all participants are included.
cls – The class used to create the reader. Must be CHAT or a subclass of it.
strict – If True, enforce strict parsing of the CHAT data.

Returns:

A CHAT instance filtered by the specified files and participants.

Raises:

TypeError – If cls is not CHAT or a subclass of it.
ValueError – If path does not point to a .zip file, a directory, or a .cha file.
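The regex-based filtering described for filter_files and filter_participants can be pictured with a small standard-library sketch. This is not pylangacq's implementation: the helper keep_matching and the file names are hypothetical, and the use of re.search (a match anywhere in the name) is an assumption about the matching semantics.

```python
import re
from typing import List, Optional, Sequence, Union


def keep_matching(
    names: Sequence[str],
    patterns: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Keep names matched by at least one pattern; None keeps everything."""
    if patterns is None:
        return list(names)
    if isinstance(patterns, str):
        # A single string is treated as a one-element list of patterns.
        patterns = [patterns]
    compiled = [re.compile(p) for p in patterns]
    # Assumption: a pattern may match anywhere in the name (re.search).
    return [name for name in names if any(c.search(name) for c in compiled)]


# Hypothetical file names, for illustration only.
files = ["corpus/adam/a01.cha", "corpus/eve/e01.cha", "corpus/sarah/s01.cha"]
adam_eve = keep_matching(files, ["adam", "eve"])
everything = keep_matching(files)
print(adam_eve)
```

Passing None mirrors the documented default: nothing is filtered out.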

class pylangacq.Age¶

Age in the CHAT format: years;months.days (e.g., “2;10.05”).

in_months()¶: Return the age in total months as a float.
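As an illustration of the years;months.days notation, the sketch below converts an age string to total months using only the standard library. The function age_to_months is hypothetical, and the 30-day month is an assumption; the reference above does not state which convention in_months() uses for the day component.

```python
import re


def age_to_months(age: str, days_per_month: float = 30.0) -> float:
    """Convert a CHAT age string such as "2;10.05" to total months."""
    match = re.fullmatch(r"(\d+);(\d+)\.(\d+)", age)
    if match is None:
        raise ValueError(f"not a years;months.days age: {age!r}")
    years, months, days = (int(group) for group in match.groups())
    # Assumption: one month is counted as 30 days.
    return years * 12 + months + days / days_per_month


months = age_to_months("2;10.05")
print(round(months, 2))  # 34.17
```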

class pylangacq.CHAT¶

CHAT data reader for CHILDES/TalkBank transcripts.

ages()¶: Return the age of the target child (CHI) in each file.

append(other)¶: Append data from another CHAT reader.

append_left(other)¶: Left-append data from another CHAT reader, preserving order.

clear()¶: Remove all data from this reader.

extend(others)¶: Extend data from multiple CHAT readers.

extend_left(others)¶: Left-extend data from multiple CHAT readers, preserving order.
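One plausible reading of "preserving order" in append_left and extend_left is that the new data lands at the front in the order given, unlike collections.deque.extendleft, which reverses its input. The sketch below shows that contrast with plain deques; it is an interpretation, not a description of how CHAT stores its files, and the file names are hypothetical.

```python
from collections import deque

existing = ["c.cha"]
new_items = ["a.cha", "b.cha"]

# deque.extendleft pushes items one at a time, so they end up reversed.
reversed_order = deque(existing)
reversed_order.extendleft(new_items)

# Reversing the input first keeps the new items in their original order.
preserved = deque(existing)
preserved.extendleft(reversed(new_items))

print(list(reversed_order))  # ['b.cha', 'a.cha', 'c.cha']
print(list(preserved))       # ['a.cha', 'b.cha', 'c.cha']
```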

file_paths¶: Return the list of file paths.

filter(*, files=None, participants=None)¶: Return a new CHAT filtered by file path and/or participant regex.

classmethod from_dir(path, match=Ellipsis, extension='.cha', parallel=True, strict=True)¶: Recursively load CHAT data from a directory.

classmethod from_files(paths, parallel=True, strict=True)¶: Load CHAT data from file paths.

classmethod from_strs(strs, ids=None, parallel=True, strict=True)¶: Parse CHAT data from in-memory strings.

classmethod from_zip(path, match=Ellipsis, extension='.cha', parallel=True, strict=True)¶: Load CHAT data from a ZIP archive.

head(n=5)¶: Return the first n utterances with a formatted display.

headers()¶: Return file-level headers.

info(*, verbose=False)¶: Print a summary of this reader’s data.

ipsyn(*, participant='CHI', n=Ellipsis)¶: Index of Productive Syntax, one value per file.

languages(*, by_file=False)¶: Return languages, optionally grouped by file.

mlu(*, participant='CHI', n=Ellipsis)¶

Mean length of utterance in morphemes, one value per file.

Alias for mlum().

mlum(*, participant='CHI', n=Ellipsis)¶: Mean length of utterance in morphemes, one value per file.

mluw(*, participant='CHI', n=Ellipsis)¶: Mean length of utterance in words, one value per file.
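The quantity behind the MLU methods is a simple average: total units divided by the number of utterances. The sketch below computes it in words over hypothetical, pre-tokenized utterances; it is not pylangacq's code, and it ignores the participant and n parameters, whose exact behavior is not described above. Passing lists of morphemes instead of words would give the morpheme-based variant.

```python
from typing import List, Sequence


def mean_length_of_utterance(utterances: Sequence[Sequence[str]]) -> float:
    """Average number of units (words or morphemes) per utterance."""
    if not utterances:
        return 0.0
    return sum(len(utterance) for utterance in utterances) / len(utterances)


# Hypothetical utterances, already split into words.
utterances: List[List[str]] = [
    ["more", "cookie"],
    ["I", "want", "more", "cookie"],
    ["no"],
]
mluw = mean_length_of_utterance(utterances)
print(round(mluw, 2))  # 2.33
```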

n_files¶: Return the number of files.

participants(*, by_file=False)¶: Return participants, optionally grouped by file.

pop()¶: Remove and return the last file as a new CHAT reader.

pop_left()¶: Remove and return the first file as a new CHAT reader.

tail(n=5)¶: Return the last n utterances with a formatted display.

to_chat(path, *, is_dir=False, filenames=None)¶: Write CHAT data to disk.

to_strs()¶: Return CHAT data strings, one per file.

tokens(*, by_utterance=False, by_file=False)¶: Return tokens, optionally grouped by utterance and/or file.

ttr(*, participant='CHI', n=Ellipsis)¶: Type-token ratio, one value per file.
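Type-token ratio is the number of distinct word types divided by the number of word tokens. A minimal sketch over a hypothetical word list follows; it does no case folding or other normalization and does not model the participant or n parameters, so it may differ from what ttr() returns.

```python
from typing import Sequence


def type_token_ratio(words: Sequence[str]) -> float:
    """Distinct words (types) divided by total words (tokens)."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


# Hypothetical word list: 4 types ("the", "dog", "saw", "cat"), 5 tokens.
words = ["the", "dog", "saw", "the", "cat"]
ttr = type_token_ratio(words)
print(ttr)  # 0.8
```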

utterances(*, by_file=False)¶: Return utterances, optionally grouped by file.

word_ngrams(n)¶

Return an Ngrams object holding word n-gram counts across all utterances.

N-grams do not cross utterance boundaries.

Parameters:

n – The n-gram order (1 for unigrams, 2 for bigrams, etc.).
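To make the boundary rule concrete, the sketch below counts n-grams within each utterance separately using collections.Counter. The function count_ngrams and the sample utterances are hypothetical stand-ins, not the Ngrams class itself.

```python
from collections import Counter
from typing import Sequence, Tuple


def count_ngrams(sequences: Sequence[Sequence[str]], n: int) -> Counter:
    """Count n-grams within each sequence; none span two sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        # A sequence shorter than n contributes nothing.
        for i in range(len(seq) - n + 1):
            ngram: Tuple[str, ...] = tuple(seq[i:i + n])
            counts[ngram] += 1
    return counts


# Hypothetical utterances, already split into words.
utterances = [["you", "want", "milk"], ["want", "milk"]]
bigrams = count_ngrams(utterances, 2)
print(bigrams[("want", "milk")])     # 2
print(("milk", "want") in bigrams)   # False
```

The pair ("milk", "want") never appears because "milk" ends the first utterance and "want" begins the second.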

words(*, by_utterance=False, by_file=False)¶: Return words, optionally grouped by utterance and/or file.

class pylangacq.ChangeableHeader¶: A changeable header that can appear mid-file in CHAT transcripts.

class pylangacq.Gra(dep, head, rel)¶: A grammatical relation from the %gra tier.

class pylangacq.Headers¶: All file-level (non-changeable) headers from a CHAT file.

class pylangacq.Ngrams(n, *, min_n=None)¶

A counter of n-gram frequencies.

Accumulates n-gram counts from sequences of elements. N-grams do not cross sequence boundaries.

clear()¶: Clear all counts.

count(seq)¶

Count n-grams from a single sequence.

Extracts all n-grams of the configured order from the sequence and increments their counts. N-grams do not cross sequence boundaries.

count_seqs(seqs)¶

Count n-grams from multiple sequences.

Each sequence is treated independently (n-grams do not cross boundaries).

get(ngram)¶

Return the count for a specific n-gram.

Returns 0 if the n-gram has not been observed.

items(*, order=None)¶: Return all (n-gram, count) pairs.

min_n¶: The minimum n-gram order.

most_common(n=None, *, order=None)¶

Return the n most common n-grams with their counts.

If n is None, returns all n-grams sorted by count (descending).

n¶: The n-gram order.

to_counter(*, order=None)¶

Convert to a Python collections.Counter.

Returns a Counter mapping n-gram tuples to their counts.
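Once the counts are in a collections.Counter, the standard-library Counter methods apply. The sketch below builds such a Counter by hand with hypothetical bigram counts, so it shows only the standard library's behavior, not the output of to_counter() itself.

```python
from collections import Counter

# Hypothetical bigram counts keyed by n-gram tuples.
counts = Counter({("want", "milk"): 2, ("you", "want"): 1})

# A Counter returns 0 for keys it has never seen.
missing = counts[("more", "milk")]

# most_common(1) gives the top entry; most_common() gives all entries,
# sorted by count in descending order.
top = counts.most_common(1)
all_sorted = counts.most_common()

# Total number of n-gram tokens.
total = sum(counts.values())

print(missing, top, total)
```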

total(*, order=None)¶: Return the total number of n-gram tokens counted.

class pylangacq.Participant¶: A single participant, merged from the @Participants and @ID headers.

class pylangacq.Token(word, pos=None, mor=None, gra=None)¶: A token with word, POS, morphology, and grammatical relation.

class pylangacq.Utterance(*, participant=None, tokens=None, time_marks=None, tiers=None, changeable_header=None)¶

A single utterance from a CHAT transcript.

For changeable headers (e.g., @Comment, @New Episode), only changeable_header is set; all other fields are None.

raw¶: Raw transcript of this utterance, or None for headers.

to_str()¶: Return a plain text tabular representation of this utterance.

class pylangacq.Utterances¶

A sequence of utterances with a formatted display for terminal/notebook use.

Returned by CHAT.head() and CHAT.tail().