segram.grammar.components module

Abstract base class for grammar components.

Grammar components are groups of associated tokens controlled by a root token, e.g. a verb with its auxiliary verbs.

class segram.grammar.components.Component(*args: Any, **kwds: Any)

Bases: TokenElement

Abstract base class for grammar components.

All grammar classes must be defined as slots classes, which keeps their memory footprint low and improves computational efficiency. Even classes that add no new slots must declare __slots__ = (); this requirement is checked during class construction. Specialized grammar classes may implement other class-specific requirements of this kind, together with their validation checks, through the standard __init_subclass__ interface. This allows abstract base classes further down the inheritance chain to check more complex requirements and to apply dynamic class customizations.

Components consist of a root token associated with optional subordinate tokens, e.g. a noun and its determiner. The default syntactic role of a given component type can be defined with the __role__ class attribute.

This is a base class for implementing concrete component classes. The names of all possible controlled tokens (e.g. neg for a negation token) must be defined in __tokens__ class attributes along the inheritance chain of concrete subclasses, up to GrammarComponent. Each controlled token name should be declared only once, and every name must also be present in __slots__. Component classes that define no new controlled-token slots must declare __tokens__ = (). The same rules apply to component attributes defined through __attrs__ class attributes. All of these requirements are checked at runtime, during class creation.
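The class-creation checks described above can be sketched with `__init_subclass__`. This is a hypothetical sketch, not segram's actual code: `ComponentSketch`, `NounSketch`, and `BadSketch` are illustrative names, and it assumes `__slots__`, `__tokens__`, and `__attrs__` are always tuples of names.

```python
from typing import Any


class ComponentSketch:
    """Hypothetical component base class (not segram's implementation)."""

    __slots__ = ("tok", "neg")
    __tokens__ = ("neg",)   # controlled token names declared at this level
    __attrs__ = ()          # component attribute names declared at this level

    def __init_subclass__(cls, **kwds: Any) -> None:
        super().__init_subclass__(**kwds)
        # Each subclass must declare all three names itself, even if empty.
        for name in ("__slots__", "__tokens__", "__attrs__"):
            if name not in cls.__dict__:
                raise TypeError(f"{cls.__name__} must define {name}, even if it is ()")
        # Slot names available anywhere along the inheritance chain.
        slots = {s for k in cls.__mro__ for s in k.__dict__.get("__slots__", ())}
        for field in ("__tokens__", "__attrs__"):
            declared = [n for k in cls.__mro__ for n in k.__dict__.get(field, ())]
            dupes = sorted({n for n in declared if declared.count(n) > 1})
            if dupes:
                raise TypeError(f"{cls.__name__}: {field} declared more than once: {dupes}")
            missing = sorted(set(declared) - slots)
            if missing:
                raise TypeError(f"{cls.__name__}: {field} not in __slots__: {missing}")


class NounSketch(ComponentSketch):
    __slots__ = ("mod",)
    __tokens__ = ("mod",)
    __attrs__ = ()


try:
    class BadSketch(ComponentSketch):  # forgets to declare __slots__
        __tokens__ = ()
        __attrs__ = ()
except TypeError as exc:
    print(exc)  # BadSketch must define __slots__, even if it is ()
```

Raising `TypeError` from `__init_subclass__` aborts the `class` statement, so a misconfigured subclass is never bound to its name; this is why such checks can run at class creation rather than at instantiation.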

sent: Sentence the component belongs to.

tok: Head token object.

role: Syntactic role of the component head token.

sub: Tokens dependent on the head and not included in other token categories. They are not printed.

qmark: Question mark token.

exclam: Exclamation mark token.

intj: Interjection token.

neg: Negation token(s).

property idx: int: Index of the component head token.

property head: Token: Component head token.

property lead: Self: Head component of the lead phrase.

property is_lead: bool: Is the controlling phrase of the component a lead phrase.

property tokens: tuple[Token, ...]: Tokens sequence of the element.

property attrs: dict[str, Any]: Attributes dictionary.

classmethod from_data(doc: Doc, data: dict[str, Any]) → Self: Construct from Doc and a data dict.

to_data() → dict[str, Any]: Dump to data dictionary.
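A round trip through `to_data()` and `from_data()` might look like the following. This is a minimal sketch assuming that the data dictionary stores token indices relative to the document; the actual keys and format used by segram are not documented here. `ToyComponent` is hypothetical, and a plain list of strings stands in for a spaCy `Doc`.

```python
from __future__ import annotations

from typing import Any, Sequence


class ToyComponent:
    """Hypothetical component holding a head token and negation tokens."""

    __slots__ = ("doc", "tok", "neg")

    def __init__(self, doc: Sequence[str], tok: int, neg: tuple[int, ...] = ()) -> None:
        self.doc = doc    # stand-in for a spaCy Doc: a sequence of token texts
        self.tok = tok    # index of the head token
        self.neg = neg    # indices of negation tokens

    def to_data(self) -> dict[str, Any]:
        # Store token indices only; the tokens themselves live in the document.
        return {"tok": self.tok, "neg": list(self.neg)}

    @classmethod
    def from_data(cls, doc: Sequence[str], data: dict[str, Any]) -> ToyComponent:
        # Rebuild the component by resolving the stored indices against `doc`.
        return cls(doc, data["tok"], tuple(data["neg"]))


doc = ["I", "do", "not", "know"]
comp = ToyComponent(doc, tok=3, neg=(2,))
data = comp.to_data()
restored = ToyComponent.from_data(doc, data)
print(data)                        # {'tok': 3, 'neg': [2]}
print(restored.doc[restored.tok])  # know
```

Storing indices instead of token objects keeps the dictionary serializable (e.g. as JSON) and lets the component be rebuilt against the same document.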

classmethod get_comp_type(role: Role = None, pos: POS | None = None) → type[Self]: Get component type from role or POS tag.
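One way to implement a lookup like `get_comp_type()` is a small dispatch over the component subclasses. In this hypothetical sketch a role is matched against each subclass's `__role__` default and takes precedence over the POS tag, while POS tags are resolved through a fixed mapping of Universal POS tags. The `Role` members, the mapping, and the `Toy*` classes are assumptions for illustration, not segram's definitions.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    """Hypothetical role labels, for illustration only."""
    ACTION = "action"
    ENTITY = "entity"
    RELATION = "relation"
    DESC = "desc"


class Comp:
    __slots__ = ()
    __role__: Role | None = None
    # Hypothetical mapping from Universal POS tags to toy component class names.
    POS_MAP = {"VERB": "ToyVerb", "AUX": "ToyVerb", "NOUN": "ToyNoun",
               "PROPN": "ToyNoun", "PRON": "ToyNoun", "ADP": "ToyPrep",
               "ADJ": "ToyDesc", "ADV": "ToyDesc"}

    @classmethod
    def get_comp_type(cls, role: Role | None = None, pos: str | None = None) -> type[Comp]:
        types = {sub.__name__: sub for sub in Comp.__subclasses__()}
        if role is not None:
            # A role wins over a POS tag: pick the type whose default role matches.
            for sub in types.values():
                if sub.__role__ is role:
                    return sub
            raise ValueError(f"no component type with role {role}")
        if pos in cls.POS_MAP:
            return types[cls.POS_MAP[pos]]
        raise ValueError(f"no component type for POS tag {pos!r}")


class ToyVerb(Comp):
    __slots__ = ()
    __role__ = Role.ACTION


class ToyNoun(Comp):
    __slots__ = ()
    __role__ = Role.ENTITY


class ToyPrep(Comp):
    __slots__ = ()
    __role__ = Role.RELATION


class ToyDesc(Comp):
    __slots__ = ()
    __role__ = Role.DESC


print(Comp.get_comp_type(pos="PROPN").__name__)       # ToyNoun
print(Comp.get_comp_type(role=Role.ACTION).__name__)  # ToyVerb
```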

to_str(*, color: bool = False, role: Role | None = None, **kwds: Any) → str

Represent as a string.

Parameters: role – Overrides the head token role.

get_tid() → tuple[int, ...]: Get token tuple id.

iter_token_roles(*, role: Role | None = None, bg: bool = False) → Iterable[tuple[Token, Role | None]]

Iterate over token-role pairs.

Parameters:

role – Overrides the head token role.
bg – Whether tokens should be marked as background tokens (e.g. as part of a subclause). This is used for graying out subclauses when printing.
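The `role` override and the `bg` flag can be illustrated with a toy printer. This sketch is hypothetical: tokens are plain strings, role labels are arbitrary, and `render` is an invented helper showing how background tokens (such as those of a subclause) could be grayed out with ANSI escape codes when coloring is enabled.

```python
from __future__ import annotations

from typing import Iterable, Iterator

GRAY, RESET = "\033[90m", "\033[0m"  # ANSI codes: "bright black" (gray) and reset


def iter_token_roles(tokens: list[str], head: int, head_role: str, *,
                     role: str | None = None) -> Iterator[tuple[str, str | None]]:
    """Yield (token, role) pairs; only the head token carries a role."""
    for i, tok in enumerate(tokens):
        if i == head:
            # An explicit `role` overrides the head token's own role.
            yield tok, (role if role is not None else head_role)
        else:
            yield tok, None


def render(pairs: Iterable[tuple[str, str | None]], *, bg: bool = False,
           color: bool = False) -> str:
    """Join tokens, tagging role-bearing ones and graying out background tokens."""
    parts = []
    for tok, role in pairs:
        part = f"{tok}[{role}]" if role is not None else tok
        if bg and color:
            part = f"{GRAY}{part}{RESET}"
        parts.append(part)
    return " ".join(parts)


main = render(iter_token_roles(["dog", "barked"], 1, "VERB"))
clause = render(iter_token_roles(["that", "I", "saw"], 2, "VERB"), bg=True, color=True)
print(main)    # dog barked[VERB]
print(clause)  # "that I saw[VERB]", each token wrapped in gray escape codes
```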

is_comparable_with(other: Any) → None: Check whether self and other are comparable.

similarity(other: Self | Token) → float: Cosine similarity to another component or token.
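Cosine similarity compares two vectors by the angle between them, dot(u, v) / (norm(u) * norm(v)), giving 1.0 for vectors pointing the same way and 0.0 for orthogonal ones. The sketch below computes it in plain Python. It assumes components are compared through numeric vectors (for instance, word vectors of their tokens), which this reference does not specify, and its choice to return 0.0 when either vector is all zeros is likewise an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```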

class segram.grammar.components.Verb(*args: Any, **kwds: Any)

Bases: Component

Abstract base class for verb components.

sent: Sentence the component belongs to.

tok: Head token object.

role: Syntactic role of the component head token.

sub: Tokens dependent on the head and not included in other token categories. They are not printed.

qmark: Question mark token.

exclam: Exclamation mark token.

intj: Interjection token.

neg: Negation token(s). neg Negation token.

Notes

Verb components also define a tense attribute.
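The tense attribute is the kind of value that the __attrs__ mechanism and the attrs property expose. A minimal sketch, assuming `attrs` collects every name declared in `__attrs__` along the inheritance chain and maps it to the instance's value; `ToyComponent`, `ToyVerb`, and the tense value `"PAST"` are hypothetical.

```python
from __future__ import annotations

from typing import Any


class ToyComponent:
    __slots__ = ("tok",)
    __attrs__: tuple[str, ...] = ()

    @property
    def attrs(self) -> dict[str, Any]:
        # Collect attribute names declared anywhere along the inheritance chain,
        # base classes first; unset slots are reported as None.
        names = [n for k in reversed(type(self).__mro__)
                 for n in k.__dict__.get("__attrs__", ())]
        return {n: getattr(self, n, None) for n in names}


class ToyVerb(ToyComponent):
    __slots__ = ("tense",)
    __attrs__ = ("tense",)

    def __init__(self, tok: str, tense: str | None = None) -> None:
        self.tok = tok
        self.tense = tense


print(ToyVerb("walked", tense="PAST").attrs)  # {'tense': 'PAST'}
```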

class segram.grammar.components.Noun(*args: Any, **kwds: Any)

Bases: Component

Abstract base class for noun components.

sent: Sentence the component belongs to.

tok: Head token object.

role: Syntactic role of the component head token.

sub: Tokens dependent on the head and not included in other token categories. They are not printed.

qmark: Question mark token.

exclam: Exclamation mark token.

intj: Interjection token.

neg: Negation token(s). mod Modifier tokens.

class segram.grammar.components.Prep(*args: Any, **kwds: Any)

Bases: Component

Abstract base class for preposition components.

sent: Sentence the component belongs to.

tok: Head token object.

role: Syntactic role of the component head token.

sub: Tokens dependent on the head and not included in other token categories. They are not printed.

qmark: Question mark token.

exclam: Exclamation mark token.

intj: Interjection token.

neg: Negation token(s). preps Chain of subsequent prepositions attached to the head token.

class segram.grammar.components.Desc(*args: Any, **kwds: Any)

Bases: Component

Abstract base class for description components.

sent: Sentence the component belongs to.

tok: Head token object.

role: Syntactic role of the component head token.

sub: Tokens dependent on the head and not included in other token categories. They are not printed.

qmark: Question mark token.

exclam: Exclamation mark token.

intj: Interjection token.

neg: Negation token(s). mod Modifier tokens.