segram.grammar.components module

Abstract base class for grammar components.

Grammar components are groups of associated tokens controlled by a root token, e.g. a verb with its auxiliary verbs.

class segram.grammar.components.Component(*args: Any, **kwds: Any)[source]

Bases: TokenElement

Abstract base class for grammar components.

All grammar classes must be defined as slots classes. This is necessary to ensure a low memory footprint and better computational efficiency. Even classes with no new slots need to declare __slots__ = (). This requirement is checked during class construction.

Other class-specific requirements of this sort, as well as their related validation checks, may be implemented on specialized grammar classes using the standard __init_subclass__ interface. This allows abstract base classes further down the inheritance chain to check for more complex requirements as well as apply dynamic class customizations.

Components consist of a root token associated with (optional) additional subordinate tokens, e.g. a noun and its determiner. The default syntactic role assigned to a given component type can be defined using the __role__ class attribute.

This is a base class used for implementing concrete component classes. Names of all possible controlled tokens (e.g. neg for a negation token) must be defined in __tokens__ class attributes along the inheritance chain of concrete subclasses up to GrammarComponent. Each controlled token name should be declared only once, and all must also be present in __slots__. Component classes that do not define any new controlled token slots have to define __tokens__ = (). The same rules apply to defining component attributes through __attrs__ class attributes. The above requirements are checked at runtime during class creation.
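The declaration rules above can be illustrated with a small standalone sketch. It does not use segram and is not its implementation: SlotsComponent, NounLike, Plain and collect_tokens are hypothetical names introduced here, and the checks only approximate the rules as described (every class declares __slots__ and __tokens__, each token name is declared once, and every token name is also a slot).

```python
from typing import Any, ClassVar


def _declared(cls: type, name: str) -> tuple[str, ...]:
    """Names declared under `name` directly on `cls` (not inherited)."""
    value = cls.__dict__.get(name, ())
    return (value,) if isinstance(value, str) else tuple(value)


class SlotsComponent:
    """Hypothetical stand-in for a component base class (not segram code)."""

    __slots__ = ("tok",)
    __tokens__: ClassVar[tuple[str, ...]] = ()

    def __init_subclass__(cls, **kwds: Any) -> None:
        super().__init_subclass__(**kwds)
        # Both attributes must be declared on the class itself,
        # even when they are empty.
        for name in ("__slots__", "__tokens__"):
            if name not in cls.__dict__:
                raise TypeError(f"'{cls.__name__}' must declare '{name}'")
        # Token names already declared by ancestors.
        inherited = {
            tok for base in cls.__mro__[1:] for tok in _declared(base, "__tokens__")
        }
        # Slot names available anywhere along the inheritance chain.
        slots = {
            slot for klass in cls.__mro__ for slot in _declared(klass, "__slots__")
        }
        for tok in _declared(cls, "__tokens__"):
            if tok in inherited:
                raise TypeError(f"token '{tok}' is declared more than once")
            if tok not in slots:
                raise TypeError(f"token '{tok}' is missing from '__slots__'")


def collect_tokens(cls: type) -> tuple[str, ...]:
    """Gather token names from the base class down to `cls`."""
    return tuple(
        tok for klass in reversed(cls.__mro__) for tok in _declared(klass, "__tokens__")
    )


class NounLike(SlotsComponent):
    __slots__ = ("mod",)
    __tokens__ = ("mod",)


class Plain(NounLike):
    # No new slots or tokens, but both must still be declared.
    __slots__ = ()
    __tokens__ = ()
```

Because the checks run in __init_subclass__, an invalid subclass fails at definition time rather than when it is first instantiated.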

sent

Sentence the component belongs to.

tok

Head token object.

role

Syntactic role of the component head token.

sub

Tokens dependent on the head and not included in other token categories. They are not printed.

qmark

Question mark token.

exclam

Exclamation mark token.

intj

Interjection token.

neg

Negation token(s).

property idx: int

Index of the component head token.

property head: Token

Component head token.

property lead: Self

Head component of the lead phrase.

property is_lead: bool

Whether the controlling phrase of the component is a lead phrase.

property tokens: tuple[Token, ...]

Token sequence of the element.

property attrs: dict[str, Any]

Attributes dictionary.

classmethod from_data(doc: Doc, data: dict[str, Any]) Self[source]

Construct from a Doc and a data dict.

to_data() dict[str, Any][source]

Dump to data dictionary.

classmethod get_comp_type(role: Role = None, pos: POS | None = None) type[Self][source]

Get component type from role or POS tag.

to_str(*, color: bool = False, role: Role | None = None, **kwds: Any) str[source]

Represent as a string.

Parameters:

role – Overrides head token role.

get_tid() tuple[int, ...][source]

Get token tuple id.

iter_token_roles(*, role: Role | None = None, bg: bool = False) Iterable[tuple[Token, Role | None]][source]

Iterate over token-role pairs.

Parameters:
  • role – Overrides head token role.

  • bg – Whether tokens should be marked as background tokens (e.g. as part of a subclause). This is used for graying out subclauses when printing.

is_comparable_with(other: Any) None[source]

Check whether self and other are comparable.

similarity(other: Self | Token) float[source]

Cosine similarity to another component or token.
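For reference, cosine similarity between two vectors can be computed as in the sketch below. This is a generic illustration rather than the method's actual implementation: the function name cosine_similarity is hypothetical, how the component and token vectors are obtained is not stated here, and returning 0.0 for a zero vector is an assumed convention.

```python
import math
from collections.abc import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumed convention: similarity with a zero vector is 0.0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

The result lies in the range [-1, 1]: vectors pointing in the same direction score 1, and orthogonal vectors score 0.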

class segram.grammar.components.Verb(*args: Any, **kwds: Any)[source]

Bases: Component

Abstract base class for verb components.

The slots, __tokens__ and __attrs__ requirements documented for Component above apply to this class unchanged.

sent

Sentence the component belongs to.

tok

Head token object.

role

Syntactic role of the component head token.

sub

Tokens dependent on the head and not included in other token categories. They are not printed.

qmark

Question mark token.

exclam

Exclamation mark token.

intj

Interjection token.

neg

Negation token(s).

Notes

It also defines a tense attribute.

class segram.grammar.components.Noun(*args: Any, **kwds: Any)[source]

Bases: Component

Abstract base class for noun components.

The slots, __tokens__ and __attrs__ requirements documented for Component above apply to this class unchanged.

sent

Sentence the component belongs to.

tok

Head token object.

role

Syntactic role of the component head token.

sub

Tokens dependent on the head and not included in other token categories. They are not printed.

qmark

Question mark token.

exclam

Exclamation mark token.

intj

Interjection token.

neg

Negation token(s).

mod

Modifier tokens.

class segram.grammar.components.Prep(*args: Any, **kwds: Any)[source]

Bases: Component

Abstract base class for preposition components.

The slots, __tokens__ and __attrs__ requirements documented for Component above apply to this class unchanged.

sent

Sentence the component belongs to.

tok

Head token object.

role

Syntactic role of the component head token.

sub

Tokens dependent on the head and not included in other token categories. They are not printed.

qmark

Question mark token.

exclam

Exclamation mark token.

intj

Interjection token.

neg

Negation token(s).

preps

Chain of subsequent prepositions attached to the head token.

class segram.grammar.components.Desc(*args: Any, **kwds: Any)[source]

Bases: Component

Abstract base class for description components.

The slots, __tokens__ and __attrs__ requirements documented for Component above apply to this class unchanged.

sent

Sentence the component belongs to.

tok

Head token object.

role

Syntactic role of the component head token.

sub

Tokens dependent on the head and not included in other token categories. They are not printed.

qmark

Question mark token.

exclam

Exclamation mark token.

intj

Interjection token.

neg

Negation token(s).

mod

Modifier tokens.