segram.grammar.components module
Abstract base class for grammar components.
Grammar components are groups of associated tokens controlled by a root token, e.g. a verb with its auxiliary verbs.
- class segram.grammar.components.Component(*args: Any, **kwds: Any)
Bases: TokenElement
Abstract base class for grammar components.
All grammar classes must be defined as slots classes. This is necessary to ensure a low memory footprint and better computational efficiency. Even classes with no new slots need to declare __slots__ = (). This requirement is checked during class construction. Other class-specific requirements of this sort, as well as their related validation checks, may be implemented on specialized grammar classes using the standard __init_subclass__ interface. This allows abstract base classes further down the inheritance chain to check for more complex requirements as well as apply dynamic class customizations.
Components consist of a root token associated with (optional) additional subordinate tokens, e.g. a noun and its determiner. The default syntactic role assigned to a given component type can be defined using the __role__ class attribute.
This is a base class used for implementing concrete component classes. Names of all possible controlled tokens (e.g. neg for a negation token) must be defined in __tokens__ class attributes along the inheritance chain of concrete subclasses up to GrammarComponent. Each controlled token name should be declared only once, and all of them must also be present in __slots__. Component classes that do not define any new controlled token slots have to define __tokens__ = (). The same rules apply to defining component attributes through __attrs__ class attributes. The above requirements are checked at runtime during class creation.
- sent
Sentence the component belongs to.
- tok
Head token object.
- role
Syntactic role of the component head token.
- sub
Tokens dependent on the head and not included in other token categories. They are not printed.
- qmark
Question mark token.
- exclam
Exclamation mark token.
- intj
Interjection token.
- neg
Negation token(s).
- property idx: int
Index of the component head token.
- property lead: Self
Head component of the lead phrase.
- property is_lead: bool
Whether the controlling phrase of the component is a lead phrase.
- property attrs: dict[str, Any]
Attributes dictionary.
- classmethod from_data(doc: Doc, data: dict[str, Any]) → Self
Construct from a Doc and a data dict.
- classmethod get_comp_type(role: Role = None, pos: POS | None = None) → type[Self]
Get component type from role or POS tag.
- to_str(*, color: bool = False, role: Role | None = None, **kwds: Any) → str
Represent as a string.
- Parameters:
role – Overrides head token role.
- iter_token_roles(*, role: Role | None = None, bg: bool = False) → Iterable[tuple[Token, Role | None]]
Iterate over token-role pairs.
- Parameters:
role – Overrides head token role.
bg – Whether tokens should be marked as background tokens (e.g. as part of a subclause). This is used for graying out subclauses when printing.
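The slots and __tokens__ requirements described for Component can be illustrated with a small, self-contained sketch. It is not segram's implementation: ComponentSketch, VerbSketch and all_tokens are hypothetical names introduced only for illustration, and the exact checks segram performs may differ. It only shows how the standard __init_subclass__ hook can enforce rules of this kind at class creation time.

```python
from typing import Any, ClassVar


class ComponentSketch:
    """Hypothetical stand-in for Component (assumption, not segram code)."""

    __slots__ = ("tok", "sub")
    __tokens__: ClassVar[tuple[str, ...]] = ("sub",)

    def __init_subclass__(cls, **kwds: Any) -> None:
        super().__init_subclass__(**kwds)
        own = cls.__dict__
        # Every subclass must declare both attributes itself, even if empty.
        for attr in ("__slots__", "__tokens__"):
            if attr not in own:
                raise TypeError(f"'{cls.__name__}' must define '{attr}'")
        slots = own["__slots__"]
        if isinstance(slots, str):
            slots = (slots,)
        # Token names already declared by ancestor classes.
        seen: set[str] = set()
        for base in cls.__mro__[1:]:
            seen.update(base.__dict__.get("__tokens__", ()))
        for name in own["__tokens__"]:
            # Each controlled token name may be declared only once.
            if name in seen:
                raise TypeError(f"token '{name}' is declared more than once")
            # Each newly declared token must have a matching slot.
            if name not in slots:
                raise TypeError(f"token '{name}' is missing from '__slots__'")
            seen.add(name)

    @classmethod
    def all_tokens(cls) -> tuple[str, ...]:
        """Collect token names along the inheritance chain, base classes first."""
        names: list[str] = []
        for base in reversed(cls.__mro__):
            names.extend(base.__dict__.get("__tokens__", ()))
        return tuple(names)


class VerbSketch(ComponentSketch):
    """Hypothetical subclass adding one controlled token."""

    __slots__ = ("aux",)
    __tokens__ = ("aux",)
```

With these definitions, VerbSketch.all_tokens() collects the names declared by the base class and the subclass, while a subclass that omits __slots__ or declares a token without a matching slot fails with a TypeError as soon as the class statement is executed.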
- class segram.grammar.components.Verb(*args: Any, **kwds: Any)
Bases: Component
Abstract base class for verb components.
- sent
Sentence the component belongs to.
- tok
Head token object.
- role
Syntactic role of the component head token.
- sub
Tokens dependent on the head and not included in other token categories. They are not printed.
- qmark
Question mark token.
- exclam
Exclamation mark token.
- intj
Interjection token.
- neg
Negation token(s).
Notes
It also defines the tense attribute.
- class segram.grammar.components.Noun(*args: Any, **kwds: Any)
Bases: Component
Abstract base class for noun components.
- sent
Sentence the component belongs to.
- tok
Head token object.
- role
Syntactic role of the component head token.
- sub
Tokens dependent on the head and not included in other token categories. They are not printed.
- qmark
Question mark token.
- exclam
Exclamation mark token.
- intj
Interjection token.
- neg
Negation token(s).
- mod
Modifier tokens.
- class segram.grammar.components.Prep(*args: Any, **kwds: Any)
Bases: Component
Abstract base class for preposition components.
- sent
Sentence the component belongs to.
- tok
Head token object.
- role
Syntactic role of the component head token.
- sub
Tokens dependent on the head and not included in other token categories. They are not printed.
- qmark
Question mark token.
- exclam
Exclamation mark token.
- intj
Interjection token.
- neg
Negation token(s).
- preps
Chain of subsequent prepositions attached to the head token.
- class segram.grammar.components.Desc(*args: Any, **kwds: Any)
Bases: Component
Abstract base class for description components.
- sent
Sentence the component belongs to.
- tok
Head token object.
- role
Syntactic role of the component head token.
- sub
Tokens dependent on the head and not included in other token categories. They are not printed.
- qmark
Question mark token.
- exclam
Exclamation mark token.
- intj
Interjection token.
- neg
Negation token(s).
- mod
Modifier tokens.