segram.nlp.pipeline.base module

Base Segram module.

It implements the _Segram_ pipe component providing all main semantic grammar transformations and related auxiliary methods.

class segram.nlp.pipeline.base.Segram(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True)

Bases: Pipe

Semantic grammar pipeline component.

It extends spaCy token classes with semantic grammar methods and related functionality, such as custom preprocessing (e.g. token merging and corrected lemmatization).

nlp

Language model object.

name

Name of the component.

extensions

Module defining custom spaCy extensions.

grammar

Label of grammar implementation.

meta

Metadata dictionary with details on the spaCy and segram models being used.

__init__(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True) → None

Initialization method.

Parameters:
  • preprocess – Names of the segram pipeline components used to preprocess documents before the main segram pipe is applied. If None, all available preprocessing components are used.

  • alias – Value of spacy_alias to set in the global settings. It namespaces the extension attributes added by segram to avoid collisions with other packages.

  • vectors – Vector table to use instead of the vectors provided by the main model. It must be given either as a model name or as the model object itself, so that it is possible to keep track of the model name.

  • store_data – Whether document data should be stored automatically at parse time.
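
As a minimal sketch of how the keyword arguments above might be supplied, the snippet below collects them into a plain dictionary of the kind spaCy accepts as the config argument of Language.add_pipe(). The helper make_segram_config, the placeholder values, and the factory name "segram" in the trailing comment are assumptions made for illustration. They are not taken from this page.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def make_segram_config(
    grammar: str,
    preprocess: Sequence[str],
    *,
    alias: str = "segram",
    vectors: str | None = None,
    store_data: bool = True,
) -> dict[str, Any]:
    """Collect the documented keyword arguments into a config dictionary.

    Hypothetical helper: it only mirrors the signature shown above and
    does not validate any of the values.
    """
    return {
        "grammar": grammar,
        "preprocess": list(preprocess),
        "alias": alias,
        "vectors": vectors,
        "store_data": store_data,
    }


# Placeholder values: the valid grammar labels and preprocessing
# component names are not listed in this section.
config = make_segram_config(
    "my_grammar",
    ["my_preprocessor"],
    vectors="my_vectors_model",
)

# Assumed usage (requires spaCy and segram to be installed):
#   nlp.add_pipe("segram", config=config)
```

Passing vectors as a model name rather than a loaded object keeps the dictionary serializable, which matches the note above about keeping track of the model name.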

property id: int

Hash id of the component.

static set_docattrs(doc: Doc, alias: str, meta: dict[str, Any]) → None

Set document attributes.

static import_extensions(grammar: str, lang: str, alias: str) → SpacyExtensions

Import NLP module from grammar label and language code.

Returns:

SpacyExtensions instance.

Return type:

extensions

init_extensions() → None

Initialize custom spaCy extension attributes.
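
To illustrate why an alias helps here, the sketch below shows one way a prefix could keep extension attribute names from colliding with names registered by other packages. The prefixing scheme, the helper namespaced, and all attribute names are assumptions for illustration only. The naming that segram actually uses is not documented in this section.

```python
def namespaced(alias: str, attr: str) -> str:
    """Prefix an attribute name with the alias (assumed naming scheme)."""
    return f"{alias}_{attr}"


# Attribute names already registered by some other package (made up).
registered = {"meta", "sents"}

# Hypothetical attribute names a component might want to add.
wanted = ["meta", "data"]

# Without a prefix, "meta" would clash with the existing attribute.
names = [namespaced("segram", attr) for attr in wanted]
collisions = registered.intersection(names)

print(names)       # ['segram_meta', 'segram_data']
print(collisions)  # set()
```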

configure_pipeline(*components: str, **kwds: Any) → None

Configure secondary segram pipeline components.

Parameters:
  • *components – Pipeline component names.

  • **kwds – Passed to add_pipe().

normalize_pipe_name(pipe: str) → str

Normalize pipeline component name.

get_config() → dict

Get current config dictionary.

static get_model_name(nlp: Language) → str

Get language model name.
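
For orientation, spaCy exposes pipeline metadata as a dictionary (Language.meta) whose "lang" and "name" entries together form the package name of a trained pipeline. The sketch below derives a model name from such a dictionary. It is an assumption about how a name could be built, not the implementation of get_model_name(), and the helper model_name_from_meta and the example metadata are hypothetical.

```python
def model_name_from_meta(meta: dict) -> str:
    """Join the language code and pipeline name, as in spaCy package names."""
    return f"{meta['lang']}_{meta['name']}"


# Example metadata shaped like spaCy's Language.meta (values are illustrative).
meta = {"lang": "en", "name": "core_web_sm", "version": "3.7.1"}

print(model_name_from_meta(meta))  # en_core_web_sm
```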

static get_model_info(nlp: Language) → str

Get language model information.