segram.nlp.pipeline.base module

Base Segram module.

It implements the _Segram_ pipe component providing all main semantic grammar transformations and related auxiliary methods.

class segram.nlp.pipeline.base.Segram(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True)

Bases: Pipe

Semantic grammar pipeline component.

It extends spaCy token classes with semantic grammar methods and related functionality such as custom preprocessing (e.g. token merging and corrected lemmatization).

nlp: Language model object.

name: Name of the component.

extensions: Module defining custom spacy extensions.

grammar: Label of grammar implementation.

meta: Metadata dictionary with details on spacy and segram models being used.

__init__(nlp: Language, name: str, *, grammar: str, preprocess: Sequence[str], alias: str = 'segram', vectors: str | Language | None = None, store_data: bool = True) → None

Initialization method.

Parameters:

preprocess – List of segram pipeline components used to preprocess documents before the main segram pipe is applied. If None, all available preprocessing components are used.
alias – Sets spacy_alias in the global settings. The alias namespaces the extension attributes added by segram to avoid collisions with other packages.
vectors – Vector table to use instead of the vectors provided by the main model. Must be given as the name of a model or as the model object itself, so that it is possible to keep track of the model name.
store_data – Whether document data should be stored automatically at parse time.
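
As a rough illustration of how the keyword parameters above might be supplied, the sketch below collects them into a config dictionary of the kind accepted by spaCy's `Language.add_pipe`. The helper `make_segram_config`, the factory name `"segram"` and the model names are assumptions made for this example and are not confirmed by this page.

```python
from typing import Any, Optional, Sequence


def make_segram_config(
    *,
    grammar: Optional[str] = None,
    preprocess: Optional[Sequence[str]] = None,
    alias: str = "segram",
    vectors: Optional[str] = None,
    store_data: bool = True,
) -> dict[str, Any]:
    """Collect the documented keyword parameters into a config dict.

    Hypothetical helper: parameters left as ``None`` are omitted so that
    whatever defaults the component factory defines still apply.
    """
    config: dict[str, Any] = {"alias": alias, "store_data": store_data}
    if grammar is not None:
        config["grammar"] = grammar
    if preprocess is not None:
        config["preprocess"] = list(preprocess)
    if vectors is not None:
        # Pass the model *name* so the source of the vectors stays traceable.
        config["vectors"] = vectors
    return config


config = make_segram_config(vectors="en_core_web_lg", store_data=False)

# Assumed usage with spaCy (requires spaCy, segram and both models installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   nlp.add_pipe("segram", config=config)
```

Passing the vectors as a model name rather than a loaded object keeps the config serializable, which is what spaCy expects of pipeline configs.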

property id: int: Hash id of the component.

static set_docattrs(doc: Doc, alias: str, meta: dict[str, Any]) → None: Set document attributes.

static import_extensions(grammar: str, lang: str, alias: str) → SpacyExtensions

Import NLP module from grammar label and language code.

Returns: extensions – SpacyExtensions instance.
Return type: SpacyExtensions

init_extensions() → None: Initialize custom spaCy attributes.
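
spaCy registers custom attributes through the `set_extension` class method on `Doc`, `Span` and `Token`, and `has_extension` reports whether a name is already taken. The sketch below shows one way an alias could be used to namespace such attributes. The `{alias}_{name}` naming scheme, the helper `register_namespaced` and the `FakeDoc` stand-in are assumptions for illustration only and do not describe segram's actual implementation.

```python
from typing import Any


class FakeDoc:
    """Stand-in mimicking the ``set_extension``/``has_extension`` class
    methods of spaCy's ``Doc`` so the sketch runs without spaCy."""

    _extensions: dict[str, Any] = {}

    @classmethod
    def has_extension(cls, name: str) -> bool:
        return name in cls._extensions

    @classmethod
    def set_extension(cls, name: str, *, default: Any = None) -> None:
        cls._extensions[name] = default


def register_namespaced(cls: type, alias: str, name: str, default: Any = None) -> str:
    """Register ``name`` under an alias prefix unless it already exists."""
    full_name = f"{alias}_{name}"  # assumed naming scheme
    if not cls.has_extension(full_name):
        cls.set_extension(full_name, default=default)
    return full_name


first = register_namespaced(FakeDoc, "segram", "meta", default={})
second = register_namespaced(FakeDoc, "other", "meta")
```

Because the two registrations carry different prefixes, a second package can define its own `meta` attribute without overwriting the first.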

configure_pipeline(*components: str, **kwds: Any) → None

Configure secondary segram pipeline components.

Parameters:

*components – Pipeline component names.
**kwds – Passed to add_pipe().

normalize_pipe_name(pipe: str) → str: Normalize pipeline component name.

get_config() → dict: Get current config dictionary.

static get_model_name(nlp: Language) → str: Get language model name.

static get_model_info(nlp: Language) → str: Get language model information.
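
For orientation, a spaCy `Language` object exposes a `meta` dictionary whose `lang`, `name` and `version` keys identify the loaded model. The sketch below derives a package-style name and a short info string from such a dictionary. The function names and the `name-version` info format are hypothetical and may differ from what `get_model_name()` and `get_model_info()` actually return.

```python
from typing import Any, Mapping


def model_name_from_meta(meta: Mapping[str, Any]) -> str:
    """Build a package-style model name such as ``en_core_web_sm``."""
    return f"{meta['lang']}_{meta['name']}"


def model_info_from_meta(meta: Mapping[str, Any]) -> str:
    """Append the model version to the name (hypothetical format)."""
    return f"{model_name_from_meta(meta)}-{meta['version']}"


# Plain dict mirroring the relevant keys of ``nlp.meta``; values are examples.
meta = {"lang": "en", "name": "core_web_sm", "version": "3.7.1"}
name = model_name_from_meta(meta)
info = model_info_from_meta(meta)
```

With a real pipeline, the same functions could be called as `model_name_from_meta(nlp.meta)`.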