Common use cases for spaCy in natural language processing include tokenization, named entity recognition, part-of-speech tagging, and dependency parsing. Below are some typical examples with the corresponding code.
Tokenization
Split text into basic units such as words and punctuation marks.
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Tokenize the text
text = "This is a sample sentence."
doc = nlp(text)

# Print the tokens
for token in doc:
    print(token.text)
Output:
This
is
a
sample
sentence
.
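Note that tokenization alone does not need a trained model: a blank English pipeline provides just the tokenizer, with no model download required. A minimal sketch (not part of the original example):

```python
import spacy

# spacy.blank("en") builds a tokenizer-only pipeline;
# no trained model needs to be downloaded.
nlp = spacy.blank("en")

doc = nlp("This is a sample sentence.")
tokens = [token.text for token in doc]
print(tokens)  # → ['This', 'is', 'a', 'sample', 'sentence', '.']
```

This is handy for quick experiments, though a blank pipeline provides no POS tags, entities, or parses.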
Named Entity Recognition
Identify named entities in text, such as person names, locations, and organizations.
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Text
text = "Apple is a big company, headquartered in Cupertino, California."

# Process the text
doc = nlp(text)

# Print the named entities
for ent in doc.ents:
    print(ent.text, ent.label_)
Output:
Apple ORG
Cupertino GPE
California GPE
Part-of-speech Tagging
Tag the part of speech of each word in the text.
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Text
text = "This is a sample sentence."

# Process the text
doc = nlp(text)

# Print the part-of-speech tags
for token in doc:
    print(token.text, token.pos_)
Output:
This PRON
is AUX
a DET
sample NOUN
sentence NOUN
. PUNCT
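If the tag codes above are unfamiliar, spacy.explain maps a code to a human-readable description and needs no model to be loaded:

```python
import spacy

# spacy.explain looks up a tag/label code in spaCy's built-in glossary.
for tag in ["PRON", "AUX", "DET", "NOUN", "PUNCT"]:
    print(tag, "->", spacy.explain(tag))
```

The same function also explains entity labels such as ORG and GPE from the previous section.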
Dependency Parsing
Analyze the dependency relations between the words in the text.
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Text
text = "Apple is looking at buying U.K. startup for $1 billion"

# Process the text
doc = nlp(text)

# Print the dependency relations
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
          [child for child in token.children])
Output:
Apple nsubj looking VERB []
is aux looking VERB []
looking ROOT looking VERB [Apple, is, at, startup]
at prep looking VERB [buying]
buying pcomp at ADP [U.K.]
U.K. dobj buying VERB []
startup dep looking VERB [for]
for prep startup NOUN [billion]
$ quantmod billion NUM []
1 compound billion NUM []
billion pobj for ADP [$, 1]
Sentence Segmentation
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("sentencizer")
doc = nlp("This is a sentence. This is another sentence.")
for sentence in doc.sents:
    print(sentence)
Output:
This is a sentence.
This is another sentence.
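Since the sentencizer is rule-based (it splits on sentence-final punctuation), it also works on a blank pipeline without any trained model. A minimal sketch:

```python
import spacy

# A blank pipeline plus the rule-based sentencizer:
# sentence boundaries come from punctuation rules, not a model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another sentence.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```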
Keyword Extraction
import spacy

nlp = spacy.load("en_core_web_sm")

text = ("Please ignore that NLLB is not made to translate this large number of tokens at once. "
        "Again, I am more interest in the computational limits I have. "
        "I already use torch.no_grad() and put the model in evaluation mode "
        "which I read online should safe some memory. "
        "My full code to run the inference looks like this:")
doc = nlp(text)

# Keep nouns and proper nouns as keywords
keywords = [token.text for token in doc if token.pos_ in ["NOUN", "PROPN"]]
print(keywords)
Output:
[NLLB, number, tokens, interest, limits, torch.no_grad, model, evaluation, mode, memory, code, inference]
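On longer texts the same keyword often appears more than once, so it can help to rank the extracted list by frequency with collections.Counter. A sketch using the keyword list from the output above as input:

```python
from collections import Counter

# Keyword list as produced by the POS filter above.
keywords = ["NLLB", "number", "tokens", "interest", "limits",
            "torch.no_grad", "model", "evaluation", "mode",
            "memory", "code", "inference"]

# Rank keywords by frequency; ties keep first-seen order.
top = Counter(keywords).most_common(5)
print(top)
```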
Sentence Similarity
import spacy

# A model with word vectors (e.g. en_core_web_lg) is needed
# for meaningful similarity scores; en_core_web_sm has no static vectors.
nlp = spacy.load("en_core_web_lg")

doc1 = nlp(u"the person wear red T-shirt")
doc2 = nlp(u"this person is walking")
doc3 = nlp(u"the boy wear red T-shirt")

print(doc1.similarity(doc2))
print(doc1.similarity(doc3))
print(doc2.similarity(doc3))
Output:
0.7003971105290047
0.9671912343259517
0.6121211244876517
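Under the hood, Doc.similarity computes the cosine similarity of the documents' averaged word vectors. A numpy sketch of that computation with toy vectors (these stand in for spaCy's real vectors):

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Cosine of the angle between two vectors,
    # the same measure Doc.similarity uses on averaged word vectors.
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy vectors standing in for two documents' averaged word vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction
c = np.array([-2.0, 1.0, 0.0])  # orthogonal to a

print(cosine_similarity(a, b))  # → 1.0
print(cosine_similarity(a, c))  # → 0.0
```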