Named Entity Recognition using spaCy
From a given interactive voice response (IVR) prompt, identify the queries and their corresponding action keys
Task: From a given interactive voice response (IVR) prompt, identify the queries and their corresponding action keys.
The following example illustrates the task:
INPUT: Welcome in Airtel Customer Care!. To check your balance please press 1. to speak with our executive speak executive. for more information press three or speak more.
OUTPUT:
1) query : To check your balance
   interface : press
   dtmf_key : 1
2) query : to speak with our executive
   interface : speak
   vocal_key : executive
3) query : more information
   interface : press
   dtmf_key : three
   interface : speak
   vocal_key : more
Notation:
- query : the service or functionality being offered
- interface : the type of action (press/enter/speak/say)
- dtmf_key : a digit 0 to 9, a number word 'zero' to 'nine', #, or any other valid keypad key
- vocal_key : any single word (no spaces)
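The key formats above can be checked mechanically. The following is a minimal sketch, assuming that "any valid keypad key" means the twelve keys of a standard phone keypad (0 to 9, * and #) and that number words run from 'zero' to 'nine'. The helper names `is_dtmf_key` and `is_vocal_key` are hypothetical and not part of the original notebook.

```python
import re

# Assumption: a standard 12-key phone keypad plus the spelled-out digits.
NUMBER_WORDS = {"zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"}
KEYPAD_KEYS = set("0123456789*#")


def is_dtmf_key(token):
    """Return True if token is a keypad key or a spelled-out digit."""
    token = token.strip().lower()
    return token in KEYPAD_KEYS or token in NUMBER_WORDS


def is_vocal_key(token):
    """Return True if token is a single word with no whitespace."""
    return re.fullmatch(r"\S+", token) is not None


print(is_dtmf_key("1"), is_dtmf_key("three"), is_dtmf_key("#"))
print(is_vocal_key("executive"), is_vocal_key("more information"))
```

A check like this could be used to filter implausible predictions, for example a `dtmf_key` entity whose text is not a keypad key.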
import random
import re
import sys

import spacy

# Make the project folder importable so that TrainingData.py can be loaded
train_path = "/content/drive/MyDrive/SpaCy Project/"
sys.path.append(train_path)
import TrainingData
Exploring the Dataset
- We created the dataset ourselves using the GATE annotation tool.
- 323 paragraphs are annotated with the start and end index of each entity, as the sample below shows.
- The annotations are stored in TrainingData.py.
TRAIN_DATA = TrainingData.TRAIN_DATA
print("Total Paragraphs:", len(TRAIN_DATA))
print("Sample Data\n", TRAIN_DATA[18])
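For orientation, the training loop unpacks each item of TRAIN_DATA as a (text, annotations) pair, which is consistent with spaCy's character-offset training format. The following is a minimal sketch of what one entry might look like, assuming the usual `{"entities": [(start, end, label)]}` layout. The sentence is taken from the task example, not from the real dataset, and `make_entry` is a hypothetical helper.

```python
def make_entry(text, spans):
    """Build a (text, annotations) pair from (substring, label) pairs.

    Offsets are found with str.find, so each substring must occur in text;
    the first occurrence is used.
    """
    entities = []
    for substring, label in spans:
        start = text.find(substring)
        if start == -1:
            raise ValueError(f"{substring!r} not found in text")
        entities.append((start, start + len(substring), label))
    return (text, {"entities": entities})


# Illustrative entry built from the task example (not real training data).
entry = make_entry(
    "To check your balance please press 1",
    [("To check your balance", "query"),
     ("press", "interface"),
     ("1", "dtmf_key")],
)
print(entry)

# Sanity check: every offset pair must slice out the annotated text.
text, annotations = entry
for start, end, label in annotations["entities"]:
    print(label, "->", text[start:end])
```

Running the same slicing check over the real entries is a cheap way to catch annotations whose indices drifted during export from the annotation tool.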
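# Start from a blank English pipeline and add an NER component (spaCy 2.x API)
nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
# Register the entity labels
ner.add_label("dtmf_key")
ner.add_label("interface")
ner.add_label("query")
ner.add_label("vocal_key")
# Initialize the model weights
nlp.begin_training()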
# Loop through iterations
iterations = 20
for itn in range(iterations):
    print("#### iteration:{} ####".format(itn))
    # Shuffle the training data
    random.shuffle(TRAIN_DATA)
    losses = {}
    # Batch size compounds from 4.0 toward 32.0, growing by a factor of 1.001 per batch
    batches = spacy.util.minibatch(TRAIN_DATA, size=spacy.util.compounding(4.0, 32.0, 1.001))
    # Batch the examples and iterate over them
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(
            texts,  # batch of texts
            annotations,  # batch of annotations
            # drop=0.5,
            # Dropout makes it harder for the model to memorize the data
            losses=losses,
        )
    print("Losses", losses)
print("Training Complete!")
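To see what batch sizes this schedule is likely to produce, here is a pure-Python sketch of a compounding schedule and a minibatcher. It is a re-implementation for illustration, not spaCy's own code, and it assumes that the size is multiplied by the compound factor after every batch, capped at the stop value, and truncated to an integer. The 323 placeholder items stand in for the annotated paragraphs.

```python
import itertools


def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def minibatch(items, sizes):
    """Split items into batches, drawing one size per batch from sizes."""
    items = iter(items)
    while True:
        batch = list(itertools.islice(items, int(next(sizes))))
        if not batch:
            break
        yield batch


# 323 placeholder items stand in for the 323 annotated paragraphs.
data = list(range(323))
batches = list(minibatch(data, compounding(4.0, 32.0, 1.001)))
batch_sizes = [len(batch) for batch in batches]

print("number of batches:", len(batches))
print("largest batch:", max(batch_sizes))
```

Under these assumptions, a factor of 1.001 grows too slowly to change the integer batch size within one pass over 323 examples, and because the schedule is created anew in every iteration, the batch size would stay at 4 throughout training. A larger factor, or a single schedule shared across iterations, would be needed to actually approach 32.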
test_para = '''
Welcome in Airtel Customer Care!.
To check your balance please press 1.
to speak with our executive speak executive.
for more information press three or speak more.
Have a good day!
'''
# Split the paragraph into sentences on full stops
lines = test_para.split('.')
test_lines = []
for line in lines:
    # Remove newlines and quote characters
    line = re.sub(r"[\n\"\']", '', line)
    test_lines.append(line)
test_lines
for line in test_lines:
    # Run the trained model on the line
    doc = nlp(line)
    print("INPUT:", line)
    print("OUTPUT:")
    # Print each recognized entity with its label
    for entity in doc.ents:
        print(f"{entity.label_} : {entity.text}")
    print()
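The loop above prints entities as a flat list. To obtain the numbered records shown in the task description, the entities of each line can be grouped, starting a new record at every `query`. The following is a minimal sketch, assuming entities arrive in text order as (label, text) pairs, as they would when read from `doc.ents`. `group_entities` is a hypothetical helper, and the sample pairs are hand-written from the task example rather than real model output.

```python
def group_entities(pairs):
    """Group (label, text) pairs into records, one record per query.

    Each record holds the query text and a list of action dicts, where an
    action pairs the most recent interface with the key that follows it.
    """
    records = []
    pending_interface = None
    for label, text in pairs:
        if label == "query":
            records.append({"query": text, "actions": []})
            pending_interface = None
        elif label == "interface":
            pending_interface = text
        elif label in ("dtmf_key", "vocal_key") and records:
            records[-1]["actions"].append(
                {"interface": pending_interface, label: text}
            )
            pending_interface = None
    return records


# Hand-written pairs mirroring the third option of the task example.
pairs = [
    ("query", "more information"),
    ("interface", "press"),
    ("dtmf_key", "three"),
    ("interface", "speak"),
    ("vocal_key", "more"),
]
for number, record in enumerate(group_entities(pairs), start=1):
    print(f"{number}) query : {record['query']}")
    for action in record["actions"]:
        for name, value in action.items():
            print(f"   {name} : {value}")
```

With the trained model, the input pairs could be built as `[(ent.label_, ent.text) for ent in doc.ents]`. Keys that appear before any query are dropped in this sketch, which is one possible design choice among several.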