Named Entity Recognition using spaCy

Task: From a given interactive voice response (IVR) prompt, identify the queries and their corresponding action keys.
The following example illustrates the task:
INPUT: Welcome in Airtel Customer Care!. To check your balance please press 1. to speak with our executive speak executive. for more information press three or speak more.

OUTPUT:
1) query : To check your balance
interface : press
dtmf_key : 1

2) query : to speak with our executive
interface : speak
vocal_key : executive

3) query : more information
interface : press
dtmf_key : three
interface : speak
vocal_key : more

Notations :

  • query : the service or functionality being offered
  • interface : the type of action the caller must take (press/enter/speak/say)
  • dtmf_key : a digit 0 to 9 (numeric or spelled out, 'zero' to 'nine'), #, or any other valid keypad key
  • vocal_key : any single word (no spaces)
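The notation rules above can be expressed as small validation checks. The following is only a sketch: the function names, the DIGIT_WORDS set, and the choice to accept * and to strip surrounding quotes are our assumptions, not part of the original annotation scheme.

```python
import re

# Assumed spellings for spoken digits ('zero' to 'nine').
DIGIT_WORDS = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
}


def is_valid_dtmf_key(key: str) -> bool:
    """True for a single keypad symbol (0-9, #, *) or a spelled-out digit."""
    # Keys may appear quoted in a prompt (e.g. '0'), so drop surrounding quotes.
    key = key.strip().strip("'\"").lower()
    return bool(re.fullmatch(r"[0-9#*]", key)) or key in DIGIT_WORDS


def is_valid_vocal_key(key: str) -> bool:
    """True for a single non-empty word that contains no whitespace."""
    return bool(re.fullmatch(r"\S+", key))


for candidate in ["1", "three", "#", "executive"]:
    print(candidate, is_valid_dtmf_key(candidate), is_valid_vocal_key(candidate))
```

Checks like these could be run over the annotated data to catch labelling mistakes before training.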
import spacy
import random
import sys
train_path = "/content/drive/MyDrive/SpaCy Project/"
sys.path.append(train_path)
import TrainingData
import re

Exploring Dataset

  • We created the dataset ourselves using the GATE annotation tool.
  • 323 paragraphs are annotated with the start and end character index of each entity, as the sample below shows.
  • The annotations are stored in TrainingData.py.
TRAIN_DATA = TrainingData.TRAIN_DATA
print("Total Paragraphs:",len(TRAIN_DATA))
print("Sample Data\n",TRAIN_DATA[18])
Total Paragraphs: 323
Sample Data
 ("To repeat the previous order press 2 Hi bogz happy to hear from you again! If this is regarding your previous purchase football press 1 Thank you for calling The Operations Tech Company where Technology and business come together If you would like to talk to the receptionist press '0' or stay on the line and one of our friendly staff members will assist you For Sales and Marketing press '4'", {'entities': [(3, 28, 'query'), (29, 34, 'interface'), (35, 36, 'dtmf_key'), (86, 127, 'query'), (128, 133, 'interface'), (134, 135, 'dtmf_key'), (250, 275, 'query'), (276, 281, 'interface'), (282, 285, 'dtmf_key'), (364, 383, 'query'), (384, 389, 'interface'), (390, 393, 'dtmf_key')]})
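Each entity is a (start, end, label) triple of character offsets into the paragraph, with the end index exclusive, so slicing the text recovers the annotated span. The sketch below uses a short made-up example in the same format as TRAIN_DATA; the helper name show_entities is hypothetical.

```python
def show_entities(example):
    """Return (label, span_text) pairs for one (text, annotations) example."""
    text, annotations = example
    return [(label, text[start:end]) for start, end, label in annotations["entities"]]


# A made-up example in the same format as the entries of TRAIN_DATA.
sample = (
    "To check your balance press 1",
    {"entities": [(3, 21, "query"), (22, 27, "interface"), (28, 29, "dtmf_key")]},
)

for label, span in show_entities(sample):
    print(f"{label:10s} -> {span!r}")
```

Running the same helper on TRAIN_DATA[18] would be a quick way to confirm that the offsets line up with the intended words.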

Creating the spaCy Model:

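# Start from a blank English pipeline and add an empty NER component.
# Note: create_pipe() followed by add_pipe(component) is the spaCy v2 API;
# in spaCy v3 the equivalent is ner = nlp.add_pipe("ner").
nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)

# Add entity labels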
ner.add_label("dtmf_key") 
ner.add_label("interface")
ner.add_label("query")
ner.add_label("vocal_key")

Training

# Initialize the model weights before the first update
nlp.begin_training()

# Loop through iterations
iterations=20
for itn in range(iterations):
    print("#### iteration:{} ####".format(itn))
    # Shuffle the training data
    random.shuffle(TRAIN_DATA)
    losses = {}
    # Batch size follows a compounding schedule: it starts at 4 and is
    # multiplied by 1.001 after each batch, capped at 32
    batches = spacy.util.minibatch(TRAIN_DATA, size=spacy.util.compounding(4.0, 32.0, 1.001))
    # Update the model on each batch
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(
            texts,  # batch of texts
            annotations,  # batch of annotations
            # drop=0.5,  # dropout makes it harder for the model to memorize the data
            losses=losses,
        )

    print("Losses", losses)
print("Training Complete!")
#### iteration:0 ####
Losses {'ner': 4768.9891713176}
#### iteration:1 ####
Losses {'ner': 1486.7596520159605}
#### iteration:2 ####
Losses {'ner': 1074.1071048149015}
#### iteration:3 ####
Losses {'ner': 850.1767826321648}
#### iteration:4 ####
Losses {'ner': 752.1742569444432}
#### iteration:5 ####
Losses {'ner': 669.864118618759}
#### iteration:6 ####
Losses {'ner': 524.6499961741566}
#### iteration:7 ####
Losses {'ner': 471.66053010489964}
#### iteration:8 ####
Losses {'ner': 576.8265857268041}
#### iteration:9 ####
Losses {'ner': 514.8405227193435}
#### iteration:10 ####
Losses {'ner': 416.5718498184934}
#### iteration:11 ####
Losses {'ner': 373.2867328748377}
#### iteration:12 ####
Losses {'ner': 357.24039446139324}
#### iteration:13 ####
Losses {'ner': 274.7780986793721}
#### iteration:14 ####
Losses {'ner': 285.58639339756184}
#### iteration:15 ####
Losses {'ner': 238.52276656678885}
#### iteration:16 ####
Losses {'ner': 213.5858629838169}
#### iteration:17 ####
Losses {'ner': 268.84617166230623}
#### iteration:18 ####
Losses {'ner': 288.95139569898976}
#### iteration:19 ####
Losses {'ner': 201.44840153135783}
Training Complete!
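To see what the compounding batch-size schedule does with 323 paragraphs, here is a plain-Python re-creation of the two helpers as we understand their behaviour. It is a sketch, not spaCy's source code, and the two functions below only borrow the names of spacy.util.compounding and spacy.util.minibatch.

```python
from itertools import islice


def compounding(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    value = float(start)
    while True:
        yield min(value, stop)
        value *= compound


def minibatch(items, sizes):
    """Split items into batches, taking int(next(sizes)) items per batch."""
    items = iter(items)
    while True:
        batch = list(islice(items, int(next(sizes))))
        if not batch:
            break
        yield batch


# 323 stand-in items, one per annotated paragraph.
batches = list(minibatch(range(323), compounding(4.0, 32.0, 1.001)))
batch_sizes = [len(b) for b in batches]
print(len(batches), min(batch_sizes), max(batch_sizes))  # prints: 81 3 4
```

Under these assumptions the batch size never rises above 4 within an iteration: with a growth factor of 1.001 it takes more than 200 batches to reach 5, and the schedule is recreated at the start of every iteration. A larger factor, or a schedule created once outside the loop, would be needed for the batch size to actually grow toward 32.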

Testing

test_para = '''
Welcome in Airtel Customer Care!.
To check your balance please press 1.
to speak with our executive speak executive.
for more information press three or speak more.
Have a good day!
'''
lines = test_para.split('.')
test_lines = []
for line in lines:
  # remove newline, double-quote, and single-quote characters
  line = re.sub(r"[\n\"\']",'',line)
  test_lines.append(line)
test_lines
['Welcome in Airtel Customer Care!',
 'To check your balance please press 1',
 'to speak with our executive speak executive',
 'for more information press three or speak more',
 'Have a good day!']
for line in test_lines:
  # run the line through the trained model
  doc = nlp(line)
  print("INPUT:",line)
  print("OUTPUT:")
  #extract entities
  for entity in doc.ents:
    print(f"{entity.label_} : {entity.text}")
  print()
INPUT: Welcome in Airtel Customer Care!
OUTPUT:

INPUT: To check your balance please press 1
OUTPUT:
query : check your balance
interface : press
dtmf_key : 1

INPUT: to speak with our executive speak executive
OUTPUT:
query : speak with our executive
interface : speak
vocal_key : executive

INPUT: for more information press three or speak more
OUTPUT:
query : more information
interface : press
dtmf_key : three
interface : speak
vocal_key : more

INPUT: Have a good day!
OUTPUT:
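The model returns a flat sequence of entities, while the task description groups each query with its interface/key pairs. The sketch below shows one possible way to do that grouping. It assumes entities arrive in the order query, interface, key (as in the outputs above); the function name group_entities and the record layout are our own choices.

```python
def group_entities(entities):
    """Group a flat sequence of (label, text) pairs into one record per query."""
    records = []
    for label, text in entities:
        # A new query starts a new record; so does any entity seen before a query.
        if label == "query" or not records:
            records.append({"query": None, "actions": []})
        record = records[-1]
        if label == "query":
            record["query"] = text
        elif label == "interface":
            record["actions"].append({"interface": text})
        else:  # dtmf_key or vocal_key
            # Attach the key to the latest interface, unless it already has a key.
            if not record["actions"] or len(record["actions"][-1]) > 1:
                record["actions"].append({"interface": None})
            record["actions"][-1][label] = text
    return records


# Entities as printed for the fourth test line above.
entities = [
    ("query", "more information"),
    ("interface", "press"),
    ("dtmf_key", "three"),
    ("interface", "speak"),
    ("vocal_key", "more"),
]
grouped = group_entities(entities)
print(grouped)
```

With the trained model, the input could be built as [(ent.label_, ent.text) for ent in doc.ents]. Prompts that state the key before the query (for example "press 1 for balance") would need a different grouping rule.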