What is Lemmatization?

The six methods below are each applied to the same sample sentence:

sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'

  1. WordNet Lemmatizer with NLTK
  2. WordNet Lemmatizer (with POS tag)
  3. spaCy
  4. TextBlob
  5. TextBlob (with POS tag)
  6. Pattern

1. WordNet Lemmatizer with NLTK

  • The WordNet lemmatizer is one of the earliest and most commonly used lemmatization techniques.
  • NLTK offers an interface to it, but you have to download the WordNet data before you can use it. The code below downloads the data and then runs the lemmatizer.
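import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # WordNet data used by the lemmatizer
nltk.download('punkt')    # tokenizer models used by word_tokenize (recent NLTK releases may ask for 'punkt_tab' instead)

# Init the lemmatizer
lemmatizer = WordNetLemmatizer()

# Split the sentence into tokens
word_list = nltk.word_tokenize(sentence)
word_list

# Lemmatize each token (treated as a noun by default) and join the results
lemmatized_output = ' '.join([lemmatizer.lemmatize(w) for w in word_list])
lemmatized_output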
# sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'
# Out[58]: 'went mouse gone started best worst well foot universal universe box book goose striped coming'

2. WordNet Lemmatizer (with Part of Speech (POS) tag)

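import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# pos_tag needs a tagger model (recent NLTK releases may ask for 'averaged_perceptron_tagger_eng' instead)
nltk.download('averaged_perceptron_tagger')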


def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}

    return tag_dict.get(tag, wordnet.NOUN)


# 1. Init Lemmatizer
lemmatizer = WordNetLemmatizer()

# 2. Lemmatize each token together with its POS tag
print([lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])
# sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'
# Out[58]: ['go', 'mouse', 'go', 'start', 'best', 'bad', 'well', 'foot', 'universal', 'universe', 'box', 'book', 'geese', 'strip', 'come']
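
The expression nltk.pos_tag([word])[0][1][0].upper() in get_wordnet_pos packs several steps into one line. The sketch below unpacks it without calling NLTK. It rests on a few assumptions: the tagged value is a hand-written stand-in for what pos_tag returns (a list of (word, tag) pairs with Penn Treebank tags), the single letters stand in for the wordnet.ADJ, wordnet.NOUN, wordnet.VERB and wordnet.ADV constants, and the helper name first_letter_to_wordnet is hypothetical.

```python
# Stand-in for nltk.pos_tag(['went']): a list of (word, tag) pairs.
# The tag 'VBD' (verb, past tense) is an assumed example value.
tagged = [('went', 'VBD')]

pair = tagged[0]               # ('went', 'VBD'): the first (and only) pair
tag = pair[1]                  # 'VBD': the Penn Treebank tag
first_letter = tag[0].upper()  # 'V': the coarse word class

# Same lookup as tag_dict in get_wordnet_pos, with the wordnet constants
# written out as the single characters they stand for.
TAG_DICT = {'J': 'a', 'N': 'n', 'V': 'v', 'R': 'r'}


def first_letter_to_wordnet(penn_tag):
    """Hypothetical helper: map a Penn Treebank tag to a WordNet POS character."""
    # Tags outside the four known classes fall back to noun
    return TAG_DICT.get(penn_tag[0].upper(), 'n')


print(first_letter_to_wordnet(tag))    # v
print(first_letter_to_wordnet('JJS'))  # a
print(first_letter_to_wordnet('NNS'))  # n
print(first_letter_to_wordnet('RB'))   # r
print(first_letter_to_wordnet('DT'))   # n
```

Note that get_wordnet_pos tags each word on its own, so the tagger sees no surrounding context; this may be one reason its tags differ from those a full-sentence tagger would assign.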

3. spaCy Lemmatization

import spacy

# Download the model first by running this in a terminal: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp(sentence)

# Extract the lemma for each token and join
" ".join([token.lemma_ for token in doc])

# sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'
# Out[75]: 'go mouse gone start well worst well foot universal universe box book geese stripe come'

4. TextBlob Lemmatizer

from textblob import TextBlob, Word
sent = TextBlob(sentence)
" ".join([w.lemmatize() for w in sent.words])
# sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'
# Out[82]: 'went mouse gone started best worst well foot universal universe box book goose striped coming'

5. TextBlob Lemmatizer with POS tag

from textblob import TextBlob, Word


def lemmatize_with_postag(sentence):
    sent = TextBlob(sentence)
    tag_dict = {"J": 'a',
                "N": 'n',
                "V": 'v',
                "R": 'r'}
    words_and_tags = [(w, tag_dict.get(pos[0], 'n')) for w, pos in sent.tags]
    lemmatized_list = [wd.lemmatize(tag) for wd, tag in words_and_tags]
    return " ".join(lemmatized_list)

# Lemmatize
lemmatize_with_postag(sentence)
# sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'
# Out[83]: 'go mouse go start best worst well foot universal universe box book geese strip come'

6. Pattern

# pip install pattern

import pattern
from pattern.en import lemma, lexeme

" ".join([lemma(wd) for wd in sentence.split()])
# sentence = 'went mice gone started best worst well feet universal universe boxes books geese striped coming'
# Out[84]: 'go mice go start best worst well feet universal universe box book geese stripe come'

Comparison

  • The comparison table shows how the six methods above handle inflected words.
  • The marks in the table do not hold for every word with every method.
  • It has been prepared for general information purposes only.
  • For example, spaCy did not convert geese to goose, but it did convert feet to foot (both are irregular plurals).
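
Because the outputs above are plain strings, the differences between the methods can also be tabulated programmatically. The following is a minimal sketch that only uses the outputs recorded in this article; no library is re-run, so results on your machine may differ with other library versions. The names outputs, columns, agreed and disagreed are illustrative choices.

```python
# Outputs copied from the six sections above; nothing is recomputed here.
sentence = ('went mice gone started best worst well feet universal '
            'universe boxes books geese striped coming')

outputs = {
    'NLTK': 'went mouse gone started best worst well foot universal universe box book goose striped coming',
    'NLTK+POS': 'go mouse go start best bad well foot universal universe box book geese strip come',
    'spaCy': 'go mouse gone start well worst well foot universal universe box book geese stripe come',
    'TextBlob': 'went mouse gone started best worst well foot universal universe box book goose striped coming',
    'TextBlob+POS': 'go mouse go start best worst well foot universal universe box book geese strip come',
    'Pattern': 'go mice go start best worst well feet universal universe box book geese stripe come',
}

words = sentence.split()
# One list of lemmas per method, aligned position by position with `words`
columns = {name: text.split() for name, text in outputs.items()}

# Sort the input words by whether all six methods returned the same lemma
agreed, disagreed = [], []
for i, word in enumerate(words):
    lemmas = {col[i] for col in columns.values()}
    (agreed if len(lemmas) == 1 else disagreed).append(word)

# Print one row per input word and one column per method
header = ['word'] + list(columns)
print(' | '.join(f'{h:<12}' for h in header))
for i, word in enumerate(words):
    row = [word] + [col[i] for col in columns.values()]
    print(' | '.join(f'{cell:<12}' for cell in row))

print('All methods agree on:', agreed)
print('Methods differ on:', disagreed)
```

On the recorded outputs, only well, universal, universe, boxes and books receive the same lemma from all six methods; every other word is handled differently by at least one of them.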
