Tokenization is the process of dividing text into smaller parts called tokens. It plays a vital role in applications such as text classification, language translation, chatbots, and sentiment analysis.
The nltk.tokenize module provides, among others, the following functions:
- word_tokenize: divides text into words.
- sent_tokenize: divides text into sentences.
from nltk.tokenize import word_tokenize, sent_tokenize

# Sample text to tokenize (both functions rely on NLTK's punkt models).
text_sample = "Hi this is nltk! It used for tokenization."

words = word_tokenize(text_sample)      # split into word and punctuation tokens
sentences = sent_tokenize(text_sample)  # split into sentences

print(words)
print(sentences)
Output:
['Hi', 'this', 'is', 'nltk', '!', 'It', 'used', 'for', 'tokenization', '.']
['Hi this is nltk!', 'It used for tokenization.']
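Both functions depend on NLTK's pretrained punkt sentence tokenizer, which must be downloaded once per environment. The sketch below shows the one-time download and how the two functions compose, splitting the sample text into sentences and then each sentence into words (the variable name words_per_sentence is illustrative):

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time download of the punkt tokenizer models (safe to re-run;
# some recent NLTK versions name this resource 'punkt_tab').
nltk.download('punkt')

text_sample = "Hi this is nltk! It used for tokenization."

# Sentences first, then words within each sentence.
words_per_sentence = [word_tokenize(sentence) for sentence in sent_tokenize(text_sample)]
print(words_per_sentence)
# [['Hi', 'this', 'is', 'nltk', '!'], ['It', 'used', 'for', 'tokenization', '.']]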