How to validate Kannada words?

This article discusses Unicode range validation using regular expressions in Python and JavaScript. Kannada is used for illustration, but the same approach applies to other languages as well.

Where can it be used?

  • Suppose users can submit articles to your website and you need to check that more than 60% of the words in every article are Kannada.
  • Validation for localized websites
  • and many more...

The Kannada Unicode range, as per the standard, is U+0C80 to U+0CFF (written \u0C80 to \u0CFF in regular expressions); refer to the Unicode.org website.
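To make the range concrete, here is a minimal Python 3 sketch that tests a single character by its code point. The helper name in_kannada_block is hypothetical and chosen only for illustration.

```python
# Boundaries of the Kannada block, U+0C80 to U+0CFF.
KANNADA_START = 0x0C80
KANNADA_END = 0x0CFF

def in_kannada_block(ch):
    """Return True if the single character ch lies in U+0C80..U+0CFF."""
    return KANNADA_START <= ord(ch) <= KANNADA_END

# Print the code point of each character and whether it is in the block.
for ch in "ಅa":
    print("U+%04X %s" % (ord(ch), in_kannada_block(ch)))
# prints:
# U+0C85 True
# U+0061 False
```

A regular expression character class such as [\u0C80-\u0CFF] performs the same comparison for every character of a word.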

Using Python

# file: validateKannadaWords.py
# -*- coding: utf-8 -*-
import re

# Checks whether the input word is a Kannada word
# @params word - Input word to validate
# @returns True - If the word is Kannada, False - Otherwise
# @author Aravinda VK
# @date - 09-Nov-2008

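def isKannadaWord(word):
    # Character class covering the Kannada block, U+0C80 to U+0CFF
    pattern = u'^[\u0C80-\u0CFF]+$'
    return re.match(pattern, word) is not None

The "u" prefix in the above code marks a Unicode string literal. It is required on Python 2 and accepted on Python 3.3 and later. The "ur" (raw unicode) prefix exists only on Python 2 and is a syntax error on Python 3.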

How to use

myString = u'ಅರವಿಂದ ಒಲವು ಅವಳು ಮತ್ತು ನಾಳೆ abcd ಹೆಹೆ '
wordsList = myString.split()
for eachWord in wordsList:
    if isKannadaWord(eachWord):
        print(eachWord + u' is a Kannada word')
    else:
        print(eachWord + u' is not a Kannada word')
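The 60% check mentioned under the use cases can be built on the same idea. The following is a minimal Python 3 sketch, not a definitive implementation: the names kannada_ratio and is_mostly_kannada are hypothetical, and it assumes that words are separated by whitespace.

```python
import re

# Same character class as isKannadaWord above: the Kannada block U+0C80..U+0CFF.
KANNADA_WORD = re.compile("[\u0C80-\u0CFF]+")

def kannada_ratio(text):
    """Return the fraction of whitespace-separated words written entirely in Kannada."""
    words = text.split()
    if not words:
        return 0.0  # avoid dividing by zero on empty input
    kannada = sum(1 for w in words if KANNADA_WORD.fullmatch(w))
    return kannada / len(words)

def is_mostly_kannada(text, threshold=0.6):
    """Return True when more than `threshold` of the words are Kannada."""
    return kannada_ratio(text) > threshold

sample = "ಅರವಿಂದ ಒಲವು ಅವಳು ಮತ್ತು ನಾಳೆ abcd ಹೆಹೆ"
print(is_mostly_kannada(sample))  # 6 of the 7 words are Kannada, so this prints True
```

Note that a word with attached punctuation, such as a trailing full stop, falls outside the range and is counted as non-Kannada. Real articles would likely need punctuation stripped before counting.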

Using JavaScript

// Checks whether the input word is a Kannada word
// @params word - Input word to validate
// @returns true - If the word is Kannada, false - Otherwise
// @author Aravinda VK
// @date - 09-Nov-2008

function isKannadaWord(word) {
    // Character class covering the Kannada block, U+0C80 to U+0CFF
    var re = /^[\u0C80-\u0CFF]+$/;
    return re.test(word);
}

How to use

var inputwords = ["ಅರವಿಂದ", "ಒಲವು", "ಅವಳು", "ಮತ್ತು", "ನಾಳೆ", "abcd", "ಹೆಹೆ"];
// Loop over every word in the list
for (var i = 0; i < inputwords.length; i++) {
    if (isKannadaWord(inputwords[i])) {
        document.write("<b>" + inputwords[i] + "</b> is a valid Kannada word<br/>");
    }
    else {
        document.write("<b>" + inputwords[i] + "</b> is not a Kannada word<br/>");
    }
}
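As noted in the introduction, the same approach works for other languages by swapping the range. Below is a minimal Python 3 sketch of that idea. The SCRIPT_RANGES table and the make_word_validator helper are hypothetical names introduced for illustration; the ranges are the Unicode blocks for each script.

```python
import re

# Unicode block ranges for a few Indic scripts (illustrative selection).
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
}

def make_word_validator(script):
    """Build a function that checks whether a word lies entirely in one script's block."""
    start, end = SCRIPT_RANGES[script]
    # Build a character class such as [\u0C80-\u0CFF]+ from the block boundaries.
    pattern = re.compile("[%s-%s]+" % (chr(start), chr(end)))
    return lambda word: pattern.fullmatch(word) is not None

is_telugu = make_word_validator("Telugu")
print(is_telugu("తెలుగు"))  # True
print(is_telugu("ಕನ್ನಡ"))   # False: the word is written in Kannada script
```

Scripts that are spread across several blocks would need more than one range in the character class.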
