In this article we will discuss the Unicode range validation using Python regular expressions and JavaScript regular expressions. For illustration i have used Kannada, same can be applied to other languages as well.
Where it can be used?
- Suppose users can submit articles in your website and you need to check every article should have more than 60% Kannada words.
- Validation for localized websites
- and many more...
Kannada Unicode range as per the standard is u0C80 to u0CFF, refer Unicode.org website
Using Python
# file: validateKannadaWords.py
# -*- coding: utf-8 -*-
import re
# function checks the input word is Kannada word or not
# @params word - Input word to validate
# @returns True - If success, False - If failure
# @author Aravinda VK
# @date - 09-Nov-2008
def isKannadaWord(word):
rangeStart = ur"\u0C80"
rangeEnd = ur"\u0CFF"
pattern = rangeStart + '-' + rangeEnd
if re.match('^[' + pattern + ']+$',word) != None:
return True
else:
return False
"ur" in above code refers to raw unicode.
How to use
myString = u'ಅರವಿಂದ ಒಲವು ಅವಳು ಮತ್ತು ನಾಳೆ abcd ಹೆಹೆ '
wordsList = myString.split()
for eachWord in wordsList:
if isKannadaWord(eachWord) :
print eachWord + ' is a Kannada word'
else:
print eachWord + ' is not a Kannada word'
Using Javascript
// function checks the input word is Kannada word or not
// @params word - Input word to validate
// @returns True - If success, False - If failure
// @author Aravinda VK
// @date - 09-Nov-2008
function isKannadaWord(word) {
var re = new RegExp(/^[\u0C80-\u0CFF]+$/);
if (word.match(re)) {
return true;
}
else {
return false;
}
}
How to use
var inputwords = new Array("ಅರವಿಂದ", "ಒಲವು", "ಅವಳು", "ಮತ್ತು", "ನಾಳೆ", "abcd", "ಹೆಹೆ");
for(i=0;i<2;i++) {
if (isKannadaWord(inputwords[i])) {
document.write("<b>" + inputwords[i] + "</b> is a valid Kannada Word<br/>");
}
else
document.write("<b>" + inputwords[i] + "</b> is Not a Kannada Word<br/>");
}
Comments !