Email validation code using Pattern Matching

Posted: Mon Jul 30, 2012 7:33 pm
by Roland_Yonaba
I was looking into efficient but simple ways to handle e-mail validation using Lua's pattern matching.
I came up with this:

Code: Select all

print(string.match(email,'[(%w+)%p*]+@[%w+%p*]+%.%a+$'))
But as I'm not an expert in pattern matching, I guess the code above is likely to validate wrong addresses, or reject good ones.
Any wise advice?
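
A quick way to see what the pattern above accepts, as a rough sketch. Inside a set, `(`, `)`, `+` and `*` are literal characters, so `[(%w+)%p*]` is effectively "any alphanumeric or punctuation character" (which includes `.` and `@`), and the pattern has no `^` anchor at the start. The sample addresses are made up for illustration.

```lua
local pattern = '[(%w+)%p*]+@[%w+%p*]+%.%a+$'

-- Hypothetical sample inputs, not taken from the thread
local samples = {
  'roland@example.com',   -- well-formed
  'a..b@@example.com',    -- consecutive dots and two '@' symbols
  'not an email x@y.z',   -- only the tail looks like an address
  'roland.example.com',   -- no '@' at all
}

for _, email in ipairs(samples) do
  -- string.match returns nil when nothing matches
  print(email, string.match(email, pattern) ~= nil)
end
```

Only the last sample is rejected; the second and third match even though they are clearly not valid addresses.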

Re: Email validation code using Pattern Matching

Posted: Mon Jul 30, 2012 9:05 pm
by Kadoba
Found this in string recipes on the lua wiki.

Code: Select all

email="alex@it-rfc.de"
if (email:match("[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?")) then
  print(email .. " is a valid email address")
end

Re: Email validation code using Pattern Matching

Posted: Mon Jul 30, 2012 9:08 pm
by flashkot
First: you should know what a valid email address looks like.
Second: check the Lua docs.

Five minutes ago I knew nothing about Lua's pattern matching. Now, after reading these two short texts, what can we do with your email pattern?

Let's assume that you will never see monster addresses like the examples on Wikipedia, nothing more than roland.deschain@gilead.gov or Darth_Vader@DeathStar.mil.
And we will also match the whole string, right?

In that case, I think your pattern should be something like:

Code: Select all

print(string.match(email,'^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'))
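
A small sketch of how the anchored pattern above behaves, using the two addresses mentioned in the post plus two made-up counter-examples. The helper name `check` is only for illustration.

```lua
local pattern = '^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'

-- Hypothetical helper: true when the whole string matches the pattern
local function check(email)
  return email:match(pattern) ~= nil
end

print(check('roland.deschain@gilead.gov'))    -- matches
print(check('Darth_Vader@DeathStar.mil'))     -- matches
print(check('please mail roland@gilead.gov')) -- rejected: '^' anchors the match and spaces are not in the set
print(check('roland@gilead.g'))               -- rejected: the last label needs at least two letters
```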

Re: Email validation code using Pattern Matching

Posted: Mon Jul 30, 2012 9:46 pm
by Roland_Yonaba
Kadoba wrote:Found this in string recipes on the lua wiki.

Code: Select all

email="alex@it-rfc.de"
if (email:match("[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?")) then
  print(email .. " is a valid email address")
end
Thanks Kadoba... Well, that's a bit complex for me... I get the general idea, but what does the "%.%%%+%-" part in the set "[A-Za-z0-9%.%%%+%-]+" stand for?
I can see a sequence of alphanumeric characters, both upper and lower case (A-Za-z0-9), and a dot character (%.)... But the remaining part is unclear to me...

Re: Email validation code using Pattern Matching

Posted: Mon Jul 30, 2012 10:13 pm
by Nixola

Code: Select all

%. -- dot
%% -- % symbol
%+ -- + symbol
%- -- - symbol
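
To illustrate the escapes listed above, here is a minimal sketch that tests single characters against the set from the wiki pattern. The list of test characters is arbitrary.

```lua
local set = '[A-Za-z0-9%.%%%+%-]'

-- Anchor the set so that exactly one character must match it
for _, ch in ipairs({ '.', '%', '+', '-', 'a', '7', '_', '!' }) do
  print(ch, ch:match('^' .. set .. '$') ~= nil)
end
```

The first six characters are accepted; `_` and `!` are not, because they are not listed in the set.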

Re: Email validation code using Pattern Matching

Posted: Mon Jul 30, 2012 10:47 pm
by Roland_Yonaba
Thanks Nixola. That makes sense.
This reading was very helpful. It summarizes the standards and syntax that email addresses should follow.
For instance, I noticed that all the patterns given so far (mine and the one from Kadoba's link) validate addresses with consecutive dots...
Well, I think I'll have to rewrite this to meet the RFC standards.

Re: Email validation code using Pattern Matching

Posted: Tue Jul 31, 2012 12:17 am
by Inny
I come from the school that says anything@anything is legitimate, that the appearance of the @ symbol in the middle of the string somewhere is what makes it an email address. Since that's not exactly helpful, the better advice is to not reject data in the email field based on a regular expression, because fakeaddress@example.com would pass any reasonable regex, but not be a legitimate address. Instead, your address authentication code has to be written to accommodate very large latencies, i.e. make the account have an unverified state where they're limited in what they can do.
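
The "anything@anything" view could be sketched as the very permissive check below. The function name `looksLikeEmail` is hypothetical, and the unverified-account handling described above is not shown; it would live in the application, not in a pattern.

```lua
-- Accepts any string that has an '@' with at least one character on each side
local function looksLikeEmail(str)
  return str:find('^.+@.+$') ~= nil
end

print(looksLikeEmail('fakeaddress@example.com'))
print(looksLikeEmail('@example.com'))
print(looksLikeEmail('nobody'))
```

Only the first call returns true. Whether the address actually exists can only be settled by sending a confirmation message.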

Re: Email validation code using Pattern Matching

Posted: Tue Jul 31, 2012 8:49 am
by Roland_Yonaba
@Inny: You're totally right. Well, I only need to validate the e-mail address by checking its syntax.

@flashkot:
flashkot wrote:

Code: Select all

print(string.match(email,'^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'))
Good point. It gets closer to what I proposed first.
However, this pattern will still validate strings containing, for instance, two or more consecutive punctuation characters...
Something like Darth..Vader@DeathStar.mil

Re: Email validation code using Pattern Matching

Posted: Tue Jul 31, 2012 7:00 pm
by flashkot
Unfortunately, patterns in Lua are not as powerful as regular expressions.

So I can suggest two solutions: add a second check for repeated punctuation, or create a more complex pattern like:

Code: Select all

print(string.match(email,'^[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]+@[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]+%.%a%a+$'))
Here I try to emulate ([a-z0-9]\.)* from regular expressions. (As I understand it, in Lua we can't repeat part of a pattern; parentheses are only for captures. But I haven't tested this.)

The drawback is that it only matches as many abcde. parts as the number of times you repeat [%w+%-_]*%.?
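
The first suggestion, a second check for repeated punctuation, might look like the sketch below. It reuses the earlier anchored pattern and only rejects consecutive dots; `isValidSyntax` is a hypothetical name, and other punctuation could be checked the same way.

```lua
local pattern = '^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'

local function isValidSyntax(email)
  -- First check: overall shape of the address
  if not email:match(pattern) then return false end
  -- Second check: plain-text search (4th argument true) for two consecutive dots
  if email:find('..', 1, true) then return false end
  return true
end

print(isValidSyntax('Darth.Vader@DeathStar.mil'))
print(isValidSyntax('Darth..Vader@DeathStar.mil'))
```

This keeps the pattern short and places no limit on the number of dot-separated parts.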

Re: Email validation code using Pattern Matching

Posted: Thu Aug 02, 2012 5:51 pm
by Roland_Yonaba
Well, I took a look at it, and I finally came up with a set of rules.
My implementation does not meet all of the rules stated in the RFC standards, though. But I guess it can handle most email addresses that actually exist.
And that's enough for me.
From the Wikipedia page about email addresses, I considered just the following set of rules to be enough.
For the local-part:
  • Up to 64 characters long
  • Uppercase and lowercase English letters (a–z, A–Z) (ASCII: 65–90, 97–122)
  • Digits 0 to 9 (ASCII: 48–57)
  • Characters !#$%&'*+-/=?^_`{|}~ (ASCII: 33, 35–39, 42, 43, 45, 47, 61, 63, 94–96, 123–126)
  • Character . (dot, period, full stop) (ASCII: 46) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively (e.g. John..Doe@example.com is not allowed.).
For the domain-part:
The domain name part of an email address has to conform to strict guidelines: it must match the requirements for a hostname, consisting of letters, digits, hyphens and dots.

Code: Select all

function _.isEmail(str)
	-- select(2, ...) keeps only gsub's match count and avoids shadowing the `_` table with a local
	local nAt = select(2, str:gsub('@','@')) -- Counts the number of '@' symbols
	if nAt ~= 1 or str:len() > 254 or str:find('%s') then return false end
	local localPart = _.strLeft(str,'@') -- Returns the substring before '@' symbol
	local domainPart = _.strRight(str,'@') -- Returns the substring after '@' symbol
	if not localPart or not domainPart then return false end

	-- Anchored with ^ and $ so that every character must belong to the allowed set (dot included)
	if not localPart:match("^[%w!#%$%%&'%*%+%-/=%?%^_`{|}~%.]+$") or (localPart:len() > 64) then return false end
	-- No leading, trailing or consecutive dots in the local-part
	if localPart:match('^%.+') or localPart:match('%.+$') or localPart:find('%.%.+') then return false end

	-- Anchored so that the whole domain, not just its tail, is checked
	if not domainPart:match('^[%w%-_%.]+%.%a%a+$') or domainPart:len() > 253 then return false end
	local fDomain = _.strLeftBack(domainPart,'%.') -- Returns the substring in the domain-part before the last (dot) character
	if fDomain:match('^[_%-%.]+') or fDomain:match('[_%-%.]+$') or fDomain:find('%.%.+') then return false end

	return true
end
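
For readers without the `_` helper library, the rules listed above can also be sketched with only the standard string functions. This is an illustrative variant, not the author's implementation: the name `isEmail` is hypothetical, the split at '@' is done with a single `match`, and the domain check is applied per label and does not allow underscores.

```lua
local function isEmail(str)
  if type(str) ~= 'string' or #str > 254 or str:find('%s') then return false end

  -- Split at the single '@'; the match fails if there is none or more than one
  local localPart, domainPart = str:match('^([^@]+)@([^@]+)$')
  if not localPart then return false end

  -- Local-part: allowed characters only, at most 64 of them,
  -- and no leading, trailing or consecutive dots
  if #localPart > 64 then return false end
  if not localPart:match("^[%w!#%$%%&'%*%+%-/=%?%^_`{|}~%.]+$") then return false end
  if localPart:find('^%.') or localPart:find('%.$') or localPart:find('%.%.') then return false end

  -- Domain: letters, digits, hyphens and dots, ending in an
  -- alphabetic label of at least two letters
  if #domainPart > 253 then return false end
  if not domainPart:match('^[%w%-%.]+%.%a%a+$') then return false end
  if domainPart:find('^%.') or domainPart:find('%.%.') then return false end

  -- No label may start or end with a hyphen
  for label in domainPart:gmatch('[^%.]+') do
    if label:find('^%-') or label:find('%-$') then return false end
  end

  return true
end

print(isEmail('roland.deschain@gilead.gov'))
print(isEmail('Darth..Vader@DeathStar.mil'))
```

The first call returns true and the second false, since consecutive dots in the local-part are rejected.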