Email validation code using Pattern Matching
Posted: Mon Jul 30, 2012 7:33 pm
by Roland_Yonaba
I was looking into efficient but simple ways to handle e-mail validation using Lua's pattern matching.
I came up with this:
Code: Select all
print(string.match(email,'[(%w+)%p*]+@[%w+%p*]+%.%a+$'))
But as I'm not an expert in pattern matching, I suspect the code above will accept some invalid addresses or reject some valid ones.
Any advice?
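To see where this pattern is too permissive, it can be run against a few sample strings. This is only a sketch, and the sample addresses are made up for illustration. Inside a [...] set the characters ( ) + * are plain literals, %p already covers every punctuation character (including @ and .), and nothing anchors the pattern to the start of the string.

```lua
-- The pattern from the post above, unchanged.
local pattern = '[(%w+)%p*]+@[%w+%p*]+%.%a+$'

-- Made-up sample strings (assumptions for illustration only).
local samples = {
  'roland@example.com',       -- well-formed
  'bad address@example.com',  -- contains a space
  'a..b@@example..com',       -- doubled dots and two '@'
}

for _, s in ipairs(samples) do
  -- string.match returns the matched substring, or nil when nothing matches.
  print(s, string.match(s, pattern))
end
```

All three samples produce a match. For the second one, the match simply starts after the space, because the pattern has no ^ anchor.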
Re: Email validation code using Pattern Matching
Posted: Mon Jul 30, 2012 9:05 pm
by Kadoba
Found this in the string recipes on the Lua wiki.
Code: Select all
email = "alex@it-rfc.de"
if (email:match("[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?")) then
  print(email .. " is a valid email address")
end
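One thing worth checking with this recipe is that it is not anchored, so it succeeds whenever a valid-looking address appears anywhere inside the string. The sketch below compares it with an anchored variant. The anchoring is an assumption added here, not part of the wiki recipe.

```lua
-- The wiki recipe, and the same pattern wrapped in ^ ... $ (added here).
local recipe = '[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?'
local anchored = '^' .. recipe .. '$'

local wrapped = '<alex@it-rfc.de>'

-- Unanchored: finds the address inside the angle brackets.
print(wrapped:match(recipe))
-- Anchored: the whole string must be an address, so this yields nil.
print(wrapped:match(anchored))
-- A plain address still passes the anchored variant.
print(('alex@it-rfc.de'):match(anchored))
```

Note that once the pattern is anchored, the trailing %w%w%w?%w? limits the last part of the domain to between two and four characters.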
Re: Email validation code using Pattern Matching
Posted: Mon Jul 30, 2012 9:08 pm
by flashkot
First: you should know what a valid email address looks like.
Second: check the Lua docs.
Five minutes ago I knew nothing about Lua's pattern matching. Now, after reading these two short texts, what can we do with your email pattern?
Let's assume that you will never see monster addresses like the examples on Wikipedia. Nothing more than
roland.deschain@gilead.gov or
Darth_Vader@DeathStar.mil
And we also want to match the whole string, right?
In that case, I think your pattern should be something like
Code: Select all
print(string.match(email,'^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'))
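As a quick check, the anchored pattern can be wrapped in a small helper and tried on the two example addresses plus a couple of strings that should fail. The helper name looksLikeEmail is hypothetical and only used for this sketch.

```lua
-- The anchored pattern from the post above.
local pattern = '^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'

-- Hypothetical helper: true when the whole string matches the pattern.
local function looksLikeEmail(s)
  return s:match(pattern) ~= nil
end

print(looksLikeEmail('roland.deschain@gilead.gov'))   -- accepted
print(looksLikeEmail('Darth_Vader@DeathStar.mil'))    -- accepted
print(looksLikeEmail('<Darth_Vader@DeathStar.mil>'))  -- rejected: '<' is not in the set
print(looksLikeEmail('Darth_Vader@DeathStar'))        -- rejected: no dot in the domain
```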
Re: Email validation code using Pattern Matching
Posted: Mon Jul 30, 2012 9:46 pm
by Roland_Yonaba
Kadoba wrote:
Found this in the string recipes on the Lua wiki.
Code: Select all
email="alex@it-rfc.de"
if (email:match("[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?")) then
  print(email .. " is a valid email address")
end
Thanks Kadoba... Well, that's a bit complex for me. I get the general idea, but what does the "%.%%%+%-" part of the set "[A-Za-z0-9%.%%%+%-]+" stand for?
I can see a sequence of alphanumeric characters, both upper and lower case (A-Za-z0-9), and a dot character (%.)... But the remaining part is unclear to me.
Re: Email validation code using Pattern Matching
Posted: Mon Jul 30, 2012 10:13 pm
by Nixola
Code: Select all
%. -- dot
%% -- % symbol
%+ -- + symbol
%- -- - symbol
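One way to confirm this reading is to test single characters against the set. The sketch below anchors the set with ^ and $ (an addition for the test, not part of the wiki pattern) so that each one-character string has to match it entirely. In Lua patterns % is the escape character, so %% stands for a literal percent sign, and escaping - keeps it from being read as a range.

```lua
-- The character set from the wiki pattern, anchored for single-character tests.
local set = '^[A-Za-z0-9%.%%%+%-]$'

for _, ch in ipairs({ 'a', 'Z', '7', '.', '%', '+', '-', '_', '@' }) do
  -- Prints the character and whether the set accepts it.
  print(ch, ch:match(set) ~= nil)
end
```

The last two characters, _ and @, are not in the set and are reported as false.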
Re: Email validation code using Pattern Matching
Posted: Mon Jul 30, 2012 10:47 pm
by Roland_Yonaba
Thanks Nixola. That makes sense.
This reading was very helpful. It summarizes the standards and syntax that email addresses should follow.
For instance, I notice that all the patterns given so far (mine and the one from Kadoba's link) accept addresses with consecutive dots...
Well, I think I'll have to rewrite this to meet the RFC standards.
Re: Email validation code using Pattern Matching
Posted: Tue Jul 31, 2012 12:17 am
by Inny
I come from the school that says anything@anything is legitimate, that the appearance of the @ symbol in the middle of the string somewhere is what makes it an email address. Since that's not exactly helpful, the better advice is to not reject data in the email field based on a regular expression, because fakeaddress@example.com would pass any reasonable regex, but not be a legitimate address. Instead, your address authentication code has to be written to accommodate very large latencies, i.e. make the account have an unverified state where they're limited in what they can do.
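A minimal sketch of that approach might look like the following. Both hasAtInMiddle and newAccount are hypothetical names invented for this example, and the account record is only an assumption about how an unverified state could be stored.

```lua
-- Hypothetical helper: accept anything that has an '@' with at least one
-- character on each side. Real verification is left to a confirmation mail.
local function hasAtInMiddle(s)
  return s:find('^.+@.+$') ~= nil
end

-- Hypothetical account record: every new account starts out unverified.
local function newAccount(email)
  if not hasAtInMiddle(email) then
    return nil, 'missing @'
  end
  return { email = email, verified = false }
end

local account = newAccount('fakeaddress@example.com')
print(account.email, account.verified)
print(newAccount('no-at-sign'))
```

The verified flag would only be flipped once the user follows a link sent to the address, however long that takes.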
Re: Email validation code using Pattern Matching
Posted: Tue Jul 31, 2012 8:49 am
by Roland_Yonaba
@Inny: You're totally right. That said, I only need to check the syntax of the e-mail address.
@flashkot:
flashkot wrote:
Code: Select all
print(string.match(email,'^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'))
Good point. It gets closer to what I proposed first.
Still, this pattern will accept strings containing, for instance, two or more consecutive punctuation characters...
Something like
Darth..Vader@DeathStar.mil
Re: Email validation code using Pattern Matching
Posted: Tue Jul 31, 2012 7:00 pm
by flashkot
Roland_Yonaba wrote:
flashkot wrote:
Code: Select all
print(string.match(email,'^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'))
Good point. It gets closer to what I proposed first.
Whatever, this regex will validate strings containing for instance two or more punctuation characters following themselves...
Something like
Darth..Vader@DeathStar.mil
Unfortunately, Lua patterns are not as powerful as regular expressions.
So I can suggest two solutions: add a second check for repeated punctuation, or create a more complex pattern like:
Code: Select all
print(string.match(email,'^[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]+@[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]*%.?[%w+%-_]+%.%a%a+$'))
Here I try to emulate ([a-z0-9]\.)* from RegExp. (As I understand it, Lua can't repeat a part of a pattern; parentheses are only for captures. But I haven't tested this.)
The drawback is that it only matches as many abcde. parts as the number of times you repeat [%w+%-_]*%.?
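The first suggestion, a second check after the main pattern, could be sketched as below. The function name isValidEmail is hypothetical, and the second check here only looks for two dots in a row. It could be widened to other punctuation if that is wanted.

```lua
-- The shorter anchored pattern from earlier in the thread.
local shape = '^[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+$'

-- Hypothetical helper combining the pattern with a second check.
local function isValidEmail(email)
  -- First check: overall shape of the address.
  if not email:match(shape) then return false end
  -- Second check: reject two dots in a row anywhere in the string.
  -- The fourth argument makes this a plain-text search, so '..' is literal.
  if email:find('..', 1, true) then return false end
  return true
end

print(isValidEmail('Darth.Vader@DeathStar.mil'))
print(isValidEmail('Darth..Vader@DeathStar.mil'))
print(isValidEmail('a.b.c.d.e.f.g.h.i@x.y.z.org'))
```

Unlike the long pattern, this does not limit how many dot-separated parts an address may have.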
Re: Email validation code using Pattern Matching
Posted: Thu Aug 02, 2012 5:51 pm
by Roland_Yonaba
Well, I took a look at it and finally came up with a set of rules.
My implementation does not follow all of the rules stated in the RFC standards, but I think it can handle most email addresses in actual use.
And that's enough for me.
From the Wikipedia page about email addresses, I considered the following set of rules to be enough.
For the local part:
- Up to 64 characters long
- Uppercase and lowercase English letters (a–z, A–Z) (ASCII: 65–90, 97–122)
- Digits 0 to 9 (ASCII: 48–57)
- Characters !#$%&'*+-/=?^_`{|}~ (ASCII: 33, 35–39, 42, 43, 45, 47, 61, 63, 94–96, 123–126)
- Character . (dot, period, full stop) (ASCII: 46) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively (e.g. John..Doe@example.com is not allowed.).
For the domain part:
The domain name part of an email address has to conform to strict guidelines: it must match the requirements for a hostname, consisting of letters, digits, hyphens and dots.
Code: Select all
function _.isEmail(str)
  -- Count the '@' symbols. select() avoids a local named `_`, which would shadow the module table `_`
  local nAt = select(2, str:gsub('@', '@'))
  if nAt ~= 1 or str:len() > 254 or str:find('%s') then return false end
  local localPart = _.strLeft(str, '@') -- Returns the substring before the '@' symbol
  local domainPart = _.strRight(str, '@') -- Returns the substring after the '@' symbol
  if not localPart or not domainPart then return false end
  -- Anchored so that every character of the local part must be allowed (dot placement is checked next)
  if not localPart:match("^[%w!#%$%%&'%*%+%-/=%?^_`{|}~%.]+$") or (localPart:len() > 64) then return false end
  if localPart:match('^%.+') or localPart:match('%.+$') or localPart:find('%.%.+') then return false end
  -- Anchored for the same reason: the whole domain part must consist of allowed characters
  if not domainPart:match('^[%w%-_%.]+%.%a%a+$') or domainPart:len() > 253 then return false end
  local fDomain = _.strLeftBack(domainPart, '%.') -- Returns the substring in the domain part before the last dot
  if fDomain:match('^[_%-%.]+') or fDomain:match('[_%-%.]+$') or fDomain:find('%.%.+') then return false end
  return true
end
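The helpers _.strLeft, _.strRight and _.strLeftBack are not shown in the post, so the function cannot be run as posted. As a sketch only, the same rules can be written without them by splitting on the single @ with a capture. The standalone name isEmail and the two constants are assumptions made for this example. It also follows the stated hostname rule more strictly than the posted code: underscores are not accepted in the domain, and no label may start or end with a hyphen.

```lua
-- Allowed characters for each part, anchored to cover the whole part.
local LOCAL_CHARS = "^[%w!#%$%%&'%*%+%-/=%?%^_`{|}~%.]+$"
local DOMAIN_CHARS = '^[%w%-%.]+$'

-- Hypothetical standalone version of the rules listed above.
local function isEmail(str)
  if type(str) ~= 'string' or #str > 254 then return false end
  -- Exactly one '@', with at least one character on each side.
  local localPart, domainPart = str:match('^([^@]+)@([^@]+)$')
  if not localPart then return false end

  -- Local part: length, allowed characters, dot placement.
  if #localPart > 64 or not localPart:match(LOCAL_CHARS) then return false end
  if localPart:find('^%.') or localPart:find('%.$') or localPart:find('%.%.') then
    return false
  end

  -- Domain part: letters, digits, hyphens and dots only, ending in a dot
  -- followed by at least two letters, with no empty labels.
  if #domainPart > 253 or not domainPart:match(DOMAIN_CHARS) then return false end
  if not domainPart:find('%.%a%a+$') or domainPart:find('%.%.') then return false end
  -- No leading dot or hyphen, and no hyphen next to a dot.
  if domainPart:find('^[%-%.]') or domainPart:find('%-%.') or domainPart:find('%.%-') then
    return false
  end
  return true
end

for _, s in ipairs({
  'roland.deschain@gilead.gov',
  'Darth_Vader@DeathStar.mil',
  'Darth..Vader@DeathStar.mil',
  'darth vader@DeathStar.mil',
  'a@b@c.com',
}) do
  print(s, isEmail(s))
end
```

The first two addresses are accepted and the last three are rejected (consecutive dots, a space, and two @ symbols respectively).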