Roland_Yonaba wrote: when I see things like (.^$[%s]) I go mad...
Regular expressions are much harder to read than write.
Don't let scary ones intimidate you. You'll be able to write those quite easily yourself, long before (through practice) you're able to easily read them. There's almost nothing to them:
WHAT TO MATCH (atoms)
Most characters match themselves.
. matches any character.
%s, %w, et al. match a character in a class of characters, such as whitespace (%s) or alphanumeric characters (%w)
%S, %W, et al. match a character NOT in the class, such as NON-whitespace or NON-alphanumeric characters
[x-y] matches a character in a given set, such as any character between x and y
[^x-y] matches a character NOT in a given set, such as any character NOT between x and y
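A minimal sketch of the atoms above, assuming Lua 5.1 or later (where string.match exists). The sample strings are made up for illustration; string.match returns the first substring that matches the pattern, or nil.

```lua
-- Hypothetical sample strings, one atom per line.
-- Most characters match themselves:
print(string.match("lua patterns", "pat"))     --> pat
-- . matches any single character, so "p.t" matches "pat":
print(string.match("lua patterns", "p.t"))     --> pat
-- %s matches one whitespace character (here a tab):
print(string.match("a\tb", "%s") == "\t")      --> true
-- %S matches one NON-whitespace character:
print(string.match("  x", "%S"))               --> x
-- %W matches one NON-alphanumeric character. Note that "_" is not
-- alphanumeric in Lua, unlike \w in many regex engines:
print(string.match("ab_c", "%W"))              --> _
-- [0-9] is a set matching one digit; [^0-9] matches one NON-digit:
print(string.match("abc7", "[0-9]"))           --> 7
print(string.match("123x", "[^0-9]"))          --> x
```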
HOW MANY TO MATCH (quantifiers)
? says to match the previous atom 0 or 1 times
* says to match the previous atom 0 or more times
+ says to match the previous atom 1 or more times
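A short sketch of the three quantifiers, assuming Lua 5.1 or later and made-up inputs. One Lua-specific limit worth knowing: a quantifier applies to a single character or character class only, so unlike full regex you cannot put one after a parenthesized group.

```lua
-- Hypothetical inputs.
-- ? : previous atom 0 or 1 times, so "colou?r" accepts both spellings
print(string.match("color",  "colou?r"))   --> color
print(string.match("colour", "colou?r"))   --> colour
-- * : 0 or more times, so "ab*c" accepts zero or more "b"s
print(string.match("ac",    "ab*c"))       --> ac
print(string.match("abbbc", "ab*c"))       --> abbbc
-- + : 1 or more times, so "ab+c" needs at least one "b"
print(string.match("ac",    "ab+c"))       --> nil
print(string.match("abbbc", "ab+c"))       --> abbbc
```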
ANCHORING
^ anchors a pattern to the start of the input
$ anchors a pattern to the end of the input
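A sketch of how the anchors change what string.match finds, assuming Lua 5.1 or later; the sentence being searched is a hypothetical example.

```lua
-- Hypothetical input string.
local s = "10 apples and 20 pears"
print(string.match(s, "%d+"))       --> 10    (leftmost run of digits, anywhere)
print(string.match(s, "^%d+"))      --> 10    (digits must start the string)
print(string.match(s, "%d+$"))      --> nil   (the string does not end in digits)
print(string.match(s, "%w+$"))      --> pears (alphanumerics that end the string)
print(string.match("x10", "^%d+"))  --> nil   ("x10" does not start with a digit)
```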
GROUPING/CAPTURES
Putting part of an expression in parentheses "captures" a submatch, which can later be referred to as %1, %2, %3, etc. (string.match also returns the captured text directly).
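A sketch of the three places captures show up in Lua, assuming Lua 5.1 or later; the "width=640" and "balloon" inputs are made up.

```lua
-- 1) string.match returns the captured substrings instead of the whole match.
local key, value = string.match("width=640", "(%w+)=(%w+)")
print(key, value)                     -- key is "width", value is "640"

-- 2) In a string.gsub replacement string, %1, %2, ... insert the captures.
--    gsub also returns a match count; the extra parentheses keep just the string.
local swapped = (string.gsub("width=640", "(%w+)=(%w+)", "%2=%1"))
print(swapped)                        --> 640=width

-- 3) Inside the pattern itself, %1 must match the same text capture 1 matched,
--    so "(%w)%1" finds a doubled character.
print(string.match("balloon", "(%w)%1"))   --> l
```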
That's it. That's the majority of regex (or in this case, Lua 'patterns'). There's more to know (greedy vs. non-greedy matching, matching a specific number of atoms, etc.) and some implementations have deeper features, but this is what you'll use in most cases.
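Greedy vs. non-greedy is easy to see on a small example. In Lua, * is greedy (as many repetitions as possible) and - is the non-greedy "0 or more" (as few as possible). A sketch assuming Lua 5.1 or later, with a made-up HTML-like string:

```lua
-- Hypothetical input.
local html = "<b>bold</b> and <i>italic</i>"
-- Greedy: .* runs to the end, then backs off only as far as the LAST ">".
print(string.match(html, "<(.*)>"))   --> b>bold</b> and <i>italic</i
-- Non-greedy: .- stops at the FIRST ">" that lets the match succeed.
print(string.match(html, "<(.-)>"))   --> b
```

Because - is a quantifier, a literal dash in a Lua pattern is written %-.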
EXAMPLE
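Say we wanted to find Social Security numbers in some input, and we know they're always formatted like XXX-XX-XXXX.
So three digits, dash, two digits, dash, four digits: %d%d%d%-%d%d%-%d%d%d%d. In Lua, - is itself a magic character (the non-greedy quantifier), so a literal dash has to be escaped as %-; an unescaped dash right after %d would be read as %d-.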
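A quick way to see why the dash needs escaping in a fixed-width Lua pattern. This sketch assumes Lua 5.1 or later and uses a hypothetical "555-0123"-style number rather than real data.

```lua
-- Hypothetical input containing a 3-digit/4-digit number.
local text = "call 555-0123 today"

-- Unescaped: "%d-" means "0 or more digits, as few as possible", so this
-- pattern has no dash in it at all; it just needs six or more digits in a row.
print(string.match(text, "%d%d%d-%d%d%d%d"))        --> nil
print(string.match("1234567", "%d%d%d-%d%d%d%d"))   --> 123456

-- Escaped: "%-" is a literal dash.
print(string.match(text, "%d%d%d%-%d%d%d%d"))       --> 555-0123
```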
What if it was a part number, in a similar format, but the number of digits in each group is unknown? That's when we use quantifiers: %d is a single digit, %d+ is 1 or more digits, and %d+%-%d+%-%d+ is three groups of 1 or more digits separated by (escaped) dashes.
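A sketch of the variable-width part-number pattern, assuming Lua 5.1 or later. The part-number format and sample lines are hypothetical; the dashes are written %- so they are always treated literally.

```lua
-- Hypothetical part-number pattern: three runs of digits joined by literal dashes.
local PART = "%d+%-%d+%-%d+"

-- Hypothetical sample lines.
local samples = { "part 7-42-100", "part 1234-5-67", "part 12-x-3" }
for _, s in ipairs(samples) do
  -- With no captures, string.match returns the whole match, or nil.
  print(s, string.match(s, PART))   -- 7-42-100, then 1234-5-67, then nil
end
```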
What if we need to parse out the three sections of the part number? Just put parentheses around them to capture them: (%d+)%-(%d+)%-(%d+)
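With the groups captured, string.match hands back each section separately. A sketch assuming Lua 5.1 or later and the same hypothetical part-number format; note that captures come back as strings, so tonumber is needed before doing arithmetic.

```lua
-- Hypothetical part-number pattern with one capture per section.
local PART = "(%d+)%-(%d+)%-(%d+)"
local a, b, c = string.match("part 7-42-100", PART)
print(a, b, c)                      -- "7", "42", "100" (strings)
print(tonumber(a) + tonumber(b))    --> 49
```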
What if we're parsing lines that may have more than one part number, but the one we need to read is always at the end of the line? Just anchor the pattern to the end of the line: (%d+)%-(%d+)%-(%d+)$
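A sketch of the effect of the trailing $, assuming Lua 5.1 or later; the line of text is hypothetical. Without the anchor, the leftmost part number wins; with it, only a part number that ends the line can match.

```lua
-- Hypothetical part-number pattern and input line.
local PART = "(%d+)%-(%d+)%-(%d+)"
local line = "old 1-2-3, replaced by 44-55-66"

print(string.match(line, PART))                    -- "1", "2", "3" (first one)
print(string.match(line, PART .. "$"))             -- "44", "55", "66" (last one)
print(string.match("1-2-3 is gone", PART .. "$"))  --> nil (nothing at the end)
```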
Oops, it turns out the part number can start with a # character, but it's optional. That would be #?, which means 0 or 1 # characters: #?(%d+)%-(%d+)%-(%d+)$
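A sketch of the final pattern, assuming Lua 5.1 or later and hypothetical input lines. # is not a magic character in Lua, so it needs no escape, and because it sits outside the captures it is matched when present but never returned.

```lua
-- Hypothetical final pattern: optional "#", three captured digit groups, end of line.
local PART = "#?(%d+)%-(%d+)%-(%d+)$"

for _, line in ipairs({ "order #12-3-456", "order 12-3-456" }) do
  local a, b, c = string.match(line, PART)
  print(a, b, c)   -- "12", "3", "456" for both lines
end
```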
And so on. You build up a pattern a bit at a time, getting parts of it to work and then putting them together. You end up with a scary-looking pattern, but the individual parts are all very simple.
trubblegum wrote: Don't be too disappointed when, once you've spent time and energy learning about regex, you end up finding out that there is usually a better way to do what you're trying to do.
You may find them a bad fit for the problem that motivated you to learn them, because you weren't yet aware of their limitations.
Once you know them, you find situations all the time where they are a perfect fit.
trubblegum wrote: "I can do that with regex" is not the same as "I need a lib the size of Utah to do this".
Fortunately, most modern languages support regex natively or in their standard library (or, in Lua's case, an ultra-minimal variant). They're just that useful.