andrewducker: (lady face)
[personal profile] andrewducker
This is the kind of thing I never, ever post.  But I wrote it for a few people at work, and if you've ever been scared by a regular expression, then you might find this unscares you, just a little bit.  Oh - as a note, the code is in C#, but that shouldn't really matter for most purposes.

If you're anything like me, then regular expressions look scary as hell - large blobs of text stuck together with no real rhyme or reason to them.
They're one of those things that took me a while to really get to grips with, despite being sure that they could make my life easier.  Here are a couple of simple examples of how to extract data using them, so you can see how they can make your life easier.

In the following examples searchText is
[M K12345678]

You can search a string for a particular substring, like so:
Match m = Regex.Match(searchText,"[Kk][0-9]{8,10}");
which searches searchText for a K or a k, followed by between 8 and 10 digits (0 to 9).
So the square brackets surround the different possibilities for a single character, and the curly brackets say how many times the preceding character or group can repeat.  (There's no need for a | between the K and the k: inside square brackets a | is just one more character to match.)

You can then follow it up with
Identifier = m.Value;
to get the string out (check m.Success first if the text might not contain a match).
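
Put together as something you can compile, the search above might look like the following. This is only a sketch: the class name SubstringExample and the helper FindIdentifier are made up for illustration, and the pattern is written as [Kk], which matches a K or a k (inside square brackets a | would just be one more literal character to match).

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstringExample
{
    // Hypothetical helper: returns the first K or k followed by
    // 8 to 10 digits, or null if searchText contains no such run.
    public static string FindIdentifier(string searchText)
    {
        Match m = Regex.Match(searchText, "[Kk][0-9]{8,10}");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // The searchText from the post.
        Console.WriteLine(FindIdentifier("[M K12345678]"));

        // No K or k here, so there is nothing to extract.
        Console.WriteLine(FindIdentifier("[M X12345678]") ?? "(no match)");
    }
}
```

The first line printed is K12345678; the second is (no match), because m.Success is false when the pattern isn't found.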

If you know the structure of the text you're looking at, but not what you're looking for (i.e. you know that you want what lies after the space and before the closing ], but not what form it's in) then you can use something like the following:

GroupCollection gc = Regex.Match(searchText,@"(\[)(?<Type>.)( )(?<Key>.*)(\])").Groups;

Each 'group' is enclosed in round brackets, so the first one is:
\[
And the \ is there because [ is a special character in regular expressions, and the \ means "treat it like a normal character".

The second one is
.
Which means "any single character" (except a newline) - and the ?<Type> next to it means that the group is named "Type".

The third one is a space.

The fourth one is
.*
which means "any character" - "as many times as possible" - i.e. A* would mean nothing at all, or "A" or "AA" or "AAAAAAAAA", etc.  This group is called "Key".

and finally
\]
which means ], the \ again meaning "treat this special character like an ordinary one"
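
To see what each of the five groups described above actually captured, you can list them all. A rough sketch, assuming the same searchText as before; the class name GroupListExample and the helper Capture are invented for this example. The @ in front of the pattern makes it a verbatim string, so the backslashes reach the regex engine untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupListExample
{
    // Hypothetical helper: maps each group name to the text it captured.
    // Unnamed groups are known by their number; "0" is the whole match.
    public static Dictionary<string, string> Capture(string searchText)
    {
        Regex regex = new Regex(@"(\[)(?<Type>.)( )(?<Key>.*)(\])");
        Match m = regex.Match(searchText);
        Dictionary<string, string> captured = new Dictionary<string, string>();
        if (m.Success)
        {
            foreach (string name in regex.GetGroupNames())
            {
                captured[name] = m.Groups[name].Value;
            }
        }
        return captured;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> entry in Capture("[M K12345678]"))
        {
            Console.WriteLine("{0} = '{1}'", entry.Key, entry.Value);
        }
    }
}
```

Six entries come back rather than five, because the whole match counts as a group of its own.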

I can then extract the Key out of it with:
Identifier = gc["Key"].Value;
which gets the entry in the GroupCollection called "Key" - in this case
"K12345678".

So both of the regexes above would have the same effect - i.e. they'd extract K12345678 out of searchText.
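
Here's a small sketch that runs both approaches side by side. The names CompareExample, BySubstring and ByStructure are made up for this example, and the second input string (two bracketed entries on one line) is an assumption of mine, not something from the original text. It's there to show where "as many times as possible" starts to matter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompareExample
{
    // First approach: look for the K-number itself.
    public static string BySubstring(string text)
    {
        return Regex.Match(text, "[Kk][0-9]{8,10}").Value;
    }

    // Second approach: describe the structure and pull out the "Key" group.
    public static string ByStructure(string text)
    {
        return Regex.Match(text, @"(\[)(?<Type>.)( )(?<Key>.*)(\])").Groups["Key"].Value;
    }

    public static void Main()
    {
        string searchText = "[M K12345678]";
        Console.WriteLine(BySubstring(searchText));   // K12345678
        Console.WriteLine(ByStructure(searchText));   // K12345678

        // Hypothetical input with two entries: .* grabs as much as it can,
        // so the Key group runs all the way to the last ] on the line.
        string twoEntries = "[M K12345678] [N K87654321]";
        Console.WriteLine(BySubstring(twoEntries));   // K12345678
        Console.WriteLine(ByStructure(twoEntries));   // K12345678] [N K87654321
    }
}
```

So the two agree on the original searchText, but the structural version only behaves if there's a single closing ] after the key.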

Date: 2005-10-22 10:50 am (UTC)
ext_58972: Mad! (Default)
From: [identity profile] autopope.livejournal.com
Wow, that's cumbersome!

I think I'll stick to Perl, if you don't mind. (All that stuff about needing to invoke methods to get matches out of target text is doing my head in. What's wrong with $target =~ /regexp/i; ...?)

Date: 2005-10-22 04:08 pm (UTC)
From: [identity profile] theferrett.livejournal.com
That explains much of C#, which actually makes it less frightening.

PHP's regex functions are quite often a huge pain in the ass.

Date: 2005-10-22 08:14 pm (UTC)
From: [identity profile] azalemeth.livejournal.com
$var = pregmatch($text,'\w{0,15}')

Next :P.

Date: 2005-10-22 08:17 pm (UTC)
From: [identity profile] theferrett.livejournal.com
a) It's preg_match().

b) That tells whether the pattern's present. Sorting through the infinite arrays when you pass in a $matches array for output can often be a huge pain in the ass to deal with, especially if you have multiple areas you're saving in the backreferences.

Date: 2005-10-22 08:28 pm (UTC)
From: [identity profile] azalemeth.livejournal.com
Yeah, preg_match (and co), sorry brain fart :P.

It's not that awkward imo. You can easily ( preg_match($regex,$string,$matches); $echo = $matches[2]; echo $echo) use it to grep stuff or change it - yeah, it's awkward, and I LOATHE arrays too, but that's probably due to being a n00b at 16 :). There are loads of other ones, but yeah, it can be a pain.....

Date: 2005-10-22 12:30 pm (UTC)
From: [identity profile] odheirre.livejournal.com
This reminds me - have you used the Regulator? It's a great tool to look up and validate regular expressions.

Date: 2005-10-22 06:48 pm (UTC)
darkoshi: (Default)
From: [personal profile] darkoshi
That hasn't unscared me, unfortunately. @ before the string? a group named type? a group called key? wha...? So the first question mark in the string always means it's a group called "Type", and the 2nd question mark always means it's the group called "Key"? Why are things in parentheses? What does it mean for a group to be called "Type"? EEEEKKK! ::runs away::

Date: 2005-10-23 05:08 pm (UTC)
darkoshi: (Default)
From: [personal profile] darkoshi
Did your example leave out the lines where you specified that the groups are named "Type" and "Key" ?.... Oh. You have to look at the page source to see that part, since it's in angle brackets. Now it makes more sense.

Date: 2005-10-23 05:16 pm (UTC)
darkoshi: (Default)
From: [personal profile] darkoshi
Heck. You understand regular expressions. Ergo you are not a complete idiot. :P

Date: 2005-10-22 08:17 pm (UTC)
From: [identity profile] azalemeth.livejournal.com
I find that regexes are dangerously good if you get used to them - it's like Perl, dead easy to write, forms the duct tape of the universe, but impossible to read afterwards. I also don't like how you've had to do that :P. Perl, sed, bash, or PHP for me....:P
