Regex deciphering

Cidusii · August 2013

I was reading through the source code for the auto sipper in the HTML5 client to figure out how it works, and came across some of the match criteria in there:

/^Your mind feels stronger and more alert\.$/gm

I'm just wondering why the extra symbols are required for the string, since I've never dealt with regex before.

Daklore · August 2013

^ is an anchor at the start of a trigger and $ is an anchor for the end. These force the trigger to -only- match this line in its entirety rather than as part of another sentence.

\ is an escape in regex: it tells the regex to treat the next character literally rather than as regex code. In regex the period [.] is a wildcard that can match any single character. So you have to escape it [\.] in order to tell the regex that this is a period, not a "match any one character".

I assume the initial / and /gm were your addition, because I don't recognize their importance in regex, nor see any reason why they'd be added.

Likewise, \d+ matches one or more digits, \w+ matches one or more word characters (letters, digits, or underscore), and .+ matches one or more of just about anything (.* can also be used, and matches zero or more).

Antonius · August 2013

Everything between the two forward slashes (/) is the regular expression. The ^ means start of string/start of line. The \. is an escaped period (it's an actual period) since . in regular expressions matches pretty much anything. The $ means end of string/end of line.

The g and m at the end (after the second /) are regular expression flags; in particular, they're the global and multi-line flags respectively. This page has a brief explanation of what they do at the bottom: http://www.javascriptkit.com/javatutors/redev.shtml
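
To see the anchors, the escaped period, and the m flag side by side, here is a small sketch that can be pasted into a browser console. Only the pattern comes from the client source quoted at the top of the thread; the sample text is made up for illustration.

```javascript
// The pattern quoted from the HTML5 client source.
const pattern = /^Your mind feels stronger and more alert\.$/gm;

// Hypothetical block of game output, several lines in one string.
const text = [
  "You take a sip from an oaken vial.",
  "Your mind feels stronger and more alert.",
  "You say, \"Your mind feels stronger and more alert.\"",
].join("\n");

// With the m flag, ^ and $ anchor at every line, so only the line that is
// exactly the message matches; the copy inside the "You say" line does not.
const matches = text.match(pattern);
console.log(matches.length); // 1

// Without the m flag, ^ and $ anchor only at the start and end of the whole string.
const noMultiline = /^Your mind feels stronger and more alert\.$/;
console.log(noMultiline.test(text)); // false

// An unescaped . accepts any character in that position, not just a period.
const loose = /^Your mind feels stronger and more alert.$/;
console.log(loose.test("Your mind feels stronger and more alert!")); // true
```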

Cidusii · August 2013

Aha! Thanks so much you two! That clears things up heaps!

Makarios · August 2013

To expand, in regex * means "0 or more", and + means "1 or more". This distinction can be important if you're attempting to construct more specific expressions later, so it is best to get into the habit of using them properly from the start. Hence \w+ actually means "one or more word characters" (letters, digits, or underscore), and \d+ means "1 or more digits".
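
A quick way to feel the difference between * and + is to try both on a line where the thing you are counting is missing. The gold message below is a made-up example, not real game output.

```javascript
// * allows zero repetitions, so this also matches when no digits are present.
const star = /^You have \d* gold\.$/;
// + requires at least one digit.
const plus = /^You have \d+ gold\.$/;

console.log(star.test("You have  gold."));    // true
console.log(plus.test("You have  gold."));    // false
console.log(plus.test("You have 250 gold.")); // true

// \w covers letters, digits, and underscore, but not spaces or punctuation.
const word = /^\w+$/;
console.log(word.test("Tael_2"));    // true
console.log(word.test("two words")); // false
```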

Cidusii · September 2013

I've noticed that sometimes the output produces multiple lines of text that seem to be part of the same message. (I found this out when I had the first line in an if block and the second line in an else if, since I thought they were separate messages, and the second line never registered.) E.g.:

You eat an aurum flake.
The mineral has no effect.

Is there a way to match this all in entirety, or can you only match it line by line despite being part of the same message?

Sena · September 2013

\n matches line breaks. So ^You eat an aurum flake\.\nThe mineral has no effect\.$ would match those two lines.

Although different clients may have a better way, or have certain quirks with multiline triggers. I'm not sure if this is recommended for the HTML5 client.

Cidusii · September 2013

Aha! Thanks for that. I'll try it out.

Cidusii · September 2013

Another question!

I've noticed that in Mudlet people use matches[2] or matches[3] to pull pieces out of the text the regex matched. Does it only capture things placed in parentheses, or how does it work? For HTML5 it looks like we use args[2] and args[3] instead, but from the looks of it, it follows the same matching principles.

Say you have something like:

/(\w+) sends strands of sticky web flying at you as (he|she) touches (\w+) web tattoo\./

Would matches[1] be the first (\w+), matches[2] be the (he|she) and matches[3] be the second (\w+)?

Israphel · September 2013

matches[1] is the entire string that was matched against the regex. That is the entire text in your example.

matches[2] is the first matched group. That is the first \w+ in your example.

matches[3] is the second group, etc.
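
For comparison, here is the same pattern run in plain JavaScript. Note the assumption: this uses the language's built-in String.prototype.match, not the HTML5 client's args, whose numbering is not verified here. Plain JavaScript counts from 0, so the whole match sits at index 0 and the groups start at 1; Lua tables count from 1, which fits the Mudlet numbering described above. The sample line and name are made up.

```javascript
const webPattern = /(\w+) sends strands of sticky web flying at you as (he|she) touches (\w+) web tattoo\./;

// Hypothetical line of game output.
const webLine = "Cidusii sends strands of sticky web flying at you as he touches his web tattoo.";

const m = webLine.match(webPattern);
console.log(m[0]); // the entire matched text
console.log(m[1]); // "Cidusii" (first group)
console.log(m[2]); // "he"      (second group)
console.log(m[3]); // "his"     (third group)
```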

Cidusii · September 2013

Thanks, that helps so much.

Jules · December 2014

Old thread, but so appropriate. Can someone once and for all help me understand why sometimes you escape the period at the end of a sentence with \, but other times, especially if you have a capture right at the beginning of the trigger, using \ before the period at the end of the sentence will prevent the trigger from firing?

Keneanung · December 2014

A period in RegEx is a "match anything except a newline". Whenever you want to make sure you match an actual period and nothing else, you have to escape the period.

ETA: Can you show an example for the trigger not firing when you escape the period?

Antonius · December 2014

@Jules There's no reason that I can think of, and I've never noticed that being the case. If it's a literal period in the text you're trying to match, then \. should always work (though so would ., since that matches any character). There must be something else you're doing wrong with the trigger that's causing it; if you could provide some actual examples then maybe somebody could figure out what that is.

Jules · December 2014

That's why I'm so confused. I understand that the escape \ makes the client interpret the . literally, and without an escape, the . just matches any single character (thanks to some careful, patient explanations), but I just absolutely don't understand why sometimes this is THE way to format the end of a regex trigger, and other times, it can keep that same trigger from firing at all.

Jules · December 2014

I will, it's so weird.

This trigger works: ^(\w+)\'s aura of weapons rebounding disappears.$

This one does not: ^(\w+)\'s aura of weapons rebounding disappears\.$

And for an example of a "regular" one that works with the escape at the end: ^Your aim augments the flight of your arrow\.$

I really feel I've seen this before, and my guess/intuition is that it has something to do with capturing stuff right after the newline thing. It's the kind of thing that drives someone not immersed in coding absolutely nuts.

Antonius · December 2014

You don't need to escape the ' since it's not a special character in regex, but it shouldn't make a difference (though could try removing it to see). My (working) pattern for rebounding dropping is: ^(\w+)'s aura of weapons rebounding disappears\.$

What does the trigger do? Unlikely to be the cause of the problem, but any details might help.

Jules · December 2014

Well, I'm using the one that does work for now. The one that doesn't work just doesn't do anything. I will go back in a bit and get someone to do aggressive actions with debugging turned on (if that might help). I'll post anything it does. Thanks.

Tael · December 2014

The one that doesn't work should absolutely work - try replacing the code it runs with print("test");. If that doesn't work, something very weird is going on. Also, the one thing I keep doing over and over again with the HTML5 client is forgetting to set the trigger type to regex.

You can use this to check your regex by the way:

http://regexpal.com/

Just copy and paste ^(\w+)\'s aura of weapons rebounding disappears\.$ into the top box and "Tael's aura of weapons rebounding disappears." into the second box. If it highlights, you know your regex captures that pattern.
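
The same check can be run in a browser console instead of a website. This is only a sketch using plain JavaScript regex, which may differ in small ways from what a given client uses; the patterns and the sample line are the ones from this thread.

```javascript
// Jules's pattern with the escaped period (and the unnecessary \').
const escaped = /^(\w+)\'s aura of weapons rebounding disappears\.$/;
// The same pattern with a bare . at the end and no \ before the '.
const unescaped = /^(\w+)'s aura of weapons rebounding disappears.$/;

const sample = "Tael's aura of weapons rebounding disappears.";

const hit = sample.match(escaped);
console.log(hit !== null);           // true
console.log(hit[1]);                 // "Tael"
console.log(unescaped.test(sample)); // true

// Only the escaped version rejects a line that ends in something other than a period.
const other = "Tael's aura of weapons rebounding disappears!";
console.log(escaped.test(other));    // false
console.log(unescaped.test(other));  // true
```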

Jules · December 2014

And now either one works... Hopefully this never happened in the first place, and I'm imagining that it's happened before.

Ernam · December 2014

I'm a little surprised that nobody has linked a reference, which is usually the first and last thing you'll need when learning and working with regex (although of course feel free to ask here or in the Mudlet and/or HTML5 clans, assuming one exists by now).

Here is the MSDN regex pattern matching reference. Both Mudlet and the HTML5 client deviate from it in a few places; I don't have a solid list of the exceptions, but they are either advanced enough that you likely won't encounter them, or have simple and commonly used substitutions.

One of the major things to note is that both clients handle pattern matching one line at a time. This holds even for multi-line messages that are typically sent "together": since MUDs use Telnet (a data transmission protocol), these lines are still in actuality sent as separate lines (but are sent together from the game before a prompt is sent, creating the "illusion" that they are in fact a single message).

What this means is that while "actual" regex can match multiple lines, or in fact, full pages, of text, this isn't really possible in a MUD, since (in all clients I've ever heard of), regex pattern matching is executed on each line sent from the game server completely independently.

While it is actually possible to circumvent this via a complex workaround, it is far from wise to do so (and nobody does), because much simpler methods of handling multi-line triggers are generally built into MUD clients.

In Mudlet's case, there are actually quite a few simple methods, the simplest being a basic multi-line trigger, which executes a boolean AND or OR statement based on the result of the last X (a number you specify) lines sent from the game. AND or OR is simply specified by selecting "Multiline / AND Trigger" in Mudlet's editor, for example.

In no way is it possible to match two separate lines* from the game in a single regex pattern, since Mudlet never evaluates more than a single line from the game at a time against all of your triggers (unless of course, you specifically tell it to in a script, by combining the two or more lines into a single actual string, then comparing that single string to your regex pattern, which would be one of the aforementioned workarounds).

Another simple, built-in method (again, in Mudlet, but similar in other clients) is the use of Filters, which work in a similar, but slightly different way (see the manual or PM me for more detail).

Yet another method (which is actually frequently the best/only option) is simply using multiple different triggers that use "flags" (a boolean value, usually).

ex:
Trigger 1: You eat an aurum flake.
Trigger 2: The mineral has no effect.
Trigger 3: <any prompt>

In trigger 1, you set a "flag" to true. In trigger 2, if the flag is true, you execute the actual code you want to run. Trigger 3 (a prompt) sets the flag back to false, if you only want the code to run when both lines have been received in sequence. This is less efficient than the built-in options, but far more flexible and powerful (more complicated versions can do much trickier things, such as handling things that might appear in the middle of a multi-line message, like a Parry or the third-person paralysis message between slashes when you DSL someone with Curare, etc). [ In actuality, this is exactly what Mudlet does behind the scenes for Multiline and filter type triggers, and its built-in version is markedly more efficient. However, in some cases the built-in types simply won't get the job done, which is when you'd do something like this yourself. ]
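
The three triggers above could be sketched like this in JavaScript. Everything client-specific is an assumption: handleLine is a hypothetical stand-in for a client running each trigger on every received line, and the prompt pattern is a made-up format, not any client's actual prompt trigger.

```javascript
let ateAurum = false; // the "flag"
const fired = [];     // records each time the real code would have run

// Hypothetical stand-in for the client feeding one received line to the triggers.
function handleLine(line) {
  if (/^You eat an aurum flake\.$/.test(line)) {
    ateAurum = true;                // Trigger 1: set the flag
  } else if (/^The mineral has no effect\.$/.test(line)) {
    if (ateAurum) fired.push(line); // Trigger 2: act only if the flag is set
  } else if (/^\d+h, \d+m /.test(line)) {
    ateAurum = false;               // Trigger 3: a prompt resets the flag
  }
}

[
  "You eat an aurum flake.",
  "The mineral has no effect.", // counts: it follows the eat line
  "100h, 100m ex-",             // made-up prompt
  "The mineral has no effect.", // ignored: the flag was reset by the prompt
].forEach(handleLine);

console.log(fired.length); // 1
```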

Another good reference for triggers in general would be Mudlet's tutorial on them, even if you're using HTML5, because it covers the core concepts of creating triggers, filters, trigger chains, etc, which are essentially identical, but with slightly different syntax, and a different editor. There's a video, as well.

Sena said:

\n matches line breaks. So ^You eat an aurum flake\.\nThe mineral has no effect\.$ would match those two lines. Although different clients may have a better way, or have certain quirks with multiline triggers. I'm not sure if this is recommended for the HTML5 client.

This is a good example of the common misconception that I was referring to above. This would actually not work, in any client that I am aware of, since at no point would Mudlet ever actually evaluate the entire pattern (both lines) together, as they are received as two completely independent messages from the game.

This makes sense if you think about it, because for a MUD client to actually catch multi-line messages in a single regex pattern, it would constantly have to evaluate every single possible combination of every line in the entire buffer.

If you wanted to match the multi-line trigger:

test
test
test
test
test

for example, that would require that your client check the current line for any regex matches, then also and independently compare the current line AND the previous line against all of your triggers, then check all three of the last lines against all of your triggers, and so on, until it has checked all five lines together. Theoretically, it would have to do this for every line received, all the way back to the top of your buffer (a setting that is virtually infinite if you choose). You can probably see how, within even a few seconds, this gets rapidly more taxing, and thus slow, for your CPU.

Thus, it simply doesn't work that way.
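
To make that concrete, here is a small sketch that assumes a client which hands each received line to the regex separately, as described above. It is plain JavaScript and does not model any particular client. The two-line pattern only matches once the lines are joined into one string, which is the scripted workaround mentioned earlier.

```javascript
const twoLine = /^You eat an aurum flake\.\nThe mineral has no effect\.$/;

// The message as a line-at-a-time client would see it: two separate strings.
const received = ["You eat an aurum flake.", "The mineral has no effect."];

// Tested one line at a time, neither line contains the \n the pattern needs.
const perLine = received.map((line) => twoLine.test(line));
console.log(perLine); // [ false, false ]

// Joined back into a single string, the same pattern matches.
const joined = received.join("\n");
console.log(twoLine.test(joined)); // true
```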

However, one exception worth mentioning is word wrap. Both Achaea and most MUD clients (including the Telnet client itself) perform word wrap. It is highly advisable to disable Achaea's word wrap setting (config screenwidth 0) so that it doesn't send arbitrary newlines ("\n") to your client, which can make multi-line patterns a nightmare.

Simply set your client's word wrap to whatever you wish, and it will not add in or evaluate any newlines in long patterns (such as a long sentence, attack, or paragraph, in Achaea), even if it does display them for readability.

It is vital and at first a bit tricky to understand the difference between word wrap and actual newlines, however.

Example:

You close your eyes momentarily and extend the range of your vision, seeking out the presence of
Gerwulf.
You see that Gerwulf is at At the base of the Targossian barracks.

Word wrap (in my client) inserted a newline between "of" and "Gerwulf", however since I have Achaea's word wrap disabled, the entire line was actually sent, received, and evaluated as a single line. The only actual "newline" in this text would be after "Gerwulf." before "You see".

Thus, you see three lines, but it is actually two distinct lines. You can tell when this is happening by resizing your window (in some clients) to see if it moves the "fake" newline around. Some clients also allow you to show special characters (such as newlines) for coding purposes.
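
The difference can be sketched in a few lines of JavaScript. The wrapForDisplay helper and the width of 100 are assumptions made up for this illustration; real clients wrap in their own way. The point is only that wrapping changes what is shown, not the lines that triggers are matched against.

```javascript
// What the game sent with Achaea's own wrapping turned off: two real lines.
const rawText =
  "You close your eyes momentarily and extend the range of your vision, seeking out the presence of Gerwulf." +
  "\n" +
  "You see that Gerwulf is at At the base of the Targossian barracks.";

const realLines = rawText.split("\n");
console.log(realLines.length); // 2

// Hypothetical display-only word wrap: breaks a line into rows of at most `width` characters.
function wrapForDisplay(line, width) {
  const rows = [];
  let row = "";
  for (const word of line.split(" ")) {
    if (row !== "" && row.length + 1 + word.length > width) {
      rows.push(row);
      row = word;
    } else {
      row = row === "" ? word : row + " " + word;
    }
  }
  rows.push(row);
  return rows;
}

// Three rows are shown on screen, but triggers still see only the two real lines.
const shown = realLines.flatMap((line) => wrapForDisplay(line, 100));
console.log(shown.length); // 3

const farsee = /^You close your eyes momentarily .* seeking out the presence of (\w+)\.$/;
console.log(farsee.test(realLines[0])); // true
```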

If in doubt, you can also see the "real" lines received in debugger and log files. Typically, in these, each "line" received by your client will appear in a separate message.

If I overdid it, sorry, but multi-line triggers are a really common and particularly tricky obstacle for most people getting into MUD scripting, and I'm overshooting it a bit in hopes that it'll help others who run into it in the future. And of course, if you have any further questions, feel free to ask here or in game.

Sena · December 2014

Ernam said:

Sena said:

\n matches line breaks. So ^You eat an aurum flake\.\nThe mineral has no effect\.$ would match those two lines. Although different clients may have a better way, or have certain quirks with multiline triggers. I'm not sure if this is recommended for the HTML5 client.

This is a good example of the common misconception that I was referring to above. This would actually not work, in any client that I am aware of, since at no point would Mudlet ever actually evaluate the entire pattern (both lines) together, as they are received as two completely independent messages from the game.

It works in zMUD (though not without problems, like #GAG, #SUB, and similar things only applying to the last line)/CMUD and Mushclient.

Ernam · December 2014

It did work in zMUD, which was probably the most mind-shattering thing for me to realize when I switched to Mudlet. This never made the Mudlet: Migrating page, although it is arguably the most important difference when switching over, aside from the actual scripting language.

This is from the zMUD manual regarding using newlines ($) in zMUD triggers (note: it was preceded by a paragraph describing two "better" methods of creating multi-line triggers, primarily the use of #COND):

#TRIGGER {Pattern1$Pattern2} {command}
Seems simpler, but this old multi-line trigger is much slower to process than the new Within syntax. Also, the Within syntax allows you to match more than just two lines. For example:
#TRIGGER {Pattern1} {}
#COND {Pattern2} {} {Within|Param=1}
#COND {Pattern3} {command} {Within|Param=1}

The reason this was allowed in zMUD (this is pure conjecture) was probably that, at the time, typical buffers were much smaller, and when it was developed, the fact that people would ultimately be generating hundreds, or even thousands, of regex patterns (triggers) that would need to be evaluated was probably not fully anticipated (or at least, the impact of this kind of pattern matching was not foreseen).

(It's probably not a coincidence that zMUD's replacement began in the years immediately following the emergence/prevalence of complex systems like ACP & Acropolis, particularly since both Zuggsoft's clients and Mudlet are specifically designed for use with IRE games.)

zMUD was famous for getting really slow with big systems, which was why this method was deprecated, and ultimately why it was replaced by CMUD (which also did not allow this syntax).

Keneanung · December 2014

Using MSDN as a reference for PCRE (which is what Mudlet, at least, uses) gets you only so far. The .NET engine has a lot of differences once you do more than scratch the surface.

On the other hand, I have no good reference for PCRE on hand either.

Ernam · December 2014

Yeah, but you'll never really encounter any of the differences, because they aren't applicable to MUD triggers - for the most part. The exceptions that are noteworthy are relatively obvious.

It's also not that hard to simply learn it by sifting through the ample existing Mudlet (or HTML5) systems floating around, and referencing things when you simply can't figure it out from context.

Tael · December 2014

regular-expressions.info is where almost everyone learns from, or at least it used to be. The only real thing to note is that JavaScript doesn't have lookbehind.

Another way to deal with multi-line stuff is to enable/disable triggers/groups (trigger for first line enables trigger for second line, trigger for second line disables itself*), which is effectively what a filter does - but it's useful to think about it that way if you ever use another client like the HTML5 client that doesn't have explicit filters.

In Mudlet, you can also dynamically generate the second trigger with a temp trigger defined in the script for the first trigger, which I've seen a lot of people do. Technically that's slower, but in practice it's unlikely to ever matter much unless you're generating a ton of them at once for something.

*This doesn't totally work if there's a delay between line 1 and line 2. In that case, you might end up with two line 1s, both enable the trigger, then the first line 2 disables it and nothing fires on the second line 2. You have to build in some sort of counter variable to get around that (so first trigger enables second trigger and increments the counter, second trigger fires and decrements the counter, disabling itself if the counter is 0).
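
The counter idea from the footnote might look roughly like this. It is a sketch in plain JavaScript: receive, onFirstLine, and onSecondLine are hypothetical names standing in for the client's trigger machinery, and the secondEnabled variable stands in for enabling and disabling the second trigger.

```javascript
let pending = 0;           // how many line 1s are still waiting for their line 2
let secondEnabled = false; // stands in for the second trigger being enabled
const handled = [];

function onFirstLine() {
  pending += 1;         // first trigger increments the counter...
  secondEnabled = true; // ...and enables the second trigger
}

function onSecondLine(line) {
  if (!secondEnabled) return; // a disabled trigger never fires
  handled.push(line);
  pending -= 1;               // second trigger decrements the counter...
  if (pending === 0) secondEnabled = false; // ...and disables itself at 0
}

// Hypothetical stand-in for the client feeding each received line to the triggers.
function receive(line) {
  if (/^You eat an aurum flake\.$/.test(line)) onFirstLine();
  else if (/^The mineral has no effect\.$/.test(line)) onSecondLine(line);
}

[
  "You eat an aurum flake.",
  "You eat an aurum flake.",    // a second line 1 arrives before any line 2
  "The mineral has no effect.",
  "The mineral has no effect.", // still handled, thanks to the counter
  "The mineral has no effect.", // stray line: the trigger is disabled by now
].forEach(receive);

console.log(handled.length); // 2
```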