Regex question - matching single whitespaces in a range, but not multiple?

320gp      a copper-banded oak tun            Pestle & Mortar

This is just an example, but using multiple whitespaces as a boundary between capture groups is really common, and not just in MUDs if we're being honest. My way of doing it works, but it doesn't feel 'right', so I thought I'd ask if there is a better way.

Here are a couple of screenshots from regex101.com, the first illustrating the problem, and the second being how I get around it.

Answers

  • Something like this, maybe?
    ^\d+[g][p]\s{2,}[A-Za-z-\s]+\s{2,}.+$
    \s{2,} only matches runs of two or more whitespace characters, so a single space inside the item name is never taken as a column boundary.
  • Right. That's essentially what I was doing with \s\s though.

    I was thinking there might be a way of changing this part:
    [A-Za-z-\s]+

    so that it will only ever match if the spaces inside are singular?


  • Caled said:
    Right. That's essentially what I was doing with \s\s though.

    I was thinking there might be a way of changing this part:
    [A-Za-z-\s]+

    so that it will only ever match if the spaces inside are singular?
    Didn't see this while I was writing my post. You can get part of the way there with a negative lookbehind, for example:
    [A-Za-z\s-]+(?<!\s{2})
    The (?<!\s{2}) is checked at the point where it appears, so it looks back from the end of the run and rejects the match if the last two characters are both whitespace. It doesn't inspect the rest of the run, so a double space in the middle can still get through. Keep in mind this will still match a single space at the end, so you'd get "an oaken vial " for example. And you still have to worry about the situation I mentioned where there aren't any double spaces.
  • Sorry for the triple post, but it's too late to edit. Another way to match only single spaces in a phrase is to match any number of "word optionally preceded by a space". For example:
    ^\d+gp\s+((?: ?[a-zA-Z-]+)+)\s+
    In your oak tun example, this will capture the item name with no extra spaces at the beginning or end. The first \s+ matches all the leading spaces, the (?: ?[a-zA-Z-]+)+ matches one or more instances of "zero or one spaces followed by a word", and the last \s+ marks the end of the item name, because the group stops at the first space that isn't followed by a word.
  • Thanks so much @Sena
    The pattern in your first post solved my issue with directory matching. I hadn't seen the long descs when I posted, but I did soon after when I began looking for weapons. Using fixed lengths didn't occur to me. 

    The other two solutions will come in handy for other things in the future (including outside mudding, at work.) So thanks for the really comprehensive answers!

  • Great post! Always looking to practice and learn more about how to use regex effectively!
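
For anyone who wants to try the patterns from this thread outside regex101, here is a minimal sketch using Python's re module. The names (split_columns, LOOKBEHIND_NAME, LOOKBEHIND_LINE, WORD_SEQUENCE) and the capture groups in the full-line lookbehind pattern are made up for illustration, and "old  key" is an invented input used only to show what the lookbehind does and doesn't reject. Other regex engines may behave slightly differently, so treat this as a starting point rather than the definitive way to do it.

```python
import re

# Sample line from the question: columns are separated by runs of 2+ spaces.
LINE = "320gp      a copper-banded oak tun            Pestle & Mortar"


def split_columns(line):
    # Treat any run of two or more whitespace characters as a column boundary.
    return re.split(r"\s{2,}", line.strip())


# Fragment from the lookbehind post: a run of letters, whitespace and hyphens
# that must not END in two whitespace characters.
LOOKBEHIND_NAME = re.compile(r"[A-Za-z\s-]+(?<!\s{2})")

# Full-line pattern built around that fragment. The capture groups are an
# assumption added here so the individual pieces can be read back.
LOOKBEHIND_LINE = re.compile(r"^(\d+)gp\s{2,}([A-Za-z\s-]+)(?<!\s{2})\s{2,}(.+)$")

# Word-sequence pattern from the last answer, unchanged.
WORD_SEQUENCE = re.compile(r"^\d+gp\s+((?: ?[a-zA-Z-]+)+)\s+")

print(split_columns(LINE))
# ['320gp', 'a copper-banded oak tun', 'Pestle & Mortar']

print(repr(LOOKBEHIND_LINE.match(LINE).group(2)))
# 'a copper-banded oak tun '   (note the single trailing space)

print(repr(WORD_SEQUENCE.match(LINE).group(1)))
# 'a copper-banded oak tun'

# The lookbehind only inspects the end of the run, not its interior.
print(LOOKBEHIND_NAME.fullmatch("old  key") is not None)   # True
print(LOOKBEHIND_NAME.fullmatch("old key  ") is not None)  # False
```

If the columns are always separated by at least two spaces, the plain split is probably the least fragile of the three. The word-sequence pattern is the one that gives a clean item name with no trailing space when you need a single match expression.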
