JockoDundee

Calling all RegEx Masters...

hi!

I have a string of chess moves that I need to clean up before I can properly process them.

Typically a computer chess move consists of a from-square and a to-square concatenated into a 4-char string, for example d2d4.

A game, or game fragment, is just a series of these, like so:

e2e4 e7e5 g1f3 b8c6 f1b5 a7a6 b5a4 g8f6 e1g1 f6e4 d2d4 b7b5 a4b3 d7d5 d4e5 c8e6 c1e3 f8c5 d1d3

This makes processing them quick since I can just use a fixed offset.
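
For illustration, a minimal sketch of what a fixed offset means here, assuming every move is exactly 4 chars and moves are separated by single spaces. The helper name _GetMove is made up for this example:

```autoit
; Hypothetical helper: return the $iIndex-th move (1-based) from a list of
; 4-char moves separated by single spaces, without splitting the string.
Func _GetMove($sMoveList, $iIndex)
    ; each move occupies 5 characters: 4 for the move + 1 separator
    Return StringMid($sMoveList, ($iIndex - 1) * 5 + 1, 4)
EndFunc

Local $sGame = "e2e4 e7e5 g1f3 b8c6 f1b5"
ConsoleWrite(_GetMove($sGame, 3) & @CRLF) ; g1f3
```

The arithmetic only holds while every move has the same width, which is why a single stray character shifts everything after it.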

Even though the vast majority of moves contain 4 chars, sometimes, after a pawn is promoted (to a queen, for instance), a move will consist of 5 chars; notice in the three examples below how a few of the moves contain 5 chars.  This 5th character is of no importance to me.

g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4
g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3
f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5

This obviously destroys my fixed offsets, so I have written code that forces each move to be limited to just four characters.  However, it is rather inelegant, looping through the move list and fixing each move separately.  So I was thinking one of you geniuses must know a way to do it better, possibly in a single search/replace statement?

tl;dr

To be clear, the challenge is to truncate all 5 char strings to 4 char strings, i.e. change something like this:

g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4
g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3
f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5

into this:

g1f1 h3h2 f1e2 h2h1 a1h1 h8h1 f2f3 h1h4
g1f3 h3h2 f1e2 h2h1 a1h1 h8h1 f4f1 h1h3
f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1 d4g1 h1e4 g1f2 g7g5

in one RegEx statement if possible.

Thanks!


Code hard, but don’t hard code...


Thanks Guys!

I'm not sure which one to use :)

1:   \b\w{4}\K\w
2:   (?<=\w{4})\w

ORG: f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 
----------------------------------------------------------------------------------------------------------
1:   f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6
2:   f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6

ORG: d2d3 e5f4 c1f4q d7d5 e2e4 b8c6 b1c3 g8f6 g1f3
----------------------------------------------------------------------------------------------------------
1:   d2d3 e5f4 c1f4 d7d5 e2e4 b8c6 b1c3 g8f6 g1f3
2:   d2d3 e5f4 c1f4 d7d5 e2e4 b8c6 b1c3 g8f6 g1f3

ORG: f4f5 d7d5 d2d3 c8f5r b1c3 g8f6t g1f3 b8c6
----------------------------------------------------------------------------------------------------------
1:   f4f5 d7d5 d2d3 c8f5 b1c3 g8f6 g1f3 b8c6
2:   f4f5 d7d5 d2d3 c8f5 b1c3 g8f6 g1f3 b8c6

ORG: b2b4 e5f4 g1f3 f8b4 c2c3s b4e7 d1a4 g8f6
----------------------------------------------------------------------------------------------------------
1:   b2b4 e5f4 g1f3 f8b4 c2c3 b4e7 d1a4 g8f6
2:   b2b4 e5f4 g1f3 f8b4 c2c3 b4e7 d1a4 g8f6

ORG: h2h4 f8e7d g1f3s e5e4 f3g5 d7d5 e2e3 h7h6
----------------------------------------------------------------------------------------------------------
1:   h2h4 f8e7 g1f3 e5e4 f3g5 d7d5 e2e3 h7h6
2:   h2h4 f8e7 g1f3 e5e4 f3g5 d7d5 e2e3 h7h6

ORG: h2h3 d8h4 g2g3 h4g3
----------------------------------------------------------------------------------------------------------
1:   h2h3 d8h4 g2g3 h4g3
2:   h2h3 d8h4 g2g3 h4g3

ORG: g2g4e d8h4
----------------------------------------------------------------------------------------------------------
1:   g2g4 d8h4
2:   g2g4 d8h4
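
For anyone wondering how the two patterns differ, here is a minimal sketch with comments on what each part does. It is only my reading of the patterns, run on one of the sample lines from the first post:

```autoit
; Sketch: both patterns drop the 5th word character of a move.
Local $sMoves = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4"

; 1: \b anchors at the start of a move, \w{4} consumes the four square
;    characters, \K discards what was matched so far, so only the
;    trailing \w is actually replaced.
Local $sOut1 = StringRegExpReplace($sMoves, '\b\w{4}\K\w', '')

; 2: the lookbehind asserts that four word characters precede the current
;    one, so again only the 5th character is matched and removed.
Local $sOut2 = StringRegExpReplace($sMoves, '(?<=\w{4})\w', '')

ConsoleWrite($sOut1 & @CRLF) ; g1f1 h3h2 f1e2 h2h1 a1h1 h8h1 f2f3 h1h4
ConsoleWrite($sOut2 & @CRLF) ; g1f1 h3h2 f1e2 h2h1 a1h1 h8h1 f2f3 h1h4
```

The two only diverge on words longer than 5 chars, which should not occur in a move list: pattern 1 removes just the 5th character of such a word, while pattern 2 removes every character after the 4th.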

 


Code hard, but don’t hard code...


I don't understand why, in certain cases, anyone would want to mess around with string regular expressions when a simple string expression is demonstrably faster and much easier to understand, especially to anyone trying to maintain someone else's code.

RegEx may have its place but not in this case.

$x = StringLeft($x, 4) ; this is simpler and WAY faster than
$x = StringRegExpReplace($x, '(?<=\w{4})\w', "") ; this

Faster? Yes, in a million iterations on a test array the string expression comes in at 19.7 seconds while the RegEx takes 23.2 seconds.


Phil Seakins


 

5 hours ago, pseakins said:

I don't understand why, in certain cases, anyone would want to mess around with string regular expressions when a simple string expression is demonstrably faster and much easier to understand...

Why indeed?  Maybe because this:

$x = StringLeft($x, 4)

cuts off the whole movelist except for the first, when used on the sample string:

f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6

To your point, currently I am doing something like

$arrMoveList=StringSplit($sMoveList, " ",2)
For $sMove In $arrMoveList
   $sFixedMoveList&=StringLeft($sMove,4) & " "
Next

but that's not quite the same, right?
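
For comparison, a sketch of that loop wrapped as a function. It assumes a single-line move list separated by single spaces; the name _TruncateMovesLoop is made up, and the only change from the snippet above is trimming the trailing space the loop leaves behind:

```autoit
; Hypothetical wrapper around the loop approach: cut every move to 4 chars.
Func _TruncateMovesLoop($sMoveList)
    Local $sFixedMoveList = ""
    Local $arrMoveList = StringSplit($sMoveList, " ", 2) ; 2 = $STR_NOCOUNT, no count in element 0
    For $sMove In $arrMoveList
        $sFixedMoveList &= StringLeft($sMove, 4) & " "
    Next
    ; the loop appends a space after the last move too, so drop it
    Return StringTrimRight($sFixedMoveList, 1)
EndFunc

ConsoleWrite(_TruncateMovesLoop("f3e5q f8d6 d2d4s") & @CRLF) ; f3e5 f8d6 d2d4
```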

Anyway, I tried your test - there's good news and bad:  Good news - You're the fastest - Bad news - your output is off:

MoveList:
f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2

Solutions:
1:   \b\w{4}\K\w
2:   (?<=\w{4})\w
3:   My Loop Code
4:   StringLeft($txt, 4)

Output:
1: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2
2: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2
3: f4e5 d7d6 g1f3 d6e5 f3e5 f8d6 e5f3 g8f6 g2g3 f6g4 b1c3 h7h5 d2d4 h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2
4: f4e5

Timings for 1000000:
1: 1000000 Runs 5.061
2: 1000000 Runs 6.291
3: 1000000 Runs 37.46
4: 1000000 Runs 1.145

Anyway, let me have the rest of the code you had in mind, and I can plug it in....


Code hard, but don’t hard code...

On 12/30/2020 at 9:10 AM, JockoDundee said:

I'm not sure which one to use

The most important thing is to understand how it works, so you can build your own next time :)

@pseakins
BTW if an array was needed as the output, regex is easier and faster indeed

#Include <Array.au3>

$txt = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4" & @crlf & _ 
    "g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3" & @crlf & _ 
    "f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5"

$res = StringRegExp($txt, '\b\w{4}', 3) ; 3 = return an array of all (global) matches
_ArrayDisplay($res)

 

6 hours ago, pseakins said:

I don't understand why, in certain cases, anyone would want to mess around with string regular expressions when a simple string expression is demonstrably faster and much easier to understand, especially to anyone trying to maintain someone else's code.

RegEx may have its place but not in this case.

You don't understand because you modified the task. As JockoDundee stated, the task is to shorten every 5-character word in a large string to 4 characters.
You modified the task so that all the single moves are already separated in an array. But that was not the task.

1 hour ago, AspirinJunkie said:

You don't understand because you modified the task

Agreed. I did not accept @JockoDundee's challenge and provide a solution. I was just trying to make a case that string expressions generally would be faster than regular expressions. I guess I picked the wrong battle.

As @JockoDundee pointed out, my expression when used on his short string would give the wrong result, which of course is obvious.


Phil Seakins

16 hours ago, JockoDundee said:

Anyway, let me have the rest of the code you had in mind, and I can plug it in

My code would either use a loop, or if the 5th character is always a known value would use StringReplace(). I'm pretty sure this would be faster when working with a 10Mb string but with your 127 byte string there is no contest. '(?<=\w{4})\w' is significantly faster.

EDIT: Wrong, I misspoke. '(?<=\w{4})\w' is the slower of the two regex expressions.

Edited by pseakins
Corrected text

Phil Seakins

10 hours ago, pseakins said:

I'm pretty sure this would be faster when working with a 10Mb string

It's not. I just ran a test. The second regex is way in front.

I should have done this 40 minutes ago, this is a bad start to the new year.

EDIT:  Actually the second regex is the slower of the two regex expressions

Edited by pseakins
Corrected text

Phil Seakins

4 minutes ago, pseakins said:

I should have done this 40 minutes ago, this is a bad start to the new year.

It is not - it is the end of 2020 - a year everyone is trying hard to forget. So nothing has happened yet 😉

Or you live in Australia - then you're right indeed.

2 hours ago, pseakins said:

It's not. I just ran a test. The second regex is way in front.

I should have done this 40 minutes ago, this is a bad start to the new year.

Can you show your test?


Code hard, but don’t hard code...

6 hours ago, JockoDundee said:

Can you show your test?

My dyslexia may have misled you, I tend to reverse things as they come out of my head. '\b\w{4}\K\w' is the faster expression.

Here's my test code.

$sMovelist = "f4e5 d7d6 g1f3 d6e5 f3e5q f8d6 e5f3 g8f6 g2g3 f6g4 b1c3q h7h5 d2d4s h5h4 h1g1 b8c6 c1g5 f7f6 g5f4 h4g3 h2g3 g7g5 f4d6 d8d6 d1d2"

$iLoop = 50000
$hTimer = TimerInit()
For $i = 1 To $iLoop
  $sNewList = StringReplace($sMovelist, "q", "")
  $sNewList = StringReplace($sNewList, "s", "")
Next
$fDiff1 = TimerDiff($hTimer)

$hTimer = TimerInit()
For $i = 1 To $iLoop
  $sNewList = StringRegExpReplace($sMovelist, '\b\w{4}\K\w', "")
Next
$fDiff2 = TimerDiff($hTimer)

$hTimer = TimerInit()
For $i = 1 To $iLoop
  $sNewList = StringRegExpReplace($sMovelist, '(?<=\w{4})\w', "")
Next
$fDiff3 = TimerDiff($hTimer)
$sNewList = ""

ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff1=[' & $fDiff1 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff2=[' & $fDiff2 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff3=[' & $fDiff3 & '] Error code: ' & @error & @CRLF) ;### Debug Console

$x = ""
For $i = 1 To 78125 ; 10000000 / (127 + 1) = 78125
  $x &= $sMovelist & " " ; create a ~10 MB list
Next
$sMovelist = StringTrimRight($x, 1) ; remove trailing space
$x = ""

$hTimer = TimerInit()
$sNewList = StringReplace($sMovelist, "q", "")
$sNewList = StringReplace($sNewList, "s", "")
$fDiff1 = TimerDiff($hTimer)

$sNewList = "" ; just in case memory collection skews timing test
$hTimer = TimerInit()
$sNewList = StringRegExpReplace($sMovelist, '\b\w{4}\K\w', "")
$fDiff2 = TimerDiff($hTimer)

$sNewList = ""
$hTimer = TimerInit()
$sNewList = StringRegExpReplace($sMovelist, '(?<=\w{4})\w', "")
$fDiff3 = TimerDiff($hTimer)

ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff1=[' & $fDiff1 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff2=[' & $fDiff2 & '] Error code: ' & @error & @CRLF) ;### Debug Console
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $fDiff3=[' & $fDiff3 & '] Error code: ' & @error & @CRLF) ;### Debug Console

; Using a modified form of the ALT-D Debug Console insertions. The output fits on one line and the variable is delineated.

Console output:

@@ Debug(24) : $fDiff1=[2496.65864193368] Error code: 0
@@ Debug(25) : $fDiff2=[357.503320831953] Error code: 0
@@ Debug(26) : $fDiff3=[490.190716918503] Error code: 0
@@ Debug(50) : $fDiff1=[2966.93059827757] Error code: 0
@@ Debug(51) : $fDiff2=[387.96407343657] Error code: 0
@@ Debug(52) : $fDiff3=[562.891279040785] Error code: 0

 


Phil Seakins

While "remove any character if preceded by 4 characters" is solid, isn't this task also "remove any letter that is followed by whitespace"?  Maybe there's room to speed it up there:

$txt = "g1f1 h3h2 f1e2 h2h1q a1h1 h8h1 f2f3n h1h4" & @crlf & _
    "g1f3 h3h2 f1e2 h2h1n a1h1 h8h1 f4f1 h1h3" & @crlf & _
    "f2f8 e7f8 e4f2 h3h2 e3e4 d5e4 f2e4 h2h1q d4g1 h1e4 g1f2 g7g5"


msgbox (0, '' , StringRegExpReplace($txt , "\D\s" , " "))
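
One thing to watch with "\D\s" on multi-line input: the CR of @CRLF is itself a non-digit followed by whitespace (the LF), so each line break is replaced by a single space, and a suffix on the very last move is left alone because nothing follows it. A hedged variant of the same idea, assuming the suffix is always a lowercase letter, uses a lookahead so the whitespace is not consumed:

```autoit
; Sketch: remove a lowercase letter only when whitespace or the end of the
; string follows it. The lookahead (?=\s|$) checks what comes next without
; consuming it, so spaces and line breaks stay as they are.
Local $sTxt = "g1f1 h3h2 f1e2 h2h1q a1h1" & @CRLF & _
        "f2e4 h2h1q d4g1 g2g4e"

Local $sOut = StringRegExpReplace($sTxt, '[a-z](?=\s|$)', '')
ConsoleWrite($sOut & @CRLF)
; g1f1 h3h2 f1e2 h2h1 a1h1
; f2e4 h2h1 d4g1 g2g4
```

Since every square ends in a digit, the only letters that sit directly before whitespace are the promotion suffixes. I have not timed this against the other patterns.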

 

Edited by iamtheky
