
StringRegExp to recognize spaces, numbers and special characters



Go to solution Solved by jchd,


All,
 
Part of my code is below. It takes a specifically formatted text file and searches each line for words or phrases to include or exclude (the code shown is the exclude part).
 
The include and exclude lists are kept in separate txt files, one for each. I finally worked out how to make the match case-insensitive, but after several hours of failed attempts and looking up examples I still can't see how to match multiple words or phrases that contain spaces.
 
There seem to be a plethora of examples that strip spaces.
 
I need to keep the spaces, since they are part of what I am searching for.
 
For example if my include file has:
 
red
sugar
 
My exclude file has:
 
sugar coated
 
The output should be the sentences (or file lines) that contain the words red or sugar. But when I exclude the phrase 'sugar coated', the match appears to see only the word 'sugar', ignoring the rest of the phrase 'sugar coated', and the entire line is excluded.
 
That is a perfect example, and here is a piece of code I threw together as a subroutine for the exclude portion:
Func verify1()
    For $k = 1 To $d
        $line66 = "(?i) " & FileReadLine($file6, $k)
        If StringRegExp($line, $line66) = 1 Then
            $Sno = 1
            ExitLoop
        EndIf
    Next
    $Sno = 0
EndFunc   ;==>verify1

the lines specifically are:

$line66 = "(?i) " & FileReadLine($file6, $k)
If StringRegExp($line, $line66) = 1 Then

 

This is what I am using to detect the exclude text. I have something similar for the include text. 
 
I need to understand better how to allow for phrases, numbers, spaces and any special characters, regardless of how many words or spaces there are. This is not as straightforward as I thought.
 
Thank You in Advance
Link to comment
Share on other sites

It's somewhat hard to understand what you're trying to achieve. Do you just want to capture phrases containing words present in your include.txt, and ignore any that also contain a word or sentence from exclude.txt?

I would say you cannot do it that way with a regex. For example, your include pattern will match every 'sugar' phrase, even when 'coated' follows it.

You will need to collect all the lines that match the include list, and then subtract the ones that match the exclude list ;)

Here's a version with a single regex that looks for 'sugar' not followed by 'coated', or for the word 'red'...

Local $hFile, $sLine, $l=1
$hFile = FileOpen(@DesktopDir&"\MyText.txt")
$xIncludeExp = 'sugar\s(?!coated)|red'
While 1
    $sLine = FileReadLine($hFile,$l)
    If @error Then ExitLoop
    If StringRegExp($sLine,'(?i)'&$xIncludeExp) = 1 Then ConsoleWrite($l&" OK"&@LF)
    $l+=1
WEnd
FileClose($hFile)
Exit

If you want to export the matched lines, just replace
If StringRegExp($sLine,'(?i)'&$xIncludeExp) = 1 Then ConsoleWrite($l&" OK"&@LF)
with
If StringRegExp($sLine,'(?i)'&$xIncludeExp) = 1 Then FileWriteLine(@DesktopDir&"\YourCapturedPhrases.txt",$sLine)
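One way to sketch the "match the include list, then subtract the exclude list" idea is to test every line against two separate patterns. The file names and both patterns below are placeholders chosen for illustration, not something taken from the original script:

```autoit
; Sketch only: file names and patterns are assumptions for illustration.
Local $sIncludeExp = '(?i)red|sugar'      ; at least one of these must appear on the line
Local $sExcludeExp = '(?i)sugar coated'   ; none of these may appear on the line

Local $hIn = FileOpen(@DesktopDir & "\MyText.txt", 0)                 ; 0 = read
If $hIn = -1 Then Exit 1
Local $hOut = FileOpen(@DesktopDir & "\YourCapturedPhrases.txt", 2)   ; 2 = write, erase previous contents
If $hOut = -1 Then
    FileClose($hIn)
    Exit 1
EndIf

Local $sLine
While 1
    $sLine = FileReadLine($hIn)   ; no line number given: each call reads the next line
    If @error Then ExitLoop       ; end of file or read error
    ; keep the line only if it matches the include pattern and does not match the exclude pattern
    If StringRegExp($sLine, $sIncludeExp) And Not StringRegExp($sLine, $sExcludeExp) Then
        FileWriteLine($hOut, $sLine)
    EndIf
WEnd

FileClose($hIn)
FileClose($hOut)
```

Keeping the two tests separate means the exclude phrase is checked anywhere on the line, not only directly after the include word.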


Link to comment
Share on other sites

Carm01, I don't really understand what you want to do...

Could you post some small examples of include/exclude files + sentences and the result you expect ?

It will be clearer for everyone

Link to comment
Share on other sites

Carm01,

Something like this?

local $exclude, $text, $pattern = '(?i)'

; simulate text file
$text &= 'I have a sugar coated brown fox.' & @CRLF
$text &= 'Now I want to grill it.  No, not ask questions of it, but cook it.' & @LF
$text &= 'Does anyone have a good brown fox^ recipe?'

;simulate exclude file
$exclude &= 'sugar' & @lf
$exclude &= 'grill' & @crlf
$exclude &= 'fox^ recipe' & @crlf

; handle SRE special chars, if needed
$exclude = stringregexpreplace($exclude,'[\^]','\\^')

local $aExclude = stringregexp($exclude,'(.*)\R',3)

for $1 = 0 to ubound($aExclude) - 1
    $pattern &= $1 < ubound($aExclude) - 1 ? $aExclude[$1] & ' |' : ' ' & $aExclude[$1]
Next

ConsoleWrite('' & @CRLF)
ConsoleWrite('! --- Pattern = [' & $pattern & ']' & @CRLF)
ConsoleWrite('' & @CRLF)
ConsoleWrite(stringregexpreplace($text,$pattern,'') & @CRLF)
ConsoleWrite('' & @CRLF)

kylomas

edit: added SRE char handling for "^"

Edited by kylomas


Link to comment
Share on other sites

kylomas,

You can put \Q...\E to good use to ignore special characters or subpatterns. Of course you could also guard against \E occurring by itself inside the search word, but that may be unlikely enough to ignore it.
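As a rough illustration of both points, here is a minimal sketch. _QuoteLiteral is a hypothetical helper name made up for this example (it is not an AutoIt function); it wraps a phrase in \Q...\E and also guards against a \E inside the phrase:

```autoit
; Hypothetical helper (not part of AutoIt): make $sText match literally inside a PCRE pattern.
Func _QuoteLiteral($sText)
    ; A "\E" inside the text would end the quoted span early, so close the span,
    ; emit an escaped backslash followed by a plain E, then reopen the span.
    ; The last argument (1) makes the replacement case-sensitive.
    Return "\Q" & StringReplace($sText, "\E", "\E\\E\Q", 0, 1) & "\E"
EndFunc   ;==>_QuoteLiteral

ConsoleWrite(_QuoteLiteral("fox^ recipe") & @CRLF)   ; prints \Qfox^ recipe\E

; (?i) still applies inside the quoted span, so case is ignored here
ConsoleWrite(StringRegExp("brown fox^ recipe?", "(?i)" & _QuoteLiteral("FOX^ recipe")) & @CRLF)   ; prints 1
```

With this, "^" and other metacharacters in the exclude list need no individual escaping.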

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

This is closer, but...

Instead of removing just the word from the sentence, I want the entire sentence to be skipped, so nothing is written.

If the sentence read 'I have a sugar coated brown fox.' then that entire sentence would be passed over and no line written.

If the sentence read 'I have a tar coated brown fox.' then the entire sentence would be captured and exported as a new line in a new file.

I also have an include list of what I am searching for.

If the include list has the word/phrase 'tar coated', it would find sentences containing that word or phrase, BUT if a sentence also contained anything from the exclude list, nothing would be written.

If the sentence was:

'I have a sugar coated brown fox and a tar coated brown fox'

this would result in nothing being written, as a word or phrase in the sentence is on the ban list.

I should have included that part.

Upper and lower case should be treated equally.

Sorry for not making that clear, and thank you all for your assistance with this.

I also attached some example txt files: the include and exclude lists, the raw text and the results. What I am really after is how to use StringRegExp so that it treats upper and lower case equally and handles phrases, numbers, symbols, etc.

exclude.txt

include.txt

rawtext.txt

results.txt

Edited by Carm01
Link to comment
Share on other sites

  • Solution

Carm01,

I have a hard time figuring out exactly what you want. The keyword here is "exactly". So let me ask some questions to make things clearer:

A/ Should input be treated literally? Input: "I'm digging for diamond, silver           and ----   gold" Should that match "silver and gold" from include list?

B/ Should include/exclude lists be treated literally? Input: "I'm digging for diamond, silver and gold" Should that match "silver   */and/*   gold" from include list?

C/ What is the typical size (in characters) of the include/exclude lists?

If the answers are:

A/ Yes. No.

B/ Yes. No.

C/ Small enough/small enough

then this should work for you:

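#include <Array.au3>

; Lookahead that requires at least one include phrase somewhere on the line.
; \Q...\E makes each phrase literal, so spaces and special characters need no escaping.
Local $aText = FileReadToArray("include.txt")
Local $sInclude = "(?=.*(?:\Q" & _ArrayToString($aText, "\E|\Q") & "\E))"
; Negative lookahead that rejects a line containing any exclude phrase.
$aText = FileReadToArray("exclude.txt")
Local $sExclude = "(?!.*(?:\Q" & _ArrayToString($aText, "\E|\Q") & "\E))"
$aText = 0 ; release the array
Local $sInput = FileRead("rawtext.txt")
; (?i) ignores case, (?m) lets ^ and $ match at every line; flag 3 returns all matches as an array.
Local $aResult = StringRegExp($sInput, "(?im)" & $sExclude & $sInclude & "^.*(?:$|\R)", 3)
_ArrayDisplay($aResult)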
Edited by jchd
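To see what the two lookaheads look like once assembled, here is a small sketch that uses inline sample data in place of include.txt, exclude.txt and rawtext.txt. The sample phrases come from the question; the three input lines are made up for illustration:

```autoit
#include <Array.au3>

; Inline stand-ins for the three files (assumed sample data)
Local $aInclude[2] = ["red", "sugar"]
Local $aExclude[1] = ["sugar coated"]
Local $sInput = "The red fox" & @CRLF & "Sugar coated RED fox" & @CRLF & "No match here"

Local $sInclude = "(?=.*(?:\Q" & _ArrayToString($aInclude, "\E|\Q") & "\E))"
Local $sExclude = "(?!.*(?:\Q" & _ArrayToString($aExclude, "\E|\Q") & "\E))"
ConsoleWrite($sInclude & @CRLF)   ; prints (?=.*(?:\Qred\E|\Qsugar\E))
ConsoleWrite($sExclude & @CRLF)   ; prints (?!.*(?:\Qsugar coated\E))

Local $aResult = StringRegExp($sInput, "(?im)" & $sExclude & $sInclude & "^.*(?:$|\R)", 3)
If IsArray($aResult) Then
    ; list every line that passed both lookaheads
    For $i = 0 To UBound($aResult) - 1
        ConsoleWrite("kept: " & $aResult[$i] & @CRLF)
    Next
EndIf
```

With this data only the first line should be kept: the second contains an excluded phrase and the third contains no included word. Note that an empty line in either list would produce an empty \Q\E alternative that matches everywhere, so the lists are assumed to contain no blank lines.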


Link to comment
Share on other sites

 

I like how easily this was done. It is 100% different from my method of reading and then writing the file line by line, which was easy for me to follow, but this little thing is stupid fast.

Technically this is what I am looking for for the searching part, but I would like to output the result to a file instead of an array display box.

Link to comment
Share on other sites

Which is trivial to do. Look up the For...To...Next loop in the help file and learn about arrays.
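A minimal sketch of such a loop follows. A small sample array stands in for the $aResult returned by StringRegExp in the solution, and the output name results.txt next to the script is an assumption:

```autoit
; Sample array standing in for the StringRegExp(..., 3) result (assumed data)
Local $aResult[2] = ["I have a tar coated brown fox.", "A red door"]

If IsArray($aResult) Then
    Local $hOut = FileOpen(@ScriptDir & "\results.txt", 2)   ; 2 = write, erase previous contents
    If $hOut <> -1 Then
        For $i = 0 To UBound($aResult) - 1
            ; FileWriteLine appends @CRLF only when the text does not already end in @CR or @LF
            FileWriteLine($hOut, $aResult[$i])
        Next
        FileClose($hOut)
    EndIf
EndIf
```

The IsArray check matters because StringRegExp with flag 3 does not return an array when nothing matches.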


Link to comment
Share on other sites

Or reverse the idea and use StringRegExpReplace to void any line which does NOT satisfy the requirements.

Exercise left to the readers.
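For readers who want a starting point, one possible sketch of the reversed approach is shown here. It is only one way to do it, and it uses inline sample data (assumed for illustration) in place of the three files:

```autoit
#include <Array.au3>

; Inline stand-ins for include.txt, exclude.txt and rawtext.txt (assumed sample data)
Local $aInclude[2] = ["red", "sugar"]
Local $aExclude[1] = ["sugar coated"]
Local $sInput = "A red door" & @CRLF & "Sugar coated almonds" & @CRLF & "Plain text" & @CRLF & "Brown sugar"

Local $sInc = "(?:\Q" & _ArrayToString($aInclude, "\E|\Q") & "\E)"
Local $sExc = "(?:\Q" & _ArrayToString($aExclude, "\E|\Q") & "\E)"

; A line is voided when it contains an excluded phrase OR contains no included phrase.
; The trailing \R? removes the line break together with the line.
Local $sPattern = "(?im)^(?:(?=.*" & $sExc & ")|(?!.*" & $sInc & ")).*\R?"

ConsoleWrite(StringRegExpReplace($sInput, $sPattern, "") & @CRLF)
```

With this sample data the first and last lines should be the only ones left. The remaining text can then be written out with a single FileWrite call instead of looping over an array.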

