
StringRegExpReplace Help!!??



I need to remove everything that is not a letter, a number, a non-breaking space, or one of the characters / . ? : -

I am trying this,

$result = StringRegExpReplace($string, "[^[:word:][:blank:]/.?:-]", "")

But it is not working. I am still getting all kinds of stuff like "@÷XÉ€ @·RÒð»3”%€ð‘ŽF" left in the string.

??

I could also use some assistance or pointers on replacing "/something.ext" with "/something.ext-cut-" so that I can trim after the extension in a url.

Edited by Graywalker

It would help immensely if you would post a couple of example strings and what you expect to be returned.

The strings should be complete lines that contain the urls.

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


$string = "13123213@÷XÉ€ @·RÒð»3”%€ð34543erfTest‘ŽF"
ConsoleWrite(StringRegExpReplace($string, '[^a-zA-Z0-9/.?:-]', '') & @LF)

Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times


Some of the lines I am trying to pull urls and text-only from cause problems when posting, but I will try...

Line would look like :

"®•mÁÖÌÅ`h”3@¼}3@¼}ï¾­Þres://ieframe.dll/background_gradient.jpg­Þbackground_gradient[2]Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­Þï¾­ÞREDR€ˆ€­T2http://servername/Sumtotal/lang-en/management/registrationex/LMS_Registration_Post.aspae9ddabd­Þï¾­Þï¾­"

I want to pull "res://ieframe.dll/background_gradient.jpg" and "http://servername/Sumtotal/lang-en/management/registrationex/LMS_Registration_Post.asp" out of that line.
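One way to pull both URLs directly, instead of stripping unwanted characters first, is a global match with StringRegExp. This is only a sketch: the sample string is a shortened, made-up stand-in for the real line, and the scheme list and character class are assumptions based on the description above.

```autoit
; Hypothetical, shortened stand-in for a raw index.dat line
Local $sLine = "xx##res://ieframe.dll/background_gradient.jpg##REDR##T2http://servername/page.asp##"

; Flag 3 returns every match in a 0-based array.
; The character class lists the wanted URL characters explicitly (ASCII only).
Local $aUrls = StringRegExp($sLine, "(?i)(?:https?|res|file|ftp)://[A-Za-z0-9_/.?:-]+", 3)

If @error Then
    ConsoleWrite("No URLs found" & @LF)
Else
    For $sUrl In $aUrls
        ConsoleWrite($sUrl & @LF)
    Next
EndIf
```

On the real data the match would still run on into trailing letters and digits such as "ae9ddabd", so the extension trimming is still needed afterwards.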

To trim the "_Post.aspae9ddabd­" down to just "_Post.asp" , I was hoping for something like :

$url = StringRegExpReplace($url, "/\w*\.[:alpha:]{0,4}", "$1-cut-")
$cut = stringinstr($url,"-cut-",1)
$len = stringlen($url)
$url = stringtrimright($url,$len-$cut)

It is not working as I would expect from reading the "help" file, and I can also see that I would have to add something to match .asp or .aspx or .html or .htm or .js and so on...

EDIT:

$url = StringRegExpReplace($string, "(\.)(jpg|gif|asp|htm|html|aspx|js|java|php|xhtml|xml|png)", "$1$2" & "-cut-")

Seems to be working fine for doing that. :)
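A possible simplification, offered as a sketch rather than a tested fix: the marker and the follow-up StringInStr/StringTrimRight calls can be folded into a single replace that keeps everything up to the first known extension. The sample URL is made up from the line above. The extension list is the one from the edit, reordered so that longer names are tried first; otherwise "asp" matches inside "aspx" and "htm" inside "html".

```autoit
; Hypothetical input: a URL with trailing junk after the extension
Local $sUrl = "http://servername/registrationex/LMS_Registration_Post.aspae9ddabd"

; Longer extensions come first so "aspx" is tried before "asp" and "html" before "htm"
Local $sExt = "xhtml|html|htm|aspx|asp|java|js|jpg|gif|php|xml|png"

; Keep everything up to and including the first known extension, drop the rest
$sUrl = StringRegExpReplace($sUrl, "(?is)(\.(?:" & $sExt & ")).*", "$1")

ConsoleWrite($sUrl & @LF)
```

One caveat: the pattern cuts at the first dot followed by a listed extension anywhere in the string, so a host name containing something like ".php" or ".js" would be cut too early.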

Edited by Graywalker

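If you want a shorter class then you can replace

a-zA-Z0-9

by

\w

Note that \w also matches the underscore, but it does not match blanks. To keep blanks as well, add \s or [:blank:] to the class.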


Or if you don't want the blanks you can use [:alnum:]
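To make the difference between these classes concrete, here is a small sketch with a made-up ASCII test string. POSIX classes such as [:alnum:] only work inside a bracket expression, which is why they appear as [^[:alnum:]...] below.

```autoit
; Made-up ASCII sample containing an underscore, a space and some unwanted symbols
Local $sIn = "ab_c 12/x.y?z:-!*"

; \w keeps letters, digits and the underscore; the space is removed
ConsoleWrite(StringRegExpReplace($sIn, "[^\w/.?:-]", "") & @LF)

; [:alnum:] keeps letters and digits only, so the underscore is removed as well
ConsoleWrite(StringRegExpReplace($sIn, "[^[:alnum:]/.?:-]", "") & @LF)

; Adding [:blank:] keeps spaces and tabs
ConsoleWrite(StringRegExpReplace($sIn, "[^[:alnum:][:blank:]/.?:-]", "") & @LF)
```

How accented letters such as É are classified may depend on the PCRE build and options in use, so the explicit a-zA-Z0-9 range is the safer choice when only ASCII output is wanted.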

If you look in my sig there is a toolkit that will allow you to load a web page as html and test against it.

George


Is it me or do the urls have a character encoding issue? I suspect it would be safer to solve that root cause instead.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Is it me or do the urls have a character encoding issue? I suspect it would be safer to solve that root cause instead.

Quite possibly.

The whole script is finding the index.dat file in Temporary Internet Files and trying to pull the info from there.

I have it working fairly well and I know it's getting a LOT of info, but it is not pulling all the urls from the file and it is giving me some blank entries - even though I have

If StringInStr($ln, "://") And StringLen($ln) > 6 Then

before returning the entry. So, the entry has "://" and is longer than six characters, but it's displaying in msgbox and reports (csv) as blank??
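One possible explanation, offered as a guess rather than a diagnosis: index.dat is binary, so an entry can contain Chr(0) characters, and a message box typically stops displaying text at the first one. The commented-out StringReplace with Chr(0) in the script hints at the same idea. A sketch of testing for that, with a made-up sample value and assuming the AutoIt version in use handles Chr(0) inside strings:

```autoit
; Made-up sample: a leading null character followed by a URL
Local $sLn = Chr(0) & "http://servername/page.asp"

; The checks from the script pass even though a dialog may show nothing
If StringInStr($sLn, "://") And StringLen($sLn) > 6 Then
    ; Remove null characters before displaying or logging the value
    Local $sClean = StringReplace($sLn, Chr(0), "")
    ConsoleWrite("Raw length: " & StringLen($sLn) & ", cleaned length: " & StringLen($sClean) & @LF)
    ConsoleWrite($sClean & @LF)
EndIf
```

If the two lengths differ on real entries, stripping Chr(0) before the StringInStr/StringLen test would be worth trying.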

I haven't even tried for getting the time stamps...

It is eventually going to be a function in another script, but if anyone wants to play with index.dat and getting info from it, here is what I've got so far.

; Some sample remote IE History locations
;
;\\ThatComputer\c$\Users\SomeUser\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\index.dat
;\\TheOtherPuter\c$\Documents and Settings\SomeUser\Local Settings\Temporary Internet Files\Content.IE5\index.dat
;
;Registry keys of use
;
;HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders - gives all user folders
;HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders - gives the paths for user folders
#include <Array.au3>
$strComputer = @ComputerName
$user = @UserName
; the User Name for the folder may have the domain attached to it in some rare cases, ie "UserName.mybiz"
Global $logfile = FileOpen("Report.csv", 2)
$rprefix = "\\" & $strComputer & "\"
$ProfilesPath = RegRead($rprefix & "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders", "Common Documents")
; This gives something like C:\Users\Public\Documents or C:\Documents and Settings\All Users\Documents - we just want the first part.
; So, find the second "\" and remove everything after that
$2slash = StringInStr($ProfilesPath, "\", 0, 2)
$len = StringLen($ProfilesPath)
$ProfilesPath = StringTrimRight($ProfilesPath, $len - $2slash)
$ProfilesPath = StringReplace($ProfilesPath, ":", "$")
; get the IE history folder path
$historypath = RegRead($rprefix & "HKEY_USERS\.DEFAULT\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders", "Cache")
; This gives us something like %USERPROFILE%\AppData\Local\Microsoft\Windows\Temporary Internet Files or
; %USERPROFILE%\Local Settings\Temporary Internet Files
; Strip out the %USERPROFILE% and we're in business!
$historypath = StringReplace($historypath, "%USERPROFILE%", "")
$indexdatpath = $rprefix & $ProfilesPath & $user & $historypath & "\content.ie5\index.dat"
$test = FileExists($indexdatpath)
;MsgBox(0,"Index.dat Path",$indexdatpath & @CRLF & $test)
_ParseIndexdat($indexdatpath)
;_OBOD_ParseIndexdat($indexdatpath)
FileClose($logfile)
Exit

Func _ParseIndexdat($indexdatpath)
; Parse index.dat file for useable info
; The tools I've seen don't grab all the info I want :(
$Bindexdat = FileOpen($indexdatpath, 16)
$indexdat = FileRead($Bindexdat);$indexdatpath)
$strIndexdat = BinaryToString($indexdat, 1)
$strIndexdat = StringReplace($strIndexdat, @CRLF, @CR)
$strIndexdat = StringReplace($strIndexdat, @LF, @CR)
;$strIndexdat = StringRegExpReplace($strIndexdat,"f|t|e","")
;$strIndexdat = StringReplace($strIndexdat,Chr(0)," ")
$FileArray = StringSplit($strIndexdat, "URL", 1)
;This may get complex...
Dim $r = 1 ; to count the records
Dim $e = 0; to count the entries
Dim $urls, $record = "", $ln
Dim $ResultArray[2][5], $i
; Start reading from line 1
$i = 1
For $line In $FileArray
  ; Get the URLs
  $linearray = StringSplit($line, @CR, 1)
  For $ln In $linearray
   ;$ln = StringRegExpReplace($ln, '[^\w/.?:-]', '') ;Enable this here and NOTHING is returned. ?
   ;$ln = StringStripWS($ln,7)
   $urls = StringRegExp($ln, "(http|https|res:|file|ftp)")
   Select
    Case $urls = 1
     $aurls = StringSplit($ln, "REDR", 1)
     If $aurls[0] > 1 Then
      For $url In $aurls
      
       $url = StringRegExpReplace($url, '[^\w/.?:-]', '');"[^[:word:][:blank:]/.?:-]f", "")
       $url = StringRegExpReplace($url, "(http|https|res:|file|ftp)", "-Start-" & "$1")
      
       $httppos = StringInStr($url, "-start-", 0)
       $url = StringTrimLeft($url, $httppos + 6)
       If StringInStr($url, "?") Then
        $ques = StringInStr($url, "?")
        $len = StringLen($url)
        $url = StringTrimRight($url, $len - $ques + 1)
       EndIf
       If StringInStr($url, "://")  And StringLen($url) > 6 Then
        $url = StringRegExpReplace($url, "(\.)(jpg|gif|asp|htm|html|aspx|js|java|php|xhtml|xml|png)", "$1$2" & "-cut-")
        $cut = StringInStr($url, "-cut-", 1)
        $len = StringLen($url)
        $url = StringTrimRight($url, $len - $cut + 1)
    
        $ResultArray[$i][1] = $ResultArray[$i][1] & $url & ","
       EndIf
      Next
     Else
      $ln = StringRegExpReplace($ln, '[^\w/.?:-]', '');"[^[:word:][:blank:]/.?:-]f", "")
      $ln = StringRegExpReplace($ln, "(http|https|res:|file|ftp)", "-Start-" & "$1")
      If StringInStr($ln, "-Start-") Then
       $httppos = StringInStr($ln, "-Start-", 0)
       $ln = StringTrimLeft($ln, $httppos + 6)
      Else
       $httppos = StringInStr($ln, "http", 0)
       $ln = StringTrimLeft($ln, $httppos - 1)
      EndIf
      If StringInStr($ln, "?") Then
       $ques = StringInStr($ln, "?")
       $len = StringLen($ln)
       $ln = StringTrimRight($ln, $len - $ques + 1) ; was assigned to $url, so the trimmed value was never used
      EndIf
      If StringInStr($ln, "://") And StringLen($ln) > 6 Then
       $ln = StringRegExpReplace($ln, "(\.)(jpg|gif|asp|htm|html|aspx|js|java|php|xhtml|xml|png)", "$1$2" & "-cut-")
       $cut = StringInStr($ln, "-cut-", 1)
       $len = StringLen($ln)
       $ln = StringTrimRight($ln, $len - $cut + 1)
       $ResultArray[$i][1] = $ResultArray[$i][1] & $ln & ","
      EndIf
     EndIf
    Case StringInStr($ln, "Content-Type:")
     ; this is an entry I want
     $ln = StringStripWS($ln, 7)
     $ResultArray[$i][2] = $ln
    Case StringInStr($ln, "X-Powered-By:")
     ; this is an entry I want
     $ln = StringStripWS($ln, 7)
     $ResultArray[$i][3] = $ln
    Case StringInStr($ln, "~U:")
     ; this is an entry I want and it marks the end of a record
     $ln = StringReplace($ln, "~U:", "")
     $ln = StringStripWS($ln, 7)
     $ResultArray[$i][4] = $ln
     FileWriteLine($logfile, $ResultArray[$i][1] & "," & $ResultArray[$i][2] & "," & _
       $ResultArray[$i][3] & "," & $ResultArray[$i][4])
     $i = $i + 1
     ReDim $ResultArray[$i + 1][5]
     ;_ArrayDisplay($ResultArray)
    Case Else
     ; do nothing with the line
   EndSelect
  Next
Next
EndFunc   ;==>_ParseIndexdat
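
A side note on the loops in the script: by default StringSplit stores the number of pieces in element [0], so a For...In loop over the result also visits that count as if it were a line of data. A sketch of looping from index 1 instead (the sample string and variable names are illustrative only):

```autoit
; With the default flag, element [0] holds the number of pieces
Local $aParts = StringSplit("alpha|beta|gamma", "|")

; Start at index 1 so the count in $aParts[0] is not treated as data
For $k = 1 To $aParts[0]
    ConsoleWrite($k & ": " & $aParts[$k] & @LF)
Next
```

In the script the count element is probably harmless because it never contains "://", but skipping it avoids one pointless pass through the inner loop per split.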
Edited by Graywalker