
Stripping Characters from Strings


4b0082

Are there other scenarios where the character length might vary?

UDF List:

 
_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_ArrayFilter/_ArrayReduce_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI 
ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 22/04/2018


Are there other scenarios where the character length might vary?

Not in this case, I'm pulling data from a JSON output and needed to delete a specific portion of one of the values.

Incidentally this might be related, so rather than making a new thread I'll post it here.

At the moment I'm looking at a problem like this:

<div>
. . . a bunch of code . . .
<a href="http://somewebsite.com/123-some-random-numbers-and-text">Link Name</a>
. . . a bunch of code . . .
Item: 33 ;this is the line of code I was searching for in my string.
</div>

What I want to be able to do is start from "33", read the code from right to left to find the closest instance of "<a", and then copy the hyperlink into a variable. Is something like this possible?


I just found out about StringReverse, and it might help me get to my goal. The hyperlink always seems to be a specific number of characters away, so I'm thinking I could trim the string so that it ends where the hyperlink ends, then reverse it and read until the " appears, then re-reverse that piece and write it into a variable.

It seems like an ass-backwards way of handling the problem, but at least it's something.. Now, I just need to figure out how to get from theory to actually having the code work.  >_<


I would use a regular expression, personally speaking.


Ah, sorry. Perhaps grasp the basics and then move forward with complicated programming concepts.


4b0082,

Regular Expressions are a very clever way of extracting small sections of data from larger sets when there is a suitable pattern to identify the required section. In this case you could do something like this:

#include <MsgBoxConstants.au3>

; Sample text standing in for the page source
Local $sString = "<div>" & @CRLF & _
    '. . . a bunch of code . . .' & @CRLF & _
    '<a href="http://somewebsite.com/123-some-random-numbers-and-text">Link Name</a>' & @CRLF & _
    '. . . a bunch of code . . .' & @CRLF & _
    'Item: 33 ;this is the line of code I was searching for in my string.' & @CRLF & _
    '</div>'

; Flag 3 returns every captured group in an array
Local $aExtract = StringRegExp($sString, '(?s)<div>.*href="(.*)".*Item:\s\d+.*<\/div>', 3)

; StringRegExp sets @error when nothing matches, so check before indexing the result
If Not @error Then MsgBox($MB_SYSTEMMODAL, "Extract", $aExtract[0])

which gets you the link. :)

The RegEx works like this:

(?s)          - Let '.' match line breaks as well
<div>         - Look for '<div>'...
.*            - and then any number of characters until...
href="        - we find 'href="'.
(.*)          - Now capture the characters until...
"             - a double-quote. Because .* is greedy, this is the last double-quote that still lets the rest of the pattern match; in the sample there is only one candidate.
.*            - Then there will be other characters until...
Item:\s\d+    - we find 'Item:', one whitespace character and one or more digits (you said this was important).
.*            - Then there will be other characters until...
<\/div>       - we find '</div>'

The "3" means return all matches in an array
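The greedy behaviour of (.*) only shows up once there is more than one link in the text. As a rough sketch, with made-up example.com URLs and the assumption that no double-quote appears between the wanted link and the "Item:" line, a negated character class keeps the capture inside one pair of quotes and ends up on the link nearest to the item:

```autoit
; Hypothetical sample: two links, only the second sits directly before the item line.
Local $sHtml = '<a href="http://example.com/first">First</a>' & @CRLF & _
        '<a href="http://example.com/second">Second</a>' & @CRLF & _
        'Item: 33'

; Greedy: (.*) runs to the end and backtracks to the LAST double-quote before "Item:".
Local $aGreedy = StringRegExp($sHtml, '(?s)href="(.*)".*Item:\s\d+', 3)

; Negated class: [^"] cannot cross a double-quote (but does match line breaks),
; so the first link is rejected and the capture is the second link only.
Local $aTight = StringRegExp($sHtml, 'href="([^"]*)"[^"]*Item:\s\d+', 3)

ConsoleWrite("Greedy: " & $aGreedy[0] & @CRLF)
ConsoleWrite("Tight:  " & $aTight[0] & @CRLF)
```

The greedy capture starts at the first link and swallows everything up to the closing quote of the second one, while the second pattern yields just http://example.com/second.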
As you can see RegExes are not for the faint-hearted - I rate them as one of the hardest things I have ever tried to learn and they still make my brain hurt. But they can be incredibly powerful and are well worth the effort to get at least a working knowledge of how they are constructed - the real gurus around here can do magic with them. :sorcerer:

I always recommend this site as a good place to get started. But if you are a novice programmer I would suggest you concentrate on the basics for a while yet. ;)

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


<snip> 

I get the basic concept of how to use this function, but I'm not sure I'm going to get it to work in the way I need it to. The hyperlink I'm trying to extract is one of several dozen other links in an extracted InetRead.

Local $dData = InetRead("http://marketplace.com/") ; Read page source code
Local $sData = BinaryToString($dData) ; Convert source from binary to characters
Local $iPos = StringInStr($sData, "3.00") ; Position of the first "3.00", or 0 if it is absent
Basically I'm searching a dynamically changing marketplace for the price of $3.00, and once I find it I need to open the hyperlink relevant to the price. This could be sandwiched several items down the page, though. I'm not having any problems finding the price which currently works to trigger a message box letting me know something's been found at my designated threshold, but I want to be able to provide a direct link using the ShellExecute command.

The hyperlink does follow a specific pattern, but the problem is that the other 9 links on the page also follow that same pattern with the difference being the specific item name, price, and hyperlink, which is why I was looking to flip the source code so I could locate the nearest "<a".

I might not be understanding the functionality of StringRegExp though, so maybe someone can tell me if this would work:

StringRegExp ( "test", "pattern" [, flag = 0 [, offset = 1]] )
I assume offset is the line number in the case of my extract source code, so would it be possible to detect the line number using FileReadLine (or a similar command), then subtract the number of lines back to the nearest anchor tag and then use StringRegExp?
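For what it's worth, the offset parameter of StringRegExp is a character position in the string (starting at 1), not a line number, so no line counting is needed. A minimal sketch of the "look left for the nearest anchor" idea follows; the page fragment, the example.com URLs and the helper name _LinkBefore are all made up for illustration:

```autoit
#include <StringConstants.au3>

; Hypothetical page fragment: two listings, each a link followed by a price.
Local $sData = '<a href="http://example.com/item-a">Item A</a> 5.00' & @CRLF & _
        '<a href="http://example.com/item-b">Item B</a> 3.00'

Local $sLink = _LinkBefore($sData, "3.00")
ConsoleWrite("Link: " & $sLink & @CRLF)

; Returns the href of the nearest "<a " to the left of $sNeedle, or "" with @error set.
Func _LinkBefore($sText, $sNeedle)
    Local $iPos = StringInStr($sText, $sNeedle) ; character position, 0 if absent
    If $iPos = 0 Then Return SetError(1, 0, "")

    ; A negative occurrence makes StringInStr search from the right-hand end,
    ; so this finds the last "<a " in the text to the left of the needle.
    Local $iAnchor = StringInStr(StringLeft($sText, $iPos), "<a ", 0, -1)
    If $iAnchor = 0 Then Return SetError(2, 0, "")

    ; The offset (4th parameter) tells StringRegExp where to start matching.
    Local $aHref = StringRegExp($sText, 'href="([^"]*)"', $STR_REGEXPARRAYMATCH, $iAnchor)
    If @error Then Return SetError(3, 0, "")
    Return $aHref[0]
EndFunc
```

This avoids StringReverse altogether: the text is cut at the price, searched backwards for the anchor, and the regular expression then only has to read one href.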

 

Ah, sorry. Perhaps grasp the basics and then move forward with complicated programming concepts.

 

I'm trying, but I also like to get my hands dirty with complicated concepts too because it helps me learn more quickly. :)

Edited by Melba23
Huge quotes removed

4b0082,

When you reply, please use the "Reply to this topic" button at the top of the thread or the "Reply to this topic" editor at the bottom rather than the "Quote" button - we know what we wrote and it just pads the thread unnecessarily. ;)

I suggest posting a representative data dump from that site with an explanation of which link from within it you require to extract and what makes it the target - we might then be able to work out a suitable pattern and get it working for you. :)

M23


Right now I'm trying to avoid using the site url because I don't really want my script available to people who might have similar ideas for it, but if it comes down to being unable to find a solution, I'll give my code. Sorry, I know, frustrating. Anyway, I'm trying to play around with StringRegExp and I'm getting an error when I try to run the code..

==> Subscript used on non-accessible variable.:
MsgBox($MB_SYSTEMMODAL, "Extract", $aExtract[0])
MsgBox($MB_SYSTEMMODAL, "Extract", $aExtract^ ERROR
>Exit code: 1    Time: 4.675
#include <MsgBoxConstants.au3>
#RequireAdmin

$pCurrent = "4"
$dData = InetRead("http://marketplace.com")
$sData = BinaryToString($dData)
$aExtract = StringRegExp($sData, '(?s)<a>.*href="(.*)".*\Q&#36;' & $pCurrent & '..*<\/a>', 3)
MsgBox($MB_SYSTEMMODAL, "Extract", $aExtract[0])
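
That error message generally means $aExtract is not an array: with flag 3, StringRegExp sets @error to 1 when nothing matches and to 2 when the pattern is invalid, and indexing the return value then fails. Two details in the pattern above look likely to prevent a match: a literal <a> never appears in a tag that carries an href, and \Q without a closing \E turns everything after it into literal text. A hedged sketch with a made-up one-line sample in place of the real page:

```autoit
; Hypothetical stand-in for the downloaded page source.
Local $sData = '<a href="http://example.com/item">Item &#36;4.94</a>'
Local $sPrice = "4" ; leading digit(s) of the price to look for

; \Q...\E quotes only the literal "&#36;"; the rest of the pattern stays active.
Local $sPattern = '<a\s[^>]*href="([^"]*)"[^>]*>[^<]*\Q&#36;\E' & $sPrice & '\.\d{2}'
Local $aExtract = StringRegExp($sData, $sPattern, 3)
Local $iErr = @error ; save it before any other function call resets it

If $iErr Then
    ; 1 = no match, 2 = bad pattern
    ConsoleWrite("No usable result, @error = " & $iErr & @CRLF)
Else
    ConsoleWrite("Link: " & $aExtract[0] & @CRLF)
EndIf
```

Checking @error (or IsArray) before touching $aExtract[0] turns the crash into a message that says whether the pattern failed to match or failed to compile.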

4b0082,

 

I'm trying to avoid using the site url

Then make a redacted copy of the data you get from the site and post that. All you need to do is make sure that the relevant parts (the links and the cost from what details you have already posted) are not too different from reality so that the pattern will still work. Otherwise it is going to be a bit difficult to help. ;)

M23


After tinkering with the code more, I still can't figure out why I'm getting an error..
This is working code from a test.au3 I've made:

#include <MsgBoxConstants.au3>
#include <StringConstants.au3>

$pCurrent = "4"
Local $aArray = StringRegExp('<a href="hello">a&#36;4.94</a>', '(?i)<a.*href="(.*?)".*>.*&#36;' & $pCurrent & '..*</a>', $STR_REGEXPARRAYFULLMATCH)
For $i = 0 To UBound($aArray) - 1
    MsgBox($MB_SYSTEMMODAL, "RegExp Test with Option 2 - " & $i, $aArray[$i])
Next

So I can't figure out why replacing "'<a href="hello">a&#36;4.94</a>'" with a $variable is giving me an error.

I think it has something to do with InetRead being broken into different lines; I'm doing all the tests I can think of and nothing is yielding an output.

Does using BinaryToString that outputs a string with several lines affect my array in some way?
I'm going nuts trying to get this to run.

Edit: The more I mess with this the more I'm certain that it's because my string uses multiple lines. Can someone help me find a fix for this?
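
A plausible cause, assuming the downloaded source contains line breaks between the tag and the price: by default the dot in a pattern does not match a line break, so .* cannot run from one line into the next unless (?s) is set. The test pattern above uses (?i) only. A small sketch with a made-up two-line string:

```autoit
; Hypothetical two-line fragment, similar in shape to a downloaded page.
Local $sData = '<a href="hello">' & @CRLF & 'a&#36;4.94</a>'

; Without (?s) the dot stops at the line break, so nothing is found.
Local $aPlain = StringRegExp($sData, '(?i)<a.*href="(.*?)".*>.*&#36;4\..*</a>', 3)
Local $iPlainErr = @error

; With (?s) added, the dot also matches line breaks and the link is captured.
Local $aDotAll = StringRegExp($sData, '(?is)<a.*href="(.*?)".*>.*&#36;4\..*</a>', 3)
Local $iDotAllErr = @error

ConsoleWrite("Without (?s): @error = " & $iPlainErr & @CRLF)
If Not $iDotAllErr Then ConsoleWrite("With (?s): " & $aDotAll[0] & @CRLF)
```

Flattening the source into one line works for the same reason, but adding the s option leaves the data untouched.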


Regular expressions are tricky and need precision.

If you don't post the relevant part of the source code, as Melba asked you twice, it's nearly impossible to give a correct and appropriate answer.

I'm to the point where I feel like giving up, so here's my source. I did contact Melba after I was asked, but he was unable to help any further for personal reasons and I'm still hitting a lot of walls. I think I'd like to approach it from a different angle by collecting all of the prices and names at one time and then processing them, rather than processing them as they're being searched.

I'm still unable to get StringRegExp to work the way I'm expecting it to, though. I found a solution to converting the entire source code into one line (didn't seem to help), and I was able to start getting results by removing important chunks of my search parameters (but that's not going to cut it).

At this point I just need something that works so I can move on. :(

 


4b0082,

As I mentioned when we exchanged PMs, I see no problem with what you are trying to do - the question is simply one of interaction with the content of a website and has nothing to do with gaming per se. So if anyone wants to help, please feel free to do so. :)

M23


For some reason I can't get the code you posted to display an array. However, after using a post you made in another thread and at the advice of Melba, and beating my head on my desk, I FINALLY got something to work! It almost makes those 6 hours of frustration worth it! :P

At this point, I'm wondering if there's a way I can break my array results into a multidimensional array (10 rows, 3 columns) so the data is coupled with its relevant information. It's probably not necessary for what I'm trying to do, but I might try to export data to an Excel document down the road and it might be more useful to have it arranged in that way.

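#include <Array.au3>

; Download the page and convert the binary data to text
Local $sData = BinaryToString(InetRead("http://steamcommunity.com/market/"))
; Capture link, price and item name per listing (s = dot matches line breaks, U = ungreedy)
Local $aItems = StringRegExp($sData, '(?sU)row_link.*href="(.*)".*&#36;(\d+\.\d{2}).*item_name".*>(.*)<\/span>.*<\/a>', 3)

; Only display the result if the pattern matched
If Not @error Then _ArrayDisplay($aItems)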

Calling multidimensional array results is as simple as $aItems[1][1] to output the information stored in that spot, correct?
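
With flag 3 and three capture groups, the result is a flat one-dimensional array in which every three consecutive elements belong to one listing. One way to regroup it, sketched here with made-up sample values and a hypothetical helper named _ToRows, is to copy the flat array into a 2D array:

```autoit
; Hypothetical flat result, shaped like StringRegExp output with flag 3 and three groups.
Local $aFlat[6] = ["http://example.com/a", "3.00", "Item A", _
        "http://example.com/b", "4.94", "Item B"]

Local $aItems = _ToRows($aFlat, 3)
ConsoleWrite($aItems[1][1] & @CRLF) ; second row, second column: the second item's price

; Copies a flat array into a 2D array with $iCols columns per row.
Func _ToRows(Const ByRef $aFlatIn, $iCols)
    Local $iRows = Int(UBound($aFlatIn) / $iCols)
    If $iRows < 1 Then Return SetError(1, 0, 0)

    Local $aOut[$iRows][$iCols]
    For $i = 0 To $iRows * $iCols - 1
        ; Integer division gives the row, the remainder gives the column.
        $aOut[Int($i / $iCols)][Mod($i, $iCols)] = $aFlatIn[$i]
    Next
    Return $aOut
EndFunc
```

Indexes are zero-based, so $aItems[1][1] refers to the second row and second column, and the first listing's link is $aItems[0][0].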
