
Manipulating text (delete a big chunk of text)



Okay guys, I'm scratching my head on this one. I know what I want to do, but just not sure how to approach it.

This code:

#include <GUIConstantsEx.au3>
#include <WindowsConstants.au3>
#include <IE.au3>
#include <array.au3>
#include <Filev1.au3>
#include <StaticConstants.au3>
#include <Clipboard.au3>


Global $fIni, $zip, $url, $oIE
$fIni = "c:\temp\zip.ini"
$count = 1
$zip = IniRead($fIni, "Zip", $count, "NotFound")
$url = "http://www.acehardware.com/mystore/storeLocator.jsp#" & $zip & "|50"

aceData()


Func aceData()
    $fIni = "c:\temp\zip.ini"
    $count = 27768
    $zip = IniRead($fIni, "Zip", $count, "NotFound")
    $url = "http://www.acehardware.com/mystore/storeLocator.jsp#" & $zip & "|50"

    _IEErrorHandlerRegister()
    Global $oIE = _IECreateEmbedded()

    GUICreate("Ace Hardware Data Grabber", 1920, 1080, _
            (@DesktopWidth - 1920) / 2, (@DesktopHeight - 1080) / 2, _
            $WS_OVERLAPPEDWINDOW + $WS_VISIBLE + $WS_CLIPSIBLINGS + $WS_CLIPCHILDREN)
    GUICtrlCreateObj($oIE, 10, 40, 900, 750)

    GUISetState() ;Show GUI

    _IENavigate($oIE, "http://www.acehardware.com/mystore/storeLocator.jsp#" & $zip & "|50")
    _IELoadWait($oIE)
    _IEPropertySet($oIE, "silent", "true")
    Do
        _IENavigate($oIE, "http://www.acehardware.com/mystore/storeLocator.jsp#" & $zip & "|50")

        _IELoadWait($oIE, "5", '')
        _IEPropertySet($oIE, "silent", "true")
        _IELoadWait($oIE, "5", '')
        _IEAction($oIE, "selectall")
        $aCopy = _IEAction($oIE, "copy")
        $aData = _ClipBoard_GetData()
        $File = "c:\temp\ace" & $count & ".txt"
        $hFile = FileOpen($File, 1) ; 1 = append; keep the handle so it can be closed
        FileWrite($hFile, $aData)
        FileClose($hFile)
        ;~ Local $aArray
        ;~ _FileReadToArray($file,$aArray)
        ;~ _ArrayDisplay($aArray)
        ;~ _ArrayToString($aArray)
        ;~ _ArrayToClip($aArray)
        ;~ $bData=_ClipBoard_GetData()
        ;~ FileOpen($file,2)
        ;~ FileWrite($file,$bData)
        ;~ FileClose($file)
        ;~ _ArrayDisplay($aArray)

        $count = $count + 1
        $zip = IniRead($fIni, "Zip", $count, "NotFound")
    Until $count = 27769
EndFunc

is going to give me a text file where all of the information I want is in the middle, with several (100+) lines of junk text before and after the text I really want. What's the best way of getting rid of this text? The text I want to keep always starts at line 165 of the document, but doesn't always end on the same line. Any help would be appreciated!

Edit: The commented-out portion is where I have been trying to achieve this by manipulating the file within an array.


If you use _FileReadToArray, you can access the lines you need by reading the array from $aArray[165] until you hit the end of the file. If you just want to open the file and then write a new one, you can use _FileReadToArray and then _FileWriteFromArray, starting from element 165 and stopping when you get to the end of the data you need.
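
A minimal sketch of that read-then-write approach, assuming the standard File.au3 UDF. The file paths and the end-marker text are hypothetical placeholders, not taken from the real page; only the starting line (165) comes from the question.

```autoit
#include <File.au3>

; Hypothetical paths: adjust to the files the script actually writes.
Local $sIn = "c:\temp\ace27768.txt"
Local $sOut = "c:\temp\ace27768_trimmed.txt"
Local $aLines

; With the default flag, $aLines[0] holds the line count and
; $aLines[1] is the first line of the file.
If Not _FileReadToArray($sIn, $aLines) Then
    ConsoleWrite("Could not read " & $sIn & @CRLF)
    Exit 1
EndIf

Local $iFirst = 165       ; first wanted line, per the question
Local $iLast = $aLines[0] ; default: keep everything up to the last line

; Assumption: some fixed text marks where the trailing junk begins.
; "END OF STORE LIST" is a made-up marker used only for illustration.
For $i = $iFirst To $aLines[0]
    If StringInStr($aLines[$i], "END OF STORE LIST") Then
        $iLast = $i - 1
        ExitLoop
    EndIf
Next

; Nothing to write if the file is too short or the marker is on line 165.
If $iLast < $iFirst Then
    ConsoleWrite("No wanted lines found in " & $sIn & @CRLF)
    Exit 1
EndIf

; Pass the whole array plus the first and last index to write.
_FileWriteFromArray($sOut, $aLines, $iFirst, $iLast)
```

Writing to a separate output file keeps the raw capture around, which makes it easier to check that the start and end indexes are right before overwriting anything.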

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator


allSystemsGo,

I don't have the file for #include <Filev1.au3>, or your ini file, so I cannot run the posted code.

Can you provide a sample of the data that you are trying to manipulate?

Also:

Are you really going to iterate through 27,769 instances of IE?

If you are just getting data, there is no need to show IE or a GUI.
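
As a rough sketch of what that could look like, assuming the standard IE.au3 UDF: a hidden IE instance is created, the body text is read directly instead of going through select-all and the clipboard, and the result is written to a file. The ZIP code and output path are placeholders, and because the page fills in its results by script, the text may not be complete when _IELoadWait returns.

```autoit
#include <IE.au3>

; Placeholder URL and output path, for illustration only.
Local $sUrl = "http://www.acehardware.com/mystore/storeLocator.jsp#10012|50"
Local $sOut = "c:\temp\ace10012.txt"

; Second parameter 0 = do not attach to an existing window,
; third parameter 0 = create the browser window hidden.
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then Exit 1
_IELoadWait($oIE)

; Text of the page body, without a clipboard round trip.
Local $sText = _IEBodyReadText($oIE)
_IEQuit($oIE)

Local $hFile = FileOpen($sOut, 2) ; 2 = overwrite
If $hFile = -1 Then Exit 1
FileWrite($hFile, $sText)
FileClose($hFile)
```

The text returned this way is not guaranteed to line up with a Ctrl-A, Ctrl-C capture line for line, so any fixed line offsets would need to be checked again.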

kylomas

edit: and what does the ini file look like?


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


@kylomas

When it's all said and done, I'm going to go through about 43K lines in an .ini file. A quick way to get a sample of what I am getting would be to go to http://www.acehardware.com/mystore/storeLocator.jsp#10012|50 and do a select all and paste into a text editor. There will be a bunch of text on lines 1-164, which I do not want. Then the list of stores starts. Once that list is done, there will be some more text that I do not want. This particular site does not use tables or forms (that I can see), so it's a bit more of a brute-force way to get the data. I'm showing the GUI right now just for reference; the way I am running it, it only iterates through one line of the ini.


allSystemsGo,

Then do as BrewManNH suggested and just grab the parts of the array that you want.

Is this a one time only kind of thing?

And does "going through 43K lines of an ini file" mean 43K instantiations of IE?

kylomas


@kylomas

No, it's not going to create 43K instances of IE. I am trying what BrewManNH suggested. I guess I am doing it wrong, because nothing has changed.

$File="c:\temp\ace"&$count&".txt"
$hFile=FileOpen($file,1)
FileWrite($hFile,$aData)
FileClose($hFile)
Local $aArray
_FileReadToArray($file,$aArray)
;~ _ArrayDisplay($aArray)
;~ _ArrayToClip($aArray[165])
;~ $bData=_ClipBoard_GetData()
;~ FileOpen($file,2)
;~ FileWrite($file,$bData)
;~ FileClose($file)
_FileWriteFromArray($file,$aArray[165])

And no, I am planning on it editing each text file.


allSystemsGo,

You are using _FileWriteFromArray wrong. It takes the whole array plus the first and last index to write, not a single element. It should be something more like this:

_FileWriteFromArray($file, $aArray, 165, 999)

kylomas


There are 43629 zip codes in the United States. I am tasked with finding every store in this certain chain, dumping them into individual txt files per zip code, so that my boss can run a program that he has coded which will then create a report for our sales staff of prospects that they can reach out to.


allSystemsGo,

I see what you mean about the difficulty of getting data out of this web site. I created a file manually for the following. This may help you out, as the number of stores returned varies by zip code. This regexp will parse these (except for some shit at the top that I can't get rid of).

#include <array.au3>
#include <file.au3>

; this file was created by going to the web site, CTRL-A, CTRL-S and save to this text file.

local $str = fileread(@scriptdir & '\ace stores.txt')

local $aStores = stringregexp($str,'(?ims)shop our ad(.*?)show on map',3)

FileDelete(@scriptdir & '\ace stores 010.txt')
_filewritefromarray(@scriptdir & '\ace stores 010.txt',$aStores)
shellexecute(@scriptdir & '\ace stores 010.txt')

You may be able to adapt this to your needs.
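
To see what the pattern does in isolation, here is a small self-contained sketch. The sample text is made up to stand in for the copied page; only the two marker phrases come from the regexp above.

```autoit
; Made-up sample standing in for the copied page text.
Local $sText = "header junk" & @CRLF & _
        "Shop Our Ad" & @CRLF & "Store A" & @CRLF & "123 Main St" & @CRLF & "Show on Map" & @CRLF & _
        "Shop Our Ad" & @CRLF & "Store B" & @CRLF & "456 Oak Ave" & @CRLF & "Show on Map" & @CRLF & _
        "footer junk"

; (?i) ignores case and (?s) lets . match line breaks.
; (.*?) is non-greedy, so each match stops at the nearest "show on map".
; Flag 3 returns every match of the capture group as a 0-based array.
Local $aStores = StringRegExp($sText, '(?is)shop our ad(.*?)show on map', 3)
If @error Then
    ConsoleWrite("No store blocks found" & @CRLF)
    Exit 1
EndIf

For $i = 0 To UBound($aStores) - 1
    ; Flag 3 for StringStripWS trims leading and trailing whitespace.
    ConsoleWrite("Block " & ($i + 1) & ":" & @CRLF & StringStripWS($aStores[$i], 3) & @CRLF)
Next
```

Because the text before the first marker and after the last one never falls inside a capture group, the leading and trailing junk is dropped without needing to know any line numbers.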

Good Luck,

kylomas


allSystemsGo,

Try this out.

#include <array.au3>
#include <ie.au3>

$url = 'http://www.acehardware.com/mystore/storeLocator.jsp#53214|null'

msgbox(0,'',return_addresses($url))



func return_addresses($url)

    local $oIE = _IECreate($url,0,0), $str, $aADDR
    _IELoadWait($oIE)

    local $odivs = _IETagNameGetCollection($oIE,'div')

    for $odiv in $odivs
        $str &= $odiv.innertext & @lf
    next

    _IEQuit($oIE)
    $oIE = 0

    $aADDR = stringregexp($str,'(?ims)shop our ad(.*?)show on map',3)

    $str = ''

    for $1 = 1 to ubound($aADDR) - 1
        $str &= $aADDR[$1] & @lf
    Next

    return $str

endfunc

kylomas


I have tried your suggestions, and each one almost works. The weirdest thing I have found about this particular website is that instead of clicking 'Next' to get to the rest of the list, you can simply select the whole page and it will give you the whole list. I wonder what part of the HTML this is found in? I could see your second suggestion being used that way, so that it only reads that portion of the page and writes it to a file.


Yes, thought I had it but.....NOT!

What's weird is that I can see the elements that I want to get using DebugBar and Developer Tools in IE9, but I can't get the innertext from them. The data returned from something like "$oDivs.innertext" is way more than the value of the element that I am trying to target. I suspect that I'm not referencing the element correctly. This happens even when I do something like

if $odiv.id = '"storeList1"' then consolewrite($odiv.outertext & @lf)

Incidentally, when there are more than 50 stores, only the first 50 are returned when you do the CTRL-A, CTRL-C sequence. I was testing this using Chicago zip codes.

The last thing that I was trying to make work was

#include <array.au3>
#include <ie.au3>
#include <userudfs.au3>
_IEErrorHandlerRegister("ERR")
$url =  'http://www.acehardware.com/mystore/storeLocator.jsp#53214|null'
return_addresses($url)
func return_addresses($url)
 local $str,$aADDR
    local $oIE = _IECreate($url,0,0)
    _IELoadWait($oIE)
    local $odivs = _IETagNameGetCollection($oIE,'div')
    for $odiv in $odivs
        if $odiv.id = '"storeList1"' then consolewrite($odiv.outertext & @lf)
    next
    _IEQuit($oIE)
    $oIE = 0
endfunc
Func ERR()
    consolewrite(StringStripWS($oIEErrorHandler.WinDescription, 2) & @lf)
    Return
EndFunc

but I get the same results (more text than is in the element that I think I'm addressing).

I'm sure that it is the way that I'm navigating the DOM tree but I don't know how to do it correctly.

kylomas


Is there a way to return only the data found in

<html>
<!-- results list -->
   <div id='resultListHolder'>
    <div id="ResultNumberHeader">
     <div></div>
    </div>
    <div id="ResultList">
     <div></div>
    </div>
   </div>
</html>

Possibly that would return only the store list?


I'd load it up into an IE browser...

_IECreate

_IEDocWriteHTML

then you can use the other functions to grab the collection, and loop through them

Try it out, or provide an html doc with a couple stores included.

IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

@jdelaney - How does that differ from what I was doing in post #15?

I can get to the element like this

if $odiv.id = '"storeList1"' then consolewrite($odiv.innertext & @lf)

but it seems like the consolewrite is outputting a superset of the div object.

kylomas

edit: @allSystemsGo - that is what I was trying to do most of last night. ResultList has subordinate divs that are exactly what you are looking for, but I can't figure out how to get just the individual element out.


Oh, I was just answering the one question he had and didn't check the earlier replies. You would need to use the $odiv you found and loop through its .childNodes.

Something like:

For $oChild in $oDiv.childNodes

Next

Although I would need an actual sample of the HTML. It's possible there is another layer of .childNodes you need to crawl through.
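
A slightly fuller sketch of that loop, assuming the standard IE.au3 UDF and assuming the store entries are direct children of the element with id "ResultList" from the HTML fragment quoted earlier in the thread. That structure has not been confirmed against the live page, and the results may be filled in by script after _IELoadWait returns.

```autoit
#include <IE.au3>

Local $sUrl = "http://www.acehardware.com/mystore/storeLocator.jsp#53214|null"

; 0, 0 = do not attach to an existing window, create it hidden.
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then Exit 1
_IELoadWait($oIE)

; Fetch the container directly by id instead of scanning every div.
Local $oList = _IEGetObjById($oIE, "ResultList")
If @error Then
    ConsoleWrite("ResultList not found" & @CRLF)
    _IEQuit($oIE)
    Exit 1
EndIf

For $oChild In $oList.childNodes
    ; childNodes also holds text and comment nodes; nodeType 1 = element.
    If $oChild.nodeType = 1 Then
        ConsoleWrite("---" & @CRLF & $oChild.innerText & @CRLF)
    EndIf
Next

_IEQuit($oIE)
```

If each child turns out to be a wrapper rather than a single store, the same loop can be repeated one level down on $oChild.childNodes.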
