Andrew

Embedded ctrl character?

14 posts in this topic

1) I am no programmer. I am a novice AutoIt user who is struggling to learn all the things that can be done with this powerful tool, and I'm somewhat intimidated by the knowledge here.

2) I have a problem that probably has a simple solution but is apparently beyond my expertise. I have looked at examples and searched the forum to try to come up with a solution, but again, it's hard when you don't know what you're doing :-(

3) The problem: I am writing data elements from the output of a web page to an array that I will later write to an Excel spreadsheet. The data varies in length and evidently can contain CRLFs (or similar). As an example, a line of data may look like this:

DART80 - Sep 2009 - 11.11 DVD

AppDiscAgent A.04.01.01.107 HP Application Discovery Agent

Somewhere, whether in the way I am capturing this or in the data itself, a CRLF seems to be getting embedded after "DVD". I need to remove it either when I write to the array or when I write to the Excel spreadsheet, so that it looks like this:

DART80 - Sep 2009 - 11.11 DVD AppDiscAgent A.04.01.01.107 HP Application Discovery Agent

This is a snippet of my code:

If $result = 0 Then
    Dim $avArray[1]
    $count1 = 1
    $oDiv = _IEGetObjById($oIE, "indent")
    $oElements = _IETagNameAllGetCollection($oDiv)
    For $oElement In $oElements
        $sText = _IEPropertyGet($oElement, "innerText")
        If $count1 = 1 Then
            _ArrayAdd($avArray, $sText)
            ; ... (rest of the loop not shown)
        EndIf
    Next
EndIf

After I am done with the array, I write it to Excel:

$sDartString = _ArrayToString($avArray, @CRLF, 1)
_ExcelWriteCell($oExcel, $sDartString, $count2, 3)

NOTE: 1) I am writing the entire string to a single cell.

2) I *do* want a CRLF to delimit the multiple elements being written to Excel; I just don't want the ones apparently already embedded in each element.
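One way to get exactly that is to clean each element before it goes into the array, so the only CRLFs left in the final string are the delimiters that _ArrayToString inserts. This is a minimal sketch, assuming the embedded breaks are made of CR and/or LF characters; the helper _CleanElement and the sample strings are hypothetical stand-ins for the values read with _IEPropertyGet.

```autoit
#include <Array.au3>

; Hypothetical helper: replace every run of CR/LF characters with one space,
; then trim leading and trailing whitespace.
Func _CleanElement($sText)
    Local $sClean = StringRegExpReplace($sText, "[\r\n]+", " ")
    Return StringStripWS($sClean, 3) ; 3 = strip leading (1) + trailing (2)
EndFunc

; Index 0 stays empty, as in the original snippet (hence start index 1 below).
Local $avArray[1]

; Hypothetical sample values standing in for innerText read from the page.
_ArrayAdd($avArray, _CleanElement("DART80 - Sep 2009 - 11.11 DVD" & @CRLF & _
        "AppDiscAgent A.04.01.01.107 HP Application Discovery Agent"))
_ArrayAdd($avArray, _CleanElement("Second result" & @LF))

; The only CRLFs in the result are the delimiters inserted between elements.
Local $sDartString = _ArrayToString($avArray, @CRLF, 1)
ConsoleWrite($sDartString & @CRLF)
```

The same _CleanElement call could wrap the _IEPropertyGet result directly inside the original For loop.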

I hope I have provided enough information that you understand what I am trying to do. Any suggestions / assistance would be appreciated.




Try using StringRegExpReplace. You will need to know whether you are dealing with @CR, @LF or @CRLF.

$newText = StringRegExpReplace($text, @CRLF, " ", 1)

This replaces the first @CRLF in $text with a space (the final 1 is the replacement count; omit it, or pass 0, to replace every occurrence). For example:

$text = "Hello" & @CRLF & "my name is " & @CRLF & "Inigo Montoya"
MsgBox(0,"", $text)
$newText = StringRegExpReplace($text, @CRLF, " ", 1)
MsgBox(0,"", $newText)
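If you don't know which kind of line break you are dealing with, a single pattern can cover all three. This is a sketch, assuming a break is any of CRLF, a lone CR or a lone LF; leaving out the count argument (default 0) replaces every match, and @extended then reports how many replacements were made. For one fixed sequence such as @CRLF, the plain StringReplace($text, @CRLF, " ") works too, without a regular expression.

```autoit
; Sample text mixing all three line-break styles.
Local $sText = "Hello" & @CRLF & "my name is" & @LF & "Inigo Montoya" & @CR

; \r\n is tried first, so a CRLF pair becomes one space rather than two.
Local $sNewText = StringRegExpReplace($sText, "\r\n|\r|\n", " ")
Local $iReplaced = @extended ; read @extended before calling anything else

; Trim the trailing space left where the final @CR used to be.
$sNewText = StringStripWS($sNewText, 2)

ConsoleWrite($sNewText & @CRLF & $iReplaced & " line break(s) replaced" & @CRLF)
```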


Thank you, ctyankeeinok, for responding. I tried StringRegExpReplace using @CR, @LF & @CRLF and none worked. The page source is the following:

<h2>Results of Search</h2><div id=indent>
  <div class=results>
     <div class=row_header>
        <span class=cdinfo>DART80 - Sep 2009 - 11.11</span>
        <span class=description>DVD</span>
     </div>
     <div class=row>
       <span class=product>AppDiscAgent</span>
       <span class=version>A.04.01.01.107</span>
       <span class=description>HP Application Discovery Agent</span>
     </div>
  </div>

Evidently the class=row is affecting the results being written into the array. Can you recommend a way to piece together the results, such as the 'cdinfo', 'product', 'version' and 'description' classes?

Thanks!


#4 ·  Posted (edited)

It's because DVD ends up in its own array element: _IETagNameAllGetCollection enumerates every element inside the indent DIV, nested ones included, and one of those is the span that contains the DVD inner text.

I think that to get the string format you want, you should do something like this for each entry:

For $oElement In $oElements
   $sText &= _IEPropertyGet($oElement, "innerText") & " "
Next

_ArrayAdd($avArray, $sText)
Edited by Authenticity


Thank you Authenticity. I only wish I understood what you were saying :-(

I substituted the _IEPropertyGet above for the one I was using, and to simplify things I directed output to a text file. Unfortunately the results weren't what I'd hoped for:

DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  
DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  DART80 - Sep 2009 - 11.11 DVD  
DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  DART80 - Sep 2009 - 11.11 DVD  DART80 - Sep 2009 - 11.11 
DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  DART80 - Sep 2009 - 11.11 DVD  DART80 - Sep 2009 - 11.11 DVD 
DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  DART80 - Sep 2009 - 11.11 DVD  DART80 - Sep 2009 - 11.11 DVD AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  
DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  DART80 - Sep 2009 - 11.11 DVD  DART80 - Sep 2009 - 11.11 DVD AppDiscAgent A.04.01.01.107 HP Application Discovery Agent  AppDiscAgent
and so on...

Any other suggestions?

Thanks!


First, you need to reset $sText to "" before each result, because you're looping over the results, I guess. Can you post the site, or the relevant complete code? Right now it isn't clear what is causing the extra @CRLF. It may be that, because the text is split across spans, a literal @CRLF gets added to the innerText of the DIV. Just guessing...
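The reset can be sketched like this, assuming the text pieces of one result have already been collected into an array; the helper _JoinFragments and the sample arrays are hypothetical. Declaring the accumulator Local inside a function means it starts empty on every call, so text from earlier results cannot pile up the way it does in the output Andrew posted.

```autoit
; Hypothetical helper: join the pieces of ONE result with single spaces.
; $sText is Local, so it is re-created as "" on every call.
Func _JoinFragments($aFragments)
    Local $sText = ""
    For $sFragment In $aFragments
        $sText &= $sFragment & " "
    Next
    Return StringStripWS($sText, 2) ; drop the trailing space from the last pass
EndFunc

; Hypothetical sample data: the header and the row of one result, joined separately.
Local $aHeader[2] = ["DART80 - Sep 2009 - 11.11", "DVD"]
Local $aRow[3] = ["AppDiscAgent", "A.04.01.01.107", "HP Application Discovery Agent"]

Local $sHeader = _JoinFragments($aHeader)
Local $sRow = _JoinFragments($aRow)
ConsoleWrite($sHeader & @CRLF & $sRow & @CRLF)
```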


Can you post the site, or the relevant complete code?

It is an internal site on our intranet, so that is not an option. I had hoped the HTML I posted was the relevant part, as that is where the problem is occurring.

If it adds any clarity, it is a search page: I populate a couple of form fields and submit, and it returns results in the format I provided earlier. My purpose for using AutoIt is to automate submitting multiple queries and capture the results in an Excel spreadsheet (as opposed to copying the results into Notepad, massaging the data, then pasting it into Excel). The results must be written into individual cells so they can later be sorted, filtered, etc.

Below is my latest code (just for test purposes):

#include <IE.au3>

Dim $sText
$sInFile = "input.txt"
$sOutFile = "output.txt"
If FileExists($sOutFile) Then
    FileDelete($sOutFile)
EndIf

$File = FileOpen($sInFile, 0)
If $File = -1 Then
    MsgBox(0, "Error", "Unable to open input file.")
    Exit
EndIf

$url = "{internal url removed}"
$oIE = _IECreate()

While 1
    _IENavigate($oIE, $url)
    $sProduct = FileReadLine($File)
    If @error = -1 Then ExitLoop
    $oForm = _IEFormGetObjByName($oIE, "Form")
    $oText = _IEFormElementGetObjByName($oForm, "product_search")
    _IEFormElementSetValue($oText, $sProduct)
    _IEFormElementRadioSelect($oForm, "11.11", "os", 1, "byValue")
    _IEFormElementRadioSelect($oForm, "DVD.800", "media", 1, "byValue")
    $oSubmit = _IEGetObjByName($oIE, "submit")
    _IEAction($oSubmit, "click")
    _IELoadWait($oIE)
    $body = _IEBodyReadText($oIE)
    $result = StringInStr($body, "No results matched.")

    If $result = 0 Then
        $oDiv = _IEGetObjById($oIE, "indent")
        $oElements = _IETagNameAllGetCollection($oDiv)
        For $oElement In $oElements
            $sText = _IEPropertyGet($oElement, "innerText")
            FileWriteLine($sOutFile, $sText)
        Next
    EndIf
WEnd
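_IEQuit($oIE)
FileClose($File) ; close the handle returned by FileOpen, not the file name
; $sOutFile needs no FileClose: FileWriteLine opens and closes it on each call
Exit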

and the output looks like this:

DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent 
DART80 - Sep 2009 - 11.11 DVD 
DART80 - Sep 2009 - 11.11
DVD
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent 
AppDiscAgent
A.04.01.01.107
HP Application Discovery Agent

which presumably coincides with

<h2>Results of Search</h2><div id=indent>
  <div class=results>
     <div class=row_header>
        <span class=cdinfo>DART80 - Sep 2009 - 11.11</span>
        <span class=description>DVD</span>
     </div>
     <div class=row>
       <span class=product>AppDiscAgent</span>
       <span class=version>A.04.01.01.107</span>
       <span class=description>HP Application Discovery Agent</span>
     </div>
  </div>

Is that any help?


#8 ·  Posted (edited)

I believe it is the carriage return right here, between the two spans:

        <span class=cdinfo>DART80 - Sep 2009 - 11.11</span>
        <span class=description>DVD</span>

The innerText of the div class=results is its innerHTML with the tags stripped off, but the line break from the source layout is still there.

Can you run another test and post the output:

Local $iLine = 0
While 1
    _IENavigate($oIE, $url)
    $sProduct = FileReadLine($File)
    If @error = -1 Then ExitLoop
    $oForm = _IEFormGetObjByName($oIE, "Form")
    $oText = _IEFormElementGetObjByName($oForm, "product_search")
    _IEFormElementSetValue($oText, $sProduct)
    _IEFormElementRadioSelect($oForm, "11.11", "os", 1, "byValue")
    _IEFormElementRadioSelect($oForm, "DVD.800", "media", 1, "byValue")
    $oSubmit = _IEGetObjByName($oIE, "submit")
    _IEAction($oSubmit, "click")
    _IELoadWait($oIE)
    $body = _IEBodyReadText($oIE)
    $result = StringInStr($body, "No results matched.")

    If $result = 0 Then
        $oDiv = _IEGetObjById($oIE, "indent")
        $oElements = _IETagNameAllGetCollection($oDiv)
        For $oElement In $oElements
            $sText = $iLine & " => " & _IEPropertyGet($oElement, "innerText")
            FileWriteLine($sOutFile, $sText)
        Next
        $iLine += 1
    EndIf
WEnd

Edit: My bad, I've corrected the code.

Edited by Authenticity


0 => DART80 - Sep 2009 - 11.11 DVD 
AppDiscAgent A.04.01.01.107 HP Application Discovery Agent 
0 => DART80 - Sep 2009 - 11.11 DVD 
0 => DART80 - Sep 2009 - 11.11
0 => DVD
0 => AppDiscAgent A.04.01.01.107 HP Application Discovery Agent 
0 => AppDiscAgent
0 => A.04.01.01.107
0 => HP Application Discovery Agent


Maybe this will help. I have a similar situation with a table on an intranet site that I retrieve, and this is what I use to do it:

#include <IE.au3>
#include <array.au3>

$oIE = _IECreate("Website url here", 0, 0, 1, 0)       ;No attach, Not visible, Wait to load, Don't take focus
$oTable = _IETableGetCollection($oIE, 2)               ;For my table the index was 2 (0-based), but yours could be 0, 1, 3... just play with this number if you don't get the right results at first.
$aTableData = _IETableWriteToArray($oTable, True)      ;True swaps rows and columns in the output array; this value could also be False
_ArrayDisplay($aTableData)
_IEAction($oIE,"quit")

It returns a two-dimensional array, so then you can either work with the data directly ($aTableData[0][0] would be the top-left cell, etc.)

or probably just dump it directly to Excel :)

Don't feel bad, 'cause I'm also fairly new to AutoIt, and before I found this I had 50 lines of code doing the exact same thing :)


Thanks timan12. I tried _IETableGetCollection initially, but there are no tables. :)


#12 ·  Posted (edited)

@Andrew, I think that the results are crystal clear. You need to remove any @CR/@LF from the string before you add it to the array. Alternatively, you can write a few results to the file and open it in a hex editor to see which byte sequence follows the DVD string. It could be 0x0D 0x0A (decimal 13 10, i.e. CRLF) or another line-break sequence.
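If a hex editor isn't handy, AutoIt can show the bytes itself. A minimal sketch, assuming the text uses the default ANSI encoding: StringToBinary turns the string into binary, String() renders that as a 0x... hex string, and a CRLF then shows up as 0D0A. The helper _LineBreakKind and the sample value are hypothetical.

```autoit
; Hypothetical helper: report which line break a string contains.
; @CRLF is checked first because it also contains @CR and @LF.
Func _LineBreakKind($sText)
    If StringInStr($sText, @CRLF) Then Return "CRLF"
    If StringInStr($sText, @CR) Then Return "CR"
    If StringInStr($sText, @LF) Then Return "LF"
    Return "none"
EndFunc

; Hypothetical sample value resembling the captured innerText.
Local $sText = "DVD" & @CRLF & "AppDiscAgent"

; Show the raw bytes as hex; 0D is @CR (decimal 13), 0A is @LF (decimal 10).
Local $sHex = String(StringToBinary($sText))
ConsoleWrite($sHex & @CRLF)
ConsoleWrite("Line break found: " & _LineBreakKind($sText) & @CRLF)
```

Knowing which kind is present tells you whether @CRLF, @CR or @LF belongs in the StringRegExpReplace pattern.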

Edited by Authenticity


Thanks timan12. I tried IETableCollection initially but there are no tables. :)

haha that sucks :)

tables make everything easier


Authenticity, your suggestion to use a hex editor was spot on! The results clearly indicated an embedded CRLF, so it was obvious I'd made a mistake when I tried earlier to do StringRegExpReplace. Second time around it worked like a charm.

Thanks very much for your assistance!
