ConsultingJoe Posted February 10, 2006

This is what came out of an AIM instant message. The only text I need from it is "dfg". I tried StringSplit, but it just won't work right. Please help.

<HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)--></B></FONT><FONT COLOR="#0000ff" BACK="#ffffff">:</FONT><FONT COLOR="#000000"> dfg</FONT><BR>
<BODY BGCOLOR="#ffffff"><B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT></BODY></HTML>
<HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)--></B></FONT><FONT COLOR="#0000ff" BACK="#ffffff">:</FONT><FONT COLOR="#000000"> dfg</FONT><BR>
<BODY BGCOLOR="#ffffff"><B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT></BODY></HTML
&Send &Warn Bloc&k Send &Instant Message... Get AIM E&xpressions Play Games Start Live Video &Talk &Chat

Check out ConsultingJoe.com

ACalcutt Posted February 10, 2006

I wasn't able to get this working completely, and it only handles one line, but maybe it will help.

#include <Array.au3>

$file = '<HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)-->'
; Split on ">" so every piece is either a bare tag or "text<tag"
$split = StringSplit($file, ">")
For $l = 1 To $split[0]
    If StringInStr($split[$l], "<") = 1 Then
        ; The piece starts with "<", so it is only a tag: nothing to keep
    ElseIf StringInStr($split[$l], "<") Then
        ; Keep the text in front of the "<" (StringTrimRight cut the wrong number of characters here)
        $split[$l] = StringLeft($split[$l], StringInStr($split[$l], "<") - 1)
        MsgBox(0, "", $split[$l])
    EndIf
Next
_ArrayDisplay($split, "")

Edited February 10, 2006 by ACalcutt

Andrew Calcutt Http://www.Vistumbler.net Http://www.TechIdiots.net
It's not an error, it's an undocumented feature

SmOke_N (Moderators) Posted February 10, 2006

Rather crude, but this works:

#include <file.au3>

Local $aArray = ''
Local $sFilePath = @DesktopDir & '\test..txt'
Local $eXclude[3]; add to the array, the particular items you don't want to show on the return
$eXclude[1] = 'JoE DA HoE 6900'
$eXclude[2] = ':'
_FileReadToArray($sFilePath, $aArray)
GetNonHtml($aArray, $eXclude)

; Reads every line of $aArray, collects the text found between tags,
; and appends it line by line to "<file name>_stripped.txt"
Func GetNonHtml(ByRef $aArray, $eXclude = '')
    Local $nArray = ''
    For $i = 1 To UBound($aArray) - 1
        $StringBetween = StripHtml($aArray[$i], '>', '<', $eXclude)
        If $StringBetween = 0 Then ContinueLoop
        For $x = 1 To UBound($StringBetween) - 1
            $nArray = $nArray & $StringBetween[$x] & Chr(01)
        Next
    Next
    Local $spSp = StringSplit(StringTrimRight($nArray, 1), Chr(01))
    Local $nfOpen = ''
    $nfOpen = FileOpen(StringTrimRight($sFilePath, 4) & '_stripped.txt', 9)
    For $x = 1 To UBound($spSp) - 1
        FileWriteLine($nfOpen, $spSp[$x])
    Next
    FileClose($nfOpen)
EndFunc

; Returns an array of the text pieces that sit between $start and $end
; on one line (skipping anything listed in $eXclude), or 0 if none are found
Func StripHtml($LineToRead, $start, $end, $eXclude = '1')
    Local $sPlit = StringSplit($LineToRead, $start)
    Local $nArray = ''
    For $i = 1 To UBound($sPlit) - 1
        If StringMid($sPlit[$i], 1, 1) <> '' And StringMid($sPlit[$i], 1, 1) <> $end Then
            For $x = 1 To StringLen($sPlit[$i])
                Local $SMid = StringMid($sPlit[$i], $x, 1)
                If $SMid = $end Then
                    Local $fOund = StringLeft($sPlit[$i], $x - 1)
                    Local $ErrorCheck = 0
                    For $y = 1 To UBound($eXclude) - 1
                        If $fOund = $eXclude[$y] Then
                            $ErrorCheck = 1
                            ExitLoop
                        EndIf
                    Next
                    If $ErrorCheck = 0 Then
                        $nArray = $nArray & StringStripWS($fOund, 7) & Chr(01)
                        ExitLoop
                    EndIf
                EndIf
            Next
        EndIf
    Next
    If $nArray <> '' Then
        Return StringSplit(StringTrimRight($nArray, 1), Chr(01))
    Else
        Return 0
    EndIf
EndFunc

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Knight Posted February 10, 2006

My attempt. It searches for the unique ">:<" substring and gets its location. Then it counts 32 characters ahead, which is where the text starts. It converts the string into an array of characters and searches for the first "<" following the beginning of the text. It takes all the elements of the array between those two points and builds a single new array from them. Finally, it converts that array into a string, which gives the text. The script loops through the string until no more text is found. It may not be the best method, but it's 2:14 AM and I whipped it up rather quickly.

#include <array.au3>

$String = '<HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)--></B></FONT><FONT COLOR="#0000ff" BACK="#ffffff">:</FONT><FONT COLOR="#000000"> dfg</FONT><BR><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT></BODY></HTML><HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)--></B></FONT><FONT COLOR="#0000ff" BACK="#ffffff">:</FONT><FONT COLOR="#000000"> dfg</FONT><BR><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT></BODY></HTML&Send&WarnBloc&kSend &Instant Message...Get AIM E&xpressionsPlay GamesStart Live Video&Talk&Chat'
$Var = 1
While 1
    ; Position of the $Var-th ">:<", plus the 32 characters up to the message text
    $TextStart = StringInStr($String, ">:<", 0, $Var) + 32
    ; StringInStr returned 0, so there are no more occurrences
    If $TextStart = 32 Then ExitLoop
    ; One array element per character
    $rString = StringSplit($String, "")
    ; The text ends just before the next "<"
    $TextEnd = _ArraySearch($rString, "<", $TextStart) - 1
    $New = _ArrayCreate("Success")
    For $T = $TextStart To $TextEnd
        _ArrayAdd($New, $rString[$T])
    Next
    ; Drop the "Success" placeholder, then join the characters back into a string
    _ArrayDelete($New, 0)
    $New = _ArrayToString($New, "|")
    $New = StringReplace($New, "|", "")
    MsgBox(0, "Occurrence "&$Var&" of text!", "Text #"&$Var&": "&$New)
    $Var += 1
WEnd
MsgBox(0, "Done", "Done: Found "&$Var-1&" occurrences of text.")

Gene Posted February 11, 2006

Hi @zerocool60544,

This one is a brute-force method, and it can be used as a general-purpose HTML code stripper too. It isn't elegant, but it is quick and writes the results to a timestamped log. It uses an INI file to take care of the current boilerplate items, and if any new ones turn up you can easily add them. There is a ZIP file attached.

Gene

Edit: I used the $char var in debugging and forgot to take it out.

Global $sWorkVar, $sWorkVar2, $iCodeFlag, $var, $i, $char
$char = 0
$sFilePath = FileOpenDialog ( "Select HTML file to strip.", "My Computer", "HTML (*.html)|HTM (*.htm)" , 1 )
$begin = TimerInit()
$sFileContent = FileRead($sFilePath)
$sWorkVar = $sFileContent
; Walk the input one character at a time: $iCodeFlag = 1 while inside a tag
While 1
    If StringLeft($sWorkVar, 1) = "<" Then
        $iCodeFlag = 1
    EndIf
    If StringLeft($sWorkVar, 1) = ">" Then
        $iCodeFlag = 0
        $sWorkVar = StringTrimLeft($sWorkVar, 1)
    EndIf
    ; Inside a tag: discard characters up to the closing ">"
    While $iCodeFlag = 1
        If StringLeft($sWorkVar, 1) = ">" Then ExitLoop
        $sWorkVar = StringTrimLeft($sWorkVar, 1)
    WEnd
    ; Outside a tag: copy characters to $sWorkVar2 up to the next "<"
    While $iCodeFlag = 0
        If StringLeft($sWorkVar, 1) = "<" Then ExitLoop
        If Not StringInStr($sWorkVar, ">") Then ExitLoop
        $sWorkVar2 = $sWorkVar2 & StringLeft($sWorkVar, 1)
        $sWorkVar = StringTrimLeft($sWorkVar, 1)
    WEnd
    ; No ">" left: keep whatever remains and stop
    If Not StringInStr($sWorkVar, ">") Then
        $sWorkVar2 = $sWorkVar2 & $sWorkVar
        ExitLoop
    EndIf
WEnd
$var = IniReadSection(@ScriptDir & "\Strip HTML.ini", "BoilerPlate")
If @error Then
    MsgBox(4096, "", "Error occurred, probably no INI file.")
Else
    ; Remove every boilerplate string listed in the INI section
    For $i = 1 To $var[0][0]
        $sWorkVar2 = StringReplace($sWorkVar2,$var[$i][1],"") ;StringReplace ( "string", "searchstring", "replacestring")
    Next
    While StringInStr($sWorkVar2,@CRLF) Or StringInStr($sWorkVar2,@CR) Or StringInStr($sWorkVar2,@LF)
        $sWorkVar2 = StringReplace($sWorkVar2,@CRLF,"")
        $sWorkVar2 = StringReplace($sWorkVar2,@CR,"")
        $sWorkVar2 = StringReplace($sWorkVar2,@LF,"")
    WEnd
EndIf
FileWrite ( @ScriptDir & "\Stripped.TXT", @YEAR & "/" & @MON & "/" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & $sWorkVar2 & @CRLF )
$dif = Round ( (TimerDiff($begin)/1000) , 4 )
MsgBox(0,"Time To Process The File",$dif & " seconds...", 5)
MsgBox(0,"Result","The stripped data is " & $sWorkVar2, 5 )
Exit

Edited February 11, 2006 by Gene

Thanks for the response. Gene
Yes, I know the punctuation is not right...

Gene Posted February 12, 2006

While I was making dinner I had a better idea. This one is a little more elegant and about a factor of 10 faster than the one I posted above. I considered adding code to eliminate duplicates of the text of interest, but decided against it.

Gene

A ZIP of the improved .Au3 and the INI file is attached.

Global $sWorkVar, $sCodeStr, $var, $i
$sFilePath = FileOpenDialog("Select HTML file to strip.", "My Computer", "HTML (*.html)|HTM (*.htm)", 1)
$begin = TimerInit()
$sFileContent = FileRead($sFilePath)
$sWorkVar = $sFileContent
$var = IniReadSection(@ScriptDir & "\Strip HTML.ini", "BoilerPlate")
If @error Then
    MsgBox(4096, "", "Error occurred, probably no INI file.")
Else
    ; Remove every boilerplate string listed in the INI section
    For $i = 1 To $var[0][0]
        $sWorkVar = StringReplace($sWorkVar, $var[$i][1], "")
    Next
    ; Drop the ":" that sits between two tags
    While StringInStr($sWorkVar, ">:<")
        $sWorkVar = StringReplace($sWorkVar, ">:<", "><")
    WEnd
    While StringInStr($sWorkVar, @CRLF) Or StringInStr($sWorkVar, @CR) Or StringInStr($sWorkVar, @LF)
        $sWorkVar = StringReplace($sWorkVar, @CRLF, "")
        $sWorkVar = StringReplace($sWorkVar, @CR, "")
        $sWorkVar = StringReplace($sWorkVar, @LF, "")
    WEnd
EndIf
While 1
    $iBegin = StringInStr($sWorkVar, "<")
    $iEnd = StringInStr($sWorkVar, ">") + 1
    ; Stop when no complete tag is left. "Or" rather than "And", so that an
    ; unclosed tag such as the trailing "</HTML" in the sample cannot loop forever.
    If $iBegin = 0 Or $iEnd = 1 Then
        ExitLoop
    EndIf
    ; Take the first tag found and remove every copy of it in one pass
    $sCodeStr = StringMid($sWorkVar, $iBegin, $iEnd - $iBegin)
    While StringInStr($sWorkVar, $sCodeStr)
        $sWorkVar = StringReplace($sWorkVar, $sCodeStr, "")
    WEnd
    $sCodeStr = ""
WEnd
FileWrite(@ScriptDir & "\Stripped.TXT", @YEAR & "/" & @MON & "/" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & $sWorkVar & @CRLF)
$dif = Round((TimerDiff($begin) / 1000), 4)
MsgBox(0, "Time To Process The File", $dif & " seconds...", 5)
MsgBox(0, "Result", "The stripped data is " & $sWorkVar, 5)
Exit
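
For comparison, the find-a-tag-and-remove-it loop in the post above can probably be collapsed into two regular-expression replacements. This is only a sketch, not Gene's method: it assumes a current AutoIt release in which StringRegExpReplace is PCRE-based, and _StripTags and $sSample are hypothetical names made up for the illustration.

```autoit
; Sketch only: _StripTags is a hypothetical helper, not part of any posted script.
Func _StripTags($sHtml)
    ; Remove HTML comments first, so the timestamp markers disappear as a unit.
    Local $sText = StringRegExpReplace($sHtml, "(?s)<!--.*?-->", "")
    ; Remove every remaining complete tag: "<", any run of non-">" characters, ">".
    $sText = StringRegExpReplace($sText, "<[^>]*>", "")
    Return $sText
EndFunc   ;==>_StripTags

; One message line taken from the original question.
Local $sSample = '<B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT>'
MsgBox(0, "Stripped", _StripTags($sSample))
```

Unlike the INI-driven scripts, this leaves the screen name and the colon in the result, so the boilerplate removal would still have to run before or after it. An unclosed tag such as the trailing "</HTML" in the sample is left untouched, because the second pattern only matches tags that end in ">".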

neogia Posted February 12, 2006

Here's a pretty general function that should work using regular expressions:

$test = '<HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)--></B></FONT><FONT COLOR="#0000ff" BACK="#ffffff">:</FONT><FONT COLOR="#000000"> dfg</FONT><BR><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT></BODY></HTML><HTML><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#0000ff" LANG="0">JoE DA HoE 6900<!-- (10:12:38 PM)--></B></FONT><FONT COLOR="#0000ff" BACK="#ffffff">:</FONT><FONT COLOR="#000000"> dfg</FONT><BR><BODY BGCOLOR="#ffffff"><B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT></BODY></HTML'
$id = "JoE DA HoE 6900" ;I assume this is your screenname, id, etc.
$texts = "" ;You could easily transform this into an array
$check = StripText($test, $id)
While $check <> ""
    If $check <> $id Then ;If you got rid of this check, it would display all ids as entries
        $texts &= $check & @CRLF ;If you wanted to store each entry in an array you would do it here.
    EndIf
    $test = StringTrimLeft($test, StringInStr($test, $check) + StringLen($check) - 1)
    $check = StripText($test, $id)
WEnd
MsgBox(0, "Results", $texts) ;Will display all results

Func StripText($test, $id = "")
    $results = StringRegExp($test, "(>)([a-zA-Z 0-9]+)(<)", 1) ;Note: will not capture blank entries
    If IsArray($results) Then
        If $results[1] <> "" Then
            Return $results[1]
        EndIf
    EndIf
EndFunc ;==>StripText

"The Brain, expecting disaster, fails to find the obvious solution." -- neogia
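
The trim-and-search-again loop in the post above could perhaps be replaced by a single global match. The sketch below is an assumption-laden variant, not neogia's code: it assumes a current AutoIt release where StringRegExp accepts flag 3 (return all matches as a 0-based array), and $sSample, $sId and $aMatches are names invented for the example.

```autoit
; Sketch only: assumes a current AutoIt release with a PCRE-based StringRegExp.
Local $sSample = '<B><FONT COLOR="#ff0000">JoE DA HoE 6900<!-- (10:12:39 PM)--></B>:</FONT><FONT COLOR="#000000"> dfg</FONT>'
Local $sId = "JoE DA HoE 6900" ; screen name to leave out of the results

; Flag 3 returns an array holding the captured group of every match in the string.
Local $aMatches = StringRegExp($sSample, ">([a-zA-Z 0-9]+)<", 3)

If IsArray($aMatches) Then
    For $i = 0 To UBound($aMatches) - 1
        ; Skip the screen name; trim leading and trailing spaces from the rest.
        If $aMatches[$i] <> $sId Then
            ConsoleWrite(StringStripWS($aMatches[$i], 3) & @CRLF)
        EndIf
    Next
EndIf
```

The character class is the same one used in the post, so messages containing punctuation would still be cut short or missed; widening it to something like [^<]+ is one option, at the cost of also matching the ":" between the name and the message.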

ConsultingJoe Posted February 12, 2006

Thank you, everyone. I liked them all, but especially the last two. Sorry, I thought no one had replied because I was expecting an e-mail notification. Again, thanks. I might try to make it into a UDF for my program. It would use AIM for remote control purposes, but mostly for text to speech.