Jump to content
Sign in to follow this  
myspacee

Remove unusefull text

Recommended Posts

myspacee

Hello,

extract text from an HTML for extracting text.

Now I want to remove all html codes between < and > symbols.

try to use, without success, stringRegExpReplace :

$mytext = stringRegExpReplace($mytext, "\<(.*)\>", "")

post here an input example

<DIV style="position:absolute;top:459;left:63"><nobr><span class="miostile"><b>di ciccio formaggio</b></span></nobr></DIV>
<DIV style="position:absolute;top:475;left:63"><nobr><span class="miostile"> PAVIA</span></nobr></DIV>
<DIV style="position:absolute;top:499;left:63"><nobr><span class="miostile">Arrestati con la droga al bar<br>Minerva. Tre sudamericani<br>sono stati bloccati con un et-<br>to di «fumo» e con 15 grammi<br>di cocaina. La droga, secondo<br>gli investigatori, era pronta<br>ad essere spacciata. L'arresto<br>è stato movimentato con tan-<br>to di bicchieri rotti e sedie ro-<br>

and output i want to obtain:

di ciccio formaggio
PAVIA
Arrestati con la droga al barMinerva. Tre sudamericani<br>sono stati bloccati con un et-to di «fumo» e con 15 grammidi cocaina. La droga, secondogli investigatori, era prontaad essere spacciata. L'arrestoè stato movimentato con tan-to di bicchieri rotti e sedie ro-

thank you for any help,

m.

Share this post


Link to post
Share on other sites
FireFox

Hi,

Try this :

Local $s = '<DIV style="position:absolute;top:459;left:63"><nobr><span class="miostile"><b>di ciccio formaggio</b></span></nobr></DIV>' & @CrLf & _
'<DIV style="position:absolute;top:475;left:63"><nobr><span class="miostile"> PAVIA</span></nobr></DIV>' & @CrLf & _
'<DIV style="position:absolute;top:499;left:63"><nobr><span class="miostile">Arrestati con la droga al bar<br>Minerva. Tre sudamericani<br>sono stati bloccati con un et-<br>to di «fumo» e con 15 grammi<br>di cocaina. La droga, secondo<br>gli investigatori, era pronta<br>ad essere spacciata. L''arresto<br>è stato movimentato con tan-<br>to di bicchieri rotti e sedie ro-<br>'

ConsoleWrite(StringRegExpReplace($s, "(?s)<(.*?)>", "") & @crlf)

Edit: Seems like the <br> tag does not want to be displayed raw.

Br, FireFox.

Edited by FireFox

 

OS : Win XP SP2 (32 bits) / Win 7 SP1 (64 bits) / Win 8 (64 bits) | Autoit version: latest stable / beta.
Hardware : Intel(R) Core(TM) i5-2400 CPU @ 3.10Ghz / 8 GiB RAM DDR3.

My UDFs : Skype UDF | TrayIconEx UDF | GUI Panel UDF | Excel XML UDF | Is_Pressed_UDF

My Projects : YouTube Multi-downloader | FTP Easy-UP | Lock'n | WinKill | AVICapture | Skype TM | Tap Maker | ShellNew | Scriptner | Const Replacer | FT_Pocket | Chrome theme maker

My Examples : Capture toolIP Camera | Crosshair | Draw Captured Region | Picture Screensaver | Jscreenfix | Drivetemp | Picture viewer

My Snippets : Basic TCP | Systray_GetIconIndex | Intercept End task | Winpcap various | Advanced HotKeySet | Transparent Edit control

 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

  • Similar Content

    • ViciousXUSMC
      By ViciousXUSMC
      So I ran into this crazy "program" that cant be uninstalled via WMI, MSIExec, etc.
      The only way to uninstall it was from Add/Remove programs manually... Or I found if you find it in the registry under HKCU and run the  uninstall string, it will also uninstall.
      However the string in the registry cant be run directly in a cmd window because of the format errors.
      It has spaces without quotations, it has invalid characters, etc, etc 
      I know things run different when executed in the registry, so maybe there is a way I can run the regsitry key just like how the system does?  If so chime in.
      Otherwise I did this a crude way using several stringregexpreplace() functions and have it working.
      The solution feels so barbaric and crude that I wanted to post it so some of you guys better than me can clean up the code, maybe offer alternative ways to do it, or reduce the number of times I process the string.
      Here is the string right out of the registry:
      c:\Program Files\Common Files\Microsoft Shared\VSTO\10.0\VSTOInstaller.exe /Uninstall file:///C:/Users/it022565/AppData/Local/Temp/OOBAXTOWordAddIn/ApplicationXtender.AXTO.Word.vsto Here is my cave man scripting to turn this into a run able string.
       
      Func _UninstallOld() For $i = 1 to 100 ;Enumerate Registry $sEnumBase = "HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\" ;Look in HKCU for the uninstall string for the old version $sEnum = RegEnumKey($sEnumBase, $i) If @Error Then Return If $iDebug = 1 Then MsgBox(0, "", $sEnum) If StringInStr(RegRead($sEnumBase & $sEnum, "DisplayName"), "Word Addin") Then ExitLoop Next If $iDebug = 1 Then MsgBox(0, "", $sEnum) $sKey = "HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\" & $sEnum $sKey2 = RegRead($sKey, "UninstallString") If $iDebug = 1 Then MsgBox(0, "Original Install Location", $sKey2) $sKey3 = StringRegExpReplace($sKey2, "(?i)(c:.*exe)", '"$1"') If $iDebug = 1 Then MsgBox(0, "", $sKey3) $sKey4 = StringRegExpReplace($sKey3, "(?i)file:///", "") If $iDebug = 1 Then MsgBox(0, "", $sKey4) $sKey5 = StringRegExpReplace($sKey4, "%20", " ") If $iDebug = 1 Then MsgBox(0, "", $sKey5) $sKey6 = StringRegExpReplace($sKey5, '(?i)((?<!")c:.*vsto)', '"$1"') If $iDebug = 1 Then MsgBox(0, "", $sKey6) RunWait(@ComSpec & ' /c ' & '"' & $sKey6 & ' /s"', "", @SW_HIDE) EndFunc Basically step by step I add quotations, strip bad characters, etc.  Kind of proud for using look behind for once
      Looking forward to what you guys come up with.
    • VIP
      By VIP
      Need help to make function better  with full infomation
      #include <Array.au3> #include <File.au3> _TEST(@ScriptFullPath) _TEST("A:") _TEST("A:\B.c") _TEST("D:\E\F\") _TEST("G:\H/../J.k/") _TEST("M:\N\k..J.k") _TEST("D:\E\F\..\G\G\I..J.K.M") Func _TEST($sFilePath) Local $sDrive = "", $sFullPathDir = "", $sDirPath = "", $sDirName = "", $sFileName = "", $sFileNameExt = "", $sExtension = "", $sExt = "" Local $aPathSplit = _PathSplitByRef($sFilePath, $sDrive, $sFullPathDir, $sDirPath, $sDirName, $sFileName, $sFileNameExt, $sExtension, $sExt) ConsoleWrite("!Path IN : " & $sFilePath & @CRLF) ; C:\Windows\System32\etc\hosts.exe ConsoleWrite("- Driver : " & $sDrive & @CRLF) ; C: ConsoleWrite("- DirPath : " & $sFullPathDir & @CRLF) ; C:\Windows\System32\etc\etc ConsoleWrite("- DirPath : " & $sDirPath & @CRLF) ; \Windows\System32\etc\ ConsoleWrite("- DirName : " & $sDirName & @CRLF) ; etc ConsoleWrite("- FileName : " & $sFileName & @CRLF) ; hosts ConsoleWrite("- FileNameExt: " & $sFileNameExt & @CRLF) ; hosts.exe ConsoleWrite("- Extension : " & $sExtension & @CRLF) ; .exe ConsoleWrite("- Ext : " & $sExt & @CRLF & @CRLF) ; exe ;~ ConsoleWrite("!Path IN : " & $aPathSplit[0] & @CRLF) ; C:\Windows\System32\etc\hosts.exe ;~ ConsoleWrite("- Driver : " & $aPathSplit[1] & @CRLF) ; C: ;~ ConsoleWrite("- DirPath : " & $aPathSplit[2] & @CRLF) ; C:\Windows\System32\etc\etc ;~ ConsoleWrite("- DirPath : " & $aPathSplit[3] & @CRLF) ; \Windows\System32\etc\ ;~ ConsoleWrite("- DirName : " & $aPathSplit[4] & @CRLF) ; etc ;~ ConsoleWrite("- FileName : " & $aPathSplit[5] & @CRLF) ; hosts ;~ ConsoleWrite("- FileNameExt: " & $aPathSplit[6] & @CRLF) ; hosts.exe ;~ ConsoleWrite("- Extension : " & $aPathSplit[7] & @CRLF) ; .exe ;~ ConsoleWrite("- Ext : " & $aPathSplit[8] & @CRLF) ; exe ;~ _ArrayDisplay($aPathSplit, "_PathSplit of " & $sFilePath) EndFunc ;==>_TEST Func _PathSplitByRef($sFilePath, ByRef $sDrive, ByRef $sFullPathDir, ByRef $sDirPath, ByRef $sDirName, ByRef $sFileName, ByRef $sFileNameExt, ByRef $sExtension, ByRef $sExt) If StringInStr($sFilePath,"..") Then $sFilePath=_PathFull($sFilePath) Local $aPartOfPath=StringRegExp($sFilePath, "^\h*((?:\\\\\?\\)*(\\\\[^\?\/\\]+|[A-Za-z]:)?(.*[\/\\]\h*)?((?:[^\.\/\\]|(?(?=\.[^\/\\]*\.)\.))*)?([^\/\\]*))$", $STR_REGEXPARRAYMATCH) ;~ If @error Then ReDim $aPartOfPath[9] ;~ $aPartOfPath[0] = $sFilePath ;~ EndIf $aPartOfPath[0] = $sFilePath ; C:\Windows\System32\etc\hosts.exe $sDrive = $aPartOfPath[1] ; C: $sFullPathDir = $aPartOfPath[1] & $aPartOfPath[2] ; C:\Windows\System32\etc If StringLeft($aPartOfPath[2], 1) == "/" Then $sDirPath = StringRegExpReplace($aPartOfPath[2], "\h*[\/\\]+\h*", "\/") Else $sDirPath = StringRegExpReplace($aPartOfPath[2], "\h*[\/\\]+\h*", "\\") EndIf $aPartOfPath[2] = $sFullPathDir ; C:\Windows\System32\etc $sDirName=StringReplace($sDirPath,"\","") $sDirName=StringReplace($sDirPath,"/","") $sFileName = $aPartOfPath[3] ; hosts $aPartOfPath[5] = $sFileName ; hosts $sExtension = $aPartOfPath[4] ; .exe $aPartOfPath[7] = $sExtension ; .exe $aPartOfPath[3] = $sDirPath ; \Windows\System32\etc\ $aPartOfPath[4] = $sDirName ; etc $aPartOfPath[6] = $sFileName & $sExtension ; hosts.exe $sFileNameExt = $aPartOfPath[6] ; hosts.exe $sExt = StringReplace($sExtension,".","") ; exe $aPartOfPath[8] = $sExt ; exe Return $aPartOfPath EndFunc ;==>_PathSplitByRef  
    • hawkair
      By hawkair
      Hi
      I am trying to insert line numbers in to a string
      with this script
      Func _MyInc () Static Local $i = 0 $i += 1 Return $i EndFunc Exit _InsertLines() Func _InsertLines()     $String = "A" & @CRLF & "B" & @CRLF & "C" & @CRLF & "D" $NewString =  Execute("'" & StringRegExpReplace($String,"[\r\n]*",  "' & _MyInc () & '\1" ) & "'") MsgBox (0, "", $NewString) EndFunc but I get this:
      1A23B45C67D8
      I never really could master how Execute works here and I always get some working example and make substitutions.
      But this is the closest i could get...
       
    • AutoBert
      By AutoBert
      The idea to use translation api:

      i used the script from @mikell to build this func:
      Func _Translate($sFrom, $from, $to) ;thanks to mikell (autoitscript.com) ;https://www.autoitscript.com/forum/topic/182893-prompt-me-how-to-see-the-text-in-the-translation-boxhttpstranslategooglecom/?do=findComment&comment=1313423 Local $url = "https://translate.googleapis.com/translate_a/single?client=gtx" $url &= "&sl=" & $from & "&tl=" & $to & "&dt=t&q=" & $sFrom Local $oHTTP = ObjCreate("Microsoft.XMLHTTP") $oHTTP.Open("POST", $url, False) $oHTTP.Send() Local $sData = $oHTTP.ResponseText $sData = StringRegExpReplace($sData, '.*?\["(.*?)"[^\[]*', "$1" & @CRLF) Return $sData EndFunc ;==>_Translate when i call this func with:
      $sText='AutoIt v3 is a freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting. It uses a combination of simulated keystrokes, mouse movement and window/control manipulation in order to automate tasks in a way not possible or reliable with other languages (e.g. VBScript and SendKeys). AutoIt is also very small, self-contained and will run on all versions of Windows out-of-the-box with no annoying "runtimes" required!' MsgBox(64,'',_Translate($sText,'en','de')) nearly all is seeing here:

      only the "!" is wrong "\" but when using 'auto' instead of 'en' the result is:

      2 lines are appended. So my question is, is it possible to extend the pattern (i never worked with regex) and in best case setting @extended with the detected language?
      @Trong: as you can see yet i am returning translated text and don't use GuiCtrlSetData to assign it to a EditBox.
    • ViciousXUSMC
      By ViciousXUSMC
      I was working on something last night and decided to use StringRegExpReplace() for a config file, I never noticed that you cant just "overwrite" the file with the update so easily it required a few more pieces of code to work properly.
      Is this the simplest way (what I used) and while I searched for it and did not find it do we have or will we have a RegEx equivalent for _ReplaceStringInFile()?
      $sFile = FileRead(@ScriptDir & "\test.txt") $hFile = FileOpen(@ScriptDir & "\test.txt", 2) $sNewContent = StringRegExpReplace($sFile, "(test)", "new$1") FileWrite($hFile, $sNewContent) FileClose($hFile)  
×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.