
Hi guys, I have a script and I want to extract a description from an HTML page. This is the HTML code:

<p data-mce-style="margin-top: 5.0pt; margin-right: 1.5pt; margin-bottom: 5.0pt; margin-left: 37.55pt; mso-pagination: widow-orphan;" style="margin: 5pt 1.5pt 5pt 37.55pt; text-align: center;"><br>
</p>
<p style="text-align: center;">
<span style="font-size: 16px; font-family: Arial;"><strong>Caratteristiche:</strong></span><br>
<br>
<span style="font-size: 16px; font-family: Arial;"> Contenuto del
pacco:</span><br>
<br>
<span style="font-size: 16px; font-family: Arial;">PC empire , usato perfettamente funzionante con  windows 10 anche se ha coa windows 7 originale 
con lettore di schede di memoria mmc e altri formati con 8 porte usb 2.00 , con processore i5 -2400 e 8gb di ram
500 gb di hardisk e lettore masterizzatore dvd scheda lan on board e scheda video e audio onboard 
windows 10 64 bit appena installato , perfettamente funzionante &nbsp;</span><br>
<br>
<span style="font-size: 16px; font-family: Arial;"> 
l'acquisto e' fatto come visto e piaciuto
le immagini descrivono i prodotti e le condizioni dei tali
le descrizioni dei prodotti sono puramente indicative. 
Per ottenere una descrizione accurata si consiglia di visitare il sito del produttore.
non rispondiamo di eventuali differenze riscontrabili tra le caratteristiche indicate nelle descrizioni e le immagini</span><br>
<br>
<span style="font-size: 16px; font-family: Arial;"> 


</span><br>
<br>
<span style="font-size: 16px; font-family: Arial;"><span style="color: rgb(255, 51, 51);"> </span><br>
</span> </p>
<p style="text-align: center;"><span style="font-size: 16px; font-family: Arial;"><br>
</span> </p>
<p style="text-align: center;"><span style="font-size: 16px; font-family: Arial;"><br>
</span> </p>

I want to capture only the text from "PC empire ..." through "... descrizioni e le immagini".

I use this regexp:     <span style="font-size: 16px; font-family: Arial;">((?s).*)</span><br>

but it does not stop where I expect, and the match still contains <span style="font-size: 16px; font-family: Arial;"> and </span><br>

I am testing with RegexBuddy. Can someone help me? Thanks.

 

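#include <MsgBoxConstants.au3>
#include <FileConstants.au3> ; defines $FO_READ
#include <StringConstants.au3>
#include <Array.au3>

Local $sFilePath = "path html page"

; Open the HTML file for reading.
Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
If $hFileOpen = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "An error occurred when reading the file.")
    Exit 1 ; Return is only valid inside a function, so exit the script instead
EndIf

; Read the contents of the file using the handle returned by FileOpen.
Local $sFileRead = FileRead($hFileOpen)
FileClose($hFileOpen)

; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH) returns every match. The greedy (?s).* is the pattern in question.
Local $aLDescriz = StringRegExp($sFileRead, '<span style="font-size: 16px; font-family: Arial;">((?s).*)</span><br>', $STR_REGEXPARRAYGLOBALMATCH)
If @error Then Exit 1 ; no match: $aLDescriz is not an array
ConsoleWrite($aLDescriz[0] & @CRLF)
_ArrayDisplay($aLDescriz, "1106")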

 


Not sure if I understood correctly, but from what I gathered you wanted a regex that would return the following, correct?

Quote

PC empire , usato perfettamente funzionante con  windows 10 anche se ha coa windows 7 originale 
con lettore di schede di memoria mmc e altri formati con 8 porte usb 2.00 , con processore i5 -2400 e 8gb di ram
500 gb di hardisk e lettore masterizzatore dvd scheda lan on board e scheda video e audio onboard 
windows 10 64 bit appena installato , perfettamente funzionante &nbsp;</span><br>
<br>
<span style="font-size: 16px; font-family: Arial;"> 
l'acquisto e' fatto come visto e piaciuto
le immagini descrivono i prodotti e le condizioni dei tali
le descrizioni dei prodotti sono puramente indicative. 
Per ottenere una descrizione accurata si consiglia di visitare il sito del produttore.
non rispondiamo di eventuali differenze riscontrabili tra le caratteristiche indicate nelle descrizioni e le immagini

If so, and provided that the 'Contenuto del pacco:' string appears on every page you run the regex against, you can try this:

https://regex101.com/r/JA2SXj/1/

If not, just let me know. I'll look more into it tomorrow. It all depends on the consistency of the inputs.
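
For readers who cannot open the link, here is a minimal AutoIt sketch of the same idea. It is not the pattern from the regex101 link, which is not reproduced in this thread. It assumes that the 'Contenuto del pacco:' label precedes the description and that the description is followed by a span containing only whitespace, as in the sample HTML above. The shortened sample string and the variable names are made up for illustration.

```autoit
#include <StringConstants.au3>

; Hypothetical, shortened sample that mirrors the structure of the page in the question.
Local $sSpan = '<span style="font-size: 16px; font-family: Arial;">'
Local $sHtml = $sSpan & ' Contenuto del' & @LF & 'pacco:</span><br>' & @LF & '<br>' & @LF & _
        $sSpan & 'PC empire , usato perfettamente funzionante &nbsp;</span><br>' & @LF & '<br>' & @LF & _
        $sSpan & ' ' & @LF & 'le descrizioni e le immagini</span><br>' & @LF & '<br>' & @LF & _
        $sSpan & ' ' & @LF & @LF & '</span><br>'

; Step 1: lazily capture everything after the label, stopping at the first
; span that contains nothing but whitespace. \s+ is used in the label because
; the source HTML breaks the line between "del" and "pacco:".
Local $aBlock = StringRegExp($sHtml, '(?s)Contenuto\s+del\s+pacco:</span>(.*?)<span[^>]*>\s*</span>', $STR_REGEXPARRAYMATCH)
If @error Then
    ConsoleWrite('No description block found.' & @CRLF)
    Exit 1
EndIf

; Step 2: remove the remaining tags, turn &nbsp; into a plain space,
; and trim leading and trailing whitespace.
Local $sText = StringRegExpReplace($aBlock[0], '<[^>]+>', '')
$sText = StringReplace($sText, '&nbsp;', ' ')
$sText = StringStripWS($sText, $STR_STRIPLEADING + $STR_STRIPTRAILING)

ConsoleWrite($sText & @CRLF)
```

The lazy quantifier (.*?) is what keeps the match from running to the last </span><br> on the page, which is what the greedy .* in the original pattern does. Stripping the tags in a second step avoids having to exclude the inner <span> and <br> tags inside a single pattern.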

 

Edited by Seminko
grammar
6 hours ago, faustf said:

How can you do this magic?

By using the power of regular expressions. Learn regex and this power is yours as well, and it's free.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

