problem using StringRegExp


Hello

It's very frustrating not knowing how to use the StringRegExp function!

I have run many tests, but the results are inconclusive...

I have this text

SIRET : 390 019 891 00014
Effectif : 2
Voie : Les Bouchats
Code postal : 71370
Ville : Saint Etienne En Bresse
Téléphone : +33 3 85 96 40 00 
Activité en clair : Exploitation De Biens Agricole 
Dénomination : Acle
Forme juridique : SARL (Société à Responsabilité Limitée)

SIRET : 448 026 153 00024
Voie : Le Gros Chigy
Code postal : 71220
Ville : Saint Andre Le Desert
Téléphone : +33 3 85 59 48 71 
Activité en clair : Platrerie Peinture Pose Revetement De Sols Et Murs Plafonds Divers Travaux De Renovation 
Dénomination : Galimi Samuel
Forme juridique : EI (Entreprise Individuelle)

SIRET : 489 314 963 00013
Effectif : 5
Voie : 30 Rue Bernard Renault
Code postal : 71400
Ville : Autun
Téléphone : +33 3 85 52 45 93 
Activité en clair : Platrerie Peinture Pour Le Batiment Travaux D'Isolation Ravalement De Facades Revetements Sols Et Murs Negoce De Produits Lies Au Batiment 
Dénomination : Garcia
Forme juridique : SARL (Société à Responsabilité Limitée)

SIRET : 340 904 416 00013
Effectif : 2
Voie : 40 Rue De Creteuil Le Bas
Code postal : 71150
Ville : Chaudenay
Téléphone : +33 3 85 87 32 04 
Activité en clair : Peinture Pose Revetem. Sols Murs Pose Plaq De Platre Nettoyage Vehic. Locaux Tissus Stores Tendues Demoussage & Protection De Surface Transp Meubles 
Dénomination : Garnaud Daniel
Forme juridique : EI (Entreprise Individuelle)

I want to read each paragraph (from the SIRET line to the last line of the paragraph, Forme juridique) and, in each paragraph, read the "Code postal" line and test its number (71150, 71300, ...) against a list of numbers.

If the number is not in the list, I want to delete the paragraph. To do this I must use the StringRegExp and StringRegExpReplace functions, but I don't know how to proceed :(


I could give you the answer, but that wouldn't be much of a learning experience for you. If you are having trouble specifically with formatting the patterns, here is an excellent tool (free) that helps me create regex patterns. I hope this helps you out. It really helped me out.

Edited by dantay9

You are going to want to split things up a bit to make things easier. I agree with dantay9, learning it is much better.

That being said, debugging more than one thing at a time is a pain, so I made a framework for you to test in.

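;Your pattern goes here. ParseText reads the postal code from the first element of the result array.
Global $REGULAR_Expression = ''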
;This is just formatting so I can test with it. I used CRLF because that's what your data appeared
;to have, it's possible that the website had something to do with it
Local $LogText = "SIRET : 390 019 891 00014" & @CRLF & _ 
"Effectif : 2" & @CRLF & _ 
"Voie : Les Bouchats" & @CRLF & _ 
"Code postal : 71370" & @CRLF & _ 
"Ville : Saint Etienne En Bresse" & @CRLF & _ 
"Téléphone : +33 3 85 96 40 00 " & @CRLF & _ 
"Activité en clair : Exploitation De Biens Agricole " & @CRLF & _ 
"Dénomination : Acle" & @CRLF & _ 
"Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _ 
@CRLF & _ 
"SIRET : 448 026 153 00024" & @CRLF & _ 
"Voie : Le Gros Chigy" & @CRLF & _ 
"Code postal : 71220" & @CRLF & _ 
"Ville : Saint Andre Le Desert" & @CRLF & _ 
"Téléphone : +33 3 85 59 48 71 " & @CRLF & _ 
"Activité en clair : Platrerie Peinture Pose Revetement De Sols Et Murs Plafonds Divers Travaux De Renovation " & @CRLF & _ 
"Dénomination : Galimi Samuel" & @CRLF & _ 
"Forme juridique : EI (Entreprise Individuelle)" & @CRLF & _ 
@CRLF & _ 
"SIRET : 489 314 963 00013" & @CRLF & _ 
"Effectif : 5" & @CRLF & _ 
"Voie : 30 Rue Bernard Renault" & @CRLF & _ 
"Code postal : 71400" & @CRLF & _ 
"Ville : Autun" & @CRLF & _ 
"Téléphone : +33 3 85 52 45 93 " & @CRLF & _ 
"Activité en clair : Platrerie Peinture Pour Le Batiment Travaux D'Isolation Ravalement De Facades Revetements Sols Et Murs Negoce De Produits Lies Au Batiment " & @CRLF & _ 
"Dénomination : Garcia" & @CRLF & _ 
"Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _ 
@CRLF & _ 
"SIRET : 340 904 416 00013" & @CRLF & _ 
"Effectif : 2" & @CRLF & _ 
"Voie : 40 Rue De Creteuil Le Bas" & @CRLF & _ 
"Code postal : 71150" & @CRLF & _ 
"Ville : Chaudenay" & @CRLF & _ 
"Téléphone : +33 3 85 87 32 04 " & @CRLF & _ 
"Activité en clair : Peinture Pose Revetem. Sols Murs Pose Plaq De Platre Nettoyage Vehic. Locaux Tissus Stores Tendues Demoussage & Protection De Surface Transp Meubles " & @CRLF & _ 
"Dénomination : Garnaud Daniel" & @CRLF & _ 
"Forme juridique : EI (Entreprise Individuelle)"

MsgBox (0, "Results:", ParseText ($LogText))

Func ParseText ($text)
    Local $ZipArray[2] = [71150, 71400]
    Local $LogArray = StringSplit ($text, @CRLF & @CRLF, 1)
    
    For $index = 1 to $LogArray[0]
        Local $zipCode = StringRegExp ($LogArray[$index], $REGULAR_Expression, 1)
        ;Drop the paragraph if the pattern did not match or the code is not in the list
        If @error Or Not InList ($zipCode[0], $ZipArray) Then
            $LogArray[$index] = ""
        EndIf
    Next
    
    Local $ReturnString = ""
    For $index = 1 to $LogArray[0]
        If $LogArray[$index] then $ReturnString &= $LogArray[$index] & @CRLF & @CRLF
    Next
    
    Return $ReturnString
EndFunc

Func InList ($zip_code, $compare_array)
    For $code in $compare_array
        If $code = $zip_code Then Return True
    Next
    Return False
EndFunc

Basically it splits things into paragraphs so you only have to work on one at a time.
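
To illustrate what the framework expects from the pattern, here is a minimal sketch run on a single paragraph. The paragraph is shortened from the sample data, the variable names are made up for the example, and the pattern is only one possible choice, not necessarily the best one.

```autoit
; Hypothetical single paragraph, shortened from the sample data
Local $sParagraph = "SIRET : 340 904 416 00013" & @CRLF & _
        "Code postal : 71150" & @CRLF & _
        "Ville : Chaudenay"

; Flag 1 returns an array of matches; with one capture group,
; element [0] holds the text captured by that group.
Local $aMatch = StringRegExp($sParagraph, "Code postal\h*:\h*(\d{5})", 1)

If @error Then
    ; No match: $aMatch is not an array, so do not index it
    ConsoleWrite("No postal code found" & @CRLF)
Else
    ConsoleWrite("Postal code: " & $aMatch[0] & @CRLF)
EndIf
```

Checking @error before indexing the result matters, because a failed match does not return an array.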

Good luck :(

Edited by Fulano


Hi, fellow countryman,

Try to go with something like:

$str = FileRead("boites.txt")
;; beware, there are still html character codes inside, like &amp;

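Local $ofs = 1, $res ;; offsets are 1-based, so start the search at 1

While 1
    ;; flag 2: $res[0] is the whole record, $res[1] the captured postal code
    $res = StringRegExp($str, "(?is)SIRET : .*?Code postal : (\d{5}).*?juridique.*?\r\n", 2, $ofs)
    If @error Then ExitLoop
    $ofs = @extended ;; where the next search should start
    ;; CPbonPourTraitement and OnEnvoitLeSpam are placeholders for your own functions
    If CPbonPourTraitement($res[1]) Then OnEnvoitLeSpam($res[0])
WEnd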

Is that clear?
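
For anyone who wants to try the loop without a file, here is a self-contained sketch of the same idea. The two shortened records, the $kept variable and the IsWantedCode function are assumptions made up for the example; IsWantedCode merely stands in for whatever test you write yourself.

```autoit
; Two shortened records standing in for the file contents (hypothetical data)
Local $str = "SIRET : 390 019 891 00014" & @CRLF & _
        "Code postal : 71370" & @CRLF & _
        "Forme juridique : SARL" & @CRLF & _
        @CRLF & _
        "SIRET : 340 904 416 00013" & @CRLF & _
        "Code postal : 71150" & @CRLF & _
        "Forme juridique : EI" & @CRLF

Local $ofs = 1, $res, $kept = ""

While 1
    ; Flag 2: $res[0] is the whole record, $res[1] the captured postal code
    $res = StringRegExp($str, "(?is)SIRET : .*?Code postal : (\d{5}).*?juridique.*?\r\n", 2, $ofs)
    If @error Then ExitLoop
    $ofs = @extended ; offset where the next search starts
    If IsWantedCode($res[1]) Then $kept &= $res[0] & @CRLF
WEnd

ConsoleWrite($kept)

; Hypothetical stand-in for the postal code test: True if the code is in the list
Func IsWantedCode($code)
    Return StringInStr("|71150|71400|", "|" & $code & "|") > 0
EndFunc
```

With this data only the second record (71150) ends up in $kept. Note that the pattern requires a line break after the "juridique" line, so a last record without a trailing CRLF would be skipped.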

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


And another example.

Local $LogText = "SIRET : 390 019 891 00014" & @CRLF & _
        "Effectif : 2" & @CRLF & _
        "Voie : Les Bouchats" & @CRLF & _
        "Code postal : 71370" & @CRLF & _
        "Ville : Saint Etienne En Bresse" & @CRLF & _
        "Téléphone : +33 3 85 96 40 00 " & @CRLF & _
        "Activité en clair : Exploitation De Biens Agricole " & @CRLF & _
        "Dénomination : Acle" & @CRLF & _
        "Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _
        @CRLF & _
        "SIRET : 448 026 153 00024" & @CRLF & _
        "Voie : Le Gros Chigy" & @CRLF & _
        "Code postal : 71220" & @CRLF & _
        "Ville : Saint Andre Le Desert" & @CRLF & _
        "Téléphone : +33 3 85 59 48 71 " & @CRLF & _
        "Activité en clair : Platrerie Peinture Pose Revetement De Sols Et Murs Plafonds Divers Travaux De Renovation " & @CRLF & _
        "Dénomination : Galimi Samuel" & @CRLF & _
        "Forme juridique : EI (Entreprise Individuelle)" & @CRLF & _
        @CRLF & _
        "SIRET : 489 314 963 00013" & @CRLF & _
        "Effectif : 5" & @CRLF & _
        "Voie : 30 Rue Bernard Renault" & @CRLF & _
        "Code postal : 71400" & @CRLF & _
        "Ville : Autun" & @CRLF & _
        "Téléphone : +33 3 85 52 45 93 " & @CRLF & _
        "Activité en clair : Platrerie Peinture Pour Le Batiment Travaux D'Isolation Ravalement De Facades Revetements Sols Et Murs Negoce De Produits Lies Au Batiment " & @CRLF & _
        "Dénomination : Garcia" & @CRLF & _
        "Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _
        @CRLF & _
        "SIRET : 340 904 416 00013" & @CRLF & _
        "Effectif : 2" & @CRLF & _
        "Voie : 40 Rue De Creteuil Le Bas" & @CRLF & _
        "Code postal : 71150" & @CRLF & _
        "Ville : Chaudenay" & @CRLF & _
        "Téléphone : +33 3 85 87 32 04 " & @CRLF & _
        "Activité en clair : Peinture Pose Revetem. Sols Murs Pose Plaq De Platre Nettoyage Vehic. Locaux Tissus Stores Tendues Demoussage & Protection De Surface Transp Meubles " & @CRLF & _
        "Dénomination : Garnaud Daniel" & @CRLF & _
        "Forme juridique : EI (Entreprise Individuelle)"

;Or
;Local $LogText = FileRead("LogText.txt")

Local $sRetString, $aPara, $sCheckREPattern
Local $sCheckNums = "71150, 71300, 71220"

$sCheckREPattern = StringReplace(StringStripWS($sCheckNums, 8), ",", "|")

$aPara = StringSplit($LogText, @CRLF & @CRLF, 1)
For $i = 1 To $aPara[0]
    ;The (?: ) group makes the "code postal :" prefix apply to every number in the list
    If StringRegExp($aPara[$i], "(?i)code\h*postal\h*:\h*(?:" & $sCheckREPattern & ")") Then $sRetString &= $aPara[$i] & @CRLF & @CRLF
Next

ConsoleWrite($sRetString & @CRLF)
Edited by Malkey

Hi, Malkey.

Your code will produce false positives whenever a wanted ZIP code also appears in the last part of the SIRET number, or in a phone number that is formatted differently. That is quite possible with a low "region code" (the first 2 digits of the ZIP code, 01..95 and 97..98) and a high "establishment" number (the last 5 digits of the SIRET number). Who knows?


jchd

With reference to my example script in post #5:

Within the For-Next loop, I have changed the RE pattern in StringRegExp() from

$sCheckREPattern

to

"(?i)code\h*postal\h*:\h*(?:" & $sCheckREPattern & ")"

This specifically targets the line "Code postal : nnnnn" in each paragraph, so false positives from the same digits appearing elsewhere in the paragraph should no longer occur. The (?: ) group is needed so that the prefix applies to every code in the list, not only the first one.
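
A small sketch of why the alternation has to be grouped when a prefix is put in front of it. In a regular expression, | has the lowest precedence, so without a group the prefix belongs to the first alternative only. The record below is invented for the test (its SIRET ends in a wanted code), and the variable names are assumptions.

```autoit
; Hypothetical record: the SIRET ends in 71300, the postal code is not in the list
Local $sPara = "SIRET : 390 019 891 71300" & @CRLF & "Code postal : 01000"
Local $sCodes = "71150|71300|71220"

; Ungrouped: read as "code postal : 71150" OR "71300" OR "71220",
; so the 71300 at the end of the SIRET line matches.
Local $iLoose = StringRegExp($sPara, "(?i)code\h*postal\h*:\h*" & $sCodes)

; Grouped: the prefix applies to every alternative.
Local $iStrict = StringRegExp($sPara, "(?i)code\h*postal\h*:\h*(?:" & $sCodes & ")")

ConsoleWrite("Ungrouped: " & $iLoose & @CRLF)
ConsoleWrite("Grouped:   " & $iStrict & @CRLF)
```

With the default flag, StringRegExp returns 1 for a match and 0 otherwise, so this should report 1 for the ungrouped pattern and 0 for the grouped one.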

Malkey
