[SOLVED] Regexp Expert: How to extract numbers


Hi RegExp expert :)

I need to extract numbers from some sentences that seem to have no exact pattern. I think a regexp will solve my problem.

Example:

This sentence:

Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)

I need to extract this number: 081377712114 (remove the dot between them)

This sentence: Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051

I need to extract this number: 7211122 and 2000051

This sentence: Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219

I need to extract this number: 0890-111219

This sentence: ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)

I need to extract this number: 0815141151

This sentence: ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221

I need to extract this number: 46530181 and 0811001918 and 33119221

The rules are:

1. Take numbers that begin with 08 and have at least 9 digits

2. If the number does not begin with 08, take it only if it has at least 7 digits

How can I do this in AutoIt? I have tried for a few days and found no solution :idiot:

Thanks in advance ;)

Edited by michaelslamet

Your rules are not enough to work in all the cases above.

Removing of dots will have to be done separately.

The closest we can get is: (?m)(08[\d\.]{9,}|[-\d]{7,})

Applied to the text

Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)
Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051
Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219
ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)
ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221

gives

0813.7771.2114
7211122
2000051
0890-111219
0815141151
100-300
46530181
0811001918
33119221

The problem is the unwanted capture of 100-300 in the last input line. Perhaps you could be a little more specific about the allowed charset and/or the number of chars between dashes.

Another way to process input could be to first remove dots and dashes, so the number of digits will be fully significant.
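A minimal sketch of that idea, applied per candidate instead of to the whole text: grab digit groups joined by single dots or dashes, strip the separators from each candidate, and only then count digits. `_ExtractNumbers` is a hypothetical helper name (not from this thread), and the sketch assumes that dots and dashes are the only separators inside a number.

```autoit
; Hypothetical helper (an assumption, not from the thread): returns the
; accepted numbers joined by "|", or "" if nothing qualifies.
Func _ExtractNumbers($sText)
    ; Candidates: digit groups joined by single dots or dashes.
    Local $aCand = StringRegExp($sText, '\d+(?:[.-]\d+)*', 3)
    If @error Then Return ''

    Local $sOut = '', $sDigits, $iMin
    For $i = 0 To UBound($aCand) - 1
        ; Drop the separators so that only digits are counted.
        $sDigits = StringRegExpReplace($aCand[$i], '[.-]', '')
        ; Rule 1: numbers starting with 08 need at least 9 digits.
        ; Rule 2: all other numbers need at least 7 digits.
        $iMin = 7
        If StringLeft($sDigits, 2) == '08' Then $iMin = 9
        If StringLen($sDigits) >= $iMin Then $sOut &= '|' & $sDigits
    Next
    Return StringTrimLeft($sOut, 1)
EndFunc

ConsoleWrite(_ExtractNumbers('ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221') & @CRLF)
```

With this approach 100-300 becomes 100300, which has only 6 digits and is dropped, so the last sample line yields 46530181|0811001918|33119221. Note that the separators are removed from the result, so 0890-111219 comes back as 0890111219.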

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Wow, thanks a lot jchd!! You're hero!! ;)

This is all I need, and I will work on enhancing it. Your RegExp works most of the time; I only need to modify it a little to suit my needs.

Many thanks again, jchd :)


Try removing dots and hyphens first:

$SimpleText = StringRegExpReplace($text, '[.-]', '')

then apply a simpler regexp to grab your pretty numbers

$nums = StringRegExp($SimpleText, '(?m)(08\d{9,}|\d{7,})', 3)

Should work better.
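One thing to keep in mind about the second pattern: every digit run that the `08...` branch can match is also matched by `\d{7,}`, so a number starting with 08 that has only 7 or 8 digits is still accepted. If rule 1 is meant strictly, a lookaround variant could enforce it. This is only a sketch under that assumption, and `_StrictNumbers` is a hypothetical name, not part of the thread.

```autoit
; Hypothetical helper (an assumption, not from the thread): returns the
; accepted numbers joined by "|", or "" if nothing qualifies.
Func _StrictNumbers($sText)
    ; Remove dots and hyphens first, as suggested above.
    Local $sClean = StringRegExpReplace($sText, '[.-]', '')
    ; (?<!\d)  : only start at the beginning of a digit run
    ; 08\d{7,} : starts with 08, at least 9 digits in total
    ; (?!08)\d{7,} : does not start with 08, at least 7 digits
    Local $aNums = StringRegExp($sClean, '(?<!\d)(?:08\d{7,}|(?!08)\d{7,})', 3)
    If @error Then Return ''

    Local $sOut = ''
    For $i = 0 To UBound($aNums) - 1
        $sOut &= '|' & $aNums[$i]
    Next
    Return StringTrimLeft($sOut, 1)
EndFunc

; 08123456 starts with 08 but has only 8 digits, so it is skipped.
ConsoleWrite(_StrictNumbers('OOP: 0815141151(P) APP: 7211122, 08123456') & @CRLF)
```

The lookbehind matters here: without it the engine could start matching one digit into a rejected 08 number and accept its tail.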


I formatted the result as you advised and it looks great :)

But I found a situation where this RegExp gives a wrong result.

This text: Computer Easy Clean Closed Always love car 50he &amx; Less 700hi 0808 1191 7722

I should get 0808 1191 7722 as the result, but the RegExp returns nothing.

Can you please modify the RegExp to handle this case?

Thanks a lot ;)


Hi,

try this:

#include <Array.au3>
Global $str = 'Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)' & @CRLF & _
        'This sentence: Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051' & @CRLF & _
        'This sentence: Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219' & @CRLF & _
        'This sentence: ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)' & @CRLF & _
        'This sentence: ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221' & @CRLF & _
        'This text: Computer Easy Clean Closed Always love car 50he &amx; Less 700hi 0808 1191 7722'

;~ 'I need to extract this number: 081377712114 (remove the dot between them)' & @CRLF
;~ 'I need to extract this number: 7211122 and 2000051' & @CRLF
;~ 'I need to extract this number: 0890-111219' & @CRLF
;~ 'I need to extract this number: 0815141151' & @CRLF
;~ 'I need to extract this number: 46530181 and 0811001918 and 33119221'
;~ I should get 0808 1191 7722 as a result, but the RegExp give nothing as result

;~ The rules are:
;~ 1. Take numbers that begin with 08 and at least has 9 digits
;~ 2. If the number not begin with 08 then take it only if it has at least 7 digits
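; Strip dots, hyphens and horizontal whitespace so the digits of each number become adjacent
$nstr = StringRegExpReplace($str, '[.\h-]+', '')
ConsoleWrite($nstr & @CRLF)
; Any run of 7 or more digits is returned (\d{7,} also covers every run the 08 branch matches)
$re = StringRegExp($nstr, '08\d{7,}|\d{7,}', 3)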
_ArrayDisplay($re)

Mega

Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times


Hi michaelslamet

Xenobiologist made it for you, this case was easy to deal with.

You see, to build a working RegExp, you first have to determine exactly what your rules are. You need to do so very precisely, since forgetting one case and later changing what seems a marginal part of a condition can demand a complete rewrite of the pattern.

One very powerful way to test a RegExp is the RegExp toolkit offered by GeoSoft: locate one of his posts and look in his signature. Have fun!
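Writing the rules down as test cases is one way to make them precise. The sketch below is only an illustration: `_Extract` and `_CountFailures` are hypothetical names, the inputs are shortened versions of the sample lines in this thread, and the extraction step is the strip-then-match approach posted above.

```autoit
; Hypothetical helper: strip dots, hyphens and horizontal whitespace,
; then return every run of 7 or more digits joined by "|".
Func _Extract($sText)
    Local $sClean = StringRegExpReplace($sText, '[.\h-]+', '')
    Local $aNums = StringRegExp($sClean, '\d{7,}', 3)
    If @error Then Return ''
    Local $sOut = ''
    For $i = 0 To UBound($aNums) - 1
        $sOut &= '|' & $aNums[$i]
    Next
    Return StringTrimLeft($sOut, 1)
EndFunc

; Hypothetical test runner: returns the number of failing cases.
Func _CountFailures()
    ; Each row: input text, expected numbers joined by "|".
    Local $aCases[3][2] = [ _
            ['Top: 0813.7771.2114 (TS)', '081377712114'], _
            ['Golf 100-300m; 46530181 Jen0811001918 Koles33119221', '46530181|0811001918|33119221'], _
            ['Less 700hi 0808 1191 7722', '080811917722']]
    Local $iFail = 0, $sGot
    For $i = 0 To UBound($aCases) - 1
        $sGot = _Extract($aCases[$i][0])
        If $sGot == $aCases[$i][1] Then
            ConsoleWrite('PASS: ' & $aCases[$i][0] & @CRLF)
        Else
            $iFail += 1
            ConsoleWrite('FAIL: ' & $aCases[$i][0] & ' -> ' & $sGot & @CRLF)
        EndIf
    Next
    Return $iFail
EndFunc

ConsoleWrite('Failures: ' & _CountFailures() & @CRLF)
```

Adding a new row for each newly discovered input shows immediately whether a pattern change breaks a case that used to work.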


Mega and jchd,

Thanks a lot for your help. The code runs well and the problem is solved! :)

I combined the RegExps from jchd and Mega and it works great.

Thanks a lot to you both!! ;)

