michaelslamet Posted February 14, 2011 (edited)

Hi RegExp experts,

I need to extract numbers from sentences that seem to have no fixed pattern. I think a regexp will solve my problem.

Example 1:
Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)
I need to extract this number: 081377712114 (with the dots removed)

Example 2:
Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051
I need to extract these numbers: 7211122 and 2000051

Example 3:
Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219
I need to extract this number: 0890-111219

Example 4:
ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)
I need to extract this number: 0815141151

Example 5:
ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221
I need to extract these numbers: 46530181, 0811001918 and 33119221

The rules are:
1. Take numbers that begin with 08 and have at least 9 digits.
2. If a number does not begin with 08, take it only if it has at least 7 digits.

How can I do this in AutoIt? I have tried for a few days and found no solution.

Thanks in advance

Edited February 17, 2011 by michaelslamet

jchd Posted February 14, 2011

Your rules are not enough to handle all the cases above, and the dots will have to be removed in a separate step. The closest we can get is:

    (?m)(08[\d\.]{9,}|[-\d]{7,})

Applied to the text

    Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)
    Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051
    Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219
    ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)
    ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221

it gives

    0813.7771.2114
    7211122
    2000051
    0890-111219
    0815141151
    100-300
    46530181
    0811001918
    33119221

The problem is the unwanted capture of 100-300 in the last input line. Could you be a little more specific about the allowed character set and/or the number of characters between dashes? Another way to process the input would be to remove dots and dashes first, so that the digit count becomes fully significant.

Signature:
This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

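For readers who want to reproduce the result list above outside AutoIt, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way as AutoIt's PCRE engine, which should hold because only character classes and counted repetition are involved. The names `TEXT`, `PATTERN` and `matches` are illustrative only.

```python
import re

# The five sample sentences from the question, one per line.
TEXT = """\
Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)
Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051
Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219
ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)
ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221
"""

# Same pattern as in the post: "08" followed by 9+ digits or dots,
# or any run of 7+ digits and hyphens.
PATTERN = re.compile(r"(?m)(08[\d\.]{9,}|[-\d]{7,})")

matches = PATTERN.findall(TEXT)
for m in matches:
    print(m)
```

The printed list contains `100-300`, because `[-\d]{7,}` counts the hyphen as one of the seven required characters. That is the false positive discussed in the post.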
michaelslamet (Author) Posted February 14, 2011

Wow, thanks a lot jchd! You're a hero! This is all I need, and I will work on enhancing it. Your regexp works most of the time; I only need to modify it a little to suit my needs. Many thanks again, jchd.

jchd Posted February 14, 2011

Try removing the dots and hyphens first:

    $SimpleText = StringRegExpReplace($text, '[.-]', '')

then apply a simpler regexp to grab your pretty numbers:

    $nums = StringRegExp($SimpleText, '(?m)(08\d{9,}|\d{7,})', 3)

That should work better.

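The two-step idea (strip separators, then match plain digit runs) can be sketched in Python as follows. This is an illustration under assumptions, not the AutoIt code itself: `extract_numbers` is a hypothetical helper name, and the stripping step uses the character class `[.-]`, since a bare `\.-` would only match a dot immediately followed by a hyphen.

```python
import re

def extract_numbers(text):
    """Remove dots and hyphens, then collect the digit runs.

    Hypothetical helper mirroring the two steps in the post.
    """
    # Step 1: drop every dot and every hyphen, wherever they occur.
    simple_text = re.sub(r"[.-]", "", text)
    # Step 2: same alternation as in the post. Note that any run matched
    # by the first branch is also matched by the second one.
    return re.findall(r"08\d{9,}|\d{7,}", simple_text)

print(extract_numbers("Top: 0813.7771.2114 (TS)"))
print(extract_numbers("Golf 100-300m; 46530181 Jen0811001918 Koles33119221"))
```

After stripping, `100-300` becomes the six-digit run `100300`, which is too short to match, so the earlier false positive disappears. A range such as `1000-3000` would still slip through as an eight-digit run, so the approach depends on what the real input looks like.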
michaelslamet (Author) Posted February 16, 2011

I formatted the result as you advised and it looks great. But I found a situation where this RegExp gives a wrong result.

This text:
Computer Easy Clean Closed Always love car 50he &amx; Less 700hi 0808 1191 7722

I should get 0808 1191 7722 as the result, but the RegExp returns nothing. Can you please modify the RegExp to handle this case?

Thanks a lot

Xenobiologist Posted February 16, 2011

Hi, try this:

    #include <Array.au3>

    ; Sample sentences from this thread, joined into one multi-line string.
    Global $str = 'Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)' & @CRLF & _
            'This sentence: Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051' & @CRLF & _
            'This sentence: Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219' & @CRLF & _
            'This sentence: ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)' & @CRLF & _
            'This sentence: ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221' & @CRLF & _
            'This text: Computer Easy Clean Closed Always love car 50he &amx; Less 700hi 0808 1191 7722'

    ;~ 'I need to extract this number: 081377712114 (remove the dot between them)'
    ;~ 'I need to extract this number: 7211122 and 2000051'
    ;~ 'I need to extract this number: 0890-111219'
    ;~ 'I need to extract this number: 0815141151'
    ;~ 'I need to extract this number: 46530181 and 0811001918 and 33119221'
    ;~ I should get 0808 1191 7722 as a result, but the RegExp give nothing as result
    ;~ The rules are:
    ;~ 1. Take numbers that begin with 08 and at least has 9 digits
    ;~ 2. If the number not begin with 08 then take it only if it has at least 7 digits

    ; Remove dots, hyphens and horizontal whitespace (\h) so that digit groups join up.
    ; The hyphen sits last in the class so it is read as a literal, not as a range.
    $nstr = StringRegExpReplace($str, '[.\h-]+', '')
    ConsoleWrite($nstr & @CRLF)

    ; Flag 3 returns an array of all matches.
    $re = StringRegExp($nstr, '08\d{7,}|\d{7,}', 3)
    _ArrayDisplay($re)

Mega

Scripts & functions
Organize Includes: Let Scite organize the include files
Yahtzee: The game "Yahtzee" (Kniffel, DiceLion)
LoginWrapper: Secure scripts by adding a query (authentication)
_RunOnlyOnThis UDF: Make sure that a script can only be executed on ... (Windows / HD / ...)
Internet-Café Server/Client Application: Open CD, Start Browser, Lock remote client, etc.
MultipleFuncsWithOneHotkey: Start different funcs by hitting one hotkey different times

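One detail worth checking against the stated rules: in `08\d{7,}|\d{7,}` the second branch also accepts a number that starts with 08 but has only 7 or 8 digits. The Python sketch below contrasts that pattern with a stricter variant. It rests on two assumptions: that rule 1 is meant to reject such short 08 numbers, and that `[ \t]` is an acceptable stand-in for PCRE's `\h`, which Python's `re` module does not provide. `normalize`, `LOOSE` and `STRICT` are hypothetical names.

```python
import re

def normalize(text):
    """Drop dots, hyphens, spaces and tabs so that digit groups join up."""
    return re.sub(r"[.\- \t]+", "", text)

# Pattern from the post: the second branch accepts any run of 7+ digits.
LOOSE = re.compile(r"08\d{7,}|\d{7,}")

# Stricter reading of the rules: a number starting with 08 needs at least
# 9 digits in total; any other number needs at least 7. The lookarounds
# keep a match from starting or ending in the middle of a longer digit run.
STRICT = re.compile(r"(?<!\d)(?:08\d{7,}|(?!08)\d{7,})(?!\d)")

sample = normalize("Call 0812345, 081234567 or 1234567; also 0808 1191 7722")
print(LOOSE.findall(sample))   # includes the 7-digit 0812345
print(STRICT.findall(sample))  # omits it
```

Lookbehind and lookahead are also supported by PCRE, so the stricter pattern should carry over to `StringRegExp`. Keep in mind that removing all spaces can glue unrelated numbers together (for example `400 600 1000` becomes one ten-digit run), so this normalization only suits input where separate numbers are divided by commas or letters.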
jchd Posted February 16, 2011

Hi michaelslamet,

Xenobiologist made it for you; this case was easy to deal with.

You see, to build a working RegExp you first have to determine exactly what your rules are. You need to do so very precisely, since forgetting one case and later changing what seems a marginal part of a condition can demand a complete rewrite of the pattern.

One very powerful way to test a RegExp is the RegExp toolkit offered by GeoSoft: locate one of his posts and look in his signature. Have fun!

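One way to pin the rules down precisely is to write every known input next to the output it should produce, and rerun the whole table each time the pattern changes. The Python sketch below does this with the six sentences from this thread. `extract_numbers`, `CASES` and `failing_cases` are hypothetical names, the expected values are written with separators removed, and `[ \t]` stands in for PCRE's `\h`.

```python
import re

def extract_numbers(text):
    """Strip dots, hyphens, spaces and tabs, then apply the thread's pattern."""
    compact = re.sub(r"[.\- \t]+", "", text)
    return re.findall(r"08\d{7,}|\d{7,}", compact)

# Each entry pairs an input sentence with the numbers it should yield.
CASES = [
    ("Cotton +- 400, 600, 1000, 3000, Sugar +- 800, 1800, 2700,Move Car +- 5000, SIX, Top: 0813.7771.2114 (TS)",
     ["081377712114"]),
    ("Ls.Nevergone M.Silky Expensive car less nop, AXD,is.begin.72m @ 1.6km,APP: 7211122, 2000051",
     ["7211122", "2000051"]),
    ("Si Cheaper Good No.1(more Gd Papaya)Call ads less bulky Lt 479mUUTnomorethan0890-111219",
     ["0890111219"]),
    ("ASX Top.Bulk Exclsv Ultimo dbl hik paper TL BD, B:599m2 (10x20x3). OOP: 0815141151(P)",
     ["0815141151"]),
    ("ATP Tirte Golf 100-300m; SX price market less; 46530181 Jen0811001918 Koles33119221",
     ["46530181", "0811001918", "33119221"]),
    ("Computer Easy Clean Closed Always love car 50he &amx; Less 700hi 0808 1191 7722",
     ["080811917722"]),
]

def failing_cases():
    """Return (text, expected, actual) for every case the pattern gets wrong."""
    failures = []
    for text, expected in CASES:
        actual = extract_numbers(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

print(failing_cases())
```

When a new kind of input turns up, adding it to `CASES` first shows at once whether a pattern change breaks any of the older sentences.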
michaelslamet (Author) Posted February 17, 2011

Mega and jchd,

Thanks a lot for your help. The code runs well and the problem is solved! I combined the RegExps from jchd and Mega and it works great. Thanks a lot to you both!