davidlee

StringRegExp causes a crash (abnormal exit)

davidlee

Contents of the ques.txt file:

 

<div class="quesinfobox"><div><table border="0" cellpadding="0" cellspacing="0"><tbody><tr><td>题号:1230540,题型:选择题,难度:较易</td><td><a class="start_yellow"></a><a class="start_yellow"></a><a class="start_gray"></a><a class="start_gray"></a><a class="start_gray"></a></td></tr></tbody></table></div><div>标题/来源:<span class="questitle">2011-2012学年湖北省蕲春县刘河中学七年级上学期期中考试历史试题(带解析)</span>,日期:2012/8/6</div><div class="quesmenu"><a id="quesselect1230540" class="addques" quesid="1230540" guid="22a518c1-d60a-4d4a-9480-71d28e3ffc37" childnum="1" questitle="2011-2012学年湖北省蕲春县刘河中学七年级上学期期中考试历史试题(带解析)" categories="初中历史综合库》中国古代史》国家的产生和社会的变革》夏、商、西周的兴亡###初中历史北师大版》七年级上》第二单元 国家的产生和社会变革》第5课 夏商西周的更迭###初中历史人教版》七年级上》第二单元 国家的产生和社会的变革》第4课 夏、商、西周的兴亡" qyid="2" qyname="选择题" qyisselect="true" qdid="2" qdname="较易"></a><a class="addques2"></a><a id="fav1230540" class="fav" title="收藏试题"></a><a id="comment1230540" class="comment" title="评价试题"></a></div></div><div class="quesdiv" id="quesdiv1230540" oncopy="return false;" style="-moz-user-select:none;"><div class="quesbody"><div>【题文】阅读下列材料:(6分)<br/>材料一:如图,《荀子》记载西周初年71国中姬姓诸侯国比例<br/><img src="中国古代史选择599image005[2].png" style="vertical-align:middle;"><br/>材料二:西周初年主要诸侯国<br/><table border="1" cellpadding="0" cellspacing="0"><tbody><tr style="height:.9pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.9pt" valign="top" width="67">诸侯国<br/></td><td style="width:81.0pt;border:solid windowtext 1.0pt; border-left:none;padding:0cm 5.4pt 0cm 5.4pt;height:.9pt" valign="top" width="108">类别<br/></td><td style="width:72.0pt;border:solid windowtext 1.0pt; border-left:none;padding:0cm 5.4pt 0cm 5.4pt;height:.9pt" valign="top" width="96">地理位置(今)<br/></td><td style="width:40.9pt;border:solid windowtext 1.0pt; border-left:none;padding:0cm 5.4pt 0cm 5.4pt;height:.9pt" valign="top" width="55">贫富<br/></td></tr><tr style="height:.6pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; border-top:none;padding:0cm 5.4pt 0cm 
5.4pt;height:.6pt" valign="top" width="67">晋<br/></td><td style="width:81.0pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="108">同姓<br/></td><td style="width:72.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="96">今山西<br/></td><td style="width:40.9pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="55">较富<br/></td></tr><tr style="height:.6pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="67">卫<br/></td><td style="width:81.0pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="108">同姓<br/></td><td style="width:72.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="96">今河南<br/></td><td style="width:40.9pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="55">较富<br/></td></tr><tr style="height:.6pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="67">鲁<br/></td><td style="width:81.0pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="108">同姓<br/></td><td style="width:72.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 
1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="96">今山东北<br/></td><td style="width:40.9pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="55">富裕<br/></td></tr><tr style="height:.6pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="67">齐<br/></td><td style="width:81.0pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="108">功臣<br/></td><td style="width:72.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="96">今山东南<br/></td><td style="width:40.9pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="55">富裕<br/></td></tr><tr style="height:1.2pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:1.2pt" valign="top" width="67">宋<br/></td><td style="width:81.0pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:1.2pt" valign="top" width="108">商代后裔<br/></td><td style="width:72.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:1.2pt" valign="top" width="96">今河南<br/></td><td style="width:40.9pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:1.2pt" valign="top" width="55">贫瘠<br/></td></tr><tr style="height:.6pt"><td style="width:50.4pt;border:solid windowtext 1.0pt; 
border-top:none;padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="67">燕<br/></td><td style="width:81.0pt;border-top:none;border-left: none;border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="108">同姓<br/></td><td style="width:72.0pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="96">今北京<br/></td><td style="width:40.9pt;border-top:none;border-left:none; border-bottom:solid windowtext 1.0pt;border-right:solid windowtext 1.0pt; padding:0cm 5.4pt 0cm 5.4pt;height:.6pt" valign="top" width="55">贫瘠<br/></td></tr></tbody></table> <br/>结合材料和所学知识回答:<br/>(1)西周初年实行分封制,被分封做诸侯的主要是哪些人?(3分)<br/>(2)从材料一看,诸侯国的构成有何特点?材料二看,同姓诸侯国多半分布在什么位置?(2分)<br/>(3)西周的分封制巩固了国家统一,促进了各地经济文化交流,但是到了春秋战国时期,诸侯国之间出现了怎样的局面?(说出春秋战国最明显特征即可)(1分)</div></div><div class="quesanswer"><div><font color="#ff0000">【答案】</font>(1)主要有:子弟、亲戚和功臣。<br/>(2)同姓诸侯国较多(或同姓诸侯国比异性诸侯国多)从地理来看,同姓诸侯国封地多在富庶之地,异性诸侯国封地集中在边远地区。<br/>(3)局面:出现长期争霸战争、兼并战争(或战乱频繁)</div></div><div class="quesparse"><div><font color="#ff0000">【解析】</font>本
题考查的是西周的分封制。西周初年实行分封制,被分封做诸侯的主要是同姓子弟、亲戚和功臣。从材料一看,诸侯国的构成的特点是同姓诸侯国较多(或同姓诸侯
国比异性诸侯国多)从地理来看,同姓诸侯国封地多在富庶之地,异性诸侯国封地集中在边远地区。西周的分封制巩固了国家统一,促进了各地经济文化交流,但是
到了春秋战国时期,诸侯国之间出现了长期争霸战争、兼并战争的局面。</div></div></div>

 

I use this code to read the file and run a regular expression on the text:

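$qasource = FileRead('ques.txt')

; Recursive pattern: match a <div> together with all of its nested <div> children
$m = "<div[^>]*>(?:(?:<div[^>]*/>)|(?:(?!</?div).)|(?R))*</div>"
; Flag 2: return an array that includes the full match; 929 is the starting offset
$a = StringRegExp($qasource, $m, 2, 929)
If Not @error Then ...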

The script then exits with this error:

!>23:14:38 AutoIt3.exe ended.rc:-1073741571
>Exit code: -1073741571 Time: 24.337

If I replace every <br/> with <br>, StringRegExp runs normally.

jchd

Smells similar to ticket #2274 to me.

Anyway I strongly advise against regexp for extracting stuff from html pages, particularly from large/complex pages. Favor _IE functions.
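A minimal sketch of what the _IE route could look like, assuming Internet Explorer is installed and the standard IE.au3 UDF is available. The helper name _CollectDivs is hypothetical and not part of any library. Note that the collection includes nested divs as separate entries, and the HTML returned by IE may be normalized rather than byte-identical to the source text.

```autoit
#include <IE.au3>

; Hypothetical helper: returns an array with the outer HTML of every <div>
; found in $sHtml. Element [0] holds the number of divs.
Func _CollectDivs($sHtml)
    ; Hidden, blank IE instance used only as an HTML parser
    Local $oIE = _IECreate("about:blank", 0, 0)
    If @error Then Return SetError(1, 0, 0)

    ; Load the text into the document instead of navigating to a URL
    _IEDocWriteHTML($oIE, $sHtml)

    Local $oDivs = _IETagNameGetCollection($oIE, "div")
    If @error Then
        _IEQuit($oIE)
        Return SetError(2, 0, 0)
    EndIf

    Local $iCount = $oDivs.length
    Local $aOut[$iCount + 1]
    $aOut[0] = $iCount

    Local $i = 1
    For $oDiv In $oDivs
        $aOut[$i] = _IEPropertyGet($oDiv, "outerhtml")
        $i += 1
    Next

    _IEQuit($oIE)
    Return $aOut
EndFunc   ;==>_CollectDivs
```

It could be called as `$aDivs = _CollectDivs(FileRead('ques.txt'))`, after which $aDivs[1] to $aDivs[$aDivs[0]] hold the div blocks.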


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

DXRW4E

This is not a RegExp bug but a limitation of Windows. You need to pay attention to the pattern: this one creates thousands and thousands of pointers, endlessly, and I believe Windows puts a limit on that.

For example, this solves the problem and is much faster:

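; The recursive pattern with \f as the closing marker, kept for reference:
;;$m="<div[^>]*>(?:(?:<div[^>]*/>)|(?:(?!\f).)|(?R))*\f"
; Simpler: replace every </div> with a form feed first. \f is Chr(12), not Chr(10).
$qasource = StringRegExpReplace($qasource, "\Q</div>\E", Chr(12))
$m="<div[^>]*>([^\f]*)\f"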

Ciao.

davidlee

I want to collect the div code, and my regexp pattern works for most HTML.

I think the StringRegExp function should set @error rather than simply exiting abnormally.

BTW:

How do I use the _IE functions to collect the div code? Can you give me some example code?

jchd

This kind of pattern, where many elements carry a * repeater, tends to exhaust the PCRE stack space because of extreme backtracking. There is nothing AutoIt can do against such misuse.
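One way to keep nesting out of the pattern altogether is to let a small non-recursive regex find only the div tags and to track the depth in a plain loop. The sketch below is an illustration under that assumption, not the method anyone in this thread used; the helper name _TopLevelDivs is hypothetical. It relies on StringRegExp with flag 1 returning the next offset in @extended.

```autoit
; Hypothetical helper: returns the top-level <div>...</div> blocks of $sHtml.
; Element [0] holds the number of blocks found.
Func _TopLevelDivs($sHtml)
    Local $aOut[1] = [0]
    Local $iOffset = 1, $iDepth = 0, $iStart = 0

    While 1
        ; Find the next opening or closing div tag; the regex never recurses
        Local $aTag = StringRegExp($sHtml, "(?i)(</?div\b[^>]*>)", 1, $iOffset)
        ; Capture both macros right away, before any other function call
        Local $iErr = @error, $iNext = @extended
        If $iErr Then ExitLoop

        Local $sTag = $aTag[0]
        If StringLeft($sTag, 2) = "</" Then
            ; Closing tag: ignore stray ones that have no matching opener
            If $iDepth > 0 Then
                $iDepth -= 1
                If $iDepth = 0 Then
                    ; Back at top level: store the whole block
                    $aOut[0] += 1
                    ReDim $aOut[$aOut[0] + 1]
                    $aOut[$aOut[0]] = StringMid($sHtml, $iStart, $iNext - $iStart)
                EndIf
            EndIf
        ElseIf StringRight($sTag, 2) <> "/>" Then
            ; Opening tag that is not self-closing
            If $iDepth = 0 Then $iStart = $iNext - StringLen($sTag)
            $iDepth += 1
        EndIf

        $iOffset = $iNext
    WEnd

    Return $aOut
EndFunc   ;==>_TopLevelDivs
```

Because the loop only ever holds a counter and two offsets, its memory use does not grow with the nesting depth or with the size of the input.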


davidlee

How can I catch this exit error in code?

>Exit code: -1073741571 

I want to catch the error in code and remove the text that causes the StringRegExp error.
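The exit code -1073741571 is 0xC00000FD as a signed 32-bit value, which is the Windows status STATUS_STACK_OVERFLOW. A process cannot catch that in its own script code, but a parent process can observe it. The following is a rough sketch under that assumption: the risky StringRegExp call lives in a separate worker script (worker.au3 is a hypothetical name, as are both helper functions), and the parent checks the worker's exit code.

```autoit
; Exit code reported when the worker dies from a stack overflow
; (0xC00000FD, STATUS_STACK_OVERFLOW, as a signed 32-bit value)
Global Const $STATUS_STACK_OVERFLOW = -1073741571

; Hypothetical helper: runs $sWorkerScript in its own AutoIt process,
; passing $sInputFile as its first command-line argument.
; Returns the worker's exit code, or sets @error if it could not start.
Func _RunRegExpWorker($sWorkerScript, $sInputFile)
    Local $sCmd = '"' & @AutoItExe & '" /AutoIt3ExecuteScript "' & _
            $sWorkerScript & '" "' & $sInputFile & '"'
    Local $iExit = RunWait($sCmd, @ScriptDir, @SW_HIDE)
    If @error Then Return SetError(1, 0, -1)
    Return $iExit
EndFunc   ;==>_RunRegExpWorker

; Hypothetical helper: True if the exit code means the worker overflowed its stack
Func _IsStackOverflow($iExitCode)
    Return $iExitCode = $STATUS_STACK_OVERFLOW
EndFunc   ;==>_IsStackOverflow

; Usage (assumes worker.au3 reads the file named in $CmdLine[1] and calls StringRegExp):
;   Local $iExit = _RunRegExpWorker(@ScriptDir & "\worker.au3", @ScriptDir & "\ques.txt")
;   If _IsStackOverflow($iExit) Then ConsoleWrite("Skip or pre-clean this input" & @CRLF)
```

The parent survives the crash, so it can decide to skip the offending text or clean it up before retrying.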

trancexx

This kind of pattern, where many elements carry a * repeater, tends to exhaust the PCRE stack space because of extreme backtracking. There is nothing AutoIt can do against such misuse.

Actually, AutoIt can avoid crashes in situations like this. PCRE can use either the stack or the heap, depending on flags.

AutoIt should never crash internally, no matter what.


jchd

I know PCRE can be compiled to use heap but only at the price of a very significant slowdown in all routine operations, which regular users may be reluctant to pay just to catch over-the-bound use cases.

Nonetheless I tend to say that it's really best to avoid extreme input/pattern obviously leading to failure. You know in advance that what you're going to process is bulky and prone to -- in this case -- awful backtracking. At any rate, using IE functions in this context is both more reliable and easier to code and maintain, so why insist on borderline methods?

Whether the failure results in @error or in a process crash is essentially the same. If the application developer already has a plan B coded in the application to handle memory or resource shortage, he should use that in the first place. If there is no plan B, the app won't succeed anyway.

Yes, I agree that this is not the best implementation decision, but it's also very easy to abuse components.


