
Conditional Use of _StringBetween Command



Hello Guys,

I built an AutoIt script that saves the page source from a given URL and then uses the _StringBetween command to extract the content I want from the saved source file. Everything works as intended, but I want to apply _StringBetween conditionally: the extracted text contains HTML tags and URLs such as <img alt="image" src="https://en.wiki***.org/*****.jpg"><br><br><b> and other tags like <div></div>, <p></p>, etc.

I want to save only the text, after removing all URLs and other HTML tags, but without removing the <br> tags, because I want to keep the text's original line breaks and <br> lets me reproduce those breaks in my text.

Waiting for a kind response... Thanks in advance!

Sample text returned by _StringBetween looks like this:

<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher's role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>


The code that extracts this data from the saved source file is given below:

#include <IE.au3>
#include <Excel.au3>
#include <String.au3>
; Fetch the page source from the URL stored in cell "C1" of the Excel sheet
; ($oWorkbook1 and $file are assumed to be set up earlier in the script)
$IE = _IECreate(_Excel_RangeRead($oWorkbook1, Default, "C1"), 0, 0)
$source = _IEDocReadHTML($IE)
FileWrite($file, $source)
; Extract the text between 'desc">' and '<h2>' and save it in a separate file "text01.txt"
$target_source = _StringBetween($source, 'desc">', '<h2>')
If Not @error Then
	FileWrite(@ScriptDir & "\text01.txt", $target_source[0])
EndIf
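
A minimal sketch of applying the cleanup to the _StringBetween result before it is saved. It assumes the `(?s)(<br>)|<a.*?</a>|<.*?>` pattern posted later in this thread; the sample string `$sSource` and the names `$aFound`, `$sResult` and `$hFile` are hypothetical stand-ins, not part of the original script.

```autoit
#include <String.au3>
#include <FileConstants.au3>

; Hypothetical fragment standing in for the page source read with _IEDocReadHTML
Local $sSource = '<div class="desc"><b>Title</b><br><br>Body text<a href="#">Homepage</a><h2>More</h2>'

Local $sResult = ""
; Same delimiters as in the script above
Local $aFound = _StringBetween($sSource, 'desc">', '<h2>')
If Not @error Then
	; Keep each <br> (captured and put back via $1); drop links together with
	; their text and every other tag (the unused group is replaced by "")
	$sResult = StringRegExpReplace($aFound[0], '(?s)(<br>)|<a.*?</a>|<.*?>', '$1')
	; FileWrite with a file name appends; a handle opened with $FO_OVERWRITE
	; replaces the old contents on every run
	Local $hFile = FileOpen(@ScriptDir & "\text01.txt", BitOR($FO_OVERWRITE, $FO_UTF8_NOBOM))
	FileWrite($hFile, $sResult)
	FileClose($hFile)
EndIf
ConsoleWrite($sResult & @CRLF) ; Title<br><br>Body text
```

Opening the file as UTF-8 without BOM is a choice, not a requirement; with the plain `FileWrite(filename, ...)` call of the original script, text01.txt simply grows on each run.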


Could something like this work?

$txt = '<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher''s role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>'

$txt = StringReplace($txt, "<br>", @crlf)
$txt = StringRegExpReplace($txt, '(?s)<a.*?</a>|<.*?>', "")
Msgbox(0,"", $txt)


9 hours ago, mikell said:

Could something like this be correct ?


Thanks mate, it's working very well.

But although the result looks right in the message box, when I save it with the FileWrite command the file contains the text with lots of TAB spaces...

One more thing: I want to remove everything from my text, such as URLs and <b>, <div> and <img> tags, except <br>, because when I use this saved text as source code, it no longer shows where one line ends and the next begins.

So please explain how I can remove all HTML code and URLs while keeping the <br> tags wherever they appear in my source file, and also remove the TAB spaces from the saved text.

The text file saved by our code, with the extra spaces, is attached for reference:

html.txt
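
The TAB spaces in the saved file most likely come from the indentation (TABs and CR/LF) of the formatted page source rather than from the regex. A minimal sketch, assuming that is the cause: collapse every whitespace run into one space before stripping the tags, so that only <br> tags mark line breaks. `$sHtml` is a hypothetical sample, not the attached file.

```autoit
; Hypothetical extract: indentation TABs and a CRLF from the formatted page
; source, plus a literal space between two <br> tags
Local $sHtml = @TAB & @TAB & '<b>Philosophy of Education</b><br><br>' & @CRLF & _
		@TAB & @TAB & 'Behind every school<br> <br><a href="#">Homepage</a></div>'

; 1) In normal HTML text, TABs, CR/LF and repeated spaces are not line breaks
;    (only <br> is), so collapse every whitespace run into a single space
Local $sClean = StringRegExpReplace($sHtml, '\s+', ' ')
; 2) Keep <br>, drop links with their text and every other tag
$sClean = StringRegExpReplace($sClean, '(?s)(<br>)|<a.*?</a>|<.*?>', '$1')
; 3) Remove the spaces left around <br> tags, then strip leading/trailing
;    whitespace (flag 3 = 1 leading + 2 trailing)
$sClean = StringRegExpReplace($sClean, ' *(<br>) *', '$1')
$sClean = StringStripWS($sClean, 3)

ConsoleWrite($sClean & @CRLF) ; Philosophy of Education<br><br>Behind every school<br><br>
```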


40 minutes ago, AutoBert said:

mikell's solution works fine:

[Screenshot: test.txt opened in Editor]

Yes, I know the code shared by mikell works fine, but my point is that the saved result contains many TAB spaces and is only plain text. I'm looking for output in HTML format that keeps only the <br> tags, so that line breaks are preserved when the text is used as HTML source.

Please read my post above carefully; I think you will then understand the whole scenario and my point of view.


And where's the difficulty in replacing the @CRLF back with <br>?

$txt = '<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher''s role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>'

$txt = StringReplace($txt, "<br>", @crlf)
$txt = StringRegExpReplace($txt, '(?s)<a.*?</a>|<.*?>', "")
$txt = StringReplace($txt, @crlf, "<br>")
Msgbox(0,"replaced back", $txt)


2 hours ago, AutoBert said:

And where's the difficulty in replacing the @CRLF back with <br>?

Everything is working fine... Thank you so much, everyone!


This (removing links and all tags except <br>) could be done using a single regex  :)

$txt = '<img alt="image" src="//imghost.com/id_rk3FyZ.jpg"><br><br><b>Philosophy of Education and Its Importance</b><br><br>Behind every school and every teacher is a set of related beliefs--a philosophy of education--that influences what and how students are taught.<br><br>A philosophy of education represents answers to questions about the purpose of schooling, a teacher''s role, and what should be taught and by what methods.<br> <br><span style="color: red;"><b>Note</b></span></b></span><br><br>As an academic field, philosophy of education is "the philosophical study of education and its problems...its central subject matter is education, and its methods are those of philosophy". "The philosophy of education may be either the philosophy of the process of education or the philosophy of the discipline of education.<br><br><a class="ajaxLink"href="https://en.wikipe***.org/wiki/Philosophy_of_edu****" rel="nofollow">Homepage</a></div>'

$txt = StringRegExpReplace($txt, '(?s)(<br>)|<a.*?</a>|<.*?>', "$1")
Msgbox(0,"", $txt)
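
A hedged variant of the single regex, as a sketch only: `<a.*?</a>` also starts matching at tags that merely begin with "a", such as `<abbr>`, and a break written as `<BR>`, `<br/>` or `<br />` is not kept. The sample `$sHtml` below is hypothetical; the sketch normalises the break tags first and adds `\b` and `(?i)` to the pattern.

```autoit
; Hypothetical input mixing <br> spellings with an <abbr> tag and a link
Local $sHtml = '<b>Title</b><BR/>Line one<br />Line two<abbr title="x">HTML</abbr><br><a href="#">Homepage</a>'

; 1) Normalise <br>, <BR>, <br/> and <br /> to a plain <br>
Local $sNorm = StringRegExpReplace($sHtml, '(?i)<br\s*/?>', '<br>')

; 2a) The single regex from this thread: '<a' also starts <abbr>, so the link
;     branch runs on to the next '</a>' and swallows "HTML" and a <br>
Local $sNaive = StringRegExpReplace($sNorm, '(?s)(<br>)|<a.*?</a>|<.*?>', '$1')

; 2b) Same idea, with \b so only a real <a ...> tag starts the link branch,
;     (?i) for upper-case tags and <[^>]*> for any other tag
Local $sOut = StringRegExpReplace($sNorm, '(?is)(<br>)|<a\b.*?</a>|<[^>]*>', '$1')

ConsoleWrite($sNaive & @CRLF) ; Title<br>Line one<br>Line two
ConsoleWrite($sOut & @CRLF)   ; Title<br>Line one<br>Line twoHTML<br>
```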
