Regex reminders

Many dialects (I don't care until I hit a bug):
BRE and ERE for basic versus extended; JavaScript vs Perl (vs Python? vs my software?)

Seen Intro
"https://en.wikipedia.org/wiki/Regular_expression", Regular expression - Wikipedia
"https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions",
Regular expressions - JavaScript - MDN Web Docs
and also for quick reference (liked)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet

Seen practical and interactive
"https://regexr.com/", RegExr: Learn, Build, & Test RegEx
"https://regex101.com/", regex101: build, test, and debug regex

Unseen practical and interactive
"https://regexlearn.com/", Regex interactive tutorial
"https://extendsclass.com/regex-tester.html", Regex visualizer

Unseen (lately at least)
"https://docs.python.org/3/library/re.html", re — Regular expression operations — Python 3.10.2 ...
https://docs.python.org/3/howto/regex.html#regex-howto, (easier treatment)
"https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex",
Regex Class (System.Text.RegularExpressions) | Microsoft Docs
"https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference",
Regular Expression Language - Quick Reference - Microsoft ...
"https://www.w3schools.com/python/python_regex.asp", Python RegEx - W3Schools
"https://www.regular-expressions.info/", Regular-Expressions.info - Regex Tutorial, Examples and ...
"https://www.rexegg.com/regex-quickstart.html", Quick-Start: Regex Cheat Sheet - RexEgg
https://www.rexegg.com/regex-uses.html


dboing


Current problem: in my PDF software's search-and-replace tool there is a limited regex and "/" expression syntax, and I am not sure whether what I need is regex or one of that software's quirks. How, in the interactive input field, do I tag an expression so that it can be reused in the replace field? There may be many tags in either of those two fields, which allows for interesting manipulations.

Candidates:
https://en.wikipedia.org/wiki/Regular_expression#Delimiters (not really)

> ... functionality includes lazy matching, backreferences, named capture groups, and recursive patterns.

https://en.wikipedia.org/wiki/Regular_expression#backreferences (not really)
The wiki tries to paint an extensive picture of all the dialects and caveats, which drowns its introductory value for the impatient. Mozilla seems more practical with its cheat-sheet layout.

https://developer.mozilla.org/en-/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet#groups_and_ranges
"Capturing group" is the syntax description keyword.

> (x) Capturing group: Matches x and remembers the match.
> In results, matches to capturing groups typically in an array whose members are in the same order as the left parentheses in the capturing group. ... Matches are accessed using the index of the result's elements ([1], ..., [n]) or from the predefined RegExp object's properties ($1, ..., $9).

While if you want the grouping without the recall (memory load?):

> (?:x) Non-capturing group: Matches "x" but does not remember the match

SOLVED for now
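
A minimal sketch of capture-and-reuse, assuming Python's re module rather than the PDF tool (whose exact replace syntax is not given here). The sample strings and patterns are hypothetical. Python recalls groups in the replacement with \1, \2, where the MDN page quoted above uses $1, $2 for JavaScript.

```python
import re

# Hypothetical sample text, for illustration only.
text = "page 12 of 340"

# Two capturing groups, numbered in the order of their left parentheses.
pattern = re.compile(r"page (\d+) of (\d+)")

# \1 and \2 in the replacement recall what the groups matched.
swapped = pattern.sub(r"\2 pages, now at \1", text)
print(swapped)

# (?:...) groups the alternatives without remembering them,
# so the digits are the only numbered group.
m = re.search(r"(?:page|p\.) (\d+)", "see p. 7")
print(m.groups())
```

Other tools may spell the recall differently ($1, \1, or something of their own), so the search/replace dialog of the software in question still needs checking.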

dboing

Googling "ff symbol in text file" yields interesting stuff. Not regex, but this is not the first time I chose the wrong thread title... FF is an old page-break ASCII code, possibly only relevant on Linux now. (Notepad++ = npp, not MS Notepad; I use PN = Programmer's Notepad, very fast.)

http://metadataconsulting.blogspot.com/2019/06/Notepad-Control-Characters-Explained.html (npp)
has a nice explanation table about

> Standard ASCII Control Characters

which for FF links to

> https://en.wikipedia.org/wiki/Page_break#Form_feed

(says deprecated for modern printers; however, see Practical below)

https://en.wikipedia.org/wiki/Unicode_control_characters#Category_%22Cc%22_control_codes_(C0_and_C1)

> Category "Cc" control codes can serve a variety of purposes, not limited to format effectors: for example, the default ASCII C0 set includes six format effectors (BS, HT, LF, VT, FF and CR), ten transmission controls, four device controls, four information separators and eight other control codes.[4] Most of these characters play no explicit role in Unicode text handling, and are used only by higher-level protocols such as those used by terminal emulators. Certain characters are commonly used for formatting or sentinel purposes: ...Unicode only specifies semantics for U+0009–U+000D, U+001C–U+001F, and U+0085 (the ASCII format effectors except for BS, plus the ASCII information separators and the C1 NEL)

https://en.wikipedia.org/wiki/Unicode_control_characters#Unicode_introduced_separators

> Unicode introduces its own newline characters to separate either lines or paragraphs: U+2028 LINE SEPARATOR (abbreviated LS or LSEP) and U+2029 PARAGRAPH SEPARATOR (abbreviated PS or PSEP). Like CR and LF, LS and PS are effectors for text formatting; unlike CR and LF, they are not treated as "control codes" for ....

https://tex.stackexchange.com/questions/209103/why-using-character-ff-before-every-chapter-in-the-texbook-source
(one answer gives a historical outlook on the new-page notion in text files)

https://community.notepad-plus-plus.org/topic/18006/pagebreaks-in-notepad
(npp discussion about making it work as intended; however, see below for better)

https://www.asciihex.com/character/control/12/0x0C/ff-form-feed (ASCII itself; the web page, meh!)
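
A small sketch of the distinction drawn in the Wikipedia quotes, assuming Python's standard unicodedata module; the variable names are just for illustration. It looks up the Unicode general category of FF, LF and CR (the "Cc" control codes) and of LS and PS (separators with their own categories), then shows that Python's str.splitlines() treats all of them as line boundaries.

```python
import unicodedata

# Characters discussed above, keyed by their usual abbreviations.
chars = {"FF": "\x0c", "LF": "\n", "CR": "\r", "LS": "\u2028", "PS": "\u2029"}

# General category of each character: "Cc" for the control codes,
# "Zl" for LINE SEPARATOR and "Zp" for PARAGRAPH SEPARATOR.
categories = {name: unicodedata.category(ch) for name, ch in chars.items()}
print(categories)

# str.splitlines() splits on FF, LS and PS as well as on LF and CR.
parts = "a\x0cb\u2028c\u2029d".splitlines()
print(parts)
```

Whether a given editor or printer honours FF as a page break is a separate question from how Unicode classifies it.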

dboing

Practical: (kind of SOLVING)
https://www.tenforums.com/software-apps/163601-page-break-regular-text-files-notepad.html
from last post in that forum (dated 2020)

> ** update: Having tested it with Microsoft Print to PDF (only), it would appear that "form feeds" are disregarded completely in text files when printing from Notepad, whereas printing from Notepad++ instead of page breaks you get the double FF character. Unless as you say, you open the text file in Wordpad or Word. Opening a txt file in Word for example, into which you had already previously introduced the FF characters, the page breaks both appear and print as you want them, but trying to re-save the txt file again, you get the warning that all formatting will be lost, whereas doing the same in Wordpad, you can't see where you inserted the page breaks, but the breaks are there and they print as you would want them, and they get preserved when you resave the txt file.

https://npp-user-manual.org/docs/searching/
(software more evolved than mine, but the same flavour of text file editor,
using the same kind of regex-based search/replace patterns in user input fields).

> \f -- The FF control character 0x0C (form feed).

https://stackoverflow.com/questions/27104226/remove-symbols-from-text-in-notepad

> You can use Find&Replace with RegEx mode. "FF" symbol is ASCII character 12 (you can see it in Notepad++'s ASCII table), so you can match it in a RegEx with \x0C (0C is 12 in hexadecimal). To remove it, search "\x0C" and replace it with "" (nothing).
> To replace it with a line break, replace it with "\r\n" on Windows ("\n" on Linux).
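
The same find-and-replace idea, sketched in Python's re module rather than in Notepad++ (an assumption; the sample string is made up). In Python regex patterns \x0C and \f both match the form feed character, ASCII 12.

```python
import re

# Hypothetical text with form feeds between "pages".
text = "page one\x0cpage two\x0cpage three"

# Remove every form feed, matching it by its hex code.
removed = re.sub(r"\x0C", "", text)

# Replace every form feed with a line break, matching it as \f.
as_lines = re.sub(r"\f", "\n", text)

# Or keep the page structure by splitting on the form feed.
pages = text.split("\f")

print(repr(removed))
print(repr(as_lines))
print(pages)
```

For a Windows-style result, "\r\n" would take the place of "\n" in the replacement, as the quoted answer says.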

