What is Regex (Regular Expression)?

Have you heard of regular expressions, or regex? They let you find strings of characters that match rules you define, quickly and flexibly, and almost every modern programming language supports them.


In this article, we will explain in detail the regex patterns most often used in terminal commands, PHP, JavaScript, SQL, Python, R, WordPress, Google Analytics, and text and code editors such as Atom. We will also cover why regular expressions are used and the principle behind how they work.

Regular Expression

A Regular Expression (Regex or Regexp) is a pattern syntax found in almost all modern programming languages. The core syntax is largely shared [1], although details vary between implementations (often called regex "flavors"), and it lets you quickly and flexibly identify character strings [2] that follow specified rules.

Purpose of Using Regular Expressions

  • Extracting the necessary information from a large data set,
  • Checking the input provided by the user,
  • Formatting the data to be suitable for the intended use.

Working Principle

A regex works by defining a pattern that describes the string you are looking for. The engine then scans the text and returns every match of that pattern. Because special characters let a short pattern describe many possible strings, operations such as search and replace become fast and effective. The special characters and functions used for matching and replacement are described below.

Let's start with a practical example in Visual Studio Code. Open any file and press Ctrl+H (Cmd+Option+F on macOS) to open Find and Replace. Enter ^.*$ in the Find field and "$0", (including the quotation marks and the trailing comma) in the Replace field. Enable the Use Regular Expression option (the .* button) and click Replace All: every line is now wrapped in quotation marks and followed by a comma, because ^.*$ matches a whole line and $0 inserts the entire match. Let's continue with the definitions and examine what each expression does.
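The same line-wrapping replacement can be sketched in a terminal with sed, assuming a POSIX sed: in its replacement text, & stands for the whole match, playing the role of VS Code's $0. The sample lines are made up for illustration.

```shell
# Wrap every line in double quotes and append a comma.
# ^.*$ matches an entire line; & in the replacement inserts that match.
wrapped=$(printf 'apple\nbanana\ncherry\n' | sed 's/^.*$/"&",/')
echo "$wrapped"
# "apple",
# "banana",
# "cherry",
```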

Regex Special Character Definitions

Along with an explanation of what each special character does, I will give an example. The regex101 [3] and RegExr [4] sites are useful for trying these examples out.

For example, when working on DigitalOcean servers, you can quickly read the passwords stored in the /root/.digitalocean_password file like this:

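```shell
# example 1: print every line that contains "_mysql_pass"
# (the leading .* is optional, since grep matches anywhere in a line)
cat /root/.digitalocean* | grep '.*_mysql_pass'

# example 2: keep lines starting with "username" or "password",
# then print the last whitespace-separated field of each line
cat /root/.digitalocean_password | grep -E '^(username|password)' | awk '{print $NF}'
```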

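Escaped Characters

The characters below have special meanings inside a pattern. To match one of them literally, escape it with a backslash: for example, \. matches a period and \$ matches a dollar sign.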

“.”

The . (period) matches any single character except a newline. For example, the expression k.re matches "kure", "kare", "kore", and "kere".
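A minimal sketch for trying this in a terminal with grep; the sample words are made up, -E selects extended regex syntax, and -x requires the pattern to match a whole line so that longer lines containing a match are not reported.

```shell
# . must match exactly one character between "k" and "re"
matches=$(printf 'kure\nkare\nkore\nkere\nkre\nkuure\n' | grep -xE 'k.re')
echo "$matches"
# kure, kare, kore and kere are printed; "kre" (no character there)
# and "kuure" (two characters) are not
```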

“$”

The $ character anchors the match to the end of the string or line. For example, the expression iner$ matches "diner" and "iner", but not "inert", because in "inert" the letters "iner" are followed by a "t". This makes it possible to find and replace text at line endings with regular expressions.
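The same kind of grep check, again on made-up words, shows which lines end in "iner":

```shell
# iner$ only matches when "iner" is the last thing on the line
matches=$(printf 'diner\niner\ninert\n' | grep -E 'iner$')
echo "$matches"
# prints "diner" and "iner"; "inert" is skipped
```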

“^”

The ^ (caret) character, on the other hand, anchors the match to the beginning of the string or line, so the term is found only when it starts the line. For example, the expression ^Sabahleyin matches "Sabahleyin" in the sentence "Sabahleyin kahvaltı yaptım" ("I had breakfast in the morning"), but not in "Bugün sabahleyin hava çok güzeldi" ("This morning the weather was very nice"). Without the ^ anchor, the pattern Sabahleyin would match the word wherever it appears in a line, and with case-insensitive matching it would also match the lowercase "sabahleyin".
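The two example sentences can be checked with grep as well; -i is grep's case-insensitive option, and -c prints the number of matching lines instead of the lines themselves.

```shell
sentences=$(printf 'Sabahleyin kahvaltı yaptım\nBugün sabahleyin hava çok güzeldi\n')

# With ^, only the sentence that starts with "Sabahleyin" matches
anchored=$(printf '%s\n' "$sentences" | grep -E '^Sabahleyin')
echo "$anchored"   # Sabahleyin kahvaltı yaptım

# Without ^ and with -i, both sentences match
count=$(printf '%s\n' "$sentences" | grep -ciE 'sabahleyin')
echo "$count"      # 2
```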

When ^ is the first character inside square brackets [], it negates the set: the bracket expression then matches any single character that is not listed. For example, the expression [^abc] matches any character other than a, b, or c.

“*”

The * (asterisk) character, placed after a character or a group, matches zero or more occurrences of it. For example, the expression .* matches any number of characters (including none), while the expression a*t matches "t", "at", "aat", and so on.
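With -x requiring a whole-line match, grep shows which of these made-up lines a*t accepts:

```shell
# a*t: any number of a's (even zero) followed by a single t
matches=$(printf 't\nat\naat\ntt\nbt\n' | grep -xE 'a*t')
echo "$matches"
# prints t, at and aat; "tt" and "bt" do not match as whole lines
```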

“[ ]”

Square brackets [] match any one of the characters enclosed within them. For example, the expression S[ai]z matches "Saz" and "Siz".
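A grep sketch with made-up words; only lines that are exactly S, one of the bracketed letters, and z are kept:

```shell
# S[ai]z: S, then exactly one of a or i, then z
matches=$(printf 'Saz\nSiz\nSuz\nSaiz\n' | grep -xE 'S[ai]z')
echo "$matches"
# prints Saz and Siz
```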

“[c1-c2]”

Inside square brackets, the hyphen (-) specifies a range of characters. For example, the expression [0-9] matches any digit, and [A-Za-z] matches any uppercase or lowercase ASCII letter (letters such as "ş" or "ü" are not included).
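grep -o prints each match on its own line, which makes ranges easy to inspect. This sketch sets LC_ALL=C so that ranges follow plain byte order; in some non-C locales, older tools interpreted ranges by collation order instead. The input string is made up.

```shell
# LC_ALL=C: ranges are interpreted in plain ASCII/byte order
digits=$(printf 'a1B2c3\n' | LC_ALL=C grep -oE '[0-9]')
letters=$(printf 'a1B2c3\n' | LC_ALL=C grep -oE '[A-Za-z]')
echo "$digits"    # 1, 2 and 3 on separate lines
echo "$letters"   # a, B and c on separate lines
```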

“[^c1-c2]”

When the caret ^ comes immediately after the opening bracket [, the expression matches any character outside the listed characters and ranges. For example, the expression [^123a-z] matches any character except the digits 1, 2, and 3 and the lowercase letters a through z.
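The negated range can be inspected the same way on a made-up string; LC_ALL=C again keeps ranges in byte order.

```shell
# [^123a-z]: anything except 1, 2, 3 and the lowercase letters a-z
others=$(printf 'a1B2c3Z4\n' | LC_ALL=C grep -oE '[^123a-z]')
echo "$others"
# prints B, Z and 4 on separate lines
```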

“( )”

Parentheses () are used to group parts of the expression and to capture the matched patterns.

The text matched by each group is stored and can be referred to later in the expression with \1, \2, and so on; in POSIX basic regular expressions and many other tools, nine such back-references (\1 to \9) are available. For example, if the text contains the years 1988, 1980, 1999, 1898, and 1919, the regular expression (8)9\1 matches "898" in 1898: (8) captures an "8", 9 matches a literal "9", and \1 requires the same text the first group captured, another "8".
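Back-references are part of POSIX basic regular expressions (BRE), grep's default syntax, where a group is written \( \) instead of ( ). This sketch runs the pattern over the years from the example:

```shell
years=$(printf '1988\n1980\n1999\n1898\n1919\n')

# BRE: \(8\) captures an 8, 9 is literal, \1 repeats the captured 8
hit=$(printf '%s\n' "$years" | grep '\(8\)9\1')
part=$(printf '%s\n' "$years" | grep -o '\(8\)9\1')
echo "$hit"    # 1898 (the whole matching line)
echo "$part"   # 898 (only the matched text, thanks to -o)
```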

Parentheses () can also group terms so that an operator applies to all of them. For example, the regular expression a(bc)?d matches both "ad" and "abcd": because bc is grouped, the following ? (zero or one occurrence) applies to the whole group rather than only to the c.
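A grep check of the optional group, with made-up inputs:

```shell
# (bc)? makes the whole "bc" group optional
matches=$(printf 'ad\nabcd\nabd\nabcbcd\n' | grep -xE 'a(bc)?d')
echo "$matches"
# prints ad and abcd; abd (only part of the group) and abcbcd (group twice) fail
```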

“|”

The vertical bar | acts as an OR operator, matching either the expression on its left or the one on its right. For example, the expression k(a|u)le matches both "kale" and "kule": the group (a|u) allows either "a" or "u" in that position.
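Alternation works the same way in grep -E; the inputs are made up:

```shell
# k(a|u)le: "a" or "u" in the second position
matches=$(printf 'kale\nkule\nkile\n' | grep -xE 'k(a|u)le')
echo "$matches"
# prints kale and kule
```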

“+”

The + (plus) character matches one or more occurrences of the preceding term. For example, the expression z+ matches "z", "zz", "zzz", and so on.
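Counting whole-line matches with grep -c highlights the difference between + and *; the sample input is made up and ends with an empty line:

```shell
# z+ requires at least one z; z* also accepts the empty line
plus=$(printf 'z\nzz\nzzz\n\n' | grep -cxE 'z+')
star=$(printf 'z\nzz\nzzz\n\n' | grep -cxE 'z*')
echo "$plus $star"   # 3 4
```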

“?”

The ? (question mark) character matches zero or one occurrences of the preceding term. For example, the expression colou?r matches both "color" and "colour".
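A grep sketch for the optional letter, with made-up spellings:

```shell
# u? lets the "u" appear once or not at all
matches=$(printf 'color\ncolour\ncolouur\n' | grep -xE 'colou?r')
echo "$matches"
# prints color and colour
```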

“{n}”

The {n} (curly braces) syntax matches exactly n occurrences of the preceding term. For example, the expression a[0-5]{2} matches the letter "a" followed by exactly two digits between 0 and 5, such as "a12", "a24", or "a14".
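With -x, grep keeps only the made-up lines that are exactly an "a" followed by two digits from 0 to 5:

```shell
# a followed by exactly two characters from the range 0-5
matches=$(printf 'a12\na24\na14\na19\na1\n' | grep -xE 'a[0-5]{2}')
echo "$matches"
# prints a12, a24 and a14; a19 (9 is out of range) and a1 (one digit) fail
```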

“{i,j}”

The {i,j} syntax matches at least i and at most j occurrences of the preceding term. For example, the expression [0-9]{4,6} matches a run of 4 to 6 digits, such as "1234", "56789", or "987654". Without anchors it can also match part of a longer number, so use ^[0-9]{4,6}$ when the whole string must consist of 4 to 6 digits.
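The sketch below contrasts a whole-line match with an unanchored one; the numbers are made up, and -o prints only the matched part of a line.

```shell
numbers=$(printf '123\n1234\n56789\n987654\n1234567\n')

# Whole-line match (-x): only lines of 4 to 6 digits
whole=$(printf '%s\n' "$numbers" | grep -xE '[0-9]{4,6}')

# Unanchored: 1234567 still yields a 6-digit match, because {4,6} is greedy
part=$(printf '1234567\n' | grep -oE '[0-9]{4,6}')
echo "$whole"   # 1234, 56789 and 987654
echo "$part"    # 123456
```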

Character Classes

“\d”

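Matches any digit character. In most engines it has the same meaning as [0-9], although Unicode-aware engines (such as Python 3's re module) also match digits from other scripts.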

“\D”

Represents all non-digit characters. It has the same meaning as [^0-9].
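POSIX grep -E has no \d, so this sketch assumes GNU grep, whose -P option enables Perl-compatible syntax; BSD and macOS grep do not provide -P. The sample text is made up.

```shell
# \d+ collects runs of digits; \D+ collects the runs of non-digits between them
digits=$(printf 'Room 101, floor 3\n' | grep -oP '\d+')
others=$(printf 'Room 101, floor 3\n' | grep -oP '\D+')
echo "$digits"   # 101 and 3
echo "$others"   # "Room " and ", floor "
```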

“\w”

Matches word characters: digits (0-9), letters (a-z, A-Z), and the underscore (_). In ASCII mode it has the same meaning as [a-zA-Z0-9_]; Unicode-aware engines also count letters such as "ş" or "ü" as word characters.

“\W”

Matches any character that is not a word character, such as punctuation, other special characters, and whitespace. It is equivalent to [^a-zA-Z0-9_].
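Again assuming GNU grep with -P, \w+ and \W+ split a made-up config-style line into words and the separators between them:

```shell
# \w+ picks out identifiers; \W+ picks out the separators between them
words=$(printf 'user_id=42; name=ada\n' | grep -oP '\w+')
seps=$(printf 'user_id=42; name=ada\n' | grep -oP '\W+')
echo "$words"   # user_id, 42, name and ada
echo "$seps"    # "=", "; " and "="
```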

“\s”

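Matches any whitespace character, including space, tab (\t), newline (\n), carriage return (\r), and form feed (\f); it is roughly equivalent to [ \t\n\r\f]. Most engines also include the vertical tab (\v), and Unicode-aware engines add other space characters.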

“\S”

Matches any character that is not whitespace. It is the complement of \s, roughly equivalent to [^ \t\n\r\f].
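A last GNU grep -P sketch splits a made-up command line on whitespace; tr replaces spaces with underscores only so that the whitespace matches are visible.

```shell
line='ls   -la /tmp'

# \S+ yields the whitespace-free tokens; \s+ yields the gaps between them
tokens=$(printf '%s\n' "$line" | grep -oP '\S+')
spaces=$(printf '%s\n' "$line" | grep -oP '\s+' | tr ' ' '_')
echo "$tokens"   # ls, -la and /tmp
echo "$spaces"   # ___ and _
```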

Regex may look confusing at first, but after a little practice it turns out to be easy to use and very practical. It has a wide range of applications across many kinds of tasks, and I will continue to provide explanations and examples of its use in other contexts.

Conclusion

In conclusion, regular expressions (regex) are a powerful tool that can simplify and speed up many programming tasks. They let developers extract the information they need from large datasets, validate user input, and reformat data for its intended use. With the help of special characters, operations such as search and replace become fast and effective. We hope this article has given you a better understanding of regex and its applications in various programming languages and tools.