regex (regular expressions)

Posted on 2022-09-30 | Updated on 2024-11-01 | Category: develop | Word count: 2.1k | Reading time ≈ 2 min

resource

regex101: build, test, and debug regular expressions.

`?<=` and `?=`

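(?<=X): positive lookbehind. It asserts that the text immediately before the current position matches X. X is checked but is not included in the match.

(?=X): positive lookahead. It asserts that the text immediately after the current position matches X. X is likewise not included in the match.

Both are zero-width assertions and capture nothing on their own. Anything that gets captured comes from an ordinary group ( ) written inside them, as in the expression below. Many engines, including Python's re, also require the lookbehind pattern to have a fixed width.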

Regular expression:

(?<=(href=")).{1,200}(?=(">)): matches 1 to 200 characters that are preceded by href=" and followed by ">. Neither href=" nor "> is part of the match itself.

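Explanation:

(?<=(href=")): requires href=" immediately before the match. The inner parentheses form capture group 1, which stores the text href=" itself (not the link).
(?=(">)): requires "> immediately after the match. The inner parentheses form capture group 2, which stores the text ">.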
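A minimal sketch of this expression in Python's re module, using a made-up HTML snippet as input. It shows that the match itself is only the link, while the two groups hold the surrounding markers. Because the pattern contains groups, re.findall returns the groups rather than the matched link:

```python
import re

# Capturing groups inside the lookarounds: group 1 holds 'href="', group 2 holds '">'
pattern = re.compile(r'(?<=(href=")).{1,200}(?=(">))')

html = '<a href="https://example.com/page">link</a>'  # hypothetical sample input
m = pattern.search(html)

print(m.group(0))              # the match itself: https://example.com/page
print(m.groups())              # ('href="', '">')
# When a pattern has groups, findall returns the groups, not the whole match
print(pattern.findall(html))   # [('href="', '">')]
```

Note that .{1,200} is greedy. On a line with several links it can run past the first "> and stop at the last one within reach. The lazy form .{1,200}? stops at the first "> instead.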


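(?<=(?:X)) and (?=(?:X)): the same lookbehind and lookahead, but X is wrapped in a non-capturing group (?: ), so nothing is stored in a group.

Because lookarounds never capture on their own, (?<=href=") and (?=">) behave the same way. The (?:) wrapper is optional.

Regular expression: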

(?<=(?:href=")).{1,200}(?=(?:">))

Explanation:

(?<=(?:href=")): requires href=" immediately before the match, without storing it in a group.
(?=(?:">)): requires "> immediately after the match, without storing it in a group.
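The same made-up snippet run through the non-capturing version in Python. It shows that no groups are created, so re.findall now returns the matched links directly. The plain form without the (?:) wrapper gives the same result:

```python
import re

html = '<a href="https://example.com/page">link</a>'  # hypothetical sample input

# (?:...) inside the lookarounds: no capturing groups are created
non_capturing = re.compile(r'(?<=(?:href=")).{1,200}(?=(?:">))')
m = non_capturing.search(html)

print(m.group(0))                   # https://example.com/page
print(m.groups())                   # ()
# With no groups, findall returns the matched text itself
print(non_capturing.findall(html))  # ['https://example.com/page']

# Plain lookarounds without the wrapper behave identically
plain = re.compile(r'(?<=href=").{1,200}(?=">)')
print(plain.findall(html))          # ['https://example.com/page']
```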

Reference tutorial:

https://www.cnblogs.com/whaozl/p/5462865.html

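`^` and `$`

In a regular expression, ^ matches at the start of the string and $ matches at the end. Both are zero-width anchors. With a multiline flag (for example Python's re.MULTILINE), they also match at the start and end of each line.

For example, use ^hello to match a string that starts with hello, and world$ to match a string that ends with world.

Watch out for characters that need escaping. An unescaped . matches any single character. To match a string that starts with a literal ., use ^\., where the backslash escapes . into an ordinary character.

Some example patterns anchored to the start or end of a string: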
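A quick check with Python's re module shows why the escape matters. It compares ^\. with an unescaped ^.; the filenames are hypothetical:

```python
import re

# Escaped dot: the string must literally start with "."
print(bool(re.search(r'^\.', '.bashrc')))  # True
print(bool(re.search(r'^\.', 'bashrc')))   # False

# Unescaped dot: "." matches any single character (except newline),
# so this also matches a string that does not start with "."
print(bool(re.search(r'^.', 'bashrc')))    # True
```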

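^hello: matches strings that start with hello.
world$: matches strings that end with world.
^\d+: matches strings that start with one or more digits.
^[a-zA-Z]+: matches strings that start with a letter.
\.txt$: matches strings that end with .txt.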
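The anchored patterns above can be tried in Python with re.search. This is a small sketch; the test strings are hypothetical and chosen to show one hit and one miss where useful. The last lines show how re.MULTILINE makes ^ match at the start of every line:

```python
import re

# (pattern, hypothetical test string, expected result)
cases = [
    (r'^hello', 'hello world', True),
    (r'^hello', 'say hello', False),
    (r'world$', 'hello world', True),
    (r'^\d+', '2022-09-30', True),
    (r'^[a-zA-Z]+', '_private', False),   # "_" is not a letter
    (r'\.txt$', 'notes.txt', True),
    (r'\.txt$', 'notes.txt.bak', False),
]

for pattern, text, expected in cases:
    found = re.search(pattern, text) is not None
    print(pattern, repr(text), found)
    assert found == expected

# Without a flag, ^ anchors only at the start of the whole string
text = 'first line\nsecond line'
print(re.findall(r'^\w+', text))                # ['first']
print(re.findall(r'^\w+', text, re.MULTILINE))  # ['first', 'second']
```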

`(?<name>pattern)`

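A named capture group stores the captured text under a name rather than only a number. How you read it back depends on the language or library.

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}): stores the captured text in the groups year, month and day.

In Python, the re module writes named groups as (?P<name>...). The groupdict() method returns a dictionary whose keys are the group names and whose values are the captured text.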

```python
import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.match(pattern, '2022-01-01')

if match:
    groups = match.groupdict()
    print(groups['year'])   # prints: 2022
    print(groups['month'])  # prints: 01
    print(groups['day'])    # prints: 01
```
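Besides groupdict(), a Python Match object offers other ways to read a named group. The sketch below uses the same pattern and date string. Subscripting a match requires Python 3.6 or later. Named groups also keep their ordinary left-to-right numbers:

```python
import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.match(pattern, '2022-01-01')

if match:
    print(match.group('year'))  # 2022, group() accepts the group name
    print(match['month'])       # 01, subscripting by name (Python 3.6+)
    print(match.group(3))       # 01, "day" is still group number 3
    print(match.groupdict())    # {'year': '2022', 'month': '01', 'day': '01'}
```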

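In C++, the standard <regex> library (std::regex) does not support the named capture group syntax. A third-party library such as Boost.Regex does. The following example implements named capture groups in C++ with Boost.Regex: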

```cpp
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    std::string input = "2022-01-01";
    // In a C++ string literal, "\\d" yields the regex token \d
    boost::regex pattern("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
    boost::smatch match;

    // regex_match requires the whole input to match the pattern
    if (boost::regex_match(input, match, pattern)) {
        std::cout << match["year"] << std::endl;   // prints: 2022
        std::cout << match["month"] << std::endl;  // prints: 01
        std::cout << match["day"] << std::endl;    // prints: 01
    }

    return 0;
}
```
