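Introduction
The grep command is one of the most useful commands in a Linux terminal environment. The name grep stands for "global regular expression print". This means that you can use grep to check whether the input it receives matches a specified pattern. This seemingly trivial program is extremely powerful; its ability to filter input based on complex rules makes it a popular link in many command chains.
In this tutorial, you will explore the grep command's options, and then you will dive into using regular expressions to do more advanced searching.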
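Prerequisites
To follow along with this guide, you will need access to a computer running a Linux-based operating system. This can be either a virtual private server that you connect to with SSH or your local machine. Note that this tutorial was validated on a Linux server running Ubuntu 20.04, but the examples given should work on a computer running any version of any Linux distribution.
If you plan to use a remote server to follow this guide, we recommend that you first complete our Initial Server Setup guide. Doing so will set you up with a secure server environment, including a non-root user with sudo privileges and a firewall configured with UFW, which you can use to build your Linux skills.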
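Basic Usage
In this tutorial, you will use grep to search the GNU General Public License version 3 for various words and phrases.
If you are on an Ubuntu system, you can find the file in the /usr/share/common-licenses folder. Copy it to your current working directory (your home directory, if you have just logged in):
cp /usr/share/common-licenses/GPL-3 .
If you are on another system, use the curl command to download a copy:
curl -o GPL-3 https://www.gnu.org/licenses/gpl-3.0.txt
You will also use the BSD license file in this tutorial. On Ubuntu, you can copy it to the same directory with the following command:
cp /usr/share/common-licenses/BSD .
If you are on another system, create the file with the following command: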
cat << 'EOF' > BSD
Copyright (c) The Regents of the University of California.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
EOF
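Now that you have the files, you can start working with grep.
In its most basic form, you use grep to match literal patterns within a text file. This means that if you pass grep a word to search for, it will print out every line in the file containing that word.
Execute the following command to use grep to search for every line that contains the word GNU:
grep "GNU" GPL-3
The first argument, GNU, is the pattern you are searching for, while the second argument, GPL-3, is the input file you wish to search.
The resulting output will be every line containing the pattern text:
Output
GNU GENERAL PUBLIC LICENSE
The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
Developers that use the GNU GPL protect your rights with two steps:
"This License" refers to version 3 of the GNU General Public License.
13. Use with the GNU Affero General Public License.
under version 3 of the GNU Affero General Public License into a single
...
...
On some systems, the pattern you searched for will be highlighted in the output.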
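Because grep also reads from standard input, it slots into command chains as a filter. The following is a minimal sketch rather than part of the original walkthrough: the three-line sample text is made up for illustration, and the -q (quiet) option, which this tutorial does not otherwise cover, suppresses output so that only the exit status is used.

```shell
# Hypothetical sample text standing in for the output of another command.
sample=$(printf '%s\n' "GNU General Public License" "BSD License" "MIT License")

# With no file argument, grep filters whatever arrives on standard input.
gnu_lines=$(printf '%s\n' "$sample" | grep "GNU")
printf '%s\n' "$gnu_lines"

# grep exits with status 0 when at least one line matched and 1 when none did,
# so it can drive an if statement directly.
if printf '%s\n' "$sample" | grep -q "Apache"; then
    found_apache=yes
else
    found_apache=no
fi
printf 'Apache mentioned: %s\n' "$found_apache"
```

The exit status is what makes grep convenient in scripts: you can test for the presence of a pattern without parsing any output.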
Common Options
By default, grep searches the input for the exact pattern specified and prints the lines that match. You can make this behavior more useful by adding some optional flags to grep.
If you want grep to ignore the case of your search pattern and match both upper- and lower-case variations, you can specify the -i or --ignore-case option.
Use the following command to search the same file as before for every instance of the word license, whether in upper, lower, or mixed case:
grep -i "license" GPL-3
The results contain LICENSE, license, and License:
Output
GNU GENERAL PUBLIC LICENSE
of this license document, but changing it is not allowed.
The GNU General Public License is a free, copyleft license for
The licenses for most software and other practical works are designed
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price. Our General Public Licenses are designed to make sure that you
(1) assert copyright on the software, and (2) offer you this License
"This License" refers to version 3 of the GNU General Public License.
"The Program" refers to any copyrightable work licensed under this
...
...
If the file had contained an instance of LiCeNsE, that line would have been returned as well.
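To see the effect of -i in isolation, you can compare a case-sensitive and a case-insensitive search on a small file. This is only a sketch: the temporary file and its four lines are invented for the example.

```shell
# Build a small throwaway file (hypothetical content, not from GPL-3).
sample=$(mktemp)
printf '%s\n' "LICENSE terms" "a license file" "LiCeNsE oddity" "no match here" > "$sample"

# Without -i only the exact lower-case spelling matches.
exact=$(grep "license" "$sample")
# With -i every capitalization of the same letters matches.
any_case=$(grep -i "license" "$sample")

printf 'case-sensitive:\n%s\n' "$exact"
printf 'case-insensitive:\n%s\n' "$any_case"

rm -f "$sample"
```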
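If you want to find all lines that do not contain a specified pattern, you can use the -v or --invert-match option.
Use the following command to search for every line in the BSD license that does not contain the word "the":
grep -v "the" BSD
You will receive this output:
Output
All rights reserved.
Redistribution and use in source and binary forms, with or without
are met:
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...
Because you did not specify the "ignore case" option, the last two lines were returned even though they contain THE in upper case: the pattern only matches the lower-case string "the".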
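Options can be combined, so adding -i to the inverted search also filters out lines that contain THE or The. The sketch below uses a made-up three-line file to show the difference.

```shell
# Hypothetical three-line sample file.
sample=$(mktemp)
printf '%s\n' "the first line" "THE SECOND LINE" "a third line" > "$sample"

# Case-sensitive inversion: only lines containing lower-case "the" are dropped.
case_sensitive=$(grep -v "the" "$sample")
# Adding -i also drops the line containing "THE".
case_insensitive=$(grep -vi "the" "$sample")

printf 'grep -v:\n%s\n' "$case_sensitive"
printf 'grep -vi:\n%s\n' "$case_insensitive"

rm -f "$sample"
```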
It is often useful to know the line number on which a match occurs. You can do this with the -n or --line-number option. Re-run the previous example with this flag added:
grep -vn "the" BSD
This will return the following text:
Output
2:All rights reserved.
3:
4:Redistribution and use in source and binary forms, with or without
6:are met:
13:   may be used to endorse or promote products derived from this software
14:   without specific prior written permission.
15:
16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...
Now you can refer to a line by its number if you want to change any of the lines that do not contain "the". This is especially handy when working with source code.
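If you want to hand those line numbers to another tool, you can strip everything after the first colon. The following is one possible sketch, assuming a small invented file; cut is a separate standard utility and is not part of grep.

```shell
# Hypothetical four-line sample file.
sample=$(mktemp)
printf '%s\n' "alpha" "the beta" "gamma" "the delta" > "$sample"

# -n prefixes each selected line with its line number and a colon.
numbered=$(grep -vn "the" "$sample")
printf '%s\n' "$numbered"

# cut keeps only the field before the first colon, leaving the line numbers.
line_numbers=$(printf '%s\n' "$numbered" | cut -d: -f1)
printf '%s\n' "$line_numbers"

rm -f "$sample"
```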
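Regular Expressions
In the introduction, you learned that grep stands for "global regular expression print". A "regular expression" is a text string that describes a particular search pattern.
Different applications and programming languages implement regular expressions slightly differently. In this tutorial, you will explore only a small subset of the way that grep describes its patterns.
Literal Matches
In the earlier examples in this tutorial, when you searched for the words GNU and "the", you were actually searching for basic regular expressions that match the exact character strings GNU and "the". Patterns that exactly specify the characters to be matched are called "literals" because they match the pattern literally, character for character.
It is helpful to think of these as matching a string of characters rather than matching a word. This will become a more important distinction as you learn more complex patterns.
All alphabetical and numerical characters (as well as certain other characters) are matched literally unless modified by other expression mechanisms.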
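The difference between matching a string and matching a word shows up as soon as the string occurs inside a longer word. The sketch below uses an invented three-line file, and it relies on the -w (--word-regexp) option, which this tutorial does not cover; it is available in GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical sample file: "the" appears as a word and inside "other".
sample=$(mktemp)
printf '%s\n' "the license" "other terms" "no article" > "$sample"

# A literal pattern matches the character string anywhere, even inside a word.
substring_hits=$(grep "the" "$sample")
# -w only accepts matches that form a whole word.
whole_word_hits=$(grep -w "the" "$sample")

printf 'string match:\n%s\n' "$substring_hits"
printf 'word match:\n%s\n' "$whole_word_hits"

rm -f "$sample"
```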
Anchor Matches
Anchors are special characters that specify where in the line a match must occur to be valid.
For instance, using anchors, you can specify that you only want to know about the lines that match GNU at the very beginning of the line. To do this, you use the ^ anchor before the literal string.
Run the following command to search the GPL-3 file and find lines where GNU occurs at the very beginning of a line:
grep "^GNU" GPL-3
This command will return the following two lines:
Output
GNU General Public License for most of our software; it applies also to
GNU General Public License, you may choose any version ever published
Similarly, you use the $ anchor at the end of a pattern to indicate that the match is only valid if it occurs at the very end of a line.
This command will match every line in the GPL-3 file that ends with the string "and":
grep "and$" GPL-3
You will receive this output:
Output
that there is no warranty for this free software. For both users' and
The precise terms and conditions for copying, distribution and
License. Each licensee is addressed as "you". "Licensees" and
receive it, in any medium, provided that you conspicuously and
alternative is allowed only occasionally and noncommercially, and
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
provisionally, unless and until the copyright holder explicitly and
receives a license from the original licensors, to run, modify and
make, use, sell, offer for sale, import and otherwise run, modify and
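The two anchors can also be combined in one pattern. As a small sketch on an invented file, the following shows ^ and $ separately, both together to match a whole line, and the pattern ^$ to count blank lines. The -c option, which prints a count of matching lines instead of the lines themselves, is a standard grep option not otherwise used in this tutorial. Single quotes are used here so the shell never tries to interpret the $ character.

```shell
# Hypothetical sample file; the third line is empty.
sample=$(mktemp)
printf '%s\n' "GNU at the start" "ends with GNU" "" "GNU" > "$sample"

starts=$(grep '^GNU' "$sample")        # GNU at the beginning of the line
ends=$(grep 'GNU$' "$sample")          # GNU at the end of the line
whole=$(grep '^GNU$' "$sample")        # the line is exactly GNU
blank_count=$(grep -c '^$' "$sample")  # lines with nothing between start and end

printf 'starts:\n%s\n' "$starts"
printf 'ends:\n%s\n' "$ends"
printf 'whole line: %s\n' "$whole"
printf 'blank lines: %s\n' "$blank_count"

rm -f "$sample"
```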
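Matching Any Character
The period character (.) is used in regular expressions to mean that any single character can exist at the specified location.
For example, to match anything in the GPL-3 file that has any two characters followed by the string cept, you would use the following pattern:
grep "..cept" GPL-3
This command returns the following output:
Output
use, which is precisely where it is most unacceptable. Therefore, we
infringement under applicable copyright law, except executing it on a
tells the user that there is no warranty for the work (except to the
License by making exceptions from one or more of its conditions.
form of a separately written license, or stated as exceptions;
You may not propagate or modify a covered work except as expressly
9. Acceptance Not Required for Having Copies.
...
...
This output has instances of both accept and except, as well as variations of the two words. The pattern would also have matched z2cept if that string had been present in the file.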
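Each period stands for exactly one character, so a string with fewer than two characters before cept does not match. The following sketch checks this on a handful of invented one-word lines.

```shell
# Hypothetical sample file with one candidate string per line.
sample=$(mktemp)
printf '%s\n' "accept" "except" "z2cept" "cept" "xcept" > "$sample"

# Two periods require exactly two characters (of any kind) before "cept".
two_before=$(grep '..cept' "$sample")
printf '%s\n' "$two_before"

rm -f "$sample"
```

The lines cept and xcept are left out because they have zero and one character before cept, respectively.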
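Bracket Expressions
By placing a group of characters within brackets ([ and ]), you can specify that the character at that position can be any one character found within the bracket group.
For example, to find the lines that contain too or two, you would specify those variations succinctly by using the following pattern:
grep "t[wo]o" GPL-3
The output shows that both variations exist in the file. Because the pattern matches strings rather than words, it also matches inside longer words such as network and tools:
Output
your programs, too.
freedoms that you received. You must make sure that they, too, receive
Developers that use the GNU GPL protect your rights with two steps:
a computer network, with no transfer of a copy, is not conveying.
System Libraries, or general-purpose tools or generally available free
Corresponding Source from a network server at no charge.
...
...
Bracket notation gives you some interesting options. You can have the pattern match anything except the characters within a bracket by beginning the list of characters within the brackets with a ^ character.
This example is like the pattern .ode, but will not match the string code:
grep "[^c]ode" GPL-3
Here is the output you will receive:
Output
1. Source Code.
model, to give anyone who possesses the object code either (1) a
the only significant mode of use of the product.
notice like this when it starts in an interactive mode:
The first line is returned because a capital C is not the same character as a lower-case c. Notice also that the second line returned does, in fact, contain the word code. This is not a failure of the regular expression or of grep. Rather, the line was returned because earlier in the line the string mode, found within the word model, matched the pattern. Because the line contains at least one match, it is returned.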
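To see which part of a line actually satisfied the pattern, you can ask grep to print only the matched text. This sketch relies on the -o (--only-matching) option, which is not covered in this tutorial; it is available in GNU and BSD grep but is not part of POSIX. The input lines are copied from the output above.

```shell
# The second line from the output above, which contains both "model" and "code".
line="model, to give anyone who possesses the object code either (1) a"

# -o prints only the matched text: the match comes from "model", not "code".
matched=$(printf '%s\n' "$line" | grep -o '[^c]ode')
printf 'matched text: %s\n' "$matched"

# A line whose only candidate is lower-case "code" does not match at all.
if printf 'code\n' | grep -q '[^c]ode'; then
    only_code=match
else
    only_code=no-match
fi
printf 'line containing only "code": %s\n' "$only_code"

# A capital C is a different character from c, so "Code" does match.
capital=$(printf 'Source Code.\n' | grep -o '[^c]ode')
printf 'matched text: %s\n' "$capital"
```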
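Another helpful feature of brackets is that you can specify a range of characters instead of individually typing every available character.
This means that if you want to find every line that begins with a capital letter, you can use the following pattern:
grep "^[A-Z]" GPL-3
Here is the output this expression returns:
Output
GNU General Public License for most of our software; it applies also to
States should not allow patents to restrict development and use of
License. Each licensee is addressed as "you". "Licensees" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...
Because the characters covered by a range such as A-Z depend on the collation order of the current locale, it is often more reliable to use POSIX character classes instead of character ranges like the one you just used.
Discussing every POSIX character class is beyond the scope of this guide, but an example that accomplishes the same thing as the previous example uses the [:upper:] character class within a bracket expression:
grep "^[[:upper:]]" GPL-3
The output will be the same as before.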
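Other POSIX classes follow the same double-bracket form, for example [:digit:] for digits. The sketch below, which uses an invented three-line file, also shows one way to make a range predictable: setting LC_ALL=C for the single command so that A-Z covers exactly the 26 ASCII capital letters.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf '%s\n' "Upper start" "lower start" "9 digits first" > "$sample"

# The class name goes inside its own brackets, within the bracket expression.
upper_lines=$(grep '^[[:upper:]]' "$sample")
digit_lines=$(grep '^[[:digit:]]' "$sample")

# Forcing the C locale for this one command pins down what the range means.
c_range_lines=$(LC_ALL=C grep '^[A-Z]' "$sample")

printf 'upper: %s\n' "$upper_lines"
printf 'digit: %s\n' "$digit_lines"
printf 'C-locale range: %s\n' "$c_range_lines"

rm -f "$sample"
```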
Repeat Pattern Zero or More Times
Finally, one of the most commonly used meta-characters is the asterisk, or *, which means "repeat the previous character or expression zero or more times".
To find each line in the GPL-3 file that contains an opening and closing parenthesis with only letters and spaces in between, use the following expression:
grep "([A-Za-z ]*)" GPL-3
You will get the following output:
Output
Copyright (C) 2007 Free Software Foundation, Inc.
distribution (with or without modification), making available to the
than the work as a whole, that (a) is included in the normal form of
Component, and (b) serves only to enable use of the work with that
(if any) on which the executable work runs, or a compiler used to
(including a physical distribution medium), accompanied by the
(including a physical distribution medium), accompanied by a
place (gratis or for a charge), and offer equivalent access to the
...
...
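The "zero" in "zero or more" matters: the same pattern also matches an empty pair of parentheses. The following sketch demonstrates this on an invented four-line file. Note that in basic regular expressions, which grep uses by default, unescaped parentheses are ordinary literal characters.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf '%s\n' "empty () pair" "(two words) here" "(a1) has a digit" "no parens" > "$sample"

# [A-Za-z ]* may match nothing at all, so "()" qualifies; "(a1)" does not,
# because the digit is neither a letter nor a space.
hits=$(grep '([A-Za-z ]*)' "$sample")
printf '%s\n' "$hits"

rm -f "$sample"
```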
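So far you have used periods, asterisks, and other characters in your expressions, but sometimes you need to search for those characters specifically.
Escaping Meta-Characters
There are times when you will need to search for a literal period or a literal opening bracket, especially when working with source code or configuration files. Because these characters have special meaning in regular expressions, you need to "escape" them to tell grep that you do not wish to use their special meaning in this case.
You escape a character by placing a backslash (\) in front of the character that would normally have a special meaning.
For instance, to find any line that begins with a capital letter and ends with a period, use the following expression, which escapes the ending period so that it represents a literal period instead of the usual "any character" meaning:
grep "^[A-Z].*\.$" GPL-3
This is the output you will see:
Output
Source.
License by making exceptions from one or more of its conditions.
License would be to refrain entirely from conveying the Program.
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
SUCH DAMAGES.
Also add information on how to contact you by electronic and paper mail.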
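The effect of escaping is easiest to see when the unescaped pattern matches more than intended. In this sketch on an invented file, the period in 1.5 first acts as a wildcard and is then escaped. The last command uses the -F option, a standard grep option not covered in this tutorial that treats the whole pattern as a fixed string, which can be simpler than escaping when you need no meta-characters at all.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf '%s\n' "version 1.5" "version 105" "version 1x5" > "$sample"

# Unescaped, the period matches any character, so all three lines match.
unescaped_count=$(grep -c '1.5' "$sample")
# Escaped, the period only matches a literal period.
escaped=$(grep '1\.5' "$sample")
# -F disables regular expression syntax for the entire pattern.
fixed=$(grep -F '1.5' "$sample")

printf 'unescaped matches: %s\n' "$unescaped_count"
printf 'escaped: %s\n' "$escaped"
printf 'fixed string: %s\n' "$fixed"

rm -f "$sample"
```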
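Now let's look at other regular expression options.
Extended Regular Expressions
The grep command supports a more extensive regular expression language when you use the -E flag or call the egrep command instead of grep. Note that recent GNU grep releases treat egrep as obsolescent and print a warning when it is used, so grep -E is the safer choice.
These options open up the capabilities of "extended regular expressions". Extended regular expressions include all of the basic meta-characters, along with additional meta-characters to express more complex matches.
Grouping
One of the most useful abilities that extended regular expressions open up is the ability to group expressions together so that they can be manipulated or referenced as one unit.
To group expressions together, wrap them in parentheses. If you would like to use parentheses for grouping without using extended regular expressions, you can escape them with a backslash to enable this functionality. This means that the following three expressions are functionally equivalent:
grep "\(grouping\)" file.txt
grep -E "(grouping)" file.txt
egrep "(grouping)" file.txt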
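Grouping becomes useful once something is applied to the group as a whole. As an illustrative sketch on an invented file, the patterns below apply * to the group ab and anchor the result to the whole line, once in basic syntax and once in extended syntax.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf '%s\n' "abab" "ab" "aabb" > "$sample"

# The asterisk applies to the whole group, so the line must consist of
# repetitions of "ab". Basic syntax needs backslashes; extended does not.
basic=$(grep '^\(ab\)*$' "$sample")
extended=$(grep -E '^(ab)*$' "$sample")

printf 'basic:\n%s\n' "$basic"
printf 'extended:\n%s\n' "$extended"

rm -f "$sample"
```

Both commands select the same lines; aabb is excluded because it is not made up of repeated ab units.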
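Alternation
Similar to how bracket expressions can specify different possible choices for single character matches, alternation allows you to specify alternative matches for strings or expression sets.
To indicate alternation, use the pipe character (|). Alternatives are often placed within a parenthetical group to specify that any one of two or more possibilities should be considered a match.
The following will find either GPL or General Public License in the text:
grep -E "(GPL|General Public License)" GPL-3
The output looks like this:
Output
The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price. Our General Public Licenses are designed to make sure that you
Developers that use the GNU GPL protect your rights with two steps:
For the developers' and authors' protection, the GPL clearly explains
authors' sake, the GPL requires that modified versions be marked as
have designed this version of the GPL to prohibit the practice for those
...
...
Alternation can select between more than two choices by adding additional choices within the selection group, separated by additional pipe (|) characters.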
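The following sketch shows a three-way alternation, and also how the surrounding group limits the alternation so that literal text outside the parentheses applies to every choice. The sample file is invented for the example.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf '%s\n' "released under the GPL" "a BSD license" "the MIT license" "public domain" > "$sample"

# Any one of three alternatives counts as a match.
licensed=$(grep -E '(GPL|BSD|MIT)' "$sample")
# The literal "the " outside the group must precede whichever choice matches.
scoped=$(grep -E 'the (GPL|MIT)' "$sample")

printf 'three-way alternation:\n%s\n' "$licensed"
printf 'with a literal prefix:\n%s\n' "$scoped"

rm -f "$sample"
```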
Quantifiers
Like the * meta-character, which matches the previous character or character set zero or more times, there are other meta-characters available in extended regular expressions that specify the number of occurrences.
To match a character zero or one times, you can use the ? character. In essence, this makes the character or group that comes before it optional.
The following matches copyright and right by putting copy in an optional group:
grep -E "(copy)?right" GPL-3
You will receive this output:
Output
Copyright (C) 2007 Free Software Foundation, Inc.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
"Copyright" also means copyright-like laws that apply to other kinds of
...
The + character matches an expression one or more times. This is almost like the * meta-character, but with the + character, the expression must match at least once.
The following expression matches the string free plus one or more characters that are not white space characters:
grep -E "free[^[:space:]]+" GPL-3
You will see this output:
Output
The GNU General Public License is a free, copyleft license for
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
When we speak of free software, we are referring to freedom, not
have the freedom to distribute copies of free software (and charge for
you modify it: responsibilities to respect the freedom of others.
freedoms that you received. You must make sure that they, too, receive
protecting users' freedom to change the software. The systematic
of the GPL, as needed to protect the freedom of users.
patents cannot be used to render the program non-free.
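Putting ?, * and + side by side on the same input makes the difference in their minimum counts visible. This is a sketch on an invented five-line file; the -c option, which prints a count of matching lines, is a standard grep option not otherwise used in this tutorial.

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf '%s\n' "right" "copyright" "free" "freedom" "free software" > "$sample"

# ? makes the group optional: both "right" and "copyright" match.
optional_count=$(grep -Ec '(copy)?right' "$sample")
# * accepts zero extra characters, so every line containing "free" matches.
star_count=$(grep -c 'free[^[:space:]]*' "$sample")
# + needs at least one non-space character directly after "free".
plus_lines=$(grep -E 'free[^[:space:]]+' "$sample")

printf 'optional group: %s lines\n' "$optional_count"
printf 'star: %s lines\n' "$star_count"
printf 'plus: %s\n' "$plus_lines"

rm -f "$sample"
```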
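Specifying Match Repetition
To specify the number of times that a match is repeated, use the brace characters ({ and }). These characters let you specify an exact number, a range, or an upper or lower bound on the number of times that an expression can match.
Use the following expression to find all of the lines in the GPL-3 file that contain three consecutive vowels:
grep -E "[AEIOUaeiou]{3}" GPL-3
Each line returned has a word with three consecutive vowels:
Output
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
receive it, in any medium, provided that you conspicuously and
give under the previous paragraph, plus a right to possession of the
covered work so as to satisfy simultaneously your obligations under this
To match any words that have between 16 and 20 characters, use the following expression:
grep -E "[[:alpha:]]{16,20}" GPL-3
Here is this command's output:
Output
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
c) Prohibiting misrepresentation of the origin of that material, or
Only lines containing words of that length are displayed. Strictly speaking, the pattern matches any run of at least 16 letters, because nothing in it prevents a longer word from matching on 16 to 20 of its letters.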
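The three interval forms, and the effect of leaving a pattern unanchored, can be compared on a small invented file. This sketch uses the -x option, a standard grep option not covered in this tutorial that requires the pattern to match the entire line, together with -c to count matching lines.

```shell
# Hypothetical sample file: runs of 2, 3, 4 and 6 letters.
sample=$(mktemp)
printf '%s\n' "aa" "aaa" "aaaa" "aaaaaa" > "$sample"

exactly_three=$(grep -Ex 'a{3}' "$sample")          # exact count
at_least_three=$(grep -Ecx 'a{3,}' "$sample")       # lower bound only
three_to_four=$(grep -Ecx 'a{3,4}' "$sample")       # range, whole line
unanchored_range=$(grep -Ec 'a{3,4}' "$sample")     # range, anywhere in line

printf 'exactly three: %s\n' "$exactly_three"
printf 'at least three: %s lines\n' "$at_least_three"
printf 'three to four, whole line: %s lines\n' "$three_to_four"
printf 'three to four, unanchored: %s lines\n' "$unanchored_range"

rm -f "$sample"
```

Without -x, the six-letter line still matches the {3,4} range, because part of the longer run satisfies the pattern.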
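Conclusion
grep is useful for finding patterns within files or within the file system hierarchy, so it is worth spending time getting comfortable with its options and syntax.
Regular expressions are even more versatile, and can be used with many popular programs. For instance, many text editors implement regular expressions for searching and replacing text.
Furthermore, most modern programming languages provide regular expressions for searching, validating, and transforming text. Once you understand regular expressions, you will be able to transfer that knowledge to many common computer-related tasks, from performing advanced searches in your text editor to validating user input.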