推荐10个网站,保定清苑城市建设网站,各大网站博客怎么做推广,1简述网站建设流程图目录结构 注#xff1a;提前言明 本文借鉴了以下博主、书籍或网站的内容#xff0c;其列表如下#xff1a; 1、参考书籍#xff1a;《PostgreSQL数据库内核分析》 2、参考书籍#xff1a;《数据库事务处理的艺术#xff1a;事务管理与并发控制》 3、PostgreSQL数据库仓库… 目录结构 注提前言明 本文借鉴了以下博主、书籍或网站的内容其列表如下 1、参考书籍《PostgreSQL数据库内核分析》 2、参考书籍《数据库事务处理的艺术事务管理与并发控制》 3、PostgreSQL数据库仓库链接点击前往 4、日本著名PostgreSQL数据库专家 铃木启修 网站主页点击前往 5、参考书籍《PostgreSQL中文手册》 6、参考书籍《PostgreSQL指南内幕探索》点击前往 7、参考书籍《事务处理 概念与技术》 8、PgSQL · 性能优化 · PostgreSQL TPC-C极限优化玩法点击前往 9、https://www.enterprisedb.com/blog/finding-memory-leaks-postgres-c-code点击前往 1、本文内容全部来源于开源社区 GitHub和以上博主的贡献本文也免费开源可能会存在问题评论区等待大佬们的指正 2、本文目的开源共享 抛砖引玉 一起学习 3、本文不提供任何资源 不存在任何交易 与任何组织和机构无关 4、大家可以根据需要自行 复制粘贴以及作为其他个人用途但是不允许转载 不允许商用 写作不易还请见谅 5、本文内容基于PostgreSQL 16.2源码开发而成 深入理解PostgreSQL数据库之查找 PostgreSQL C 代码中的内存泄漏 文章快速说明索引工具使用背景说明ValgrindLeaks in ArenasSetupIntroducing a leakSet up Valgrind wrapperAddressSanitizer, LeakSanitizerPostgres and AddressSanitizereBPF and bcc tools 文章快速说明索引
学习目标
做数据库内核开发久了就会有一种 少年得志年少轻狂 的错觉然鹅细细一品觉得自己其实不算特别优秀 远远没有达到自己想要的。也许光鲜的表面掩盖了空洞的内在每每想到于此皆有夜半临渊如履薄冰之感。为了睡上几个踏实觉即日起 暂缓其他基于PostgreSQL数据库的兼容功能开发近段时间 将着重于学习分享Postgres的基础知识和实践内幕。 学习内容详见目录
1、深入理解PostgreSQL数据库之查找 PostgreSQL C 代码中的内存泄漏 学习时间
2024-04-02 10:44:15 星期二 学习产出
1、PostgreSQL数据库基础知识回顾 1个 2、技术博客 1篇 3、PostgreSQL数据库内核深入学习 注下面我们所有的学习环境是Centos8PostgreSQL16.2Oracle19CMySQL8.0
postgres# select version();version
------------------------------------------------------------------------------------------------------------PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-21), 64-bit
(1 row)postgres##-----------------------------------------------------------------------------#SQL select * from v$version; BANNER Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
BANNER_FULL Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production Version 19.17.0.0.0
BANNER_LEGACY Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
CON_ID 0#-----------------------------------------------------------------------------#mysql select version();
-----------
| version() |
-----------
| 8.0.27 |
-----------
1 row in set (0.06 sec)mysql工具使用背景说明
上周我在 Postgres 的 WAL Sender 进程中寻找内存泄漏问题。我花了几天时间更加熟悉 Valgrind 和 gcc/clang sanitizers但最终对它们毫无用处。最后我偶然发现了 bcc 工具集合中的 memleak 程序它引导我找到了源代码。
由于我第一次弄清楚这一切有点困难所以我想分享我经历的过程。处理一些人为的内存泄漏。
尽管这种情况发生在 Postgres 中并且这篇文章介绍了 Postgres 中的漏洞以便我们进行调查但这些技术非常有用。
具体来说在本文结束时我们将能够获得一个很好的堆栈跟踪从而导致正在运行的程序发生泄漏而无需对所需的程序进行任何修改。就像这样的 135168 bytes in 1 allocations from stack0x0000ffff7fb7ede8 sysmalloc_mmap.isra.00x74 [libc.so.6]0x0000ffff7fb7fc40 sysmalloc0x200 [libc.so.6]0x0000ffff7fb80fbc _int_malloc0xdfc [libc.so.6]0x0000ffff7fb81c50 __malloc0x22c [libc.so.6]0x0000fffffffff000 [unknown] [[uprobes]]0x0000000000927714 palloc0x40 [postgres]0x00000000007df2f4 PostgresMain0x1ac [postgres]0x0000000000755f34 report_fork_failure_to_client0x0 [postgres]0x0000000000757e94 BackendStartup0x1b8 [postgres]0x000000000075805c ServerLoop0xfc [postgres]0x00000000007592f8 PostmasterMain0xed4 [postgres]0x00000000006927d8 main0x230 [postgres]0x0000ffff7fb109dc __libc_start_call_main0x7c [libc.so.6]0x0000ffff7fb10ab0 __libc_start_mainGLIBC_2.170x9c [libc.so.6]0x00000000004912f0 _start0x30 [postgres]Valgrind
计算机科学专业的学生通常会学习 Valgrind 来调试程序中的内存泄漏。但 Valgrind 主要在程序结束时检查泄漏而不是在程序期间。
例如在这个简单的、有漏洞的 main.c 中
#include stdio.h
#include string.hint main() {char* x strdup(flubber);printf(%s\n, x);return 1;
}我们可以使用 Valgrind 构建并运行它会告诉我们有关泄漏的信息
$ gcc main.c
$ valgrind --leak-checkfull ./a.out
203008 Memcheck, a memory error detector
203008 Copyright (C) 2002-2022, and GNU GPLd, by Julian Seward et al.
203008 Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
203008 Command: ./a.out
203008
flubber
203008
203008 HEAP SUMMARY:
203008 in use at exit: 8 bytes in 1 blocks
203008 total heap usage: 2 allocs, 1 frees, 1,032 bytes allocated
203008
203008 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
203008 at 0x48C0860: malloc (vg_replace_malloc.c:442)
203008 by 0x49A8FB3: strdup (strdup.c:42)
203008 by 0x4101FB: main (in /home/phil/tmp/a.out)
203008
203008 LEAK SUMMARY:
203008 definitely lost: 8 bytes in 1 blocks
203008 indirectly lost: 0 bytes in 0 blocks
203008 possibly lost: 0 bytes in 0 blocks
203008 still reachable: 0 bytes in 0 blocks
203008 suppressed: 0 bytes in 0 blocks
203008
203008 For lists of detected and suppressed errors, rerun with: -s
203008 ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)We can add in the correct call to free(x):
#include stdio.h
#include string.h
#include stdlib.hint main() {char* x strdup(flubber);printf(%s\n, x);free(x);return 1;
}重建并再次使用 Valgrind 运行
$ gcc main.c
$ valgrind --leak-checkfull ./a.out
203099 Memcheck, a memory error detector
203099 Copyright (C) 2002-2022, and GNU GPLd, by Julian Seward et al.
203099 Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
203099 Command: ./a.out
203099
flubber
203099
203099 HEAP SUMMARY:
203099 in use at exit: 0 bytes in 0 blocks
203099 total heap usage: 2 allocs, 2 frees, 1,032 bytes allocated
203099
203099 All heap blocks were freed -- no leaks are possible
203099
203099 For lists of detected and suppressed errors, rerun with: -s
203099 ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)Valgrind 告诉我们确实没有泄漏。 Leaks in Arenas
但管理内存的方法有很多种。Postgres 代码通常不会直接调用 malloc。它调用 palloc 在 MemoryContext 中分配内存在其他代码库中有时称为 arena。MemoryContext 是一块空闲内存palloc 从中获取内存。然后在某个时刻它会被一次性释放。
问题是在 MemoryContext 的生命周期内可能会发生泄漏。Valgrind 无法轻易告诉您这些泄露的情况。
Postgres 确实有一个有用的内置方法来转储内存上下文信息。例如您可以将此行添加到任何函数或者您可以从 gdb 执行此行
MemoryContextStats(TopMemoryContext);您将获得有关内存上下文和所有子内存上下文中的分配的信息并将其写入 stderr。通过观察 MemoryContextStats 随着时间的推移打印的分配量的增加并通过添加额外的内存上下文您也许能够发现泄漏的位置。
但显式添加此调用或通过 gdb 添加此调用是相当手动的。如果有一些系统可以自动告诉我们内存分配的增长情况那就方便多了。
让我们在 Postgres 中引入一个泄漏看看我们能做些什么来发现它重现一个我们不知道这个泄漏在哪里的情况。
Setup
首先获取 Postgres签出版本 16.2 并构建它
$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_16_2
$ ./configure \CFLAGS-ggdb -Og -g3 -fno-omit-frame-pointer -DUSE_VALGRIND \--prefix$(pwd)/build \--libdir$(pwd)/build/lib \--enable-debug
$ make make install这些标志启用 Valgrind、启用调试模式并将构建设置为本地安装而不是全局安装。构建完成后您可以创建一个 Postgres 数据库
$ ./build/bin/initdb test-db
The files belonging to this database system will be owned by user phil.
This user must also own the server process.The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to english.Data page checksums are disabled.creating directory test-db ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... okinitdb: warning: enabling trust authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.Success. You can now start the database server using:build/bin/pg_ctl -D test-db -l logfile start然后在前台启动Postgres服务器
$ ./build/bin/postgres --config-file$(pwd)/test-db/postgresql.conf -D $(pwd)/test-db -k $(pwd)/test-db
2024-03-21 16:08:10.984 EDT [213083] LOG: starting PostgreSQL 16.2 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), 64-bit
2024-03-21 16:08:10.986 EDT [213083] LOG: listening on IPv6 address ::1, port 5432
2024-03-21 16:08:10.986 EDT [213083] LOG: listening on IPv4 address 127.0.0.1, port 5432
2024-03-21 16:08:10.987 EDT [213083] LOG: listening on Unix socket /home/phil/tmp/postgres/test-db/.s.PGSQL.5432
2024-03-21 16:08:10.989 EDT [213086] LOG: database system was shut down at 2024-03-21 16:07:20 EDT
2024-03-21 16:08:10.992 EDT [213083] LOG: database system is ready to accept connections在第二个终端中使用我们构建的 psql 副本连接到它
./build/bin/psql -h localhost postgres
psql (16.2)
Type help for help.postgres# select 1;?column?
----------1
(1 row)现在我们来介绍一下泄漏。 Introducing a leak
src/backend/tcop/postgres.c 中的 PostgresMain() 函数有一个看起来很诱人的巨大循环。 4417 /*4418 * Non-error queries loop here.4419 */44204421 for (;;)4422 {4423 int firstchar;4424 StringInfoData input_message;44254426 #define CHUNKSIZE 40964427 char* waste palloc(CHUNKSIZE);4428 printf(%d from %d, CHUNKSIZE, getpid());让我们做出这样的改变
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..28d08f3b11 100644
--- a/src/backend/tcop/postgres.cb/src/backend/tcop/postgres.c-4423,6 4423,10 PostgresMain(const char *dbname, const char *username)int firstchar;StringInfoData input_message; #define CHUNKSIZE 4096char* waste palloc(CHUNKSIZE);printf(%d from %d\n, CHUNKSIZE, getpid());
/** At top of loop, reset extended-query-message flag, so that any* errors encountered in idle state dont provoke skip.由于我们在主 Postgres 循环中进行分配因此每次单个连接的客户端在同一连接上发送一条语句时我们都会分配一个新的 4KB 块。
由于我们使用 palloc所有内存都将在进程结束时以 Valgrind 认为有效的方式释放。但在单个客户端的生命周期内处理连接的 Postgres 后端进程将随着客户端发送语句而不断增加内存。
这是内存泄漏在极端情况下如果客户端发送 100 万条语句我们将在单个进程中分配 4KB * 1,000,000 4GB并且在进程退出之前不会释放它。这是一个有点人为的案例但这种情况确实发生了。
Set up Valgrind wrapper
让我们用 diff 来重建记住 Valgrind 已经启用了。
$ make make install我们仍然需要将 Postgres 包装在 Valgrind 中。Tom Lane 在 Postgres 邮件列表上分享了他的包装脚本我已经对其进行了改编。将其添加到 Postgres 存储库根目录中的 postgres.valgrind 中
#!/usr/bin/env bashmkdir -p $HOME/valgrind-logsSCRIPT_DIR$( cd -- $( dirname -- ${BASH_SOURCE[0]} ) /dev/null pwd )exec valgrind \--quiet \--show-leak-kindsall \--track-originsyes \--verbose \--suppressions$(realpath $SCRIPT_DIR/../../)/src/tools/valgrind.supp \--time-stampyes --error-exitcode128 --trace-childrenyes \--log-file$HOME/valgrind-logs/%p.log \$SCRIPT_DIR/postgres.orig $旁注我也尝试使用 --leak-checkfull 标志运行这个 Valgrind 脚本但是当我有这个标志时Postgres 总是会崩溃。我不知道为什么。
现在将当前的 Postgres 二进制文件重命名为 postgres.orig并将此脚本复制为新的 Postgres 二进制文件
$ mv build/bin/postgres build/bin/postgres.orig
$ chmod x postgres.valgrind
$ cp postgres.valgrind build/bin/postgres启动服务器无需重新initdb
$ ./build/bin/postgres --config-file$(pwd)/test-db/postgresql.conf -D $(pwd)/test-db -k $(pwd)/test-db
2024-03-21 16:18:28.233 EDT [216336] LOG: starting PostgreSQL 16.2 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), 64-bit
2024-03-21 16:18:28.234 EDT [216336] LOG: listening on IPv6 address ::1, port 5432
2024-03-21 16:18:28.234 EDT [216336] LOG: listening on IPv4 address 127.0.0.1, port 5432
2024-03-21 16:18:28.234 EDT [216336] LOG: listening on Unix socket /home/phil/tmp/postgres/test-db/.s.PGSQL.5432
2024-03-21 16:18:28.236 EDT [216336] LOG: database system was shut down at 2024-03-21 16:18:00 EDT
2024-03-21 16:18:28.238 EDT [216336] LOG: database system is ready to accept connections并像以前一样连接 psql
$ ./build/bin/psql -h localhost postgres
psql (16.2)
Type help for help.postgres#您将立即在 Postgres 服务器进程中看到一条日志行
4096 from 216344这就是我们引入的泄漏。如果我们在同一个 psql 进程中多次运行 SELECT 1
$ ./build/bin/psql -h localhost postgres
psql (16.2)
Type help for help.postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)我们将在服务器日志中看到相同数量的打印语句
... omitted ...
2024-03-21 16:18:28.238 EDT [216336] LOG: database system is ready to accept connections
4096 from 216344
4096 from 216344
4096 from 216344
4096 from 216344
4096 from 216344现在如果我们在服务器进程上按 Ctrl-c我们就可以查看 Valgrind 日志。
$ cat ~/valgrind-logs/*
00:00:00:00.000 216336 Memcheck, a memory error detector
00:00:00:00.000 216336 Copyright (C) 2002-2022, and GNU GPLd, by Julian Seward et al.
00:00:00:00.000 216336 Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
00:00:00:00.000 216336 Command: /home/phil/tmp/postgres/build/bin/postgres.orig --config-file/home/phil/tmp/postgres/test-db/postgresql.conf -D /home/phil/tmp/postgres/test-db -k /home/phil/tmp/postgres/test-db
00:00:00:00.000 216336 Parent PID: 143066
00:00:00:00.000 216336
00:00:00:14.925 216336
00:00:00:14.925 216336 HEAP SUMMARY:
00:00:00:14.925 216336 in use at exit: 218,493 bytes in 75 blocks
00:00:00:14.925 216336 total heap usage: 1,379 allocs, 353 frees, 1,325,653 bytes allocated
00:00:00:14.925 216336
00:00:00:14.926 216336 LEAK SUMMARY:
00:00:00:14.926 216336 definitely lost: 410 bytes in 5 blocks
00:00:00:14.926 216336 indirectly lost: 3,331 bytes in 45 blocks
00:00:00:14.926 216336 possibly lost: 102,984 bytes in 8 blocks
00:00:00:14.926 216336 still reachable: 47,148 bytes in 182 blocks
00:00:00:14.926 216336 suppressed: 0 bytes in 0 blocks
00:00:00:14.926 216336 Rerun with --leak-checkfull to see details of leaked memory
00:00:00:14.926 216336
00:00:00:14.926 216336 For lists of detected and suppressed errors, rerun with: -s
00:00:00:14.926 216336 ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)结果发现有410个字节肯定丢失了。无法解释每个客户端语句丢失的 4096*5 20,480 字节的累积损失。
而且Valgrind 非常慢。Valgrind 手册说 您的程序将比正常运行速度慢得多例如 20 到 30 倍并且使用更多内存。 我们必须尝试其他方法。 AddressSanitizer, LeakSanitizer
Valgrind 自 2002 年就已存在。但在 2013 年左右Google 直接向 clang 和后来的 gcc 贡献了一些类似的功能。历史对我来说有点难以理解。AddressSanitizer 和 LeakSanitizer 可能曾经是独立的但现在 AddressSanitizer 包含 LeakSanitizer。要获得sanitizer您只需使用 -fsanitizeaddress 编译代码即可。
让我们回到那个简单的、有漏洞的 main.c 程序
$ cat main.c
#include stdio.h
#include string.h
#include stdlib.hint main() {char* x strdup(flubber);printf(%s\n, x);return 1;
}并使用 sanitizers 编译并运行
$ gcc -fsanitizeaddress main.c
$ ./a.out
flubber223338ERROR: LeakSanitizer: detected memory leaksDirect leak of 8 byte(s) in 1 object(s) allocated from:#0 0xffff90c817d4 in __interceptor_strdup (/lib64/libasan.so.80x817d4) (BuildId: 3109905b64795755dad05d7bb29ad23633a06660)#1 0x410238 in main (/home/phil/tmp/a.out0x410238) (BuildId: e98d3e4308c692d5fedef8494d1a925d8f40759a)#2 0xffff90a609d8 in __libc_start_call_main (/lib64/libc.so.60x309d8) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#3 0xffff90a60aac in __libc_start_mainGLIBC_2.34 (/lib64/libc.so.60x30aac) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#4 0x41012c in _start (/home/phil/tmp/a.out0x41012c) (BuildId: e98d3e4308c692d5fedef8494d1a925d8f40759a)SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).嘿那太完美了。甚至比使用 Valgrind 和需要包装器更简单。泄漏检测直接内置于二进制文件中。
让我们用 Postgres 试试这个。 Postgres and AddressSanitizer
回到我们之前克隆的 Postgres git 存储库请记住我们应用了这个 diff它将在每个客户端语句中泄漏内存。
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 36cc99ec9c..28d08f3b11 100644
--- a/src/backend/tcop/postgres.cb/src/backend/tcop/postgres.c-4423,6 4423,10 PostgresMain(const char *dbname, const char *username)int firstchar;StringInfoData input_message; #define CHUNKSIZE 4096char* waste palloc(CHUNKSIZE);printf(%d from %d\n, CHUNKSIZE, getpid());
/** At top of loop, reset extended-query-message flag, so that any* errors encountered in idle state dont provoke skip.再次运行 make clean 和 autoconf但这次在 CFLAGS 值中使用 -fsanitizeaddress 。然后重建。
$ make clean
$ ./configure \CFLAGS-ggdb -Og -g3 -fno-omit-frame-pointer -fsanitizeaddress \--prefix$(pwd)/build \--libdir$(pwd)/build/lib \--enable-debug
$ make make install现在再次运行 Postgres 服务器
./build/bin/postgres --config-file$(pwd)/test-db/postgresql.conf -D $(pwd)/test-db -k $(pwd)/test-db
2024-03-22 10:31:54.724 EDT [233257] LOG: starting PostgreSQL 16.2 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), 64-bit
2024-03-22 10:31:54.725 EDT [233257] LOG: listening on IPv6 address ::1, port 5432
2024-03-22 10:31:54.725 EDT [233257] LOG: listening on IPv4 address 127.0.0.1, port 5432
2024-03-22 10:31:54.727 EDT [233257] LOG: listening on Unix socket /home/phil/tmp/postgres/test-db/.s.PGSQL.5432
2024-03-22 10:31:54.731 EDT [233260] LOG: database system was interrupted; last known up at 2024-03-22 10:28:56 EDT
2024-03-22 10:31:54.740 EDT [233260] LOG: database system was not properly shut down; automatic recovery in progress
2024-03-22 10:31:54.741 EDT [233260] LOG: invalid record length at 0/1531B08: expected at least 24, got 0
2024-03-22 10:31:54.741 EDT [233260] LOG: redo is not required
2024-03-22 10:31:54.742 EDT [233258] LOG: checkpoint starting: end-of-recovery immediate wait
2024-03-22 10:31:54.793 EDT [233258] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write0.050 s, sync0.001 s, total0.052 s; sync files2, longest0.001 s, average0.001 s; distance0 kB, estimate0 kB; lsn0/1531B08, redo lsn0/1531B08
233260ERROR: LeakSanitizer: detected memory leaks
Direct leak of 328 byte(s) in 1 object(s) allocated from:#0 0xffffaa8e57b0 in malloc (/lib64/libasan.so.80xc57b0) (BuildId: 3109905b64795755dad05d7bb29ad23633a06660)#1 0x10104ac in save_ps_display_args /home/phil/tmp/postgres/src/backend/utils/misc/ps_status.c:165#2 0x9aed7c in main /home/phil/tmp/postgres/src/backend/main/main.c:91#3 0xffffa9fb09d8 in __libc_start_call_main (/lib64/libc.so.60x309d8) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#4 0xffffa9fb0aac in __libc_start_mainGLIBC_2.34 (/lib64/libc.so.60x30aac) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#5 0x4a15ac in _start (/home/phil/tmp/postgres/build/bin/postgres0x4a15ac) (BuildId: 4cede14f9b35edbb8bf598d2efbc883c4ace6797)SUMMARY: AddressSanitizer: 328 byte(s) leaked in 1 allocation(s).
2024-03-22 10:31:54.831 EDT [233257] LOG: startup process (PID 233260) exited with exit code 1
2024-03-22 10:31:54.831 EDT [233257] LOG: terminating any other active server processes
2024-03-22 10:31:54.831 EDT [233257] LOG: shutting down due to startup process failure
2024-03-22 10:31:54.832 EDT [233257] LOG: database system is shut down
233257ERROR: LeakSanitizer: detected memory leaksDirect leak of 328 byte(s) in 1 object(s) allocated from:#0 0xffffaa8e57b0 in malloc (/lib64/libasan.so.80xc57b0) (BuildId: 3109905b64795755dad05d7bb29ad23633a06660)#1 0x10104ac in save_ps_display_args /home/phil/tmp/postgres/src/backend/utils/misc/ps_status.c:165#2 0x9aed7c in main /home/phil/tmp/postgres/src/backend/main/main.c:91#3 0xffffa9fb09d8 in __libc_start_call_main (/lib64/libc.so.60x309d8) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#4 0xffffa9fb0aac in __libc_start_mainGLIBC_2.34 (/lib64/libc.so.60x30aac) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#5 0x4a15ac in _start (/home/phil/tmp/postgres/build/bin/postgres0x4a15ac) (BuildId: 4cede14f9b35edbb8bf598d2efbc883c4ace6797)SUMMARY: AddressSanitizer: 328 byte(s) leaked in 1 allocation(s).好吧它立即崩溃了那很有意思。
但仔细观察这些消息:
与我们引入的 4096 字节泄漏无关看起来并不像泄漏那么重要因为这些消息与程序初始化有关
由于它们不在循环中显示的分配不会随着时间的推移而不断增长。所以没什么大不了的。
我们的泄漏在哪里事实证明AddressSanitizer/LeakSanitizer 的默认行为是在出现第一个错误时崩溃。在这种情况下很可能许多生产系统后来才引入sanitizers这是没有用的。值得庆幸的是sanitizers有一些运行时选项我们可以在环境变量中设置。
让我们设置环境变量 ASAN_OPTIONShalt_on_errorfalse 并重新运行 Postgres
$ ASAN_OPTIONShalt_on_errorfalse ./build/bin/postgres --config-file$(pwd)/test-db/postgresql.conf -D $(pwd)/test-db -k $(pwd)/test-db
2024-03-22 10:38:07.588 EDT [233554] LOG: starting PostgreSQL 16.2 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), 64-bit
2024-03-22 10:38:07.589 EDT [233554] LOG: listening on IPv6 address ::1, port 5432
2024-03-22 10:38:07.589 EDT [233554] LOG: listening on IPv4 address 127.0.0.1, port 5432
2024-03-22 10:38:07.591 EDT [233554] LOG: listening on Unix socket /home/phil/tmp/postgres/test-db/.s.PGSQL.5432
2024-03-22 10:38:07.595 EDT [233557] LOG: database system was shut down at 2024-03-22 10:38:06 EDT
233557ERROR: LeakSanitizer: detected memory leaksDirect leak of 336 byte(s) in 1 object(s) allocated from:#0 0xffffbc6957b0 in malloc (/lib64/libasan.so.80xc57b0) (BuildId: 3109905b64795755dad05d7bb29ad23633a06660)#1 0x10104ac in save_ps_display_args /home/phil/tmp/postgres/src/backend/utils/misc/ps_status.c:165#2 0x9aed7c in main /home/phil/tmp/postgres/src/backend/main/main.c:91#3 0xffffbbd609d8 in __libc_start_call_main (/lib64/libc.so.60x309d8) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#4 0xffffbbd60aac in __libc_start_mainGLIBC_2.34 (/lib64/libc.so.60x30aac) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#5 0x4a15ac in _start (/home/phil/tmp/postgres/build/bin/postgres0x4a15ac) (BuildId: 4cede14f9b35edbb8bf598d2efbc883c4ace6797)SUMMARY: AddressSanitizer: 336 byte(s) leaked in 1 allocation(s).
2024-03-22 10:38:07.651 EDT [233554] LOG: database system is ready to accept connections好的 它报告了泄漏但仍在继续。让我们像以前一样通过连接 psql 来触发我们添加的泄漏。
$ ./build/bin/psql -h localhost postgres
psql (16.2)
Type help for help.postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)并退出连接。您可能会看到 psql 进程本身的泄漏报告。但我们正在寻找的泄漏是在 Postgres 后端进程中。
2024-03-22 10:38:07.651 EDT [233554] LOG: database system is ready to accept connections
4096 from 233705
4096 from 233705
4096 from 233705
4096 from 233705
4096 from 233705
233710ERROR: LeakSanitizer: detected memory leaks
Direct leak of 336 byte(s) in 1 object(s) allocated from:#0 0xffffbc6957b0 in malloc (/lib64/libasan.so.80xc57b0) (BuildId: 3109905b64795755dad05d7bb29ad23633a06660)#1 0x10104ac in save_ps_display_args /home/phil/tmp/postgres/src/backend/utils/misc/ps_status.c:165#2 0x9aed7c in main /home/phil/tmp/postgres/src/backend/main/main.c:91#3 0xffffbbd609d8 in __libc_start_call_main (/lib64/libc.so.60x309d8) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#4 0xffffbbd60aac in __libc_start_mainGLIBC_2.34 (/lib64/libc.so.60x30aac) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#5 0x4a15ac in _start (/home/phil/tmp/postgres/build/bin/postgres0x4a15ac) (BuildId: 4cede14f9b35edbb8bf598d2efbc883c4ace6797)SUMMARY: AddressSanitizer: 336 byte(s) leaked in 1 allocation(s).
233705ERROR: LeakSanitizer: detected memory leaksDirect leak of 336 byte(s) in 1 object(s) allocated from:#0 0xffffbc6957b0 in malloc (/lib64/libasan.so.80xc57b0) (BuildId: 3109905b64795755dad05d7bb29ad23633a06660)#1 0x10104ac in save_ps_display_args /home/phil/tmp/postgres/src/backend/utils/misc/ps_status.c:165#2 0x9aed7c in main /home/phil/tmp/postgres/src/backend/main/main.c:91#3 0xffffbbd609d8 in __libc_start_call_main (/lib64/libc.so.60x309d8) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#4 0xffffbbd60aac in __libc_start_mainGLIBC_2.34 (/lib64/libc.so.60x30aac) (BuildId: 8adf04071ee0060cbf618c401e50c21b9070755f)#5 0x4a15ac in _start (/home/phil/tmp/postgres/build/bin/postgres0x4a15ac) (BuildId: 4cede14f9b35edbb8bf598d2efbc883c4ace6797)SUMMARY: AddressSanitizer: 336 byte(s) leaked in 1 allocation(s).但我们的泄漏一无所知我猜sanitizers在这方面的工作原理与 Valgrind 类似因为在进程结束时内存已被释放。只是之前没有。
此时我被难住了一段时间。 eBPF and bcc tools
我一直在阅读 Brendan Gregg 的《Systems Performance》一书所以我开始想知道是否有一种方法可以使用 perf 或 bpf profile来记录正在运行的程序的内存使用情况并查看分配的来源。我使用 perf 来分析函数但没有使用它来分析内存使用情况。我四处寻找如何用堆栈跟踪记录分配但无法弄清楚。
因此我查看了 bpf profile并注意到 Brendan Gregg 关于内存泄漏的页面。原来有一个可以直接识别内存泄漏的工具memleak.py
要使用它请安装 bcc 工具套件如果尚未安装。不幸的是每个发行版都会更改程序的名称因此使用 find 可能是最简单的
$ sudo find / -name *memleak*
/tmp/kheaders-6.5.6-300.fc39.aarch64/include/linux/kmemleak.h
/usr/share/man/man3/curs_memleaks.3x.gz
/usr/share/man/man8/bcc-memleak.8.gz
/usr/share/bcc/tools/doc/memleak_example.txt
/usr/share/bcc/tools/memleak
/usr/src/kernels/6.7.9-200.fc39.aarch64/include/linux/kmemleak.h
/usr/src/kernels/6.7.9-200.fc39.aarch64/samples/kmemleak好的它位于 /usr/share/bcc/tools/memleak。
让我们在不使用sanitizers的情况下重建 Postgres
$ make clean
$ ./configure \CFLAGS-ggdb -Og -g3 -fno-omit-frame-pointer \--prefix$(pwd)/build \--libdir$(pwd)/build/lib \--enable-debug
$ make make install运行 Postgres 服务器
$ ./build/bin/postgres --config-file$(pwd)/test-db/postgresql.conf -D $(pwd)/test-db -k $(pwd)/test-db
2024-03-22 11:36:06.364 EDT [245850] LOG: starting PostgreSQL 16.2 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), 64-bit
2024-03-22 11:36:06.364 EDT [245850] LOG: listening on IPv6 address ::1, port 5432
2024-03-22 11:36:06.364 EDT [245850] LOG: listening on IPv4 address 127.0.0.1, port 5432
2024-03-22 11:36:06.365 EDT [245850] LOG: listening on Unix socket /home/phil/tmp/postgres/test-db/.s.PGSQL.5432
2024-03-22 11:36:06.368 EDT [245853] LOG: database system was shut down at 2024-03-22 11:36:05 EDT
2024-03-22 11:36:06.370 EDT [245850] LOG: database system is ready to accept connections启动 psql 并获取正在服务器上处理我们请求的关联后端进程 PID
$ ./build/bin/psql -h localhost postgres
psql (16.2)
Type help for help.postgres# select pg_backend_pid();pg_backend_pid
----------------245858
(1 row)打开一个新的终端会话保持 psql 连接和 Postgres 服务器进程处于活动状态。并运行 memleak 程序
$ sudo /usr/share/bcc/tools/memleak -p 245858
Attaching to pid 245858, CtrlC to quit.
[11:37:19] Top 10 stacks with outstanding allocations:
[11:37:24] Top 10 stacks with outstanding allocations:它将开始每五秒报告一次泄漏。
让我们通过继续在 psql 会话中运行 SELECT 1 语句来更积极地触发泄漏
$ ./build/bin/psql -h localhost postgres
psql (16.2)
Type help for help.postgres# select pg_backend_pid();pg_backend_pid
----------------245858
(1 row)postgres# select 1;?column?
----------1
(1 row)postgres# select 1;?column?
----------1
(1 row)运行此 SELECT 1 命令 20-30 次左右后您应该看到 memleak 程序报告 135168 bytes in 1 allocations from stack0x0000ffff7fb7ede8 sysmalloc_mmap.isra.00x74 [libc.so.6]0x0000ffff7fb7fc40 sysmalloc0x200 [libc.so.6]0x0000ffff7fb80fbc _int_malloc0xdfc [libc.so.6]0x0000ffff7fb81c50 __malloc0x22c [libc.so.6]0x0000fffffffff000 [unknown] [[uprobes]]0x0000000000927714 palloc0x40 [postgres]0x00000000007df2f4 PostgresMain0x1ac [postgres]0x0000000000755f34 report_fork_failure_to_client0x0 [postgres]0x0000000000757e94 BackendStartup0x1b8 [postgres]0x000000000075805c ServerLoop0xfc [postgres]0x00000000007592f8 PostmasterMain0xed4 [postgres]0x00000000006927d8 main0x230 [postgres]0x0000ffff7fb109dc __libc_start_call_main0x7c [libc.so.6]0x0000ffff7fb10ab0 __libc_start_mainGLIBC_2.170x9c [libc.so.6]0x00000000004912f0 _start0x30 [postgres]这就是内存泄漏有一个漂亮的堆栈跟踪可以帮助我们准确找到它在哪里。