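Preface
In code auditing, traditional manual review is time-consuming and labor-intensive, and with complex logic or very large codebases some issues inevitably slip through. ChatGPT opens up a new approach to code auditing. This article describes a code-auditing tool built on ChatGPT.
GitHub: https://github.com/honysyang/AICodeScan.git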
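Project architecture
The project is organized into four layers: an input layer, a preprocessing layer, an analysis engine layer, and an output layer.
The input layer specifies the target source file. The preprocessing layer normalizes the file under analysis by removing whitespace, special characters, and similar noise. The analysis engine layer combines a scan against a local vulnerability database with AI analysis, and supports both local and cloud AI models. The output layer handles visualization.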
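As a rough illustration of the preprocessing layer, the sketch below normalizes a C source string by stripping comments, control characters, and blank lines. The function name normalize_source and the exact rules are assumptions; the article does not show the project's real preprocessing code. The regex-based comment removal is deliberately naive: it would also alter string literals that happen to contain // or /*, and the character filter drops all non-ASCII text.

```python
import re


def normalize_source(code: str) -> str:
    """Hypothetical preprocessing step: strip comments and tidy whitespace."""
    # Remove /* ... */ block comments first, then // line comments.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", "", code)
    # Keep only tab, newline, and printable ASCII characters.
    code = re.sub(r"[^\x09\x0a\x20-\x7e]", "", code)
    # Trim trailing spaces and discard lines that are now empty.
    lines = [line.rstrip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line.strip())


if __name__ == "__main__":
    sample = "int x; // counter\n\n/* unused\n   block */int y;  \n"
    print(normalize_source(sample))
```

A production preprocessor would more likely rely on a real C lexer or parser, which is also what generating an AST requires.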
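Execution flow
The pipeline runs as follows:
1. The target source file is fed into the program and preprocessed, producing the files to be analyzed (an AST and the normalized source).
2. The local vulnerability database is used for a quick scan of those files, which yields the vulnerability scan results.
3. The files to be analyzed and the scan results are passed together to the security-analysis agent, which follows its prompt to analyze the code along dimensions such as memory safety, control-flow safety, and data safety, and emits its results in a fixed format.
4. All of the preceding output is bundled and passed to the attack-and-defense analysis agent, which analyzes the attack surface and the exploit chain.
5. A complete report file is produced.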
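The hand-off to the security-analysis agent could look roughly like the sketch below, which combines the normalized code and the local scan results into a single prompt. The template text, the DIMENSIONS list, and the names EXAMPLE_SAFETY_PROMPT and build_safety_prompt are all hypothetical; the project's actual prompts are not reproduced in this article.

```python
import json
from typing import Dict, List

# Analysis dimensions named in the article.
DIMENSIONS = ["memory safety", "control-flow safety", "data safety"]

# Hypothetical prompt template; the real project prompt is not shown here.
EXAMPLE_SAFETY_PROMPT = (
    "You are a C/C++ security auditor.\n"
    "Review the code along these dimensions: {dimensions}.\n"
    "Local scanner findings (may contain false positives):\n{findings}\n"
    "Respond with a JSON object listing each risk with its CWE ID, "
    "evidence, and confidence.\n"
    "Code:\n{code}\n"
)


def build_safety_prompt(code: str, local_findings: List[Dict]) -> str:
    """Fill the template with the code and the local scan results."""
    return EXAMPLE_SAFETY_PROMPT.format(
        dimensions=", ".join(DIMENSIONS),
        findings=json.dumps(local_findings, ensure_ascii=False, indent=2),
        code=code,
    )


if __name__ == "__main__":
    findings = [{"cwe_id": "CWE-121", "type": "stack buffer overflow"}]
    print(build_safety_prompt("strcpy(out, arg);", findings))
```

Asking the model for a fixed JSON shape is what makes it practical to merge its answer with the local findings afterwards.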
Key code
The core analysis routine is as follows:
def analyze(self, file_path: str) -> AnalysisResult:
    """Run the full analysis pipeline."""
    try:
        context = self._load_code(file_path)
        # Multi-engine analysis: the safety pass feeds the attack pass
        safety_data = self._safety_analysis(context)
        attack_data = self._attack_analysis(context, safety_data)
        return AnalysisResult(
            safety_report=safety_data,
            attack_report=attack_data
        )
    except Exception as e:
        # Log with the traceback, then re-raise to the caller
        self.logger.error(f"Error during analysis: {e}", exc_info=True)
        raise
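The analyze method relies on two container types that the article does not show. A minimal sketch of what they might look like follows. Only processed_content, safety_report, and attack_report appear in the article's code; the fields file_path, raw_content, and functions are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CodeContext:
    """Hypothetical container for one source file under analysis."""
    file_path: str             # assumed field
    raw_content: str           # assumed field: the file as read from disk
    processed_content: str     # normalized source passed to the engines
    functions: List[str] = field(default_factory=list)  # assumed field


@dataclass
class AnalysisResult:
    """Holds the two reports returned by analyze()."""
    safety_report: Dict
    attack_report: Dict


if __name__ == "__main__":
    ctx = CodeContext("vul1.c", "int main(){}", "int main(){}")
    print(ctx.functions)
    print(AnalysisResult(safety_report={}, attack_report={}))
```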
The key code for the local vulnerability scan:
def scan_code(self, code: str) -> Dict:
    """Scan the code against the local vulnerability patterns."""
    findings = []
    for cwe_id, pattern in self.patterns.items():
        try:
            if re.search(pattern['regex'], code, re.MULTILINE):
                findings.append({
                    'cwe_id': cwe_id,
                    'type': pattern['type'],
                    'description': pattern['description']
                })
        except re.error as e:
            # Skip a malformed pattern instead of aborting the whole scan
            self.logger.error(f"Invalid regular expression: {e} (CWE: {cwe_id}, pattern: {pattern['regex']})")
            continue
    return {'local_findings': findings}
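To make the shape of self.patterns concrete, here is a standalone sketch of the same scan with two example rules. The regular expressions and descriptions in EXAMPLE_PATTERNS are assumptions, not the project's real rule set. A regex hit only indicates that a risky call is present, not that it is exploitable.

```python
import re
from typing import Dict

# Hypothetical rule set keyed by CWE ID, in the shape scan_code expects.
EXAMPLE_PATTERNS = {
    "CWE-121": {
        "regex": r"\b(strcpy|strcat|sprintf|gets)\s*\(",
        "type": "stack buffer overflow",
        "description": "Unbounded copy into a fixed-size buffer",
    },
    "CWE-134": {
        "regex": r"\bprintf\s*\(\s*[A-Za-z_]\w*\s*\)",
        "type": "format string",
        "description": "Variable passed directly as the format string",
    },
}


def scan_code(code: str, patterns: Dict = EXAMPLE_PATTERNS) -> Dict:
    """Standalone version of the regex scan shown in the article."""
    findings = []
    for cwe_id, pattern in patterns.items():
        if re.search(pattern["regex"], code, re.MULTILINE):
            findings.append({
                "cwe_id": cwe_id,
                "type": pattern["type"],
                "description": pattern["description"],
            })
    return {"local_findings": findings}


if __name__ == "__main__":
    print(scan_code("strcpy(out, arg);"))
```

The word boundary in the second rule keeps it from firing on fprintf(stderr, ...), which is one reason to anchor such patterns carefully.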
The key code for the AI engine scan:
def _safety_analysis(self, context: CodeContext) -> Dict:
    """Security-analysis stage."""
    # Local vulnerability detection
    local_findings = self.vuln_db.scan_code(context.processed_content)
    # AI analysis, with the local findings supplied as context
    ai_report = self._call_ai_engine(
        prompt=SAFETY_PROMPT,
        code=context.processed_content,
        context=local_findings
    )
    return self._merge_reports(local_findings, ai_report)

def _attack_analysis(self, context: CodeContext, safety_data: Dict) -> Dict:
    """Attack-and-defense analysis stage."""
    # AI analysis, with the security-analysis results supplied as context
    ai_report = self._call_ai_engine(
        prompt=ATTACK_PROMPT,
        code=context.processed_content,
        context=safety_data
    )
    # Supplement with mitigations from the local database
    ai_report['mitigations'] = self.vuln_db.get_mitigations(
        ai_report.get('cwe_ids', [])
    )
    return ai_report
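The article does not show _merge_reports. One plausible way to combine the two sources is sketched below: findings are deduplicated by CWE ID and both raw lists are kept under metadata. The function merge_reports, the assumption that the AI report exposes a risks list, and the deduplication rule are all hypothetical. Deduplicating by CWE ID alone would collapse separate instances of the same weakness, so a real implementation would probably key on function or line as well.

```python
from typing import Dict, List


def merge_reports(local: Dict, ai_report: Dict) -> Dict:
    """Hypothetical stand-in for _merge_reports: union findings by CWE ID."""
    local_findings: List[Dict] = local.get("local_findings", [])
    ai_findings: List[Dict] = ai_report.get("risks", [])  # assumed key

    merged, seen = [], set()
    for finding in local_findings + ai_findings:
        cwe = finding.get("cwe_id")
        if cwe is not None:
            if cwe in seen:
                continue  # local findings come first, so they win ties
            seen.add(cwe)
        merged.append(finding)

    return {
        "risks": merged,
        "metadata": {
            "local_findings": local_findings,
            "ai_findings": ai_findings,
        },
    }


if __name__ == "__main__":
    local = {"local_findings": [{"cwe_id": "CWE-121", "severity": 5}]}
    ai = {"risks": [{"cwe_id": "CWE-121"}, {"cwe_id": "CWE-134"}]}
    print(merge_reports(local, ai))
```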
Test results
Test file:
A sample containing a buffer overflow was selected. Source:
https://github.com/TouwaErioH/security/blob/master/stack%20overflow/vulnerables/vul1.c
The source code is:
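#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int bar(char *arg, char *out)
{
    /* Unbounded copy: this is the vulnerability under test */
    strcpy(out, arg);
    return 0;
}

void foo(char *argv[])
{
    char buf[256];
    bar(argv[1], buf);
}

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "target1: argc != 2\n");
        exit(EXIT_FAILURE);
    }
    setuid(0);
    foo(argv);
    return 0;
}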
Detection results
(.venv) PS D:\codeguardian\core> python .\analyzer.py C:\Users\zhongjie_yang\Desktop\vul1.c
Current working directory: D:\codeguardian\core
2025-03-04 16:56:58,418 - CodeGuardian - INFO - ==================================================
2025-03-04 16:56:58,418 - CodeGuardian - INFO - Initializing CodeGuardian Analyzer
2025-03-04 16:56:58,418 - CodeGuardian - INFO - Log file: D:\codeguardian\core\logs\analysis_20250304_165658.log
2025-03-04 16:56:58,419 - VulnerabilityDB - INFO - Loading vulnerability patterns...
2025-03-04 16:56:58,419 - VulnerabilityDB - INFO - Loaded 2 CWE patterns
2025-03-04 16:56:58,420 - VulnerabilityDB - INFO - Loaded 1 mitigation methods
2025-03-04 16:56:58,420 - CodeGuardian - INFO - Loading code file: C:\Users\zhongjie_yang\Desktop\vul1.c
2025-03-04 16:56:58,420 - CodeGuardian - INFO - File loaded successfully. Size: 390 characters
2025-03-04 16:56:58,421 - CodeGuardian - INFO - Extracted 3 functions
2025-03-04 16:56:58,421 - CodeGuardian - INFO - Initiating AI analysis...
2025-03-04 16:56:58,422 - CodeGuardian - INFO - Using OpenAI API for analysis
2025-03-04 16:57:01,289 - httpx - INFO - HTTP Request: POST https://api.chatanywhere.tech/v1/chat/completions "HTTP/1.1 200 OK"
2025-03-04 16:57:01,302 - CodeGuardian - INFO - Merging local and AI analysis results
2025-03-04 16:57:01,302 - CodeGuardian - INFO - Initiating AI analysis...
2025-03-04 16:57:01,303 - CodeGuardian - INFO - Using OpenAI API for analysis
2025-03-04 16:57:05,100 - httpx - INFO - HTTP Request: POST https://api.chatanywhere.tech/v1/chat/completions "HTTP/1.1 200 OK"
2025-03-04 16:57:05,102 - ReportGen - INFO - Generating normalized report...
2025-03-04 16:57:05,110 - ReportGen - INFO - Generating Markdown summary: summary.md
2025-03-04 16:57:05,113 - ReportGen - INFO - Report saved to: D:\codeguardian\core\reports
Analysis complete! Results saved in the reports directory
Output JSON documents
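{
  "functions": [
    {
      "line": 6,
      "name": "bar",
      "risks": [
        {
          "confidence": 92,
          "cwe": "CWE-121",
          "evidence": "strcpy(out, arg)",
          "trace": "main → foo → bar [L10→L6]",
          "type": "stack overflow"
        }
      ]
    }
  ],
  "metadata": {
    "ai_findings": [],
    "local_findings": [
      {
        "cwe_id": "CWE-121",
        "description": "A stack buffer overflow is a common class of buffer overflow. When a program writes to a buffer on the stack and the amount of data written exceeds the buffer's bounds, adjacent memory is overwritten. This can lead to crashes, arbitrary code execution, or disclosure of sensitive information; an attacker can exploit the flaw to take control of the program's execution flow.",
        "severity": 5,
        "type": "stack buffer overflow"
      }
    ]
  },
  "risks": [
    {
      "cwe_id": "CWE-121",
      "description": "A stack buffer overflow is a common class of buffer overflow. When a program writes to a buffer on the stack and the amount of data written exceeds the buffer's bounds, adjacent memory is overwritten. This can lead to crashes, arbitrary code execution, or disclosure of sensitive information; an attacker can exploit the flaw to take control of the program's execution flow.",
      "severity": 5,
      "type": "stack buffer overflow"
    }
  ]
}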
{
  "exploit_chain": {
    "entry_points": [
      "bar@L6"
    ],
    "shellcode": {
      "constraints": {
        "bad_chars": [
          "0x00",
          "0x0A"
        ],
        "max_size": 256
      },
      "type": "staged"
    },
    "techniques": [
      {
        "description": "Control-flow hijacking via buffer overflow",
        "mitre_id": "T1203",
        "probability": 0.9,
        "steps": [
          "Craft an overlong input to overwrite the return address",
          "Use the hijacked control flow to execute arbitrary code"
        ]
      }
    ]
  },
  "mitigations": []
}
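For readers who want to post-process these reports, the sketch below walks the functions section of the first JSON document and produces one summary line per risk. The helper name summarize_risks is hypothetical, and the embedded sample is trimmed from the output above rather than read from the reports directory, since the report file names are not given in the article.

```python
import json
from typing import List

# Trimmed sample in the same shape as the safety report shown above.
SAMPLE_REPORT = """
{
  "functions": [
    {
      "line": 6,
      "name": "bar",
      "risks": [
        {"confidence": 92, "cwe": "CWE-121", "evidence": "strcpy(out, arg)"}
      ]
    }
  ]
}
"""


def summarize_risks(report: dict) -> List[str]:
    """Return one human-readable line per risk in the report."""
    lines = []
    for fn in report.get("functions", []):
        for risk in fn.get("risks", []):
            lines.append(
                f"{fn['name']}@L{fn['line']}: {risk['cwe']} "
                f"(confidence {risk['confidence']}) {risk['evidence']}"
            )
    return lines


if __name__ == "__main__":
    for line in summarize_risks(json.loads(SAMPLE_REPORT)):
        print(line)
```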
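Current limitations
1. In terms of languages, the project currently focuses on C and C++; in terms of vulnerability classes, it focuses on buffer overflows.
2. The project handles audits of a single source file well, but cross-file analysis is still weak and will be improved in later versions.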
Originally published on the WeChat official account 小杨时光智汇: "基于ChatGPT实现C语言代码审计" (Implementing C Code Auditing with ChatGPT)