JCTree
Official documentation:
Syntax trees are built from JCTree subtypes that implement com.sun.source.Tree and its subtypes.
Today's goal is to look closely at the data structures behind the AST, so that working with AST data is easier later on.
First, find JCTree (com.sun.tools.javac.tree.JCTree). How do you find it?
Locate com.sun.source.Tree (where the arrow points in the figure above); JCTree is among its implementations.
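JCTree is the basic building block of the syntax tree: every node is an instance of one of its subclasses. (Why? Syntax trees are created with TreeMaker, and, as the documentation quoted above says, they are built from JCTree subtypes implementing com.sun.source.Tree and its subtypes.)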
The names of the JCTree subclasses already tell you roughly what each one is; they are short and self-explanatory.
Examples:
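JCReturn: return statement
JCClassDecl: class declaration
JCVariableDecl: field/variable declaration
JCIf: if statement
JCMethodDecl: method declaration node
JCModifiers: modifiers node (access flags and annotations)
JCExpression: expression node; common subclasses:
- JCAssign: assignment, e.g. a = b
- JCAssignOp: compound assignment, e.g. +=
- JCIdent: identifier, e.g. a variable, type or package name (the keywords this and super are also parsed as JCIdent)
- JCLiteral: literal expression, e.g. 123 or "string"
- JCBinary: binary operator expression, e.g. a + b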
For details, see the comments in JCTree; every subclass is documented there.
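To see this name-to-class mapping on real code without touching javac's internal packages, here is a minimal sketch built on the public Compiler Tree API (JavacTask and TreeScanner in com.sun.source.util, exported by the jdk.compiler module). It parses a small snippet and prints, for every node, its Tree.Kind next to the simple class name of the runtime object; inside javac those objects are the JCTree subclasses listed above. The class name JCTreeKinds and the snippet are invented for illustration, and it assumes a full JDK (ToolProvider.getSystemJavaCompiler() returns null when no compiler is available).

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class JCTreeKinds {

    // Wraps a string as an in-memory .java file, so nothing has to exist on disk.
    static JavaFileObject source(String className, String code) {
        URI uri = URI.create("string:///" + className + ".java");
        return new SimpleJavaFileObject(uri, JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses only (no attribution) and returns "Kind:RuntimeClass" for every node, in visit order.
    static List<String> nodes(String code) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) javac.getTask(null, null, null, null, null,
                List.of(source("Demo", code)));
        List<String> out = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void scan(Tree tree, Void unused) {
                    if (tree != null) {
                        out.add(tree.getKind() + ":" + tree.getClass().getSimpleName());
                    }
                    return super.scan(tree, unused); // keep descending into the children
                }
            }.scan(unit, null);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String code = "class Demo { int f(int x) { x += 1; if (x > 2) return x; return 0; } }";
        nodes(code).forEach(System.out::println);
    }
}
```

The output contains pairs such as PLUS_ASSIGNMENT:JCAssignOp, IF:JCIf and RETURN:JCReturn: the kind on the left is the public name, the class on the right is javac's internal node type.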
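AST generation process
The previous article covered the compilation process as a whole; here we dig through the source code to understand it better.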
public class ParserFactory {
/** The context key for the parser factory. */
protected static final Context.Key<ParserFactory> parserFactoryKey = new Context.Key<>();
public static ParserFactory instance(Context context) {
ParserFactory instance = context.get(parserFactoryKey);
if (instance == null) {
instance = new ParserFactory(context);
}
return instance;
}
final TreeMaker F;
final DocTreeMaker docTreeMaker;
final Log log;
final Tokens tokens;
final Source source;
final Preview preview;
final Names names;
final Options options;
final ScannerFactory scannerFactory;
final Locale locale;
protected ParserFactory(Context context) {
super();
context.put(parserFactoryKey, this);
this.F = TreeMaker.instance(context);
this.docTreeMaker = DocTreeMaker.instance(context);
this.log = Log.instance(context);
this.names = Names.instance(context);
this.tokens = Tokens.instance(context);
this.source = Source.instance(context);
this.preview = Preview.instance(context);
this.options = Options.instance(context);
this.scannerFactory = ScannerFactory.instance(context);
this.locale = context.get(Locale.class);
}
public JavacParser newParser(CharSequence input, boolean keepDocComments, boolean keepEndPos, boolean keepLineMap) {
return newParser(input, keepDocComments, keepEndPos, keepLineMap, false);
}
public JavacParser newParser(CharSequence input, boolean keepDocComments, boolean keepEndPos, boolean keepLineMap, boolean parseModuleInfo) {
// the Scanner splits the source text into standard Tokens
Lexer lexer = scannerFactory.newScanner(input, keepDocComments);
// JavacParser builds the syntax tree out of the various JCTree subclasses
return new JavacParser(this, lexer, keepDocComments, keepLineMap, keepEndPos, parseModuleInfo);
}
}
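ParserFactory.instance(context) follows the per-compilation singleton idiom used throughout javac: look the object up in the Context by its key, create it only if it is missing, and have the constructor register itself with context.put(...). The sketch below is a simplified, hypothetical re-implementation (MiniContext and MiniFactory are invented names, not javac classes) that only shows the effect: repeated instance(...) calls on the same context return the same object, while a different context gets its own.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextSingletonDemo {

    // Hypothetical stand-in for javac's Context: a map from typed keys to instances.
    static final class MiniContext {
        static final class Key<T> { }

        private final Map<Key<?>, Object> table = new HashMap<>();

        <T> void put(Key<T> key, T value) {
            if (table.put(key, value) != null) {
                throw new IllegalStateException("duplicate context value");
            }
        }

        @SuppressWarnings("unchecked")
        <T> T get(Key<T> key) {
            return (T) table.get(key);
        }
    }

    // Hypothetical factory with the same shape as ParserFactory.instance(context).
    static final class MiniFactory {
        static final MiniContext.Key<MiniFactory> KEY = new MiniContext.Key<>();

        static MiniFactory instance(MiniContext context) {
            MiniFactory instance = context.get(KEY);
            if (instance == null) {
                instance = new MiniFactory(context); // the constructor registers itself
            }
            return instance;
        }

        private MiniFactory(MiniContext context) {
            context.put(KEY, this); // register immediately, as ParserFactory's constructor does
        }
    }

    public static void main(String[] args) {
        MiniContext c1 = new MiniContext();
        MiniContext c2 = new MiniContext();
        System.out.println(MiniFactory.instance(c1) == MiniFactory.instance(c1)); // true
        System.out.println(MiniFactory.instance(c1) == MiniFactory.instance(c2)); // false
    }
}
```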
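AST generation roughly takes two steps:
1. Lexical analysis: produce all the Tokens (com.sun.tools.javac.parser.Scanner, which uses com.sun.tools.javac.parser.JavaTokenizer)
2. Syntax analysis: build the syntax tree (JavacParser/TreeMaker/JCTree and its subclasses)
Here is the list of token kinds (the TokenKind enum in com.sun.tools.javac.parser.Tokens), which makes this easier to follow: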
EOF(),
ERROR(),
IDENTIFIER(Tag.NAMED),
ABSTRACT("abstract"),
ASSERT("assert", Tag.NAMED),
BOOLEAN("boolean", Tag.NAMED),
BREAK("break"),
BYTE("byte", Tag.NAMED),
CASE("case"),
CATCH("catch"),
CHAR("char", Tag.NAMED),
CLASS("class"),
CONST("const"),
CONTINUE("continue"),
DEFAULT("default"),
DO("do"),
DOUBLE("double", Tag.NAMED),
ELSE("else"),
ENUM("enum", Tag.NAMED),
EXTENDS("extends"),
FINAL("final"),
FINALLY("finally"),
FLOAT("float", Tag.NAMED),
FOR("for"),
GOTO("goto"),
IF("if"),
IMPLEMENTS("implements"),
IMPORT("import"),
INSTANCEOF("instanceof"),
INT("int", Tag.NAMED),
INTERFACE("interface"),
LONG("long", Tag.NAMED),
NATIVE("native"),
NEW("new"),
PACKAGE("package"),
PRIVATE("private"),
PROTECTED("protected"),
PUBLIC("public"),
RETURN("return"),
SHORT("short", Tag.NAMED),
STATIC("static"),
STRICTFP("strictfp"),
SUPER("super", Tag.NAMED),
SWITCH("switch"),
SYNCHRONIZED("synchronized"),
THIS("this", Tag.NAMED),
THROW("throw"),
THROWS("throws"),
TRANSIENT("transient"),
TRY("try"),
VOID("void", Tag.NAMED),
VOLATILE("volatile"),
WHILE("while"),
INTLITERAL(Tag.NUMERIC),
LONGLITERAL(Tag.NUMERIC),
FLOATLITERAL(Tag.NUMERIC),
DOUBLELITERAL(Tag.NUMERIC),
CHARLITERAL(Tag.NUMERIC),
STRINGLITERAL(Tag.STRING),
TRUE("true", Tag.NAMED),
FALSE("false", Tag.NAMED),
NULL("null", Tag.NAMED),
UNDERSCORE("_", Tag.NAMED),
ARROW("->"),
COLCOL("::"),
LPAREN("("),
RPAREN(")"),
LBRACE("{"),
RBRACE("}"),
LBRACKET("["),
RBRACKET("]"),
SEMI(";"),
COMMA(","),
DOT("."),
ELLIPSIS("..."),
EQ("="),
GT(">"),
LT("<"),
BANG("!"),
TILDE("~"),
QUES("?"),
COLON(":"),
EQEQ("=="),
LTEQ("<="),
GTEQ(">="),
BANGEQ("!="),
AMPAMP("&&"),
BARBAR("||"),
PLUSPLUS("++"),
SUBSUB("--"),
PLUS("+"),
SUB("-"),
STAR("*"),
SLASH("/"),
AMP("&"),
BAR("|"),
CARET("^"),
PERCENT("%"),
LTLT("<<"),
GTGT(">>"),
GTGTGT(">>>"),
PLUSEQ("+="),
SUBEQ("-="),
STAREQ("*="),
SLASHEQ("/="),
AMPEQ("&="),
BAREQ("|="),
CARETEQ("^="),
PERCENTEQ("%="),
LTLTEQ("<<="),
GTGTEQ(">>="),
GTGTGTEQ(">>>="),
MONKEYS_AT("@"),
CUSTOM;
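They are all keywords, operators and other building blocks of the language. The code we write every day breaks down into them in the same way, for example:
public String name = "mike";
- keyword: public (PUBLIC)
- identifiers: String and name (both IDENTIFIER; String is a class name, not a keyword)
- operator: = (EQ)
- literal: "mike" (STRINGLITERAL)
- separator: ; (SEMI)
Method declarations and calls are no different: they are all broken down into the tokens above.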
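As a rough illustration of what the lexer hands to the parser for lines like these, here is a toy tokenizer. It is not javac's JavaTokenizer: the class name ToyTokenizer, the ten-entry keyword/operator table and the regular expression are simplifications made up for this sketch, and anything it does not recognise is rejected. It labels each token with the matching name from the TokenKind list above and, like javac, ends the stream with EOF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyTokenizer {

    // A tiny subset of the TokenKind table: fixed keyword/operator text -> kind name.
    private static final Map<String, String> FIXED = Map.of(
            "public", "PUBLIC", "private", "PRIVATE", "static", "STATIC", "final", "FINAL",
            "int", "INT", "return", "RETURN", "=", "EQ", ";", "SEMI", "-", "SUB", "+", "PLUS");

    // Skip whitespace, then match a string literal, an integer, a word, or a one-char operator.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(\"[^\"]*\"|\\d+|[A-Za-z_$][A-Za-z0-9_$]*|[=;+\\-])");

    static List<String> kinds(String code) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.lookingAt()) {
            String text = m.group(1);
            if (FIXED.containsKey(text)) {
                out.add(FIXED.get(text));       // keyword or operator
            } else if (text.startsWith("\"")) {
                out.add("STRINGLITERAL");
            } else if (Character.isDigit(text.charAt(0))) {
                out.add("INTLITERAL");
            } else {
                out.add("IDENTIFIER");          // String, name, ... are identifiers, not keywords
            }
            m.region(m.end(), code.length());   // continue right after this token
        }
        if (!code.substring(m.regionStart()).isBlank()) {
            throw new IllegalArgumentException("unsupported input at offset " + m.regionStart());
        }
        out.add("EOF");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(kinds("public String name = \"mike\";"));
        // [PUBLIC, IDENTIFIER, IDENTIFIER, EQ, STRINGLITERAL, SEMI, EOF]
        System.out.println(kinds("int a = i - 1;"));
        // [INT, IDENTIFIER, EQ, IDENTIFIER, SUB, INTLITERAL, SEMI, EOF]
    }
}
```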
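One question I had: is the analysis done file by file? Yes!
In JavaCompiler, com.sun.tools.javac.main.JavaCompiler#parse(javax.tools.JavaFileObject, java.lang.CharSequence) is the parsing entry point. Its arguments are a source file and that file's content, and the file is a .java source file: already compiled .class files are read by ClassReader instead of going through the parser. The figure below shows the methods that call JavaCompiler#parse.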
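The per-file behaviour can also be observed from outside javac through the public JavacTask API: give it two in-memory source files and parse() returns one CompilationUnitTree per file, each remembering its source file. The sketch assumes that JavacTask.parse() reaches JavaCompiler#parse once per file, which is how I understand the current JDK code path; the class PerFileParse and the files A and B are invented for the example.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class PerFileParse {

    // Wraps a string as an in-memory .java file.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Returns the source file name of every compilation unit produced by parse().
    static List<String> unitNames() throws Exception {
        List<JavaFileObject> files = List.of(
                source("A", "class A { }"),
                source("B", "class B { int x; }"));
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, files);
        List<String> names = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            names.add(unit.getSourceFile().getName()); // the tree.sourcefile set in parse(...)
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        unitNames().forEach(System.out::println);
    }
}
```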
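Syntax analysis
Syntax analysis assembles the "parts" above into the logic of the code we wrote, i.e. the syntax tree; otherwise not even God could tell what this pile of parts is for.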
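To make the idea of assembling parts concrete, here is a toy recursive-descent parser (entirely hypothetical: the record types only borrow the names of JCAssign, JCBinary, JCIdent and JCLiteral and are not javac classes). It first splits the input into tokens and then nests them into a tree, the same two steps as above. It only understands identifiers, integer literals and the operators +, - and =, and needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyParser {

    // Hypothetical node types, named after the javac classes they imitate.
    interface Node { }
    record Ident(String name) implements Node {
        public String toString() { return name; }
    }
    record Literal(int value) implements Node {
        public String toString() { return Integer.toString(value); }
    }
    record Binary(char op, Node lhs, Node rhs) implements Node {
        public String toString() { return "Binary(" + op + ", " + lhs + ", " + rhs + ")"; }
    }
    record Assign(Node lhs, Node rhs) implements Node {
        public String toString() { return "Assign(" + lhs + ", " + rhs + ")"; }
    }

    private final List<String> tokens = new ArrayList<>();
    private int i = 0;

    // Step 1, lexing: words/numbers and single-character operators.
    ToyParser(String src) {
        for (int p = 0; p < src.length(); ) {
            char c = src.charAt(p);
            if (Character.isWhitespace(c)) { p++; continue; }
            int start = p;
            if (Character.isLetterOrDigit(c)) {
                while (p < src.length() && Character.isLetterOrDigit(src.charAt(p))) p++;
            } else {
                p++;
            }
            tokens.add(src.substring(start, p));
        }
    }

    // Step 2, parsing. Expression = Term [ "=" Expression ]   (right-associative)
    Node expression() {
        Node lhs = term();
        if (i < tokens.size() && tokens.get(i).equals("=")) {
            i++;
            return new Assign(lhs, expression());
        }
        return lhs;
    }

    // Term = Primary { ("+" | "-") Primary }   (left-associative)
    Node term() {
        Node left = primary();
        while (i < tokens.size() && (tokens.get(i).equals("+") || tokens.get(i).equals("-"))) {
            char op = tokens.get(i++).charAt(0);
            left = new Binary(op, left, primary());
        }
        return left;
    }

    // Primary = identifier | integer literal
    Node primary() {
        String t = tokens.get(i++);
        return Character.isDigit(t.charAt(0)) ? new Literal(Integer.parseInt(t)) : new Ident(t);
    }

    static Node parse(String src) {
        ToyParser p = new ToyParser(src);
        Node n = p.expression();
        if (p.i != p.tokens.size()) throw new IllegalArgumentException("trailing tokens");
        return n;
    }

    public static void main(String[] args) {
        System.out.println(parse("a = i - 1")); // Assign(a, Binary(-, i, 1))
    }
}
```

The grammar, not just the token order, decides the shape: a - b - c becomes Binary(-, Binary(-, a, b), c) because the loop in term() is left-associative, while = recurses and is therefore right-associative.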
The Parser interface defines the methods for parsing each part, and its implementation is JavacParser. Let's look at the interface:
/**
* Reads syntactic units from source code.
* Parsers are normally created from a ParserFactory. // i.e. through the ParserFactory factory class
*
* <p><b>This is NOT part of any supported API.
* If you write code that depends on this, you do so at your own risk.
* This code and its internal interfaces are subject to change or
* deletion without notice.</b>
*/
public interface Parser {
/**
* Parse a compilation unit. // roughly: one whole source file
* @return a compilation unit
*/
JCCompilationUnit parseCompilationUnit();
/**
* Parse an expression. // e.g. a = i - 1
* @return an expression
*/
JCExpression parseExpression();
/**
* Parse a statement. // declarations, if, return, ...; it returns a JCStatement despite the @return line
* @return an expression
*/
JCStatement parseStatement();
/**
* Parse a type. // a type such as String or int
* @return an expression for a type
*/
JCExpression parseType();
}
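To study the parsing process in detail you would have to read the JavacParser implementation; here we only take a quick look.
parseCompilationUnit: parse a compilation unit
Here is its call site, com.sun.tools.javac.main.JavaCompiler#parse(javax.tools.JavaFileObject, java.lang.CharSequence):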
/** Parse contents of input stream.
* @param filename The name of the file from which input stream comes.
* @param content The characters to be parsed.
*/
protected JCCompilationUnit parse(JavaFileObject filename, CharSequence content) {
long msec = now();
JCCompilationUnit tree = make.TopLevel(List.nil());
if (content != null) {
if (verbose) {
log.printVerbose("parsing.started", filename);
}
if (!taskListener.isEmpty()) {
TaskEvent e = new TaskEvent(TaskEvent.Kind.PARSE, filename);
taskListener.started(e);
keepComments = true;
genEndPos = true;
}
// create the parser through the factory (one ParserFactory per compiler Context)
Parser parser = parserFactory.newParser(content, keepComments(), genEndPos,
lineDebugInfo, filename.isNameCompatible("module-info", Kind.SOURCE));
// build the syntax tree
tree = parser.parseCompilationUnit();
if (verbose) {
log.printVerbose("parsing.done", Long.toString(elapsed(msec)));
}
}
tree.sourcefile = filename;
if (content != null && !taskListener.isEmpty()) {
TaskEvent e = new TaskEvent(TaskEvent.Kind.PARSE, tree);
taskListener.finished(e);
}
return tree;
}
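So the parser works one source file at a time, and a compilation unit consists of the following parts:
CompilationUnit = [ { "@" Annotation } PACKAGE Qualident ";"] {ImportDeclaration} {TypeDeclaration}
- Annotation: annotations on the package declaration
- Package: the package declaration
- Import: the import declarations
- TypeDeclaration: the type (class) declarations
The overall structure matches what the online AST viewer showed, except that the test code has no annotations.
The code below builds exactly these parts and returns a JCCompilationUnit, i.e. the structure in the figure above: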
/** CompilationUnit = [ { "@" Annotation } PACKAGE Qualident ";"] {ImportDeclaration} {TypeDeclaration}
*/
public JCTree.JCCompilationUnit parseCompilationUnit() {
Token firstToken = token;
JCModifiers mods = null;
boolean consumedToplevelDoc = false;
boolean seenImport = false;
boolean seenPackage = false;
ListBuffer<JCTree> defs = new ListBuffer<>();
if (token.kind == MONKEYS_AT) // an annotation? MONKEYS_AT is "@"
mods = modifiersOpt();
if (token.kind == PACKAGE) { // a package declaration?
int packagePos = token.pos;
List<JCAnnotation> annotations = List.nil();
seenPackage = true;
if (mods != null) {
checkNoMods(mods.flags);
annotations = mods.annotations;
mods = null;
}
nextToken();
JCExpression pid = qualident(false);
accept(SEMI);
JCPackageDecl pd = toP(F.at(packagePos).PackageDecl(annotations, pid));
attach(pd, firstToken.comment(CommentStyle.JAVADOC));
consumedToplevelDoc = true;
defs.append(pd);
}
boolean checkForImports = true;
boolean firstTypeDecl = true;
while (token.kind != EOF) { // loop over the remaining tokens until EOF; a complete JCCompilationUnit always ends with EOF
if (token.pos <= endPosTable.errorEndPos) {
// error recovery
skip(checkForImports, false, false, false);
if (token.kind == EOF)
break;
}
if (checkForImports && mods == null && token.kind == IMPORT) { // an import?
seenImport = true;
defs.append(importDeclaration());
} else {
Comment docComment = token.comment(CommentStyle.JAVADOC); // the Javadoc comment attached to the current token, if any
if (firstTypeDecl && !seenImport && !seenPackage) {
docComment = firstToken.comment(CommentStyle.JAVADOC);
consumedToplevelDoc = true;
}
if (mods != null || token.kind != SEMI) // read modifiers, unless this is a stray ";" with no modifiers before it
mods = modifiersOpt(mods);
if (firstTypeDecl && token.kind == IDENTIFIER) { // "open module" / "module": possibly a module declaration (module-info.java)
ModuleKind kind = ModuleKind.STRONG;
if (token.name() == names.open) {
kind = ModuleKind.OPEN;
nextToken();
}
if (token.kind == IDENTIFIER && token.name() == names.module) {
if (mods != null) {
checkNoMods(mods.flags & ~Flags.DEPRECATED);
}
defs.append(moduleDecl(mods, kind, docComment));
consumedToplevelDoc = true;
break;
} else if (kind != ModuleKind.STRONG) {
reportSyntaxError(token.pos, Errors.ExpectedModule);
}
}
JCTree def = typeDeclaration(mods, docComment);
if (def instanceof JCExpressionStatement)
def = ((JCExpressionStatement)def).expr;
defs.append(def);
if (def instanceof JCClassDecl)
checkForImports = false;
mods = null;
firstTypeDecl = false;
}
}
JCTree.JCCompilationUnit toplevel = F.at(firstToken.pos).TopLevel(defs.toList());
if (!consumedToplevelDoc)
attach(toplevel, firstToken.comment(CommentStyle.JAVADOC));
if (defs.isEmpty())
storeEnd(toplevel, S.prevToken().endPos);
if (keepDocComments)
toplevel.docComments = docComments;
if (keepLineMap)
toplevel.lineMap = S.getLineMap();
this.endPosTable.setParser(null); // remove reference to parser
toplevel.endPositions = this.endPosTable;
return toplevel;
}
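The parts assembled by this loop are visible through the public CompilationUnitTree interface, which JCCompilationUnit implements: getPackageName() for the JCPackageDecl, getImports() for the import declarations and getTypeDecls() for the type declarations. A minimal sketch with a made-up sample file (the class name CompilationUnitParts is also invented):

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class CompilationUnitParts {

    static final String SAMPLE = "package com.example;\n"
            + "import java.util.List;\n"
            + "class UserAction { }\n"
            + "interface Marker { }\n";

    // Wraps a string as an in-memory .java file.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Lists the package, the imports and the kind of each type declaration of one file.
    static List<String> parts(String code) throws Exception {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source("UserAction", code)));
        CompilationUnitTree unit = task.parse().iterator().next();
        List<String> out = new ArrayList<>();
        out.add("package " + unit.getPackageName());               // pid of the JCPackageDecl
        for (ImportTree imp : unit.getImports()) {
            out.add("import " + imp.getQualifiedIdentifier());       // one JCImport per import
        }
        for (Tree decl : unit.getTypeDecls()) {
            out.add(decl.getKind().toString());                      // e.g. CLASS, INTERFACE
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        parts(SAMPLE).forEach(System.out::println);
    }
}
```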
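As the compiler-principles part said earlier, syntax tree nodes are all instances of JCTree subclasses and are created through TreeMaker. Here is an example taken from the code:
//com.sun.tools.javac.parser.JavacParser#modifiersOpt(com.sun.tools.javac.tree.JCTree.JCModifiers)
JCModifiers mods = F.at(pos).Modifiers(flags, annotations.toList());
//F is the TreeMaker object
//Modifiers() builds the JCModifiers (modifiers) node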
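The JCModifiers node built by F.at(pos).Modifiers(...) is exposed publicly as ModifiersTree. The sketch below parses a made-up field declaration and reads that node back: getFlags() returns the modifier keywords as javax.lang.model.element.Modifier values, and getAnnotations() returns the annotations, which live in the same node. Logger is never resolved because nothing beyond parsing happens; FieldModifiers is an invented name.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ModifiersTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.List;

public class FieldModifiers {

    static final String SAMPLE =
            "class UserAction { @Deprecated private static final Logger logger = null; }";

    // Wraps a string as an in-memory .java file.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Parses the sample and returns the modifiers node of the first member (the field).
    static ModifiersTree firstFieldModifiers(String code) throws Exception {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source("UserAction", code)));
        CompilationUnitTree unit = task.parse().iterator().next();
        ClassTree cls = (ClassTree) unit.getTypeDecls().get(0);
        VariableTree field = (VariableTree) cls.getMembers().get(0);
        return field.getModifiers();
    }

    public static void main(String[] args) throws Exception {
        ModifiersTree mods = firstFieldModifiers(SAMPLE);
        System.out.println(mods.getFlags());       // modifier keywords as Modifier values
        System.out.println(mods.getAnnotations()); // annotations stored in the same node
    }
}
```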
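parseExpression: parse an expression
JavaCompiler does not seem to call this method directly during compilation. Why? Because parseCompilationUnit() already returns the JCCompilationUnit object, i.e. the AST of the whole compilation unit; the expressions in it are parsed inside JavacParser while that tree is built.
What does an expression look like? Understanding this helps with understanding Expr in CodeQL.
The official comment: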
{
Expression = Expression1 [ExpressionRest]
ExpressionRest = [AssignmentOperator Expression1]
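AssignmentOperator = "=" | "+=" | "-=" | "*=" | "/=" | // assignment operators
"&=" | "|=" | "^=" |
"%=" | "<<=" | ">>=" | ">>>="
Type = Type1 // a data type, e.g. of a variable or a method's return value
TypeNoParams = TypeNoParams1 // not sure; judging by the name, a type without type arguments
StatementExpression = Expression // an expression used as a statement, e.g. a method call or an assignment
ConstantExpression = Expression // a constant expression, e.g. a case label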
}
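I couldn't make sense of it either. No matter, let's keep learning.
I followed the code but still didn't quite get it, until I found an article (the "Expressions" entry in the references) with this definition: a Java expression is made up of variables, operators, literals and method calls.
An expression does two jobs:
- perform the computation indicated by its elements
- return some value
From my own understanding plus some testing, I summarized it like this:
1. Expressions appear inside a TypeDeclaration, i.e. in the classBody
2. Every variable name is an expression, including parameters, local variables, fields and reference variables
3. Every literal is an expression, e.g. the 1 in i=1
4. Everything an operator (such as the AssignmentOperator above) takes part in:
- in String n = "n": n = "n" (assignment) and "n" (literal)
- in int a = i-1: a = i-1, i-1 and 1
5. Method calls:
- in System.out.println("name->" + name): System.out.println
- in if (!"".equalsIgnoreCase(name)): "".equalsIgnoreCase(name)
6. Data types:
- in private String name;: String
- the return type String of public String getName() {}
- the parameter type String of public void setPassword(String password) {}
- the logger's type Logger in private static final Logger logger = LogManager.getLogger(UserAction.class)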
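In short, expressions include:
- variable names, e.g. name in String name = "n"
- anything operators take part in: arithmetic, logical, bitwise, ternary operations, etc.
- literals, e.g. "n" in String n = "n"
- types: variable types, parameter types, return types, etc.
- assignment expressions, e.g. n = "name"
- method calls, e.g. "".equalsIgnoreCase(name), and object creation such as new UserAction()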
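The list can be compared with javac's own model. The sketch below parses a made-up class, walks it with TreeScanner and records every node whose runtime object implements ExpressionTree. Two javac details show up. Type nodes such as int are PrimitiveTypeTree, which is not an ExpressionTree in the public interfaces, yet javac's JCPrimitiveTypeTree extends JCExpression, so the instanceof check counts it, consistent with point 6 above. The declared name in String name = "n", on the other hand, is just a Name field of JCVariableDecl rather than a node of its own (CodeQL's Expr classes may draw this line differently). ExpressionNodes and the sample are invented for illustration.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ExpressionNodes {

    static final String SAMPLE = "class UserAction {"
            + " private String name;"
            + " void setName(String name) {"
            + " int i = 2;"
            + " int a = i - 1;"
            + " if (!\"\".equalsIgnoreCase(name)) { this.name = name; }"
            + " System.out.println(\"name->\" + name);"
            + " } }";

    // Wraps a string as an in-memory .java file.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Collects every node whose runtime object implements ExpressionTree, in visit order.
    static List<Tree> expressions(String code) throws Exception {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source("UserAction", code)));
        List<Tree> found = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void scan(Tree tree, Void unused) {
                    if (tree instanceof ExpressionTree) {
                        found.add(tree);
                    }
                    return super.scan(tree, unused);
                }
            }.scan(unit, null);
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        for (Tree t : expressions(SAMPLE)) {
            System.out.println(t.getKind() + "\t" + t); // toString() pretty-prints the subtree
        }
    }
}
```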
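parseStatement: parse a statement
As far as I can tell, this method is only called in ReplParser, a subclass of JavacParser that overrides one method with a modified version, intended to allow parsing a "snippet" of Java code without the surrounding context of a class, method and so on. (ReplParser is part of JShell, in the jdk.jshell module.)
I don't know exactly when that happens, so let's leave it for now; the main thing here is to understand what a statement is. Going any deeper really would be studying the compiler~
What does a statement look like? Understanding this helps with understanding Stmt in CodeQL.
A statement is essentially a combination of expressions, put together according to the language's grammar into something like a natural-language sentence that expresses a piece of logic. Let's look at some examples: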
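String n; // declaration statement
n = "n"; // assignment statement; drop the ; and what remains is an expression
if (!"".equalsIgnoreCase(name)) {
this.name = name;
}
if (!"".equalsIgnoreCase(name)) // if statement
{this.name = name;} // block statement
A statement is made of two kinds of parts:
- syntax keywords and punctuation, e.g. if()/{}/return
- expressions
That's roughly it.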
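The same kind of check works for statements. Using the snippet above, wrapped in a made-up class S so that it parses, the sketch records every StatementTree node inside the method body. The assignment n = "n"; comes back as an EXPRESSION_STATEMENT wrapping an ASSIGNMENT expression, which matches the remark that dropping the semicolon leaves an expression; the method body and the braces after if each count as a block statement. StatementNodes is an invented name.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionStatementTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.StatementTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class StatementNodes {

    static final String SAMPLE = "class S { String name; void m() {"
            + " String n;"
            + " n = \"n\";"
            + " if (!\"\".equalsIgnoreCase(name)) { this.name = name; }"
            + " } }";

    // Wraps a string as an in-memory .java file.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // Lists the statements inside every method body, in visit order.
    static List<String> statements(String code) throws Exception {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source("S", code)));
        CompilationUnitTree unit = task.parse().iterator().next();
        ClassTree cls = (ClassTree) unit.getTypeDecls().get(0);
        List<String> out = new ArrayList<>();
        TreeScanner<Void, Void> collector = new TreeScanner<Void, Void>() {
            @Override
            public Void scan(Tree tree, Void unused) {
                if (tree instanceof ExpressionStatementTree) {
                    // "expression ;" becomes an ExpressionStatement wrapping the expression
                    Tree expr = ((ExpressionStatementTree) tree).getExpression();
                    out.add("EXPRESSION_STATEMENT(" + expr.getKind() + ")");
                } else if (tree instanceof StatementTree) {
                    out.add(tree.getKind().toString());
                }
                return super.scan(tree, unused);
            }
        };
        for (Tree member : cls.getMembers()) {
            if (member.getKind() == Tree.Kind.METHOD) {
                collector.scan(((MethodTree) member).getBody(), null);
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        statements(SAMPLE).forEach(System.out::println);
    }
}
```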
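parseType: parse a type
This method parses a type, e.g. String, int, int[] or List<String>, and it is called in many places while parseCompilationUnit() runs, because types appear in several positions, for example:
- field and local variable types
- parameter types
- method return types
As far as I can tell from the source, it also reads any type annotations written in front of the type (e.g. @NonNull String) together with the type itself. Study the details yourself if you're interested; for now it's enough to know what it does.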
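A quick way to see what comes out of type parsing is to look at the type nodes of declarations. The sketch parses a made-up class T and reports the kind of each field type, the method's return type and its parameter type: List<String> becomes a parameterized type (JCTypeApply in javac), int[] an array type, String a plain identifier and int a primitive type. TypeNodes is an invented name.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class TypeNodes {

    static final String SAMPLE =
            "class T { java.util.List<String> names; int[] ids; String s; String getName(int id) { return s; } }";

    // Wraps a string as an in-memory .java file.
    static JavaFileObject source(String className, String code) {
        return new SimpleJavaFileObject(URI.create("string:///" + className + ".java"),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    // For each field, method return type and parameter, report the kind of the parsed type node.
    static List<String> typeKinds(String code) throws Exception {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source("T", code)));
        CompilationUnitTree unit = task.parse().iterator().next();
        ClassTree cls = (ClassTree) unit.getTypeDecls().get(0);
        List<String> out = new ArrayList<>();
        for (Tree member : cls.getMembers()) {
            if (member.getKind() == Tree.Kind.VARIABLE) {
                VariableTree field = (VariableTree) member;
                out.add(field.getName() + ": " + field.getType().getKind());
            } else if (member.getKind() == Tree.Kind.METHOD) {
                MethodTree method = (MethodTree) member;
                out.add(method.getName() + "() returns " + method.getReturnType().getKind());
                for (VariableTree param : method.getParameters()) {
                    out.add("param " + param.getName() + ": " + param.getType().getKind());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        typeKinds(SAMPLE).forEach(System.out::println);
    }
}
```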
TreeMaker
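Now let's look at how the syntax tree is created. As the compiler-principles part said: syntax trees are created with TreeMaker.
TreeMaker lives in the same package as JCTree, com.sun.tools.javac.tree.TreeMaker, and is a factory class implementing JCTree.Factory: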
/** An interface for tree factories
 */
public interface Factory {
JCCompilationUnit TopLevel(List<JCTree> defs);
JCPackageDecl PackageDecl(List<JCAnnotation> annotations,
JCExpression pid);
JCImport Import(JCTree qualid, boolean staticImport);
JCClassDecl ClassDef(JCModifiers mods,
Name name,
List<JCTypeParameter> typarams,
JCExpression extending,
List<JCExpression> implementing,
List<JCTree> defs);
JCMethodDecl MethodDef(JCModifiers mods,
Name name,
JCExpression restype,
List<JCTypeParameter> typarams,
JCVariableDecl recvparam,
List<JCVariableDecl> params,
List<JCExpression> thrown,
JCBlock body,
JCExpression defaultValue);
JCVariableDecl VarDef(JCModifiers mods,
Name name,
JCExpression vartype,
JCExpression init);
JCSkip Skip();
JCBlock Block(long flags, List<JCStatement> stats);
JCDoWhileLoop DoLoop(JCStatement body, JCExpression cond);
JCWhileLoop WhileLoop(JCExpression cond, JCStatement body);
JCForLoop ForLoop(List<JCStatement> init,
JCExpression cond,
List<JCExpressionStatement> step,
JCStatement body);
JCEnhancedForLoop ForeachLoop(JCVariableDecl var, JCExpression expr, JCStatement body);
JCLabeledStatement Labelled(Name label, JCStatement body);
JCSwitch Switch(JCExpression selector, List<JCCase> cases);
JCCase Case(JCExpression pat, List<JCStatement> stats);
JCSynchronized Synchronized(JCExpression lock, JCBlock body);
JCTry Try(JCBlock body, List<JCCatch> catchers, JCBlock finalizer);
JCTry Try(List<JCTree> resources,
JCBlock body,
List<JCCatch> catchers,
JCBlock finalizer);
JCCatch Catch(JCVariableDecl param, JCBlock body);
JCConditional Conditional(JCExpression cond,
JCExpression thenpart,
JCExpression elsepart);
JCIf If(JCExpression cond, JCStatement thenpart, JCStatement elsepart);
JCExpressionStatement Exec(JCExpression expr);
JCBreak Break(Name label);
JCContinue Continue(Name label);
JCReturn Return(JCExpression expr);
JCThrow Throw(JCExpression expr);
JCAssert Assert(JCExpression cond, JCExpression detail);
JCMethodInvocation Apply(List<JCExpression> typeargs,
JCExpression fn,
List<JCExpression> args);
JCNewClass NewClass(JCExpression encl,
List<JCExpression> typeargs,
JCExpression clazz,
List<JCExpression> args,
JCClassDecl def);
JCNewArray NewArray(JCExpression elemtype,
List<JCExpression> dims,
List<JCExpression> elems);
JCParens Parens(JCExpression expr);
JCAssign Assign(JCExpression lhs, JCExpression rhs);
JCAssignOp Assignop(Tag opcode, JCTree lhs, JCTree rhs);
JCUnary Unary(Tag opcode, JCExpression arg);
JCBinary Binary(Tag opcode, JCExpression lhs, JCExpression rhs);
JCTypeCast TypeCast(JCTree expr, JCExpression type);
JCInstanceOf TypeTest(JCExpression expr, JCTree clazz);
JCArrayAccess Indexed(JCExpression indexed, JCExpression index);
JCFieldAccess Select(JCExpression selected, Name selector);
JCIdent Ident(Name idname);
JCLiteral Literal(TypeTag tag, Object value);
JCPrimitiveTypeTree TypeIdent(TypeTag typetag);
JCArrayTypeTree TypeArray(JCExpression elemtype);
JCTypeApply TypeApply(JCExpression clazz, List<JCExpression> arguments);
JCTypeParameter TypeParameter(Name name, List<JCExpression> bounds);
JCWildcard Wildcard(TypeBoundKind kind, JCTree type);
TypeBoundKind TypeBoundKind(BoundKind kind);
JCAnnotation Annotation(JCTree annotationType, List<JCExpression> args);
JCModifiers Modifiers(long flags, List<JCAnnotation> annotations);
JCErroneous Erroneous(List<? extends JCTree> errs);
JCModuleDecl ModuleDef(JCModifiers mods, ModuleKind kind, JCExpression qualId, List<JCDirective> directives);
JCExports Exports(JCExpression qualId, List<JCExpression> moduleNames);
JCOpens Opens(JCExpression qualId, List<JCExpression> moduleNames);
JCProvides Provides(JCExpression serviceName, List<JCExpression> implNames);
JCRequires Requires(boolean isTransitive, boolean isStaticPhase, JCExpression qualId);
JCUses Uses(JCExpression qualId);
LetExpr LetExpr(List<JCVariableDecl> defs, JCExpression expr);
}
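Since we are building a tree, nodes have to be linked to each other. When the ParserFactory factory class is constructed it obtains a TreeMaker from the Context (the field F), and every JavacParser it creates uses that TreeMaker to create the JCTree nodes during parsing.
Node creation came up earlier already: TreeMaker builds the different kinds of JCTree nodes. For an explanation of the individual JCTree subclasses, see the expert article listed in the references ("JCTree subclass walkthrough").
Creating a node involves two things:
- a JCTree subclass: the node object
- tree.pos: the node's position in the source code
An example from the code: when parsing a package declaration, the parser takes the position pos of the current token, and TreeMaker sets that pos before creating the JCPackageDecl: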
/** Reassign current position.
*/
public TreeMaker at(int pos) {
this.pos = pos;
return this;
}
Then PackageDecl is called to create the object, and it sets the tree's pos to the pos that was just set:
public JCPackageDecl PackageDecl(List<JCAnnotation> annotations,
JCExpression pid) {
Assert.checkNonNull(annotations);
Assert.checkNonNull(pid);
JCPackageDecl tree = new JCPackageDecl(annotations, pid);
tree.pos = pos;
return tree;
}
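The tree structure itself comes from the arguments: the child nodes (annotations and pid) are passed into PackageDecl and stored in the new JCPackageDecl, while pos only records where the node sits in the source.
But, clever reader, you may have noticed that pos is just a token's position, probably the start position~
Right. I haven't looked into that logic much; if you're interested, check how Scanner handles tokens to see which position pos refers to. (A Token records both pos and endPos; the end positions of tree nodes are kept separately in the endPosTable when they are enabled, as seen at the end of parseCompilationUnit above.)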
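The at(pos) plus factory-method pattern can be imitated in a few lines. In the hypothetical MiniTreeMaker below (invented for illustration, not javac code; the factory methods are capitalised on purpose to mirror TreeMaker), at(pos) only sets the position the next node will receive, and the parent-child links come from the child nodes passed in as arguments, just as PackageDecl(annotations, pid) receives its children. The offsets for "package com.example;" are picked by hand.

```java
public class MiniTreeMakerDemo {

    // Hypothetical node base class: like JCTree, every node carries a source position.
    static abstract class Node {
        int pos = -1;
    }

    static final class IdentNode extends Node {
        final String name;
        IdentNode(String name) { this.name = name; }
    }

    static final class SelectNode extends Node {
        final Node selected;   // child link: the qualifier left of the dot
        final String name;
        SelectNode(Node selected, String name) { this.selected = selected; this.name = name; }
    }

    static final class PackageNode extends Node {
        final Node pid;        // child link: the package name expression
        PackageNode(Node pid) { this.pid = pid; }
    }

    // Hypothetical factory imitating TreeMaker's style: at(pos), then a factory method.
    static final class MiniTreeMaker {
        private int pos = -1;

        MiniTreeMaker at(int pos) {
            this.pos = pos;
            return this;       // returning this enables F.at(p).Ident(...) chaining
        }

        IdentNode Ident(String name) {
            IdentNode tree = new IdentNode(name);
            tree.pos = pos;
            return tree;
        }

        SelectNode Select(Node selected, String name) {
            SelectNode tree = new SelectNode(selected, name);
            tree.pos = pos;
            return tree;
        }

        PackageNode PackageDecl(Node pid) {
            PackageNode tree = new PackageNode(pid);
            tree.pos = pos;
            return tree;
        }
    }

    // Builds the tree for "package com.example;": "package" at 0, "com" at 8, the dot at 11.
    static PackageNode build() {
        MiniTreeMaker F = new MiniTreeMaker();
        IdentNode com = F.at(8).Ident("com");
        SelectNode pid = F.at(11).Select(com, "example");
        return F.at(0).PackageDecl(pid);
    }

    public static void main(String[] args) {
        PackageNode pd = build();
        SelectNode pid = (SelectNode) pd.pid;
        System.out.println(pd.pos + " " + pid.pos + " " + pid.selected.pos);    // 0 11 8
        System.out.println(((IdentNode) pid.selected).name + "." + pid.name);   // com.example
    }
}
```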
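Summary
1. The compilation unit is one source file, i.e. a .java file (.class files are read by ClassReader, not parsed)
2. Expressions: a Java expression is made up of variables, operators, literals and method calls
- variable names, e.g. `name` in `String name = "n"`
- anything operators take part in: arithmetic, logical, bitwise, ternary operations, etc.
- literals, e.g. `"n"` in `String n = "n"`
- types: variable types, parameter types, return types, etc.
- assignment expressions, e.g. `n = "name"`
- method calls, e.g. `"".equalsIgnoreCase(name)`, and object creation such as `new UserAction()`
3. Statements: a statement is a combination of expressions, put together according to the language's grammar into something like a natural-language sentence that expresses a piece of logic, for example: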
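String n; // declaration statement
n = "n"; // assignment statement; drop the ; and what remains is an expression
if (!"".equalsIgnoreCase(name)) {
this.name = name;
}
if (!"".equalsIgnoreCase(name)) // if statement
{this.name = name;} // block statement
A statement is made of two kinds of parts:
- syntax keywords and punctuation, e.g. if()/{}/return/semicolons
- expressions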
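References
JCTree subclass walkthrough: https://www.jianshu.com/p/4bd5dc13f35a
Compilation overview (OpenJDK): https://openjdk.org/groups/compiler/doc/compilation-overview/index.html
Expressions, statements, and blocks: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/expressions.html
Expressions: https://www.whitman.edu/mathematics/java_tutorial/java/nutsandbolts/expressions.html
Originally published on the WeChat official account alumm0x: "CodeQL之前置知识AST生成过程" (CodeQL prerequisites: the AST generation process)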