Learning the JS Syntax Tree (Complete)

Introduction

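The Mozilla JS Parser API began as the specification document, written by Mozilla engineers, for the JavaScript AST format output by the SpiderMonkey engine in Firefox. As more syntax was added to JavaScript, The ESTree Spec was born as a community standard for the people involved in building and using these tools. The difference between the two is that the Parser API describes some behavior specific to the SpiderMonkey engine, whereas ESTree is a community specification that stays backward compatible with the SpiderMonkey format.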

Parser

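A parser generally works in two steps: lexical analysis and syntactic analysis. This article uses Acorn@7.2.0 as the JavaScript parser and takes the following JS code as an example: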

const href = 'https://vincentstudio.info'

Lexical analysis

Lexical analysis turns the code into a stream of tokens. For the example above, the result looks roughly like this:

[
  Token {
    type: TokenType { label: 'const', keyword: 'const' ... },
    value: 'const', ...
  },
  Token {
    type: TokenType { label: 'name', keyword: undefined ... },
    value: 'href', ...
  },
  Token {
    type: TokenType { label: '=', keyword: undefined ... },
    value: '=', ...
  },
  Token {
    type: TokenType { label: 'string', keyword: undefined ... },
    value: 'https://vincentstudio.info', ...
  },
  Token {
    type: TokenType { label: 'eof', keyword: undefined ... },
    value: undefined, ...
  }
]
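
To make the idea concrete, here is a minimal sketch of a lexer that handles only the token kinds in this one example. It is not how Acorn is implemented: the tokenize function, the KEYWORDS set, and the flat token shape (label, value, start, end) are assumptions made for illustration.

```javascript
// Toy lexer for illustration only; it knows identifiers, a few keywords,
// single-quoted strings, and "=". Acorn's real tokenizer is far more complete.
const KEYWORDS = new Set(["const", "let", "var"]);

function tokenize(source) {
  const tokens = [];
  // One alternative per token kind; the sticky flag (y) anchors each match at lastIndex.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|('(?:[^'\\]|\\.)*')|(=)/y;
  let pos = 0;
  while (pos < source.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(source);
    if (match === null) {
      throw new SyntaxError(`Unexpected character at offset ${pos}`);
    }
    const [text, word, str, eq] = match;
    const start = pos;
    pos += text.length;
    if (word !== undefined) {
      // Keywords use their own text as the label; other words are plain names.
      const label = KEYWORDS.has(word) ? word : "name";
      tokens.push({ label, value: word, start, end: pos });
    } else if (str !== undefined) {
      // The token value is the string content without the surrounding quotes.
      tokens.push({ label: "string", value: str.slice(1, -1), start, end: pos });
    } else if (eq !== undefined) {
      tokens.push({ label: "=", value: "=", start, end: pos });
    }
    // Otherwise the match was whitespace, which produces no token.
  }
  tokens.push({ label: "eof", value: undefined, start: pos, end: pos });
  return tokens;
}

const tokens = tokenize("const href = 'https://vincentstudio.info'");
console.log(tokens.map((t) => t.label).join(" "));
// const name = string eof
```

The labels mirror the ones in the output above, so the five tokens line up one to one with the Token objects printed there.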

The data structure of Token:

class Token {
  type: TokenType
  value: any
  start: number
  end: number
  loc?: SourceLocation
  range?: [number, number]
}

The data structure of TokenType:

class TokenType {
  label: string
  keyword: string
  beforeExpr: boolean
  startsExpr: boolean
  isLoop: boolean
  isAssign: boolean
  prefix: boolean
  postfix: boolean
  binop: number
  updateContext?: (prevType: TokenType) => void
}

Syntactic analysis

Syntactic analysis takes the token stream produced by lexical analysis and converts it into an AST. The result looks roughly like this:

Node {
  type: 'Program',
  sourceType: 'script',
  body: [
    Node {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        Node {
          type: 'VariableDeclarator',
          id: Node { type: 'Identifier', name: 'href' },
          init: Node { type: 'Literal', value: 'https://vincentstudio.info' }
        }
      ]
    }
  ]
}
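
A tree like this can be traversed without knowing every node type in advance, because child nodes are simply property values that are nodes or arrays of nodes. The sketch below assumes that a node is any object with a string type property; the isNode and walk helpers are hypothetical names, and the tree is a hand-written copy of the output above as plain objects.

```javascript
// Hand-written copy of the tree above as plain objects (positions omitted).
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "href" },
          init: { type: "Literal", value: "https://vincentstudio.info" },
        },
      ],
    },
  ],
};

// Assumption: a value counts as a node if it is an object with a string `type`.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Depth-first, pre-order traversal: visit the node, then every child node
// found among its property values (single nodes or arrays of nodes).
function walk(node, visit) {
  visit(node);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) walk(child, visit);
    }
  }
}

const visited = [];
walk(ast, (node) => visited.push(node.type));
console.log(visited.join(" > "));
// Program > VariableDeclaration > VariableDeclarator > Identifier > Literal
```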

Every node in the AST is an instance of Node, whose data structure is as follows:

class Node {
  type: string
  start: number
  end: number
  loc?: SourceLocation
  sourceFile?: string
  range?: [number, number]
}
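
The start and end fields are character offsets into the source, with end exclusive, so they can be used to recover the exact text a node came from. The following sketch uses a hand-written VariableDeclarator for the example line; the offsets were worked out by hand rather than taken from parser output, and textOf is a hypothetical helper.

```javascript
const source = "const href = 'https://vincentstudio.info'";

// Hand-written VariableDeclarator with offsets for the source above.
const declarator = {
  type: "VariableDeclarator",
  start: 6,
  end: 41,
  id: { type: "Identifier", start: 6, end: 10, name: "href" },
  init: { type: "Literal", start: 13, end: 41, value: "https://vincentstudio.info" },
};

// start is inclusive and end is exclusive, so slice() recovers the node's text.
function textOf(node) {
  return source.slice(node.start, node.end);
}

console.log(textOf(declarator.id)); // href
console.log(textOf(declarator.init)); // 'https://vincentstudio.info'
console.log(textOf(declarator)); // href = 'https://vincentstudio.info'
```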

ES5

Nodes fall roughly into the following broad categories:

Program (root node)

interface Program <: Node {
    type: "Program";
    body: [ Statement ];
}

The top of the AST. body contains multiple Statement nodes.

Identifier

interface Identifier <: Expression, Pattern {
    type: "Identifier";
    name: string;
}

A user-defined name, such as a variable name, function name, or property name.

Literal

interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp;
}

As the type of value shows, a literal is simply a value; it can be a string, a boolean, a number, null, or a regular expression.

Statement

interface Statement <: Node { }

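As the root node already shows, the AST is built from an array of Statements. I think Statement is the largest concept in the AST apart from Program, and the various pieces of JS syntax unfold from it: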

  • Empty statement “;”
    interface EmptyStatement <: Statement {
        type: "EmptyStatement";
    }
    
  • Debugger statement “debugger;”
    interface DebuggerStatement <: Statement {
        type: "DebuggerStatement";
    }
    
  • Expression statement “1 + 1;”
    interface ExpressionStatement <: Statement {
        type: "ExpressionStatement";
        expression: Expression;
    }
    
  • Block statement “{[body]}”
    interface BlockStatement <: Statement {
        type: "BlockStatement";
        body: [ Statement ];
    }
    
  • With statement “with ([object]) {[body]}”
    interface WithStatement <: Statement {
        type: "WithStatement";
        object: Expression;
        body: Statement;
    }
    
  • Control flow statements
    • Return statement “return [argument]”
      interface ReturnStatement <: Statement {
        type: "ReturnStatement";
        argument: Expression | null;
      }
      
    • Labeled statement “loop: … break loop;”
      interface LabeledStatement <: Statement {
        type: "LabeledStatement";
        label: Identifier;
        body: Statement;
      }
      
    • Break statement “break [label?];”
      interface BreakStatement <: Statement {
        type: "BreakStatement";
        label: Identifier | null;
      }
      
    • Continue statement “continue [label?];”
      interface ContinueStatement <: Statement {
        type: "ContinueStatement";
        label: Identifier | null;
      }
      
  • Conditional statements
    • If statement “if ([test]) {[consequent]} else {[alternate]}”
      interface IfStatement <: Statement {
        type: "IfStatement";
        test: Expression;
        consequent: Statement;
        alternate: Statement | null;
      }
      
    • Switch statement “switch ([discriminant]) {[cases]}”
      interface SwitchStatement <: Statement {
        type: "SwitchStatement";
        discriminant: Expression;
        cases: [ SwitchCase ];
      }
      
      • SwitchCase node “case [test]: [consequent]”
        interface SwitchCase <: Node {
          type: "SwitchCase";
          test: Expression | null;
          consequent: [ Statement ];
        }
        
  • Exception statements
    • Throw statement “throw [argument]”
      interface ThrowStatement <: Statement {
        type: "ThrowStatement";
        argument: Expression;
      }
      
    • Try statement “try {[block]} catch {[handler]} finally {[finalizer]}”
      interface TryStatement <: Statement {
        type: "TryStatement";
        block: BlockStatement;
        handler: CatchClause | null;
        finalizer: BlockStatement | null;
      }
      
      • CatchClause node
        interface CatchClause <: Node {
          type: "CatchClause";
          param: Pattern;
          body: BlockStatement;
        }
        
  • Loop statements
    • While statement “while ([test]) {[body]}”
      interface WhileStatement <: Statement {
        type: "WhileStatement";
        test: Expression;
        body: Statement;
      }
      
    • DoWhile statement “do {[body]} while ([test])”
      interface DoWhileStatement <: Statement {
        type: "DoWhileStatement";
        body: Statement;
        test: Expression;
      }
      
    • For statement “for ([init];[test];[update]) {[body]}”
      interface ForStatement <: Statement {
        type: "ForStatement";
        init: VariableDeclaration | Expression | null;
        test: Expression | null;
        update: Expression | null;
        body: Statement;
      }
      
    • ForIn statement “for ([left] in [right]) {[body]}”
      interface ForInStatement <: Statement {
        type: "ForInStatement";
        left: VariableDeclaration |  Pattern;
        right: Expression;
        body: Statement;
      }
      

Declaration

interface Declaration <: Statement { }

Declaration nodes are statements too; they are just a more specific subtype.

  • Function declaration “function [id] ([params]) {[body]}”
    interface FunctionDeclaration <: Function, Declaration {
      type: "FunctionDeclaration";
      id: Identifier;
    }
    
    • Function
      interface Function <: Node {
        id: Identifier | null;
        params: [ Pattern ];
        body: FunctionBody;
      }
      
  • Variable declaration “var a = 10;”
    interface VariableDeclaration <: Declaration {
      type: "VariableDeclaration";
      declarations: [ VariableDeclarator ];
      kind: "var";
    }
    
    • Variable declarator
      interface VariableDeclarator <: Node {
        type: "VariableDeclarator";
        id: Pattern;
        init: Expression | null;
      }
      

Expression

interface Expression <: Node { }
  • This expression “this”
    interface ThisExpression <: Expression {
      type: "ThisExpression";
    }
    
  • Array expression “[1, 2, 3]”
    interface ArrayExpression <: Expression {
      type: "ArrayExpression";
      elements: [ Expression | null ];
    }
    
  • Object expression “{ a: 1 }”
    interface ObjectExpression <: Expression {
      type: "ObjectExpression";
      properties: [ Property ];
    }
    
    • Property node
      interface Property <: Node {
        type: "Property";
        key: Literal | Identifier;
        value: Expression;
        kind: "init" | "get" | "set";
      }
      
  • Function expression “function ([params]) {[body]}”
    interface FunctionExpression <: Function, Expression {
      type: "FunctionExpression";
    }
    
  • Unary operations
    • Unary expression
      interface UnaryExpression <: Expression {
        type: "UnaryExpression";
        operator: UnaryOperator;
        prefix: boolean;
        argument: Expression;
      }
      
      • Unary operator “typeof a”
        enum UnaryOperator {
          "-" | "+" | "!" | "~" | "typeof" | "void" | "delete"
        }
        
    • Update expression “a++” “--a”
      interface UpdateExpression <: Expression {
        type: "UpdateExpression";
        operator: UpdateOperator;
        argument: Expression;
        prefix: boolean;
      }
      
      • Update operator
        enum UpdateOperator {
          "++" | "--"
        }
        
  • Binary operations
    • Binary expression “a > b”
      interface BinaryExpression <: Expression {
        type: "BinaryExpression";
        operator: BinaryOperator;
        left: Expression;
        right: Expression;
      }
      
      • Binary operator
        enum BinaryOperator {
          "==" | "!=" | "===" | "!=="
              | "<" | "<=" | ">" | ">="
              | "<<" | ">>" | ">>>"
              | "+" | "-" | "*" | "/" | "%"
              | "|" | "^" | "&" | "in"
              | "instanceof"
        }
        
  • Assignment expression “a = 1”
    interface AssignmentExpression <: Expression {
      type: "AssignmentExpression";
      operator: AssignmentOperator;
      left: Pattern | Expression;
      right: Expression;
    }
    
    • Assignment operator
      enum AssignmentOperator {
        "=" | "+=" | "-=" | "*=" | "/=" | "%="
            | "<<=" | ">>=" | ">>>="
            | "|=" | "^=" | "&="
      }
      
  • Logical expression “a && b”
    interface LogicalExpression <: Expression {
      type: "LogicalExpression";
      operator: LogicalOperator;
      left: Expression;
      right: Expression;
    }
    
    • Logical operator
      enum LogicalOperator {
        "||" | "&&"
      }
      
  • Member expression “a.b”
    interface MemberExpression <: Expression, Pattern {
      type: "MemberExpression";
      object: Expression;
      property: Expression;
      computed: boolean;
    }
    
  • Conditional expression “a > b ? c : d”
    interface ConditionalExpression <: Expression {
      type: "ConditionalExpression";
      test: Expression;
      alternate: Expression;
      consequent: Expression;
    }
    
  • Call expression “func(1, 2)”
    interface CallExpression <: Expression {
      type: "CallExpression";
      callee: Expression;
      arguments: [ Expression ];
    }
    
  • New expression “new Date()”
    interface NewExpression <: Expression {
      type: "NewExpression";
      callee: Expression;
      arguments: [ Expression ];
    }
    
  • Sequence expression “1,2,3”
    interface SequenceExpression <: Expression {
      type: "SequenceExpression";
      expressions: [ Expression ];
    }
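
Because expression nodes nest, a small recursive function is enough to evaluate them. The following is a sketch under stated assumptions, not a full interpreter: it supports only a few operators, the scope argument is a plain object standing in for variable lookup, and evaluate, lookup, id, lit and bin are hypothetical helpers used to build and run a hand-written tree.

```javascript
// Operator tables for the small subset handled in this sketch.
const binaryOps = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
  ">": (a, b) => a > b,
  "<": (a, b) => a < b,
  "===": (a, b) => a === b,
};
const unaryOps = {
  "-": (a) => -a,
  "!": (a) => !a,
  "typeof": (a) => typeof a,
};

function lookup(table, operator) {
  if (!Object.prototype.hasOwnProperty.call(table, operator)) {
    throw new Error("Unsupported operator: " + operator);
  }
  return table[operator];
}

function evaluate(node, scope) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      if (!Object.prototype.hasOwnProperty.call(scope, node.name)) {
        throw new ReferenceError(node.name + " is not defined");
      }
      return scope[node.name];
    case "UnaryExpression":
      return lookup(unaryOps, node.operator)(evaluate(node.argument, scope));
    case "BinaryExpression":
      return lookup(binaryOps, node.operator)(
        evaluate(node.left, scope),
        evaluate(node.right, scope)
      );
    case "LogicalExpression": {
      // Logical operators short-circuit, so the right side is evaluated lazily.
      const left = evaluate(node.left, scope);
      if (node.operator === "&&") return left ? evaluate(node.right, scope) : left;
      if (node.operator === "||") return left ? left : evaluate(node.right, scope);
      throw new Error("Unsupported operator: " + node.operator);
    }
    case "ConditionalExpression":
      return evaluate(node.test, scope)
        ? evaluate(node.consequent, scope)
        : evaluate(node.alternate, scope);
    default:
      throw new Error("Unsupported node type: " + node.type);
  }
}

// Hypothetical builders that keep the hand-written tree short.
const id = (name) => ({ type: "Identifier", name });
const lit = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

// Hand-written AST for: a + 2 * 3 > b ? "big" : "small"
const expression = {
  type: "ConditionalExpression",
  test: bin(">", bin("+", id("a"), bin("*", lit(2), lit(3))), id("b")),
  consequent: lit("big"),
  alternate: lit("small"),
};

console.log(evaluate(expression, { a: 1, b: 5 })); // big
```

Note that operator precedence never appears in the evaluator: it is already encoded in the shape of the tree, where 2 * 3 sits below the + node.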
    

Patterns

interface Pattern <: Node { }

Patterns mainly matter for ES6 destructuring assignment; in ES5, a Pattern can be understood as roughly the same thing as an Identifier.

ES2015

Program (root node)

extend interface Program {
    sourceType: "script" | "module";
    body: [ Statement | ModuleDeclaration ];
}

For an ES6 module, sourceType must be set to “module”; otherwise it is set to “script”.

Function

extend interface Function {
    generator: boolean;
}

Adds support for generator functions.

Statement

  • ForOf statement “for (let [left] of [right])”
    interface ForOfStatement <: ForInStatement {
      type: "ForOfStatement";
    }
    

Declaration

  • Variable declaration
    extend interface VariableDeclaration {
      kind: "var" | "let" | "const";
    }
    

Expression

  • Super expression “super([arguments])”
    interface Super <: Node {
      type: "Super";
    }
    extend interface CallExpression {
      callee: Expression | Super;
    }
    extend interface MemberExpression {
      object: Expression | Super;
    }
    
  • Spread expression “[head, ...iter]”
    interface SpreadElement <: Node {
      type: "SpreadElement";
      argument: Expression;
    }
    extend interface ArrayExpression {
      elements: [ Expression | SpreadElement | null ];
    }
    extend interface CallExpression {
      arguments: [ Expression | SpreadElement ];
    }
    extend interface NewExpression {
      arguments: [ Expression | SpreadElement ];
    }
    
  • Arrow function expression “() => {[body]}”
    interface ArrowFunctionExpression <: Function, Expression {
      type: "ArrowFunctionExpression";
      body: FunctionBody | Expression;
      expression: boolean;
    }
    
  • Yield expression “yield [argument]”
    interface YieldExpression <: Expression {
      type: "YieldExpression";
      argument: Expression | null;
      delegate: boolean;
    }
    
  • Template literal “`Hello ${name}`”
    interface TemplateLiteral <: Expression {
      type: "TemplateLiteral";
      quasis: [ TemplateElement ];
      expressions: [ Expression ];
    }
    
    • Template element
      interface TemplateElement <: Node {
        type: "TemplateElement";
        tail: boolean;
        value: {
            cooked: string;
            raw: string;
        };
      }
      
  • Tagged template expression (see MDN)
    interface TaggedTemplateExpression <: Expression {
      type: "TaggedTemplateExpression";
      tag: Expression;
      quasi: TemplateLiteral;
    }
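
The TemplateLiteral node above keeps its static text (quasis) and its embedded expressions in two separate arrays, with quasis always holding one more element than expressions. A minimal sketch of how the two interleave, assuming a hand-written tree for `Hello ${name}!` and a hypothetical renderTemplate helper that only understands Identifier expressions:

```javascript
// Hand-written AST for the template literal `Hello ${name}!`
const template = {
  type: "TemplateLiteral",
  quasis: [
    { type: "TemplateElement", tail: false, value: { cooked: "Hello ", raw: "Hello " } },
    { type: "TemplateElement", tail: true, value: { cooked: "!", raw: "!" } },
  ],
  expressions: [{ type: "Identifier", name: "name" }],
};

// Interleave quasis and expressions: quasi 0, expression 0, quasi 1, ...
// The element marked tail: true is the last one and has no expression after it.
function renderTemplate(node, scope) {
  let result = "";
  node.quasis.forEach((quasi, index) => {
    result += quasi.value.cooked;
    if (!quasi.tail) {
      const expression = node.expressions[index];
      if (expression.type !== "Identifier") {
        throw new Error("This sketch only supports Identifier expressions");
      }
      result += String(scope[expression.name]);
    }
  });
  return result;
}

console.log(renderTemplate(template, { name: "world" })); // Hello world!
```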
    

Pattern

Mainly related to destructuring assignment.

  • ObjectPattern “{ a, b: c } = { a: 1, b: { c: 2 }}”
    interface AssignmentProperty <: Property {
      type: "Property"; // inherited
      value: Pattern;
      kind: "init";
      method: false;
    }
    interface ObjectPattern <: Pattern {
      type: "ObjectPattern";
      properties: [ AssignmentProperty ];
    }
    
  • ArrayPattern “[a, b] = [1, 2]”
    interface ArrayPattern <: Pattern {
      type: "ArrayPattern";
      elements: [ Pattern | null ];
    }
    
  • RestElement “fun(...args){}”
    interface RestElement <: Pattern {
      type: "RestElement";
      argument: Pattern;
    }
    
  • AssignmentPattern “fun(a=10){}”
    interface AssignmentPattern <: Pattern {
      type: "AssignmentPattern";
      left: Pattern;
      right: Expression;
    }
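
A common task with these pattern nodes is finding out which variable names a destructuring pattern introduces. The sketch below walks the pattern types listed above; boundNames is a hypothetical helper, the input tree is written by hand, and MemberExpression targets (which are patterns but bind no new name) are deliberately left unsupported.

```javascript
// Collect the identifier names bound by a pattern, in source order.
function boundNames(pattern, names = []) {
  switch (pattern.type) {
    case "Identifier":
      names.push(pattern.name);
      break;
    case "ObjectPattern":
      for (const property of pattern.properties) {
        // A property is an AssignmentProperty (target in `value`) or,
        // since ES2018, a RestElement.
        boundNames(property.type === "RestElement" ? property : property.value, names);
      }
      break;
    case "ArrayPattern":
      for (const element of pattern.elements) {
        if (element !== null) boundNames(element, names); // null marks a hole
      }
      break;
    case "RestElement":
      boundNames(pattern.argument, names);
      break;
    case "AssignmentPattern":
      boundNames(pattern.left, names); // the default value binds nothing
      break;
    default:
      throw new Error("Unsupported pattern type: " + pattern.type);
  }
  return names;
}

const id = (name) => ({ type: "Identifier", name });

// Hand-written AST for the pattern: { a, b: [c, ...d], e = 1 }
const pattern = {
  type: "ObjectPattern",
  properties: [
    { type: "Property", kind: "init", method: false, key: id("a"), value: id("a") },
    {
      type: "Property",
      kind: "init",
      method: false,
      key: id("b"),
      value: {
        type: "ArrayPattern",
        elements: [id("c"), { type: "RestElement", argument: id("d") }],
      },
    },
    {
      type: "Property",
      kind: "init",
      method: false,
      key: id("e"),
      value: { type: "AssignmentPattern", left: id("e"), right: { type: "Literal", value: 1 } },
    },
  ],
};

const names = boundNames(pattern);
console.log(names.join(", ")); // a, c, d, e
```

The key b does not appear in the result: it only names the property being read, while the bindings come from the value side of each property.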
    

Class

interface Class <: Node {
    id: Identifier | null;
    superClass: Expression | null;
    body: ClassBody;
}
  • Class body
    interface ClassBody <: Node {
      type: "ClassBody";
      body: [ MethodDefinition ];
    }
    
  • Method definition
    interface MethodDefinition <: Node {
      type: "MethodDefinition";
      key: Expression;
      value: FunctionExpression;
      kind: "constructor" | "method" | "get" | "set";
      computed: boolean;
      static: boolean;
    }
    
  • Class declaration “class [name] [extends] {[body]}”
    interface ClassDeclaration <: Class, Declaration {
      type: "ClassDeclaration";
      id: Identifier;
    }
    
  • Class expression “const A = class [name] [extends] {[body]}”
    interface ClassExpression <: Class, Expression {
      type: "ClassExpression";
    }
    
  • Meta property “new.target”
    interface MetaProperty <: Expression {
      type: "MetaProperty";
      meta: Identifier;
      property: Identifier;
    }
    

Module

  • Module declaration
    interface ModuleDeclaration <: Node { }
    
  • Module specifier
    interface ModuleSpecifier <: Node {
      local: Identifier;
    }
    
  • Import
    • Import declaration “import foo from 'mod'”
      interface ImportDeclaration <: ModuleDeclaration {
        type: "ImportDeclaration";
        specifiers: [ ImportSpecifier | ImportDefaultSpecifier | ImportNamespaceSpecifier ];
        source: Literal;
      }
      
    • Import specifier “import { foo as a } from 'mod'”
      interface ImportSpecifier <: ModuleSpecifier {
        type: "ImportSpecifier";
        imported: Identifier;
      }
      
    • Default import specifier “import foo from 'mod'”
      interface ImportDefaultSpecifier <: ModuleSpecifier {
        type: "ImportDefaultSpecifier";
      }
      
    • Namespace import specifier “import * as foo from 'mod'”
      interface ImportNamespaceSpecifier <: ModuleSpecifier {
        type: "ImportNamespaceSpecifier";
      }
      
  • Exports
    • Named export declaration “export { foo, bar }” “export var foo = 1”
      interface ExportNamedDeclaration <: ModuleDeclaration {
        type: "ExportNamedDeclaration";
        declaration: Declaration | null;
        specifiers: [ ExportSpecifier ];
        source: Literal | null;
      }
      
    • Export specifier “export { foo }” “export { foo as bar }”
      interface ExportSpecifier <: ModuleSpecifier {
        type: "ExportSpecifier";
        exported: Identifier;
      }
      
    • Default export declaration “export default foo”
      interface AnonymousDefaultExportedFunctionDeclaration <: Function {
        type: "FunctionDeclaration";
        id: null;
      }
      interface AnonymousDefaultExportedClassDeclaration <: Class {
        type: "ClassDeclaration";
        id: null;
      }
      interface ExportDefaultDeclaration <: ModuleDeclaration {
        type: "ExportDefaultDeclaration";
        declaration: AnonymousDefaultExportedFunctionDeclaration | FunctionDeclaration | AnonymousDefaultExportedClassDeclaration | ClassDeclaration | Expression;
      }
      
    • Export-all declaration “export * from 'mod'”
      interface ExportAllDeclaration <: ModuleDeclaration {
        type: "ExportAllDeclaration";
        source: Literal;
      }
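
The specifier nodes above are what tools read to work out a module's imports and exports. As a minimal sketch, the code below scans a hand-written Program body and lists each imported binding and each exported name; describeImports and exportedNames are hypothetical helpers, and "default" and "*" are labels chosen here for default and namespace imports, not part of the ESTree format.

```javascript
const id = (name) => ({ type: "Identifier", name });
const str = (value) => ({ type: "Literal", value });

// Hand-written Program for:
//   import foo, { bar as baz } from 'mod'
//   import * as ns from 'other'
//   export { baz as qux }
const program = {
  type: "Program",
  sourceType: "module",
  body: [
    {
      type: "ImportDeclaration",
      source: str("mod"),
      specifiers: [
        { type: "ImportDefaultSpecifier", local: id("foo") },
        { type: "ImportSpecifier", imported: id("bar"), local: id("baz") },
      ],
    },
    {
      type: "ImportDeclaration",
      source: str("other"),
      specifiers: [{ type: "ImportNamespaceSpecifier", local: id("ns") }],
    },
    {
      type: "ExportNamedDeclaration",
      declaration: null,
      source: null,
      specifiers: [{ type: "ExportSpecifier", local: id("baz"), exported: id("qux") }],
    },
  ],
};

// One entry per imported binding: the local name, what it refers to, and where from.
function describeImports(program) {
  const bindings = [];
  for (const statement of program.body) {
    if (statement.type !== "ImportDeclaration") continue;
    for (const specifier of statement.specifiers) {
      let imported;
      if (specifier.type === "ImportDefaultSpecifier") imported = "default";
      else if (specifier.type === "ImportNamespaceSpecifier") imported = "*";
      else imported = specifier.imported.name;
      bindings.push({ local: specifier.local.name, imported, source: statement.source.value });
    }
  }
  return bindings;
}

// Names exported through specifier lists such as: export { a, b as c }
function exportedNames(program) {
  const names = [];
  for (const statement of program.body) {
    if (statement.type !== "ExportNamedDeclaration") continue;
    for (const specifier of statement.specifiers) names.push(specifier.exported.name);
  }
  return names;
}

const bindings = describeImports(program);
console.log(bindings.map((b) => `${b.local} <- ${b.imported} from ${b.source}`).join("\n"));
console.log(exportedNames(program).join(", ")); // qux
```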
      

ES2016

New binary operator **

extend enum BinaryOperator {
    "**"
}

New assignment operator **=

extend enum AssignmentOperator {
    "**="
}

ES2017

async/await

extend interface Function {
    async: boolean;
}

interface AwaitExpression <: Expression {
    type: "AwaitExpression";
    argument: Expression;
}

ES2018

Asynchronous iteration: for-await-of

extend interface ForOfStatement {
  await: boolean;
}

for await (const x of xs) {}

Rest/Spread support for objects

extend interface ObjectExpression {
    properties: [ Property | SpreadElement ];
}
extend interface ObjectPattern {
    properties: [ AssignmentProperty | RestElement ];
}

ES2015 introduced rest parameters and the spread operator, but only for arrays; ES2018 added support for objects.

Invalid escape sequences

extend interface TemplateElement {
    value: {
        cooked: string | null;
        raw: string;
    };
}

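ES2018 removes the syntactic restriction on escape sequences in tagged template strings.
Previously, \u started a Unicode escape, \x started a hexadecimal escape, and \ followed by a digit started an octal escape, which made certain strings impossible to create. When a tagged template contains such an invalid escape, cooked is null and only raw is available. See MDN for more details.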

ES2019

The catch binding can be omitted

extend interface CatchClause {
    param: Pattern | null;
}

try { } catch { }

ES2020

BigInt literal

extend interface Literal <: Expression {
    type: "Literal";
    value: string | boolean | null | number | RegExp | bigint;
}
interface BigIntLiteral <: Literal {
    bigint: string;
}
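
The extra bigint string matters because a bigint value cannot pass through JSON. The sketch below shows one way a tool might serialize such a node and restore the value from the bigint field; literalToJSON is a hypothetical helper and the node is written by hand, so treat it as an illustration rather than how any particular parser behaves.

```javascript
// Hand-written node for the literal 9007199254740993n
// (one more than Number.MAX_SAFE_INTEGER + 1, so a number cannot hold it exactly).
const node = {
  type: "Literal",
  value: 9007199254740993n,
  bigint: "9007199254740993", // digits only, without the trailing "n"
};

// JSON.stringify throws a TypeError on bigint values, so drop `value`
// and rely on the `bigint` string when serializing.
function literalToJSON(literal) {
  const copy = { ...literal };
  if (typeof copy.bigint === "string") copy.value = null;
  return JSON.stringify(copy);
}

const json = literalToJSON(node);
const restored = BigInt(JSON.parse(json).bigint);

console.log(json);
console.log(restored === node.value); // true
```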

Double question mark operator (nullish coalescing, ??)

extend enum LogicalOperator {
    "||" | "&&" | "??"
}
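
The new operator reuses the LogicalExpression node, so only the operator string tells the three apart. A minimal sketch of how an evaluator might treat them, assuming hand-written nodes whose operands are literals or nested logical expressions; evaluateLogical and evaluateOperand are hypothetical helpers.

```javascript
function evaluateOperand(node) {
  if (node.type === "Literal") return node.value;
  if (node.type === "LogicalExpression") return evaluateLogical(node);
  throw new Error("Unsupported node type: " + node.type);
}

// All three operators short-circuit; they differ in what makes them fall through.
function evaluateLogical(node) {
  const left = evaluateOperand(node.left);
  switch (node.operator) {
    case "&&":
      return left ? evaluateOperand(node.right) : left;
    case "||":
      return left ? left : evaluateOperand(node.right); // any falsy left falls through
    case "??":
      // Only null and undefined fall through; 0, "" and false are kept.
      return left === null || left === undefined ? evaluateOperand(node.right) : left;
    default:
      throw new Error("Unsupported operator: " + node.operator);
  }
}

const lit = (value) => ({ type: "Literal", value });
const logical = (operator, left, right) => ({ type: "LogicalExpression", operator, left, right });

console.log(evaluateLogical(logical("||", lit(0), lit(42)))); // 42
console.log(evaluateLogical(logical("??", lit(0), lit(42)))); // 0
console.log(evaluateLogical(logical("??", lit(null), lit(42)))); // 42
```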

export * as syntax

extend interface ExportAllDeclaration {
    exported: Identifier | null;
}