表达式解析器 | Expression-Parser

中文

概述

表达式解析器是一个基于递归下降解析技术的 Java 项目，支持标量、向量、矩阵的数学表达式解析与计算。项目还提供了一个交互式 REPL（Read-Eval-Print Loop）终端，方便用户实时输入和计算表达式。

实现原理

本项目采用经典的编译器前端架构，将表达式解析分为四个阶段：

输入: "3 + 4 * 2"

阶段1: 词法分析 (Lexer)              阶段2: 语法分析 (Parser)
┌───────────────────┐                ┌───────────────────────────────────────┐
│ NUMBER(3)         │                │ expr                                  │
│ PLUS(+)           │                │ └── addExpr                           │
│ NUMBER(4)         │ ────────────▶ │     ├── term                          │
│ MULTIPLY(*)       │   Token序列    │     │   └── unary                     │
│ NUMBER(2)         │                │     │       └── power                 │
│ EOF               │                │     │           └── implicitMul       │
└───────────────────┘                │     │               └── postfix       │
                                     │     │                   └── factor    │
                                     │     │                       └── 3     │
                                     │     ├── PLUS                          │
                                     │     └── term                          │
                                     │         ├── unary                     │
                                     │         │   └── power                 │
                                     │         │       └── implicitMul       │
                                     │         │               └── postfix   │
                                     │         │                   └── factor│
                                     │         │                       └── 4 │
                                     │         ├── MULTIPLY                  │
                                     │         └── unary                     │
                                     │             └── power                 │
                                     │                 └── implicitMul       │
                                     │                         └── postfix   │
                                     │                             └── factor│
                                     │                                 └── 2 │
                                     └───────────────────────────────────────┘

阶段3: 构建 AST                        阶段4: 表达式求值 (Eval)
┌───────────────────┐                ┌─────────────────────────────────────┐
│ BinaryOpNode(+)   │                │ BinaryOpNode(+)                     │
│    /        \     │                │   left = 3.eval() = 3               │
│ Number    BinaryOp│ ────────────▶ │   right = *.eval()                  │
│ Node(3)   Node(*) │                │     left = 4.eval() = 4             │
│          /      \ │                │     right = 2.eval() = 2            │
│    Number   Number│                │     return 4 * 2 = 8                │
│    Node(4) Node(2)│                │   return 3 + 8 = 11                 │
└───────────────────┘                └─────────────────────────────────────┘

透传说明：在 3 + 4 * 2 例子中，power、implicitMul、postfix 层级均无对应运算符，直接透传给下一层：

power 无 ^ → 透传给 implicitMul

implicitMul 无隐式乘法（如 2x）→ 透传给 postfix

postfix 无阶乘 ! → 透传给 factor

factor 识别数字 → 返回 NumberNode

1. 词法分析 (Lexical Analysis)

词法分析器（Lexer）将输入字符串分割成一个个Token（词法单元）：

Token 类型	示例	说明
`NUMBER`	`3.14`, `1e-5`	数字（支持浮点数、科学计数法）
`IDENTIFIER`	`sin`, `PI`, `x`	标识符（函数名、常量、变量）
`PLUS/MINUS/MULTIPLY/DIVIDE`	`+`, `-`, `*`, `/`	四则运算符
`POWER`	`^`	幂运算符
`LPAREN/RPAREN`	`(`, `)`	括号

示例：输入 "3 + 4 * 2" 被分割为：[NUMBER:3] [PLUS] [NUMBER:4] [MULTIPLY] [NUMBER:2] [EOF]

2. 语法分析 (Syntax Analysis)

语法分析器（Parser）使用递归下降技术，根据运算符优先级递归解析表达式。

优先级从高到低：

优先级	语法规则	说明
高	`factor`	数字、标识符、括号表达式、数组
↑	`postfix`	阶乘（如 `5!`）
↑	`implicitMul`	隐式乘法（如 `2x`、`3(4+5)`）
↑	`power`	幂运算（右结合，如 `2^3^2 = 2^(3^2)`）
↑	`unary`	一元正负号（如 `-3^2 = -(3^2)`）
↑	`term`	乘、除、取模
↑	`addExpr`	加、减
低	`expr`	赋值表达式

语法规则（BNF 范式：→ 表示定义为，| 表示或，(...) 表示分组，* 表示零次或多次，? 表示零次或一次）：

expr        → IDENTIFIER ASSIGN expr | addExpr
addExpr     → term ((PLUS | MINUS) term)*
term        → unary ((MULTIPLY | DIVIDE | MODULO) unary)*
unary       → (PLUS | MINUS) unary | power
power       → implicitMul (POWER power)?
implicitMul → postfix (postfix)*
postfix     → factor FACTORIAL?
factor      → NUMBER | IDENTIFIER | LPAREN expr RPAREN | LBRACKET ... RBRACKET

3. 抽象语法树 (AST)

解析完成后生成抽象语法树（ExprNode），树节点类型包括：

节点类型	说明	示例
`NumberNode`	数字字面量	`3.14`
`VariableNode`	变量引用	`x`
`BinaryOpNode`	二元运算	`a + b`, `x * y`
`UnaryOpNode`	一元运算	`-x`, `+5`
`FunctionNode`	函数调用	`sin(PI/2)`
`AssignNode`	变量赋值	`x = 10`
`ArrayNode`	数组字面量	`[1, 2, 3]`

AST 示例：表达式 3 + 4 * 2 的语法树：

       (+)
      /   \
    (3)   (*)
         /   \
       (4)   (2)

4. 表达式求值 (Evaluation)

每个 AST 节点实现 eval() 方法，通过递归调用子节点的 eval() 完成求值。

以 BinaryOpNode 为例（源码）：

@Override
public double eval(Map<String, Double> context) {
    // 1. 递归求值左子树
    double leftVal = left.eval(context);
    // 2. 递归求值右子树
    double rightVal = right.eval(context);
    // 3. 应用运算符
    switch (op.type()) {
        case PLUS:  return leftVal + rightVal;
        case MINUS: return leftVal - rightVal;
        case MULTIPLY: return leftVal * rightVal;
        case DIVIDE:   return leftVal / rightVal;
        case POWER:    return Math.pow(leftVal, rightVal);
        // ...
    }
}

NumberNode 直接返回存储的数值（源码）：

@Override
public double eval(Map<String, Double> context) {
    return value;
}

求值过程：对于 3 + 4 * 2

BinaryOpNode(+).eval() → 需要左值和右值
左值：NumberNode(3).eval() → 3
右值：BinaryOpNode(*).eval() → 需要左值和右值
左值：NumberNode(4).eval() → 4
右值：NumberNode(2).eval() → 2
结果：4 * 2 = 8
最终结果：3 + 8 = 11

功能特点

特性	说明
数学运算	加减乘除、幂运算、取模、阶乘
常量	`PI`（圆周率）、`E`（自然常数）
函数	三角函数、对数函数、统计函数等
变量	支持变量赋值和持久化
数组	数组字面量和数组操作
矩阵	转置、行列式、矩阵乘法等
REPL	交互式命令行，支持历史结果引用

快速开始

环境要求

构建: JDK 11 或更高版本（JDK 8 的 javadoc 不支持 --release 参数）
运行: Java 8 或更高版本
Maven（用于构建项目）

构建项目

在项目根目录下运行 Maven 构建命令：

mvn clean package

构建完成后，会在 target/ 目录下生成可执行的 JAR 文件。

启动 REPL

构建成功后，运行以下命令启动交互式 REPL：

java -jar target/Expression-Parser-1.4.0.jar

启动后将显示欢迎界面，输入 help 可查看支持的完整功能列表。

在项目中使用

除了 REPL 交互模式，你也可以将本库作为依赖集成到自己的 Java 项目中。

Maven 依赖：

<dependency>
    <groupId>cn.czyx007</groupId>
    <artifactId>Expression-Parser</artifactId>
    <version>1.4.0</version>
</dependency>

Gradle 依赖：

implementation 'cn.czyx007:Expression-Parser:1.4.0'

使用示例：

import cn.czyx007.expression_parser.api.ExpressionEvaluator;
import cn.czyx007.expression_parser.ast.Value;

// 简单计算
Value result = ExpressionEvaluator.eval("1 + 2 * 3");
System.out.println(result); // 输出: 7

// 使用变量上下文（表达式内赋值，变量可复用）
Map<String, Object> context = new HashMap<>();
Value result1 = ExpressionEvaluator.eval("x = 10; y = 20; x + y", context);
System.out.println(result1); // 输出: 30

// 复用 context 中的变量继续计算
Value result2 = ExpressionEvaluator.eval("x * y", context);
System.out.println(result2); // 输出: 200

REPL 支持的功能

REPL（Read-Eval-Print Loop）支持以下功能，方便用户实时输入和计算表达式：

基本运算

运算符	说明
`+`, `-`, `*`, `/`	加减乘除
`%`	取模
`^`	幂运算（右结合，如 `2^3^2 = 512`）
`!`	阶乘（如 `5! = 120`）

常量

常量	说明
`PI`	圆周率 π ≈ 3.14159...
`E`	自然常数 e ≈ 2.71828...

数学函数

三角函数与反三角函数

sin, cos, tan, asin, acos, atan

双曲函数

sinh, cosh, tanh

指数与对数

exp, ln, log, log10

根号与绝对值

sqrt, cbrt, abs

取整与符号函数

ceil, floor, round, signum, sign

角度转换与其他

degrees, radians, atan2(y,x), hypot(x,y), pow(base,exponent)

统计函数（向量操作）

对于向量 X = [...]; Y = [...];（也可手动按序展开输入）：

基础统计

函数	说明	参数要求
`max(X)`, `min(X)`	极值	2+ 个
`sum(X)`	求和	1+ 个
`count(X)`	参数计数	-
`avg(X)`	平均值	1+ 个
`median(X)`	中位数	1+ 个
`prod(X)`, `product(X)`	乘积	1+ 个

方差与标准差

函数	说明	参数要求
`var(X)`, `variance(X)`	样本方差	≥2
`std(X)`, `stddev(X)`	样本标准差	≥2
`varp(X)`, `variancep(X)`	总体方差	≥1
`stdp(X)`, `stddevp(X)`	总体标准差	≥1

分布与距离统计

函数	说明	参数要求
`gcd(X)`, `lcm(X)`	最大公约数/最小公倍数	整数，2+ 个
`range(X)`	极差	1+ 个
`geomean(X)`	几何平均数	1+ 个正数
`norm1(X)`, `sumabs(X)`	绝对值和 (L1)	-
`norm2(X)`, `rms(X)`	欧几里得范数 (L2)/均方根	-
`percentile(p,X)`, `pctl(p,X)`	百分位数	p ∈ [0,100]

协方差与相关

函数	说明
`cov(X,Y)`, `covariance(X,Y)`	样本协方差
`covp(X,Y)`, `covariancep(X,Y)`	总体协方差
`corr(X,Y)`, `correlation(X,Y)`	相关系数

向量与矩阵操作

向量操作

函数	说明
`dot(X,Y)`, `dotprod(X,Y)`	向量点积
`dist(X,Y)`, `distance(X,Y)`, `euclidean(X,Y)`	欧几里得距离
`manhattan(X,Y)`, `taxicab(X,Y)`	曼哈顿距离

矩阵操作

函数	说明
`transpose(matrix)`, `t(matrix)`	矩阵/向量转置
`det(matrix)`, `determinant(matrix)`	行列式
`matmul(matrix_A, matrix_B)`	矩阵乘法
`trace(matrix)`	矩阵的迹（主对角线之和）
`rank(matrix)`	矩阵的秩
`mean(matrix, axis)`	矩阵均值（axis=0 列，axis=1 行）
`inv(matrix)`	矩阵求逆
`solve(A, b)`	解线性方程组 Ax=b

组合数学

函数	说明
`C(n,k)`, `comb(n,k)`	组合数
`P(n,k)`, `perm(n,k)`	排列数

变量与数组

变量赋值

x = 10 - 赋值
x = 10; y = 2x - 多语句（分号分隔）
ans - 上一次计算结果

数组

[1, 2, 3] - 数组字面量
[[1,2], [3,4]] - 二维数组（矩阵）
scores = [1, 2, 3] - 数组变量赋值
avg(scores) - 数组作为函数参数

隐式乘法

2PI - 数字与常量
3(4+5) - 数字与括号
2sqrt(4) - 数字与函数

REPL 命令

命令	说明
`help`, `?`	显示帮助
`vars`	显示已定义的变量
`clear`	清除所有变量
`exit`, `quit`, `q`	退出

测试

项目包含全面的单元测试。运行以下命令执行测试：

mvn test

项目结构

src/main/java：包含主代码，包括词法分析器、语法分析器和 REPL。
src/test/java：包含各组件的单元测试。
target/：生成的文件和编译后的类。

许可证

此项目基于 MIT 许可证开源。详情请参阅 LICENSE 文件。

English

Overview

Expression Parser is a Java project based on recursive descent parsing technology, supporting mathematical expression parsing and calculation for scalars, vectors, and matrices. The project also provides an interactive REPL (Read-Eval-Print Loop) terminal for users to input and calculate expressions in real-time.

Implementation Principles

This project adopts a classic compiler frontend architecture, dividing expression parsing into four stages:

Input: "3 + 4 * 2"

Stage 1: Lexical Analysis (Lexer)    Stage 2: Syntax Analysis (Parser)
┌───────────────────┐                ┌───────────────────────────────────────┐
│ NUMBER(3)         │                │ expr                                  │
│ PLUS(+)           │                │ └── addExpr                           │
│ NUMBER(4)         │ ────────────▶ │     ├── term                          │
│ MULTIPLY(*)       │   Token Seq    │     │   └── unary                     │
│ NUMBER(2)         │                │     │       └── power                 │
│ EOF               │                │     │           └── implicitMul       │
└───────────────────┘                │     │               └── postfix       │
                                     │     │                   └── factor    │
                                     │     │                       └── 3     │
                                     │     ├── PLUS                          │
                                     │     └── term                          │
                                     │         ├── unary                     │
                                     │         │   └── power                 │
                                     │         │       └── implicitMul       │
                                     │         │               └── postfix   │
                                     │         │                   └── factor│
                                     │         │                       └── 4 │
                                     │         ├── MULTIPLY                  │
                                     │         └── unary                     │
                                     │             └── power                 │
                                     │                 └── implicitMul       │
                                     │                         └── postfix   │
                                     │                             └── factor│
                                     │                                 └── 2 │
                                     └───────────────────────────────────────┘

Stage 3: Build AST                     Stage 4: Expression Evaluation (Eval)
┌───────────────────┐                ┌─────────────────────────────────────┐
│ BinaryOpNode(+)   │                │ BinaryOpNode(+)                     │
│    /        \     │                │   left = 3.eval() = 3               │
│ Number    BinaryOp│ ────────────▶ │   right = *.eval()                  │
│ Node(3)   Node(*) │                │     left = 4.eval() = 4             │
│          /      \ │                │     right = 2.eval() = 2            │
│    Number   Number│                │     return 4 * 2 = 8                │
│    Node(4) Node(2)│                │   return 3 + 8 = 11                 │
└───────────────────┘                └─────────────────────────────────────┘

Pass-through Note: In the 3 + 4 * 2 example, the power, implicitMul, and postfix levels have no corresponding operators and directly pass through to the next level:

power has no ^ → passes to implicitMul

implicitMul has no implicit multiplication (e.g., 2x) → passes to postfix

postfix has no factorial ! → passes to factor

factor recognizes the number → returns NumberNode

1. Lexical Analysis

The Lexer (source) splits the input string into Tokens (lexical units):

Token Type	Example	Description
`NUMBER`	`3.14`, `1e-5`	Numbers (supports floating-point and scientific notation)
`IDENTIFIER`	`sin`, `PI`, `x`	Identifiers (function names, constants, variables)
`PLUS/MINUS/MULTIPLY/DIVIDE`	`+`, `-`, `*`, `/`	Arithmetic operators
`POWER`	`^`	Power operator
`LPAREN/RPAREN`	`(`, `)`	Parentheses

Example: Input "3 + 4 * 2" is split into: [NUMBER:3] [PLUS] [NUMBER:4] [MULTIPLY] [NUMBER:2] [EOF]

2. Syntax Analysis

The Parser (source) uses recursive descent technology to recursively parse expressions based on operator precedence.

Precedence from high to low:

Precedence	Grammar Rule	Description
High	`factor`	Numbers, identifiers, parenthesized expressions, arrays
↑	`postfix`	Factorial (e.g., `5!`)
↑	`implicitMul`	Implicit multiplication (e.g., `2x`, `3(4+5)`)
↑	`power`	Power operation (right-associative, e.g., `2^3^2 = 2^(3^2)`)
↑	`unary`	Unary plus/minus signs (e.g., `-3^2 = -(3^2)`)
↑	`term`	Multiplication, division, modulo
↑	`addExpr`	Addition, subtraction
Low	`expr`	Assignment expressions

Grammar Rules (BNF notation: → means defined as, | means or, (...) means grouping, * means zero or more, ? means zero or one):

expr        → IDENTIFIER ASSIGN expr | addExpr
addExpr     → term ((PLUS | MINUS) term)*
term        → unary ((MULTIPLY | DIVIDE | MODULO) unary)*
unary       → (PLUS | MINUS) unary | power
power       → implicitMul (POWER power)?
implicitMul → postfix (postfix)*
postfix     → factor FACTORIAL?
factor      → NUMBER | IDENTIFIER | LPAREN expr RPAREN | LBRACKET ... RBRACKET

3. Abstract Syntax Tree (AST)

After parsing, an Abstract Syntax Tree (ExprNode) is generated. Tree node types include:

Node Type	Description	Example
`NumberNode`	Numeric literal	`3.14`
`VariableNode`	Variable reference	`x`
`BinaryOpNode`	Binary operation	`a + b`, `x * y`
`UnaryOpNode`	Unary operation	`-x`, `+5`
`FunctionNode`	Function call	`sin(PI/2)`
`AssignNode`	Variable assignment	`x = 10`
`ArrayNode`	Array literal	`[1, 2, 3]`

AST Example: The syntax tree for expression 3 + 4 * 2:

       (+)
      /   \
    (3)   (*)
         /   \
       (4)   (2)

4. Expression Evaluation

Each AST node implements the eval() method to complete evaluation through recursive calls to child nodes' eval().

Taking BinaryOpNode as an example (source):

@Override
public double eval(Map<String, Double> context) {
    // 1. Recursively evaluate left subtree
    double leftVal = left.eval(context);
    // 2. Recursively evaluate right subtree
    double rightVal = right.eval(context);
    // 3. Apply operator
    switch (op.type()) {
        case PLUS:  return leftVal + rightVal;
        case MINUS: return leftVal - rightVal;
        case MULTIPLY: return leftVal * rightVal;
        case DIVIDE:   return leftVal / rightVal;
        case POWER:    return Math.pow(leftVal, rightVal);
        // ...
    }
}

NumberNode directly returns the stored value (source):

@Override
public double eval(Map<String, Double> context) {
    return value;
}

Evaluation Process: For 3 + 4 * 2

BinaryOpNode(+).eval() → needs left and right values
Left value: NumberNode(3).eval() → 3
Right value: BinaryOpNode(*).eval() → needs left and right values
Left value: NumberNode(4).eval() → 4
Right value: NumberNode(2).eval() → 2
Result: 4 * 2 = 8
Final result: 3 + 8 = 11

Features

Feature	Description
Mathematical Operations	Addition, subtraction, multiplication, division, power, modulo, factorial
Constants	`PI` (pi), `E` (natural constant)
Functions	Trigonometric, logarithmic, statistical functions, etc.
Variables	Support variable assignment and persistence
Arrays	Array literals and array operations
Matrices	Transpose, determinant, matrix multiplication, etc.
REPL	Interactive command line with history result reference

Quick Start

Requirements

Build: JDK 11 or higher (JDK 8's javadoc doesn't support --release flag)
Runtime: Java 8 or higher
Maven (for building the project)

Building the Project

Run the Maven build command in the project root directory:

mvn clean package

After building, an executable JAR file will be generated in the target/ directory.

Starting REPL

After successful build, run the following command to start the interactive REPL:

java -jar target/Expression-Parser-1.4.0.jar

After startup, a welcome screen will be displayed. Enter help to view the complete list of supported features.

Using in Your Project

In addition to REPL interactive mode, you can also integrate this library as a dependency into your own Java project.

Maven Dependency:

<dependency>
    <groupId>cn.czyx007</groupId>
    <artifactId>Expression-Parser</artifactId>
    <version>1.4.0</version>
</dependency>

Gradle Dependency:

implementation 'cn.czyx007:Expression-Parser:1.4.0'

Usage Example:

import cn.czyx007.expression_parser.api.ExpressionEvaluator;
import cn.czyx007.expression_parser.ast.Value;

// Simple calculation
Value result = ExpressionEvaluator.eval("1 + 2 * 3");
System.out.println(result); // Output: 7

// With variable context (assign in expression, variables reusable)
Map<String, Object> context = new HashMap<>();
Value result1 = ExpressionEvaluator.eval("x = 10; y = 20; x + y", context);
System.out.println(result1); // Output: 30

// Reuse variables from context
Value result2 = ExpressionEvaluator.eval("x * y", context);
System.out.println(result2); // Output: 200

REPL Supported Features

REPL (Read-Eval-Print Loop) supports the following features for users to input and calculate expressions in real-time:

Basic Operations

Operator	Description
`+`, `-`, `*`, `/`	Addition, subtraction, multiplication, division
`%`	Modulo
`^`	Power operation (right-associative, e.g., `2^3^2 = 512`)
`!`	Factorial (e.g., `5! = 120`)

Constants

Constant	Description
`PI`	Pi π ≈ 3.14159...
`E`	Natural constant e ≈ 2.71828...

Mathematical Functions

Trigonometric and Inverse Trigonometric Functions