基于sqlalchemy的动态生成表|violet blog

基于模型定义的动态数据表设计说明

1. 整体目标

本文基于一组典型的 SQLAlchemy ORM 模型，说明如何通过元数据模型来驱动动态数据表结构的生成，而不过多依赖具体业务场景：

用一个“主记录模型”（这里以 BusinessReport 为例）描述某类业务实体的头信息。
用一个“列配置模型”（这里以 ReportDataColumn 为例）描述该实体对应的数据列定义（列名、是否系统列、顺序、目标值区间等）。
通过主记录的属性方法（如 table_name、to_sqlalchemy_table()）确定每个实体实例对应的实际数据表名并动态构造 SQLAlchemy Table 结构。
通过列配置的转换方法（如 to_sqlalchemy_column()）将高层列描述下沉为 SQLAlchemy 的 Column，实现“一条主记录对应一张动态数据表”的通用模式。

2. 主记录模型：头信息与动态表名

以下示例模型以“报告”为例，结构在其他业务场景（如订单、任务、问卷等）中也同样适用。

2.1 基本字段示例

class BusinessReport(Base, TimestampTableMixin):
    """报告主记录"""

    __tablename__ = "business_report"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    report_file_id: Mapped[str | None] = mapped_column(String(100), unique=True, comment="报告源文件 id 或外部标识")
    name: Mapped[str | None] = mapped_column(String(100), comment="报告名称")
    customer_name: Mapped[str | None] = mapped_column(String(100), comment="客户名称")
    product_name: Mapped[str | None] = mapped_column(String(100), comment="产品名称")
    product_code: Mapped[str | None] = mapped_column(String(100), comment="产品代号")
    start_check_date: Mapped[str | None] = mapped_column(String(32), comment="开始日期")
    end_check_date: Mapped[str | None] = mapped_column(String(32), comment="结束日期")

    order: Mapped[int | None] = mapped_column(Integer, comment="排序用字段")

这些字段构成了“主记录”的元信息，例如：外部来源 id、关联对象信息、时间区间等。
report_file_id 设为唯一，可以防止针对同一外部来源重复创建主记录。

2.2 与列配置的关系：`columns` 关联（通用一对多）

columns = relationship(
    "ReportDataColumn",
    back_populates="report",
    order_by="ReportDataColumn.order",
    foreign_keys="ReportDataColumn.report_id",
    primaryjoin="BusinessReport.id == foreign(ReportDataColumn.report_id)",
)

一个 BusinessReport 对应多条 ReportDataColumn（一对多关系）。
通过 order_by="ReportDataColumn.order" 保证列配置按列顺序排列，便于构造表结构与展示。

2.3 动态表名：`table_name` 属性（通用命名规则）

REPORT_DATA_PREFIX = "report_data"

class BusinessReport(Base, TimestampTableMixin):
    ...
    @property
    def table_name(self) -> str:
        """实际数据表的表名"""
        return f"{REPORT_DATA_PREFIX}_{self.id}"

在通用设计中，可以为某类实体统一指定一个前缀（如 REPORT_DATA_PREFIX = "report_data"、ORDER_DATA_PREFIX 等），再拼接主键 id：

每条主记录对应一张实际数据表，表名形如 <前缀>_<主键 id>。
例如：id=1 的报告对应表 report_data_1，id=2 对应 report_data_2。

3. 主记录方法 `to_sqlalchemy_table`：动态构造表结构

def to_sqlalchemy_table(self, include_system_column=True) -> Table:
    """
    转换为 sqlalchemy.Table 对象， 方便后续orm查询
    include_system_column 是否包含系统列
    """
    # 删除老版表定义
    if (old_table := Base.metadata.tables.get(self.table_name)) is not None:
        self.metadata.remove(old_table)
    columns = []
    for col in self.columns:
        if col.is_system and not include_system_column:
            continue
        columns.append(col.to_sqlalchemy_column())
    table = Table(self.table_name, Base.metadata, *columns, extend_existing=True)
    return table

3.1 旧表定义的清理

Base.metadata.tables.get(self.table_name)：在 SQLAlchemy 的元数据中查找是否已经存在同名表定义。
若存在，则通过 self.metadata.remove(old_table) 先移除旧定义，避免：
- 多次调用 to_sqlalchemy_table 时，元数据中存在冲突的表结构。
- 在同一进程内重载/变更列配置后，仍然引用旧的列结构。

3.2 根据列配置组装 `Column` 列表

遍历 self.columns（当前出货单下的所有 ReportDataColumn 记录）：
- 若 col.is_system 且 include_system_column 为 False，则跳过该列（例如排除系统列，只看业务列）。
- 其余情况下调用 col.to_sqlalchemy_column() 将列配置转为具体的 Column 对象。

3.3 注册为 SQLAlchemy 的 `Table`

Table(self.table_name, Base.metadata, *columns, extend_existing=True)：
- 使用动态生成的表名 table_name 和刚刚收集的 columns 创建 Table 对象。
- 注册到 Base.metadata 中，使后续可以通过该 Table 执行增删改查。
- extend_existing=True 允许在元数据中存在同名表时进行扩展定义，配合前面的移除逻辑，可以安全重建结构。

4. 系统列枚举：`SystemColumn`

class SystemColumn(str, enum.Enum):
    """系统列"""

    ID = "id_"      # 数据主键 id
    ORDER = "order_"  # 数据行顺序

    def to_report_column(self, report_id) -> "ReportDataColumn":
        order = 1 if self == SystemColumn.ID else 2
        return ReportDataColumn(name=self.value, report_id=report_id, is_system=True, order=order)

ID 和 ORDER 是在多数动态表场景中都通用的基础列：
- ID：每行数据的唯一标识（主键）。
- ORDER：记录出现/导入顺序或业务排序。
to_report_column 提供了一个将“系统列枚举”转换为“列配置记录”的通用工厂方法：
- 统一为系统列指定列名、所属主记录、是否系统列、顺序等元信息。

5. 列配置模型：`ReportDataColumn`

以下模型定义的是“某条主记录对应的某一列”的配置，其结构在多数动态字段场景中可复用。

class ReportDataColumn(Base, TimestampTableMixin):
    """出货单 数据列"""

    __tablename__ = "report_data_column"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    report_id: Mapped[int] = mapped_column(Integer, comment="关联报告表id")
    name: Mapped[str] = mapped_column(String(32), comment="列名")
    is_system: Mapped[bool] = mapped_column(Boolean, comment="是否是系统列 系统列无目标值等元数据")

    target_value: Mapped[float | None] = mapped_column(Float, comment="目标值")
    target_upper_limit: Mapped[float | None] = mapped_column(Float, comment="目标值上限")
    target_lower_limit: Mapped[float | None] = mapped_column(Float, comment="目标值下限")

    order: Mapped[int | None] = mapped_column(Integer, comment="列顺序")
    ...

    def to_sqlalchemy_column(self) -> Column:
        """
        转换为 sqlalchemy.Column 对象， 非系统列数据都以 string 类型保存
        """
        type_arg = String(100)
        column_kwargs = {}
        # 系统列 id
        if self.is_system and self.name == SystemColumn.ID.value:
            type_arg = Integer
            column_kwargs["nullable"] = False
            column_kwargs["primary_key"] = True
            column_kwargs["autoincrement"] = True
        # 系系列 order
        elif self.is_system and self.name == SystemColumn.ORDER.value:
            type_arg = Integer
            column_kwargs["nullable"] = False
        else:  # 非系统列
            column_kwargs["nullable"] = True
        return Column(self.name, type_arg, **column_kwargs)  # type: ignore

5.1 字段含义

report_id：关联到某个 BusinessReport.id，表示该列属于哪张出货单。
name：列名，最终会作为动态表中的字段名，例如 "id_"、"order_" 或 "Brightness" 等。
is_system：是否为系统列（id_ / order_）。
target_value / target_upper_limit / target_lower_limit：业务上的目标值及上下界，用于质检或 SPC 分析等。
order：列的显示/存储顺序，用于对列进行排序。

5.2 与 `BusinessReport` 的反向关系

report = relationship(
    "BusinessReport",
    back_populates="columns",
    foreign_keys="ReportDataColumn.report_id",
    primaryjoin="ReportDataColumn.report_id == remote(BusinessReport.id)",
)

通过 report 可以从列配置反查所属的出货单。
back_populates="columns" 与 BusinessReport.columns 配对，形成双向关系。

5.3 `to_sqlalchemy_column`：将列配置转换为 `Column`

def to_sqlalchemy_column(self) -> Column:
    """
    转换为 sqlalchemy.Column 对象， 非系统列数据都以 string 类型保存
    """
    type_arg = String(100)
    column_kwargs = {}
    # 系统列 id
    if self.is_system and self.name == SystemColumn.ID.value:
        type_arg = Integer
        column_kwargs["nullable"] = False
        column_kwargs["primary_key"] = True
        column_kwargs["autoincrement"] = True
    # 系列列 order
    elif self.is_system and self.name == SystemColumn.ORDER.value:
        type_arg = Integer
        column_kwargs["nullable"] = False
    else:  # 非系统列
        column_kwargs["nullable"] = True
    return Column(self.name, type_arg, **column_kwargs)  # type: ignore

系统列 id_：
- 类型为 Integer，nullable=False，primary_key=True，autoincrement=True。
- 作为动态数据表的主键，自增行 id。
系统列 order_：
- 类型为 Integer，nullable=False。
- 表示每一行数据的顺序。
业务列（非系统列）：
- 默认类型为 String(100)，nullable=True。
- 所有非系统列的实际测量值等统一按字符串存储，便于兼容多种 Excel/来源格式，后续可以在业务层进行类型转换或校验。

6. 基于模型的整体流程总结（不依赖具体路由实现）

仅根据 models.py，可以抽象出以下“建模层流程”：

步骤 1：为每个出货单创建一条 BusinessReport 记录，记录头信息。
步骤 2：为该出货单创建多条 ReportDataColumn 记录，包括：
- 使用 SystemColumn 快速生成系统列（id_、order_）。
- 为每个业务字段生成对应的列配置（列名、是否系统列、顺序、目标值区间等）。
步骤 3：在需要构造实际数据表结构时：
- 通过 BusinessReport.table_name 获取当前出货单的动态表名（如 report_data_1）。
- 调用 BusinessReport.to_sqlalchemy_table()：
  - 清理旧的表定义。
  - 遍历 columns，调用 to_sqlalchemy_column() 转换为 Column。
  - 基于这些列组装并注册一个 SQLAlchemy Table 对象。
步骤 4：上层业务代码可以基于该 Table 对象完成建表、插入、查询、更新等操作。

通过这种模型设计，系统实现了：

一张出货单一套列配置（ReportDataColumn）。
一张出货单对应一张物理/逻辑数据表（report_data_<id>）。
动态表结构完全由模型层（BusinessReport + ReportDataColumn + SystemColumn）驱动，上层路由和服务只需要调用这些模型方法即可完成数据表的构造与操作。

7. 完整模型代码示例

下面给出一个完整的模型定义示例，展示如何在实际项目中组合主记录模型、系统列枚举与列配置模型：

import enum

from sqlalchemy import Boolean, Column, Float, Integer, String, Table
from sqlalchemy.orm import Mapped, mapped_column, relationship

from src.database import Base, TimestampTableMixin

REPORT_DATA_PREFIX = "report_data"


class BusinessReport(Base, TimestampTableMixin):
    """报告主记录（示例）"""

    __tablename__ = "business_report"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    report_file_id: Mapped[str | None] = mapped_column(String(100), unique=True, comment="报告源文件 id 或外部标识")
    name: Mapped[str | None] = mapped_column(String(100), comment="报告名称")
    customer_name: Mapped[str | None] = mapped_column(String(100), comment="客户名称")
    product_name: Mapped[str | None] = mapped_column(String(100), comment="产品名称")
    product_code: Mapped[str | None] = mapped_column(String(100), comment="产品代号")
    start_check_date: Mapped[str | None] = mapped_column(String(32), comment="开始日期")
    end_check_date: Mapped[str | None] = mapped_column(String(32), comment="结束日期")

    order: Mapped[int | None] = mapped_column(Integer, comment="排序用字段")

    columns = relationship(
        "ReportDataColumn",
        back_populates="report",
        order_by="ReportDataColumn.order",
        foreign_keys="ReportDataColumn.report_id",
        primaryjoin="BusinessReport.id == foreign(ReportDataColumn.report_id)",
    )

    @property
    def table_name(self) -> str:
        """实际数据表的表名"""
        return f"{REPORT_DATA_PREFIX}_{self.id}"

    def to_sqlalchemy_table(self, include_system_column: bool = True) -> Table:
        """
        转换为 sqlalchemy.Table 对象，方便后续 ORM 查询或 DDL 操作。
        include_system_column 表示是否包含系统列。
        """
        # 删除老版表定义
        if (old_table := Base.metadata.tables.get(self.table_name)) is not None:
            self.metadata.remove(old_table)

        columns: list[Column] = []
        for col in self.columns:
            if col.is_system and not include_system_column:
                continue
            columns.append(col.to_sqlalchemy_column())

        table = Table(self.table_name, Base.metadata, *columns, extend_existing=True)
        return table


class SystemColumn(str, enum.Enum):
    """系统列定义"""

    ID = "id_"       # 数据主键 id
    ORDER = "order_"  # 数据行顺序

    def to_report_column(self, report_id: int) -> "ReportDataColumn":
        """将系统列枚举转换为列配置记录"""
        order = 1 if self == SystemColumn.ID else 2
        return ReportDataColumn(name=self.value, report_id=report_id, is_system=True, order=order)


class ReportDataColumn(Base, TimestampTableMixin):
    """动态数据表的列配置（示例）"""

    __tablename__ = "report_data_column"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    report_id: Mapped[int] = mapped_column(Integer, comment="关联报告表 id")
    name: Mapped[str] = mapped_column(String(32), comment="列名")
    is_system: Mapped[bool] = mapped_column(Boolean, comment="是否是系统列，系统列无目标值等元数据")

    target_value: Mapped[float | None] = mapped_column(Float, comment="目标值")
    target_upper_limit: Mapped[float | None] = mapped_column(Float, comment="目标值上限")
    target_lower_limit: Mapped[float | None] = mapped_column(Float, comment="目标值下限")

    order: Mapped[int | None] = mapped_column(Integer, comment="列顺序")

    report = relationship(
        "BusinessReport",
        back_populates="columns",
        foreign_keys="ReportDataColumn.report_id",
        primaryjoin="ReportDataColumn.report_id == remote(BusinessReport.id)",
    )

    def to_sqlalchemy_column(self) -> Column:
        """
        转换为 sqlalchemy.Column 对象。
        示例中，非系统列数据都以 string 类型保存，便于兼容多种外部来源格式。
        """
        type_arg = String(100)
        column_kwargs: dict[str, object] = {}

        # 系统列 id_
        if self.is_system and self.name == SystemColumn.ID.value:
            type_arg = Integer
            column_kwargs["nullable"] = False
            column_kwargs["primary_key"] = True
            column_kwargs["autoincrement"] = True
        # 系统列 order_
        elif self.is_system and self.name == SystemColumn.ORDER.value:
            type_arg = Integer
            column_kwargs["nullable"] = False
        # 业务列
        else:
            column_kwargs["nullable"] = True

        return Column(self.name, type_arg, **column_kwargs)  # type: ignore[arg-type]