Python Batch File Processing: Directory Scanning and Parsing [Guide]

舞夢輝影 2026-01-02 00:00:00

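For batch file processing in Python, pathlib is the recommended way to traverse directories (for example, Path(".").rglob("*.txt")), while os.walk suits per-level statistics over deeply nested trees. Watch for accurate paths, controlled exception handling, an explicit encoding (utf-8), protection against overwrites, and progress feedback.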

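Batch file processing in Python comes down to scanning a directory tree with os or pathlib, then applying the specific logic (reading, renaming, filtering, converting) to each file in turn. What matters is not clever code but accurate paths, controlled exceptions, and clear logic.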

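pathlib has been part of the standard library since Python 3.4. Its syntax is concise, it works across platforms, and it reads more easily than os.walk.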

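Scan for all .txt files under the current directory, recursively:
  from pathlib import Path
  for f in Path(".").rglob("*.txt"):
      if f.is_file():  # a directory could also be named "something.txt"
          print(f.name, f.parent)
Scan only the direct children of a directory (no recursion):
  for f in Path("data").iterdir():
      if f.is_file() and f.suffix == ".csv":
          process_csv(f)  # process_csv is your own handler function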
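
A batch job often needs more than one extension at a time, which a single glob pattern cannot express. The following is a minimal sketch of one way to do it with rglob; the function name find_files and its parameters are illustrative assumptions, not part of pathlib.

```python
from pathlib import Path


def find_files(root, suffixes):
    """Return the files under root whose suffix is in suffixes, sorted by path.

    find_files is a hypothetical helper; suffixes are compared
    case-insensitively, so ".CSV" matches ".csv".
    """
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Sorting the result gives a stable processing order from run to run, which makes logs and progress output easier to compare.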

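When you explicitly need the (root directory, list of subdirectory names, list of file names) triple, os.walk is more direct. It is especially suited to per-level statistics or to skipping particular folders.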
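
As a sketch of that use case, the function below counts files per directory and skips some folders. The name count_files_per_dir and the contents of SKIP_DIRS are assumptions chosen for illustration.

```python
import os

# Hypothetical set of folder names to skip; adjust for your project.
SKIP_DIRS = {".git", "__pycache__", "node_modules"}


def count_files_per_dir(root, skip_dirs=SKIP_DIRS):
    """Map each visited directory path to the number of files directly in it."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Modifying dirnames in place (slice assignment) tells os.walk not to
        # descend into the removed folders. This only works with the default
        # topdown=True.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        counts[dirpath] = len(filenames)
    return counts
```

The slice assignment is the important detail: rebinding dirnames to a new list would leave the walk unchanged, because os.walk keeps using the original list object.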

The most common failures in batch operations are not logic bugs but mistyped paths, garbled non-ASCII text, and silent overwrites.

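Always specify encoding when reading text files. Without it, open() may fall back to the locale's encoding, which on Windows is often a legacy code page such as cp1252 or cp936 (GBK):
  with open(f, encoding="utf-8") as fp:
      content = fp.read()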
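Make sure the target directory exists before writing, otherwise the write fails with FileNotFoundError (a subclass of OSError):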
  output = f.with_suffix(".clean.csv")
  output.parent.mkdir(parents=True, exist_ok=True)
  output.write_text(clean_data, encoding="utf-8")
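Check exists() before renaming or moving, to prevent accidental overwrites (on Unix, rename silently replaces an existing file):
  if not new_path.exists():
      f.rename(new_path)
  else:
      print(f"Skipping {f.name}: target already exists")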
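
The precautions above can be combined into one loop that also keeps going when a single file fails. This is a minimal sketch under stated assumptions: clean_all is a hypothetical function, the "cleaning" step is only a whitespace strip standing in for real logic, and dst_dir is assumed to lie outside src_dir.

```python
from pathlib import Path


def clean_all(src_dir, dst_dir):
    """Write stripped copies of *.txt files; return (done, skipped, failed).

    Hypothetical helper. Assumes dst_dir is not inside src_dir.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    done, skipped, failed = [], [], []
    for f in sorted(src_dir.rglob("*.txt")):
        if not f.is_file():
            continue
        # Mirror the source layout under dst_dir.
        target = dst_dir / f.relative_to(src_dir)
        if target.exists():
            skipped.append(f)  # overwrite guard: never replace existing output
            continue
        try:
            text = f.read_text(encoding="utf-8")  # explicit encoding
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text.strip() + "\n", encoding="utf-8")
            done.append(f)
        except (OSError, UnicodeDecodeError) as exc:
            # Record the failure and continue with the remaining files.
            failed.append((f, exc))
    return done, skipped, failed
```

Returning the three lists instead of printing lets the caller decide how to report: a summary line at the end, a log file, or a retry of only the failed paths.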

When processing hundreds of files, a script that sits silent makes it easy to suspect it has hung. A single print line or tqdm is enough.

A basic counter (no extra packages needed):
  files = list(Path("input").rglob("*.json"))
  for i, f in enumerate(files, 1):
      process(f)  # process is your own handler function
      print(f"\rProcessing: {i}/{len(files)}", end="")
For something more polished, run pip install tqdm, then:
  from tqdm import tqdm
  for f in tqdm(files, desc="Parsing JSON"):
      process(f)