AI model management: uploading and downloading Hugging Face datasets and model repos

Uploading and downloading Hugging Face datasets and model repos

Posted by 董江 on Thursday, February 13, 2025

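Environment setup

In the cloud-native era Go is king; in the AI era it is Python. You need a working Python environment and the relevant library packages.

Python virtual environments

Python's package and dependency management has a known weakness: when different projects depend on different versions of the same package, a shared environment ends up with conflicting versions, and several projects can become unusable at once.

Python 3 therefore supports virtual environments: each environment lives in its own directory with its own installed packages, which keeps the dependencies of different projects apart.

First-time setup: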

# Install virtualenv
$ pip install virtualenv

# Create the virtual environment .env under ~/
$ cd ~/
$ virtualenv .env

# Activate the .env/ virtual environment
$ source .env/bin/activate

After that, you only need to activate the environment:

# Activate the .env/ virtual environment
jiangdong@Mac Mini:~ $ source .env/bin/activate
(.env) jiangdong@Mac Mini:~ $ 
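
If you would rather not install virtualenv, the standard library's venv module can create an equivalent environment. The following is a minimal sketch, not the method used in this post: the helper name create_env is made up for illustration, and the demo writes to a temporary directory instead of ~/ so it leaves nothing behind.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = ".env") -> Path:
    """Create a virtual environment at parent/name and return its path."""
    env_dir = parent / name
    # with_pip=False keeps the sketch fast and offline; pass with_pip=True
    # if you want pip bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Demo in a throwaway directory so nothing is written to ~/.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(Path(tmp))
    # Every environment created by venv contains a pyvenv.cfg marker file.
    created = (env_dir / "pyvenv.cfg").is_file()
    print("environment created:", created)
```

Activating the result works the same way as above: source the bin/activate script inside the environment directory.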

Install the Hugging Face Hub Python packages

jiangdong@Mac Mini:~ $ source .env/bin/activate
(.env) jiangdong@Mac Mini:~ $ 

# Install the main huggingface_hub package
(.env) jiangdong@Mac Mini:~ $ pip install -U "huggingface_hub"
Requirement already satisfied: huggingface_hub in ./.env/lib/python3.13/site-packages (0.28.1)
Requirement already satisfied: filelock in ./.env/lib/python3.13/site-packages (from huggingface_hub) (3.17.0)
Requirement already satisfied: fsspec>=2023.5.0 in ./.env/lib/python3.13/site-packages (from huggingface_hub) (2025.2.0)
Requirement already satisfied: packaging>=20.9 in ./.env/lib/python3.13/site-packages (from huggingface_hub) (24.2)
Requirement already satisfied: pyyaml>=5.1 in ./.env/lib/python3.13/site-packages (from huggingface_hub) (6.0.2)
Requirement already satisfied: requests in ./.env/lib/python3.13/site-packages (from huggingface_hub) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in ./.env/lib/python3.13/site-packages (from huggingface_hub) (4.67.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.env/lib/python3.13/site-packages (from huggingface_hub) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub) (2025.1.31)

# Install the huggingface_hub[cli] extra
(.env) jiangdong@Mac Mini:~ $  pip install -U "huggingface_hub[cli]"
Requirement already satisfied: huggingface_hub[cli] in ./.env/lib/python3.13/site-packages (0.28.1)
Requirement already satisfied: filelock in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (3.17.0)
Requirement already satisfied: fsspec>=2023.5.0 in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (2025.2.0)
Requirement already satisfied: packaging>=20.9 in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (24.2)
Requirement already satisfied: pyyaml>=5.1 in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (6.0.2)
Requirement already satisfied: requests in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (4.67.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (4.12.2)
Requirement already satisfied: InquirerPy==0.3.4 in ./.env/lib/python3.13/site-packages (from huggingface_hub[cli]) (0.3.4)
Requirement already satisfied: pfzy<0.4.0,>=0.3.1 in ./.env/lib/python3.13/site-packages (from InquirerPy==0.3.4->huggingface_hub[cli]) (0.3.4)
Requirement already satisfied: prompt-toolkit<4.0.0,>=3.0.1 in ./.env/lib/python3.13/site-packages (from InquirerPy==0.3.4->huggingface_hub[cli]) (3.0.50)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[cli]) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[cli]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[cli]) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[cli]) (2025.1.31)
Requirement already satisfied: wcwidth in ./.env/lib/python3.13/site-packages (from prompt-toolkit<4.0.0,>=3.0.1->InquirerPy==0.3.4->huggingface_hub[cli]) (0.2.13)

# Install the huggingface_hub[hf_transfer] extra
(.env) jiangdong@Mac Mini:~ $  pip install -U "huggingface_hub[hf_transfer]"
Requirement already satisfied: huggingface_hub[hf_transfer] in ./.env/lib/python3.13/site-packages (0.28.1)
Requirement already satisfied: filelock in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (3.17.0)
Requirement already satisfied: fsspec>=2023.5.0 in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (2025.2.0)
Requirement already satisfied: packaging>=20.9 in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (24.2)
Requirement already satisfied: pyyaml>=5.1 in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (6.0.2)
Requirement already satisfied: requests in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (4.67.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (4.12.2)
Requirement already satisfied: hf-transfer>=0.1.4 in ./.env/lib/python3.13/site-packages (from huggingface_hub[hf_transfer]) (0.1.9)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[hf_transfer]) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[hf_transfer]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[hf_transfer]) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in ./.env/lib/python3.13/site-packages (from requests->huggingface_hub[hf_transfer]) (2025.1.31)

# Install transformers, used for loading and running models
(.env) jiangdong@Mac Mini:~ $  pip install -U "transformers" 
Requirement already satisfied: transformers in ./.env/lib/python3.13/site-packages (4.48.3)
Requirement already satisfied: filelock in ./.env/lib/python3.13/site-packages (from transformers) (3.17.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.24.0 in ./.env/lib/python3.13/site-packages (from transformers) (0.28.1)
Requirement already satisfied: numpy>=1.17 in ./.env/lib/python3.13/site-packages (from transformers) (2.2.2)
Requirement already satisfied: packaging>=20.0 in ./.env/lib/python3.13/site-packages (from transformers) (24.2)
Requirement already satisfied: pyyaml>=5.1 in ./.env/lib/python3.13/site-packages (from transformers) (6.0.2)
Requirement already satisfied: regex!=2019.12.17 in ./.env/lib/python3.13/site-packages (from transformers) (2024.11.6)
Requirement already satisfied: requests in ./.env/lib/python3.13/site-packages (from transformers) (2.32.3)
Requirement already satisfied: tokenizers<0.22,>=0.21 in ./.env/lib/python3.13/site-packages (from transformers) (0.21.0)
Requirement already satisfied: safetensors>=0.4.1 in ./.env/lib/python3.13/site-packages (from transformers) (0.5.2)
Requirement already satisfied: tqdm>=4.27 in ./.env/lib/python3.13/site-packages (from transformers) (4.67.1)
Requirement already satisfied: fsspec>=2023.5.0 in ./.env/lib/python3.13/site-packages (from huggingface-hub<1.0,>=0.24.0->transformers) (2025.2.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.env/lib/python3.13/site-packages (from huggingface-hub<1.0,>=0.24.0->transformers) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.env/lib/python3.13/site-packages (from requests->transformers) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in ./.env/lib/python3.13/site-packages (from requests->transformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./.env/lib/python3.13/site-packages (from requests->transformers) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in ./.env/lib/python3.13/site-packages (from requests->transformers) (2025.1.31)

# Finally, list the installed packages with pip to confirm everything is in place
(.env) jiangdong@Mac Mini:~ $ pip list
Package            Version
------------------ ---------
certifi            2025.1.31
charset-normalizer 3.4.1
filelock           3.17.0
fsspec             2025.2.0
hf_transfer        0.1.9
huggingface-hub    0.28.1
idna               3.10
inquirerpy         0.3.4
Jinja2             3.1.5
MarkupSafe         3.0.2
mpmath             1.3.0
networkx           3.4.2
numpy              2.2.2
packaging          24.2
pfzy               0.3.4
pip                25.0.1
prompt_toolkit     3.0.50
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
safetensors        0.5.2
setuptools         75.8.0
sgl-kernel         0.0.1
sympy              1.13.1
tokenizers         0.21.0
torch              2.6.0
tqdm               4.67.1
transformers       4.48.3
typing_extensions  4.12.2
urllib3            2.3.0
wcwidth            0.2.13
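
Scanning a long pip list by eye is error-prone. As a rough sketch, the same check can be done from Python with importlib.metadata; the helper name installed_versions is hypothetical, and the package names are the ones shown in the listing above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Names as they appear in the pip list output.
wanted = ["huggingface-hub", "hf_transfer", "transformers"]
for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated .env environment, otherwise it reports on whichever interpreter happens to be first on your PATH.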

Pushing and downloading datasets and models

Environment configuration

Configure the Hugging Face Hub environment variables

# .zshrc or .bashrc
$ cat .zshrc 

# Enable hf_transfer for faster uploads and downloads
export HF_HUB_ENABLE_HF_TRANSFER=1

# Timeouts in seconds for metadata (ETag) requests and file downloads
export HF_HUB_ETAG_TIMEOUT=86400
export HF_HUB_DOWNLOAD_TIMEOUT=86400 

# Mirror endpoint for users in mainland China
export HF_ENDPOINT=https://hf-mirror.com

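# Hugging Face token: a read token is used to pull; a write token is used to push;
# a token with management permissions also allows organization changes, deletions, and so on.
# Newer huggingface_hub releases prefer the name HF_TOKEN for this variable.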
export HUGGING_FACE_HUB_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxx


(.env) jiangdong@Mac Mini:tmp $ source ~/.zshrc 
(.env) jiangdong@Mac Mini:tmp $ env
...
HF_HUB_ENABLE_HF_TRANSFER=1
HF_HUB_ETAG_TIMEOUT=86400
HF_HUB_DOWNLOAD_TIMEOUT=86400
HF_ENDPOINT=https://hf-mirror.com
HUGGING_FACE_HUB_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxx
...
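
To double-check what a Python process will actually see, the variables can be read back and normalized. This is only a sketch: hub_settings is a made-up helper, and the fallback values (the https://huggingface.co endpoint, 10-second timeouts, and the set of truthy strings) are my assumptions about the library defaults, so verify them against the huggingface_hub documentation for your version.

```python
import os

# Assumed truthy spellings; not taken from this post.
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def hub_settings(env=os.environ):
    """Summarize the Hub-related variables without exposing the token itself."""
    return {
        "endpoint": env.get("HF_ENDPOINT", "https://huggingface.co").rstrip("/"),
        "hf_transfer": env.get("HF_HUB_ENABLE_HF_TRANSFER", "").upper() in TRUE_VALUES,
        "etag_timeout": int(env.get("HF_HUB_ETAG_TIMEOUT", "10")),
        "download_timeout": int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "10")),
        # Only report whether a token is set; never print its value.
        "has_token": bool(env.get("HUGGING_FACE_HUB_TOKEN") or env.get("HF_TOKEN")),
    }


# The values from the .zshrc above, passed as a plain dict for the demo.
example = {
    "HF_HUB_ENABLE_HF_TRANSFER": "1",
    "HF_HUB_ETAG_TIMEOUT": "86400",
    "HF_HUB_DOWNLOAD_TIMEOUT": "86400",
    "HF_ENDPOINT": "https://hf-mirror.com",
    "HUGGING_FACE_HUB_TOKEN": "xxxxxxxx",
}
settings = hub_settings(example)
print(settings)
```

Calling hub_settings() with no argument reads the real environment of the current process.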

Check that the CLI works

# Install git and git-lfs
(.env) jiangdong@Mac Mini:~ $ brew install git
Warning: git 2.48.1 is already installed and up-to-date.
To reinstall 2.48.1, run:
  brew reinstall git
(.env) jiangdong@Mac Mini:~ $ brew install git-lfs 
Warning: git-lfs 3.6.1 is already installed and up-to-date.
To reinstall 3.6.1, run:
  brew reinstall git-lfs


# The CLI is available
(.env) jiangdong@Mac Mini:~ $ huggingface-cli -h
usage: huggingface-cli <command> [<args>]

positional arguments:
  {download,upload,repo-files,env,login,whoami,logout,auth,repo,lfs-enable-largefiles,lfs-multipart-upload,scan-cache,delete-cache,tag,version,upload-large-folder}
                        huggingface-cli command helpers
    download            Download files from the Hub
    upload              Upload a file or a folder to a repo on the Hub
    repo-files          Manage files in a repo on the Hub
    env                 Print information about the environment.
    login               Log in using a token from huggingface.co/settings/tokens
    whoami              Find out which huggingface.co account you are logged in as.
    logout              Log out
    auth                Other authentication related commands
    repo                {create} Commands to interact with your huggingface.co repos.
    lfs-enable-largefiles
                        Configure your repository to enable upload of files > 5GB.
    scan-cache          Scan cache directory.
    delete-cache        Delete revisions from the cache directory.
    tag                 (create, list, delete) tags for a repo in the hub
    version             Print information about the huggingface-cli version.
    upload-large-folder
                        Upload a large folder to a repo on the Hub

options:
  -h, --help            show this help message and exit
(.env) jiangdong@Mac Mini:~ $ huggingface-cli version
huggingface_hub version: 0.28.1
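
For scripted setups, the same availability check can be automated. A small sketch, assuming the three executables used in this post (git, git-lfs, huggingface-cli) are what you need; missing_tools is a hypothetical helper built on the standard library's shutil.which.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


required = ["git", "git-lfs", "huggingface-cli"]
missing = missing_tools(required)
print("missing:", missing or "none")
```

The which parameter exists only so the lookup can be swapped out, for example in tests.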

Dataset download and push

Download:

jiangdong@Mac Mini:tmp $ huggingface-cli download  --repo-type dataset simplescaling/s1K --local-dir s1k
Downloading '.gitattributes' to '/Users/jiangdong/.cache/huggingface/hub/datasets--simplescaling--s1K/blobs/1ef325f1b111266a6b26e0196871bd78baa8c2f3.incomplete'
.gitattributes: 2.46kB [00:00, 5.79MB/s]                                                                                                                                                                                                                      
Download complete. Moving file to /Users/jiangdong/.cache/huggingface/hub/datasets--simplescaling--s1K/blobs/1ef325f1b111266a6b26e0196871bd78baa8c2f3
Downloading 'README.md' to '/Users/jiangdong/.cache/huggingface/hub/datasets--simplescaling--s1K/blobs/099326cf6f2575e2302bba53675444bcfdd6eb07.incomplete'
README.md: 22.7kB [00:00, 26.9MB/s]
Download complete. Moving file to /Users/jiangdong/.cache/huggingface/hub/datasets--simplescaling--s1K/blobs/099326cf6f2575e2302bba53675444bcfdd6eb07
train-00000-of-00001.parquet: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.88M/6.88M [01:53<00:00, 60.6kB/s]
Download complete. Moving file to /Users/jiangdong/.cache/huggingface/hub/datasets--simplescaling--s1K/blobs/899de0fb79be8465efb311aec94c4dcf9863c72684610b4626a8dacef2c2d2e7
/Users/jiangdong/.cache/huggingface/hub/datasets--simplescaling--s1K/snapshots/278d72baaa2b887a7e76a70a0ae254a5a45536e4

PS: git clone also works, but it uses roughly twice the disk space, because the .git directory keeps the version history as well.
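
The download log shows the cache folder datasets--simplescaling--s1K, which follows from the repo type and repo id. The sketch below reproduces that naming and wraps the Python counterpart of the CLI call, snapshot_download from huggingface_hub. Both function names are made up for illustration; the download function needs network access and is only defined here, not called.

```python
def cache_repo_dir(repo_id: str, repo_type: str = "model") -> str:
    """Folder name used under ~/.cache/huggingface/hub for a repo."""
    # The repo type is pluralized and "/" in the repo id becomes "--".
    return f"{repo_type}s--" + repo_id.replace("/", "--")


def download_dataset(repo_id: str, local_dir: str) -> str:
    """Download every file of a dataset repo and return the local folder."""
    # Imported lazily so the naming helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)


print(cache_repo_dir("simplescaling/s1K", "dataset"))
# prints: datasets--simplescaling--s1K

# To actually download (requires network and huggingface_hub):
# download_dataset("simplescaling/s1K", "s1k")
```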

Push:

jiangdong@Mac Mini:tmp $ cd s1k 
jiangdong@Mac Mini:s1k $ huggingface-cli upload s1k-cupy . . --repo-type dataset
Start hashing 3 files.
Finished hashing 3 files.
train-00000-of-00001.parquet: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.88M/6.88M [00:06<00:00, 1.02MB/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.08s/it]
Removing 1 file(s) from commit that have not changed.
https://hf-mirror.com/datasets/DONGJINAG/s1k-cupy/tree/main/.
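
Note that the command passed only s1k-cupy while the resulting URL is DONGJINAG/s1k-cupy: the account name was filled in as the namespace. A hedged sketch of the same push through the Python API follows. qualify_repo_id and push_folder are hypothetical names; the HfApi calls (whoami, create_repo, upload_folder) need a write token and network access, so push_folder is defined but not called here.

```python
def qualify_repo_id(name: str, username: str) -> str:
    """Prefix a bare repo name with the user's namespace."""
    return name if "/" in name else f"{username}/{name}"


def push_folder(folder: str, name: str, repo_type: str = "dataset") -> str:
    """Create the repo if needed, upload the folder, and return the repo id."""
    # Imported lazily so the helper above works without huggingface_hub.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token from the environment or a prior login
    repo_id = qualify_repo_id(name, api.whoami()["name"])
    api.create_repo(repo_id, repo_type=repo_type, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type=repo_type)
    return repo_id


print(qualify_repo_id("s1k-cupy", "DONGJINAG"))
# prints: DONGJINAG/s1k-cupy

# To actually push the current directory (requires a write token):
# push_folder(".", "s1k-cupy", repo_type="dataset")
```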

Model download and push

Download:

jiangdong@Mac Mini:tmp $ huggingface-cli download deepseek-ai/Janus-1.3B --local-dir Janus-1.3B
Downloading '.gitattributes' to 'Janus-1.3B/.cache/huggingface/download/wPaCkH-WbT7GsmxMKKrNZTV4nSM=.a6344aac8c09253b3b630fb776ae94478aa0275b.incomplete'
.gitattributes: 1.52kB [00:00, 3.36MB/s]                                                                                                                                                                                                                      
Download complete. Moving file to Janus-1.3B/.gitattributes
Downloading 'README.md' to 'Janus-1.3B/.cache/huggingface/download/Xn7B-BWUGOee2Y6hCZtEhtFu4BE=.44e58a85f10a1aa0f43442501f8151cd16259516.incomplete'
README.md: 2.96kB [00:00, 4.76MB/s]                                                                                                                                                                                                                           
Download complete. Moving file to Janus-1.3B/README.md
Downloading 'arch.jpg' to 'Janus-1.3B/.cache/huggingface/download/sF5KJ0gbkGHoKLZwGTyqVDlXH78=.16b5a62960433000444996af47a63979016aa39f.incomplete'
arch.jpg: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 250k/250k [00:00<00:00, 250kB/s]
Download complete. Moving file to Janus-1.3B/arch.jpg
Downloading 'config.json' to 'Janus-1.3B/.cache/huggingface/download/8_PA_wEVGiVa2goH2H4KQOQpvVY=.ae9d81cc1bb235f4e91a4f87c98152e44306036f.incomplete'
config.json: 1.45kB [00:00, 4.85MB/s]                                                                                                                                                                                                                         
Download complete. Moving file to Janus-1.3B/config.json
...
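
In the log, each file is first written as a *.incomplete file under Janus-1.3B/.cache/huggingface/download and only then moved into place. Assuming that staging layout, a quick way to see whether a large download was interrupted is to look for leftovers. The helper name incomplete_files is made up, and the demo simulates the layout in a temporary directory.

```python
import tempfile
from pathlib import Path


def incomplete_files(local_dir):
    """Return leftover *.incomplete temp files under local_dir, sorted."""
    return sorted(Path(local_dir).rglob("*.incomplete"))


with tempfile.TemporaryDirectory() as tmp:
    # Simulate the staging directory seen in the download log.
    staging = Path(tmp) / ".cache" / "huggingface" / "download"
    staging.mkdir(parents=True)
    (staging / "example.incomplete").touch()  # simulated interrupted file
    (Path(tmp) / "config.json").touch()       # simulated finished file
    leftovers = [p.name for p in incomplete_files(tmp)]
    print(leftovers)
```

On a real download you would call incomplete_files("Janus-1.3B"); an empty list suggests every file reached its final location.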

Push:

jiangdong@Mac Mini:tmp $ cd Janus-1.3B
jiangdong@Mac Mini:Janus-1.3B $ huggingface-cli upload Janus-1.3B-copy . . --repo-type model
...
