Port details
- py-tokenizers: Fast state-of-the-art tokenizers optimized for research and production
- Version: 0.22.1_2 (category: textproc)
- Version of this port present on the latest quarterly branch: 0.22.1
- Maintainer: tagattie@FreeBSD.org
 - Port Added: 2024-02-12 08:36:07
- Last Update: 2025-11-11 11:08:12
- Commit Hash: d6b6027
- People watching this port also watch: jdictionary, py311-Automat, py311-python-gdsii, py311-PyOpenGL, p5-Sane
- Also Listed In: python
- License: APACHE20
- WWW: https://github.com/huggingface/tokenizers
- Description:
  Provides an implementation of today's most used tokenizers, with a
  focus on performance and versatility.
  Main features:
  - Train new vocabularies and tokenize, using today's most used
    tokenizers.
  - Extremely fast (both training and tokenization), thanks to the Rust
    implementation. Takes less than 20 seconds to tokenize a GB of text
    on a server's CPU.
  - Easy to use, but also extremely versatile.
  - Designed for research and production.
  - Normalization comes with alignments tracking. It's always possible
    to get the part of the original sentence that corresponds to a given
    token.
  - Does all the pre-processing: Truncate, Pad, add the special tokens
    your model needs.
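  A minimal usage sketch of the features listed above (training a new vocabulary, offset/alignment tracking, padding and truncation), written against the library's Python API. The corpus file name and the special-token names are illustrative assumptions, not part of the port:

    # Sketch: train a small BPE tokenizer and encode text with alignment offsets.
    # "corpus.txt" and the special-token names are placeholders.
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]"])
    tokenizer.train(files=["corpus.txt"], trainer=trainer)

    # Pre-processing handled by the tokenizer itself: truncation and padding.
    tokenizer.enable_truncation(max_length=128)
    tokenizer.enable_padding(pad_id=tokenizer.token_to_id("[PAD]"), pad_token="[PAD]")

    # Each token keeps (start, end) offsets into the original sentence.
    encoding = tokenizer.encode("Tokenizers are fast and versatile.")
    print(encoding.tokens)
    print(encoding.ids)
    print(encoding.offsets)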
- Manual pages:
- FreshPorts has no man page information for this port.
- pkg-plist: as obtained via: make generate-plist
- There is no configure plist information for this port.
- USE_RC_SUBR (Service Scripts)
- no SUBR information found for this port
- Dependency lines:
- ${PYTHON_PKGNAMEPREFIX}tokenizers>0:textproc/py-tokenizers@${PY_FLAVOR}
- To install the port:
- cd /usr/ports/textproc/py-tokenizers/ && make install clean
- To add the package, run one of these commands:
- pkg install textproc/py-tokenizers
- pkg install py311-tokenizers
NOTE: If this package has multiple flavors (see below), then use one of them instead of the name specified above.
NOTE: This is a Python port. Instead of py311-tokenizers listed in the above command, you can pick from the names under the Packages section.
- PKGNAME: py311-tokenizers
- Package flavors (<flavor>: <package>)
- distinfo:
- TIMESTAMP = 1758523747
SHA256 (tokenizers-0.22.1.tar.gz) = 61de6522785310a309b3407bac22d99c4db5dba349935e99e4d15ea2226af2d9
SIZE (tokenizers-0.22.1.tar.gz) = 363123
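The SHA256 value above is what the ports framework verifies automatically during "make checksum"; if you want to check a fetched distfile by hand, here is a short Python sketch. The distfile path assumes the default DISTDIR, which may differ on your system:

    # Sketch: compare a locally fetched distfile against the distinfo SHA256.
    # The path below is an assumption (default DISTDIR); adjust if needed.
    import hashlib

    expected = "61de6522785310a309b3407bac22d99c4db5dba349935e99e4d15ea2226af2d9"
    path = "/usr/ports/distfiles/tokenizers-0.22.1.tar.gz"

    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)

    print("OK" if digest.hexdigest() == expected else "MISMATCH")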
- Packages:
- Dependencies
- NOTE: FreshPorts displays only information on required and default dependencies. Optional dependencies are not covered.
- Build dependencies:
- py311-maturin>=1.0<2.0 : devel/py-maturin@py311
- rust>=1.91.0 : lang/rust
- pkgconf>=1.3.0_1 : devel/pkgconf
- python3.11 : lang/python311
- py311-build>=0 : devel/py-build@py311
- py311-installer>=0 : devel/py-installer@py311
- Test dependencies:
- py311-requests>0 : www/py-requests@py311
- py311-numpy>0 : math/py-numpy@py311
- py311-datasets>0 : misc/py-datasets@py311
- py311-pytest>=7,1 : devel/py-pytest@py311
- python3.11 : lang/python311
- Runtime dependencies:
- py311-huggingface-hub>=0.16.4<2.0 : misc/py-huggingface-hub@py311
- python3.11 : lang/python311
- Library dependencies:
- libonig.so : devel/oniguruma
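The runtime dependency on py311-huggingface-hub is used when the library downloads pretrained tokenizer definitions from the Hugging Face Hub. A minimal sketch; it requires network access, and the model id "bert-base-uncased" is only an example, not tied to this port:

    # Sketch: fetch a pretrained tokenizer definition from the Hugging Face Hub.
    # Needs network access; the model id is an example.
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
    encoding = tokenizer.encode("FreeBSD port textproc/py-tokenizers")
    print(encoding.tokens)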
- This port is required by:
- for Run
- misc/py-aider-chat
- misc/py-anthropic
- misc/py-litellm
- misc/py-sentence-transformers
- misc/py-transformers
- Configuration Options:
- No options to configure
- Options name:
- textproc_py-tokenizers
- USES:
- cargo python
- FreshPorts was unable to extract/find any pkg message
- Master Sites:
Commit History (may be incomplete; for full details, see links to repositories near top of page)
Commit | Credits | Log message
0.22.1_2  11 Nov 2025 11:08:12 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.91.1
    PR: 290816
0.22.1_1  03 Oct 2025 08:16:50 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.90.0
    PR: 289709
0.22.1  22 Sep 2025 07:13:25 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.22.1
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.22.1
    Reported by: Repology
0.22.0_1  01 Sep 2025 08:25:04 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.89.0
    PR: 288923
0.22.0  31 Aug 2025 01:40:55 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.22.0
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.22.0
    Reported by: portscout
0.21.4  05 Aug 2025 06:43:18 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.21.4
    Changelog:
    - https://github.com/huggingface/tokenizers/releases/tag/v0.21.3
    - https://github.com/huggingface/tokenizers/releases/tag/v0.21.4
    Reported by: Repology
0.21.2_1  03 Jul 2025 08:46:01 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.88.0
    PR: 287766
0.21.2  26 Jun 2025 09:16:07 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.21.2
    While here, refactor test-related part so that it can execute both python and rust tests.
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.21.2
    Reported by: Repology
0.21.1_2  05 Jun 2025 07:52:53 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.87.0
    PR: 286829
0.21.1_1  08 Apr 2025 08:41:13 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.86.0
    PR: 285840
0.21.1  22 Mar 2025 08:35:24 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.21.1
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.21.1
    Reported by: portscout
0.21.0_2  24 Feb 2025 07:55:52 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.85.0
    PR: 284884
0.21.0_1  20 Jan 2025 11:06:50 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.84.0
    PR: 283962
0.21.0  06 Dec 2024 02:24:32 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.21.0
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.21.0
    Reported by: Repology
0.20.3_2  01 Dec 2024 09:24:18 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.83.0
    PR: 283000
0.20.3_1  08 Nov 2024 08:24:20 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.82.0
    PR: 282516
0.20.3  07 Nov 2024 12:30:23 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.20.3
    Changelog:
    - https://github.com/huggingface/tokenizers/releases/tag/v0.20.2
    - https://github.com/huggingface/tokenizers/releases/tag/v0.20.3
    Reported by: portscout
0.20.1  18 Oct 2024 05:12:30 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.20.1
    While here, add LICENSE_FILE.
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.20.1
    Reported by: Repology
0.20.0_2  10 Sep 2024 11:00:34 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.81.0
    PR: 281300
0.20.0_1  10 Sep 2024 10:58:07 | Mikael Urankar (mikael), Author: Siva Mahadevan | */*: remove STRIP_CMD calls in rust based ports
    This is not needed after bc4fedc1fec0d359365c04d43be9e32bf101a50e
    PR: 246993
    Differential Revision: https://reviews.freebsd.org/D46503
0.20.0_1  26 Aug 2024 08:08:01 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.80.1
    PR: 280490
0.20.0  10 Aug 2024 22:35:02 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.20.0
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.20.0
    Reported by: portscout
0.19.1_2  18 Jun 2024 10:59:14 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.79.0
    PR: 279707
0.19.1_1  13 May 2024 11:03:24 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.78.0
    PR: 278834
0.19.1  21 Apr 2024 08:18:00 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.19.1
    Changelog:
    - https://github.com/huggingface/tokenizers/releases/tag/v0.19.0
    - https://github.com/huggingface/tokenizers/releases/tag/v0.19.1
    Reported by: Repology
0.15.2_2  23 Mar 2024 09:41:46 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.77.0
    PR: 277786
0.15.2_1  19 Feb 2024 11:59:23 | Mikael Urankar (mikael) | lang/rust: Bump revisions after 1.76.0
    PR: 276920
0.15.2  14 Feb 2024 09:17:15 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: update to 0.15.2
    While here, enable tests.
    Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.15.2
    Reported by: portscout
0.15.1  12 Feb 2024 08:34:14 | Hiroki Tagato (tagattie) | textproc/py-tokenizers: add port: Fast state-of-the-art tokenizers optimized for research and production
    Provides an implementation of today's most used tokenizers, with a
    focus on performance and versatility.
    Main features:
    - Train new vocabularies and tokenize, using today's most used tokenizers.
    - Extremely fast (both training and tokenization), thanks to the Rust
      implementation. Takes less than 20 seconds to tokenize a GB of text
      on a server's CPU.
    - Easy to use, but also extremely versatile.
    - Designed for research and production.
    - Normalization comes with alignments tracking. It's always possible
      to get the part of the original sentence that corresponds to a given
      token.
    - Does all the pre-processing: Truncate, Pad, add the special tokens
      your model needs.
    WWW: https://github.com/huggingface/tokenizers