Port details
- py-tokenizers Fast state-of-the-art tokenizers optimized for research and production
- 0.21.0 textproc =2
- 0.20.0_2 is the version of this port present on the latest quarterly branch.
- Maintainer: tagattie@FreeBSD.org
- Port Added: 2024-02-12 08:36:07
- Last Update: 2024-12-06 02:24:32
- Commit Hash: 2b6d4bc
- People watching this port, also watch: jdictionary, py311-Automat, py311-python-gdsii, py39-PyOpenGL, p5-Sane
- Also Listed In: python
- License: APACHE20
- WWW: https://github.com/huggingface/tokenizers
- Description:
- Provides an implementation of today's most used tokenizers, with a
focus on performance and versatility.
Main features:
- Train new vocabularies and tokenize, using today's most used
tokenizers.
- Extremely fast (both training and tokenization), thanks to the Rust
implementation. Takes less than 20 seconds to tokenize a GB of text
on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignments tracking. It's always possible
to get the part of the original sentence that corresponds to a given
token.
- Does all the pre-processing: Truncate, Pad, add the special tokens
your model needs.
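The "train new vocabularies" feature above can be illustrated with a toy, pure-Python BPE (byte-pair encoding) sketch. This is not the port's API (the port ships the Rust-backed huggingface tokenizers library); the function name `bpe_train` and the sample word counts are made up for illustration only:

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE training: repeatedly merge the most frequent adjacent
    symbol pair. The real tokenizers library does this in Rust."""
    # Represent each word as a tuple of symbols (characters to start).
    vocab = {tuple(w): c for w, c in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

# Hypothetical corpus frequencies; "lo" then "low" become merged symbols.
merges = bpe_train({"low": 5, "lower": 2, "lowest": 2}, num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

The library the port packages implements this kind of training (and tokenization) natively, which is where the quoted "less than 20 seconds to tokenize a GB of text" figure comes from.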
- Manual pages:
- FreshPorts has no man page information for this port.
- pkg-plist: as obtained via: make generate-plist
- There is no configure plist information for this port.
- Dependency lines:
- ${PYTHON_PKGNAMEPREFIX}tokenizers>0:textproc/py-tokenizers@${PY_FLAVOR}
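A hypothetical consumer port would reference the dependency line above in its Makefile roughly like this (only the dependency line itself comes from this page; the surrounding variables are an illustrative sketch of common ports-framework usage, not taken from any real port):

```
# Hypothetical excerpt from a consuming port's Makefile
RUN_DEPENDS=	${PYTHON_PKGNAMEPREFIX}tokenizers>0:textproc/py-tokenizers@${PY_FLAVOR}

USES=		python
USE_PYTHON=	flavors
```

The `@${PY_FLAVOR}` suffix selects the flavor (e.g. py311) matching the consuming port's own Python version.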
- To install the port:
- cd /usr/ports/textproc/py-tokenizers/ && make install clean
- To add the package, run one of these commands:
- pkg install textproc/py-tokenizers
- pkg install py311-tokenizers
- NOTE: If this package has multiple flavors (see below), then use one of them instead of the name specified above.
- NOTE: This is a Python port. Instead of py311-tokenizers listed in the above command, you can pick from the names under the Packages section.
- PKGNAME: py311-tokenizers
- Package flavors (<flavor>: <package>)
- distinfo:
- TIMESTAMP = 1733450061
SHA256 (tokenizers-0.21.0.tar.gz) = ee0894bf311b75b0c03079f33859ae4b2334d675d4e93f5a4132e1eae2834fe4
SIZE (tokenizers-0.21.0.tar.gz) = 343021
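The SHA256 and SIZE values in distinfo are what the ports framework verifies after fetching the distfile. A minimal Python sketch of the same check (the function name `check_distfile` and the sample bytes are hypothetical, for illustration only):

```python
import hashlib

def check_distfile(data: bytes, expected_sha256: str, expected_size: int) -> bool:
    """Mimic the distinfo check: both the size and the SHA256
    digest of the fetched file must match the recorded values."""
    return (len(data) == expected_size
            and hashlib.sha256(data).hexdigest() == expected_sha256)

# Illustrative stand-in for a fetched distfile.
sample = b"example distfile contents\n"
digest = hashlib.sha256(sample).hexdigest()
print(check_distfile(sample, digest, len(sample)))  # True
```

If either value disagrees, the framework refuses to build, which is why distinfo is regenerated (via `make makesum`) whenever the upstream tarball changes.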
- Packages:
- Dependencies
- NOTE: FreshPorts displays only information on required and default dependencies. Optional dependencies are not covered.
- Build dependencies:
- py311-maturin>=1.0<2.0 : devel/py-maturin@py311
- rust>=1.83.0 : lang/rust
- pkgconf>=1.3.0_1 : devel/pkgconf
- python3.11 : lang/python311
- py311-build>=0 : devel/py-build@py311
- py311-installer>=0 : devel/py-installer@py311
- Test dependencies:
- oniguruma.pc : devel/oniguruma
- python3.11 : lang/python311
- Runtime dependencies:
- py311-huggingface-hub>=0.16.4<1.0 : misc/py-huggingface-hub@py311
- python3.11 : lang/python311
- This port is required by:
- for Run
- misc/py-aider-chat
- misc/py-anthropic
- misc/py-litellm
- Configuration Options:
- No options to configure
- Options name:
- textproc_py-tokenizers
- USES:
- cargo python
- FreshPorts was unable to extract/find any pkg message
- Master Sites: