Port details |
- py-textract Extract text from any document
- 1.6.5_3 textproc
- Maintainer: DtxdF@disroot.org
- Port Added: 2022-10-25 20:51:06
- Last Update: 2023-01-30 13:02:41
- Commit Hash: f5e6e81
- Also Listed In: python
- License: MIT
- Description:
- textract provides a single interface for extracting content embedded
from Word documents, PowerPoint presentations, PDFs and much more,
which can be used for further textual analysis and visualization.
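
A minimal sketch of the "single interface" described above, assuming the package is installed and importable as `textract`. `textract.process` is the library's documented entry point and returns bytes; the wrapper name `extract_text` and the sample file name in the usage note are hypothetical.

```python
def extract_text(path, encoding="utf-8"):
    """Return the text extracted from the document at `path` as a str."""
    # Imported lazily so this module still loads where textract is not installed.
    import textract

    # textract.process selects a parser from the file extension and returns bytes.
    raw = textract.process(path)
    return raw.decode(encoding)
```

Calling `extract_text("report.docx")` would return the document body as a string, provided the optional dependency for that file type (here docx2txt) was enabled when the port was built.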
- pkg-plist: as obtained via: make generate-plist
- There is no configure plist information for this port.
- Dependency lines:
- ${PYTHON_PKGNAMEPREFIX}textract>0:textproc/py-textract@${PY_FLAVOR}
- To install the port:
- cd /usr/ports/textproc/py-textract/ && make install clean
- To add the package, run one of these commands:
- pkg install textproc/py-textract
- pkg install py39-textract
NOTE: If this package has multiple flavors (see below), then use one of them instead of the name specified above.
NOTE: This is a Python port. Instead of py39-textract listed in the above command, you can pick from the names under the Packages section.
- PKGNAME: py39-textract
- Package flavors (<flavor>: <package>)
- distinfo:
- TIMESTAMP = 1659835075
SHA256 (textract-1.6.5.tar.gz) = 68f0f09056885821e6c43d8538987518daa94057c306679f2857cc5ee66ad850
SIZE (textract-1.6.5.tar.gz) = 17871
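
The distinfo values above can be checked against a downloaded distfile. The following is a rough sketch using only the Python standard library; the function name `verify_distfile` is hypothetical, and the ports framework performs its own check (`make checksum`), so this is purely illustrative.

```python
import hashlib
import os

# Values copied from the distinfo entry above.
EXPECTED_SHA256 = "68f0f09056885821e6c43d8538987518daa94057c306679f2857cc5ee66ad850"
EXPECTED_SIZE = 17871


def verify_distfile(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Return True if the file matches both the recorded SIZE and SHA256."""
    # Cheap check first: a size mismatch means the file cannot match.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large distfiles are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

For example, `verify_distfile("textract-1.6.5.tar.gz")` should return `True` only for an intact copy of the tarball named in distinfo.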
Packages (timestamps in pop-ups are UTC):
- Dependencies
- NOTE: FreshPorts displays only information on required and default dependencies. Optional dependencies are not covered.
- Build dependencies:
- py39-setuptools>=63.1.0 : devel/py-setuptools@py39
- python3.9 : lang/python39
- Test dependencies:
- python3.9 : lang/python39
- Runtime dependencies:
- py39-argcomplete>=1.10.0 : devel/py-argcomplete@py39
- py39-chardet>=3 : textproc/py-chardet@py39
- py39-six>1.12.0 : devel/py-six@py39
- antiword>0 : textproc/antiword
- py39-beautifulsoup>=4.8.0 : www/py-beautifulsoup@py39
- py39-docx2txt>=0.8 : textproc/py-docx2txt@py39
- ffmpeg>0 : multimedia/ffmpeg
- flac>0 : audio/flac
- jpeg-turbo>0 : graphics/jpeg-turbo
- lame>0 : audio/lame
- py39-libxml2>0 : textproc/py-libxml2@py39
- libxslt>=1.1.15 : textproc/libxslt
- py39-extract-msg>=0.29 : textproc/py-extract-msg@py39
- poppler-utils>0 : graphics/poppler-utils
- py39-python-pptx>=0.6.18 : textproc/py-python-pptx@py39
- pstotext>0 : print/pstotext
- sox>0 : audio/sox
- py39-speechrecognition>=3.8.1 : audio/py-speechrecognition@py39
- py39-xlrd>=1.2.0 : textproc/py-xlrd@py39
- tesseract>0 : graphics/tesseract
- unrtf>0 : textproc/unrtf
- py39-setuptools>=63.1.0 : devel/py-setuptools@py39
- python3.9 : lang/python39
- There are no ports dependent upon this port
Configuration Options:
- ===> The following configuration options are available for py39-textract-1.6.5_3:
ANTIWORD=on: DOC document support
BEAUTIFULSOUP=on: HTML parsing library
DOCX2TXT=on: DOCX document support
LIBXML2=on: Python interface for XML parser library
LIBXSLT=on: XML stylesheet transformation library
MSG=on: MS Outlook MSG file format support
PPTX=on: MS PowerPoint PPTX presentations support
PS=on: PostScript document support
SPREADSHEET=on: XLS and XLSX spreadsheet support
UNRTF=on: RTF document support
====> Options available for the group AUDIO
FFMPEG=on: FFmpeg support (WMA, AIFF, AC3, APE...)
FLAC=on: FLAC lossless audio codec support
LAME=on: LAME MP3 audio encoder support
POCKETSPHINX=off: Interface to CMU Sphinxbase and Pocketsphinx
SOX=on: Command-line audio processing tool
SPEECH_RECOGNITION=on: Python library for performing speech recognition
====> Options available for the group OCR
JPEG_TURBO=on: SIMD-accelerated JPEG codec
TESSERACT=on: Commercial quality open source OCR engine
====> PDF document support
PDFMINER=off: PDF parser and analyzer
PDFTOTEXT=on: Extract text from a PDF document
===> Use 'make config' to modify these settings
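
To see at a glance which options are enabled, the listing above can be parsed mechanically. This is a small sketch that assumes the `NAME=on|off: description` line format shown here; the helper name `parse_options` is hypothetical and not part of the ports tooling.

```python
import re

# Matches lines such as "PDFMINER=off: PDF parser and analyzer".
OPTION_LINE = re.compile(r"^\s*([A-Z0-9_]+)=(on|off):\s*(.*)$")


def parse_options(listing):
    """Map each option name to a tuple of (enabled, description)."""
    options = {}
    for line in listing.splitlines():
        match = OPTION_LINE.match(line)
        if match:  # group headers such as "====> ..." are skipped
            name, state, description = match.groups()
            options[name] = (state == "on", description)
    return options
```

Applied to the listing above, the result would mark `POCKETSPHINX` and `PDFMINER` as disabled and every other option as enabled.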
- Options name:
- textproc_py-textract
- USES:
- python:3.8+
- FreshPorts was unable to extract/find any pkg message
- Master Sites:
|
Notes from UPDATING |
- These upgrade notes are taken from /usr/ports/UPDATING
- 2017-11-30
Affects: */py*
Author: mat@FreeBSD.org
Reason:
Ports using Python via USES=python are now flavored. All the py3-* ports
have been removed and folded into their py-* master ports.
People using Poudriere 3.2+ and binary packages do not have to do anything.
For other people, to build the Python 3.6 version of, for example,
databases/py-gdbm, you need to run:
# make FLAVOR=py36 install
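
The flavor naming used in this note and in the package names on this page (py36, py39-textract) can be sketched as follows. The helper names are hypothetical, and the mapping is inferred from the names shown here rather than taken from the ports framework itself.

```python
def python_flavor(version):
    """Map a Python version such as "3.9" to a flavor name such as "py39"."""
    major, minor = version.split(".")[:2]
    return f"py{major}{minor}"


def flavored_pkgname(name, version):
    """Build a flavored package name, e.g. ("textract", "3.9") -> "py39-textract"."""
    return f"{python_flavor(version)}-{name}"


def flavor_build_command(origin, version):
    """Compose the build command for one flavor of a port, as a string."""
    return f"cd /usr/ports/{origin} && make FLAVOR={python_flavor(version)} install"
```

For this port, `flavor_build_command("textproc/py-textract", "3.9")` yields the command string that builds the py39 flavor; the functions only format text and do not run anything.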
|
Number of commits found: 7
Commit History - (may be incomplete: for full details, see links to repositories near top of page) |
Commit | Credits | Log message |
1.6.5_3 30 Jan 2023 13:02:41 | Po-Chuan Hsieh (sunpoet) |
textproc/py-textract: Add NO_ARCH
- While I'm here, fix indent
Approved by: portmgr (blanket) |
1.6.5_3 30 Jan 2023 12:59:34 | Po-Chuan Hsieh (sunpoet) |
audio/py-speechrecognition: Update to 3.9.0
- Update PORTNAME: use lowercase
- Change MASTER_SITES from GitHub to PYPI
- Update version requirement of RUN_DEPENDS
- Take maintainership
Changes: https://github.com/Uberi/speech_recognition/releases |
1.6.5_3 11 Jan 2023 15:58:34 | Dmitry Marakasov (amdmi3) |
*/*: rename CHEESESHOP to PYPI in MASTER_SITES
PR: 267994
Differential revision: D37518
Approved by: bapt |
1.6.5_3 09 Jan 2023 12:37:17 | Tobias C. Berner (tcberner) |
graphics/poppler: bump dependencies
Follow-up to 9b78681895a5a5b7225299242098f7f2f27d959c |
1.6.5_2 08 Dec 2022 05:45:34 | Tobias C. Berner (tcberner) |
graphics/poppler: bump dependencies |
1.6.5_1 08 Nov 2022 05:07:17 | Tobias C. Berner (tcberner) |
graphics/poppler: bump PORTREVISION of dependencies
- after update to 22.11 in d01d0d73b169 |
1.6.5 25 Oct 2022 20:49:12 | Li-Wen Hsu (lwhsu), Author: Jesús Daniel Colmenares Oviedo |
Add textproc/py-textract: Extract text from any document
textract provides a single interface for extracting content embedded
from Word documents, PowerPoint presentations, PDFs and much more,
which can be used for further textual analysis and visualization.
WWW: https://github.com/deanmalmgren/textract
PR: 265768 |