37 Commits

Author SHA1 Message Date
UniDoc Build
d287d85878 prepare release 2024-12-20 06:39:10 +00:00
UniDoc Build
ac4fe09ce0 prepare release 2024-06-27 16:15:49 +00:00
UniDoc Build
51251b1e5f prepare release 2024-04-16 11:40:43 +00:00
UniDoc Build
006d8524e7 prepare release 2024-03-27 22:34:33 +00:00
UniDoc Build
4cb53dbb7b prepare release 2024-02-11 21:29:32 +00:00
UniDoc Build
a8fa52b222 prepare release 2024-01-22 01:16:41 +00:00
UniDoc Build
22e9f4bade prepare release 2023-12-17 13:54:01 +00:00
UniDoc Build
97e47ce77b prepare release 2023-11-11 11:29:03 +00:00
UniDoc Build
89a1ba3e48 prepare release 2023-10-07 13:58:01 +00:00
UniDoc Build
87c1e49788 prepare release 2023-08-03 17:30:04 +00:00
UniDoc Build
854d57a737 prepare release 2023-04-06 19:57:40 +00:00
UniDoc Build
e274047f4f prepare release 2022-12-15 21:59:56 +00:00
UniDoc Build
930693130b prepare release 2022-09-10 15:35:04 +00:00
UniDoc Build
96640edbe3 prepare release 2022-07-13 21:28:43 +00:00
UniDoc Build
ad2a915d0a prepare release 2022-06-27 19:58:38 +00:00
UniDoc Build
7101928e27 prepare release 2022-04-27 00:10:33 +00:00
UniDoc Build
aaa8a1d860 prepare release 2022-03-13 12:41:53 +00:00
UniDoc Build
dfadfc1b51 prepare release 2022-02-05 21:34:53 +00:00
UniDoc Build
100631484f prepare release 2021-12-14 01:08:28 +00:00
UniDoc Build
804e0287b4 prepare release 2021-10-22 10:53:20 +00:00
UniDoc Build
22540b937c prepare release 2020-10-19 10:58:10 +00:00
UniDoc Build
1501d07a74 prepare release 2020-08-27 21:45:09 +00:00
Peter Williams
88fda44e0a
Text extraction code for columns. (#366)
* Fixed filename:page in logging

* Got CMap working for multi-rune entries

* Treat CMap entries as strings instead of runes to handle multi-byte encodings.

* Added a test for multibyte encoding.

* First version of text extraction that recognizes columns

* Added an explanation of the text columns code to README.md.

* fixed typos

* Abstracted textWord depth calculation. This required changing textMark to *textMark in a lot of code.

* Added function comments.

* Fixed text state save/restore.

* Adjusted inter-word search distance to make paragraph division work for thanh.pdf

* Got text_test.go passing.

* Reinstated hyphen suppression

* Handle more cases of fonts not being set in text extraction code.

* Fixed typo

* More verbose logging

* Adding tables to text extractor.

* Added tests for columns extraction.

* Removed commented code

* Check for textParas that are on the same line when writing out extracted text.

* Absorb text to the left of paras into paras e.g. Footnote numbers

* Removed funny character from text_test.go

* Commented out a creator_test.go test that was broken by my text extraction changes.

* Big changes to columns text extraction code for PR.

Performance improvements in several places.
Commented code.

* Updated extractor/README

* Cleaned up some comments and removed a panic

* Increased threshold for truncating extracted text when there is no license 100 -> 102.

This is a workaround to let a test in creator_test.go pass.

With the old text extraction code the following extracted text was 100 chars. With the new code it
is 102 chars which looks correct.

"你好\n你好你好你好你好\n河上白云\n\nUnlicensed UniDoc - Get a license on https://unidoc.io\n\n"

* Improved an error message.

* Removed irrelevant spaces

* Commented code and removed unused functions.

* Reverted PdfRectangle changes

* Added duplicate text detection.

* Combine diacritic textMarks in text extraction

* Reinstated a diacritic recombination test.

* Small code reorganisation

* Reinstated handling of rotated text

* Addressed issues in PR review

* Added color fields to TextMark

* Updated README

* Reinstated the disabled tests I missed before.

* Tightened definition for tables to prevent detection of tables where there weren't any.

* Compute line splitting search range based on fontsize of first word in word bag.

* Use errors.Is(err, core.ErrNotSupported) to distinguish unsupported font errors.

See https://blog.golang.org/go1.13-errors

* Fixed some naming and added some comments.

* errors.Is -> xerrors.Is and %w -> %v for go 1.12 compatibility

* Removed code that doesn't ever get called.

* Removed unused test
2020-06-30 19:33:10 +00:00
Gunnsteinn Hall
1b1158ed94 Merge remote-tracking branch 'upstream/master' into dev-merge-master 2020-06-16 21:45:48 +00:00
Gunnsteinn Hall
11f692bc3a
Font subsetting and font optimization improvements (#362)
* Track runes in IdentityEncoder (for subsetting), track decoded runes

* Working with the identity encoder in font_composite.go

* Add GetFilterArray to multi encoder.  Add comments.

* Add NewFromContents constructor to extractor only requiring contents and resources

* golint fixes

* Optimizer compress streams - improved detection of raw streams

* Optimize - CleanContentStream optimizer that removes redundant operands

* WIP Optimize - clean fonts

Will support both font file reduction and subsetting. (WIP)

* Optimize - image processing - try combined DCT and Flate

* Update options.go

* Update optimizer.go

* Create utils.go for optimize with common methods needed for optimization

* Optimizer - add font subsetting method

Covers XObject Forms, annotations etc.  Uses extractor package to extract text marks covering what fonts and glyphs are used.  Package truetype used for subsetting.

* Add some comments

* Fix cmap parsing rune conversion

* Error checking for extractor.  Add some comments.

* Update Jenkinsfile

* Update modules
2020-06-16 21:19:10 +00:00
Gunnsteinn Hall
e8d29245a2 Prepare release v3.7.1 2020-05-25 23:07:17 +00:00
Gunnsteinn Hall
ad2a1e9c9d
Subsetting fixes (#346)
* Update unitype lib which improves subsetting

* Add text extraction check to creator font subsetting example

Helps ensure ToUnicode map is set correctly.

* Clean up import

* Fix spelling
2020-05-12 07:15:09 +00:00
Gunnsteinn Hall
9ef2f27694
Support for subsetting fonts (#335)
* Subsetting of TrueType CID fonts using unitype

* Simplify call to SubsetRegistered so can be done right after loading font via creator finalizer

* Add an EnableFontSubsetting function on the creator to simplify font subsetting for creator users
2020-05-05 00:17:27 +00:00
Alexey Pavlyukov
a69d788171
Add timestamp signature handler (#301)
* Add timestamp signature handler

* Add timestamp signature handler test

* fix PR issues

* fix PR issues

* fix PR issues

* Fix

Co-authored-by: Gunnsteinn Hall <gunnsteinn.hall@gmail.com>
2020-04-22 20:21:53 +00:00
Jacek Kucharczyk
c582323a8f
JBIG2 Generic Encoder (#264)
* Prepared skeleton and basic component implementations for the jbig2 encoding.

* Added Bitset. Implemented Bitmap.

* Decoder with old Arithmetic Decoder

* Partly working arithmetic

* Working arithmetic decoder.

* MMR patched.

* rebuild to apache.

* Working generic

* Working generic

* Decoded full document

* Update Jenkinsfile go version [master] (#398)

* Update Jenkinsfile go version

* Decoded AnnexH document

* Minor issues fixed.

* Update README.md

* Fixed generic region errors. Added benchmark. Added bitmap unpadder. Added Bitmap toImage method.

* Fixed endofpage error

* Added integration test.

* Decoded all test files without errors. Implemented JBIG2Global.

* Merged with v3 version

* Fixed the EOF in the globals issue

* Fixed the JBIG2 ChocolateData Decode

* JBIG2 Added license information

* Minor fix in jbig2 encoding.

* Applied the logging convention

* Cleaned unnecessary imports

* Go modules clear unused imports

* checked out the README.md

* Moved trace to Debug. Fixed the build integrate tag in the document_decode_test.go

* Initial encoder skeleton

* Applied UniPDF Developer Guide. Fixed lint issues.

* Cleared documentation, fixed style issues.

* Added jbig2 doc.go files. Applied unipdf guide style.

* Minor code style changes.

* Minor naming and style issues fixes.

* Minor naming changes. Style issues fixed.

* Review r11 fixes.

* Added JBIG2 Encoder skeleton.

* Moved Document and Page to jbig2/document package. Created decoder package responsible for decoding jbig2 stream.

* Implemented raster functions.

* Added raster uni low test functions.

* Added raster low test functions

* untracked files on jbig2-encoder: c869089 Added raster low test functions

* index on jbig2-encoder: c869089 Added raster low test functions

* Added morph files.

* implemented jbig2 encoder basics

* JBIG2 Encoder - Generic method

* Added jbig2 image encode tests, black/white image tests

* cleaned and tested jbig2 package

* unfinished jbig2 classified encoder

* jbig2 minor style changes

* minor jbig2 encoder changes

* prepared JBIG2 Encoder

* Style and lint fixes

* Minor changes and lints

* Fixed shift unsigned value build errors

* Minor naming change

* Added jbig2 encode, image gondels. Fixed jbig2 decode bug.

* Provided jbig2 core.DecodeGlobals function.

* Fixed JBIG2Encoder `r6` revision issues.

* Removed public JBIG2Encoder document.

* Minor style changes

* added NewJBIG2Encoder function.

* fixed JBIG2Encoder 'r9' revision issues.

* Cleared 'r9' commented code.

* Updated ACKNOWLEDGEMENTS. Fixed JBIG2Encoder 'r10' revision issues.

Co-authored-by: Gunnsteinn Hall <gunnsteinn.hall@gmail.com>
2020-03-27 11:47:41 +00:00
Adrian-George Bostan
d961079c5d
Add basic image rendering support (#266)
* Add render package
* Add text state
* Add more text operators
* Remove unnecessary files
* Add text font
* Add custom text render method
* Improve text rendering method
* Rename text state methods
* Refactor and document context interface
* Refactor text begin/end operators
* Fix graphics state transformations
* Keep original font when doing font substitution
* Take page cropbox into account
* Revert to substitution font if original font measurement is 0
* Add font substitution package
* Implement additional transform.Point methods
* Use transform.Point in the image context package
* Remove unneeded functionality from the render image package
* Fix golint notices in the image rendering package
* Fix go vet notices in the render package
* Fix golint notices in the top-level render package
* Improve render context package documentation
* Document context text state struct.
* Document context text font struct.
* Minor logging improvements
* Add license disclaimer to the render package files
* Avoid using package aliases where possible
* Change style of section comments
* Adapt render package import style to follow the developer guide
* Improve documentation for the internal matrix implementation
* Update render package dependency versions
* Apply crop box post render
* Account for offset media boxes
* Improve metrics of rendered characters
* Fix text matrix translation
* Change priority of fonts used for measuring rendered characters
* Skip invalid m and l operators on image rendering
* Small fix for v operator
* Fix rendered characters spacing issues
* Refactor naming of internal render packages
2020-03-02 21:22:54 +00:00
Jacek Kucharczyk
e85616cec2 JBIG2Decoder implementation (#67)
* Prepared skeleton and basic component implementations for the jbig2 encoding.
* Added Bitset. Implemented Bitmap.
* Decoder with old Arithmetic Decoder
* Partly working arithmetic
* Working arithmetic decoder.
* MMR patched.
* rebuild to apache.
* Working generic
* Decoded full document
* Decoded AnnexH document
* Minor issues fixed.
* Update README.md
* Fixed generic region errors. Added benchmark. Added bitmap unpadder. Added Bitmap toImage method.
* Fixed endofpage error
* Added integration test.
* Decoded all test files without errors. Implemented JBIG2Global.
* Merged with v3 version
* Fixed the EOF in the globals issue
* Fixed the JBIG2 ChocolateData Decode
* JBIG2 Added license information
* Minor fix in jbig2 encoding.
* Applied the logging convention
* Cleaned unnecessary imports
* Go modules clear unused imports
* checked out the README.md
* Moved trace to Debug. Fixed the build integrate tag in the document_decode_test.go
* Applied UniPDF Developer Guide. Fixed lint issues.
* Cleared documentation, fixed style issues.
* Added jbig2 doc.go files. Applied unipdf guide style.
* Minor code style changes.
* Minor naming and style issues fixes.
* Minor naming changes. Style issues fixed.
* Review r11 fixes.
* Integrate jbig2 tests with build system
* Added jbig2 integration test golden files.
* Minor jbig2 integration test fix
* Removed jbig2 integration image assertions
* Fixed jbig2 rowstride issue. Implemented jbig2 bit writer
* Changed golden files logic. Fixes r13 issues.
2019-07-14 21:18:40 +00:00
Adrian-George Bostan
8acac88784 Update module version and import paths (#1)
* Update import path to use unipdf
* Update module name and version
2019-05-16 20:08:40 +00:00
Gunnsteinn Hall
3d22e17a91 Prepare release of v3.0.0-alpha.3 2019-03-28 17:19:39 +00:00
Gunnsteinn Hall
323dc5394f release v3.0.0-alpha.2 2019-02-07 12:17:54 +00:00
Denys Smirnov
622ae5668d textencoding: generate table for WinAnsi encoding from CP1252 2019-01-01 17:20:01 +02:00
Denys Smirnov
41af4a14eb list dependencies for dep and go modules 2018-11-29 01:15:19 +02:00