107 Commits

Author SHA1 Message Date
UniDoc Build
96640edbe3 prepare release 2022-07-13 21:28:43 +00:00
UniDoc Build
ad2a915d0a prepare release 2022-06-27 19:58:38 +00:00
UniDoc Build
e12cd12d02 prepare release 2022-06-06 22:48:24 +00:00
UniDoc Build
7101928e27 prepare release 2022-04-27 00:10:33 +00:00
UniDoc Build
aaa8a1d860 prepare release 2022-03-13 12:41:53 +00:00
UniDoc Build
dfadfc1b51 prepare release 2022-02-05 21:34:53 +00:00
UniDoc Build
100631484f prepare release 2021-12-14 01:08:28 +00:00
UniDoc Build
804e0287b4 prepare release 2021-10-22 10:53:20 +00:00
UniDoc Build
b3f338f7a4 prepare release 2021-09-23 22:37:42 +00:00
UniDoc Build
49979c7312 prepare release 2021-08-13 01:33:42 +00:00
UniDoc Build
60f464c58f prepare release 2021-07-30 00:21:16 +00:00
UniDoc Build
9d8efb87a8 prepare release 2021-06-21 14:01:56 +00:00
UniDoc Build
edb7c66944 prepare release 2021-05-31 17:17:31 +00:00
UniDoc Build
aa9968c6af prepare release 2021-05-11 00:01:27 +00:00
UniDoc Build
b221a76c5e prepare release 2021-04-23 20:28:14 +00:00
UniDoc Build
596e8b8b8a prepare release 2021-04-17 13:46:54 +00:00
UniDoc Build
dada0fe1d4 prepare release 2021-04-06 22:35:37 +00:00
UniDoc Build
e309710fcd prepare release 2021-03-23 23:12:52 +00:00
UniDoc Build
9a2a3ba8f6 prepare release 2021-03-13 21:28:23 +00:00
UniDoc Build
ec7f5e55c3 prepare release 2021-02-22 02:29:48 +00:00
UniDoc Build
8b10191fd5 prepare release 2021-02-11 10:35:13 +00:00
UniDoc Build
4b16f3c2ce prepare release 2021-01-26 01:31:56 +00:00
UniDoc Build
6ec1f6abf1 prepare release 2021-01-07 14:20:10 +00:00
UniDoc Build
ec282cd9c5 prepare release 2020-12-06 13:03:03 +00:00
UniDoc Build
bafd659395 prepare release 2020-11-23 22:15:56 +00:00
UniDoc Build
79e32364de prepare release 2020-11-11 18:48:37 +00:00
UniDoc Build
22540b937c prepare release 2020-10-19 10:58:10 +00:00
UniDoc Build
56a210342e prepare release 2020-10-12 14:17:59 +00:00
UniDoc Build
87cbc66cbd prepare release 2020-10-05 19:28:24 +00:00
UniDoc Build
22ca2c0eed prepare release 2020-09-28 23:18:17 +00:00
UniDoc Build
9107a86674 prepare release 2020-09-21 01:20:10 +00:00
UniDoc Build
b991a36456 prepare release 2020-09-14 09:32:45 +00:00
UniDoc Build
fd3b669a36 prepare release 2020-09-07 00:23:12 +00:00
UniDoc Build
61b6580cb9 prepare release 2020-08-31 21:12:07 +00:00
UniDoc Build
1501d07a74 prepare release 2020-08-27 21:45:09 +00:00
Peter Williams
88fda44e0a
Text extraction code for columns. (#366)
* Fixed filename:page in logging

* Got CMap working for multi-rune entries

* Treat CMap entries as strings instead of runes to handle multi-byte encodings.

* Added a test for multibyte encoding.

* First version of text extraction that recognizes columns

* Added an explanation of the text columns code to README.md.

* fixed typos

* Abstracted textWord depth calculation. This required changing textMark to *textMark in a lot of code.

* Added function comments.

* Fixed text state save/restore.

* Adjusted inter-word search distance to make paragraph division work for thanh.pdf

* Got text_test.go passing.

* Reinstated hyphen suppression

* Handle more cases of fonts not being set in text extraction code.

* Fixed typo

* More verbose logging

* Adding tables to text extractor.

* Added tests for columns extraction.

* Removed commented code

* Check for textParas that are on the same line when writing out extracted text.

* Absorb text to the left of paras into paras e.g. Footnote numbers

* Removed funny character from text_test.go

* Commented out a creator_test.go test that was broken by my text extraction changes.

* Big changes to columns text extraction code for PR.

Performance improvements in several places.
Commented code.

* Updated extractor/README

* Cleaned up some comments and removed a panic

* Increased threshold for truncating extracted text when there is no license 100 -> 102.

This is a workaround to let a test in creator_test.go pass.

With the old text extraction code the following extracted text was 100 chars. With the new code it
is 102 chars which looks correct.

"你好\n你好你好你好你好\n河上白云\n\nUnlicensed UniDoc - Get a license on https://unidoc.io\n\n"

* Improved an error message.

* Removed irrelevant spaces

* Commented code and removed unused functions.

* Reverted PdfRectangle changes

* Added duplicate text detection.

* Combine diacritic textMarks in text extraction

* Reinstated a diacritic recombination test.

* Small code reorganisation

* Reinstated handling of rotated text

* Addressed issues in PR review

* Added color fields to TextMark

* Updated README

* Reinstated the disabled tests I missed before.

* Tightened definition for tables to prevent detection of tables where there weren't any.

* Compute line splitting search range based on fontsize of first word in word bag.

* Use errors.Is(err, core.ErrNotSupported) to distinguish unsupported font errors.

See https://blog.golang.org/go1.13-errors

* Fixed some naming and added some comments.

* errors.Is -> xerrors.Is and %w -> %v for go 1.12 compatibility

* Removed code that doesn't ever get called.

* Removed unused test
2020-06-30 19:33:10 +00:00
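
The `errors.Is(err, core.ErrNotSupported)` bullet in #366 relies on matching a sentinel error through wrapping. A minimal sketch of that pattern follows, assuming Go 1.13+ and the standard `errors` package (the commit itself later switched to `xerrors.Is` for Go 1.12 compatibility). `ErrNotSupported` is a local stand-in for `core.ErrNotSupported`, and `loadFont`, `isUnsupported` and the subtype strings are hypothetical, not unipdf names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotSupported is a local stand-in for core.ErrNotSupported (assumption).
var ErrNotSupported = errors.New("feature not currently supported")

// loadFont is a hypothetical helper. It fails for one made-up subtype and
// wraps the sentinel error with context using %w.
func loadFont(subtype string) error {
	if subtype == "example-unsupported" {
		return fmt.Errorf("loadFont: subtype=%q: %w", subtype, ErrNotSupported)
	}
	return nil
}

// isUnsupported reports whether err is, or wraps, ErrNotSupported.
func isUnsupported(err error) bool {
	return errors.Is(err, ErrNotSupported)
}

func main() {
	for _, subtype := range []string{"example-supported", "example-unsupported"} {
		err := loadFont(subtype)
		fmt.Printf("%s: err=%v unsupported=%t\n", subtype, err, isUnsupported(err))
	}
}
```

Comparing with `==` would miss the wrapped error here, which is why the check goes through `errors.Is`.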
Adrian-George Bostan
54e965785b
Add cached Stream method for CMap objects (#382)
* Add cached Stream method for CMaps

* Use CMap Stream method when creating font PDF dictionary objects
2020-06-27 00:30:18 +00:00
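
The cached `Stream` method in #382 can be pictured as plain memoization: serialize once, then reuse the bytes. The sketch below is an illustration under assumptions, not the unipdf implementation. `cmapSketch`, its fields and the one-line-per-entry output format are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cmapSketch is a hypothetical stand-in for a CMap object; it is not the unipdf type.
type cmapSketch struct {
	toUnicode map[uint32]string // charcode -> Unicode text
	cached    []byte            // serialized form, built on first use
	builds    int               // counts serializations, for demonstration only
}

// Stream returns the serialized CMap, building it only on the first call.
func (c *cmapSketch) Stream() []byte {
	if c.cached == nil {
		c.cached = c.serialize()
	}
	return c.cached
}

// serialize writes one "<code> text" line per entry in charcode order.
// The format is made up for the example.
func (c *cmapSketch) serialize() []byte {
	c.builds++
	codes := make([]uint32, 0, len(c.toUnicode))
	for code := range c.toUnicode {
		codes = append(codes, code)
	}
	sort.Slice(codes, func(i, j int) bool { return codes[i] < codes[j] })
	var b strings.Builder
	for _, code := range codes {
		fmt.Fprintf(&b, "<%04x> %s\n", code, c.toUnicode[code])
	}
	return []byte(b.String())
}

func main() {
	cm := &cmapSketch{toUnicode: map[uint32]string{0x41: "A", 0x42: "B"}}
	cm.Stream()
	cm.Stream()
	fmt.Println("serializations:", cm.builds)
}
```

A cache like this has to be reset if the mapping changes after the first call; the sketch leaves that out.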
Adrian-George Bostan
7bf2f62c3b
Skip referenced pages which are not present in the catalog (#377)
* Skip referenced pages which are not present in the catalog

* Improve documentation for the copyObject method of the writer

* Add creator test case for checking referenced page destinations
2020-06-18 15:06:06 +00:00
Gunnsteinn Hall
11f692bc3a
Font subsetting and font optimization improvements (#362)
* Track runes in IdentityEncoder (for subsetting), track decoded runes

* Working with the identity encoder in font_composite.go

* Add GetFilterArray to multi encoder.  Add comments.

* Add NewFromContents constructor to extractor only requiring contents and resources

* golint fixes

* Optimizer compress streams - improved detection of raw streams

* Optimize - CleanContentStream optimizer that removes redundant operands

* WIP Optimize - clean fonts

Will support both font file reduction and subsetting. (WIP)

* Optimize - image processing - try combined DCT and Flate

* Update options.go

* Update optimizer.go

* Create utils.go for optimize with common methods needed for optimization

* Optimizer - add font subsetting method

Covers XObject Forms, annotations etc.  Uses extractor package to extract text marks covering what fonts and glyphs are used.  Package truetype used for subsetting.

* Add some comments

* Fix cmap parsing rune conversion

* Error checking for extractor.  Add some comments.

* Update Jenkinsfile

* Update modules
2020-06-16 21:19:10 +00:00
Adrian-George Bostan
99ef1b861d
Combo field appearance (#370)
* Fix combo field appearances not being shown

* Fix V object type for choice and button fields

* Refactor form fill for combo and checkbox fields

* Add fill test case for text, combo and checkbox fields

* Prevent panic when flattening forms using a nil appearance generator
2020-06-10 16:58:00 +00:00
Adrian-George Bostan
6b8d5c42f7
Fix outline null object check (#367) 2020-06-05 11:46:55 +00:00
Peter Williams
5777ee1394
Handle multibyte entries in CMaps. (#353)
* Fixed filename:page in logging

* Got CMap working for multi-rune entries

* Treat CMap entries as strings instead of runes to handle multi-byte encodings.

* Added a test for multibyte encoding.

* Changed rune->CharCode maps to string->CharCode.

* Removed unintentional changes.

* Updated comments to match new function definitions.

* Changed some []rune APIs to string

* Fixes for reviewer comments.
2020-06-03 13:55:15 +00:00
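
To see why #353 moves from rune-keyed to string-keyed maps, consider a charcode whose text is more than one rune, such as a ligature glyph. The sketch below is a simplified illustration. The `CharCode` definition, the `toUnicode` table and the helper functions are assumptions for the example and do not reproduce the unipdf types.

```go
package main

import "fmt"

// CharCode is a character code in a font's encoding. The name mirrors the
// commit message; the definition here is an assumption for the example.
type CharCode uint32

// toUnicode maps charcodes to text. Values are strings so that a single
// charcode can map to several runes, such as a ligature glyph.
var toUnicode = map[CharCode]string{
	0x01: "A",
	0x02: "ffi", // one glyph, three runes
	0x03: "x",
}

// invert builds the reverse string -> CharCode map.
// A rune-keyed map could not hold the "ffi" entry.
func invert(m map[CharCode]string) map[string]CharCode {
	out := make(map[string]CharCode, len(m))
	for code, s := range m {
		out[s] = code
	}
	return out
}

// decode converts a sequence of charcodes to text, skipping unmapped codes.
func decode(codes []CharCode) string {
	var s string
	for _, c := range codes {
		s += toUnicode[c]
	}
	return s
}

func main() {
	fmt.Printf("%q\n", decode([]CharCode{0x01, 0x02, 0x03}))
	fmt.Println(invert(toUnicode)["ffi"])
}
```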
Adrian-George Bostan
5efaa02e23
Use page indirect object for internal outline destinations (#359)
* Use page indirect object for internal outlines

* Use page indirect object in creator outline destinations

* Adapt creator test case to test outline creation and retrieval
2020-05-22 16:19:43 +00:00
Adrian-George Bostan
d2941b5477
Add reader method for checking if the AcroForm needs repair (#356)
* Add AcroFormNeedsRepair method

* Add AcroForm repair check test case
2020-05-20 16:04:02 +00:00
Adrian-George Bostan
80d51c5532
Add reader AcroForm repair functionality (#351)
* Add method for retrieving widget parent form field

* Add reader method for repairing AcroForm

* Add AcroForm repair test case

* Add AcroForm repair options

* RepairAcroForm documentation improvements
2020-05-19 12:42:07 +00:00
Gunnsteinn Hall
ad2a1e9c9d
Subsetting fixes (#346)
* Update unitype lib which improves subsetting

* Add text extraction check to creator font subsetting example

Helps ensure ToUnicode map is set correctly.

* Clean up import

* Fix spelling
2020-05-12 07:15:09 +00:00
Adrian-George Bostan
aef6e5e976
Fix CMap generation and serialization for composite fonts (#344)
* Fix CMap charcode mapping serialization

* Improve CMap generation in the NewCompositePdfFontFromTTF function
2020-05-08 00:15:09 +00:00
Gunnsteinn Hall
9ef2f27694
Support for subsetting fonts (#335)
* Subsetting of TrueType CID fonts using unitype

* Simplify call to SubsetRegistered so can be done right after loading font via creator finalizer

* Add an EnableFontSubsetting function on the creator to simplify font subsetting for creator users
2020-05-05 00:17:27 +00:00
Adrian-George Bostan
d84d0c4375
Form fill fixes (#328)
* Parse form fields with embedded widget annotations

* Try matching fields both by partial and full names on form fill

* Use default font if widget font is not found when generating appearance

* Add JSON extract and fill test case
2020-04-24 16:48:06 +00:00
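
The "partial and full names" bullet in #328 refers to the two names a PDF form field has: its own partial name (the `/T` entry) and the fully qualified name built by joining ancestor partial names with periods. A minimal sketch of a lookup that tries both is shown below. The `field` type and `lookup` helper are hypothetical, and trying the full name first is an assumption, since the commit only says both are tried.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical form field node; it is not the unipdf type.
type field struct {
	partial string // the field's own /T entry
	parent  *field
}

// fullName joins the partial names from the root down with periods.
func (f *field) fullName() string {
	var parts []string
	for n := f; n != nil; n = n.parent {
		if n.partial != "" {
			parts = append([]string{n.partial}, parts...)
		}
	}
	return strings.Join(parts, ".")
}

// lookup tries the fully qualified name first, then the partial name.
// The order is an assumption for this sketch.
func lookup(values map[string]string, f *field) (string, bool) {
	if v, ok := values[f.fullName()]; ok {
		return v, true
	}
	v, ok := values[f.partial]
	return v, ok
}

func main() {
	address := &field{partial: "address"}
	city := &field{partial: "city", parent: address}
	zip := &field{partial: "zip", parent: address}

	values := map[string]string{"address.city": "Reykjavik", "zip": "101"}
	for _, f := range []*field{city, zip} {
		v, ok := lookup(values, f)
		fmt.Println(f.fullName(), v, ok)
	}
}
```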
Adrian-George Bostan
cb0166e96b
Add low level PageLabels support (#325)
* Add reader method for retrieving the PageLabels entry from the catalog
* Add writer method for setting the PageLabels entry in the catalog.
* Add creator method for adding page labels for the output file
* Add creator page labels test case
* Minor page labels test case correction
2020-04-22 21:17:33 +00:00