7 Commits

Author SHA1 Message Date
Gunnsteinn Hall
11f692bc3a
Font subsetting and font optimization improvements (#362)
* Track runes in IdentityEncoder (for subsetting), track decoded runes

* Working with the identity encoder in font_composite.go

* Add GetFilterArray to multi encoder.  Add comments.

* Add NewFromContents constructor to extractor only requiring contents and resources

* golint fixes

* Optimizer compress streams - improved detection of raw streams

* Optimize - CleanContentStream optimizer that removes redundant operands

* WIP Optimize - clean fonts

Will support both font file reduction and subsetting. (WIP)

* Optimize - image processing - try combined DCT and Flate

* Update options.go

* Update optimizer.go

* Create utils.go for optimize with common methods needed for optimization

* Optimizer - add font subsetting method

Covers XObject Forms, annotations etc. Uses extractor package to extract text marks covering what fonts and glyphs are used. Package truetype used for subsetting.

* Add some comments

* Fix cmap parsing rune conversion

* Error checking for extractor.  Add some comments.

* Update Jenkinsfile

* Update modules
2020-06-16 21:19:10 +00:00
Peter Williams
5777ee1394
Handle multibyte entries in CMaps. (#353)
* Fixed filename:page in logging

* Got CMap working for multi-rune entries

* Treat CMap entries as strings instead of runes to handle multi-byte encodings.

* Added a test for multibyte encoding.

* Changed rune->CharCode maps to string->CharCode.

* Removed unintentional changes.

* Updated comments to match new function definitions.

* Changed some []rune APIs to string

* Fixes for reviewer comments.
2020-06-03 13:55:15 +00:00
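
The move from rune-keyed to string-keyed maps in the commit above can be illustrated with a minimal sketch. This is not the library's code: the names `CharCode` and `invert` are assumptions made for illustration. The point is that one character code may decode to several runes (for example a ligature such as "ffi"), which a single `rune` cannot hold but a `string` can.

```go
package main

import "fmt"

// CharCode is a hypothetical stand-in for a character code type.
type CharCode uint32

// invert builds a string->CharCode map from a CharCode->string map.
// If two codes decode to the same text, this sketch keeps the smaller
// code so that the result does not depend on map iteration order.
func invert(toUnicode map[CharCode]string) map[string]CharCode {
	inv := make(map[string]CharCode, len(toUnicode))
	for code, s := range toUnicode {
		if prev, ok := inv[s]; !ok || code < prev {
			inv[s] = code
		}
	}
	return inv
}

func main() {
	toUnicode := map[CharCode]string{
		0x0041: "A",
		0x00C0: "ffi", // one code, three runes: not representable as a single rune
	}
	inv := invert(toUnicode)
	fmt.Println(inv["ffi"], inv["A"])
}
```

With string keys, the multi-rune entry round-trips like any single-rune entry, so no special case is needed for ligatures or other multi-character mappings.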
Gunnsteinn Hall
ad2a1e9c9d
Subsetting fixes (#346)
* Update unitype lib which improves subsetting

* Add text extraction check to creator font subsetting example

Helps ensure ToUnicode map is set correctly.

* Clean up import

* Fix spelling
2020-05-12 07:15:09 +00:00
Gunnsteinn Hall
9ef2f27694
Support for subsetting fonts (#335)
* Subsetting of TrueType CID fonts using unitype

* Simplify call to SubsetRegistered so it can be done right after loading the font via creator finalizer

* Add an EnableFontSubsetting function on the creator to simplify font subsetting for creator users
2020-05-05 00:17:27 +00:00
Adrian-George Bostan
9de5fe644e
Add PdfFont text encoding methods (#257)
* Add PdfFont method for encoding runes to charcode bytes
* Add getter method for CMap nbits
* Take CMap nbits into account when encoding text
* Adapt font test cases to include text encoding testing
2020-02-17 22:54:20 +00:00
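
As a rough illustration of what "taking CMap nbits into account" can mean when encoding text, the sketch below writes a character code as a big-endian byte sequence whose width is derived from a bit count. The function name `encodeCharcode` and its signature are hypothetical and are not taken from the library.

```go
package main

import "fmt"

// encodeCharcode writes code as big-endian bytes using nbits bits.
// nbits is expected to be a multiple of 8 (8, 16, 24 or 32); values
// below 8 fall back to a single byte in this sketch.
func encodeCharcode(code uint32, nbits int) []byte {
	n := nbits / 8
	if n < 1 {
		n = 1
	}
	out := make([]byte, n)
	// Fill from the last byte backwards so the high-order byte comes first.
	for i := n - 1; i >= 0; i-- {
		out[i] = byte(code)
		code >>= 8
	}
	return out
}

func main() {
	fmt.Printf("% X\n", encodeCharcode(0x41, 8))
	fmt.Printf("% X\n", encodeCharcode(0x41, 16))
}
```

The same code value produces one byte for an 8-bit CMap and two bytes for a 16-bit CMap, which is why the encoder needs the bit width before it can produce charcode bytes.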
Adrian-George Bostan
e2b3c6e6ba
Add predefined CMaps for Type 0 composite fonts (#246)
* Add packed predefined cmaps
* Add cmap cid range parsing
* Load base cmap for predefined cmaps
* Refactor pdfFont to Unicode methods
* Preserve CharcodeBytesToUnicode behavior
* Add support for CID-keyed Type 0 fonts
* Add method documentation for the cmap package
* Refactor and document charcode to Unicode conversion code
* Add more cmap parsing test cases
* Add more method documentation in the cmap package.
* Remove unused code from the bcmaps package
* Improve cmap test case
* Assume identity when encoder is missing on regenerating field appearance
* Add missing encoder log message
* Add inverse CMap mappings
* Add CMap encoder
* Address golint notices and small fix in the cmap package
* Keep smaller charcodes when generating cmap inverse mappings
* Update extractor test case
* Keep latest supplement charcodes/CIDs when computing inverse mappings
* Fix comment typo
2020-02-07 19:56:30 +00:00
Adrian-George Bostan
c64812093d
Remove pdf folder and move packages up one level (#2)
2019-05-16 20:44:51 +00:00