* Track runes in IdentityEncoder (for subsetting), track decoded runes
* Working with the identity encoder in font_composite.go
* Add GetFilterArray to multi encoder. Add comments.
* Add NewFromContents constructor to extractor that requires only contents and resources
* golint fixes
* Optimize - compress streams: improved detection of raw streams
* Optimize - CleanContentStream optimizer that removes redundant operands
* WIP Optimize - clean fonts
Will support both font file reduction and subsetting. (WIP)
* Optimize - image processing - try combined DCT and Flate
* Update options.go
* Update optimizer.go
* Create utils.go in optimize with common methods needed for optimization
* Optimizer - add font subsetting method
Covers XObject Forms, annotations, etc. Uses the extractor package to extract text marks, which identify the fonts and glyphs in use. The truetype package is used for subsetting.
* Add some comments
* Fix cmap parsing rune conversion
* Error checking for extractor. Add some comments.
* Update Jenkinsfile
* Update modules
* Fixed filename:page in logging
* Got CMap working for multi-rune entries
* Treat CMap entries as strings instead of runes to handle multi-byte encodings.
* Added a test for multibyte encoding.
* Changed rune->CharCode maps to string->CharCode.
* Removed unintentional changes.
* Updated comments to match new function definitions.
* Changed some []rune APIs to string
* Fixes for reviewer comments.
* Added text bounding box extraction.
* Add `font` field to textMark struct.
Create a new method `TextComponents` that retrieves all text components of the page's extracted text, with position and character information.
* Reorganizing extractor/text.go
* Added a text extraction position test.
* Added another text extraction location test.
* Text extraction location testing.
* Added tests for text extraction with location information.
* Cleaned up text extraction tests. No changes to functionality.
* Simplifying text extraction code.
* Simplified line construction in text.go
* Returning TextMarks in TextMarkArray, which is based on PdfObjectArray but read-only, so TextMarks are returned as values, not pointers.
* Added text extraction that shows PDFs marked up with the bounding boxes of substrings in the extracted text.
* Add comments explaining how to calculate text bounding boxes.
* Made text_test.go naming consistent with function comments in text.go
* Use tm, pt, and tl as receiver and local variable names for textMark/TextMark, PageText, and TextLine, respectively.
* Uncommented the text stress test. Use `go test --short` to skip it.
* TextMark.Offset is now a byte offset into the extracted text. It was previously an index into []rune(text).
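To illustrate why the TextMark.Offset change matters: indexing a Go string is by byte, so a byte offset and an index into `[]rune(text)` diverge as soon as the text contains a multi-byte UTF-8 character. The sketch below assumes nothing about the extractor API. The helpers `runeIndexToByteOffset` and `byteOffsetToRuneIndex` are hypothetical and only show how the two kinds of index relate.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeIndexToByteOffset converts an index into []rune(text) to the
// equivalent byte offset into text. Hypothetical helper for illustration.
// Returns len(text) if runeIdx is past the last rune.
func runeIndexToByteOffset(text string, runeIdx int) int {
	i := 0
	// Ranging over a string yields the byte offset of the start of each rune.
	for offset := range text {
		if i == runeIdx {
			return offset
		}
		i++
	}
	return len(text)
}

// byteOffsetToRuneIndex converts a byte offset (assumed to lie on a rune
// boundary) into the equivalent index into []rune(text). Hypothetical helper.
func byteOffsetToRuneIndex(text string, offset int) int {
	return utf8.RuneCountInString(text[:offset])
}

func main() {
	// "ö" and "ß" each take two bytes in UTF-8, so byte offsets and
	// rune indexes differ for every character after them.
	text := "Größe 42"
	runeIdx := 6 // index of '4' in []rune(text)
	offset := runeIndexToByteOffset(text, runeIdx)

	fmt.Println("rune index:", runeIdx, "byte offset:", offset)
	fmt.Println(text[offset:])                  // slicing the string needs the byte offset
	fmt.Println(string([]rune(text)[runeIdx:])) // slicing []rune needs the rune index
	fmt.Println(byteOffsetToRuneIndex(text, offset))
}
```

A byte offset lets callers slice the extracted string directly, without first converting it to `[]rune`.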