* Track runes in IdentityEncoder (for subsetting), track decoded runes
* Working with the identity encoder in font_composite.go
* Add GetFilterArray to multi encoder. Add comments.
* Add NewFromContents constructor to extractor only requiring contents and resources
* golint fixes
* Optimizer compress streams - improved detection of raw streams
* Optimize - CleanContentStream optimizer that removes redundant operands
* WIP Optimize - clean fonts
Will support both font file reduction and subsetting. (WIP)
* Optimize - image processing - try combined DCT and Flate
* Update options.go
* Update optimizer.go
* Create utils.go for optimize with common methods needed for optimization
* Optimizer - add font subsetting method
Covers XObject Forms, annotations, etc. Uses the extractor package to extract text marks showing which fonts and glyphs are used. The truetype package is used for subsetting.
* Add some comments
* Fix cmap parsing rune conversion
* Error checking for extractor. Add some comments.
* Update Jenkinsfile
* Update modules
* Fixed filename:page in logging
* Got CMap working for multi-rune entries
* Treat CMap entries as strings instead of runes to handle multi-byte encodings.
* Added a test for multibyte encoding.
* Changed rune->CharCode maps to string->CharCode.
* Removed unintentional changes.
* Updated comments to match new function definitions.
* Changed some []rune APIs to string
* Fixes for reviewer comments.
* Update unitype lib which improves subsetting
* Add text extraction check to creator font subsetting example
Helps ensure ToUnicode map is set correctly.
* Clean up import
* Fix spelling
* Subsetting of TrueType CID fonts using unitype
* Simplify call to SubsetRegistered so it can be done right after loading a font via the creator finalizer
* Add an EnableFontSubsetting function on the creator to simplify font subsetting for creator users
* Fixed issue #243. Added optimize integration tests.
* Minor style change.
* XObjImage getParamsDict updates Columns and Rows.
* Added doc file for the optimize/tests package.
* UpdateParams for CCITTFax Encoder accepts Width and Height also. Removed
GetParamsDict Columns and Rows parameters from model.Image and
model.XObjImage.
* Fix i386 issue for the jbig2 arithmetic encoder.
* Added 386 architecture to the .travis/cross_build.sh
* Add render package
* Add text state
* Add more text operators
* Remove unnecessary files
* Add text font
* Add custom text render method
* Improve text rendering method
* Rename text state methods
* Refactor and document context interface
* Refactor text begin/end operators
* Fix graphics state transformations
* Keep original font when doing font substitution
* Take page cropbox into account
* Revert to substitution font if original font measurement is 0
* Add font substitution package
* Implement additional transform.Point methods
* Use transform.Point in the image context package
* Remove unneeded functionality from the render image package
* Fix golint notices in the image rendering package
* Fix go vet notices in the render package
* Fix golint notices in the top-level render package
* Improve render context package documentation
* Document context text state struct.
* Document context text font struct.
* Minor logging improvements
* Add license disclaimer to the render package files
* Avoid using package aliases where possible
* Change style of section comments
* Adapt render package import style to follow the developer guide
* Improve documentation for the internal matrix implementation
* Update render package dependency versions
* Apply crop box post render
* Account for offset media boxes
* Improve metrics of rendered characters
* Fix text matrix translation
* Change priority of fonts used for measuring rendered characters
* Skip invalid m and l operators on image rendering
* Small fix for v operator
* Fix rendered characters spacing issues
* Refactor naming of internal render packages
* Fixed PdfColorspaceSpecialIndexed.ImageToRGB(). Fixes https://github.com/unidoc/unipdf/issues/258
* Fixed indexed colorspace bounds checking.
* Being extra cautious to prevent a divide-by-zero error. I don't think the base colorspace can have fewer than 1 component.
* Updated image hash in extract_images_test.go to match new indexed colorspace code.
* Add test file from unipdf#258
* Add PdfFont method for encoding runes to charcode bytes
* Add getter method for CMap nbits
* Take CMap nbits into account when encoding text
* Adapt font test cases to include text encoding testing
* Add packed predefined cmaps
* Add cmap cid range parsing
* Load base cmap for predefined cmaps
* Refactor pdfFont to Unicode methods
* Preserve CharcodeBytesToUnicode behavior
* Add support for CID-keyed Type 0 fonts
* Add method documentation for the cmap package
* Refactor and document charcode to Unicode conversion code
* Add more cmap parsing test cases
* Add more method documentation in the cmap package.
* Remove unused code from the bcmaps package
* Improve cmap test case
* Assume identity when encoder is missing on regenerating field appearance
* Add missing encoder log message
* Add inverse CMap mappings
* Add CMap encoder
* Address golint notices and small fix in the cmap package
* Keep smaller charcodes when generating cmap inverse mappings
* Update extractor test case
* Keep latest supplement charcodes/CIDs when computing inverse mappings
* Fix comment typo
* Take decode arrays into account when processing grayscale images
* Adapt image extraction test case hashes
* Minor refactoring in the ColorAt image method
* Always return vanilla data from the jbig2 decoder
* Add ColorAt method for images
* Avoid resample on image to Go image conversion
* Avoid resample when converting grayscale image to RGB
* Preserve old behavior of image to Go image conversion
* Add missing case in the ToGoImage method
* Fix grayscale to RGB image conversion
* Improve code documentation
* Fix color extraction for CMYK and 4 bit RGB
* Add test case for the ColorAt image method
* Avoid resampling when converting CMYK image to RGB
* Add notice comment for the GetSamples/SetSamples image methods
* Add extract images test case, with memory profiling
* Use TotalAlloc instead of Alloc for memory profiling
* Remove calls to debug.FreeOSMemory from test cases
* Added text bounding box extraction.
* Add `font` field to textMark struct.
Create a new method `TextComponents` to retrieve all the text components of the extracted text in the page, with position and character information.
* Reorganizing extractor/text.go
* Added a text extraction position test.
* Added another text extraction location test.
* Text extraction location testing.
* Added tests for text extraction with location information.
* Cleaned up text extraction tests. No changes to functionality.
* Simplifying text extraction code.
* Simplified line construction in text.go
* Returning TextMarks in TextMarkArray, which is based on PdfObjectArray but read-only, so not pointers.
* Added text extraction to show PDFs marked up with bounding boxes of substrings in extracted text.
* Add comments explaining how to calculate text bounding boxes.
* Made text_test.go naming consistent with function comments in text.go
* Use tm, pt, tl for textMark/TextMark PageText and TextLine receivers and local variables.
* Uncommented text stress test. Use go test --short to skip it.
* TextMark.Offset is now an index into the extracted text. It was an index into []rune(text)
Manually verified that output PDFs look good and leave hash check to detect change. If there is a change in the future, the hash change will trigger a failure upon which the output PDFs need to be re-checked and hashes updated if appropriate.
* Make PdfObjectDictionary Merge method chainable
* Update page resources Font dictionary when applying license information
* Add license font to the page resources only when it does not exist
* Update hash for split test after verification