unipdf

mirror of https://github.com/unidoc/unipdf.git synced 2025-04-26 13:48:55 +08:00

Author	SHA1	Message	Date
Gunnsteinn Hall	11f692bc3a	Font subsetting and font optimization improvements (#362) * Track runes in IdentityEncoder (for subsetting), track decoded runes * Working with the identity encoder in font_composite.go * Add GetFilterArray to multi encoder. Add comments. * Add NewFromContents constructor to extractor only requiring contents and resources * golint fixes * Optimizer compress streams - improved detection of raw streams * Optimize - CleanContentStream optimizer that removes redundant operands * WIP Optimize - clean fonts Will support both font file reduction and subsetting. (WIP) * Optimize - image processing - try combined DCT and Flate * Update options.go * Update optimizer.go * Create utils.go for optimize with common methods needed for optimization * Optimizer - add font subsetting method Covers XObject Forms, annotations etc. Uses extractor package to extract text marks covering what fonts and glyphs are used. Package truetype used for subsetting. * Add some comments * Fix cmap parsing rune conversion * Error checking for extractor. Add some comments. * Update Jenkinsfile * Update modules	2020-06-16 21:19:10 +00:00
Peter Williams	5777ee1394	Handle multibyte entries in CMaps. (#353) * Fixed filename:page in logging * Got CMap working for multi-rune entries * Treat CMap entries as strings instead of runes to handle multi-byte encodings. * Added a test for multibyte encoding. * Changed rune->CharCode maps to string->CharCode. * Removed unintentional changes. * Updated comments to match new function definitions. * Changed some []rune APIs to string * Fixes for reviewer comments.	2020-06-03 13:55:15 +00:00
Gunnsteinn Hall	ad2a1e9c9d	Subsetting fixes (#346) * Update unitype lib which improves subsetting * Add text extraction check to creator font subsetting example Helps ensure ToUnicode map is set correctly. * Clean up import * Fix spelling	2020-05-12 07:15:09 +00:00
Adrian-George Bostan	aef6e5e976	Fix CMap generation and serialization for composite fonts (#344) * Fix CMap charcode mapping serialization * Improve CMap generation in the NewCompositePdfFontFromTTF function	2020-05-08 00:15:09 +00:00
Gunnsteinn Hall	9ef2f27694	Support for subsetting fonts (#335) * Subsetting of TrueType CID fonts using unitype * Simplify call to SubsetRegistered so can be done right after loading font via creator finalizer * Add an EnableFontSubsetting function on the creator to simplify font subsetting for creator users	2020-05-05 00:17:27 +00:00
Adrian-George Bostan	6678fc040a	Cache raw CMap data (#324)	2020-04-21 21:53:36 +00:00
Gunnsteinn Hall	11f3a6e7a2	Fix for crash in CCITT decoder. Resolves https://github.com/unidoc/unipdf/issues/314 (#315)	2020-04-16 23:05:50 +00:00
Adrian-George Bostan	d605803bd2	Prevent panics (#305) * Remove panic on font nil Differences array * Remove unused bcmaps function * Remove panics from the core/security/crypt package * Fix extractor invalid Do operand crash * Fix TTF parser crash for invalid hhea number of hMetrics * Remove ECB crypt panics * Remove standard_r6 panics * Remove panic from render package	2020-04-14 21:09:16 +00:00
Jacek Kucharczyk	ad0b31ea1b	Optimizer fix for the CCITTFax Encoder. ISS #243. Fixes JBIG2 i386 architecture compile issue. (#297) * Fixed issue #243. Added optimize integration tests. * Minor style change. * XObjImage getParamsDict updates Columns and Rows. * Added doc file for the optimize/tests package. * UpdateParams for CCITTFax Encoder accepts Width and Height also. Removed GetParamsDict Columns and Rows parameters from model.Image and model.XObjImage. * Fix i386 issue for the jbig2 arithmetic encoder. * Added 386 architecture to the .travis/cross_build.sh	2020-04-08 11:11:49 +00:00
Jacek Kucharczyk	29efa30439	JBIG2 Encoder support for inserting binary images into PDF (#288) * Added JBIG2 PDF support * Added JBIG2 Encoder binary image requirements * PR #288 revision r1 fixes * PR #288 revision r2 fixes	2020-04-03 20:54:59 +00:00
Jacek Kucharczyk	c582323a8f	JBIG2 Generic Encoder (#264) * Prepared skeleton and basic component implementations for the jbig2 encoding. * Added Bitset. Implemented Bitmap. * Decoder with old Arithmetic Decoder * Partly working arithmetic * Working arithmetic decoder. * MMR patched. * rebuild to apache. * Working generic * Working generic * Decoded full document * Update Jenkinsfile go version [master] (#398) * Update Jenkinsfile go version * Decoded AnnexH document * Minor issues fixed. * Update README.md * Fixed generic region errors. Added benchmark. Added bitmap unpadder. Added Bitmap toImage method. * Fixed endofpage error * Added integration test. * Decoded all test files without errors. Implemented JBIG2Global. * Merged with v3 version * Fixed the EOF in the globals issue * Fixed the JBIG2 ChocolateData Decode * JBIG2 Added license information * Minor fix in jbig2 encoding. * Applied the logging convention * Cleaned unnecessary imports * Go modules clear unused imports * checked out the README.md * Moved trace to Debug. Fixed the build integrate tag in the document_decode_test.go * Initial encoder skeleton * Applied UniPDF Developer Guide. Fixed lint issues. * Cleared documentation, fixed style issues. * Added jbig2 doc.go files. Applied unipdf guide style. * Minor code style changes. * Minor naming and style issues fixes. * Minor naming changes. Style issues fixed. * Review r11 fixes. * Added JBIG2 Encoder skeleton. * Moved Document and Page to jbig2/document package. Created decoder package responsible for decoding jbig2 stream. * Implemented raster functions. * Added raster uni low test functions. * Added raster low test functions * untracked files on jbig2-encoder: c869089 Added raster low test functions * index on jbig2-encoder: c869089 Added raster low test functions * Added morph files. * implemented jbig2 encoder basics * JBIG2 Encoder - Generic method * Added jbig2 image encode tests, black/white image tests * cleaned and tested jbig2 package * unfinished jbig2 classified encoder * jbig2 minor style changes * minor jbig2 encoder changes * prepared JBIG2 Encoder * Style and lint fixes * Minor changes and lints * Fixed shift unsigned value build errors * Minor naming change * Added jbig2 encode, image goldens. Fixed jbig2 decode bug. * Provided jbig2 core.DecodeGlobals function. * Fixed JBIG2Encoder `r6` revision issues. * Removed public JBIG2Encoder document. * Minor style changes * added NewJBIG2Encoder function. * fixed JBIG2Encoder 'r9' revision issues. * Cleared 'r9' commented code. * Updated ACKNOWLEDGEMENTS. Fixed JBIG2Encoder 'r10' revision issues. Co-authored-by: Gunnsteinn Hall <gunnsteinn.hall@gmail.com>	2020-03-27 11:47:41 +00:00
Adrian-George Bostan	d961079c5d	Add basic image rendering support (#266) * Add render package * Add text state * Add more text operators * Remove unnecessary files * Add text font * Add custom text render method * Improve text rendering method * Rename text state methods * Refactor and document context interface * Refactor text begin/end operators * Fix graphics state transformations * Keep original font when doing font substitution * Take page cropbox into account * Revert to substitution font if original font measurement is 0 * Add font substitution package * Implement addition transform.Point methods * Use transform.Point in the image context package * Remove unneeded functionality from the render image package * Fix golint notices in the image rendering package * Fix go vet notices in the render package * Fix golint notices in the top-level render package * Improve render context package documentation * Document context text state struct. * Document context text font struct. * Minor logging improvements * Add license disclaimer to the render package files * Avoid using package aliases where possible * Change style of section comments * Adapt render package import style to follow the developer guide * Improve documentation for the internal matrix implementation * Update render package dependency versions * Apply crop box post render * Account for offset media boxes * Improve metrics of rendered characters * Fix text matrix translation * Change priority of fonts used for measuring rendered characters * Skip invalid m and l operators on image rendering * Small fix for v operator * Fix rendered characters spacing issues * Refactor naming of internal render packages	2020-03-02 21:22:54 +00:00
Peter Williams	e056c0e4d4	Fixed PdfColorspaceSpecialIndexed.ImageToRGB() (#259) * Fixed PdfColorspaceSpecialIndexed.ImageToRGB() Fixes https://github.com/unidoc/unipdf/issues/258 * Fixed indexed colorspace bounds checking. * Being super cautious to prevent a divide by zero error. I don't think the base cs can have <1 cpts. * Updated image hash in extract_images_test.go to match new indexed colorspace code. * add testfile from unipdf#258	2020-02-26 13:26:20 +00:00
Adrian-George Bostan	9de5fe644e	Add PdfFont text encoding methods (#257) * Add PdfFont method for encoding runes to charcode bytes * Add getter method for CMap nbits * Take CMap nbits into account when encoding text * Adapt font test cases to include text encoding testing	2020-02-17 22:54:20 +00:00
Adrian-George Bostan	e2b3c6e6ba	Add predefined CMaps for Type 0 composite fonts (#246) * Add packed predefined cmaps * Add cmap cid range parsing * Load base cmap for predefined cmaps * Refactor pdfFont to Unicode methods * Preserve CharcodeBytesToUnicode behavior * Add support for CID-keyed Type 0 fonts * Add method documentation for the cmap package * Refactor and document charcode to Unicode conversion code * Add more cmap parsing test cases * Add more method documentation in the cmap package. * Remove unused code from the bcmaps package * Improve cmap test case * Assume identity when encoder is missing on regenerating field appearance * Add missing encoder log message * Add inverse CMap mappings * Add CMap encoder * Address golint notices and small fix in the cmap package * Keep smaller charcodes when generating cmap inverse mappings * Update extractor test case * Keep latest supplement charcodes/CIDs when computing inverse mappings * Fix comment typo	2020-02-07 19:56:30 +00:00
Samuel Stauffer	5f19bfa269	Address comments on PR	2020-01-06 11:13:16 -08:00
Samuel Stauffer	e85397b57a	Unify and optimize number parsing	2020-01-06 11:05:42 -08:00
Adrian-George Bostan	23aec77478	Add basic support for UTF-16 text encodings (#203) * Add UTF-16 text encoder	2019-11-28 00:47:00 +00:00
Adrian-George Bostan	56e81d3a1a	Take decode arrays into account when processing grayscale images (#159) * Take decode arrays into account when processing grayscale images * Adapt image extraction test case hashes * Minor refactoring in the ColorAt image method * Always return vanilla data from the jbig2 decoder	2019-08-30 19:16:23 +00:00
Jacek Kucharczyk	24648f4481	Issue #144 Fix - JBIG2 - Changed integer variables types (#148) * Fixing platform independent integer size * Cleared test logs. * Cleared unnecessary int32 * Defined precise integer size for jbig2 segments.	2019-08-29 19:12:18 +00:00
Adrian-George Bostan	febf633172	Image memory optimizations (#149) * Add ColorAt method for images * Avoid resample on image to Go image conversion * Avoid resample when converting grayscale image to RGB * Preserve old behavior of image to Go image conversion * Add missing case in the ToGoImage method * Fix grayscale to RGB image conversion * Improve code documentation * Fix color extraction for CMYK and 4 bit RGB * Add test case for the ColorAt image method * Avoid resampling when converting CMYK image to RGB * Add notice comment for the GetSamples/SetSamples image methods	2019-08-22 20:15:16 +00:00
Adrian-George Bostan	cca04199e6	Add extract images test case, with memory profiling (#146) * Add extract images test case, with memory profiling * Use TotalAlloc instead of Alloc for memory profiling * Remove calls to debug.FreeOSMemory from test cases	2019-08-19 22:37:16 +00:00
Peter Williams	9ebcfcf168	Finding bounding boxes of substrings of extracted text. (#109) * Added text bounding box extraction. * Add `font` field to textMark struct; Create a new method `TextComponents` to retrieve all the text components of the extracted text in the page, with position and character information * Reorganizing extractor/text.go * Added a text extraction position test. * Added another text extraction location test. * Text extraction location testing. * Added tests for text extraction with location information. * Cleaned up text extraction tests. No changes to functionality. * Simplifying text extraction code. * Simplified line construction in text.go * Returning TextMarks in TextMarkArray which are based on PdfObjectArray but read-only, so not pointers. * Added text extraction to show PDFs marked-up with bounding boxes of substring in extracted text. * Add comments explaining how to calculate text bounding boxes. * Made text_test.go naming consistent with function comments in text.go * Use tm, pt, tl for textMark/TextMark PageText and TextLine receivers and local variables. * uncommented text stress test. Use go test --short to skip * TextMark.Offset is now an index into the extracted text. It was an index into []rune(text)	2019-07-18 06:41:47 +00:00
Jacek Kucharczyk	4b1c345214	JBIG2 decoder benchmark patch	2019-07-16 15:40:22 +00:00
Jacek Kucharczyk	e85616cec2	JBIG2Decoder implementation (#67) * Prepared skeleton and basic component implementations for the jbig2 encoding. * Added Bitset. Implemented Bitmap. * Decoder with old Arithmetic Decoder * Partly working arithmetic * Working arithmetic decoder. * MMR patched. * rebuild to apache. * Working generic * Decoded full document * Decoded AnnexH document * Minor issues fixed. * Update README.md * Fixed generic region errors. Added benchmark. Added bitmap unpadder. Added Bitmap toImage method. * Fixed endofpage error * Added integration test. * Decoded all test files without errors. Implemented JBIG2Global. * Merged with v3 version * Fixed the EOF in the globals issue * Fixed the JBIG2 ChocolateData Decode * JBIG2 Added license information * Minor fix in jbig2 encoding. * Applied the logging convention * Cleaned unnecessary imports * Go modules clear unused imports * checked out the README.md * Moved trace to Debug. Fixed the build integrate tag in the document_decode_test.go * Applied UniPDF Developer Guide. Fixed lint issues. * Cleared documentation, fixed style issues. * Added jbig2 doc.go files. Applied unipdf guide style. * Minor code style changes. * Minor naming and style issues fixes. * Minor naming changes. Style issues fixed. * Review r11 fixes. * Integrate jbig2 tests with build system * Added jbig2 integration test golden files. * Minor jbig2 integration test fix * Removed jbig2 integration image assertions * Fixed jbig2 rowstride issue. Implemented jbig2 bit writer * Changed golden files logic. Fixes r13 issues.	2019-07-14 21:18:40 +00:00
Adrian-George Bostan	d8dcc051b3	Fix annotation flatten when AcroForm does not exist (#93) * Fix annotation flatten when AcroForm does not exist. * Adapt test case file hashes to account for file flattening	2019-06-25 19:29:03 +00:00
Gunnsteinn Hall	7a9a8ff542	Add FDF merge test case for form filling and flattening with change detection (#98) Manually verified that output PDFs look good and leave hash check to detect change. If there is a change in the future, the hash change will trigger a failure upon which the output PDFs need to be re-checked and hashes updated if appropriate.	2019-06-25 08:08:51 +00:00
Adrian-George Bostan	8425bf7c8f	Update page resources Font dictionary when applying license information (#5) * Make PdfObjectDictionary Merge method chainable * Update page resources Font dictionary when applying license information * Add license font to the page resources only when it does not exist * Update hash for split test after verification	2019-05-30 10:52:05 +00:00
Adrian-George Bostan	c64812093d	Remove pdf folder and move packages up one level (#2)	2019-05-16 20:44:51 +00:00

29 Commits