unipdf

mirror of https://github.com/unidoc/unipdf.git synced 2025-04-26 13:48:55 +08:00

Author	SHA1	Message	Date
UniDoc Build	79e32364de	prepare release	2020-11-11 18:48:37 +00:00
UniDoc Build	22540b937c	prepare release	2020-10-19 10:58:10 +00:00
UniDoc Build	56a210342e	prepare release	2020-10-12 14:17:59 +00:00
UniDoc Build	87cbc66cbd	prepare release	2020-10-05 19:28:24 +00:00
UniDoc Build	22ca2c0eed	prepare release	2020-09-28 23:18:17 +00:00
UniDoc Build	9107a86674	prepare release	2020-09-21 01:20:10 +00:00
UniDoc Build	b991a36456	prepare release	2020-09-14 09:32:45 +00:00
UniDoc Build	fd3b669a36	prepare release	2020-09-07 00:23:12 +00:00
UniDoc Build	61b6580cb9	prepare release	2020-08-31 21:12:07 +00:00
UniDoc Build	1501d07a74	prepare release	2020-08-27 21:45:09 +00:00
Peter Williams	88fda44e0a	Text extraction code for columns. (#366 ) * Fixed filename:page in logging * Got CMap working for multi-rune entries * Treat CMap entries as strings instead of runes to handle multi-byte encodings. * Added a test for multibyte encoding. * First version of text extraction that recognizes columns * Added an explanation of the text columns code to README.md. * fixed typos * Abstracted textWord depth calculation. This required change textMark to textMark in a lot of code. Added function comments. * Fixed text state save/restore. * Adjusted inter-word search distance to make paragraph division work for thanh.pdf * Got text_test.go passing. * Reinstated hyphen suppression * Handle more cases of fonts not being set in text extraction code. * Fixed typo * More verbose logging * Adding tables to text extractor. * Added tests for columns extraction. * Removed commented code * Check for textParas that are on the same line when writing out extracted text. * Absorb text to the left of paras into paras e.g. Footnote numbers * Removed funny character from text_test.go * Commented out a creator_test.go test that was broken by my text extraction changes. * Big changes to columns text extraction code for PR. Performance improvements in several places. Commented code. * Updated extractor/README * Cleaned up some comments and removed a panic * Increased threshold for truncating extracted text when there is no license 100 -> 102. This is a workaround to let a test in creator_test.go pass. With the old text extraction code the following extracted text was 100 chars. With the new code it is 102 chars which looks correct. "你好\n你好你好你好你好\n河上白云\n\nUnlicensed UniDoc - Get a license on https://unidoc.io\n\n" * Improved an error message. * Removed irrelevant spaces * Commented code and removed unused functions. * Reverted PdfRectangle changes * Added duplicate text detection. * Combine diacritic textMarks in text extraction * Reinstated a diacritic recombination test. * Small code reorganisation * Reinstated handling of rotated text * Addressed issues in PR review * Added color fields to TextMark * Updated README * Reinstated the disabled tests I missed before. * Tightened definition for tables to prevent detection of tables where there weren't any. * Compute line splitting search range based on fontsize of first word in word bag. * Use errors.Is(err, core.ErrNotSupported) to distinguish unsupported font errors. See https://blog.golang.org/go1.13-errors * Fixed some naming and added some comments. * errors.Is -> xerrors.Is and %w -> %v for go 1.12 compatibility * Removed code that doesn't ever get called. * Removed unused test	2020-06-30 19:33:10 +00:00
Adrian-George Bostan	54e965785b	Add cached Stream method for CMap objects (#382 ) * Add cached Stream method for CMaps * Use CMap Stream method when creating font PDF dictionary objects	2020-06-27 00:30:18 +00:00
Adrian-George Bostan	7bf2f62c3b	Skip referenced pages which are not present in the catalog (#377 ) * Skip referenced pages which are not present in the catalog * Improve documentation for the copyObject method of the writer * Add creator test case for checking referenced page destinations	2020-06-18 15:06:06 +00:00
Gunnsteinn Hall	11f692bc3a	Font subsetting and font optimization improvements (#362 ) * Track runes in IdentityEncoder (for subsetting), track decoded runes * Working with the identity encoder in font_composite.go * Add GetFilterArray to multi encoder. Add comments. * Add NewFromContents constructor to extractor only requiring contents and resources * golint fixes * Optimizer compress streams - improved detection of raw streams * Optimize - CleanContentStream optimizer that removes redundant operands * WIP Optimize - clean fonts Will support both font file reduction and subsetting. (WIP) * Optimize - image processing - try combined DCT and Flate * Update options.go * Update optimizer.go * Create utils.go for optimize with common methods needed for optimization * Optimizer - add font subsetting method Covers XObject Forms, annotations etc. Uses extractor package to extract text marks covering what fonts and glyphs are used. Package truetype used for subsetting. * Add some comments * Fix cmap parsing rune conversion * Error checking for extractor. Add some comments. * Update Jenkinsfile * Update modules	2020-06-16 21:19:10 +00:00
Adrian-George Bostan	99ef1b861d	Combo field appearance (#370 ) * Fix combo field appearances not being shown * Fix V object type for choice and button fields * Refactor form fill for combo and checkbox fields * Add fill test case for text, combo and checkbox fields * Prevent panic when flattening forms using a nil appearance generator	2020-06-10 16:58:00 +00:00
Adrian-George Bostan	6b8d5c42f7	Fix outline null object check (#367 )	2020-06-05 11:46:55 +00:00
Peter Williams	5777ee1394	Handle multibyte entries in CMaps. (#353 ) * Fixed filename:page in logging * Got CMap working for multi-rune entries * Treat CMap entries as strings instead of runes to handle multi-byte encodings. * Added a test for multibyte encoding. * Changed rune->CharCode maps to string->CharCode. * Removed unintentional changes. * Updated comments to match new function definitions. * Changed some []rune APIs to string * Fixes for reviewer comments.	2020-06-03 13:55:15 +00:00
Adrian-George Bostan	5efaa02e23	Use page indirect object for internal outline destinations (#359 ) * Use page indirect object for internal outlines * Use page indirect object in creator outline destinations * Adapt creator test case to test outline creation and retrieval	2020-05-22 16:19:43 +00:00
Adrian-George Bostan	d2941b5477	Add reader method for checking if the AcroForm needs repair (#356 ) * Add AcroFormNeeds repair method * Add AcroForm repair check test case	2020-05-20 16:04:02 +00:00
Adrian-George Bostan	80d51c5532	Add reader AcroForm repair functionality (#351 ) * Add method for retrieving widget parent form field * Add reader method for repairing AcroForm * Add AcroForm repair test case * Add AcroForm repair options * RepairAcroForm documentation improvements	2020-05-19 12:42:07 +00:00
Gunnsteinn Hall	ad2a1e9c9d	Subsetting fixes (#346 ) * Update unitype lib which improves subsetting * Add text extraction check to creator font subsetting example Helps ensure ToUnicode map is set correctly. * Clean up import * Fix spelling	2020-05-12 07:15:09 +00:00
Adrian-George Bostan	aef6e5e976	Fix CMap generation and serialization for composite fonts (#344 ) * Fix CMap charcode mapping serialization * Improve CMap generation in the NewCompositePdfFontFromTTF function	2020-05-08 00:15:09 +00:00
Gunnsteinn Hall	9ef2f27694	Support for subsetting fonts (#335 ) * Subsetting of TrueType CID fonts using unitype * Simplify call to SubsetRegistered so can be done right after loading font via creator finalizer * Add an EnableFontSubsetting function on the creator to simplify font subsetting for creator users	2020-05-05 00:17:27 +00:00
Adrian-George Bostan	d84d0c4375	Form fill fixes (#328 ) * Parse form fields with embedded widget annotations * Try matching fields both by partial and full names on form fill * Use default font if widget font is not found when generating appearance * Add JSON extract and fill test case	2020-04-24 16:48:06 +00:00
Adrian-George Bostan	cb0166e96b	Add low level PageLabels support (#325 ) * Add reader method for retrieving the PageLabels entry from the catalog * Add writer method for setting the PageLabels entry in the catalog. * Add creator method for adding page labels for the output file * Add creator page labels test case * Minor page labels test case correction	2020-04-22 21:17:33 +00:00
Alexey Pavlyukov	a69d788171	Add timestamp signature handler (#301 ) * Add timestamp signature handler * Add timestamp signature handler test * fix PR issues * fix PR issues * fix PR issues * Fix Co-authored-by: Gunnsteinn Hall <gunnsteinn.hall@gmail.com>	2020-04-22 20:21:53 +00:00
Alfred Hall	bc5c0d95d3	Merge pull request #320 from gunnsth/dev-writer-error-handling Fix error handling in Writer	2020-04-18 17:13:40 +00:00
Gunnsteinn Hall	6308fc8014	Fix error handling in write, with a testcase.	2020-04-18 13:48:44 +00:00
Gunnsteinn Hall	fa5f13501b	Fixes	2020-04-18 11:12:26 +00:00
Gunnsteinn Hall	d23d4b8c79	Add NewCompositePdfFontFromTTF to load composite TTF from memory	2020-04-18 10:37:10 +00:00
Adrian-George Bostan	a351532cd3	Prevent Type 0 function evaluation crash (#309 )	2020-04-15 21:05:20 +00:00
Adrian-George Bostan	ff79a9b1bd	Prevent recursion when building invalid outline tree (#308 )	2020-04-15 19:33:36 +00:00
Adrian-George Bostan	d605803bd2	Prevent panics (#305 ) * Remove panic on font nil Differences array * Remove unused bcmaps function * Remove panics from the core/security/crypt package * Fix extractor invalid Do operand crash * Fix TTF parser crash for invalid hhea number of hMetrics * Remove ECB crypt panics * Remove standard_r6 panics * Remove panic from render package	2020-04-14 21:09:16 +00:00
Jacek Kucharczyk	ad0b31ea1b	Optimizer fix for the CCITTFax Encoder. ISS #243 . Fixes JBIG2 i386 architecture compile issue. (#297 ) * Fixed issue #243. Added optimize integration tests. * Minor style change. * XObjImage getParamsDict updates Columns and Rows. * Added doc file for the optimize/tests package. * UpdateParams for CCITTFax Encoder accepts Width and Height also. Removed GetParamsDict Columns and Rows parameters from model.Image and model.XObjImage. * Fix i386 issue for the jbig2 arithmetic encoder. * Added 386 architecture to the .travis/cross_build.sh	2020-04-08 11:11:49 +00:00
Jacek Kucharczyk	29efa30439	JBIG2 Encoder support for inserting binary images into PDF (#288 ) * Added JBIG2 PDF support * Added JBIG2 Encoder binary image requirements * PR #288 revision r1 fixes * PR #288 revision r2 fixes	2020-04-03 20:54:59 +00:00
Adrian-George Bostan	64a43b38d2	Prevent crashing when processing content stream (#291 ) * Skip invalid pop operation on empty graphics state stacks * Fix clipping input values to size for Type 0 Functions * Do not pass invalid Q content stream operator to external handlers	2020-04-01 20:08:41 +00:00
Adrian-George Bostan	edba514087	Use NRGBA when loading model.Image instances from Go images	2020-03-26 21:47:00 +02:00
Adrian-George Bostan	1d46fb4cc6	Parse ttf encoding subtable 31 after subtable 10 (#273 )	2020-03-07 13:08:30 +00:00
Gunnsteinn Hall	937669cfed	Add basic glyph metrics support for Type 0 CID fonts (#272 ) * Add basic glyph metrics support for Type 0 CID fonts * Initialize font widths map if no W array is present	2020-03-05 18:47:16 +00:00
Peter Williams	e056c0e4d4	Fixed PdfColorspaceSpecialIndexed.ImageToRGB() (#259 ) * Fixed PdfColorspaceSpecialIndexed.ImageToRGB() Fixes https://github.com/unidoc/unipdf/issues/258 * Fixed indexed colorspace bounds checking. * Being super cautious to prevent a divide by zero error. I don't think the base cs can have <1 cpts. * Updated image hash in extract_images_test.go to match new indexed colorspace code. * add testfile from unipdf#258	2020-02-26 13:26:20 +00:00
Adrian-George Bostan	9de5fe644e	Add PdfFont text encoding methods (#257 ) * Add PdfFont method for encoding runes to charcode bytes * Add getter method for CMap nbits * Take CMap nbits into account when encoding text * Adapt font test cases to include text encoding testing	2020-02-17 22:54:20 +00:00
Adrian-George Bostan	e2b3c6e6ba	Add predefined CMaps for Type 0 composite fonts (#246 ) * Add packed predefined cmaps * Add cmap cid range parsing * Load base cmap for predefined cmaps * Refactor pdfFont to Unicode methods * Preserve CharcodeBytesToUnicode behavior * Add support for CID-keyed Type 0 fonts * Add method documentation for the cmap package * Refactor and document charcode to Unicode conversion code * Add more cmap parsing test cases * Add more method documentation in the cmap package. * Remove unused code from the bcmaps package * Improve cmap test case * Assume identity when encoder is missing on regenerating field appearance * Add missing encoder log message * Add inverse CMap mappings * Add CMap encoder * Address golint notices and small fix in the cmap package * Keep smaller charcodes when generating cmap inverse mappings * Update extractor test case * Keep latest supplement charcodes/CIDs when computing inverse mappings * Fix comment typo	2020-02-07 19:56:30 +00:00
Gunnsteinn Hall	81e3e14eb9	Merge pull request #242 from unidoc/master Master into development	2020-01-30 22:24:56 +00:00
Adrian-George Bostan	3bd083475d	Minor refactoring	2020-01-21 22:18:11 +02:00
Adrian-George Bostan	692ead8496	Improve outline destination parsing	2020-01-21 22:11:20 +02:00
Samuel Stauffer	d3a160ba41	Follow object indirections in PdfPage.GetMediaBox	2020-01-17 14:35:46 -08:00
Adrian-George Bostan	7c5d52cca5	Add outline test case	2020-01-16 20:52:59 +02:00
Adrian-George Bostan	029d4c34d8	Rename NewOutlineFromReaderOutline to GetOutlines and move it in the reader	2020-01-16 19:51:34 +02:00
Adrian-George Bostan	84dd2d145a	Add ToOutlineTree method for outline conversion	2020-01-16 19:31:54 +02:00
Adrian-George Bostan	dbd9e96abc	Fix method comment typo	2020-01-15 23:36:07 +02:00

82 Commits