14 Commits

Author SHA1 Message Date
Samuel Stauffer
5f19bfa269 Address comments on PR 2020-01-06 11:13:16 -08:00
Samuel Stauffer
e85397b57a Unify and optimize number parsing 2020-01-06 11:05:42 -08:00
Adrian-George Bostan
23aec77478 Add basic support for UTF-16 text encodings (#203)
* Add UTF-16 text encoder
2019-11-28 00:47:00 +00:00
Adrian-George Bostan
56e81d3a1a Take decode arrays into account when processing grayscale images (#159)
* Take decode arrays into account when processing grayscale images
* Adapt image extraction test case hashes
* Minor refactoring in the ColorAt image method
* Always return vanilla data from the jbig2 decoder
2019-08-30 19:16:23 +00:00
Jacek Kucharczyk
24648f4481 Issue #144 Fix - JBIG2 - Changed integer variables types (#148)
* Fixing platform independent integer size
* Cleared test logs.
* Cleared unnecessary int32
* Defined precise integer size for jbig2 segments.
2019-08-29 19:12:18 +00:00
Adrian-George Bostan
febf633172 Image memory optimizations (#149)
* Add ColorAt method for images
* Avoid resample on image to Go image conversion
* Avoid resample when converting grayscale image to RGB
* Preserve old behavior of image to Go image conversion
* Add missing case in the ToGoImage method
* Fix grayscale to RGB image conversion
* Improve code documentation
* Fix color extraction for CMYK and 4 bit RGB
* Add test case for the ColorAt image method
* Avoid resampling when converting CMYK image to RGB
* Add notice comment for the GetSamples/SetSamples image methods
2019-08-22 20:15:16 +00:00
Adrian-George Bostan
cca04199e6 Add extract images test case, with memory profiling (#146)
* Add extract images test case, with memory profiling
* Use TotalAlloc instead of Alloc for memory profiling
* Remove calls to debug.FreeOSMemory from test cases
2019-08-19 22:37:16 +00:00
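Context for the `Alloc` to `TotalAlloc` switch in #146. In Go's `runtime.MemStats`, `Alloc` counts the heap bytes that are live right now, so it drops after a GC. `TotalAlloc` is a running total of all heap bytes allocated and never decreases, so a before/after delta does not depend on when the collector runs. The sketch below is not the actual test code, and the `measure` helper is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so allocations assigned to it escape to the heap.
var sink []byte

// measure returns the heap bytes allocated while fn ran, using the
// cumulative TotalAlloc counter. Hypothetical helper for illustration.
func measure(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	n := measure(func() {
		sink = make([]byte, 1<<20) // allocate 1 MiB on the heap
		sink = nil
		runtime.GC() // Alloc would fall here; TotalAlloc keeps counting the 1 MiB
	})
	fmt.Println(n >= 1<<20)
}
```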
Peter Williams
9ebcfcf168 Finding bounding boxes of substrings of extracted text. (#109)
* Added text bounding box extraction.
* Add `font` field to textMark struct; create a new method `TextComponents` to retrieve all the text components of the extracted text in the page, with position and character information
* Reorganizing extractor/text.go
* Added a text extraction position test.
* Added another text extraction location test.
* Text extraction location testing.
* Added tests for text extraction with location information.
* Cleaned up text extraction tests. No changes to functionality.
* Simplifying text extraction code.
* Simplified line construction in text.go
* Returning TextMarks in TextMarkArray which are based on PdfObjectArray but read-only, so not pointers.
* Added text extraction to show PDFs marked-up with bounding boxes of substring in extracted text.
* Add comments explaining how to calculate text bounding boxes.
* Made text_test.go naming consistent with function comments in text.go
* Use tm, pt, tl for textMark/TextMark PageText and TextLine receivers and local variables.
* Uncommented text stress test. Use go test --short to skip
* TextMark.Offset is now an index into the extracted text. It was an index into []rune(text)
2019-07-18 06:41:47 +00:00
Jacek Kucharczyk
4b1c345214 JBIG2 decoder benchmark patch 2019-07-16 15:40:22 +00:00
Jacek Kucharczyk
e85616cec2 JBIG2Decoder implementation (#67)
* Prepared skeleton and basic component implementations for the jbig2 encoding.
* Added Bitset. Implemented Bitmap.
* Decoder with old Arithmetic Decoder
* Partly working arithmetic
* Working arithmetic decoder.
* MMR patched.
* rebuild to apache.
* Working generic
* Decoded full document
* Decoded AnnexH document
* Minor issues fixed.
* Update README.md
* Fixed generic region errors. Added benchmark. Added bitmap unpadder. Added Bitmap toImage method.
* Fixed endofpage error
* Added integration test.
* Decoded all test files without errors. Implemented JBIG2Global.
* Merged with v3 version
* Fixed the EOF in the globals issue
* Fixed the JBIG2 ChocolateData Decode
* JBIG2 Added license information
* Minor fix in jbig2 encoding.
* Applied the logging convention
* Cleaned unnecessary imports
* Go modules clear unused imports
* checked out the README.md
* Moved trace to Debug. Fixed the build integrate tag in the document_decode_test.go
* Applied UniPDF Developer Guide. Fixed lint issues.
* Cleared documentation, fixed style issues.
* Added jbig2 doc.go files. Applied unipdf guide style.
* Minor code style changes.
* Minor naming and style issues fixes.
* Minor naming changes. Style issues fixed.
* Review r11 fixes.
* Integrate jbig2 tests with build system
* Added jbig2 integration test golden files.
* Minor jbig2 integration test fix
* Removed jbig2 integration image assertions
* Fixed jbig2 rowstride issue. Implemented jbig2 bit writer
* Changed golden files logic. Fixes r13 issues.
2019-07-14 21:18:40 +00:00
Adrian-George Bostan
d8dcc051b3 Fix annotation flatten when AcroForm does not exist (#93)
* Fix annotation flatten when AcroForm does not exist.
* Adapt test case file hashes to account for file flattening
2019-06-25 19:29:03 +00:00
Gunnsteinn Hall
7a9a8ff542 Add FDF merge test case for form filling and flattening with change detection (#98)
Manually verified that output PDFs look good and left the hash check to detect changes. If there is a change in the future, the hash change will trigger a failure upon which the output PDFs need to be re-checked and hashes updated if appropriate.
2019-06-25 08:08:51 +00:00
Adrian-George Bostan
8425bf7c8f Update page resources Font dictionary when applying license information (#5)
* Make PdfObjectDictionary Merge method chainable
* Update page resources Font dictionary when applying license information
* Add license font to the page resources only when it does not exist
* Update hash for split test after verification
2019-05-30 10:52:05 +00:00
Adrian-George Bostan
c64812093d Remove pdf folder and move packages up one level (#2) 2019-05-16 20:44:51 +00:00