82 Commits

Author SHA1 Message Date
Peter Williams
dc6f50aa93 Improvements to text extraction. 2018-09-20 11:49:44 +10:00
Peter Williams
19c2f84827 simplification 2018-09-18 12:18:04 +10:00
Peter Williams
39a91920e9 Merge branch 'render.v3.hungarian' into extract 2018-09-18 11:18:47 +10:00
Peter Williams
38da971f78 Cache PdfFonts in text extractor 2018-09-17 12:12:06 +10:00
Peter Williams
4d5156c4a0 Added NewStandard14FontMustCompile 2018-09-07 19:11:58 +10:00
Peter Williams
7f409fe4dc Cleaned up some comments 2018-09-03 16:38:58 +10:00
Peter Williams
8ff8665149 First attempt at extraction based on a full PDF text parser. 2018-08-22 12:29:34 +10:00
Peter Williams
84a4e0ebbf Removed GetArrayVal 2018-07-25 13:19:09 +10:00
Peter Williams
b1cf3494f7 Removed naked returns. Fixed godoc. Reorganized object extractors 2018-07-25 12:00:49 +10:00
Peter Williams
e886846c6a Changes after pull request review 2018-07-24 21:32:02 +10:00
Peter Williams
502836666d Merge remote-tracking branch 'upstream/v3' into render.v3 2018-07-21 21:20:39 +10:00
Gunnsteinn Hall
5b1b9bd504 PdfObjectArray changed to a struct with receivers Elements, Get, Set, Len added. Fixed the resulting broken code. 2018-07-15 17:52:53 +00:00
Peter Williams
3310b040db Don't import core anonymously 2018-07-15 17:22:00 +10:00
Peter Williams
8de07690ff allow change of text state outside BT..ET 2018-07-15 16:45:47 +10:00
Peter Williams
6582182078 reduced differences with compositefont branch 2018-07-15 16:28:56 +10:00
Gunnsteinn Hall
14ee80e1fe Preserve and allow output of hexadecimal strings
Refactored PdfObjectString into a struct with bool flag for hex.  Fixed any code broken by the change.
Unexported non-essential functions for crypto (not used by model).  Can unexport more later or refactor to internal package.
2018-07-14 02:25:29 +00:00
Peter Williams
bc1e9ae7b5 Refactored font code to improve text extraction 2018-07-13 17:40:27 +10:00
Peter Williams
c9f2b87def Added NewStandard14Font() to make existing fonts.Font code work with *PdfFont 2018-07-07 09:45:55 +10:00
Peter Williams
606a271d00 Show font object number in font string for debugging 2018-07-05 09:58:25 +10:00
Peter Williams
49674d6b63 Changed error handling. Allow partial encoding maps. Don't continue processing unsupported fonts 2018-07-04 18:00:37 +10:00
Peter Williams
33079bbb72 Parse FontFile entry in FontDescriptor 2018-07-03 14:26:42 +10:00
Peter Williams
d6bd8e3326 first attempt at parsing FontFile 2018-07-02 16:46:43 +10:00
Peter Williams
9de46c5b9f Noted that text extractor is an intermediate version 2018-06-28 11:11:43 +10:00
Peter Williams
4cc6c14a8e Fall back to font encoding when ToUnicode doesn't match 2018-06-27 22:01:17 +10:00
Peter Williams
2dcf8e0cdd Added more missing changes 2018-06-27 16:59:35 +10:00
Peter Williams
759a1dd882 changes left out of last commit 2018-06-27 16:46:33 +10:00
Peter Williams
d184031903 Updated the text extractor to use the new font code 2018-06-27 16:31:28 +10:00
Peter Williams
199a74dbd8 Major changes to font code
- Added Type1 font parsing.
- Added Standard 14 font parsing.
- Fixed some bugs in cmap code.
- Started re-structuring of font code. Moved common font fields to `fontSkeleton`
2018-06-27 12:25:59 +10:00
Gunnsteinn Hall
a4fe3bded2 Add LICENSE.md with reference to AGPL and Commercial license. Add license header info to code. 2018-03-22 14:03:47 +00:00
Gunnsteinn Hall
d5396dd893 Fixes in extractor testing 2018-03-22 13:53:12 +00:00
Gunnsteinn Hall
4af19b929a License handling in extractor 2018-03-22 13:17:09 +00:00
Gunnsteinn Hall
817ea404b9 Extractor package with powerful text extraction capabilities and CMap handling. Closes #17 2018-03-22 13:01:04 +00:00