Peter Williams
dc6f50aa93
Improvements to text extraction.
2018-09-20 11:49:44 +10:00
Peter Williams
19c2f84827
simplification
2018-09-18 12:18:04 +10:00
Peter Williams
39a91920e9
Merge branch 'render.v3.hungarian' into extract
2018-09-18 11:18:47 +10:00
Peter Williams
38da971f78
Cache PdfFont's in text extractor
2018-09-17 12:12:06 +10:00
Peter Williams
4d5156c4a0
Added NewStandard14FontMustCompile
2018-09-07 19:11:58 +10:00
Peter Williams
7f409fe4dc
Cleaned up some comments
2018-09-03 16:38:58 +10:00
Peter Williams
8ff8665149
First attempt at extraction based on a full PDF text parser.
2018-08-22 12:29:34 +10:00
Peter Williams
84a4e0ebbf
Removed GetArrayVal
2018-07-25 13:19:09 +10:00
Peter Williams
b1cf3494f7
Removed naked returns. Fixed godoc. Reorganized object extractors
2018-07-25 12:00:49 +10:00
Peter Williams
e886846c6a
Changes after pull request review
2018-07-24 21:32:02 +10:00
Peter Williams
502836666d
Merge remote-tracking branch 'upstream/v3' into render.v3
2018-07-21 21:20:39 +10:00
Gunnsteinn Hall
5b1b9bd504
PdfObjectArray change to struct and receivers added Elements, Get, Set, Len. Fixes to resulting broken code.
2018-07-15 17:52:53 +00:00
Peter Williams
3310b040db
Don't import core anonymously
2018-07-15 17:22:00 +10:00
Peter Williams
8de07690ff
allow change of text state outside BT..ET
2018-07-15 16:45:47 +10:00
Peter Williams
6582182078
reduced differences with compositefont branch
2018-07-15 16:28:56 +10:00
Gunnsteinn Hall
14ee80e1fe
Preserve and allow output of hexadecimal strings
Refactored PdfObjectString into a struct with bool flag for hex. Fixed any code broken by the change.
Unexported non-essential functions for crypto (not used by model). Can unexport more later or refactor to internal package.
2018-07-14 02:25:29 +00:00
Peter Williams
bc1e9ae7b5
Refactored font code to improve text extraction
2018-07-13 17:40:27 +10:00
Peter Williams
c9f2b87def
Added NewStandard14Font() to make existing fonts.Font code work with *PdfFont
2018-07-07 09:45:55 +10:00
Peter Williams
606a271d00
Show font object number in font string for debugging
2018-07-05 09:58:25 +10:00
Peter Williams
49674d6b63
Changed error handling. Allow partial encoding maps. Don't continue processing unsupported fonts
2018-07-04 18:00:37 +10:00
Peter Williams
33079bbb72
Parse FontFile entry in FontDescriptor
2018-07-03 14:26:42 +10:00
Peter Williams
d6bd8e3326
first attempt at parsing FontFile
2018-07-02 16:46:43 +10:00
Peter Williams
9de46c5b9f
Noted that text extractor is an intermediate version
2018-06-28 11:11:43 +10:00
Peter Williams
4cc6c14a8e
Fall back to font encoding when ToUnicode doesn't match
2018-06-27 22:01:17 +10:00
Peter Williams
2dcf8e0cdd
Added more missing changes
2018-06-27 16:59:35 +10:00
Peter Williams
759a1dd882
changes left out of last commit
2018-06-27 16:46:33 +10:00
Peter Williams
d184031903
Updated the text extractor to use the new font code
2018-06-27 16:31:28 +10:00
Peter Williams
199a74dbd8
Major changes to font code
- Added Type1 font parsing.
- Added Standard 14 font parsing.
- Fixed some bugs in cmap code.
- Started re-structuring of font code. Moved common font fields to `fontSkeleton`
2018-06-27 12:25:59 +10:00
Gunnsteinn Hall
a4fe3bded2
Add LICENSE.md with reference to AGPL and Commercial license. Add license header info to code.
2018-03-22 14:03:47 +00:00
Gunnsteinn Hall
d5396dd893
Fixes in extractor testing
2018-03-22 13:53:12 +00:00
Gunnsteinn Hall
4af19b929a
License handling in extractor
2018-03-22 13:17:09 +00:00
Gunnsteinn Hall
817ea404b9
Extractor package with powerful text extraction capabilities and CMap handling. Closes #17
2018-03-22 13:01:04 +00:00