1162 Commits

Author SHA1 Message Date
Peter Williams
aaf47e1479 Font reading code returns partial font info for unsupported fonts.
This allows calling code to check font types which is useful for giving information about PDF files.
2019-01-01 17:29:49 +11:00
Peter Williams
ca2b73bd7a Removed combineDiacritics from text extraction because it was causing ' and " to be combined with the letters preceding them.
Need to fix this and reinstate combineDiacritics.
2019-01-01 12:22:39 +11:00
Denys Smirnov
83d8086657 model: reformat TODOs 2018-12-28 16:48:38 +02:00
Gunnsteinn Hall
e1f2286f9c
Merge pull request #279 from dennwc/runes
Get metrics by rune instead of a glyph name
2018-12-28 13:09:51 +00:00
Gunnsteinn Hall
99b944b64e
Merge branch 'v3' into runes 2018-12-28 12:41:43 +00:00
Gunnsteinn Hall
84607f9914
Merge pull request #278 from unidoc/v3-update-jenkinsfile
Require extractor private testdata in builds
2018-12-28 12:41:20 +00:00
Denys Smirnov
f6506204d7 fonts: simplify code by getting width of runes in font instead of glyphs 2018-12-28 01:38:48 +02:00
Denys Smirnov
107718c711 fonts: comment about Wy font metric 2018-12-28 01:08:50 +02:00
Denys Smirnov
eb04b2d594 fonts: remove unused name field in char metrics 2018-12-28 01:08:47 +02:00
Denys Smirnov
87ebf6af8f creator: don't use fmt if not needed 2018-12-28 01:03:15 +02:00
Gunnsteinn Hall
12af4cf62a Jenkinsfile: Require extractor tests with private testdata in build 2018-12-27 22:47:39 +00:00
Gunnsteinn Hall
15b9123536
Merge pull request #256 from peterwilliams97/extract.text
Text extraction
2018-12-27 17:55:46 +00:00
Gunnsteinn Hall
99a19b0b8d remove duplicate log 2018-12-27 17:42:12 +00:00
Gunnsteinn Hall
8f031e7bdb remove panic in extractor 2018-12-27 17:18:52 +00:00
Denys Smirnov
dbbef4fd05 Merge remote-tracking branch 'peterwilliams97/extract.text' into extract.text
# Conflicts:
#	pdf/extractor/text.go
2018-12-27 12:40:55 +02:00
Denys Smirnov
8835230856 model: fix tests after the merge 2018-12-27 12:37:32 +02:00
Peter Williams
c70b66a00d Fixed incorrectly named variable. 2018-12-27 21:33:31 +11:00
Denys Smirnov
53687f854e Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/contentstream/processor.go
#	pdf/extractor/text.go
#	pdf/extractor/utils.go
#	pdf/internal/textencoding/winansi.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fontfile.go
#	pdf/model/fonts/ttfparser.go
#	pdf/model/structures.go
2018-12-27 12:17:28 +02:00
Peter Williams
2fe54a4269 Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into extract.text 2018-12-27 20:53:59 +11:00
Peter Williams
28957d37b8 fixed comment 2018-12-27 20:53:37 +11:00
Peter Williams
af99ee41db Recurse through form XObjects for text extractions. 2018-12-27 20:51:34 +11:00
Denys Smirnov
e729fa618d model: refactor CharcodesToUnicode to return string and remove TODO 2018-12-26 17:11:41 +02:00
Peter Williams
686a6e511e Merge branch 'v3-peterwilliams97-default-fontdescriptors' of https://github.com/unidoc/unidoc into extract.text 2018-12-21 16:32:33 +11:00
Gunnsteinn Hall
650dbf800c
Merge pull request #270 from dennwc/std14font
Replace Standard14Font with fonts.StdFont
2018-12-20 21:36:34 +00:00
Denys Smirnov
db8e50e457 model: fix wording in the comments 2018-12-19 16:59:13 +05:00
Denys Smirnov
217f984033 fonts: make standard font names type-safe 2018-12-19 16:55:27 +05:00
Denys Smirnov
85e1a02ac8 model: define an unexported pdfFont interface and remove error cases 2018-12-19 13:54:45 +05:00
Denys Smirnov
7f667d8fbb model: remove Standard14Font in favor of fonts.StdFont; resolves #269 2018-12-19 13:43:09 +05:00
Gunnsteinn Hall
2b718c9ba6
Merge pull request #260 from dennwc/font_interface
Preparations for a new font interface
2018-12-18 16:08:01 +00:00
Denys Smirnov
5bf2527b57 creator: clarify use of the default encoding and a way to override it 2018-12-15 19:39:59 +05:00
Denys Smirnov
e3704defc7 rename Typ1 font to StdFont 2018-12-15 19:39:55 +05:00
Denys Smirnov
19f95527b8 creator: remove SetEncoder from top 2018-12-15 18:49:15 +05:00
Denys Smirnov
62420700db fix case typos in errors 2018-12-15 18:49:15 +05:00
Denys Smirnov
3687c83b37 errors should start with a lowercase letter 2018-12-15 18:49:15 +05:00
Denys Smirnov
4abbe49007 remove unnecessary encoder override; add todo to check other code paths 2018-12-15 18:47:39 +05:00
Denys Smirnov
d5a69b817c model: move CID font width array code to function and add a test case 2018-12-15 18:47:39 +05:00
Denys Smirnov
d3664d0f85 fonts: make metric tables for type1 fonts more compact by sharing glyphs 2018-12-15 18:47:39 +05:00
Denys Smirnov
3c8e70256d fonts: reuse metrics tables where possible 2018-12-15 18:47:39 +05:00
Denys Smirnov
0ef989c713 fonts: group similar fonts into a single file 2018-12-15 18:47:39 +05:00
Denys Smirnov
3b1a92701f fonts: remove redundant Type1 font interface implementations 2018-12-15 18:47:39 +05:00
Denys Smirnov
59f694d99f fonts: remove broken SetEncoder method for most fonts 2018-12-15 18:47:39 +05:00
Denys Smirnov
4c99e7a692 textencoding: remove unused error value when making winansi encoding 2018-12-15 18:47:39 +05:00
Denys Smirnov
81bb03763b font: discovered a bug in SetEncoder 2018-12-15 18:47:39 +05:00
Denys Smirnov
7b4564aec5 model: clarify the usage of width map and ttf text encoder 2018-12-15 18:47:39 +05:00
Denys Smirnov
11081b20c5 fonts: clarify cid to gid mapping 2018-12-15 18:47:39 +05:00
Denys Smirnov
e07fa3b2c0 model: add a reference to width table format and simplify the code 2018-12-15 18:47:39 +05:00
Denys Smirnov
7e2a987f8a model: remove unused font width index 2018-12-15 18:47:39 +05:00
Denys Smirnov
2274cbdf8c fonts: add a function to make a text encoder from ttf font 2018-12-15 18:47:39 +05:00
Gunnsteinn Hall
1eed6fa36f
Merge pull request #267 from dennwc/linter
Fix code style issues
2018-12-12 10:30:15 +00:00
Gunnsteinn Hall
1fe74f5116 Merge branch 'linter' of https://github.com/dennwc/unidoc into v3-dennwc-linter 2018-12-12 09:47:28 +00:00