41 Commits

Author SHA1 Message Date
Gunnsteinn Hall
cd158ec9f3 Make encoding consistent when encoding map is not unique
Pick the lower charcode
2019-03-28 09:57:33 +00:00
Denys Smirnov
bf2afd3409 textencoding: merge differences if applied twice 2019-01-06 19:20:35 +02:00
Denys Smirnov
e740aba6c5 textencoding: fix a PDF output for simple encodings; fix #293 2019-01-06 13:51:00 +02:00
Denys Smirnov
4a376ec651 textencoding: define WinAnsi directly instead of using CP1252 2019-01-05 18:32:53 +02:00
Denys Smirnov
0fe2f0a27a textencoding: alias x/text/transform import to avoid confusion 2019-01-02 17:03:03 +02:00
Denys Smirnov
203b620067 textencoding: init other encodings once and reformat tables 2019-01-02 16:54:37 +02:00
Denys Smirnov
0327d18eb6 textencoding: remove all unrelated methods from the interface 2019-01-01 23:24:11 +02:00
Denys Smirnov
2e820f3ac5 textencoding: remove unused rune <-> glyph methods from the interface 2019-01-01 22:15:22 +02:00
Denys Smirnov
1742cb9c89 textencoding: drop old simpleEncoder, use the new implementation 2019-01-01 21:17:57 +02:00
Denys Smirnov
3c5fc18b01 textencoding: refactor encodings; better handling for differences 2019-01-01 17:20:01 +02:00
Denys Smirnov
622ae5668d textencoding: generate table for WinAnsi encoding from CP1252 2019-01-01 17:20:01 +02:00
Denys Smirnov
ac7696693b fonts: describe few issues with the code; remove unused cmap type 2019-01-01 17:19:58 +02:00
Denys Smirnov
53687f854e Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/contentstream/processor.go
#	pdf/extractor/text.go
#	pdf/extractor/utils.go
#	pdf/internal/textencoding/winansi.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fontfile.go
#	pdf/model/fonts/ttfparser.go
#	pdf/model/structures.go
2018-12-27 12:17:28 +02:00
Denys Smirnov
3687c83b37 errors should start with a lower case 2018-12-15 18:49:15 +05:00
Denys Smirnov
4c99e7a692 textencoding: remove unused error value when making winansi encoding 2018-12-15 18:47:39 +05:00
Denys Smirnov
9f0df8945d don't use XXX for TODOs 2018-12-09 21:39:11 +02:00
Denys Smirnov
e286eecac9 remove unused functions and globals; add todos for unused params 2018-12-09 19:37:07 +02:00
Denys Smirnov
99f3184879 define slices with a var instead of an empty literal 2018-12-09 19:28:50 +02:00
Denys Smirnov
2658fe9c06 assert types for the new code as well 2018-12-07 18:43:24 +02:00
Denys Smirnov
7cdbb0c572 Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/internal/textencoding/truetype.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fonts/ttfparser.go
2018-12-07 18:30:37 +02:00
Denys Smirnov
4e24c0280a textencoding: rename variables and add relevant notes 2018-12-06 20:22:06 +02:00
Denys Smirnov
0436f2c974 validate shex length in cmaps; add comments 2018-11-29 23:43:00 +02:00
Denys Smirnov
fb4a087a93 textencoding: introduce GlyphName type 2018-11-29 23:24:40 +02:00
Denys Smirnov
e79be78aae textencoding: simplify the code of computeTables 2018-11-29 04:45:39 +02:00
Denys Smirnov
8a4c4069b7 textencoding: unexport CodeToGlyph field 2018-11-29 04:42:35 +02:00
Denys Smirnov
6fddd80eba textencoding: assert the type of differences map 2018-11-29 04:40:25 +02:00
Denys Smirnov
7c8d88185c fonts: assert type of another map; add some comments 2018-11-29 04:30:37 +02:00
Denys Smirnov
46d22eac31 fonts: introduce types for GIDs and char codes; fix shadowing bug 2018-11-29 04:19:29 +02:00
Denys Smirnov
ab62ff5060 fonts: specify rune type as a key for Chars and runeToWidth 2018-11-29 04:19:29 +02:00
Peter Williams
92e3e455c2 Merge branch 'v3' of https://github.com/unidoc/unidoc into extract 2018-11-22 22:03:26 +11:00
Peter Williams
6e5e32dd92 Fixed encoding selection for standard 14 fonts. 2018-11-22 22:01:04 +11:00
Peter Williams
a9019a50a3 Fixes for text extraction corpus testing.
- Correct matrix multiplication order in text.go
- Look up standard 14 font widths after applying custom encoding.
2018-11-18 17:21:30 +11:00
Peter Williams
851aa267b1 Added test for position based text extraction 2018-11-12 11:04:09 +11:00
Peter Williams
70e65eb941 Merge branch 'render.v3.hungarian' into extract
Treat æ, Æ as letters rather than ligatures.
2018-11-09 09:25:36 +11:00
Denys Smirnov
d06cbae6c5 textencoding: simplify code of IdentityEncoder 2018-11-08 02:33:48 +02:00
Denys Smirnov
991aa2727a textencoding: unify encoding functions 2018-11-08 02:33:48 +02:00
Peter Williams
3da4ffc5aa Merge 2018-11-01 21:33:51 +11:00
Gunnsteinn Hall
4e2e3defba Merge branch 'v3' into v3-enhance-forms 2018-10-23 12:09:01 +00:00
Gunnsteinn Hall
bc6391200a Avoid outputting invalid Encoding name for generated standard fonts (use font encoding instead) 2018-10-10 22:44:55 +00:00
Gunnsteinn Hall
aea91f1ba9 Merge branch 'v3' into v3-enhance-forms 2018-09-29 16:59:16 +00:00
Gunnsteinn Hall
7bac3c779c Merge branch 'v3' into enhance-forms 2018-08-03 21:15:21 +00:00