46 Commits

Author SHA1 Message Date
Denys Smirnov
ac7696693b fonts: describe a few issues with the code; remove unused cmap type 2019-01-01 17:19:58 +02:00
Denys Smirnov
53687f854e Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/contentstream/processor.go
#	pdf/extractor/text.go
#	pdf/extractor/utils.go
#	pdf/internal/textencoding/winansi.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fontfile.go
#	pdf/model/fonts/ttfparser.go
#	pdf/model/structures.go
2018-12-27 12:17:28 +02:00
Denys Smirnov
11081b20c5 fonts: clarify cid to gid mapping 2018-12-15 18:47:39 +05:00
Denys Smirnov
2274cbdf8c fonts: add a function to make a text encoder from ttf font 2018-12-15 18:47:39 +05:00
Denys Smirnov
0a8b46daff don't use generic receiver names; make sure receiver name is consistent 2018-12-09 21:47:15 +02:00
Denys Smirnov
9f0df8945d don't use XXX for TODOs 2018-12-09 21:39:11 +02:00
Denys Smirnov
6d2c39043c make sure comments begin with a type/function name 2018-12-09 20:22:33 +02:00
Denys Smirnov
99f3184879 define slices with a var instead of an empty literal 2018-12-09 19:28:50 +02:00
Denys Smirnov
7cdbb0c572 Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/internal/textencoding/truetype.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fonts/ttfparser.go
2018-12-07 18:30:37 +02:00
Peter Williams
835f329c28 Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into extract.text 2018-12-02 10:02:16 +11:00
Peter Williams
9c258551ad Documented font code. Fall back to StandardEncoding when no encoding is specified for a font. 2018-12-02 09:14:58 +11:00
Gunnsteinn Hall
2b1c796a74 Addressing review comments 2018-11-30 23:01:04 +00:00
Gunnsteinn Hall
33843599f2 Another round of addressing review comments 2018-11-30 16:53:48 +00:00
Denys Smirnov
fb4a087a93 textencoding: introduce GlyphName type 2018-11-29 23:24:40 +02:00
Denys Smirnov
7c8d88185c fonts: assert type of another map; add some comments 2018-11-29 04:30:37 +02:00
Denys Smirnov
46d22eac31 fonts: introduce types for GIDs and char codes; fix shadowing bug 2018-11-29 04:19:29 +02:00
Denys Smirnov
ab62ff5060 fonts: specify rune type as a key for Chars and runeToWidth 2018-11-29 04:19:29 +02:00
Denys Smirnov
6c0fd1e780 cmap: mapped values are runes, not strings 2018-11-29 04:19:29 +02:00
Peter Williams
92e3e455c2 Merge branch 'v3' of https://github.com/unidoc/unidoc into extract 2018-11-22 22:03:26 +11:00
Peter Williams
8b964f2008 Set font even when Tf operator is not between BT and ET. 2018-11-21 13:14:11 +11:00
Peter Williams
cad144cec3 Handle missing widths in text extraction 2018-11-20 15:49:28 +11:00
Denys Smirnov
86a30df78c fonts: floats should be signed 2018-11-17 15:03:34 +01:00
Denys Smirnov
c8c7a03896 fonts: fix glyph id bounds check 2018-11-07 22:09:57 +02:00
Denys Smirnov
08c1fe4ed4 fonts: remove unused field 2018-11-07 22:09:57 +02:00
Peter Williams
3da4ffc5aa Merge 2018-11-01 21:33:51 +11:00
Peter Williams
5e8ca9c18c Fixed code->glyph mapping for TrueType fonts for raw number gid 2018-10-29 09:08:32 +11:00
Gunnsteinn Hall
aea91f1ba9 Merge branch 'v3' into v3-enhance-forms 2018-09-29 16:59:16 +00:00
Peter Williams
f953c11452 Don't return errors for TrueType font file tables with no PostScript entry in their "name" table.
This is needed for PDFs created with Tesseract.
2018-09-24 18:02:02 +10:00
Peter Williams
b0f5329425 Allow TrueType font files to not have PostScript entries in their "name" table. 2018-09-24 17:53:12 +10:00
Peter Williams
69be54d501 Cleaned up some comments. 2018-09-21 16:43:10 +10:00
Peter Williams
b18c8ca93d Add ToUnicode map when embedding Type0 CIDType2 fonts in PDF files. 2018-09-17 17:57:52 +10:00
Peter Williams
b7f1f3e291 Merge branch 'v3' of https://github.com/unidoc/unidoc into render.v3.hungarian 2018-08-22 22:01:00 +10:00
Peter Williams
c2feafdfdc Fixed some issues in creator code
Stopped double converting from Go strings to PDF encoded strings
Added TTF parse table format 12
2018-08-17 08:41:35 +10:00
Peter Williams
d64785a8ca Added more font tests 2018-08-14 21:28:57 +10:00
Gunnsteinn Hall
7bac3c779c Merge branch 'v3' into enhance-forms 2018-08-03 21:15:21 +00:00
Gunnsteinn Hall
6c34f32c7f Updating headers and package descriptions 2018-08-03 10:15:42 +00:00
Peter Williams
08c3211590 Refactored simple textencoding
Made GlyphToCode work for all tables
Moved more aliases into glyphAliases rather than leaving the duplicates in the base maps.
Use SimpleEncoder explicitly for simple fonts
2018-07-31 11:52:24 +10:00
Peter Williams
b1cf3494f7 Removed naked returns. Fixed godoc. Reorganized object extractors 2018-07-25 12:00:49 +10:00
Peter Williams
e886846c6a Changes after pull request review 2018-07-24 21:32:02 +10:00
Peter Williams
879b07df16 Added a test for CharcodeBytesToUnicode for Type0 ToUnicode cmaps 2018-07-19 10:28:23 +10:00
Peter Williams
6582182078 reduced differences with compositefont branch 2018-07-15 16:28:56 +10:00
Peter Williams
ae87dc79f3 keep going when FontFile2 encoding is empty 2018-07-13 21:15:03 +10:00
Peter Williams
bc1e9ae7b5 Refactored font code to improve text extraction 2018-07-13 17:40:27 +10:00
Peter Williams
199a74dbd8 Major changes to font code
- Added Type1 font parsing.
- Added Standard 14 font parsing.
- Fixed some bugs in cmap code.
- Started re-structuring of font code. Moved common font fields to `fontSkeleton`
2018-06-27 12:25:59 +10:00
Gunnsteinn Hall
646329ff21 Initial support for composite fonts (Type0 and CIDFontType2).
Simplified creator paragraph handling of text encoding.
Character codes expanded to 16bit instead of 8bit.
2017-09-01 13:20:51 +00:00
Gunnsteinn Hall
1a5c3eb4ac Initial import of PDF creator with text, image adding capabilities 2017-07-05 23:10:57 +00:00