962 Commits

Author SHA1 Message Date
Adrian-George Bostan
9b6cd8b88a Start at the top of the page when table block is created on new page 2019-01-07 19:22:08 +02:00
Gunnsteinn Hall
1085bf1a2e Merge branch 'v3' into patch-1 2019-01-07 16:43:50 +00:00
Denys Smirnov
bf2afd3409 textencoding: merge differences if applied twice 2019-01-06 19:20:35 +02:00
Emir Ribić
fb50cd2c0d Update invoice_test.go 2019-01-06 16:50:22 +01:00
Denys Smirnov
e740aba6c5 textencoding: fix a PDF output for simple encodings; fix #293 2019-01-06 13:51:00 +02:00
Gunnsteinn Hall
bf47fc5b6e Merge branch 'v3' into encodings 2019-01-05 17:18:16 +00:00
Denys Smirnov
4a376ec651 textencoding: define WinAnsi directly instead of using CP1252 2019-01-05 18:32:53 +02:00
Peter Williams
72c7fd37d0 (*pageText). -> pageText. 2019-01-05 14:10:54 +11:00
Peter Williams
6b1764c118 (*pt). -> pt. 2019-01-05 09:14:10 +11:00
Peter Williams
4aa7e5051e Changes missed in previous commit. 2019-01-04 16:07:03 +11:00
Peter Williams
e251b6b2f2 Made TextList an opaque struct and renamed it to PageText to reflect its purpose rather than its current implementation. 2019-01-04 16:02:22 +11:00
Peter Williams
4cb130c31f Fixed some typos. 2019-01-03 15:41:36 +11:00
Peter Williams
a493fce496 Merge branch 'v3' of https://github.com/unidoc/unidoc into text.fixes 2019-01-03 15:16:38 +11:00
Denys Smirnov
aeea76f4dd fonts: read ttf font data once 2019-01-02 17:18:43 +02:00
Denys Smirnov
0fe2f0a27a textencoding: alias x/text/transform import to avoid confusion 2019-01-02 17:03:03 +02:00
Denys Smirnov
203b620067 textencoding: init other encodings once and reformat tables 2019-01-02 16:54:37 +02:00
Peter Williams
2f2b5c6ec1 Made many fields in text.go private. 2019-01-02 10:39:30 +11:00
Denys Smirnov
0327d18eb6 textencoding: remove all unrelated methods from the interface 2019-01-01 23:24:11 +02:00
Denys Smirnov
7a2cd35f48 fonts: rebuild font metrics tables based on runes for standard fonts 2019-01-01 22:40:11 +02:00
Denys Smirnov
2e820f3ac5 textencoding: remove unused rune <-> glyph methods from the interface 2019-01-01 22:15:22 +02:00
Denys Smirnov
1742cb9c89 textencoding: drop old simpleEncoder, use the new implementation 2019-01-01 21:17:57 +02:00
Denys Smirnov
3c5fc18b01 textencoding: refactor encodings; better handling for differences 2019-01-01 17:20:01 +02:00
Denys Smirnov
622ae5668d textencoding: generate table for WinAnsi encoding from CP1252 2019-01-01 17:20:01 +02:00
Denys Smirnov
ac7696693b fonts: describe a few issues with the code; remove unused cmap type 2019-01-01 17:19:58 +02:00
Peter Williams
57e6b41ef1 Merge branch 'v3' of https://github.com/unidoc/unidoc into text.fixes 2019-01-01 17:34:04 +11:00
Peter Williams
aaf47e1479 Font reading code returns partial font info for unsupported fonts.
This allows calling code to check font types which is useful for giving information about PDF files.
2019-01-01 17:29:49 +11:00
Peter Williams
ca2b73bd7a Removed combineDiacritics from text extraction because it was causing ' and " to be combined with the letters preceding them.
Need to fix this and reinstate combineDiacritics.
2019-01-01 12:22:39 +11:00
Denys Smirnov
83d8086657 model: reformat TODOs 2018-12-28 16:48:38 +02:00
Denys Smirnov
f6506204d7 fonts: simplify code by getting width of runes in font instead of glyphs 2018-12-28 01:38:48 +02:00
Denys Smirnov
107718c711 fonts: comment about Wy font metric 2018-12-28 01:08:50 +02:00
Denys Smirnov
eb04b2d594 fonts: remove unused name field in char metrics 2018-12-28 01:08:47 +02:00
Denys Smirnov
87ebf6af8f creator: don't use fmt if not needed 2018-12-28 01:03:15 +02:00
Gunnsteinn Hall
99a19b0b8d remove duplicate log 2018-12-27 17:42:12 +00:00
Gunnsteinn Hall
8f031e7bdb remove panic in extractor 2018-12-27 17:18:52 +00:00
Denys Smirnov
dbbef4fd05 Merge remote-tracking branch 'peterwilliams97/extract.text' into extract.text
# Conflicts:
#	pdf/extractor/text.go
2018-12-27 12:40:55 +02:00
Denys Smirnov
8835230856 model: fix tests after the merge 2018-12-27 12:37:32 +02:00
Peter Williams
c70b66a00d Fixed incorrectly named variable. 2018-12-27 21:33:31 +11:00
Denys Smirnov
53687f854e Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/contentstream/processor.go
#	pdf/extractor/text.go
#	pdf/extractor/utils.go
#	pdf/internal/textencoding/winansi.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fontfile.go
#	pdf/model/fonts/ttfparser.go
#	pdf/model/structures.go
2018-12-27 12:17:28 +02:00
Peter Williams
2fe54a4269 Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into extract.text 2018-12-27 20:53:59 +11:00
Peter Williams
28957d37b8 fixed comment 2018-12-27 20:53:37 +11:00
Peter Williams
af99ee41db Recurse through form XObjects for text extractions. 2018-12-27 20:51:34 +11:00
Denys Smirnov
e729fa618d model: refactor CharcodesToUnicode to return string and remove TODO 2018-12-26 17:11:41 +02:00
Denys Smirnov
db8e50e457 model: fix wording in the comments 2018-12-19 16:59:13 +05:00
Denys Smirnov
217f984033 fonts: make standard font names type-safe 2018-12-19 16:55:27 +05:00
Denys Smirnov
85e1a02ac8 model: define an unexported pdfFont interface and remove error cases 2018-12-19 13:54:45 +05:00
Denys Smirnov
7f667d8fbb model: remove Standard14Font in favor of fonts.StdFont; resolves #269 2018-12-19 13:43:09 +05:00
Denys Smirnov
5bf2527b57 creator: clarify use of the default encoding and a way to override it 2018-12-15 19:39:59 +05:00
Denys Smirnov
e3704defc7 rename Typ1 font to StdFont 2018-12-15 19:39:55 +05:00
Denys Smirnov
19f95527b8 creator: remove SetEncoder from top 2018-12-15 18:49:15 +05:00
Denys Smirnov
62420700db fix case typos in errors 2018-12-15 18:49:15 +05:00