1257 Commits

Author SHA1 Message Date
Denys Smirnov
bf2afd3409 textencoding: merge differences if applied twice 2019-01-06 19:20:35 +02:00
Emir Ribić
fb50cd2c0d Update invoice_test.go 2019-01-06 16:50:22 +01:00
Denys Smirnov
e740aba6c5 textencoding: fix a PDF output for simple encodings; fix #293 2019-01-06 13:51:00 +02:00
Gunnsteinn Hall
098019ac2c Merge pull request #281 from dennwc/encodings
Refactor encodings
2019-01-05 17:27:04 +00:00
Gunnsteinn Hall
bf47fc5b6e Merge branch 'v3' into encodings 2019-01-05 17:18:16 +00:00
Denys Smirnov
4a376ec651 textencoding: define WinAnsi directly instead of using CP1252 2019-01-05 18:32:53 +02:00
Gunnsteinn Hall
bc1005af71 Merge pull request #287 from peterwilliams97/text.fixes
A few small fixes for v3
2019-01-05 12:09:54 +00:00
Peter Williams
72c7fd37d0 (*pageText). -> pageText. 2019-01-05 14:10:54 +11:00
Peter Williams
6b1764c118 (*pt). -> pt. 2019-01-05 09:14:10 +11:00
Peter Williams
4aa7e5051e Changes missed in previous commit. 2019-01-04 16:07:03 +11:00
Peter Williams
e251b6b2f2 Made TextList an opaque struct and renamed it to PageText to reflect its purpose rather than its current implementation. 2019-01-04 16:02:22 +11:00
Peter Williams
4cb130c31f Fixed some typos. 2019-01-03 15:41:36 +11:00
Peter Williams
a493fce496 Merge branch 'v3' of https://github.com/unidoc/unidoc into text.fixes 2019-01-03 15:16:38 +11:00
Gunnsteinn Hall
e4802f56a2 Merge pull request #288 from dennwc/ttf
Read TTF font data once
2019-01-02 17:01:45 +00:00
Gunnsteinn Hall
a792826218 Merge branch 'v3' into ttf 2019-01-02 16:46:11 +00:00
Denys Smirnov
aeea76f4dd fonts: read ttf font data once 2019-01-02 17:18:43 +02:00
Denys Smirnov
0fe2f0a27a textencoding: alias x/text/transform import to avoid confusion 2019-01-02 17:03:03 +02:00
Denys Smirnov
203b620067 textencoding: init other encodings once and reformat tables 2019-01-02 16:54:37 +02:00
Gunnsteinn Hall
305ce84569 Add codecov to Jenkinsfile for test coverage reports 2019-01-02 14:30:09 +00:00
Peter Williams
2f2b5c6ec1 Made many fields in text.go private. 2019-01-02 10:39:30 +11:00
Denys Smirnov
0327d18eb6 textencoding: remove all unrelated methods from the interface 2019-01-01 23:24:11 +02:00
Denys Smirnov
7a2cd35f48 fonts: rebuild font metrics tables based on runes for standard fonts 2019-01-01 22:40:11 +02:00
Denys Smirnov
2e820f3ac5 textencoding: remove unused rune <-> glyph methods from the interface 2019-01-01 22:15:22 +02:00
Denys Smirnov
1742cb9c89 textencoding: drop old simpleEncoder, use the new implementation 2019-01-01 21:17:57 +02:00
Denys Smirnov
3c5fc18b01 textencoding: refactor encodings; better handling for differences 2019-01-01 17:20:01 +02:00
Denys Smirnov
622ae5668d textencoding: generate table for WinAnsi encoding from CP1252 2019-01-01 17:20:01 +02:00
Denys Smirnov
ac7696693b fonts: describe few issues with the code; remove unused cmap type 2019-01-01 17:19:58 +02:00
Peter Williams
57e6b41ef1 Merge branch 'v3' of https://github.com/unidoc/unidoc into text.fixes 2019-01-01 17:34:04 +11:00
Peter Williams
aaf47e1479 Font reading code returns partial font info for unsupported fonts.
This allows calling code to check font types which is useful for giving information about PDF files.
2019-01-01 17:29:49 +11:00
Peter Williams
ca2b73bd7a Removed combineDiacritics from text extraction because it was causing ' and " to be combined with the letters preceding them.
Need to fix this and reinstate combineDiacritics.
2019-01-01 12:22:39 +11:00
Denys Smirnov
83d8086657 model: reformat TODOs 2018-12-28 16:48:38 +02:00
Gunnsteinn Hall
e1f2286f9c Merge pull request #279 from dennwc/runes
Get metrics by rune instead of a glyph name
2018-12-28 13:09:51 +00:00
Gunnsteinn Hall
99b944b64e Merge branch 'v3' into runes 2018-12-28 12:41:43 +00:00
Gunnsteinn Hall
84607f9914 Merge pull request #278 from unidoc/v3-update-jenkinsfile
Require extractor private testdata in builds
2018-12-28 12:41:20 +00:00
Denys Smirnov
f6506204d7 fonts: simplify code by getting width of runes in font instead of glyphs 2018-12-28 01:38:48 +02:00
Denys Smirnov
107718c711 fonts: comment about Wy font metric 2018-12-28 01:08:50 +02:00
Denys Smirnov
eb04b2d594 fonts: remove unused name field in char metrics 2018-12-28 01:08:47 +02:00
Denys Smirnov
87ebf6af8f creator: don't use fmt if not needed 2018-12-28 01:03:15 +02:00
Gunnsteinn Hall
12af4cf62a Jenkinsfile: Require extractor tests with private testdata in build 2018-12-27 22:47:39 +00:00
Gunnsteinn Hall
15b9123536 Merge pull request #256 from peterwilliams97/extract.text
Text extraction
2018-12-27 17:55:46 +00:00
Gunnsteinn Hall
99a19b0b8d remove duplicate log 2018-12-27 17:42:12 +00:00
Gunnsteinn Hall
8f031e7bdb remove panic in extractor 2018-12-27 17:18:52 +00:00
Denys Smirnov
dbbef4fd05 Merge remote-tracking branch 'peterwilliams97/extract.text' into extract.text
# Conflicts:
#	pdf/extractor/text.go
2018-12-27 12:40:55 +02:00
Denys Smirnov
8835230856 model: fix tests after the merge 2018-12-27 12:37:32 +02:00
Peter Williams
c70b66a00d Fixed incorrectly named variable. 2018-12-27 21:33:31 +11:00
Denys Smirnov
53687f854e Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/contentstream/processor.go
#	pdf/extractor/text.go
#	pdf/extractor/utils.go
#	pdf/internal/textencoding/winansi.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fontfile.go
#	pdf/model/fonts/ttfparser.go
#	pdf/model/structures.go
2018-12-27 12:17:28 +02:00
Peter Williams
2fe54a4269 Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into extract.text 2018-12-27 20:53:59 +11:00
Peter Williams
28957d37b8 fixed comment 2018-12-27 20:53:37 +11:00
Peter Williams
af99ee41db Recurse through form XObjects for text extractions. 2018-12-27 20:51:34 +11:00
Denys Smirnov
e729fa618d model: refactor CharcodesToUnicode to return string and remove TODO 2018-12-26 17:11:41 +02:00