Denys Smirnov
bf2afd3409
textencoding: merge differences if applied twice
2019-01-06 19:20:35 +02:00

Emir Ribić
fb50cd2c0d
Update invoice_test.go
2019-01-06 16:50:22 +01:00

Denys Smirnov
e740aba6c5
textencoding: fix a PDF output for simple encodings; fix #293
2019-01-06 13:51:00 +02:00

Gunnsteinn Hall
098019ac2c
Merge pull request #281 from dennwc/encodings
Refactor encodings
2019-01-05 17:27:04 +00:00

Gunnsteinn Hall
bf47fc5b6e
Merge branch 'v3' into encodings
2019-01-05 17:18:16 +00:00

Denys Smirnov
4a376ec651
textencoding: define WinAnsi directly instead of using CP1252
2019-01-05 18:32:53 +02:00

Gunnsteinn Hall
bc1005af71
Merge pull request #287 from peterwilliams97/text.fixes
A few small fixes for v3
2019-01-05 12:09:54 +00:00

Peter Williams
72c7fd37d0
(*pageText). -> pageText.
2019-01-05 14:10:54 +11:00

Peter Williams
6b1764c118
(*pt). -> pt.
2019-01-05 09:14:10 +11:00

Peter Williams
4aa7e5051e
Changes missed in previous commit.
2019-01-04 16:07:03 +11:00

Peter Williams
e251b6b2f2
Made TextList an opaque struct and renamed it to PageText to reflect its purpose rather than its current implementation.
2019-01-04 16:02:22 +11:00

Peter Williams
4cb130c31f
Fixed some typos.
2019-01-03 15:41:36 +11:00

Peter Williams
a493fce496
Merge branch 'v3' of https://github.com/unidoc/unidoc into text.fixes
2019-01-03 15:16:38 +11:00

Gunnsteinn Hall
e4802f56a2
Merge pull request #288 from dennwc/ttf
Read TTF font data once
2019-01-02 17:01:45 +00:00

Gunnsteinn Hall
a792826218
Merge branch 'v3' into ttf
2019-01-02 16:46:11 +00:00

Denys Smirnov
aeea76f4dd
fonts: read ttf font data once
2019-01-02 17:18:43 +02:00

Denys Smirnov
0fe2f0a27a
textencoding: alias x/text/transform import to avoid confusion
2019-01-02 17:03:03 +02:00

Denys Smirnov
203b620067
textencoding: init other encodings once and reformat tables
2019-01-02 16:54:37 +02:00

Gunnsteinn Hall
305ce84569
Add codecov to Jenkinsfile for test coverage reports
2019-01-02 14:30:09 +00:00

Peter Williams
2f2b5c6ec1
Made many fields in text.go private.
2019-01-02 10:39:30 +11:00

Denys Smirnov
0327d18eb6
textencoding: remove all unrelated methods from the interface
2019-01-01 23:24:11 +02:00

Denys Smirnov
7a2cd35f48
fonts: rebuild font metrics tables based on runes for standard fonts
2019-01-01 22:40:11 +02:00

Denys Smirnov
2e820f3ac5
textencoding: remove unused rune <-> glyph methods from the interface
2019-01-01 22:15:22 +02:00

Denys Smirnov
1742cb9c89
textencoding: drop old simpleEncoder, use the new implementation
2019-01-01 21:17:57 +02:00

Denys Smirnov
3c5fc18b01
textencoding: refactor encodings; better handling for differences
2019-01-01 17:20:01 +02:00

Denys Smirnov
622ae5668d
textencoding: generate table for WinAnsi encoding from CP1252
2019-01-01 17:20:01 +02:00

Denys Smirnov
ac7696693b
fonts: describe a few issues with the code; remove unused cmap type
2019-01-01 17:19:58 +02:00

Peter Williams
57e6b41ef1
Merge branch 'v3' of https://github.com/unidoc/unidoc into text.fixes
2019-01-01 17:34:04 +11:00

Peter Williams
aaf47e1479
Font reading code returns partial font info for unsupported fonts.
This allows calling code to check font types, which is useful for giving information about PDF files.
2019-01-01 17:29:49 +11:00

Peter Williams
ca2b73bd7a
Removed combineDiacritics from text extraction because it was causing ' and " to be combined with the letters preceding them.
Need to fix this and reinstate combineDiacritics.
2019-01-01 12:22:39 +11:00

Denys Smirnov
83d8086657
model: reformat TODOs
2018-12-28 16:48:38 +02:00

Gunnsteinn Hall
e1f2286f9c
Merge pull request #279 from dennwc/runes
Get metrics by rune instead of a glyph name
2018-12-28 13:09:51 +00:00

Gunnsteinn Hall
99b944b64e
Merge branch 'v3' into runes
2018-12-28 12:41:43 +00:00

Gunnsteinn Hall
84607f9914
Merge pull request #278 from unidoc/v3-update-jenkinsfile
Require extractor private testdata in builds
2018-12-28 12:41:20 +00:00

Denys Smirnov
f6506204d7
fonts: simplify code by getting width of runes in font instead of glyphs
2018-12-28 01:38:48 +02:00

Denys Smirnov
107718c711
fonts: comment about Wy font metric
2018-12-28 01:08:50 +02:00

Denys Smirnov
eb04b2d594
fonts: remove unused name field in char metrics
2018-12-28 01:08:47 +02:00

Denys Smirnov
87ebf6af8f
creator: don't use fmt if not needed
2018-12-28 01:03:15 +02:00

Gunnsteinn Hall
12af4cf62a
Jenkinsfile: Require extractor tests with private testdata in build
2018-12-27 22:47:39 +00:00

Gunnsteinn Hall
15b9123536
Merge pull request #256 from peterwilliams97/extract.text
Text extraction
2018-12-27 17:55:46 +00:00

Gunnsteinn Hall
99a19b0b8d
remove duplicate log
2018-12-27 17:42:12 +00:00

Gunnsteinn Hall
8f031e7bdb
remove panic in extractor
2018-12-27 17:18:52 +00:00

Denys Smirnov
dbbef4fd05
Merge remote-tracking branch 'peterwilliams97/extract.text' into extract.text
# Conflicts:
# pdf/extractor/text.go
2018-12-27 12:40:55 +02:00

Denys Smirnov
8835230856
model: fix tests after the merge
2018-12-27 12:37:32 +02:00

Peter Williams
c70b66a00d
Fixed incorrectly named variable.
2018-12-27 21:33:31 +11:00

Denys Smirnov
53687f854e
Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
# pdf/contentstream/processor.go
# pdf/extractor/text.go
# pdf/extractor/utils.go
# pdf/internal/textencoding/winansi.go
# pdf/model/font.go
# pdf/model/font_composite.go
# pdf/model/font_simple.go
# pdf/model/font_test.go
# pdf/model/fontfile.go
# pdf/model/fonts/ttfparser.go
# pdf/model/structures.go
2018-12-27 12:17:28 +02:00

Peter Williams
2fe54a4269
Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into extract.text
2018-12-27 20:53:59 +11:00

Peter Williams
28957d37b8
fixed comment
2018-12-27 20:53:37 +11:00

Peter Williams
af99ee41db
Recurse through form XObjects for text extraction.
2018-12-27 20:51:34 +11:00

Denys Smirnov
e729fa618d
model: refactor CharcodesToUnicode to return string and remove TODO
2018-12-26 17:11:41 +02:00