Gunnsteinn Hall
|
ee1416433c
|
Use StandardEncoding for builtin standard fonts (not WinAnsiEncoding). Fix testcases.
Add test cases and fix the encoding table also based on observed errors
|
2019-03-28 09:57:33 +00:00 |
|
Gunnsteinn Hall
|
9665959bcf
|
Move model/fonts to model/internal/fonts - reducing export surface
- Move the folder
- Update imports
- Add type aliases to access needed types from model (fonts.StdFont, fonts.CharMetrics and the font names)
|
2019-03-12 19:08:37 +00:00 |
|
Peter Williams
|
ca2b73bd7a
|
Removed combineDiacritics from text extraction because it was causing ' and " to be combined with the letters preceding them.
Need to fix this and reinstate combineDiacritics.
|
2019-01-01 12:22:39 +11:00 |
|
Denys Smirnov
|
53687f854e
|
Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
# pdf/contentstream/processor.go
# pdf/extractor/text.go
# pdf/extractor/utils.go
# pdf/internal/textencoding/winansi.go
# pdf/model/font.go
# pdf/model/font_composite.go
# pdf/model/font_simple.go
# pdf/model/font_test.go
# pdf/model/fontfile.go
# pdf/model/fonts/ttfparser.go
# pdf/model/structures.go
|
2018-12-27 12:17:28 +02:00 |
|
Gunnsteinn Hall
|
f04f83b271
|
Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into v3-peterwilliams97-extract.text
|
2018-11-28 23:33:31 +00:00 |
|
Gunnsteinn Hall
|
520ab09a72
|
Addressing review comments
|
2018-11-28 23:25:17 +00:00 |
|
Peter Williams
|
36a1148962
|
Combine diacritics in text extraction.
|
2018-11-28 18:06:03 +11:00 |
|
Peter Williams
|
536c688001
|
Fixed orientation handling in text extraction.
|
2018-11-26 17:17:17 +11:00 |
|
Peter Williams
|
a815ca7271
|
Premultiply coordinate transforms to text matrix in text extraction.
|
2018-11-26 08:09:52 +11:00 |
|
Peter Williams
|
8b964f2008
|
Set font even when Tf operator is not between BT and ET.
|
2018-11-21 13:14:11 +11:00 |
|
Peter Williams
|
dcb2b14d55
|
Handle standard 14 TrueType fonts and standard 14 font aliases in text extraction.
|
2018-11-20 17:49:37 +11:00 |
|
Peter Williams
|
cad144cec3
|
Handle missing widths in text extraction
|
2018-11-20 15:49:28 +11:00 |
|
Peter Williams
|
a9019a50a3
|
Fixes for text extraction corpus testing.
- Correct matrix multiplication order in text.go
- Look up standard 14 font widths after applying custom encoding.
|
2018-11-18 17:21:30 +11:00 |
|
Peter Williams
|
851aa267b1
|
Added test for position based text extraction
|
2018-11-12 11:04:09 +11:00 |
|
Peter Williams
|
a2342ec6c6
|
First attempt at getting font metrics by character code.
|
2018-11-08 15:20:12 +11:00 |
|
Peter Williams
|
a6ce81c001
|
Merge branch 'render.v3.hungarian' into extract
|
2018-11-02 15:13:48 +11:00 |
|
Peter Williams
|
3310b040db
|
Don't import core anonymously
|
2018-07-15 17:22:00 +10:00 |
|
Peter Williams
|
6582182078
|
Reduced differences with compositefont branch
|
2018-07-15 16:28:56 +10:00 |
|
Peter Williams
|
c9f2b87def
|
Added NewStandard14Font() to make existing fonts.Font code work with *PdfFont
|
2018-07-07 09:45:55 +10:00 |
|
Peter Williams
|
d184031903
|
Updated the text extractor to use the new font code
|
2018-06-27 16:31:28 +10:00 |
|
Gunnsteinn Hall
|
a4fe3bded2
|
Add LICENSE.md with reference to AGPL and Commercial license. Add license header info to code.
|
2018-03-22 14:03:47 +00:00 |
|
Gunnsteinn Hall
|
d5396dd893
|
Fixes in extractor testing
|
2018-03-22 13:53:12 +00:00 |
|
Gunnsteinn Hall
|
817ea404b9
|
Extractor package with powerful text extraction capabilities and CMap handling. Closes #17
|
2018-03-22 13:01:04 +00:00 |
|