22 Commits

Author SHA1 Message Date
Gunnsteinn Hall
9665959bcf Move model/fonts to model/internal/fonts - reducing export surface
- Move the folder
- Update imports
- Add type aliases to access needed types from model (fonts.StdFont, fonts.CharMetrics and the font names)
2019-03-12 19:08:37 +00:00
Peter Williams
ca2b73bd7a Removed combineDiacritics from text extraction because it was causing ' and " to be combined with the letters preceding them.
Need to fix this and reinstate combineDiacritics.
2019-01-01 12:22:39 +11:00
Denys Smirnov
53687f854e Merge remote-tracking branch 'origin/v3' into extract.text
# Conflicts:
#	pdf/contentstream/processor.go
#	pdf/extractor/text.go
#	pdf/extractor/utils.go
#	pdf/internal/textencoding/winansi.go
#	pdf/model/font.go
#	pdf/model/font_composite.go
#	pdf/model/font_simple.go
#	pdf/model/font_test.go
#	pdf/model/fontfile.go
#	pdf/model/fonts/ttfparser.go
#	pdf/model/structures.go
2018-12-27 12:17:28 +02:00
Gunnsteinn Hall
f04f83b271 Merge branch 'extract.text' of https://github.com/peterwilliams97/unidoc into v3-peterwilliams97-extract.text 2018-11-28 23:33:31 +00:00
Gunnsteinn Hall
520ab09a72 Addressing review comments 2018-11-28 23:25:17 +00:00
Peter Williams
36a1148962 Combine diacritics in text extraction. 2018-11-28 18:06:03 +11:00
Peter Williams
536c688001 Fixed orientation handling in text extraction. 2018-11-26 17:17:17 +11:00
Peter Williams
a815ca7271 Premultiply coordinate transforms to text matrix in text extraction. 2018-11-26 08:09:52 +11:00
Peter Williams
8b964f2008 Set font even when Tf operator is not between BT and ET. 2018-11-21 13:14:11 +11:00
Peter Williams
dcb2b14d55 Handle standard 14 TrueType fonts and standard 14 font aliases in text extraction. 2018-11-20 17:49:37 +11:00
Peter Williams
cad144cec3 Handle missing widths in text extraction 2018-11-20 15:49:28 +11:00
Peter Williams
a9019a50a3 Fixes for text extraction corpus testing.
- Correct matrix multiplication order in text.go
- Look up standard 14 font widths after applying custom encoding.
2018-11-18 17:21:30 +11:00
Peter Williams
851aa267b1 Added test for position based text extraction 2018-11-12 11:04:09 +11:00
Peter Williams
a2342ec6c6 First attempt at getting font metrics by character code. 2018-11-08 15:20:12 +11:00
Peter Williams
a6ce81c001 Merge branch 'render.v3.hungarian' into extract 2018-11-02 15:13:48 +11:00
Peter Williams
3310b040db Don't import core anonymously 2018-07-15 17:22:00 +10:00
Peter Williams
6582182078 reduced differences with compositefont branch 2018-07-15 16:28:56 +10:00
Peter Williams
c9f2b87def Added NewStandard14Font() to make existing fonts.Font code work with *PdfFont 2018-07-07 09:45:55 +10:00
Peter Williams
d184031903 Updated the text extractor to use the new font code 2018-06-27 16:31:28 +10:00
Gunnsteinn Hall
a4fe3bded2 Add LICENSE.md with reference to AGPL and Commercial license. Add license header info to code. 2018-03-22 14:03:47 +00:00
Gunnsteinn Hall
d5396dd893 Fixes in extractor testing 2018-03-22 13:53:12 +00:00
Gunnsteinn Hall
817ea404b9 Extractor package with powerful text extraction capabilities and CMap handling. Closes #17 2018-03-22 13:01:04 +00:00