Peter Williams
|
5933a3dd81
|
Added duplicate text detection.
|
2020-06-23 15:33:34 +10:00 |
|
Peter Williams
|
acb5caaf6c
|
Big changes to columns text extraction code for PR.
Performance improvements in several places.
Commented code.
|
2020-06-22 17:49:19 +10:00 |
|
Peter Williams
|
b4d90b6402
|
Absorb text to the left of paras into paras, e.g. footnote numbers.
|
2020-06-05 21:43:09 +10:00 |
|
Peter Williams
|
30fc953954
|
Check for textParas that are on the same line when writing out extracted text.
|
2020-06-05 15:44:31 +10:00 |
|
Peter Williams
|
af9508cc5c
|
Added tests for columns extraction.
|
2020-06-05 14:01:31 +10:00 |
|
Peter Williams
|
29f2d9b8cf
|
Merge branch 'development' of https://github.com/unidoc/unipdf into columns
|
2020-06-05 11:43:04 +10:00 |
|
Peter Williams
|
40806d7f96
|
Adding tables to text extractor.
|
2020-06-01 14:04:32 +10:00 |
|
Peter Williams
|
49bbef0442
|
More verbose logging
|
2020-05-29 08:58:23 +10:00 |
|
Peter Williams
|
418f859d44
|
Reinstated hyphen suppression
|
2020-05-27 21:11:47 +10:00 |
|
Peter Williams
|
d21e2f83c4
|
Got text_test.go passing.
|
2020-05-27 18:15:18 +10:00 |
|
Peter Williams
|
fad1552009
|
Fixed text state save/restore.
|
2020-05-26 13:26:09 +10:00 |
|
Peter Williams
|
c515472849
|
Abstracted textWord depth calculation. This required changing textMark to *textMark in a lot of code.
|
2020-05-25 09:39:30 +10:00 |
|
Peter Williams
|
6b13a99b82
|
First version of text extraction that recognizes columns
|
2020-05-24 21:00:37 +10:00 |
|