11 Commits

Author | SHA1 | Message | Date
Peter Williams | 5933a3dd81 | Added duplicate text detection. | 2020-06-23 15:33:34 +10:00
Peter Williams | 17bee4d907 | Commented code and removed unused functions. | 2020-06-23 11:39:01 +10:00
Peter Williams | 91479a7c2b | Cleaned up some comments and removed a panic | 2020-06-22 21:17:39 +10:00
Peter Williams | acb5caaf6c | Big changes to columns text extraction code for PR. Performance improvements in several places. Commented code. | 2020-06-22 17:49:19 +10:00
Peter Williams | b4d90b6402 | Absorb text to the left of paras into paras e.g. Footnote numbers | 2020-06-05 21:43:09 +10:00
Peter Williams | 30fc953954 | Check for textParas that are on the same line when writing out extracted text. | 2020-06-05 15:44:31 +10:00
Peter Williams | 29f2d9b8cf | Merge branch 'development' of https://github.com/unidoc/unipdf into columns | 2020-06-05 11:43:04 +10:00
Peter Williams | 40806d7f96 | Adding tables to text extractor. | 2020-06-01 14:04:32 +10:00
Peter Williams | d21e2f83c4 | Got text_test.go passing. | 2020-05-27 18:15:18 +10:00
Peter Williams | 603b5ff4e7 | Added function comments. | 2020-05-25 14:00:00 +10:00
Peter Williams | 6b13a99b82 | First version of text extraction that recognizes columns | 2020-05-24 21:00:37 +10:00