unipdf

mirror of https://github.com/unidoc/unipdf.git synced 2025-04-26 13:48:55 +08:00

Peter Williams 9ebcfcf168 Finding bounding boxes of substrings of extracted text. (#109)

* Added text bounding box extraction.
* Add `font` field to textMark struct; create a new method `TextComponents` to retrieve all the text components of the extracted text in the page, with position and character information.
* Reorganizing extractor/text.go
* Added a text extraction position test.
* Added another text extraction location test.
* Text extraction location testing.
* Added tests for text extraction with location information.
* Cleaned up text extraction tests. No changes to functionality.
* Simplifying text extraction code.
* Simplified line construction in text.go
* Returning TextMarks in TextMarkArray, which is based on PdfObjectArray but is read-only, so it holds values rather than pointers.
* Added text extraction to show PDFs marked up with bounding boxes of substrings in the extracted text.
* Add comments explaining how to calculate text bounding boxes.
* Made text_test.go naming consistent with function comments in text.go
* Use tm, pt, tl for textMark/TextMark, PageText and TextLine receivers and local variables.
* Uncommented text stress test. Use `go test --short` to skip it.
* TextMark.Offset is now an index into the extracted text. It was previously an index into `[]rune(text)`.
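
The last item distinguishes two ways of indexing a Go string. The sketch below is a standard-library illustration of that difference and does not use unipdf. It assumes that "an index into the extracted text" means a byte offset, since Go string indexing works on bytes. The helper names `runeIndexToByteOffset` and `byteOffsetToRuneIndex` are hypothetical and are not part of the library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndexToByteOffset converts an index into []rune(text) to a byte
// offset into text. Hypothetical helper for illustration only.
// It panics if runeIdx is larger than the number of runes in text.
func runeIndexToByteOffset(text string, runeIdx int) int {
	return len(string([]rune(text)[:runeIdx]))
}

// byteOffsetToRuneIndex converts a byte offset into text to an index
// into []rune(text). Hypothetical helper for illustration only.
// byteOff is expected to fall on a rune boundary.
func byteOffsetToRuneIndex(text string, byteOff int) int {
	return utf8.RuneCountInString(text[:byteOff])
}

func main() {
	// Two of the characters here take two bytes each in UTF-8.
	text := "na\u00efve caf\u00e9"

	// strings.Index returns a byte offset, so it can slice the string directly.
	byteOff := strings.Index(text, "caf\u00e9")
	runeIdx := byteOffsetToRuneIndex(text, byteOff)

	fmt.Println("byte offset:", byteOff, "rune index:", runeIdx)
	fmt.Println("slice by byte offset:", text[byteOff:])
	fmt.Println("slice by rune index: ", string([]rune(text)[runeIdx:]))
}
```

Under that assumption, a byte offset can be compared directly with the results of `strings.Index` and used in slice expressions on the extracted text, while a rune index needs a conversion first whenever the text contains multi-byte characters.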

2019-07-18 06:41:47 +00:00

testdata

Remove pdf folder and move packages up one level (#2)

2019-05-16 20:44:51 +00:00

const.go

Remove pdf folder and move packages up one level (#2)

2019-05-16 20:44:51 +00:00

doc.go

Remove pdf folder and move packages up one level (#2)

2019-05-16 20:44:51 +00:00

extractor.go

Remove pdf folder and move packages up one level (#2)