unipdf

mirror of https://github.com/unidoc/unipdf.git synced 2025-04-29 13:48:54 +08:00

Author	SHA1	Message	Date
Peter Williams	aea4cb1d55	Make PageText.sortPosition() sort order deterministic. (#153)	2019-08-29 18:26:53 +00:00
Gunnsteinn Hall	21141a9d3e	Add Append to TextMarkArray. Useful when processing and grouping text marks.	2019-08-04 09:29:21 +00:00
Gunnsteinn Hall	1d7b969b91	Simplify license loading and support environment variables	2019-08-04 09:28:42 +00:00
Peter Williams	9ebcfcf168	Finding bounding boxes of substrings of extracted text. (#109) * Added text bounding box extraction. * Add `font` field to textMark struct; create a new method `TextComponents` to retrieve all the text components of the extracted text in the page, with position and character information. * Reorganizing extractor/text.go. * Added a text extraction position test. * Added another text extraction location test. * Text extraction location testing. * Added tests for text extraction with location information. * Cleaned up text extraction tests. No changes to functionality. * Simplifying text extraction code. * Simplified line construction in text.go. * Returning TextMarks in TextMarkArray, which are based on PdfObjectArray but read-only, so not pointers. * Added text extraction to show PDFs marked up with bounding boxes of substrings in extracted text. * Add comments explaining how to calculate text bounding boxes. * Made text_test.go naming consistent with function comments in text.go. * Use tm, pt, tl for textMark/TextMark, PageText and TextLine receivers and local variables. * Uncommented text stress test. Use go test --short to skip. * TextMark.Offset is now an index into the extracted text. It was an index into []rune(text).	2019-07-18 06:41:47 +00:00
Adrian-George Bostan	c64812093d	Remove pdf folder and move packages up one level (#2)	2019-05-16 20:44:51 +00:00

5 Commits