mirror of
https://codeberg.org/SeedDMS/paperless
synced 2025-02-06 06:55:28 +00:00
more documentation on similar document search
parent 72471b96e3
commit 16a3083f33
14
README.md
@@ -93,4 +93,16 @@ The root folder can be set in the configuration or can be the user's home folder

There is some experimental support for searching for similar documents. This
is done by extracting the most frequent words from the content and using them
to issue a second query with this list of words. Since the list of most
frequent words can be very long, it is reduced first. For a word to qualify for
the query, it must

* be longer than 4 characters
* have a frequency greater than 2

If fewer than five words meet these conditions, the list is filled up with
subsequent words from the most frequent word list. If the query executed with
this list doesn't yield a result, the list is shortened again word by word
until the search succeeds or the query is empty.
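
The selection and fallback steps described above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation. The function names `select_query_words` and `find_similar`, the `search` callback, the tokenization by `\w+`, and the choice to drop the last (least frequent) word first are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import Callable, List, Sequence

MIN_LENGTH = 4     # a word must be longer than this many characters
MIN_FREQUENCY = 2  # a word must occur more often than this
MIN_WORDS = 5      # below this count the list is filled up again


def select_query_words(content: str) -> List[str]:
    """Reduce the most frequent words of a document to a query word list."""
    # Assumption: words are lowercased runs of word characters.
    words = re.findall(r"\w+", content.lower())
    ranked = Counter(words).most_common()  # most frequent first

    selected = [w for w, n in ranked
                if len(w) > MIN_LENGTH and n > MIN_FREQUENCY]

    if len(selected) < MIN_WORDS:
        # Fill up with the next most frequent words that did not qualify.
        for w, _ in ranked:
            if len(selected) >= MIN_WORDS:
                break
            if w not in selected:
                selected.append(w)
    return selected


def find_similar(content: str,
                 search: Callable[[Sequence[str]], list]) -> list:
    """Query with the word list, shortening it until something is found."""
    words = select_query_words(content)
    while words:
        hits = search(words)  # hypothetical callback into the fulltext index
        if hits:
            return hits
        # Assumption: the last word in the list is dropped first.
        words = words[:-1]
    return []
```

`find_similar` would be called with the extracted text of a document and a function that runs a fulltext query for a list of words. It returns an empty list if even the single-word query finds nothing.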