shellinford 0.3.1
Wavelet Matrix/Tree succinct data structure for full text search (using shellinford C++ library)
shellinford
Shellinford is an implementation of a Wavelet Matrix/Tree succinct data structure for document retrieval.
It is based on the shellinford C++ library.
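For readers unfamiliar with the idea behind an FM-index, the following is a minimal pure-Python sketch of counting pattern occurrences via the Burrows-Wheeler transform and backward search. It is an illustration only, not shellinford's implementation: the function names `bwt_via_suffix_array` and `count_occurrences` are hypothetical, and the rank query is done here with a naive `str.count`, whereas a real FM-index answers it with a succinct structure such as a wavelet tree or wavelet matrix.

```python
def bwt_via_suffix_array(text):
    """Return the Burrows-Wheeler transform of text (naive construction)."""
    text += "\0"  # sentinel that sorts before every other character
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for a suffix is the character just before it
    # (index -1 wraps around to the sentinel for the suffix starting at 0).
    return "".join(text[i - 1] for i in suffix_array)


def count_occurrences(bwt, pattern):
    """Count occurrences of a non-empty pattern using backward search."""
    # smaller[c]: number of characters in the text strictly smaller than c
    smaller = {}
    total = 0
    for c in sorted(set(bwt)):
        smaller[c] = total
        total += bwt.count(c)

    def rank(c, i):
        # Occurrences of c in bwt[:i]; a wavelet tree would answer this quickly.
        return bwt.count(c, 0, i)

    lo, hi = 0, len(bwt)  # current range of matching suffixes
    for c in reversed(pattern):
        if c not in smaller:
            return 0
        lo = smaller[c] + rank(c, lo)
        hi = smaller[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Calling `count_occurrences(bwt_via_suffix_array('abracadabra'), 'abra')` walks the pattern from its last character to its first, narrowing the suffix range at each step.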
Installation
$ pip install shellinford
Usage
Create a new FM-index instance
>>> import shellinford
>>> fm = shellinford.FMIndex()
- shellinford.FMIndex([use_wavelet_tree=True, filename=None])
- When given a filename, Shellinford loads FM-index data from the file
Build FM-index
>>> fm.build(['Milky Holmes', 'Sherlock "Sheryl" Shellingford', 'Milky'], 'milky.fm')
- build([docs, filename])
- When given a filename, Shellinford stores FM-index data to the file
Search word from FM-index
>>> for doc in fm.search('Milky'):
...     print('doc_id: {0}'.format(doc.doc_id))
...     print('count: {0}'.format(doc.count))
...     print('text: {0}'.format(doc.text))
doc_id: 0
count: 1
text: Milky Holmes
doc_id: 2
count: 1
text: Milky
>>> for doc in fm.search(['Milky', 'Holmes']):
...     print('doc_id: {0}'.format(doc.doc_id))
...     print('count: {0}'.format(doc.count))
...     print('text: {0}'.format(doc.text))
doc_id: 0
count: 1
text: Milky Holmes
- search(query, [_or=False, ignores=[]])
- If _or is True, an “OR” search is executed; otherwise an “AND” search is executed
- When ignores is given, a “NOT” search is also applied to exclude documents containing the ignored terms
- NOTE: search() is only available after the FM-index has been built or loaded
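The AND / OR / NOT semantics described above can be sketched with a naive substring scan. This is not how shellinford searches (it uses the FM-index); `naive_search` is a hypothetical stand-in that only mirrors the `_or` and `ignores` parameters, returns document ids rather than result objects, and assumes that a document is excluded when it contains any of the ignored terms.

```python
def naive_search(docs, query, _or=False, ignores=()):
    """Return ids of documents matching query, using plain substring tests.

    query   -- a single string, or a sequence of strings
    _or     -- if True, any term may match ("OR"); otherwise all must ("AND")
    ignores -- sequence of terms; documents containing one are skipped ("NOT")
    """
    terms = [query] if isinstance(query, str) else list(query)
    combine = any if _or else all
    hits = []
    for doc_id, text in enumerate(docs):
        # "NOT" search: drop the document if it contains an ignored term
        # (assumed behaviour, not confirmed by the library documentation).
        if any(term in text for term in ignores):
            continue
        if combine(term in text for term in terms):
            hits.append(doc_id)
    return hits
```

With the three documents from the build example, `naive_search(docs, ['Milky', 'Holmes'])` keeps only the document containing both words, while passing `_or=True` keeps every document containing at least one of them.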
Add a document
>>> fm.push_back('Baritsu')
- push_back(doc)
- NOTE: A document added by this method is not searchable until build() is called
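Because pushed documents only become searchable after a build, it can be convenient to pair the two steps. The helper below is a hypothetical convenience function, not part of shellinford. It assumes that `build()` may be called with no arguments, as the bracketed signature `build([docs, filename])` suggests. Whether a rebuild keeps previously indexed documents is not stated in this README, so verify that against the library before relying on it.

```python
def add_and_rebuild(fm, new_docs):
    """Queue documents with push_back(), then rebuild so they are searchable.

    fm is expected to behave like shellinford.FMIndex; only push_back(doc)
    and build() with no arguments are used. Returns the number queued.
    """
    queued = 0
    for doc in new_docs:
        fm.push_back(doc)
        queued += 1
    if queued:
        # Without this call the queued documents are not searchable.
        fm.build()
    return queued
```

For example, `add_and_rebuild(fm, ['Baritsu'])` queues one document and then rebuilds the index; with an empty list it does nothing.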
Read FM-index from a binary file
>>> fm.read('milky_holmes.fm')
- read(path)
Write FM-index binary to a file
>>> fm.write('milky_holmes.fm')
- write(path)
License
- Wrapper code is licensed under the New BSD License.
- Bundled shellinford C++ library (c) 2012 echizen_tm is licensed under the New BSD License.
CHANGES
0.3 (2014-11-24)
- “OR” search and “NOT” search are available in FMIndex.search().
- FMIndex.size and FMIndex.docsize are available as properties.
0.2 (2014-03-28)
- “AND” search is available by passing a Sequence (list, tuple, etc.) to FMIndex.search().
0.1 (2014-03-11)
- First release.
| File | Type | Py Version | Uploaded on | Size |
|---|---|---|---|---|
| shellinford-0.3.1.tar.gz | Source | | 2014-11-23 | 63KB |
- Author: Yukino Ikegami
- Home Page: https://github.com/ikegami-yukino/shellinford-python
- Keywords: full text search,FM-index,Wavelet Matrix
Categories
- Development Status :: 3 - Alpha
- Intended Audience :: Developers
- Intended Audience :: Science/Research
- License :: OSI Approved :: BSD License
- Programming Language :: C++
- Programming Language :: Python :: 2.6
- Programming Language :: Python :: 2.7
- Programming Language :: Python :: 3.3
- Programming Language :: Python :: 3.4
- Topic :: Scientific/Engineering :: Information Analysis
- Topic :: Software Development :: Libraries :: Python Modules
- Topic :: Text Processing :: Indexing
- Topic :: Text Processing :: Linguistic
- Package Index Owner: yukino