Indexes and metadata
How to make your custom fields and data queryable through portal_catalog.
- What does indexing mean?
- Viewing indexes and indexed data
- Creating an index
- Creating an index through the web
- Adding index using add-on product installer
- Custom index methods
- Index types
- Default Plone indexes and metadata columns
- Full-text searching
Indexing is the process of making object data searchable. Plone stores available indexes in the database. You can create them through the web and inspect existing indexes in portal_catalog on the Indexes tab.
The Catalog Tool can be configured through the ZMI or programmatically in Python, but current best practice in the CMF world is to use GenericSetup to configure it using the declarative catalog.xml file. The GenericSetup profile for Plone, for example, uses the CMFPlone/profiles/default/catalog.xml XML data file to configure the Catalog Tool when a Plone site is created. It is fairly readable, so taking a quick look through it can be very informative.
When using a GenericSetup extension profile to customize the Catalog Tool in your portal, you only need to include XML for the pieces of the catalog you are changing. To add an index for the Archetypes location field, as in the example below, a policy package could include the following profiles/default/catalog.xml:
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
  <index name="location" meta_type="FieldIndex">
    <indexed_attr value="location"/>
  </index>
</object>
The GenericSetup import handler for the Catalog Tool also supports removing indexes from the catalog if present using the "remove" attribute of the <index> element. To remove the "start" and "end" indexes used for events, for example, a policy package could include the following profiles/default/catalog.xml:
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
  <index name="start" remove="True" />
  <index name="end" remove="True" />
</object>
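Before dropping a hand-written catalog.xml into a profile, it can help to list what it declares. The following is a minimal sketch using only the Python standard library. It is a reading aid, not the way GenericSetup itself parses the file: the helper name summarize_catalog_xml is hypothetical, and the handling of the "remove" attribute is a simplification.

```python
import xml.etree.ElementTree as ET

# Sample input combining the two catalog.xml snippets shown above.
CATALOG_XML = """<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
  <index name="location" meta_type="FieldIndex">
    <indexed_attr value="location"/>
  </index>
  <index name="start" remove="True" />
  <index name="end" remove="True" />
</object>
"""


def summarize_catalog_xml(xml_text):
    """Return (added, removed) index declarations found in a catalog.xml string.

    Hypothetical helper: it only inspects <index> elements directly under
    the root and treats remove="True" (any letter case) as a removal.
    """
    root = ET.fromstring(xml_text)
    added, removed = [], []
    for index in root.findall("index"):
        name = index.get("name")
        if index.get("remove", "").lower() == "true":
            removed.append(name)
        else:
            added.append((name, index.get("meta_type")))
    return added, removed


added, removed = summarize_catalog_xml(CATALOG_XML)
print("added:", added)
print("removed:", removed)
```

Running this against a profile's catalog.xml gives a quick overview of which indexes an import would create and which it would remove.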
Care must be taken when setting up indexes with GenericSetup: if the import step for a catalog.xml is run a second time (for example when you reinstall the product), the indexes specified will be destroyed, losing all currently indexed entries, and then re-created fresh (and empty!). To work around this behaviour, you can either update the catalog afterwards or add the indexes yourself in Python code using a custom import handler.
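The Python route can be sketched as a helper that adds only the indexes the catalog does not have yet, so existing indexed data survives a reinstall. This is a sketch under assumptions, not the plone.app.event implementation: the helper name add_missing_indexes and the WANTED_INDEXES tuple are hypothetical, and FakeCatalog is a stand-in so the snippet runs outside Plone. In a real import handler the catalog would come from the site (for example via getToolByName). Index types that need extra configuration, such as ZCTextIndex, are not covered here.

```python
# Hypothetical list of (index name, index type) pairs this add-on needs.
WANTED_INDEXES = (
    ("location", "FieldIndex"),
    ("revisit_date", "DateIndex"),
)


def add_missing_indexes(catalog, wanted=WANTED_INDEXES):
    """Add only the indexes the catalog lacks; return the names that were added."""
    existing = set(catalog.indexes())
    added = []
    for name, meta_type in wanted:
        if name not in existing:
            catalog.addIndex(name, meta_type)
            added.append(name)
    if added:
        # Only the freshly created indexes are empty and need populating.
        catalog.manage_reindexIndex(ids=added)
    return added


class FakeCatalog(object):
    """Stand-in for portal_catalog, for demonstration only."""

    def __init__(self, names):
        self._names = list(names)
        self.reindexed = []

    def indexes(self):
        return list(self._names)

    def addIndex(self, name, meta_type, extra=None):
        self._names.append(name)

    def manage_reindexIndex(self, ids=None, REQUEST=None):
        self.reindexed.extend(ids or [])


catalog = FakeCatalog(["location"])
print(add_missing_indexes(catalog))   # first run adds what is missing
print(add_missing_indexes(catalog))   # second run leaves the catalog alone
```

Because existing indexes are never touched, running the step again on reinstall does not empty them.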
For more info, see this setuphandler https://github.com/plone/plone.app.event/blob/master/plone/app/event/setuphandlers.py in plone.app.event or these discussions on more about this problem:
You can do this through portal_catalog tool in ZMI.
- Click portal_catalog in the portal root
- Click Catalog tab
- Click any object
To perform queries on custom data, you need to add the corresponding index to portal_catalog first.
E.g. If your Archetypes content type has a field:
schema = [
    DateField("revisitDate",
        widget=atapi.DateWidget(
            label="Revisit date"),
        description="When you are alarmed this content should be revisited (one month beforehand this date)",
        schemata="revisit"
    ),
]

class MyContent(...):

    # This is automatically run-time generated function accessor method,
    # but could be any hand-written method as well
    # def getMyCustomValue(self):
    #     pass
You can add a new index which will index the value of this field, so you can make queries based on it later.
See more information about accessor methods.
If you want to create an index for a content type you do not control yourself, or if you want to do some custom logic in your indexer, please see Custom index methods below.
This method is suitable during development: you can create an index in your local Plone database.
Click Indexes tab
On top right corner, you have a drop down menu to add new indexes. Choose the index type you need to add.
- Type: FieldIndex
- Id: getMyCustomValue
- Indexed attributes: getMyCustomValue
You can use Archetypes accessor methods directly as an indexed attribute.
In this example we use getMyCustomValue for the AT field.
The type of index you need depends on what kind of queries you need to run on the data. For example, direct value matching, ranged date queries and free text search each need a different kind of index.
After this you can query portal_catalog:
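my_brains = context.portal_catalog(getMyCustomValue=111)
for brain in my_brains:
    # Reading the value from the brain requires a metadata column
    # with the same name; the index alone only makes it searchable
    print(brain["getMyCustomValue"])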
You need to have your own add-on product which registers new indexes when the add-on installer is run. This is the recommended method for repeated installations.
You can create an index
- Using catalog.xml where XML is written by hand
- Create the index through the web and export catalog data from a development site using portal_setup tool Export functionality. The index is created through-the-web as above, XML is generated for you and you can fine tune the resulting XML before dropping it in to your add-on product.
- Create indexes in Python code of add-on custom import step.
- As a prerequisite, your add-on product must have GenericSetup profile support.
This way is repeatable: index gets created every time an add-on product is installed. It is more cumbersome, however.
There is a known issue of indexed data getting pruned when an add-on product is reinstalled. If you want to avoid this then you need to create new indexes in add-on installer custom setup step (Python code).
The example below is not safe for data prune on reinstall.
This file is catalog.xml. It installs a new index called revisit_date of DateIndex type.
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
  <index name="revisit_date" meta_type="DateIndex">
    <property name="index_naive_time_as_local">True</property>
  </index>
</object>
For more information see
The plone.indexer package provides a way to create custom indexing functions.
Sometimes you want to index "virtual" attributes of an object computed from existing ones, or just want to customize the way certain attributes are indexed, for example, saving only the 10 first characters of a field instead of its whole content.
To do so in an elegant and flexible way, Plone>=3.3 includes a new package, plone.indexer, which provides a series of primitives to delegate indexing operations to adapters.
Let's say you have a content-type providing the interface
IMyType. To define an indexer for your type which takes the
first 10 characters from the body text, just type (assuming the
attribute's name is 'text'):
from plone.indexer.decorator import indexer

@indexer(IMyType)
def mytype_description(object, **kw):
    return object.text[:10]
Finally, register this factory function as a named adapter using
ZCML. Assuming you've put the code above into a file named indexers.py:
<adapter name="description" factory=".indexers.mytype_description" />
And that's all! Easy, wasn't it?
Note you can omit the for attribute because you passed this to the
@indexer decorator, and you can omit the provides attribute because
the thing returned by the decorator is actually a class providing the
required IIndexer interface.
To learn more about the plone.indexer package, read its doctest.
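The indexer shown earlier slices object.text directly, which raises an error when the attribute is missing or still None. A minimal sketch of a more defensive body follows. The logic is kept in a plain function so it can be tried outside Plone. The names first_chars and Page are hypothetical, and in a real add-on the function decorated with @indexer(IMyType) would simply call this helper.

```python
def first_chars(obj, limit=10):
    """Return the first `limit` characters of obj.text.

    A missing attribute, None, or an empty value all yield an empty string.
    """
    text = getattr(obj, "text", None) or ""
    return text[:limit]


class Page(object):
    """Hypothetical stand-in for a content object providing IMyType."""

    def __init__(self, text=None):
        self.text = text


print(first_chars(Page("Hello, indexing world")))
print(repr(first_chars(Page())))
```

Keeping the computation separate from the adapter registration also makes it easy to unit test without a running site.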
For more info about how to create content-types, refer to the Archetypes Developer Manual.
Important note: If you want to adapt an out-of-the-box
Archetypes content-type like Event or News Item, take into account
you will have to feed the
indexer decorator with the Zope 3
interfaces defined in
files, not with the deprecated Zope 2 ones into the
The same rules and methods apply to metadata columns as to creating an index above. The difference with metadata is that
- It is not used for searching, only for displaying the search result
- A copy of the value is always stored as is
To create metadata columns in your catalog.xml, add:
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
  <!-- Add a new metadata column which will read from context.getSignificant() function -->
  <column value="getSignificant"/>
</object>
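The "stored copy" behaviour of metadata columns can be illustrated with a toy model. This is deliberately not how ZCatalog is implemented; ToyCatalog and Doc are hypothetical classes that only show why a column value stays unchanged until the object is cataloged again.

```python
class ToyCatalog(object):
    """Toy model of metadata columns; not how ZCatalog is implemented."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self._records = {}

    def catalog_object(self, path, obj):
        # Call each accessor once and keep a copy of the result.
        self._records[path] = dict(
            (name, getattr(obj, name)()) for name in self.columns
        )

    def brain(self, path):
        # The "brain" is just the stored copy; the object is not touched.
        return dict(self._records[path])


class Doc(object):
    """Hypothetical content object with a getSignificant() accessor."""

    def __init__(self, significant):
        self._significant = significant

    def getSignificant(self):
        return self._significant


catalog = ToyCatalog(columns=["getSignificant"])
doc = Doc(True)
catalog.catalog_object("/site/doc", doc)

doc._significant = False                      # changed without reindexing
stale = catalog.brain("/site/doc")["getSignificant"]

catalog.catalog_object("/site/doc", doc)      # reindex picks up the change
fresh = catalog.brain("/site/doc")["getSignificant"]
print(stale, fresh)
```

The first lookup still returns the old value because only the copy is consulted, which is why data changed outside the standard edit forms needs an explicit reindex.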
Content item reindexing is run when
Plone calls reindexObject() if
- The object is modified by the user using the standard edit forms
- portal_catalog rebuild is run (from Advanced tab)
- If you add a new index you need to run Rebuild catalog to get the existing values from content objects to new index.
- You might also want to call reindexObject() method manually in some cases. This method is defined in the ICatalogAware interface.
You must call reindexObject() if you
- Directly call object field mutators
- Otherwise directly change object data
Unit test warning: Usually Plone reindexes modified objects at the end of each request (each transaction). If you modify the object yourself, you are responsible for notifying the related catalogs about the new object data.
reindexObject() method takes the optional argument idxs which will list the changed indexes. If idxs is not given, all related indexes are updated even though they were not changed.
object.setTitle("Foobar")

# object.reindexObject() must be called to reflect the changed data in portal_catalog.
# In our example, we change the title. Without this call the new title is not updated
# in the navigation, since the navigation tree and folder listing pull the object
# title from the catalog.
object.reindexObject(idxs=["Title"])
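When a value feeds more than one index, all of them should be listed in idxs. The sketch below assumes that Title, sortable_title and SearchableText all depend on the title, which is typical but should be checked against your own catalog setup. The helper set_title_and_reindex and the RecordingContent stand-in are hypothetical and exist only so the call pattern can be run outside Plone.

```python
# Assumption: these are the indexes derived from the title on this site.
TITLE_INDEXES = ["Title", "sortable_title", "SearchableText"]


def set_title_and_reindex(obj, title):
    """Change the title, then refresh only the indexes assumed to depend on it."""
    obj.setTitle(title)
    obj.reindexObject(idxs=TITLE_INDEXES)


class RecordingContent(object):
    """Stand-in content object that records reindexObject() calls."""

    def __init__(self):
        self.title = ""
        self.calls = []

    def setTitle(self, title):
        self.title = title

    def reindexObject(self, idxs=None):
        self.calls.append(list(idxs or []))


item = RecordingContent()
set_title_and_reindex(item, "Foobar")
print(item.title, item.calls)
```

Passing a short idxs list keeps the update cheap, at the price of having to know which indexes a given field actually feeds.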
Also, if you modify security related parameters (permissions), you need to call reindexObjectSecurity().
Zope 2 product PluginIndexes defines various portal_catalog index types used by Plone.
- FieldIndex stores values as is
- DateIndex and DateRangeIndex store dates (Zope 2 DateTime objects) in searchable format. The latter provides ranged searches.
- KeywordIndex allows keyword-style look-ups (query term is matched against all the values of a stored list)
- ZCTextIndex is used for full text indexing
- ExtendedPathIndex is used for indexing content object locations.
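The difference between exact-value matching and keyword-style matching can be shown with a toy model. The classes below are hypothetical and far simpler than the real PluginIndexes code; they only illustrate the matching semantics described in the list above.

```python
class ToyFieldIndex(object):
    """Stores one value per document and matches it exactly."""

    def __init__(self):
        self._docs = {}   # value -> set of document ids

    def index(self, doc_id, value):
        self._docs.setdefault(value, set()).add(doc_id)

    def query(self, value):
        return set(self._docs.get(value, ()))


class ToyKeywordIndex(ToyFieldIndex):
    """Stores a list of values per document; a match on any one of them counts."""

    def index(self, doc_id, values):
        for value in values:
            ToyFieldIndex.index(self, doc_id, value)


portal_type = ToyFieldIndex()
portal_type.index(1, "Document")
portal_type.index(2, "Event")

subject = ToyKeywordIndex()
subject.index(1, ["plone", "zope"])
subject.index(2, ["zope"])

print(portal_type.query("Document"))
print(sorted(subject.query("zope")))
```

A field-style index answers "which documents have exactly this value", while a keyword-style index answers "which documents list this value among others".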
Some interesting indexes
- start and end: Calendar event timestamps, used to make up calendar portlet
- sortable_title: Title provided for sorting
- portal_type: Content type as it appears in portal_types
- Type: Translated, human readable, type of the content
- path: Where the object is (getPhysicalPath accessor method).
- object_provides: What interfaces and marker interfaces object has. KeywordIndex of interface full names.
- is_default_page: An indexer method in CMFPlone/CatalogTool.py, handled by plone.indexer. There is no object.is_default_page attribute; the indexer calls ptool.isDefaultPage(obj) instead.
Some interesting columns
- getRemoteURL: Where to go when the object is clicked
- getIcon: Which content type icon is used for this object in the navigation
- exclude_from_nav: If True the object won't appear in sitemap, navigation tree
sortable_title is a FieldIndex (raw value), whereas the normal
Title index is a searchable text index.
sortable_title is generated from
You can override
sortable_title by providing an indexer adapter with a specific interface of your content type.
from plone.indexer import indexer

from xxx.researcher.interfaces import IResearcher

@indexer(IResearcher)
def sortable_title(obj):
    """
    Provide custom sorting title.

    This is used by various folder functions of Plone.
    This can differ from actual Title.
    """

    # Remember to handle None value if the object has not been edited yet
    first_name = obj.getFirst_name() or ""
    last_name = obj.getLast_name() or ""

    return last_name + " " + first_name
<adapter factory=".indexes.sortable_title" name="sortable_title" />
TextIndexNG3 is an advanced text indexing solution for Zope.
Please read the TextIndexNG3 README.txt regarding how to add support for custom fields. Besides installing TextIndexNG3 in GenericSetup XML, you need to provide a custom indexing adapter.
# Add TextIndexNG3 in catalog.xml. Example:
<index name="getYourFieldName" meta_type="TextIndexNG3">
  <field value="getYourFieldName"/>
  <autoexpand value="off"/>
  <autoexpand_limit value="4"/>
  <dedicated_storage value="False"/>
  <default_encoding value="utf-8"/>
  <index_unknown_languages value="True"/>
  <language value="en"/>
  <lexicon value="txng.lexicons.default"/>
  <query_parser value="txng.parsers.en"/>
  <ranking value="True"/>
  <splitter value="txng.splitters.simple"/>
  <splitter_additional_chars value="_-"/>
  <splitter_casefolding value="True"/>
  <storage value="txng.storages.term_frequencies"/>
  <use_normalizer value="False"/>
  <use_stemmer value="False"/>
  <use_stopwords value="False"/>
</index>
# Create adapter which will add TextIndexNG3 indexing support for your custom fields. Example:
import logging

from Products.TextIndexNG3.adapters.cmf_adapters import CMFContentAdapter
from zope.component import adapts

# IDescriptionBase below is a placeholder: import your own marker interface

logger = logging.getLogger("Plone")


class TextIndexNG3SearchAdapter(CMFContentAdapter):
    """ Adapter which provides custom field specific index information for TextIndexNG3 """

    # Your content marker interface here
    adapts(IDescriptionBase)

    def indexableContent(self, fields):
        """ Produce TextIndexNG3 indexing information for the object

        Traceback::

            ZCatalog.py(536)catalog_object()
            -> update_metadata=update_metadata)
            Catalog.py(360)catalogObject()
            -> blah = x.index_object(index, object, threshold)
            Products/TextIndexNG3/TextIndexNG3.py(91)index_object()
            -> result = self.index.index_object(obj, docid)
            Products/TextIndexNG3/src/textindexng/index.py(114)index_object()
            -> default_language=self.languages)
            Products/TextIndexNG3/src/textindexng/content.py(99)extract_content()
            -> icc = adapter.indexableContent(fields)
            > indexableContent()
        """
        logger.debug("Indexing " + str(self.context))

        # Use superclass to construct generic field adapters (id, title, description, SearchableText)
        icc = CMFContentAdapter.indexableContent(self, fields)

        # These fields have their own TextIndexNG3 indexes which
        # are queried separately from SearchableText
        accessors = ["getClassifications", "getOtherNames"]

        for accessor in accessors:
            try:
                method = getattr(self.context, accessor)
            except AttributeError:
                logger.warning("Declared indexing for unsupported accessor: " + accessor)
                continue

            value = method()

            # We might have a value which is not a real string,
            # but must be first stringified
            try:
                value = unicode(value)
            except UnicodeDecodeError as e:
                # Bad things happen here?
                logger.warning("Failed to index field: " + accessor)
                logger.exception(e)
                continue

            # Convert value to text format (utf-8) expected
            # by the indexer
            text = self._c(value)
            icc.addContent(accessor, text, self.language)

        return icc
# Add adapter in your ZCML:
Plone provides a special index called
SearchableText which is used for the site's full-text search.
Your content types can override
SearchableText index with custom method to populate this index
with the text they want to go into full-text searching.
Below is an example of having
SearchableText on a custom Archetypes content class.
This class has some methods which are not part of AT schema and thus must be manually
# Module-level import needed by the method below
from Products.CMFCore.utils import getToolByName

# Method of your content class
def SearchableText(self):
    """
    Override searchable text logic based on the requirements.

    This method constructs a text blob which contains all full-text
    searchable text for this content item.

    This method is called by portal_catalog to populate its SearchableText index.
    """

    # Test this by enable pdb here and run catalog rebuild in ZMI
    # xxx

    # Speed up string concatenation ops by using a buffer
    entries = []

    # plain text fields we index from ourself,
    # a list of accessor methods of the class
    plain_text_fields = ("Title", "Description")

    # HTML fields we index from ourself
    # a list of accessor methods of the class
    html_fields = ("getSummary", "getBiography")

    def read(accessor):
        """
        Call a class accessor method to give a value for certain Archetypes field.
        """
        try:
            value = accessor()
        except Exception:
            value = ""

        if value is None:
            value = ""

        return value

    # Concatenate plain text fields as is
    for f in plain_text_fields:
        accessor = getattr(self, f)
        value = read(accessor)
        entries.append(value)

    transforms = getToolByName(self, 'portal_transforms')

    # Run HTML valued fields through text/plain conversion
    for f in html_fields:
        accessor = getattr(self, f)
        value = read(accessor)

        if value != "":
            stream = transforms.convertTo('text/plain', value, mimetype='text/html')
            value = stream.getData()

        entries.append(value)

    # Plone accessor methods assume utf-8
    def convertToUTF8(text):
        if type(text) == unicode:
            return text.encode("utf-8")
        return text

    entries = [convertToUTF8(entry) for entry in entries]

    # Concatenate all strings to one text blob
    return " ".join(entries)