public class HtmlMetadataExtracter extends AbstractMappingMetadataExtracter
author: -- cm:author
title: -- cm:title
description: -- cm:description
TIKA note - all metadata will be present, but we will need to search for the variant names ourselves, as Tika puts them in as-is.
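The Tika note can be pictured with a small sketch that uses only the JDK. Given a raw map keyed by whatever names were emitted, it returns the value of the first variant name that is present. The class `VariantLookup`, the method `firstPresent`, the case-insensitive comparison, and the variant names such as `dc:creator` are all assumptions made for illustration. None of them are taken from the Alfresco or Tika source.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VariantLookup {

    /**
     * Hypothetical helper: returns the value stored under the first variant
     * name found in the raw map (compared case-insensitively), or null if
     * none of the variants is present with a non-null value.
     */
    public static String firstPresent(Map<String, String> raw, List<String> variants) {
        for (String variant : variants) {
            for (Map.Entry<String, String> entry : raw.entrySet()) {
                if (entry.getKey().equalsIgnoreCase(variant) && entry.getValue() != null) {
                    return entry.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative raw metadata; the key spellings are assumed, not real Tika output.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("Author", "Jane Doe");
        raw.put("dc:title", "Quarterly report");

        System.out.println(firstPresent(raw, Arrays.asList("author", "dc:creator")));
        System.out.println(firstPresent(raw, Arrays.asList("title", "dc:title")));
        System.out.println(firstPresent(raw, Arrays.asList("description")));
    }
}
```

Listing variants in priority order keeps the lookup deterministic when a document carries more than one spelling of the same field.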
MetadataExtracter.OverwritePolicy
Modifier and Type | Field and Description |
---|---|
static Set<String> | MIMETYPES |
logger, MEGABYTE_SIZE, metadataExtracterConfig, NAMESPACE_PROPERTY_PREFIX, PROPERTY_COMPONENT_EMBED, PROPERTY_COMPONENT_EXTRACT, PROPERTY_PREFIX_METADATA
Constructor and Description |
---|
HtmlMetadataExtracter() |
Modifier and Type | Method and Description |
---|---|
protected Map<String,Serializable> | extractRaw(org.alfresco.service.cmr.repository.ContentReader reader) Override to provide the raw extracted metadata values. |
checkIsEmbedSupported, checkIsSupported, embed, embedInternal, extract, extract, extract, filterSystemProperties, getBeanName, getDefaultEmbedMapping, getDefaultMapping, getEmbedMapping, getExecutorService, getLimits, getMapping, getMimetypeService, init, isEmbeddingSupported, isSupported, makeDate, newRawMap, putRawValue, readEmbedMappingProperties, readEmbedMappingProperties, readGlobalEmbedMappingProperties, readGlobalExtractMappingProperties, readMappingProperties, readMappingProperties, register, setApplicationContext, setBeanName, setDictionaryService, setEmbedMapping, setEmbedMappingProperties, setEnableStringTagging, setExecutorService, setFailOnTypeConversion, setInheritDefaultEmbedMapping, setInheritDefaultMapping, setMapping, setMappingProperties, setMetadataExtracterConfig, setMimetypeLimits, setMimetypeService, setOverwritePolicy, setProperties, setRegistry, setSupportedDateFormats, setSupportedEmbedMimetypes, setSupportedMimetypes
protected Map<String,Serializable> extractRaw(org.alfresco.service.cmr.repository.ContentReader reader) throws Throwable
Description copied from class: AbstractMappingMetadataExtracter
Even if the default mapping doesn't handle all properties, each instance of the extracter can be configured differently, so more or fewer of the properties may be used in different installations.
Raw values must not be trimmed or removed for any reason. Null values and empty strings are
Properties extracted and their meanings and types should be thoroughly described in the class-level javadocs of the extracter implementation, for example:
editor: - the document editor --> cm:author
title: - the document title --> cm:title
user1: - the document summary
user2: - the document description --> cm:description
user3: -
user4: -
Specified by: extractRaw in class AbstractMappingMetadataExtracter
Parameters: reader - the document to extract the values from. The stream provided by the reader must be closed if accessed directly.
Throws: Throwable - All exception conditions can be handled.
See Also: AbstractMappingMetadataExtracter.getDefaultMapping()
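As a rough illustration of why raw extraction and mapping are kept separate, the sketch below applies two different mappings to the same raw map. It is a simplification under stated assumptions: the class `MappingSketch` and method `applyMapping` are hypothetical, and properties are plain strings here rather than the types the real extracter uses. The raw key `keywords` and target `cm:keywords` are invented for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingSketch {

    /**
     * Hypothetical helper: copies every raw value whose key appears in the
     * mapping to the mapped property name. Raw values without a mapping
     * entry are left out of the result.
     */
    public static Map<String, String> applyMapping(Map<String, String> raw,
                                                   Map<String, String> mapping) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target != null) {
                properties.put(target, entry.getValue());
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Raw values: everything that could be extracted, mapped or not.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("author", "Jane Doe");
        raw.put("title", "Quarterly report");
        raw.put("description", "Q3 summary");
        raw.put("keywords", "finance"); // assumed extra raw key

        // Mapping taken from the class description above.
        Map<String, String> defaultMapping = new HashMap<>();
        defaultMapping.put("author", "cm:author");
        defaultMapping.put("title", "cm:title");
        defaultMapping.put("description", "cm:description");

        // A differently configured instance (the extra target name is hypothetical).
        Map<String, String> customMapping = new HashMap<>(defaultMapping);
        customMapping.put("keywords", "cm:keywords");

        System.out.println(applyMapping(raw, defaultMapping));
        System.out.println(applyMapping(raw, customMapping));
    }
}
```

Because the raw map already contains `keywords`, the second configuration can pick it up without any change to the extraction step.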
Copyright © 2005–2017 Alfresco Software. All rights reserved.