public abstract class TikaPoweredContentTransformer extends AbstractContentTransformer2
Provides helpful services for ContentTransformer
implementations which are powered by Apache Tika.
To use Tika to transform content into Text, HTML or XML, create an
implementation of this class or use the Auto Detect transformer.
For now, all transformers are registered as regular, rather than explicit
transformations. This should allow you to register your own explicit
transformers and have them nicely take priority.

Modifier and Type | Field and Description |
---|---|
protected org.apache.tika.extractor.DocumentSelector | documentSelector |
protected static String | LINE_BREAK: Windows carriage return line feed pair. |
protected List<String> | sourceMimeTypes |
static String | WRONG_FORMAT_MESSAGE_ID |
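Fields inherited from class AbstractContentTransformerLimits: transformerDebug
Fields inherited from class ContentTransformerHelper: transformerConfig
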
Modifier | Constructor and Description |
---|---|
protected | TikaPoweredContentTransformer(List<String> sourceMimeTypes) |
protected | TikaPoweredContentTransformer(String[] sourceMimeTypes) |

Modifier and Type | Method and Description |
---|---|
protected org.apache.tika.parser.ParseContext | buildParseContext(org.apache.tika.metadata.Metadata metadata, String targetMimeType, TransformationOptions options): By default returns a ParseContext that does not recurse. |
String | getComments(boolean available): Overridden to supply a comment or String of commented out transformation properties that specify any (hard coded or implied) supported transformations. |
protected ContentHandler | getContentHandler(String targetMimeType, Writer output): Returns an appropriate Tika ContentHandler for the requested content type. |
protected org.apache.tika.extractor.DocumentSelector | getDocumentSelector(org.apache.tika.metadata.Metadata metadata, String targetMimeType, TransformationOptions options): Gets the document selector, used for determining whether to parse embedded resources; null by default, so all embedded resources are parsed. |
protected abstract org.apache.tika.parser.Parser | getParser(): Returns the correct Tika Parser to process the document. |
boolean | isTransformableMimetype(String sourceMimetype, String targetMimetype, TransformationOptions options): Can we do the requested transformation via Tika? We support transforming to HTML, XML or Text. |
void | setDocumentSelector(org.apache.tika.extractor.DocumentSelector documentSelector): Sets the document selector, used for determining whether to parse embedded resources. |
void | transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options): Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class. |

Methods inherited from class AbstractContentTransformer2:
checkTransformable, getExecutorService, getRetryTransformOnDifferentMimeType, getStrictMimeTypeCheck, getTransformationTime, getTransformationTime, isTransformationLimitedInternally, recordError, recordTime, recordTime, register, setAdditionalThreadTimout, setExecutorService, setMetadataExtracterConfig, setRegisterTransformer, setRegistry, setRetryTransformOnDifferentMimeType, setStrictMimeTypeCheck, setUseTimeoutThread, toString, transform, transform, transform

Methods inherited from class AbstractContentTransformerLimits:
getLimits, getLimits, getLimits, getMaxPages, getMaxSourceSizeKBytes, getMaxSourceSizeKBytes, getPageLimit, getReadLimitKBytes, getReadLimitTimeMs, getTimeoutMs, isPageLimitSupported, isTransformable, isTransformable, isTransformableSize, setLimits, setMaxPages, setMaxSourceSizeKBytes, setMimetypeLimits, setPageLimit, setPageLimitsSupported, setReaderLimits, setReadLimitKBytes, setReadLimitTimeMs, setTimeoutMs, setTransformerDebug

Methods inherited from class ContentTransformerHelper:
deprecatedSetter, equals, getBeanName, getCommentsOnlySupports, getExtensionOrAny, getMimetype, getMimetypeService, getName, getSimpleName, hashCode, isExplicitTransformation, isSupportedTransformation, onlySupports, setBeanName, setExplicitTransformations, setMimetypeService, setSupportedTransformations, setTransformerConfig, setUnsupportedTransformations

Methods inherited from class java.lang.Object:
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface ContentTransformer:
getName, isExplicitTransformation

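The check that isTransformableMimetype describes (a source type the transformer was constructed with, and a target of HTML, XML or Text) can be sketched in plain Java. This is an illustrative stand-in, not the Alfresco implementation: the class name `MimetypeCheckSketch` is hypothetical, the `TransformationOptions` parameter is omitted, and the three target mimetype strings are assumptions about what "HTML, XML or Text" maps to.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for illustration only; not the Alfresco class. */
final class MimetypeCheckSketch {
    // Assumed target mimetypes for Text, HTML and XML.
    private static final Set<String> SUPPORTED_TARGETS =
            new HashSet<>(Arrays.asList("text/plain", "text/html", "text/xml"));

    // Mirrors the sourceMimeTypes field passed to the constructor.
    private final List<String> sourceMimeTypes;

    MimetypeCheckSketch(List<String> sourceMimeTypes) {
        this.sourceMimeTypes = sourceMimeTypes;
    }

    /** True only if the source type is known and the target is Text, HTML or XML. */
    boolean isTransformableMimetype(String sourceMimetype, String targetMimetype) {
        return sourceMimeTypes.contains(sourceMimetype)
                && SUPPORTED_TARGETS.contains(targetMimetype);
    }
}
```

A transformer built for `application/pdf` would, under these assumptions, accept a PDF to `text/plain` request and reject a PDF to `image/png` request.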
protected org.apache.tika.extractor.DocumentSelector documentSelector
protected static final String LINE_BREAK
public static final String WRONG_FORMAT_MESSAGE_ID
protected TikaPoweredContentTransformer(String[] sourceMimeTypes)
protected abstract org.apache.tika.parser.Parser getParser()
Returns the correct Tika Parser to process the document.
Alternatively, use TikaAutoContentTransformer, which makes use of the Tika auto-detection.

public boolean isTransformableMimetype(String sourceMimetype, String targetMimetype, TransformationOptions options)
Can we do the requested transformation via Tika? We support transforming to HTML, XML or Text.
Specified by: isTransformableMimetype in interface ContentTransformer
Overrides: isTransformableMimetype in class AbstractContentTransformerLimits
Parameters:
sourceMimetype - the source mimetype
targetMimetype - the target mimetype
options - the transformation options

public String getComments(boolean available)
Description copied from class: ContentTransformerHelper
Overridden to supply a comment or String of commented out transformation properties
that specify any (hard coded or implied) supported transformations. Used when
AbstractContentTransformerLimits.isTransformableMimetype(String, String, TransformationOptions)
or ContentTransformerWorker.isTransformable(String, String, TransformationOptions)
have been overridden.
See ContentTransformerHelper.getCommentsOnlySupports(List, List, boolean),
which may be used to help construct a comment.
Specified by: getComments in interface ContentTransformer
Overrides: getComments in class ContentTransformerHelper
Parameters:
available - indicates if the transformer has been registered and is available to be selected.
false indicates that the transformer is only available as a component of a
complex transformer.

protected ContentHandler getContentHandler(String targetMimeType, Writer output) throws TransformerConfigurationException
Returns an appropriate Tika ContentHandler for the requested content type.
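Choosing a ContentHandler by target mimetype, as getContentHandler describes, can be sketched with JDK classes alone. This is a minimal sketch under stated assumptions, not the Alfresco code: the class `ContentHandlerSketch` and method `handlerFor` are hypothetical names, the mimetype strings are assumed, and a small `DefaultHandler` stands in for the text handler that a Tika-based implementation would more likely supply.

```java
import java.io.IOException;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration; names and mimetype strings are assumptions. */
final class ContentHandlerSketch {
    static ContentHandler handlerFor(String targetMimeType, Writer output)
            throws TransformerConfigurationException {
        if ("text/plain".equals(targetMimeType)) {
            // Plain text: write character data only and ignore all markup events.
            return new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    try {
                        output.write(ch, start, length);
                    } catch (IOException e) {
                        throw new SAXException(e);
                    }
                }
            };
        }
        String method;
        if ("text/html".equals(targetMimeType)) {
            method = "html";
        } else if ("text/xml".equals(targetMimeType)) {
            method = "xml";
        } else {
            throw new TransformerConfigurationException(
                    "Unsupported target mimetype: " + targetMimeType);
        }
        // HTML or XML: an identity transformer serialises the SAX events to the writer.
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.setResult(new StreamResult(output));
        return handler;
    }
}
```

The returned handler receives the SAX events that the parser emits, so the same parse call can produce text, HTML or XML depending only on which handler is passed in.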
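public void setDocumentSelector(org.apache.tika.extractor.DocumentSelector documentSelector)
Sets the document selector, used for determining whether to parse embedded resources.
Parameters:
documentSelector - DocumentSelector

protected org.apache.tika.extractor.DocumentSelector getDocumentSelector(org.apache.tika.metadata.Metadata metadata, String targetMimeType, TransformationOptions options)
Gets the document selector, used for determining whether to parse embedded resources; null by default, so all embedded resources are parsed.
Parameters:
metadata - Metadata
targetMimeType - String
options - TransformationOptions

protected org.apache.tika.parser.ParseContext buildParseContext(org.apache.tika.metadata.Metadata metadata, String targetMimeType, TransformationOptions options)
By default returns a ParseContext that does not recurse.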
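The "null by default so parse all" rule for the document selector can be illustrated without Tika on the classpath. In this sketch a `Predicate` over a `Map` of metadata values stands in for Tika's `DocumentSelector` and `Metadata`; the class `DocumentSelectorSketch`, the method `shouldParseEmbedded` as written here, and the `Content-Type` key are assumptions made for illustration only.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in: Predicate plays the selector, Map plays the metadata. */
final class DocumentSelectorSketch {
    /**
     * A null selector means "no filtering": every embedded resource is parsed.
     * Otherwise the selector decides per resource, based on its metadata.
     */
    static boolean shouldParseEmbedded(Predicate<Map<String, String>> selector,
                                       Map<String, String> metadata) {
        return selector == null || selector.test(metadata);
    }
}
```

A selector that rejects embedded images, for example, would return false for metadata whose content type starts with `image/`, while leaving other embedded resources to be parsed as usual.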

public void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options) throws Exception
Description copied from class: AbstractContentTransformer2
Method to be implemented by subclasses wishing to make use of the common infrastructural code
provided by this class.
Specified by: transformInternal in class AbstractContentTransformer2
Parameters:
reader - the source of the content to transform
writer - the target to which to write the transformed content
options - a map of options to use when performing the transformation. The map
will never be null.
Throws:
Exception - exceptions will be handled by this class - subclasses can throw anything

Copyright © 2005–2017 Alfresco Software. All rights reserved.