public class PdfBoxContentTransformer extends TikaPoweredContentTransformer
Modifier and Type | Field and Description |
---|---|
protected org.apache.tika.parser.pdf.PDFParserConfig | pdfParserConfig |
documentSelector, LINE_BREAK, sourceMimeTypes, WRONG_FORMAT_MESSAGE_ID
transformerDebug
transformerConfig
Constructor and Description |
---|
PdfBoxContentTransformer() |
Modifier and Type | Method and Description |
---|---|
protected org.apache.tika.parser.ParseContext | buildParseContext(org.apache.tika.metadata.Metadata metadata, String targetMimeType, TransformationOptions options) By default returns a ParseContext that does not recurse. |
protected org.apache.tika.parser.Parser | getParser() Returns the correct Tika Parser to process the document. |
void | setPdfParserConfig(org.apache.tika.parser.pdf.PDFParserConfig pdfParserConfig) Sets the PDFParserConfig for inclusion in the ParseContext sent to the PDFBox parser; useful for setting config such as spacingTolerance. |
getComments, getContentHandler, getDocumentSelector, isTransformableMimetype, setDocumentSelector, transformInternal
checkTransformable, getExecutorService, getRetryTransformOnDifferentMimeType, getStrictMimeTypeCheck, getTransformationTime, getTransformationTime, isTransformationLimitedInternally, recordError, recordTime, recordTime, register, setAdditionalThreadTimout, setExecutorService, setMetadataExtracterConfig, setRegisterTransformer, setRegistry, setRetryTransformOnDifferentMimeType, setStrictMimeTypeCheck, setUseTimeoutThread, toString, transform, transform, transform
getLimits, getLimits, getLimits, getMaxPages, getMaxSourceSizeKBytes, getMaxSourceSizeKBytes, getPageLimit, getReadLimitKBytes, getReadLimitTimeMs, getTimeoutMs, isPageLimitSupported, isTransformable, isTransformable, isTransformableSize, setLimits, setMaxPages, setMaxSourceSizeKBytes, setMimetypeLimits, setPageLimit, setPageLimitsSupported, setReaderLimits, setReadLimitKBytes, setReadLimitTimeMs, setTimeoutMs, setTransformerDebug
deprecatedSetter, equals, getBeanName, getCommentsOnlySupports, getExtensionOrAny, getMimetype, getMimetypeService, getName, getSimpleName, hashCode, isExplicitTransformation, isSupportedTransformation, onlySupports, setBeanName, setExplicitTransformations, setMimetypeService, setSupportedTransformations, setTransformerConfig, setUnsupportedTransformations
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
getName, isExplicitTransformation
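protected org.apache.tika.parser.Parser getParser()
Description copied from class: TikaPoweredContentTransformer
Returns the correct Tika Parser to process the document. If you are unsure which parser you need, use TikaAutoContentTransformer, which makes use of the Tika auto-detection.
Specified by: getParser in class TikaPoweredContentTransformer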
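public void setPdfParserConfig(org.apache.tika.parser.pdf.PDFParserConfig pdfParserConfig)
Sets the PDFParserConfig for inclusion in the ParseContext sent to the PDFBox parser; useful for setting config such as spacingTolerance.
Parameters: pdfParserConfig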
protected org.apache.tika.parser.ParseContext buildParseContext(org.apache.tika.metadata.Metadata metadata, String targetMimeType, TransformationOptions options)
Description copied from class: TikaPoweredContentTransformer
By default returns a ParseContext that does not recurse.
Overrides: buildParseContext in class TikaPoweredContentTransformer
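
The relationship between setPdfParserConfig and buildParseContext can be illustrated with a small, self-contained sketch. It does not use the real Tika or Alfresco classes: SketchPdfConfig, SketchParseContext and SketchTransformer are hypothetical stand-ins invented for this example, and the null check in buildParseContext is an assumption about how an unset config might be handled, not a statement of the actual implementation. The only idea borrowed from Tika is that a ParseContext stores objects keyed by their class.

```java
import java.util.HashMap;
import java.util.Map;

/** Self-contained sketch; every type here is a hypothetical stand-in, not a Tika or Alfresco class. */
class PdfConfigContextSketch {

    /** Stand-in for a parser configuration object such as PDFParserConfig. */
    static class SketchPdfConfig {
        private float spacingTolerance = 0.5f; // arbitrary default chosen for this sketch

        void setSpacingTolerance(float spacingTolerance) { this.spacingTolerance = spacingTolerance; }

        float getSpacingTolerance() { return spacingTolerance; }
    }

    /** Stand-in for a type-keyed context: holds at most one value per class key. */
    static class SketchParseContext {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) { entries.put(key, value); }

        /** Returns the stored value for the key, or null if nothing was set. */
        <T> T get(Class<T> key) { return key.cast(entries.get(key)); }
    }

    /** Stand-in for the transformer: a config setter plus a context-building hook. */
    static class SketchTransformer {
        private SketchPdfConfig pdfParserConfig; // stays null until the setter is called

        void setPdfParserConfig(SketchPdfConfig pdfParserConfig) { this.pdfParserConfig = pdfParserConfig; }

        /** Builds a fresh context and includes the config only if one was supplied (assumed behaviour). */
        SketchParseContext buildParseContext() {
            SketchParseContext context = new SketchParseContext();
            if (pdfParserConfig != null) {
                context.set(SketchPdfConfig.class, pdfParserConfig);
            }
            return context;
        }
    }

    public static void main(String[] args) {
        SketchPdfConfig config = new SketchPdfConfig();
        config.setSpacingTolerance(0.3f);

        SketchTransformer transformer = new SketchTransformer();
        transformer.setPdfParserConfig(config);

        // A parser handed this context can look the config up by its class.
        SketchParseContext context = transformer.buildParseContext();
        System.out.println(context.get(SketchPdfConfig.class).getSpacingTolerance());
    }
}
```

In this sketch the config is injected once through the setter and then travels with every context the transformer builds, which is why a setting like spacingTolerance would apply to all transformations performed by that transformer instance rather than to a single call.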
Copyright © 2005–2017 Alfresco Software. All rights reserved.