Changes from 1.41.0 to 1.42.0

Core

* Added the attribute `extType` on the `AltTranslation` class (used for example in XTM XLIFF files).
* Major refactor of `syncronizeCodeIds` and `alignAndCopyCodeMetadata`
* Major refactor of all core Okapi resources to consistently handle `Properties` and `IAnnotations`
* Add `TextPart.whiteSpaceStrategy` in order to preserve whitespace handling in original formats (xliff2 specifically)
* Deprecate `ILayerProvider`; it is scheduled for removal in the next release

Connectors

  • MyMemory

    • Actually use the key parameter to get results from your own translation memories created through the web site.
    • Option to send an email to get more quota.
    • Removed the sending of the IP address, which is no longer useful.
    • Use the max_hits parameter when querying the service.
    • Set creation_date attribute in results.
    • Streamlined internal logic.
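
As an illustration of the MyMemory changes above, here is a minimal sketch of how a query URL carrying the key, the email and the max_hits value could be assembled. This is not the connector's actual code: the class and method names are hypothetical, and the endpoint and the `q`, `langpair` and `de` parameter names are assumptions based on the public MyMemory API, not something stated in these notes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Okapi.
public final class MyMemoryQuerySketch {

    // Assumption: public MyMemory "get" endpoint.
    static final String ENDPOINT = "https://api.mymemory.translated.net/get";

    private MyMemoryQuerySketch() {}

    /**
     * Builds a query URL. The key, email and maxHits parts are optional:
     * pass null (or 0 for maxHits) to leave them out.
     */
    public static String buildUrl(String text, String srcLang, String trgLang,
                                  String key, String email, int maxHits) {
        StringBuilder sb = new StringBuilder(ENDPOINT);
        sb.append("?q=").append(enc(text));
        sb.append("&langpair=").append(enc(srcLang + "|" + trgLang));
        if (key != null && !key.isEmpty()) {
            sb.append("&key=").append(enc(key));    // access to your own TMs
        }
        if (email != null && !email.isEmpty()) {
            sb.append("&de=").append(enc(email));   // assumed name of the email parameter
        }
        if (maxHits > 0) {
            sb.append("&max_hits=").append(maxHits); // cap on the number of results
        }
        return sb.toString();
    }

    // Form-style URL encoding (spaces become '+').
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

A call such as `MyMemoryQuerySketch.buildUrl("Hello world", "en", "fr", "abc", null, 5)` would leave out the email parameter and keep the other four.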

Filters

  • IDML Filter

  • Markdown Filter

    • Changed MIME type to text/markdown as officially registered with IETF RFC 7763 since 2016. The old MIME type was text/x-markdown.
    • Add the “Translate Indented Code Blocks” option to control extraction of indented code blocks, which had previously always been extracted.
  • OpenXML Filter

    • Issue #927: Alignment and RTL handling improved.
    • Issue #982: Worksheet inline strings extraction provided.
    • Issue #1010: When presentation slides are excluded or hidden, their related parts are now excluded or hidden as well.
    • Issue #1058: DrawingML text line break positioning fixed.
    • Issue #1059: The extraction of worksheet and row groups provided.
    • Issue #1060: Rows exclusion configuration provided.
    • Issue #1061: New columns exclusion configuration provided.
    • Issue #1062: Metadata rows and columns configuration provided.
    • Issue #1080: Documents processing with cross-structure revisions in tables fixed.
    • Issue #1083: The handling of multiple instructions in complex fields improved.
    • Issue #1085: Empty structural document tag content handling fixed.
    • Issue #1095: The processing of tables with blank rows at the end fixed.
    • Issue #1102: The merge of paragraphs with absent properties fixed.
  • XLIFF Filter

    • Issue #1018: Expose the cdataSubfilter option in the filter config UI.
  • XLIFF2 Filter

    • Add mrk tag support
    • Fix loss of roundtrip whitespace info
    • Fix loss of Segment id
    • Update XliffWriter to always output xml:space value
    • Add setTagType to MTag
    • Add full support for subtype and type
    • Fix merge bug with ignorable segments being misplaced after merge
  • XML Filter

    • Issue #1024: On merge, correctly escape markup inside CDATA sections that were extracted using the inlineCData option.
    • Added a PROP_XLIFF_FLAVOR property to the StartSubDocument object/event (triggered by <file>) indicating the flavor of the document.
    • Added a PROP_REPETITION property at the segment level indicating (for both SDL and XTM flavors) if the segment was marked as repetition.
  • PO Filter

    • Correctly detect plurals when the ‘Plural-Forms’ entry is split on two physical lines.
    • Decode escaped characters (\, ", tabs, newlines, carriage returns, etc.) in message ids and message strings upon reading and encode them back while writing. Unescaped characters are read unaltered but encoded while writing.
    • Default inline code finder rules do not capture escaped sequences anymore.
    • Update POWriter to use new encoder.
  • TS Filter

    • Added the ability to pick up comment and extracomment elements from TS files into annotations by default.
  • TMX Filter

    • Standardize mapping of TMX inline code ids to Okapi Code.id and Code.originalId
    • Fix various bugs with matching bpt and ept inline codes, especially when the codes overlap.
    • Simplify redundant code
  • SDLPackage Filter

    • Issue #1093: SDLXLIFF Files in sub-folders are now processed.
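
The PO Filter entry above on escaped characters can be pictured with the following minimal sketch. It is not the filter's or POWriter's actual encoder: the class name is hypothetical, and the set of escapes handled (backslash, double quote, tab, newline, carriage return) is an assumption taken from the wording of the note.

```java
// Hypothetical helper, not part of Okapi.
public final class PoEscapesSketch {

    private PoEscapesSketch() {}

    /** Reading: turn known escape sequences into the characters they stand for. */
    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // unescaped characters are read unaltered
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:   out.append('\\').append(next); // unknown escape: kept as-is
            }
        }
        return out.toString();
    }

    /** Writing: escape every character that needs it, whether or not it was escaped on input. */
    public static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\n': out.append("\\n");  break;
                case '\t': out.append("\\t");  break;
                case '\r': out.append("\\r");  break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, a message containing an unescaped double quote is read unchanged by `decode` and comes back escaped from `encode`, which matches the behavior described in the note.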

Libraries

  • Segmentation

    • Fixed a bug with the icu4j segmentation rules option. All icu4j rules should now work when combined with SRX rules.

Steps

  • Rainbow Translation Kit Merging Step

  • Simple TM Batch Leveraging Step

    • Issue #1015: Now both IQuery and ITMQuery connectors can be used with the step.
  • Text Modification Step

    • A larger set of ASCII characters is now replaced with Extended Latin characters.
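
To give an idea of what the Text Modification Step entry above refers to, here is a minimal sketch of replacing ASCII letters with Extended Latin look-alikes, as is commonly done for pseudo-translation. The class name and the mapping table are hypothetical and much smaller than whatever the step really uses.

```java
import java.util.Map;

// Hypothetical helper, not part of Okapi.
public final class ExtendedLatinSketch {

    // Illustrative mapping only; the actual set used by the step is not listed in these notes.
    private static final Map<Character, Character> LOOKALIKES = Map.of(
        'a', '\u00E0',  // a with grave
        'e', '\u00E8',  // e with grave
        'o', '\u00F6',  // o with diaeresis
        'u', '\u00FC',  // u with diaeresis
        'c', '\u00E7',  // c with cedilla
        'n', '\u00F1',  // n with tilde
        'A', '\u00C5',  // A with ring above
        'E', '\u00C9'); // E with acute

    private ExtendedLatinSketch() {}

    /** Replaces every mapped ASCII character and leaves all others untouched. */
    public static String replace(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char r = LOOKALIKES.getOrDefault(c, c);
            out.append(r);
        }
        return out.toString();
    }
}
```

Characters without an entry in the table, such as digits and punctuation, pass through unchanged, so the text keeps its length and remains readable.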

Connectors

  • Pensieve TM

    • Issue #837: Pensieve TM now uses Lucene 8.8 libraries, upgraded from 3.3.
    • Slight changes in the TM behavior are expected but the TM should largely behave similarly. Testing before production use is strongly encouraged.
    • Note: These public classes have been removed from okapi-lib-search: AlphabeticNgramTokenizer, ConcordanceFuzzyQuery, ConcordanceFuzzyScorer, FuzzySimilarity, SimpleConcordanceFuzzyQuery, SimpleConcordanceFuzzyScorer, SortableToken

Applications

  • Tikal

    • Moved sources from okapi to applications, maven groupId changed from net.sf.okapi to net.sf.okapi.applications.
  • Tikal & Rainbow

    • Using our own slf4j logger (so that we can change level, and show results in GUI).

OSes

  • Windows

    • Updated the launchers (.exe files) to use the java in JAVA_HOME or PATH, if available.
  • macOS

    • Added build for aarch64 (ARM 64 bit, Apple M1 chip)
  • Build

    • Cleaned and unified the various build scripts, for all platforms.
      Now deployment/maven contains only two scripts: build (with parameters, try build help), and clean (.bat and .sh versions).
    • We merged the integration tests from their own separate repository into the Okapi repo (integration-tests folder).