Blog Post: Heterogeneity in Context Extraction

 

Carlos Coutinho, Caixa Mágica Software

"Extracting information in TIMBUS is more than stating that the application shall be open-sourced, or that it shall be running in Java so that it is able to run in multiple environments and platforms. Each environment has its own needs and requirements, and may be supported by generically established tools and custom extraction mechanisms as well."

 


Market conditions force businesses to change constantly and to adapt to new environments, emerging paradigms and new solutions. The fast evolution of technological trends makes diversity stronger than ever, and these changes tend to grow in both complexity and impact. The motivations behind them are many: better performance, the ability to tackle new problems, the aim to address new markets, compatibility with different platforms, trends and fashions, specific requirements, and more. This heterogeneity makes it difficult for TIMBUS to capture automatically some of the knowledge and assets of a business. The context capture therefore needs to be very flexible and able to address different needs and requirements. It must cover open-source and proprietary environments, new and legacy applications, and be prepared to handle new platforms and systems, as well as different security and secrecy demands.

On the one hand, the information model needs to be flexible enough to handle the heterogeneity in concepts and business information. This has already been discussed in a previous blog post. On the other hand, capturing some of this information automatically into this model requires a context extraction architecture, which is challenging in its own right. The major difficulty is to have a structure flexible enough to access a business environment consisting of multiple information systems, computers and other devices, people, and other sources of information.

The approach taken by TIMBUS is an open architecture that can be extended to reach these information sources. It can retrieve information with more or less human intervention: although it is capable of accessing the information sources directly, it must also be able to receive information retrieved by other tools or gathered by people. This is relevant for handling some tricky security problems and for increasing confidence in the target systems. The proposed method is to develop generic extraction mechanisms to which numerous specifically implemented extractors, or adapters, can be attached; these adapters then retrieve the information correctly.
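To make the idea of a generic mechanism with attachable adapters more concrete, here is a minimal sketch in Python. It is not the TIMBUS implementation: the names `Fact`, `Extractor`, `ExtractionFramework` and both example adapters are hypothetical, and the data is hard-coded for illustration only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One extracted statement about the environment (hypothetical shape)."""
    subject: str
    attribute: str
    value: str
    source: str  # name of the adapter that produced the fact


class Extractor(ABC):
    """Adapter for one kind of information source."""
    name: str

    @abstractmethod
    def extract(self) -> list:
        """Return a list of Fact objects."""


class InstalledPackagesExtractor(Extractor):
    """Automated adapter; a real one would query the target machine."""
    name = "packages"

    def extract(self) -> list:
        packages = {"openjdk": "1.7", "postgresql": "9.1"}  # illustrative data
        return [Fact("host-a", f"package:{pkg}", version, self.name)
                for pkg, version in packages.items()]


class ManualReportExtractor(Extractor):
    """Adapter for information gathered by a person or by an external tool."""
    name = "manual"

    def __init__(self, rows):
        self.rows = rows  # iterable of (subject, attribute, value) tuples

    def extract(self) -> list:
        return [Fact(s, a, v, self.name) for s, a, v in self.rows]


class ExtractionFramework:
    """Generic mechanism: knows nothing about sources, only about adapters."""

    def __init__(self):
        self._extractors = {}

    def register(self, extractor: Extractor) -> None:
        self._extractors[extractor.name] = extractor

    def run(self) -> list:
        facts = []
        for extractor in self._extractors.values():
            facts.extend(extractor.extract())
        return facts


framework = ExtractionFramework()
framework.register(InstalledPackagesExtractor())
framework.register(ManualReportExtractor([("host-a", "owner", "finance")]))
facts = framework.run()
```

The point of the design is that supporting a new platform means writing one more adapter class, while the framework itself stays unchanged.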

As some of the target implementations require secrecy, some of these adapters can be developed in-house in a controlled environment, or even be run by the target users themselves, who then communicate the extraction results to the extracting environment. That environment models the knowledge coming from automated and manual tools into a context, which can then be analysed and queried for risk analysis and digital preservation.
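One way to picture this hand-over is a small exchange format: the adapter runs on the target side and exports its results, and the extracting environment validates and merges them. The sketch below assumes a JSON payload; the format tag `example-extraction-v1`, the field names and the sample values are all invented for illustration and are not part of TIMBUS.

```python
import json

FORMAT_TAG = "example-extraction-v1"  # hypothetical format identifier


def export_results(facts):
    """Run on the target side: serialise extracted facts for hand-over."""
    return json.dumps({"format": FORMAT_TAG, "facts": facts}, sort_keys=True)


def import_results(payload, context):
    """Run in the extracting environment: validate the payload, then merge it."""
    document = json.loads(payload)
    if document.get("format") != FORMAT_TAG:
        raise ValueError("unknown payload format")
    for fact in document["facts"]:
        missing = {"subject", "attribute", "value"} - fact.keys()
        if missing:
            raise ValueError(f"fact is missing fields: {sorted(missing)}")
        # The context is modelled here as a plain nested dictionary.
        context.setdefault(fact["subject"], {})[fact["attribute"]] = fact["value"]
    return context


# The target users run their adapter and send only the resulting payload.
payload = export_results(
    [{"subject": "db-server", "attribute": "os", "value": "Debian 7"}]
)
# The extracting environment merges it with what it already knows.
context = import_results(payload, {"db-server": {"role": "database"}})
```

Only the payload crosses the boundary, so the target organisation can inspect exactly what leaves its controlled environment before handing it over.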

Figure: context acquisition

Hence, extracting information in TIMBUS is more than stating that the application shall be open-sourced, or that it shall be running in Java so that it is able to run in multiple environments and platforms. Each environment has its own needs and requirements, and may be supported by generically established tools and custom extraction mechanisms as well.

The extraction architecture will support the injection of code into target machines and, additionally, the establishment of adapters and workflow mechanisms to retrieve information from other sources. All this information is properly “digested” and modelled into the TIMBUS information model. A set of reports also needs to be made available, covering modelling and data inconsistencies, duplication and conflicts in information, and similar issues.
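As an illustration of what such a report could look for, the following sketch groups extracted facts by subject and attribute, then flags duplicates (the same value reported more than once) and conflicts (different values for the same attribute). The tuple layout, the function name `build_report` and the sample data are assumptions made for this example.

```python
from collections import defaultdict


def build_report(facts):
    """Flag duplicated and conflicting facts.

    Each fact is a (subject, attribute, value, source) tuple.
    """
    seen = defaultdict(list)
    for subject, attribute, value, source in facts:
        seen[(subject, attribute)].append((value, source))

    report = {"duplicates": [], "conflicts": []}
    for key, entries in sorted(seen.items()):
        distinct_values = {value for value, _ in entries}
        if len(distinct_values) > 1:
            # Sources disagree about the value of the same attribute.
            report["conflicts"].append((key, sorted(entries)))
        elif len(entries) > 1:
            # Several sources report the same value.
            report["duplicates"].append((key, sorted(entries)))
    return report


facts = [
    ("host-a", "os", "Debian 7", "scanner"),
    ("host-a", "os", "Debian 7", "manual"),
    ("host-a", "java", "1.6", "scanner"),
    ("host-a", "java", "1.7", "manual"),
    ("host-a", "owner", "finance", "manual"),
]
report = build_report(facts)
```

With this sample data, the `java` attribute of `host-a` is reported as a conflict and its `os` attribute as a duplicate, while `owner` appears in neither list.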
