The goal of the project is the development of an intelligent system that can improve the detection of human cognitive and behavioral disorders. An example is dementia: the loss of intellectual functions, accompanied by personality disturbances (its most common form is Alzheimer's disease) and a deterioration of the individual's quality of personal and professional life. Over the last decades, an abundance of medical data has been gathered about dementia. These data include multimedia objects (images, sound, etc.), temporal data and time series, and text documents; their analysis requires state-of-the-art methods from the field of data analysis. The expected project contributions include a prototype of a prediction system for the analysis of such disorders, generalized to several related health problems (such as brain ischemia, brain haemorrhage, brain tumors, neurodegenerative disorders, etc.), and an evaluation of the prototype on the available dementia data acquired from the Neurological Institute (Novi Sad) and the Oncological Institute (Ljubljana).
Coronary artery disease, also known as ischemic heart disease, frequently results in a heart attack and is the most common cause of death globally. Ischemia (a restriction of blood supply to tissues) is generally caused by problems with blood vessels, with resultant damage and dysfunction of tissue. Currently, the application of state-of-the-art Finite Element Method (FEM) ischemia modelling techniques in urgent clinical situations, such as alerting a patient in case of ischemia, is not possible, as it requires knowledge of patient-specific anatomy, which in turn requires lengthy measurements. We will develop a two-step solution which alleviates these problems: 1. creation of a large virtual FEM-based database, 2. detection of ischemic beats and prediction of the ischemia location using the developed database and data mining. The final solution will select classifiers which are successful in detecting ischemic beats based on potentials measured on the body surface (i.e., ECG), which will allow fast ischemia detection in urgent situations. If the ECG is classified as ischemic by the first-stage classifier, it will be processed by the second-stage data mining model, which will predict the location of the ischemic area.
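The two-step design described above can be sketched as a cascade of two classifiers. The data, features and random-forest choice below are illustrative stand-ins, not the project's actual models:

```python
# Hypothetical sketch of a two-stage cascade: stage 1 flags ischemic beats
# from surface-potential (ECG) features; stage 2 localizes the ischemic area
# only for beats flagged by stage 1. All data and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                       # stand-in ECG feature vectors
is_ischemic = (X[:, 0] + X[:, 1] > 0).astype(int)   # stage-1 labels (ischemic yes/no)
location = (X[:, 2] > 0).astype(int)                # stage-2 labels (region index)

stage1 = RandomForestClassifier(random_state=0).fit(X, is_ischemic)
mask = is_ischemic == 1                              # stage 2 is trained on ischemic beats only
stage2 = RandomForestClassifier(random_state=0).fit(X[mask], location[mask])

def predict(beat):
    """Return None for a non-ischemic beat, else the predicted region index."""
    if stage1.predict(beat.reshape(1, -1))[0] == 0:
        return None
    return int(stage2.predict(beat.reshape(1, -1))[0])
```

The cascade keeps the fast binary detector on the critical path, so the costlier localization model runs only when needed.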
The aim of the project is to develop a prototype system for automated video measurement of ski jump distances on smaller ramps (up to K = 50 m). The system will have simple hardware requirements (one video system and a laptop) and will enable accurate measurements in real or near-real time (with a permissible delay of 4-6 seconds). During the development phase, the system will be evaluated at ski jumping practices, and later at competitions of younger age groups at regional, national and international level (Cockta cup, Alpe-Adria cup, etc.).
Research topics include: machine learning and knowledge discovery, estimating the reliability of multi-target predictions and the reliability of unsupervised learning, the adaptation of archetypal analysis for various machine learning subtasks and text mining, deep learning and matrix factorization, generation of (semi-)artificial data, evolutionary learning, recommender systems for e-learning, automatic essay grading, web user profiling, modeling of sport games, statistical modeling of the sports-betting market, and analysis of oceanographic spatio-temporal data. The research is constantly motivated by practical uses and concrete applications.
ARRS Applied Research Project. Development of new algorithms for statistical computation, their speed-up on graphics cards, and applications in various fields. With the Faculty of Arts, Faculty of Sport, Geographical Institute and Optilab LLC.
With ARSO (Slovenian Environment Agency).
Application of quantitative analysis methods to various problems. With Garex LLC.
Basketball shot location data acquisition, analysis and visualization. With Starta LLC.
Statistical analysis and re-calibration of particulate matter measuring devices. With ARSO (Slovenian Environment Agency).
Statistical fraud detection in health insurance claims, statistical consulting and knowledge transfer. With Optilab LLC.
The center is a research unit of the University of Ljubljana, intended for scientific research on language and for the creation and maintenance of practically useful digital language resources and technologies for modern Slovene, available on the web to all users of the Slovene language. The center is organized within the ARRS-financed Network of research infrastructure centers at the University of Ljubljana. The collaborating partners in the center are the Faculty of Social Sciences, Faculty of Arts, Faculty of Education, Faculty of Electrical Engineering, and Faculty of Computer and Information Science.
Research areas of the center include the description of modern Slovene from all points of view, and computer-aided learning and teaching of Slovene and foreign languages. Practically oriented tasks include continuous and user-friendly access to corpora and to lexical, terminological and other databases, creation and maintenance of web-based language learning and teaching environments, and distribution of publicly financed and open-source language resources and tools. An important aim of the center is to provide publicly available information about Slovene language resources and language technologies in Slovenia, in order to enhance their public perception and to support the dissemination of language resources and tools.
Gigafida is a reference corpus of the Slovene language containing Slovene texts from daily newspapers, magazines, all sorts of books, web pages, transcripts of parliamentary speeches, etc., altogether around 1.2 billion words in 40,000 documents. It is the basis for the balanced corpus Kres and for the freely available corpora ccGigaFida and ccKres. Currently these corpora contain documents created up to 2012. More information about the corpora is available at http://www.slovenscina.eu/korpusi/.
The project to upgrade these corpora has three goals: collecting new material, machine processing of new and existing documents, and public availability of the upgraded corpora, their distribution and public dissemination. The collection of new material will focus on currently underrepresented texts (such as textbooks and other primary- and secondary-school materials), news portals and daily newspapers. The aim is to increase the Gigafida corpus to 1.5 billion words. Machine processing will automatically tag all the documents in a uniform way and store them in a standardized format. The documents will also be deduplicated. The updated corpora will be publicly available through concordancers in the CLARIN infrastructure and presented to the general public and the professional community.
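The deduplication step can be illustrated with a minimal hash-based filter for exact duplicates (an assumption for illustration; corpus pipelines of this kind typically also detect near-duplicates, e.g. with shingling):

```python
# Minimal sketch of document deduplication by content hashing. This covers
# only exact duplicates; the actual corpus pipeline is not specified here.
import hashlib

def deduplicate(documents):
    """Keep the first occurrence of each distinct document text."""
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["Novica o volitvah.", "Novica o volitvah.", "Prispevek o šoli."]
unique_docs = deduplicate(docs)   # two distinct documents remain
```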
In practice, m-learning is not necessarily tied to a classroom, students and teaching, but extends to alternative forms of knowledge acquisition, such as tourist applications for discovering unseen places. Within the project titled Tourist guide based on the “Treasure hunt” game, the students developed the educational game FRIStep (based on its predecessor, the game Geostep). The game is based on the principle of a »treasure hunt«, where the »hidden treasure« is an educational riddle that teaches the player various facts (e.g. about sights and places to see). Within the project, the basic game was upgraded with multilingual support, knowledge evaluation and scoring, use of artificial intelligence for recommending the most appropriate game, different game types and a modern user interface.
One of the responsibilities of our civilization is to increase the quality of life of blind and weak-sighted people using modern technology. Daily activities which seem trivial for healthy subjects, such as finding a required destination, using various objects, safe navigation along complex paths, recognizing the local environment and orienting appropriately in it, or contextual comprehension of the environment, are very demanding and often unsolvable tasks for a blind or weak-sighted subject. Most approaches to solving these problems are based on GPS technology combined with inertial sensors and cameras. A basic limitation is the informational complexity of unmarked environments, which can be avoided by providing a small amount of useful data with respect to the current position, orientation and activity that a blind or weak-sighted subject wants to perform (such as crossing the road). We expect that the use of machine learning methods, i.e. supervised and unsupervised learning on imbalanced datasets, in combination with GPS, inertial sensors, cameras and an audio-rendering system, will successfully solve the problems encountered by existing approaches to assisting the movement of blind and weak-sighted people.
The goal of the proposed project is the development of appropriate machine learning algorithms that form the basis for the future development of a modular mobile navigation system which will enable blind and weak-sighted people to move autonomously, safely and accurately in known and unknown environments.
Cancer research is currently one of the leading fields of clinical research. One major issue in this field is cancer classification for accurate diagnosis and treatment; another is utilizing the increasing amount of raw microarray data available. Up to now, the majority of cancer classification studies have been based on the patient's overall clinical picture, including histological findings at the tissue level, resulting in very limited diagnostic precision. Many existing tumor classes are heterogeneous, molecularly distinct and follow different clinical courses. Therefore, a differential diagnosis among a group of histologically similar cancers poses a challenging problem in clinical medicine.
The extensive use of DNA microarray technology for the characterization of cellular processes is leading to an increasing amount of microarray data from cancer studies. Today, a tremendous amount of array data is available; however, much of it remains raw, and only a small percentage of its potential is being utilized. Cancer classification based on gene expression analysis derived from microarray data is a way to utilize this raw data in order to obtain the most accurate cancer diagnosis.
The task of the project is to prepare an experimental data set describing patients with a known cancer type and known exon mutations, to supplement this database with negative examples (healthy patients), and to additionally enhance it using controlled sampling. The enhanced database will then be analysed with machine learning algorithms.
Mobile learning (m-learning) is developing rapidly around the world as the use of mobile devices becomes popular and accessible both financially and technologically. At the end of 2012, there were more than 6.8 billion mobile subscriptions in the world, corresponding to 96% of the world population (source: The International Telecommunication Union, 2013). The contemporary way of living allows students to be persistently connected to the Internet, which enables them to seek untraditional learning environments: dislocated from the classroom, flexible in time, ubiquitously available and tailored to advances in technology. However, regardless of the development strategies for the period up to the year 2016, in practice the development of m-learning applications is based mostly on the enthusiasm of individual educational institutions or teachers. In this bilateral cooperation we propose to establish the first standard theoretical/educational framework for the development of game-based learning, and to develop and evaluate a set of sample game-based learning applications.
AgroIT is an EU-funded project that will implement an open platform based on open standards. The project will deliver applications and services to various stakeholders: farmers, local communities, state institutions, farming consultancy institutions (both government-funded and private) and EU institutions. The integrated platform will enable farmers to get all the applications they need: an ERP for SMEs (with full accountancy functionality), mobile applications for easier data entry and report review, a decision support system for better farm management, and automatic data collection through sensors and other devices on the farm.
An important task of electricity distribution companies is to forecast the electrical load (demand) for a given sub-network of consumers, as this is relevant for the identification of critical points and for decisions to buy or sell energy. From a data mining perspective, this problem is characterized by a large number of variables (data from the electrical sensors), produced in a continuous flow in a dynamic, non-stationary environment. Standard prediction techniques fail in such circumstances, and more sophisticated dynamic models are required: models that evolve over time and are able to adapt to changes in the distribution generating the examples. Besides forecasting the electricity load itself, the prediction should also contain an explanation of the phenomenon, and the quality of the predictor must be clearly assessed with prediction reliability estimates in order to support the user's final decision.
In the framework of the project, both research groups will aim to develop a decision explanation methodology for concept drift detection in data stream learning, develop and test reliability estimates suitable for data streams, develop online summarization techniques for electrical time series, and study the utility of reliability estimates and decision explanations in data streams. The deliverables will ensure better predictions and more robust decisions in electrical network management.
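One simple mechanism of the kind studied here can be sketched as follows: flag concept drift when the model's recent error rate clearly exceeds its long-run error rate. The window size and thresholds below are invented for the example, not the project's method:

```python
# Hedged sketch of a window-based concept-drift signal for stream learning.
from collections import deque

class DriftMonitor:
    def __init__(self, window=50, factor=2.0):
        self.recent = deque(maxlen=window)  # sliding window of recent errors
        self.errors = 0                     # long-run error count
        self.total = 0
        self.factor = factor

    def update(self, predicted, actual):
        """Record one prediction; return True when drift is suspected."""
        err = int(predicted != actual)
        self.recent.append(err)
        self.errors += err
        self.total += 1
        long_run = self.errors / self.total
        recent = sum(self.recent) / len(self.recent)
        # Drift: the recent window is markedly worse than the long-run rate.
        return (self.total > len(self.recent)
                and recent > self.factor * max(long_run, 0.05))

monitor = DriftMonitor()
# Stable phase (predictions correct), then the concept shifts (all wrong).
stream = [(1, 1)] * 100 + [(1, 0)] * 50
flags = [monitor.update(p, a) for p, a in stream]
```

Production stream learners pair such a signal with model adaptation, e.g. retraining on the recent window once drift is flagged.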
Coronary heart disease (CHD) is one of the world's most frequent causes of mortality and an important problem in medical practice. It is a disease in which one, two or all three coronary arteries are narrowed or obstructed, mainly by atherosclerotic plaque(s). The consequence is a diminished blood supply, causing diminished oxygen supply to the dependent region of the myocardium and manifesting as angina pectoris. The most extreme consequences are myocardial infarction and cardiac death.
The objective of the project is collaboration on the deployed plaque characterization algorithms and on a patient-specific prediction model, which can be enhanced with knowledge accumulated from heterogeneous data and resources. Blood element concentrations (e.g. LDL/HDL), the patient's medical condition (e.g. diabetes, hypertension) and the patient's habitual behavior (e.g. smoking/non-smoking, exercising/non-exercising) will be correlated with features extracted from the patient's medical images (e.g. MRI) and used for plaque characterization. The corresponding data will be situated in different repositories and even different systems. Both research groups will implement a treatment support system that will improve the quality of medical services by providing cardiologists with suggestions on the best possible treatment scenario. The resulting high-end expert system has potential for application in medical practice (cardiology).
An important group of methods solving these problems is based on constructing ensembles of base learners. Bagging, boosting, random forests and their variants are the most popular examples of this methodology, and due to their classification strength and efficiency they belong to the state-of-the-art methods in classification and regression. When constructing an efficient prediction model, these methods evaluate the predictive power of descriptive attributes and construct a recursive partitioning of the problem space. The aim of our cooperation is to develop and test a new class of distance-based and uniform-distribution-based attribute evaluation measures which are better suited to the problem of imbalanced class distributions. Additionally, we will investigate their multiclass extensions. For testing the proposed methods we will design complex artificial data sets covering a wide range of configurations of differently classified regions. The developed attribute evaluation measures will be used in a state-of-the-art learning system based on random forests.
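As a baseline illustration of the imbalanced-class setting with random forests (not the project's new evaluation measures), classes can be re-weighted inversely to their frequency; the synthetic data below are an assumption for the example:

```python
# Illustrative baseline for imbalanced classification with random forests:
# re-weight classes inversely to their frequency via class_weight="balanced".
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(1)
# Synthetic imbalanced data: only a few percent of examples are positive.
X = rng.normal(size=(1000, 5))
y = ((X[:, 0] > 1.6) & (X[:, 1] > 0)).astype(int)

forest = RandomForestClassifier(class_weight="balanced", random_state=1).fit(X, y)
# Balanced accuracy averages per-class recall, so the minority class counts
# as much as the majority class in the evaluation.
score = balanced_accuracy_score(y, forest.predict(X))
```

New attribute evaluation measures of the kind proposed above would plug into the split selection of such a forest, replacing the default impurity criterion.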
The above scientific goals require fast and efficient data mining algorithms and good exploratory data analysis tools. In the data mining community, the open-source R statistical environment is attracting more and more attention for its wide applicability, availability, ease of use and good visualization capabilities. Our aim in this project is to develop new machine learning algorithms in this environment and thereby make them readily available for further scientific and commercial exploitation.
We expect that this joint research proposal will produce valuable research and practical results of interest to the data mining and machine learning community and to practitioners solving complex problems involving imbalanced data sets, e.g. in medicine, engineering, finance, the public sector, etc.
Research topics: machine learning, data mining, evolutionary computing, search algorithms, constraint programming, combinatorial optimization, methods for qualitative reasoning, applications of machine learning in biomedical informatics, intelligent agents and the semantic web. The selection of these topics is motivated by the possibilities of their use and application in practice.
Intensio Computer Engineering Ltd.
The main aim of the project is to take advantage of the complementary experience in machine learning that the two groups possess and to produce a synergetic effect. The exchange of knowledge, experience and know-how is certainly very important here. In the following, we indicate the present expertise of the Portuguese and Slovene side with P and S, respectively. The scientific goals of the project are to pursue certain topics in machine learning and data mining, in particular: to investigate current machine learning approaches to learning probabilities (P), improve them with ensemble techniques (S) and extend them to new application areas in medicine, economy and the analysis of web data (P and S); to develop incremental mining techniques for the selection of useful information in data streams (P) and test the algorithms on the problem of profiling web site users (S); and to research and develop innovative dimensionality reduction techniques for large databases (S) and apply them to clickstream analysis (S) and to a web-based tutoring system (P).
Machine learning algorithms usually provide only bare predictions (classifications) for new, unclassified examples (test cases). While for almost all machine learning algorithms there are ways to at least partially provide a quantitative assessment of a particular classification, so far there has been no general method to assess the quality (confidence, reliability) of the classification decision made for a new case. In this collaborative research project we elaborated on the very important issue of how to assess the performance of a classifier on a single case, rather than its average performance on a certain set of cases, and solved some of the problems so that the method came closer to being generally applicable.
The aim of the cooperation in this research programme was the development of methods for evaluating the reliability of predictions in classification and regression and their incorporation into existing machine learning systems; the development of methods for the parametrization of images and for image mining at different levels of abstraction, and their application to various medical diagnostic problems (such as whole-body scintigraphy and scintigraphy of coronary vessels); the development of algorithms for feature evaluation in classification and regression that take into account various interactions between attributes, and their integration into existing machine learning and data mining systems; the development and practical implementation of a methodology for connecting machine learning and databases; and the development of methods for explaining predictions in classification and regression, together with various applications of the developed methods (for example in medical diagnosis).
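One widely used family of per-case reliability estimates, close in spirit to the methods developed in this line of work, measures the spread of an ensemble's individual predictions: the larger the disagreement among ensemble members on a case, the lower the reliability. The model and data below are illustrative stand-ins:

```python
# Sketch of a per-case reliability estimate for regression: the standard
# deviation of the individual trees' predictions in a random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)   # noisy synthetic target

model = RandomForestRegressor(n_estimators=50, random_state=2).fit(X, y)

def predict_with_reliability(x):
    """Return (prediction, spread of the trees' predictions for this case)."""
    per_tree = np.array([t.predict(x.reshape(1, -1))[0]
                         for t in model.estimators_])
    return per_tree.mean(), per_tree.std()

pred, spread = predict_with_reliability(X[0])
```

Such a spread can accompany every single prediction, instead of reporting only the model's average accuracy over a test set.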
With the University of Hasselt, Belgium.
The cooperation with the group of Prof. Koen Vanhoof focused on a context-sensitive approach to the analysis of ordered features, applicable to survey data and especially relevant to marketing research. We applied the approach to customer satisfaction research and country-of-origin data. An important aspect of this approach is visualization and the extension of the results with confidence intervals.
Principal investigator: Marko Robnik Šikonja
Many important practical problems in intelligent data analysis assume that not all outcomes are equally important, i.e., that they have different costs assigned. Inductive learning under this assumption is currently a hot research topic.
The Relief family of algorithms is among the best algorithms for attribute estimation, and its members have been successfully used in many machine learning tasks. Until now, these algorithms had not been adapted for cost-sensitive classification; such an adaptation, together with its implementation in a system for intelligent data analysis, significantly improved the success of solving cost-sensitive problems. In this project we analysed various extensions and adaptations of the ReliefF algorithm for cost-sensitive intelligent data analysis. The theoretical derivations and analyses were implemented in an open-source system for intelligent data analysis which we adapted for cost-sensitive problems. We tested the solutions on several artificial and real-world problems.
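The core Relief idea that ReliefF and its cost-sensitive adaptations build on can be sketched for the two-class, numeric case (a simplified illustration, not the project's implementation): an attribute's weight grows when it differs on the nearest example of the other class (nearest miss) and shrinks when it differs on the nearest example of the same class (nearest hit).

```python
# Minimal sketch of basic Relief attribute estimation (two classes,
# numeric attributes, one nearest hit and one nearest miss per example).
import numpy as np

def relief(X, y):
    n, m = X.shape
    span = X.max(axis=0) - X.min(axis=0)      # normalize attribute differences
    span[span == 0] = 1.0
    weights = np.zeros(m)
    for i in range(n):
        diffs = np.abs(X - X[i]) / span       # per-attribute diffs to all examples
        dist = diffs.sum(axis=1)
        dist[i] = np.inf                      # exclude the example itself
        hit = np.where(y == y[i], dist, np.inf).argmin()   # nearest same class
        miss = np.where(y != y[i], dist, np.inf).argmin()  # nearest other class
        weights += diffs[miss] - diffs[hit]
    return weights / n

# Attribute 0 determines the class exactly; attribute 1 is pure noise.
rng = np.random.default_rng(3)
X = np.column_stack([rng.integers(0, 2, 200).astype(float),
                     rng.uniform(size=200)])
y = X[:, 0]
w = relief(X, y)   # w[0] should dominate w[1]
```

A cost-sensitive adaptation would replace the symmetric hit/miss update with one weighted by the misclassification costs of the classes involved.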
Basic research project funded by Slovenian Ministry of Education, Science and Sports.
Generali SKB Insurance company.
Basic research project funded by Slovenian Ministry of Science and Technology.