Project Description / Goal

The project focuses on improving Anti-Money Laundering (AML) processes for Buffetti Finance using automation and machine learning. The existing manual AML processes, which rely on Excel, are inefficient and error-prone. The goal is to apply machine learning techniques, particularly the Isolation Forest algorithm, to anomaly detection in financial transactions, thereby increasing accuracy, efficiency, and scalability. The project also aims to integrate data analytics tools such as Microsoft Power BI and to explore the use of Microsoft Fabric for real-time data management and scalability.

Technology Stack

The technology stack includes:
Data Processing and Analytics: Python scripts and libraries for machine learning, particularly the Isolation Forest algorithm.
Visualization: Microsoft Power BI for creating dashboards and visualizing anomalies.
Data Management: Consideration of Microsoft Fabric for unified data storage (OneLake), automated pipelines, and integration of machine learning models.
Cloud Services: Azure services such as Azure Data Factory and Azure Synapse Analytics for data pipeline automation and scalability.

Project Planning / Architecting

Planning: The project involved six months of transactional data analysis from Buffetti's 13 transaction tables. Key attributes, such as transaction location, date, amount, and payment method, were identified and consolidated into a streamlined dataset.

Architecting: The system architecture supports anomaly detection through Python-based machine learning algorithms, integrated with Power BI for visualization. The potential addition of Microsoft Fabric aims to enhance scalability and real-time anomaly detection capabilities.
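The consolidation step described under Planning can be sketched roughly as follows. This is a minimal illustration using pandas, not the project's actual script: the column names (`location`, `date`, `amount`, `payment_method`), the `consolidate` helper, and the two small example tables are all assumptions made up for the example, since the real schema of the 13 tables is not given here.

```python
import pandas as pd

# Hypothetical column names: the project names these attributes,
# but the real table schema is an assumption here.
KEY_COLUMNS = ["location", "date", "amount", "payment_method"]


def consolidate(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack several transaction tables into one dataset of key attributes."""
    frames = []
    for name, table in tables.items():
        # reindex keeps only the key columns and fills absent ones with NaN,
        # so tables with inconsistent attributes still line up.
        frame = table.reindex(columns=KEY_COLUMNS).copy()
        frame["source_table"] = name  # remember where each row came from
        frames.append(frame)
    combined = pd.concat(frames, ignore_index=True)
    # Normalise types; unparseable values become NaT / NaN.
    combined["date"] = pd.to_datetime(combined["date"], errors="coerce")
    combined["amount"] = pd.to_numeric(combined["amount"], errors="coerce")
    # Rows without a usable amount or date cannot be scored later.
    return combined.dropna(subset=["amount", "date"]).reset_index(drop=True)


# Two tiny made-up tables standing in for the real transaction tables.
tables = {
    "table_a": pd.DataFrame({
        "location": ["Rome", "Milan"],
        "date": ["2024-01-05", "2024-01-06"],
        "amount": [120.0, 95.5],
        "payment_method": ["card", "cash"],
        "extra": [1, 2],  # dropped: not a key attribute
    }),
    "table_b": pd.DataFrame({
        "location": ["Turin"],
        "date": ["2024-02-01"],
        "amount": ["300"],  # stored as text, no payment_method column
    }),
}

dataset = consolidate(tables)
print(dataset)
```

Keeping a `source_table` column is a design choice for traceability: a flagged row can be traced back to the table it came from.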

Project Journal / History

Initial Question: How can automation and machine learning improve the efficiency of Buffetti’s AML processes compared to manual methods?

Discovery: Manual AML methods were found to be time-intensive and error-prone, prompting a shift toward anomaly detection via machine learning.

Implementation: The Isolation Forest algorithm was selected for its efficiency in identifying outliers in complex datasets. Challenges in data preprocessing and system integration were addressed.

Outcomes: A Power BI dashboard was created to present anomalies visually. Recommendations included adopting Microsoft Fabric to further optimize data handling and real-time monitoring.
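To make the Implementation step more concrete, here is a minimal sketch of Isolation Forest anomaly detection using scikit-learn's `IsolationForest`. It is an illustration under stated assumptions rather than the project's code: the data is synthetic, the features (`amount`, `hour`) are hypothetical, and the `contamination` value is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 500 ordinary amounts
# plus 5 unusually large ones.
amounts = np.concatenate([
    rng.normal(100, 20, 500),
    rng.normal(5000, 500, 5),
])
transactions = pd.DataFrame({
    "amount": amounts,
    "hour": rng.integers(8, 20, size=amounts.size),  # hypothetical feature
})

features = transactions[["amount", "hour"]].to_numpy()

# contamination is the expected share of anomalies; 1% is an assumed value.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)

# fit_predict returns -1 for outliers and 1 for inliers.
transactions["label"] = model.fit_predict(features)
# score_samples: the lower the score, the more anomalous the row.
transactions["score"] = model.score_samples(features)
transactions["is_anomaly"] = transactions["label"] == -1

# Most anomalous rows first.
flagged = transactions[transactions["is_anomaly"]].sort_values("score")
print(flagged)
```

A table like `flagged` could then be written out (for example with `DataFrame.to_csv`) and loaded into a dashboard; how the project actually passed results to Power BI is not specified here.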

Obstacles

Key challenges faced during the project include:
Regulatory Feedback Absence: The lack of feedback from regulators hindered model validation and refinement.
Data Complexity: Handling 13 transaction tables with inconsistent attributes required significant preprocessing.
Technology Limitations: Power BI struggled with large datasets, necessitating data consolidation and compression.
Validation Challenges: Without real-time deployment or confusion matrix validation, the model’s practical accuracy could only be assessed to a limited extent.
Operational Costs: Initial costs for implementing automation and machine learning systems were not fully explored.
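For context on the validation gap noted above, the following sketch shows what a confusion matrix check could compute if confirmed labels (for example, from regulator or analyst feedback) were available. No such labels existed in the project; the `actual` and `predicted` lists are invented purely for illustration, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for boolean 'is suspicious' labels."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(True, True)],    # suspicious and flagged
        "fp": counts[(False, True)],   # legitimate but flagged
        "fn": counts[(True, False)],   # suspicious but missed
        "tn": counts[(False, False)],  # legitimate and not flagged
    }


def precision_recall(c):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    flagged = c["tp"] + c["fp"]
    suspicious = c["tp"] + c["fn"]
    precision = c["tp"] / flagged if flagged else 0.0
    recall = c["tp"] / suspicious if suspicious else 0.0
    return precision, recall


# Made-up labels: 'actual' would come from confirmed cases,
# 'predicted' from the model's anomaly flags.
actual = [True, False, False, True, False, False, True, False]
predicted = [True, False, True, False, False, False, True, False]

c = confusion_counts(actual, predicted)
precision, recall = precision_recall(c)
print(c)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In an AML setting, recall (how many truly suspicious transactions were caught) and precision (how many flags were worth an analyst's time) usually matter more than overall accuracy, because suspicious transactions are rare.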
