Machine learning (ML) is changing virtually every aspect of our lives. Today ML algorithms accomplish tasks that until recently only expert humans could perform. As it relates to finance, this is the most exciting time to adopt a disruptive technology that will transform how everyone invests for generations. Readers will learn how to structure big data in a way that is amenable to ML algorithms; how to conduct research with ML algorithms on that data; how to use supercomputing methods; how to backtest your discoveries while avoiding false positives. The book addresses real-life problems faced by practitioners on a daily basis, and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their particular setting. Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.
Cover Title Page Copyright Contents About the Author Preamble
Chapter 1 Financial Machine Learning as a Distinct Subject 1.1 Motivation 1.2 The Main Reason Financial Machine Learning Projects Usually Fail 1.2.1 The Sisyphus Paradigm 1.2.2 The Meta-Strategy Paradigm 1.3 Book Structure 1.3.1 Structure by Production Chain 1.3.2 Structure by Strategy Component 1.3.3 Structure by Common Pitfall 1.4 Target Audience 1.5 Requisites 1.6 FAQs 1.7 Acknowledgments Exercises References Bibliography
Part 1 Data Analysis
Chapter 2 Financial Data Structures 2.1 Motivation 2.2 Essential Types of Financial Data 2.2.1 Fundamental Data 2.2.2 Market Data 2.2.3 Analytics 2.2.4 Alternative Data 2.3 Bars 2.3.1 Standard Bars 2.3.2 Information-Driven Bars 2.4 Dealing with Multi-Product Series 2.4.1 The ETF Trick 2.4.2 PCA Weights 2.4.3 Single Future Roll 2.5 Sampling Features 2.5.1 Sampling for Reduction 2.5.2 Event-Based Sampling Exercises References
Chapter 3 Labeling 3.1 Motivation 3.2 The Fixed-Time Horizon Method 3.3 Computing Dynamic Thresholds 3.4 The Triple-Barrier Method 3.5 Learning Side and Size 3.6 Meta-Labeling 3.7 How to Use Meta-Labeling 3.8 The Quantamental Way 3.9 Dropping Unnecessary Labels Exercises Bibliography
Chapter 4 Sample Weights 4.1 Motivation 4.2 Overlapping Outcomes 4.3 Number of Concurrent Labels 4.4 Average Uniqueness of a Label 4.5 Bagging Classifiers and Uniqueness 4.5.1 Sequential Bootstrap 4.5.2 Implementation of Sequential Bootstrap 4.5.3 A Numerical Example 4.5.4 Monte Carlo Experiments 4.6 Return Attribution 4.7 Time Decay 4.8 Class Weights Exercises References Bibliography
Chapter 5 Fractionally Differentiated Features 5.1 Motivation 5.2 The Stationarity vs. Memory Dilemma 5.3 Literature Review 5.4 The Method 5.4.1 Long Memory 5.4.2 Iterative Estimation 5.4.3 Convergence 5.5 Implementation 5.5.1 Expanding Window 5.5.2 Fixed-Width Window Fracdiff 5.6 Stationarity with Maximum Memory Preservation 5.7 Conclusion Exercises References Bibliography
Part 2 Modelling
Chapter 6 Ensemble Methods 6.1 Motivation 6.2 The Three Sources of Errors 6.3 Bootstrap Aggregation 6.3.1 Variance Reduction 6.3.2 Improved Accuracy 6.3.3 Observation Redundancy 6.4 Random Forest 6.5 Boosting 6.6 Bagging vs. Boosting in Finance 6.7 Bagging for Scalability Exercises References Bibliography
Chapter 7 Cross-Validation in Finance 7.1 Motivation 7.2 The Goal of Cross-Validation 7.3 Why K-Fold CV Fails in Finance 7.4 A Solution: Purged K-Fold CV 7.4.1 Purging the Training Set 7.4.2 Embargo 7.4.3 The Purged K-Fold Class 7.5 Bugs in Sklearn's Cross-Validation Exercises Bibliography
Chapter 8 Feature Importance 8.1 Motivation 8.2 The Importance of Feature Importance 8.3 Feature Importance with Substitution Effects 8.3.1 Mean Decrease Impurity 8.3.2 Mean Decrease Accuracy 8.4 Feature Importance without Substitution Effects 8.4.1 Single Feature Importance 8.4.2 Orthogonal Features 8.5 Parallelized vs. Stacked Feature Importance 8.6 Experiments with Synthetic Data Exercises References
Chapter 9 Hyper-Parameter Tuning with Cross-Validation 9.1 Motivation 9.2 Grid Search Cross-Validation 9.3 Randomized Search Cross-Validation 9.3.1 Log-Uniform Distribution 9.4 Scoring and Hyper-parameter Tuning Exercises References Bibliography
Part 3 Backtesting
Chapter 10 Bet Sizing 10.1 Motivation 10.2 Strategy-Independent Bet Sizing Approaches 10.3 Bet Sizing from Predicted Probabilities 10.4 Averaging Active Bets 10.5 Size Discretization 10.6 Dynamic Bet Sizes and Limit Prices Exercises References Bibliography
Chapter 11 The Dangers of Backtesting 11.1 Motivation 11.2 Mission Impossible: The Flawless Backtest 11.3 Even If Your Backtest Is Flawless, It Is Probably Wrong 11.4 Backtesting Is Not a Research Tool 11.5 A Few General Recommendations 11.6 Strategy Selection Exercises References Bibliography
Chapter 12 Backtesting through Cross-Validation 12.1 Motivation 12.2 The Walk-Forward Method 12.2.1 Pitfalls of the Walk-Forward Method 12.3 The Cross-Validation Method 12.4 The Combinatorial Purged Cross-Validation Method 12.4.1 Combinatorial Splits 12.4.2 The Combinatorial Purged Cross-Validation Backtesting Algorithm 12.4.3 A Few Examples 12.5 How Combinatorial Purged Cross-Validation Addresses Backtest Overfitting Exercises References
Chapter 13 Backtesting on Synthetic Data 13.1 Motivation 13.2 Trading Rules 13.3 The Problem 13.4 Our Framework 13.5 Numerical Determination of Optimal Trading Rules 13.5.1 The Algorithm 13.5.2 Implementation 13.6 Experimental Results 13.6.1 Cases with Zero Long-Run Equilibrium 13.6.2 Cases with Positive Long-Run Equilibrium 13.6.3 Cases with Negative Long-Run Equilibrium 13.7 Conclusion Exercises References
Chapter 14 Backtest Statistics 14.1 Motivation 14.2 Types of Backtest Statistics 14.3 General Characteristics 14.4 Performance 14.4.1 Time-Weighted Rate of Return 14.5 Runs 14.5.1 Returns Concentration 14.5.2 Drawdown and Time under Water 14.5.3 Runs Statistics for Performance Evaluation 14.6 Implementation Shortfall 14.7 Efficiency 14.7.1 The Sharpe Ratio 14.7.2 The Probabilistic Sharpe Ratio 14.7.3 The Deflated Sharpe Ratio 14.7.4 Efficiency Statistics 14.8 Classification Scores 14.9 Attribution Exercises References Bibliography
Chapter 15 Understanding Strategy Risk 15.1 Motivation 15.2 Symmetric Payouts 15.3 Asymmetric Payouts 15.4 The Probability of Strategy Failure 15.4.1 Algorithm 15.4.2 Implementation Exercises References
Chapter 16 Machine Learning Asset Allocation 16.1 Motivation 16.2 The Problem with Convex Portfolio Optimization 16.3 Markowitz’s Curse 16.4 From Geometric to Hierarchical Relationships 16.4.1 Tree Clustering 16.4.2 Quasi-Diagonalization 16.4.3 Recursive Bisection 16.5 A Numerical Example 16.6 Out-of-Sample Monte Carlo Simulations 16.7 Further Research 16.8 Conclusion Appendices 16.A.1 Correlation-based Metric 16.A.2 Inverse Variance Allocation 16.A.3 Reproducing the Numerical Example 16.A.4 Reproducing the Monte Carlo Experiment Exercises References
Part 4 Useful Financial Features
Chapter 17 Structural Breaks 17.1 Motivation 17.2 Types of Structural Break Tests 17.3 CUSUM Tests 17.3.1 Brown-Durbin-Evans CUSUM Test on Recursive Residuals 17.3.2 Chu-Stinchcombe-White CUSUM Test on Levels 17.4 Explosiveness Tests 17.4.1 Chow-Type Dickey-Fuller Test 17.4.2 Supremum Augmented Dickey-Fuller 17.4.3 Sub- and Super-Martingale Tests Exercises References
Chapter 18 Entropy Features 18.1 Motivation 18.2 Shannon's Entropy 18.3 The Plug-in (or Maximum Likelihood) Estimator 18.4 Lempel-Ziv Estimators 18.5 Encoding Schemes 18.5.1 Binary Encoding 18.5.2 Quantile Encoding 18.5.3 Sigma Encoding 18.6 Entropy of a Gaussian Process 18.7 Entropy and the Generalized Mean 18.8 A Few Financial Applications of Entropy 18.8.1 Market Efficiency 18.8.2 Maximum Entropy Generation 18.8.3 Portfolio Concentration 18.8.4 Market Microstructure Exercises References Bibliography
Chapter 19 Microstructural Features 19.1 Motivation 19.2 Review of the Literature 19.3 First Generation: Price Sequences 19.3.1 The Tick Rule 19.3.2 The Roll Model 19.3.3 High-Low Volatility Estimator 19.3.4 Corwin and Schultz 19.4 Second Generation: Strategic Trade Models 19.4.1 Kyle's Lambda 19.4.2 Amihud's Lambda 19.4.3 Hasbrouck's Lambda 19.5 Third Generation: Sequential Trade Models 19.5.1 Probability of Information-based Trading 19.5.2 Volume-Synchronized Probability of Informed Trading 19.6 Additional Features from Microstructural Datasets 19.6.1 Distribution of Order Sizes 19.6.2 Cancellation Rates, Limit Orders, Market Orders 19.6.3 Time-Weighted Average Price Execution Algorithms 19.6.4 Options Markets 19.6.5 Serial Correlation of Signed Order Flow 19.7 What Is Microstructural Information? Exercises References
Part 5 High-Performance Computing Recipes
Chapter 20 Multiprocessing and Vectorization 20.1 Motivation 20.2 Vectorization Example 20.3 Single-Thread vs. Multithreading vs. Multiprocessing 20.4 Atoms and Molecules 20.4.1 Linear Partitions 20.4.2 Two-Nested Loops Partitions 20.5 Multiprocessing Engines 20.5.1 Preparing the Jobs 20.5.2 Asynchronous Calls 20.5.3 Unwrapping the Callback 20.5.4 Pickle/Unpickle Objects 20.5.5 Output Reduction 20.6 Multiprocessing Example Exercises Reference Bibliography
Chapter 21 Brute Force and Quantum Computers 21.1 Motivation 21.2 Combinatorial Optimization 21.3 The Objective Function 21.4 The Problem 21.5 An Integer Optimization Approach 21.5.1 Pigeonhole Partitions 21.5.2 Feasible Static Solutions 21.5.3 Evaluating Trajectories 21.6 A Numerical Example 21.6.1 Random Matrices 21.6.2 Static Solution 21.6.3 Dynamic Solution Exercises References
Chapter 22 High-Performance Computational Intelligence and Forecasting Technologies 22.1 Motivation 22.2 Regulatory Response to the Flash Crash of 2010 22.3 Background 22.4 HPC Hardware 22.5 HPC Software 22.5.1 Message Passing Interface 22.5.2 Hierarchical Data Format 5 22.5.3 In Situ Processing 22.5.4
Convergence 22.6 Use Cases 22.6.1 Supernova Hunting 22.6.2 Blobs in Fusion Plasma 22.6.3 Intraday Peak Electricity Usage 22.6.4 The Flash Crash of 2010 22.6.5 Volume-synchronized Probability of Informed Trading Calibration 22.6.6 Revealing High Frequency Events with Non-uniform Fast Fourier Transform 22.7 Summary and Call for Participation 22.8 Acknowledgments References Index EULA
Learn to understand and implement the latest machine learning innovations to improve your investment performance. Machine learning (ML) is changing virtually every aspect of our lives. Today, ML algorithms accomplish tasks that – until recently – only expert humans could perform. And finance is ripe for disruptive innovations that will transform how the following generations understand money and invest. In the book, readers will learn how to: structure big data in a way that is amenable to ML algorithms; conduct research with ML algorithms on big data; use supercomputing methods and backtest their discoveries while avoiding false positives. Advances in Financial Machine Learning addresses real-life problems faced by practitioners every day, and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their individual setting. Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.
"This book begins by structuring financial data in a way that is amenable to machine learning (ML) algorithms. Then, the author discusses how to conduct research with ML algorithms on that data and how to backtest your discoveries. Most of the problems and solutions are explained using math, supported by code. This makes the book very practical and hands-on. Readers become active users who can test the solutions proposed in their work. Readers will learn how to structure, label, weight, and backtest data. Machine learning is the future, and this book will equip investment professionals with the tools to utilize it moving forward"-- Provided by publisher
This book constitutes the thoroughly refereed post-workshop proceedings of the 6th International Workshop on Big Data Benchmarking, WBDB 2015, held in Toronto, ON, Canada, in June 2015 and the 7th International Workshop, WBDB 2015, held in New Delhi, India, in December 2015. The 8 full papers presented in this book were carefully reviewed and selected from 22 submissions. They deal with recent trends in big data and HPC convergence, new proposals for big data benchmarking, as well as tooling and performance results.