Central banks are paying increasing attention to big data and related emerging information technologies, such as artificial intelligence (AI) and machine learning (ML). This was highlighted in a survey conducted in 2015 by the Irving Fisher Committee on Central Bank Statistics (hereinafter IFC). Three developments in the field since 2015 are especially noteworthy: (1) big data sets have become increasingly available; (2) new technologies for processing big data in practical ways have emerged; and (3) central banks have been actively building the IT infrastructure needed for the "big data era". New big data platforms facilitate the storage and processing of very large data sets, while high-performance computing (HPC) enables faster data processing, in-depth statistical analysis, and complex data simulation.
In 2020, the IFC decided to update its 2015 survey on central banks' use of and interest in big data. The purpose was to review central banks' experience with, and progress in, using big data and related emerging technologies.
How central banks define big data
Big data is usually defined by the so-called three "V"s: high volume (such as the number of records), velocity (the speed at which the data are generated), and variety (such as the structure and format of the data set). In practice, the picture is more complicated. Big data can contain information generated by many kinds of sources, such as social media, web-based activities, machine sensors, or financial, administrative, and business operations. About one-third of central banks consider that the concept of big data covers only non-traditional data. For the remaining two-thirds of the central banks surveyed, however, it also includes "traditional" large data sets. For example, data collected for administrative or regulatory/supervisory purposes are often referred to as "financial big data."
A comprehensive definition of big data covers all types of data sets that call for non-standard analysis. The reason is that traditional statistical methods, such as descriptive analysis, inferential statistics (for example, econometrics), and non-parametric analysis, have limitations when applied to very large data sets. The difficulty is greater still with unstructured data (for example, data stored as text or images). Analyzing such data requires first extracting information that can be converted into structured data, for instance by using natural language processing algorithms to turn human language into a digitally processable form.
Central banks distinguish several types of big data:
The first type of big data consists of unstructured data sets, such as text messages (for example, social media posts), images captured from the Internet, and information sent by sensors or other connected devices. This type of information cannot easily be handled with traditional statistical techniques. Three examples illustrate it. The first is mobility reports, which show overall movement trends derived from GPS data and recorded residents' visits to workplaces and entertainment venues during the COVID-19 pandemic. The second involves Internet searches, such as Google Trends, which can shed light on certain economic factors, such as expectations about labor market dynamics. The third is printed text, such as newspaper articles, corporate financial statements, and official press releases.
The second type of big data consists of large volumes of time series observations, including large structured financial big data sets. More than three-quarters of the central banks surveyed also highlighted two other specific categories of structured data sets. The first category covers data that central banks did not traditionally treat as statistical data. Payment transactions are a good example: these data are collected mainly for market monitoring purposes, but in recent years central banks have been trying to use them more effectively for economic analysis. The second category involves cross-sectional data sets, which cover the entire target population and provide information on multiple dimensions at a single point in time.
Central banks' use of big data
Central banks rely on big data more than they did in 2015. More than 80% of the central banks surveyed now use big data to support their work, up from 30% five years earlier. That said, half of the respondents use it only for exploratory purposes, such as pilot projects. Central banks in developed economies use big data more frequently, and almost half of them draw on it when making decisions. By contrast, one in five of the central banks surveyed in emerging market economies does not use big data at all.
(1) Big data supporting the main functions of the central bank
Central banks obtain big data through a variety of channels, which is consistent with their broad understanding of the concept. The first type mainly comprises unstructured data sets. Many central banks use natural language processing techniques to extract information from newspaper text, for example to quantify qualitative factors (such as sentiment and uncertainty about economic developments), or they draw on Internet-based information (such as search queries). The second type mainly comprises financial big data sets, such as credit information system data or payment data collected at the transaction level.
(2) Four major big data applications
Central banks' use of big data mainly involves four applications: natural language processing, nowcasting models, applications that extract economic information from granular financial big data sets, and supervisory and regulatory technology (suptech and regtech).
The first is natural language processing, which is mainly used to process textual information. Its usual purpose is to collect qualitative, text-based information and summarize it quantitatively. For example, an economic policy uncertainty (EPU) index can be calculated to assess the degree of uncertainty faced by economic agents.
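To make the idea of summarizing text quantitatively more concrete, the following is a minimal sketch of a newspaper-count uncertainty index. It is not the method of any particular central bank. The keyword sets, the function names `mentions` and `epu_index`, and the input format are all hypothetical, and simple substring matching stands in for a real text-processing pipeline. The sketch counts, for each month, the share of articles that mention economy, policy, and uncertainty terms together, then rescales the shares so that their mean is 100.

```python
from collections import defaultdict

# Hypothetical keyword sets, for illustration only. A real index would rely on
# a curated, language-specific term list and proper tokenization.
ECONOMY = {"economy", "economic"}
POLICY = {"central bank", "regulation", "legislation", "deficit"}
UNCERTAINTY = {"uncertain", "uncertainty"}


def mentions(text, terms):
    """Return True if any term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def epu_index(articles):
    """Compute a toy uncertainty index from (month, text) pairs.

    Returns a dict mapping each month to the share of matching articles,
    rescaled so that the average across months equals 100.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for month, text in articles:
        total[month] += 1
        # An article counts only if it touches all three term categories.
        if all(mentions(text, terms) for terms in (ECONOMY, POLICY, UNCERTAINTY)):
            hits[month] += 1

    share = {month: hits[month] / total[month] for month in total}
    mean_share = sum(share.values()) / len(share)
    if mean_share == 0:
        return {month: 0.0 for month in share}
    return {month: 100 * s / mean_share for month, s in share.items()}


if __name__ == "__main__":
    sample = [
        ("2020-01", "The economy faces uncertainty over central bank policy."),
        ("2020-01", "Local team wins the cup."),
        ("2020-02", "Economic uncertainty grows as new regulation looms."),
        ("2020-02", "Economic outlook uncertain amid a rising deficit."),
    ]
    for month, value in sorted(epu_index(sample).items()):
        print(month, round(value, 1))
```

Normalizing by the number of articles per month keeps the index from rising merely because more articles were published, which is why the share, rather than the raw count, is rescaled.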
The second is nowcasting models, which provide "real-time", high-frequency analysis of economic conditions. As many as one-third of the central banks surveyed mentioned that using big data for this purpose can deliver estimates of inflation and economic growth faster and more frequently. Nowcasting models can also help fill statistical gaps. The real estate market during the COVID-19 pandemic is a typical example: official data were often lacking during the pandemic, whereas housing prices could be obtained from the Internet relatively easily.
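One simple way to picture a nowcast is a bridge-style regression: a low-frequency target (for example, quarterly growth) is regressed on the quarterly average of a timelier indicator, and the fitted relationship is then applied to whatever indicator readings are already available for the current quarter. The sketch below illustrates that idea only. The function names, the single-indicator setup, and the numbers are hypothetical, and models used in practice are typically far richer (for example, factor models with many indicators).

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("indicator has no variation; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast(past_indicator, past_target, current_readings):
    """Nowcast the target for the current period.

    past_indicator:   period averages of the timely indicator (history)
    past_target:      official target values for the same periods
    current_readings: indicator readings available so far in the current period
    """
    intercept, slope = fit_ols(past_indicator, past_target)
    # Use the average of the readings observed so far as a proxy for the
    # full-period average of the indicator.
    current_avg = sum(current_readings) / len(current_readings)
    return intercept + slope * current_avg


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    indicator_history = [1.0, 2.0, 3.0, 4.0]
    target_history = [3.0, 5.0, 7.0, 9.0]
    print(nowcast(indicator_history, target_history, [5.0, 6.0]))
```

The estimate can be refreshed each time a new indicator reading arrives, which is what allows it to be produced faster and more frequently than the official statistic.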
The third category includes various applications developed by central banks. They are usually used to extract information about the economy as a whole from granular financial big data sets, and their main purpose is to support macro-level financial stability policies. For example, credit information system data can be used for detailed credit assessment. Such data have played an important role in assessing how the financial system is functioning.
The fourth category is suptech and regtech applications that support micro-level supervision. These applications typically focus on micro-level risk assessment. For example, company information collected from financial statements or newspapers can be used in early warning exercises or to improve credit scoring.
The benefits of international cooperation between central banks in the use of big data
The survey report shows that central banks are willing to cooperate in order to benefit more from the use of big data. Half of the central banks surveyed stated that they were interested in cooperating on one or more specific projects, and they envisaged three types of cooperation.
First, sharing knowledge among specialized institutions. The topics involved can be very diverse, including big data technologies (such as data visualization, network analysis, and machine learning tools), information management issues (such as source code development, data sharing protocols, and encryption and anonymization techniques for using confidential data), and specific central bank applications (for example, in the field of suptech and regtech).
Second, using big data to address global issues such as international spillovers, global value chains, and cross-border payments. Resolving such issues depends on close international cooperation, such as sharing information between countries.
Third, developing joint research projects to benefit from economies of scale and to pool limited financial and human resources.
Cooperation among central banks has the following advantages. First, most central banks face similar problems when building big data platforms, managing human capital, or developing appropriate algorithms. Exchanging views can therefore help them identify best practices, which is especially useful for central banks that are still in the early stages of their big data "journey" and must make strategic decisions. Second, early adopters of big data can design technical assistance programs based on their own experience. More and more central banks have adopted such programs as part of their international outreach. Third, cooperation can promote cross-border data sharing, thereby improving the availability of data to support policy formulation.
International financial institutions can strongly support these forms of cooperation. They can promote technological innovation by providing technical solutions that harmonize data standards and processes across regions, and they can share knowledge by drawing on experience and data from regular meetings or pilot projects. In particular, the Bank for International Settlements (BIS) Innovation Hub has been established to identify trends in financial technology relevant to central banks, to explore the development of public goods, and to serve as a focal point for a network of central bank experts on innovation. The Innovation Hub is well placed to play an important role in promoting international cooperation to make better use of big data and related technologies.