When you’re working with large universes of stock data, you’ll come across a lot of challenges:

- Stocks pay dividends and other distributions that have to be accounted for.
- Stocks are subject to splits and other corporate actions, which also have to be accounted for.
- New stocks are listed all the time, so you won’t have as much history for these stocks as for others.
- Stocks are delisted, and many datasets do not include the price history of delisted stocks.
- Stocks can be suspended or halted for a period of time, leading to trading gaps.
- Companies grow and shrink: the “top 100 stocks by market cap” in 1990 looks very different to the same group in 2020, “growth stocks” in 1990 look very different to “growth stocks” in 2020, and so on.

The challenges are well understood, but dealing with them is not always straightforward.

One significant challenge is gaps in data. Quant analysis gets very hard if you have missing or misaligned data.

If you’re working with a universe of 1,000 stocks, life is a lot easier if you have an observation for each stock for each trading date, regardless of whether it actually traded that day. That way:

- you can easily sense check the size of your data: it should have trading_days * number_of_stocks rows.
- any grouped aggregations or rolling window aggregations will operate over the same date range for every ticker.

If you work with “wide”, matrix-like data, gaps are obvious, because you have one row for every date in your data set and one column for each ticker; a missing observation simply shows up as an empty (NA) cell.

We usually work with long or “tidy” data, where each row is an observation for one stock on a given day. There, a missing observation is an absent row, which is much easier to overlook.

How do we work productively with this data, whilst still ensuring that we fill any gaps in our long data with NAs?
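One way to fill gaps in long data with NAs, shown here as a minimal standard-library Python sketch: build the full grid of (date, ticker) pairs, give any pair without a row a value of None (standing in for NA), sense check the row count, and then run a rolling window within each ticker over the completed data. The sample prices, the tickers ABC and XYZ, and the helper names `complete` and `rolling_mean` are all hypothetical. The sketch also assumes the trading calendar is just the set of dates that appear in the data; a real pipeline would more likely take its dates from an exchange calendar. Data libraries provide the same operation directly, for example tidyr’s `complete()` in R, or `reindex` with a `pandas.MultiIndex.from_product` index in pandas.

```python
from itertools import product

# Hypothetical long ("tidy") price data: one row per (date, ticker) that actually traded.
# XYZ has no row on 2020-01-03, as if trading had been halted that day.
long_data = [
    {"date": "2020-01-02", "ticker": "ABC", "close": 10.0},
    {"date": "2020-01-02", "ticker": "XYZ", "close": 50.0},
    {"date": "2020-01-03", "ticker": "ABC", "close": 10.5},
    {"date": "2020-01-06", "ticker": "ABC", "close": 10.2},
    {"date": "2020-01-06", "ticker": "XYZ", "close": 49.0},
]


def complete(rows, value_key="close"):
    """Return one row per (ticker, date) pair, with None (NA) where no row existed."""
    # Assumption: the trading calendar is the set of dates present in the data.
    dates = sorted({r["date"] for r in rows})
    tickers = sorted({r["ticker"] for r in rows})
    observed = {(r["date"], r["ticker"]): r[value_key] for r in rows}
    # Ticker-major order keeps each ticker's dates contiguous and sorted.
    return [
        {"date": d, "ticker": t, value_key: observed.get((d, t))}
        for t, d in product(tickers, dates)
    ]


def rolling_mean(values, window):
    """Trailing mean; None if the window is incomplete or contains a missing value."""
    out = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        if len(win) < window or any(v is None for v in win):
            out.append(None)
        else:
            out.append(sum(win) / window)
    return out


completed = complete(long_data)

# Sense check: the completed data should have trading_days * number_of_stocks rows.
n_days = len({r["date"] for r in long_data})
n_stocks = len({r["ticker"] for r in long_data})
assert len(completed) == n_days * n_stocks  # 3 days * 2 stocks = 6 rows

# Group by ticker, then apply the rolling window within each group.
closes_by_ticker = {}
for r in completed:
    closes_by_ticker.setdefault(r["ticker"], []).append(r["close"])
rolling = {t: rolling_mean(c, window=2) for t, c in closes_by_ticker.items()}

print(rolling["XYZ"])  # [None, None, None]
```

Without the completion step, XYZ’s series would be just `[50.0, 49.0]`, and a two-row window would silently average prices from either side of the halt. With the gap filled, every window that touches the missing day evaluates to None instead, and each ticker’s window covers the same dates.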