S&P 500 Without Magnificent 7: A Rebalanced Index Experiment
Introduction
What happens to the S&P 500 if the Magnificent 7 stocks are removed and the remaining constituents are rebalanced back to 100%?
Since the start of 2023, the S&P 500 has delivered a price return of about 96% through mid-June 2026 in this dataset. But a large part of that performance has been driven by a small group of mega-cap technology and growth companies: Nvidia (NVDA), Apple (AAPL), Amazon (AMZN), Microsoft (MSFT), Alphabet Class A (GOOGL), Alphabet Class C (GOOG), Meta Platforms (META) and Tesla (TSLA).
In this post, I use open-spx to create a rebalanced synthetic version of the S&P 500 without Magnificent 7 stocks. The goal is educational: to better understand how much these companies contributed, how concentrated the index became, and how the rest of the market performed when treated as a fully invested portfolio.
This is not investment advice, and it is not a recommendation to buy or sell any stock, index product or ETF. The analysis is based on price returns only, not total returns, so dividends are not included.
The setup
The previous post on this site explained how open-spx can be used to estimate S&P 500 return contributions from constituent-level prices and inferred weights. In this post, I extend that idea by creating a synthetic index.
The S&P 500 without Magnificent 7 experiment is a counterfactual index construction exercise. It does not ask whether the excluded companies are good or bad investments. Instead, it asks a narrower question: how would the S&P 500 have behaved if those stocks were removed and the remaining constituents were scaled back to a full 100% allocation?
The workflow is:
1. Run a regular S&P 500 replication.
2. Estimate the constituent weights through time.
3. Remove the excluded tickers.
4. Rebalance the remaining weights back to 100%.
5. Compute a new synthetic price-return series.
6. Reuse the same contribution analysis on the residual index.
For this experiment, the excluded tickers are:
- NVDA — Nvidia
- AAPL — Apple
- AMZN — Amazon
- MSFT — Microsoft
- GOOGL — Alphabet Class A
- GOOG — Alphabet Class C
- META — Meta Platforms
- TSLA — Tesla
There are eight tickers because Alphabet appears through two share classes: Alphabet Class A (GOOGL) and Alphabet Class C (GOOG). In the charts, these are shown separately.
The displayed period is 2023-01-04 to 2026-06-15, using the first available trading observation in the dataset.
A simplified version of the command is:
```
open-spx --start 2023-01-01 \
  --end 2026-06-15 \
  --exclude-tickers NVDA,AAPL,AMZN,MSFT,GOOGL,GOOG,META,TSLA \
  --synthetic-name ex_mag7
```
The requested start date is 2023-01-01, but the first available trading date in this run is 2023-01-04.
How the synthetic ex-Magnificent 7 index is constructed
The construction is straightforward.
At every point in time, open-spx starts with the inferred S&P 500 constituent weights. It then removes the excluded tickers and rescales the remaining weights so that the residual universe sums to 100%.
The formula is:
```
new weight = original weight / sum of remaining weights
```
For example, if the Magnificent 7 represent 36% of the index on a given date, the remaining stocks represent 64%. The residual stocks are then scaled by:
```
1 / 0.64 ≈ 1.56x
```
This is the key mechanical detail. The synthetic index is not the S&P 500 minus the Magnificent 7 with the removed weight held in cash. Instead, the removed weight is reallocated to the remaining S&P 500 constituents.
That distinction is essential for interpreting the results.
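As a minimal sketch of the rescaling rule (not open-spx's actual implementation; the function name `rebalance_excluding` and the toy tickers and weights are assumptions for illustration), the step can be written in plain Python:

```python
def rebalance_excluding(weights, excluded):
    """Drop excluded tickers and rescale the rest so they sum to 1.0."""
    remaining = {t: w for t, w in weights.items() if t not in excluded}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no remaining weight to rescale")
    return {t: w / total for t, w in remaining.items()}


# Toy weights (hypothetical): one excluded block at 36%, two residual names.
toy_weights = {"EXCLUDED": 0.36, "AAA": 0.40, "BBB": 0.24}
new_weights = rebalance_excluding(toy_weights, {"EXCLUDED"})

# Every residual weight is multiplied by the same factor: 1 / 0.64 = 1.5625.
scale = 1 / (1 - toy_weights["EXCLUDED"])
```

Because every residual weight is multiplied by the same factor, the relative sizes of the remaining stocks are unchanged. Only their share of the whole portfolio grows.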
S&P 500 without Magnificent 7 versus the full index
The S&P 500 price index and replicated S&P 500 both returned about 96% from 2023-01-04 to 2026-06-15. The rebalanced ex-Magnificent 7 synthetic index returned about 61%.
The first chart compares three series, all normalized to 100 at the start:
- the S&P 500 price index,
- the replicated S&P 500,
- and the rebalanced ex-Magnificent 7 synthetic index.
From 2023-01-04 to 2026-06-15, the S&P 500 price index increased by about 96%. The replicated S&P 500 tracked it closely, also returning about 96%.
The ex-Magnificent 7 synthetic index returned about 61%.
That is a lower return, but it is still a strong positive result. The main message is not that the Magnificent 7 did not matter. They clearly did. The message is that the rest of the S&P 500 still delivered meaningful returns when treated as a fully invested portfolio.
In other words, the S&P 500 without Magnificent 7 lagged the full index, but it did not collapse. The residual market still compounded strongly.
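To make the construction of such a series concrete, here is a rough sketch of how daily rescaled weights and daily price returns could be compounded into a level that starts at 100. It assumes each day's weights are known before that day's return is applied. The function name `synthetic_index`, the tickers and the numbers are all hypothetical, and the real tool may differ in its details.

```python
def synthetic_index(daily_weights, daily_returns, excluded, base=100.0):
    """Compound a rebalanced residual portfolio into an index level series."""
    levels = [base]
    for weights, returns in zip(daily_weights, daily_returns):
        kept = {t: w for t, w in weights.items() if t not in excluded}
        total = sum(kept.values())
        # Residual portfolio return for the day, with weights rescaled to 100%.
        day_return = sum(w / total * returns[t] for t, w in kept.items())
        levels.append(levels[-1] * (1.0 + day_return))
    return levels


# Two toy days with hypothetical tickers, weights and returns.
weights = [{"MEGA": 0.5, "AAA": 0.3, "BBB": 0.2},
           {"MEGA": 0.5, "AAA": 0.3, "BBB": 0.2}]
returns = [{"MEGA": 0.10, "AAA": 0.02, "BBB": -0.01},
           {"MEGA": -0.05, "AAA": 0.01, "BBB": 0.03}]

levels = synthetic_index(weights, returns, excluded={"MEGA"})
cumulative_return = levels[-1] / levels[0] - 1.0
```

The returns of the excluded ticker never enter the calculation. The residual names carry the whole portfolio on every day.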
How much did the Magnificent 7 contribute to the S&P 500?
The excluded Magnificent 7 tickers contributed about 37 percentage points to the S&P 500 price return. Nvidia was the largest contributor, followed by Apple, Amazon, Microsoft, Alphabet Class A, Alphabet Class C, Meta Platforms, and Tesla.
Nvidia (NVDA) was the largest contributor, adding about 11 percentage points. Apple (AAPL) contributed about 6 percentage points. Amazon (AMZN) and Microsoft (MSFT) each contributed about 4 percentage points. Alphabet Class A (GOOGL), Alphabet Class C (GOOG), Meta Platforms (META) and Tesla (TSLA) each contributed about 3 percentage points when rounded to whole percentage points.
Together, these excluded tickers contributed about 37 percentage points. Relative to the S&P 500’s approximately 96% price return, that means the Magnificent 7 represented roughly 39% of the index return in this run.
That is a large number. It confirms that recent S&P 500 performance was highly dependent on a small group of mega-cap companies.
It also explains why the S&P 500 without Magnificent 7 performed differently from the original index. The excluded companies represented a large share of total return contribution.
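The share-of-return figure is simple arithmetic. The snippet below reproduces it from the rounded numbers quoted in this post, so the result is only approximate:

```python
index_return_pct = 96.0          # approximate S&P 500 price return, in %
excluded_contribution_pp = 37.0  # approximate contribution, in percentage points

# Share of the index return attributable to the excluded tickers.
share_of_return = excluded_contribution_pp / index_return_pct
print(f"{share_of_return:.1%}")  # prints 38.5%
```

With rounded inputs this gives roughly 38.5%, which is consistent with the "roughly 39%" figure above.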
Why the S&P 500 without Magnificent 7 is not just subtraction
Removing the Magnificent 7 reduced the return by about 35 percentage points, less than their raw contribution of about 37 percentage points, because the removed weights were reallocated to the remaining S&P 500 stocks.
The bridge chart shows the central result of the experiment:
- S&P 500 cumulative price return: about 96%
- Ex-Magnificent 7 synthetic return: about 61%
- Raw excluded contribution: about 37 percentage points
- Return reduction from exclusion: about 35 percentage points
At first glance, this might look surprising. If the Magnificent 7 contributed 37 percentage points, why does the synthetic index lag by 35 percentage points?
The answer is rebalancing.
This is the most important mechanical detail in the S&P 500 without Magnificent 7 analysis. The excluded weight is not moved to cash. It is reallocated across the residual S&P 500 universe.
When the Magnificent 7 are removed, their weight is redistributed across the remaining constituents. Those companies receive a larger portfolio weight than they had in the original capitalization-weighted index.
So the ex-Magnificent 7 index is not simply:
```
S&P 500 return - Magnificent 7 contribution
```
Instead, it is:
```
S&P 500 without Magnificent 7, with the remaining stocks scaled back to 100%
```
That is why the return reduction is smaller than the raw contribution of the excluded stocks.
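A toy single-period example shows the difference between the two calculations. The weights and returns below are hypothetical round numbers, not the figures from this run, and a multi-year result also depends on how weights and returns compound over time.

```python
# Hypothetical single-period example with two blocks of stocks.
w_excluded, r_excluded = 0.30, 0.50   # excluded block: weight, return
w_residual, r_residual = 0.70, 0.20   # residual block: weight, return

index_return = w_excluded * r_excluded + w_residual * r_residual
excluded_contribution = w_excluded * r_excluded

# Naive subtraction leaves the removed weight uninvested.
naive_subtraction = index_return - excluded_contribution

# Rebalancing scales the residual block up to 100% of the portfolio.
rebalanced_return = (w_residual / (1.0 - w_excluded)) * r_residual

# Gap between the full index and the rebalanced residual portfolio.
reduction = index_return - rebalanced_return
```

In this toy case the index returns 29%, the excluded block contributes 15 percentage points, naive subtraction gives 14%, and the rebalanced residual portfolio returns 20%. The reduction from exclusion is therefore 9 percentage points, smaller than the 15-point raw contribution.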
The contribution did not arrive in a straight line
The Magnificent 7 contribution did not arrive in a straight line. Total contribution rose strongly from 2023 through 2026, with Nvidia standing out as the largest individual contributor.
The Magnificent 7 contribution was not smooth.
The cumulative contribution rose strongly in 2023 and 2024, pulled back at several points, and then continued to rise into 2025 and 2026. Nvidia (NVDA) stands out as the largest individual contributor, while the other names contributed more gradually.
This matters because concentration risk is not only about the final number. It is also about the path.
A portfolio that becomes increasingly dependent on a narrow set of companies can perform very well when those companies lead. But it may also become more vulnerable if leadership reverses.
Concentration increased over time
The inferred Magnificent 7 weight increased from roughly one-fifth of the index to around one-third of the replicated S&P 500, highlighting the concentration risk inside a market-cap-weighted index.
The inferred combined weight of the excluded Magnificent 7 tickers increased significantly over the period.
At the start of the sample, the excluded group represented roughly one-fifth of the replicated S&P 500. By later in the period, it represented roughly one-third of the index. At some points, the excluded group represented more than one-third of the replicated S&P 500.
This is the core concentration-risk issue.
The S&P 500 contains hundreds of companies, but it is market-cap weighted. When a small group of companies becomes very large, a broad-market index can become increasingly exposed to the same few names.
That does not make those companies bad investments. Nvidia (NVDA), Apple (AAPL), Amazon (AMZN), Microsoft (MSFT), Alphabet (GOOGL and GOOG), Meta Platforms (META) and Tesla (TSLA) are major companies with major economic importance. But it does mean that investors using the S&P 500 as broad-market exposure should understand how concentrated that exposure can become.
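Tracking this kind of concentration through time only requires summing the inferred weights of the excluded tickers on each date. The sketch below assumes weights are available as a mapping from date to `{ticker: weight}`. That layout, the function name `excluded_weight_path` and the toy weights are illustrative assumptions, and only a few names are shown per date.

```python
EXCLUDED = {"NVDA", "AAPL", "AMZN", "MSFT", "GOOGL", "GOOG", "META", "TSLA"}


def excluded_weight_path(weights_by_date, excluded=EXCLUDED):
    """Return {date: combined weight of the excluded tickers}."""
    return {
        date: sum(w for t, w in weights.items() if t in excluded)
        for date, weights in sorted(weights_by_date.items())
    }


# Hypothetical inferred weights on two dates (not actual index data).
toy = {
    "2023-01-04": {"NVDA": 0.01, "AAPL": 0.06, "MSFT": 0.05, "XOM": 0.02},
    "2026-06-15": {"NVDA": 0.08, "AAPL": 0.07, "MSFT": 0.06, "XOM": 0.01},
}
path = excluded_weight_path(toy)
```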
A simple rebalance example
After the excluded Magnificent 7 weight is removed, the residual S&P 500 universe is scaled back to 100%. In this example, the remaining weights are multiplied by about 1.55x.
The rebalance mechanics chart shows a point-in-time example.
On the selected date, the excluded Magnificent 7 tickers represented about 36% of the index. The residual universe represented about 64%. After removing the excluded tickers, the remaining weights were scaled back to 100%.
In this example, the residual weights were multiplied by about 1.55x.
This is the mechanical reason why the ex-Magnificent 7 index can still compound strongly. The remaining companies are not left at their original reduced weights. They become the whole portfolio.
Relative performance: when did the ex-Magnificent 7 index lag?
The ex-Magnificent 7 synthetic index lagged the S&P 500 over the full period, but relative performance varied across shorter rolling windows.
The cumulative excess return line was mostly negative, meaning the synthetic index generally trailed the original S&P 500. However, the rolling return differences show that this underperformance was not constant.
There were periods where the ex-Magnificent 7 index performed closer to the S&P 500, and some shorter windows where it performed better.
This is a useful reminder that market leadership changes over time. The Magnificent 7 dominated much of the period, but not every month and not every quarter looked the same.
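Both measures can be computed directly from two level series. The sketch below defines cumulative excess return as the simple difference between the two cumulative returns, and the rolling difference as the gap between trailing returns over a fixed window. These definitions, the function names and the toy levels are assumptions for illustration, and the charts may use a different convention.

```python
def cumulative_excess(levels_a, levels_b):
    """Cumulative return of series A minus series B at each date."""
    return [a / levels_a[0] - b / levels_b[0] for a, b in zip(levels_a, levels_b)]


def rolling_return_diff(levels_a, levels_b, window):
    """Difference in trailing `window`-step returns between A and B."""
    diffs = []
    for i in range(window, len(levels_a)):
        ret_a = levels_a[i] / levels_a[i - window] - 1.0
        ret_b = levels_b[i] / levels_b[i - window] - 1.0
        diffs.append(ret_a - ret_b)
    return diffs


# Hypothetical index levels: synthetic residual series versus the full index.
synthetic = [100.0, 101.0, 103.0, 106.0, 108.0]
full = [100.0, 103.0, 104.0, 105.0, 110.0]

excess = cumulative_excess(synthetic, full)
rolling = rolling_return_diff(synthetic, full, window=2)
```

In this toy data the synthetic series ends 2 percentage points behind, yet the middle rolling window is positive. That mirrors the pattern described above: a lag over the full period with some shorter windows of outperformance.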
Did removing the Magnificent 7 reduce drawdowns?
The ex-Magnificent 7 index reduces mega-cap concentration, but its drawdowns were not always smaller than the original S&P 500 over this period.
A common assumption is that reducing concentration should automatically reduce risk. The drawdown comparison is more nuanced.
The ex-Magnificent 7 synthetic index does reduce exposure to mega-cap technology and growth stocks. However, its realized drawdowns were not always smaller than the original S&P 500 over this sample. In some periods, the ex-Magnificent 7 index had comparable or even larger drawdowns.
This is important for interpretation.
Lower concentration does not guarantee lower realized volatility or smaller drawdowns in every market environment. It reduces one type of risk: dependence on a small group of mega-cap companies. But the residual universe has its own risks.
A less concentrated portfolio can still decline.
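For reference, a drawdown series measures how far a level sits below its running peak. A minimal sketch with toy levels (the function names and numbers are illustrative, not taken from the tool):

```python
def drawdowns(levels):
    """Drawdown at each point: level relative to the running peak, minus 1."""
    peak = levels[0]
    out = []
    for level in levels:
        peak = max(peak, level)
        out.append(level / peak - 1.0)
    return out


def max_drawdown(levels):
    """Most negative drawdown observed in the series."""
    return min(drawdowns(levels))


toy_levels = [100.0, 110.0, 99.0, 105.0, 120.0, 114.0]
dd = drawdowns(toy_levels)
worst = max_drawdown(toy_levels)  # 99 / 110 - 1, about -10%
```

Running the same function on the full index and on the synthetic index gives the two drawdown paths that can then be compared date by date.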
What drove the residual index?
After removing the Magnificent 7 and rebalancing the residual index, Broadcom, Micron Technology, Eli Lilly, Advanced Micro Devices, and Walmart were among the largest contributors.
After the Magnificent 7 were removed and the index was rebalanced, the top residual contributors were led by Broadcom (AVGO), Micron Technology (MU), Eli Lilly (LLY), Advanced Micro Devices (AMD) and Walmart (WMT).
Broadcom (AVGO) was the largest residual contributor, followed by Micron Technology (MU). Eli Lilly (LLY) and Advanced Micro Devices (AMD) also made large positive contributions. Walmart (WMT), JPMorgan Chase (JPM), Intel (INTC), RTX (RTX), Oracle (ORCL) and Lam Research (LRCX) were also among the important contributors.
This is one of the more interesting parts of the experiment.
When the Magnificent 7 are removed, the residual S&P 500 is not empty. Other large companies and sectors still contribute meaningfully.
The ex-Magnificent 7 result is therefore not a “no growth” portfolio. It still includes semiconductor companies, healthcare companies, financials, industrials, retailers and other large businesses.
Which companies detracted?
The largest detractors in the residual index included Pfizer, Marsh & McLennan, UnitedHealth Group, Nike, Moderna, United Parcel Service, PepsiCo, Estée Lauder, Bristol Myers Squibb, and MSCI.
The largest residual detractors included Pfizer (PFE), Marsh & McLennan (MMC), UnitedHealth Group (UNH), Nike (NKE), Moderna (MRNA), United Parcel Service (UPS), PepsiCo (PEP), Estée Lauder (EL), Bristol Myers Squibb (BMY) and MSCI (MSCI).
These detractors were much smaller in absolute contribution than the largest positive contributors. For example, the largest negative contribution in the residual index was roughly half a percentage point.
This asymmetry is useful. The residual index’s positive return was not driven by an absence of losers. There were still detractors. But the positive contributors outweighed them.
What this means for investors
The experiment suggests three main takeaways.
First, the Magnificent 7 mattered enormously. Over this period, Nvidia (NVDA), Apple (AAPL), Amazon (AMZN), Microsoft (MSFT), Alphabet Class A (GOOGL), Alphabet Class C (GOOG), Meta Platforms (META) and Tesla (TSLA) together contributed about 37 percentage points to the S&P 500.
Second, the rest of the market still performed well. After removing those tickers and rebalancing the remaining S&P 500 constituents, the synthetic index returned about 61%.
Third, concentration risk is real, but it should be described carefully. Removing the Magnificent 7 would have reduced exposure to the dominant mega-cap stocks, but it would also have reduced returns over this period.
The benefit is not that the ex-Magnificent 7 index was better in hindsight. It was not. The benefit is that it was less dependent on a small set of companies.
That may be valuable in scenarios where:
- market leadership broadens,
- mega-cap valuations compress,
- technology leadership weakens,
- or investors want less dependence on the same few names.
But it may hurt in scenarios where the Magnificent 7 continue to dominate index returns.
This is why the experiment is best understood as an educational concentration-risk study, not as a trading recommendation.
Limitations
There are several important limitations.
First, this is a price-return analysis. Dividends are not included.
Second, the weights are inferred. They are not official S&P Dow Jones Indices constituent weights.
Third, the analysis depends on data quality, ticker mapping, corporate actions and the available constituent history.
Fourth, this is a counterfactual index experiment. It does not include trading costs, taxes, slippage, liquidity constraints or product implementation details.
Fifth, the result depends on the chosen date range. A different starting point or ending point could produce different conclusions.
Finally, Alphabet Class A (GOOGL) and Alphabet Class C (GOOG) are treated separately because they are separate listed tickers in the dataset, even though they represent the same company.
Conclusion
From 2023-01-04 through 2026-06-15, the S&P 500 price index returned about 96% in this run. The replicated S&P 500 closely matched that result. The Magnificent 7 contributed about 37 percentage points, or roughly 39% of the index return.
When those tickers were removed and the remaining S&P 500 constituents were rebalanced back to 100%, the synthetic ex-Magnificent 7 index still returned about 61%.
That is the key result.
The Magnificent 7 were extremely important. But the rest of the S&P 500 was not irrelevant. A rebalanced residual universe still produced a strong return.
The S&P 500 without Magnificent 7 is therefore best understood as a concentration-risk experiment. It shows how much the index depended on a small group of companies, while also showing that the residual market still delivered meaningful returns.
For investors, the lesson is not that one version is obviously better than the other. The lesson is that index construction matters. A market-cap-weighted index can become highly concentrated, and understanding that concentration is essential when interpreting broad-market returns.
Related links
The S&P 500 is maintained by S&P Dow Jones Indices.
This experiment uses price-index style analysis rather than official total-return index methodology. More information about S&P Dow Jones Indices methodology is available from S&P Dow Jones Indices methodology resources.
The code used for this experiment is available in the open-spx project on GitHub.
06/18/2026
S&P 500 Return Contribution Analysis: Which Stocks Are Really Driving the Index?
S&P 500 return contribution analysis helps answer a simple question: which stocks are actually driving the index?
The S&P 500 is usually discussed as one number. The index is up, the index is down, the market rallied, or the market sold off. But in a market where mega-cap companies account for a growing share of index weight, that single number can hide what is happening underneath the surface.
That is why S&P 500 concentration has become such a popular topic. Visual Capitalist’s “The Entire S&P 500 in 2026 in One Chart” makes the concentration visible, showing that just 13 companies make up over 40% of the S&P 500. Slickcharts’ S&P 500 companies by weight page shows the same issue from another angle by listing current S&P 500 constituents and their weights.
Those resources are useful for seeing what the index looks like today. But current constituent weights do not fully answer the historical performance question:
Which individual stocks actually contributed to S&P 500 price-index returns through time?
That is the gap open-spx is designed to explore. open-spx is open Python tooling for approximate, bottom-up S&P 500 price-index replication and constituent-level contribution analysis using local CSV inputs.
Example open-spx output comparing the S&P 500 price index with an approximate bottom-up replicated SPX series.
Why S&P 500 Concentration Is Everywhere Right Now
The S&P 500 contains around 500 companies, but the index is not equally weighted. The largest companies matter far more than the smallest companies.
When Nvidia, Apple, Microsoft, Amazon, Alphabet, Meta, Broadcom, Tesla, Berkshire Hathaway, or JPMorgan move, the index feels it much more than when a smaller constituent moves.
That is not a bug. It is how a market-cap-weighted index works.
But it does mean that “owning the S&P 500” is not the same thing as owning 500 companies in equal proportion.
This is why so much market commentary now focuses on concentration risk, narrow market leadership, and the role of the Magnificent Seven.
The important follow-up question is not only whether the index is concentrated.
The better question is:
How much did each stock actually contribute to the index return?
The LinkedIn Conversation: S&P 500 Concentration Is Already Mainstream
The concentration debate is not abstract. It is already showing up across LinkedIn finance commentary, advisor posts, asset-management discussions, and market research threads.
A few examples:
- Morningstar: S&P 500 concentration, risks from tech stocks and inflation
  Morningstar frames the issue around the top 10 stocks making up a very large share of the S&P 500 and the risk of technology-heavy leadership moving together.
- James Eagle: How 7 stocks dominate the S&P 500
  This post focuses on the Magnificent Seven and how their share of the index has grown over time.
- Mohamed El-Erian: MAG 7 concentration in the S&P 500
  El-Erian frames the market question around whether Mag 7 concentration reflects AI-driven growth, a bubble, or a broader transition.
- Wayne Ewan / Capital Group: S&P 500 concentration in the top 10 companies remains high
  This post connects top-10 concentration with the question of passive versus active U.S. equity exposure.
- Dominic Pappalardo: High concentration of top 10 companies in the S&P 500 Index
  This post highlights the practical diversification concern: a passive index investor can have a large share of exposure concentrated in only a handful of names.
- Samantha Watson: Top 10 stocks now 40% of index
  This post summarizes the concentration concern around the top 10 stocks and technology-heavy exposure.
- David Bear: S&P 500’s top 10 stocks now hold 40% of index value
  This post frames the issue around the market’s dependence on a small number of large companies.
- MIRA Money: The Magnificent Seven dominating the S&P 500
  This post asks whether Mag 7 dominance is sustainable and whether investors are taking on more concentration risk than they realize.
- Nicola Wealth: Market concentration has reached historic levels
  Nicola Wealth connects concentration to a practical investor question: why might a long-term portfolio not look like the S&P 500?
- Thomas Ketchell: The S&P 500 holds 500 companies, but just a few drive much of the risk
  This post makes an important distinction: index weight is not necessarily the same thing as the contribution to day-to-day risk and return.
- The Family Office: Concentration shift in the S&P 500 alters diversification
  This post discusses how rising top-10 concentration changes what diversification means for public market exposure.
- Ludovic Phalippou: Concentration in the S&P 500 from a historical perspective
  This post provides a useful counterbalance: concentration is partly a reflection of relative performance between mega-caps and the rest of the market.
These posts all point toward the same underlying question:
If the S&P 500 is becoming more concentrated, can we inspect which constituents actually contributed to the index return?
That is where open-spx fits.
Many public discussions stop at current weights or concentration charts. Visual Capitalist makes the size distribution of the index easy to see. Slickcharts provides a current constituent-weight snapshot.
Those are useful starting points.
But current weight is not the same as historical return contribution.
open-spx is aimed at the next layer down: approximate, constituent-level S&P 500 price-index return contribution analysis through time.
What Is S&P 500 Return Contribution Analysis?
S&P 500 return contribution analysis is a way to decompose index performance into the stocks that drove it.
In simple terms:
```
stock contribution = stock weight × stock return
```
If a company has a large index weight and a strong return, it can contribute meaningfully to the S&P 500’s return. If a company has a small weight, even a very large stock move may have only a small index-level impact.
This is also called:
- S&P 500 contribution analysis
- S&P 500 performance attribution
- constituent-level return attribution
- stock-level index attribution
- S&P 500 return decomposition
- bottom-up S&P 500 replication
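A small single-period sketch of the weight-times-return rule, with hypothetical tickers, weights and returns, shows why a large move in a small stock can matter less than a modest move in a large one:

```python
def contributions(weights, returns):
    """Single-period contribution per ticker: prior weight times return."""
    return {t: weights[t] * returns[t] for t in weights}


# Hypothetical prior weights and one-period price returns.
weights = {"BIG": 0.07, "MID": 0.01, "SMALL": 0.001}
returns = {"BIG": 0.02, "MID": -0.05, "SMALL": 0.30}

contrib = contributions(weights, returns)

# Rank from largest contributor to largest detractor.
ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
```

Here the 2% move in the large stock contributes 0.14 percentage points, while the 30% move in the small stock contributes only 0.03. Within a single period the contributions sum to the portfolio return. Across many periods, compounding means a linking convention is needed, and this post does not specify which one is used.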
The concept is simple.
The implementation is not.
Why This Is Hard to Find for Free
Current S&P 500 weights are relatively easy to inspect. Slickcharts, ETF holdings pages, and other public sources can show a current snapshot of component weights.
But stock-level contribution analysis through time requires more than current weights.
To estimate historical S&P 500 constituent contribution, you need:
1. Point-in-time index membership.
2. Point-in-time constituent weights.
3. Price returns for each constituent.
4. Correct handling of ticker changes, mergers, additions, deletions, spin-offs, share-class events, splits, and other corporate actions.
5. A target index series to compare against.
The S&P 500 is weighted by float-adjusted market capitalization under the official index methodology. Exact official weights, float adjustments, divisor changes, and corporate-action treatments are not fully observable from free public data.
That is the gap.
You can read many articles and LinkedIn posts saying that the S&P 500 is concentrated. You can view today’s largest weights. You can see beautiful visualizations of the index by company size.
But if you want an open, reproducible table showing approximate stock-level return contributions over time, the options are much thinner.
Introducing open-spx
open-spx is open Python tooling for approximate bottom-up replication and contribution analysis of S&P 500 price-index returns from user-provided CSV inputs.
It helps answer questions like:
- Which stocks contributed most to the S&P 500 price-index return?
- Which stocks detracted most?
- How much of the index move came from the largest names?
- How did contribution change through time?
- How closely can a bottom-up approximation replicate a supplied S&P 500 price-index series?
- Where might data-quality issues, ticker mappings, or corporate actions be affecting the result?
The project is designed around a practical reality:
Official S&P 500 contribution data is not freely available as a complete point-in-time dataset, but an approximate, transparent, bottom-up workflow is still useful.
What open-spx Does
At a high level, open-spx performs five tasks.
1. Builds a point-in-time membership matrix
A static list of today’s S&P 500 constituents is not enough.
The index changes. Companies are added and removed. Tickers change. Share classes appear. Mergers and spin-offs happen. Historical analysis needs a point-in-time view of who was in the index on each date.
open-spx builds a membership matrix from historical constituent snapshots and optional ticker mappings.
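As an illustration only, a membership matrix can be derived from `date,ticker` snapshot rows like the ones in the historical constituents example in this post. The function below is a standard-library sketch, not the project's own code, and it ignores ticker mappings entirely.

```python
import csv
import io

SNAPSHOTS = """date,ticker
2024-01-01,A
2024-01-01,B
2024-01-02,A
2024-01-02,C
"""


def membership_matrix(csv_text):
    """Return (dates, tickers, matrix) where matrix[i][j] is True if
    tickers[j] was a constituent on dates[i]."""
    members = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        members.setdefault(row["date"], set()).add(row["ticker"])
    dates = sorted(members)
    tickers = sorted(set().union(*members.values()))
    matrix = [[t in members[d] for t in tickers] for d in dates]
    return dates, tickers, matrix


dates, tickers, matrix = membership_matrix(SNAPSHOTS)
```

In this toy input, ticker B leaves and ticker C joins on the second date, so each appears as a member on only one row of the matrix.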
2. Loads constituent prices from local CSV files
The project expects users to provide their own local price data.
This matters because open-spx is not a data vendor. It does not redistribute licensed constituent price histories or official index data. Users remain responsible for the data they are allowed to use.
3. Builds prior weights from market caps or shares outstanding
A stock’s contribution depends on both return and weight.
open-spx estimates prior weights from user-provided market-cap CSVs or from shares outstanding combined with close prices.
These are approximate prior weights, not official S&P Dow Jones Indices weights.
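One plausible way to turn market caps into prior weights is to normalize them across the names that are members on a given date. The sketch below assumes exactly that. The function name `prior_weights` and the figures are hypothetical, and no float adjustment is applied.

```python
def prior_weights(market_caps, members):
    """Normalize market caps of current members into weights summing to 1."""
    caps = {t: market_caps[t] for t in members if market_caps.get(t, 0) > 0}
    total = sum(caps.values())
    if total <= 0:
        raise ValueError("no positive market caps among members")
    return {t: cap / total for t, cap in caps.items()}


# Hypothetical market caps in USD; "D" is not a member on this date.
caps = {"A": 3.0e12, "B": 1.5e12, "C": 0.5e12, "D": 9.9e12}
weights = prior_weights(caps, members={"A", "B", "C"})
```

Restricting the sum to current members matters: a large non-member would otherwise dilute every weight.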
4. Computes bottom-up return contributions
Once membership, prices, returns, and prior weights are aligned, the tool computes stock-level return contributions.
The output can be used to inspect:
- largest cumulative contributors
- largest cumulative detractors
- daily contribution tables
- prior-weight replication
- replicated index returns versus the supplied S&P 500 price-index series
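The comparison between replicated and supplied index returns can be sketched as follows. The replicated daily return is taken as the sum of constituent contributions, and the fit is summarized by the mean and root-mean-square of the daily differences. These two statistics are assumptions chosen for illustration, and the metrics the tool actually writes out may differ.

```python
import math


def replicated_returns(daily_contributions):
    """Replicated index return per day: sum of constituent contributions."""
    return [sum(day.values()) for day in daily_contributions]


def tracking_stats(replicated, target):
    """Mean and root-mean-square of daily return differences."""
    diffs = [r - t for r, t in zip(replicated, target)]
    mean_diff = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_diff, rmse


# Hypothetical daily contributions and supplied index returns.
daily = [{"A": 0.004, "B": -0.001}, {"A": -0.002, "B": 0.003}]
index_returns = [0.0032, 0.0008]

rep = replicated_returns(daily)
mean_diff, rmse = tracking_stats(rep, index_returns)
```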
5. Optionally fits a constrained RNN adjustment layer
One daily index return cannot uniquely identify hundreds of constituent weights.
Because of that, open-spx optionally fits a regularized, prior-constrained masked RNN weight path as one smooth explanation of the supplied return series.
This fitted layer should not be treated as official index data. It is a diagnostic tool, not a source of truth.
The model-implied weights are ex-post and in-sample unless the user implements a holdout or walk-forward validation.
What open-spx Does Not Do
open-spx is intentionally explicit about its limitations.
It does not:
- reproduce the official S&P 500 methodology
- provide official S&P 500 constituent weights
- model the official index divisor
- reproduce all float adjustments
- model all corporate-action treatments
- recover official investable weight factors
- compute total-return index contribution
- redistribute licensed data
The project focuses on the S&P 500 price index, not the total-return index. Take care not to mix ordinary dividends into the input series.
This distinction matters. If you are trying to explain the S&P 500 price index, use price-index-compatible inputs. If you are trying to explain total return, that is a different problem.
Why Approximate Contribution Analysis Is Still Useful
Approximate does not mean useless.
A transparent approximation can still help answer important questions:
- Is the index being carried by a small number of companies?
- Which names contributed most over a specific window?
- Are the biggest contributors the same as the biggest weights?
- Which stocks are offsetting the leading contributors?
- Does a bottom-up replication broadly track the supplied index?
- Where does replication error appear?
- Which dates or tickers deserve data-quality review?
That is often enough to move from vague commentary to concrete analysis.
Instead of saying:
“The S&P 500 is being driven by a handful of stocks.”
You can ask:
“Which stocks, by approximate contribution, drove the S&P 500 price-index return over this period?”
That is a better research question.
Example Outputs
open-spx writes CSV files and plots designed for inspection.
Example outputs include:
```
historical_constituents.csv
membership_date_ranges.csv
prices.csv
market_caps_prior_timeseries.csv
weights_prior_timeseries.csv
replication_prior_weights.csv
return_contributions_prior_weights.csv
weights_model_implied.csv
effective_exposures_model_fit.csv
market_cap_equivalent_exposure_gap.csv
returns.csv
return_contributions.csv
cumulative_top_return_contributors.csv
cumulative_top_return_bleeders.csv
replication_vs_sp500.csv
replication_metrics.csv
replication_metrics_by_model.csv
anomaly_report.csv
input_usage_report.csv
spx_vs_replicated_spx.png
largest_market_cap_difference_case.png
```
The most useful files for contribution analysis are:
- return_contributions.csv
- return_contributions_prior_weights.csv
- cumulative_top_return_contributors.csv
- cumulative_top_return_bleeders.csv
- replication_vs_sp500.csv
- replication_metrics_by_model.csv
- anomaly_report.csv
The anomaly report is especially useful because strange contribution results often come from data issues: split handling, stale shares outstanding, ticker mappings, missing membership transitions, spin-offs, special dividends, or other corporate actions.
How to Run open-spx
Install the project:
```
git clone https://github.com/wgeul/open-spx.git
cd open-spx
pip install -r requirements.txt
pip install -e . --no-deps
```
Then run it with local CSV inputs:
```
open-spx \
  --start 2024-01-01 \
  --index data/sp500_index.csv \
  --local-data-dir data/inputs \
  --out data/run
```
For quieter logs or CI usage:
```
open-spx --start 2024-01-01 --quiet
```
You can also override the constituent input folders independently:
```
open-spx \
  --start 2024-01-01 \
  --index data/sp500_index.csv \
  --local-prices-dir data/prices \
  --local-market-caps-dir data/market_caps \
  --out data/run
```
Required Data Inputs
open-spx expects plain CSV inputs.
S&P 500 price-index series
```
Date,Close
2024-01-02,4742.83
2024-01-03,4704.81
```
Accepted value column names include Close, sp500_index, index, or level.
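For readers preparing their own files, the sketch below shows one way to read such a series while tolerating the alternative value column names listed above. It is not the project's loader. Exact, case-sensitive matching of the value column and a case-insensitive `date` column are assumptions of this example.

```python
import csv
import io

VALUE_COLUMNS = ("Close", "sp500_index", "index", "level")


def read_index_series(csv_text):
    """Parse (date, value) pairs, using the first accepted value column found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    value_col = next((c for c in VALUE_COLUMNS if c in fields), None)
    date_col = next((c for c in fields if c.lower() == "date"), None)
    if value_col is None or date_col is None:
        raise ValueError(f"unrecognized columns: {fields}")
    return [(row[date_col], float(row[value_col])) for row in reader]


series = read_index_series("Date,Close\n2024-01-02,4742.83\n2024-01-03,4704.81\n")

# One-day price return between the two sample rows.
daily_return = series[1][1] / series[0][1] - 1.0
```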
Historical constituents
```
date,ticker
2024-01-01,A
2024-01-01,B
2024-01-02,A
2024-01-02,C
```
The default constituent source points to an open historical S&P 500 component dataset. It is useful, but it is not an official S&P constituent feed. Serious use still requires validation.
Constituent prices
```
Date,Open,High,Low,Close,Volume
2024-01-02,101.0,103.0,100.5,102.2,1234567
2024-01-03,102.2,104.1,101.7,103.6,1456789
```
Daily close data is strongly recommended.
Market caps or shares outstanding
Market-cap example:
```
date,market_cap
2024-01-02,12345678900
2024-01-03,12400000000
```
Shares-outstanding example:
```
date,shares_outstanding
2024-01-02,123456789
```
If only shares outstanding are provided, open-spx builds the market-cap prior as:
```
market cap prior = close price × shares outstanding
```
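A rough sketch of that calculation, reusing the sample values from the price and shares-outstanding CSV examples in this section. Carrying the latest shares figure forward to later dates is an assumption of this example, and the tool may align the two series differently.

```python
def market_cap_prior(close_by_date, shares_by_date):
    """Close price times the most recent shares-outstanding value on or
    before each date (forward-fill is an assumption of this sketch)."""
    caps = {}
    last_shares = None
    # ISO-formatted date strings sort chronologically.
    for date in sorted(set(close_by_date) | set(shares_by_date)):
        if date in shares_by_date:
            last_shares = shares_by_date[date]
        if date in close_by_date and last_shares is not None:
            caps[date] = close_by_date[date] * last_shares
    return caps


closes = {"2024-01-02": 102.2, "2024-01-03": 103.6}
shares = {"2024-01-02": 123456789}
caps = market_cap_prior(closes, shares)
```

Stale shares-outstanding values feed straight into the prior, so a date with a large gap since the last shares update is worth reviewing.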
How This Complements Visual Capitalist and Slickcharts
Visual Capitalist is excellent for seeing the S&P 500 in one chart. It makes concentration visually obvious.
Slickcharts is useful for checking current S&P 500 companies by weight.
But both are mostly snapshot-oriented resources. They help answer:
What does the S&P 500 look like now?
open-spx is aimed at a different question:
Which constituents approximately contributed to S&P 500 price-index returns through time?
That distinction is important.
Current weight is not the same as historical contribution. A stock can have a large current weight because it performed well in the past. Contribution analysis tries to show how that performance accumulated.
Why This Matters for Investors, Researchers, and Developers
S&P 500 concentration is not just a portfolio-management topic. It is also a data-transparency topic.
If a small group of companies drives a large share of index performance, then understanding the index requires more than looking at the headline return.
You need to inspect the drivers.
For investors, that can clarify how much passive exposure depends on a few mega-cap names.
For researchers, it creates a reproducible way to study concentration and return decomposition.
For developers, it provides a concrete Python workflow for working with point-in-time membership, constituent returns, prior weights, and replication diagnostics.
For market commentators, it creates a more precise alternative to broad claims about “narrow leadership.”
The Main Takeaway
The S&P 500 may contain around 500 companies, but its returns are not produced equally by 500 companies.
As concentration rises, the question becomes more important:
Which stocks are actually driving the index?
open-spx does not claim to provide official S&P 500 weights or exact index replication. Instead, it provides open Python tooling for approximate, inspectable, bottom-up S&P 500 price-index contribution analysis using user-provided CSV inputs.
That is the missing middle ground between high-level concentration charts and proprietary index attribution systems.
If you want to move beyond “the S&P 500 is concentrated” and start inspecting approximate stock-level contribution directly, open-spx is built for that.
Repository
Find the project on GitHub:
github.com/wgeul/open-spx
Code is licensed under Apache-2.0. Users are responsible for ensuring they have the rights to use and distribute the CSV inputs and generated outputs they create with the project.
This project is independent and is not affiliated with, endorsed by, or sponsored by S&P Dow Jones Indices, S&P Global, or CME Group.
06/09/2026