only make it better, we cannot guarantee the ideal time complexity…</p>
<p>For the sake of simplicity, and because I reference an article by <em>Neal Wu</em> on the
same topic (see the references below), I will use C++ to describe the mitigations.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="random-seed">Random seed<a href="#random-seed" class="hash-link" aria-label="Direct link to Random seed" title="Direct link to Random seed"></a></h2>
<p>One way to avoid this kind of attack is to introduce a random seed into the
hash. That way it is no longer easy to choose the <em>nasty</em> numbers.</p>
<p>For example, the hash can shift the number by a value read from a high-precision
clock, which is much harder to break.</p>
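<p>A minimal sketch of such a seeded hash, assuming 64-bit integer keys. The names
<code>shifted_hash</code> and <code>seeded_map</code> are made up for illustration, and the seed is
read once from <code>std::chrono::steady_clock</code>:</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical functor: shifts every key by a seed chosen at run time.
struct shifted_hash {
    std::size_t operator()(std::uint64_t x) const {
        // The seed is read once, on first use, from a high-precision clock,
        // so it differs between runs of the program.
        static const std::uint64_t seed = static_cast<std::uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
        return static_cast<std::size_t>(x + seed);
    }
};

// The functor is passed as the third template argument of the map.
using seeded_map = std::unordered_map<std::uint64_t, int, shifted_hash>;
```

<p>Within one run the hash stays consistent, which is all the table needs; an
attacker who prepared the input in advance does not know the shift.</p>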
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="better-random-seed">Better random seed<a href="#better-random-seed" class="hash-link" aria-label="Direct link to Better random seed" title="Direct link to Better random seed"></a></h2>
<p>Building on the previous solution, we can do some <em>bit magic</em> instead of the
plain shift.</p>
<p>Such a hash not only shifts the number, it also manipulates the underlying bits
of the hash, here by additionally applying the <code>XOR</code> operation.</p>
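<p>One way this could look, again as a sketch for 64-bit integer keys. The name
<code>xor_fold_hash</code> and the shift width of 16 are assumptions chosen for illustration:</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical functor: mixes the seed in with XOR and then folds the
// higher bits of the result into the lower ones.
struct xor_fold_hash {
    std::size_t operator()(std::uint64_t x) const {
        static const std::uint64_t seed = static_cast<std::uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
        x ^= seed;                                       // flip bits according to the seed
        return static_cast<std::size_t>(x ^ (x >> 16));  // XOR shifted bits back in
    }
};
```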
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="adjusting-the-hash-function">Adjusting the hash function<a href="#adjusting-the-hash-function" class="hash-link" aria-label="Direct link to Adjusting the hash function" title="Direct link to Adjusting the hash function"></a></h2>
<p>Another option is to switch up the hash function.</p>
<p>For example, Rust uses <a href="https://en.wikipedia.org/wiki/SipHash" target="_blank" rel="noopener noreferrer"><em>SipHash</em></a> by
default.</p>
<p>You can also usually specify your own hash function; here we will follow the
article by <em>Neal</em>, which uses the so-called <em><code>splitmix64</code></em>.</p>
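<p>For reference, a sketch of the <code>splitmix64</code> mixing step written as a free function.
The constants are those of the widely circulated public-domain implementation; treat
the exact function signature as an assumption of this sketch:</p>

```cpp
#include <cstdint>

// Mixes all 64 bits of the input: each step (adding a constant, XOR with a
// right shift, multiplying by an odd constant) is invertible, so distinct
// inputs always give distinct outputs.
inline std::uint64_t splitmix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}
```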
<p>For comparison, Java's <code>HashMap</code> post-processes <code>hashCode()</code> in its internal <code>hash()</code> function:</p>
<div class="language-java codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-java codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower. Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.) So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h &gt;&gt;&gt; 16);
}</code></pre></div></div>
<p>Notice that this function folds the upper bits of the hash into the lower ones
using <code>XOR</code>; this would render our attack from the previous part ineffective.</p>
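<p>To see the effect of folding the upper bits, here is a small C++ sketch. The helper
names <code>spread</code> and <code>bucket_index</code> are made up, and the table size is assumed
to be a power of two:</p>

```cpp
#include <cstdint>

// C++ counterpart of the expression h ^ (h >>> 16): Java's int is 32 bits
// wide and >>> is a logical shift, which matches >> on an unsigned value.
constexpr std::uint32_t spread(std::uint32_t h) {
    return h ^ (h >> 16);
}

// Index into a table whose size is a power of two (assumption of this sketch).
constexpr std::uint32_t bucket_index(std::uint32_t h, std::uint32_t bucket_count) {
    return h & (bucket_count - 1);
}

// Two hashes that differ only above the mask collide without spreading...
static_assert(bucket_index(0x10000, 16) == bucket_index(0x20000, 16),
              "same bucket");
// ...and land in different buckets once the upper bits are folded in.
static_assert(bucket_index(spread(0x10000), 16) != bucket_index(spread(0x20000), 16),
              "different buckets");
```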
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="combining-both">Combining both<a href="#combining-both" class="hash-link" aria-label="Direct link to Combining both" title="Direct link to Combining both"></a></h2>
<p>Can we make it better? Of course! Use multiple mitigations at the same time. In
our case, we will both inject the random value <strong>and</strong> use <em><code>splitmix64</code></em>.</p>
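<p>A sketch of the combination, assuming 64-bit integer keys: the clock-based seed is
added to the key and the sum is then run through <code>splitmix64</code>. The name
<code>seeded_splitmix_hash</code> is hypothetical:</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical functor combining both mitigations.
struct seeded_splitmix_hash {
    // Mixing step with the constants of the public-domain splitmix64.
    static std::uint64_t splitmix64(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    std::size_t operator()(std::uint64_t x) const {
        // Random value injected before mixing; read once per process.
        static const std::uint64_t seed = static_cast<std::uint64_t>(
            std::chrono::steady_clock::now().time_since_epoch().count());
        return static_cast<std::size_t>(splitmix64(x + seed));
    }
};

// Usage: keys that share their low 16 bits are still hashed independently.
using hardened_map = std::unordered_map<std::uint64_t, int, seeded_splitmix_hash>;
```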
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="fallback-for-extreme-cases">Fallback for extreme cases<a href="#fallback-for-extreme-cases" class="hash-link" aria-label="Direct link to Fallback for extreme cases" title="Direct link to Fallback for extreme cases"></a></h2>
<p>As we have mentioned above, Python resolves conflicts by probing (it looks
for empty space somewhere else in the table, but it is deterministic about it, so
it is not “<em>oops, this is full, let's go one-by-one and find some spot</em>”). C++ and
Java resolve conflicts with linked lists, as in the usual textbook depiction of
a hash table.</p>
<p>However, Java does something more intelligent. Once the number of conflicts in
one bucket exceeds a threshold, it converts the linked list into a red-black tree
ordered by the hash and then by the key.</p>
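<p>The idea can be sketched in C++ as a single bucket that starts as a plain list and
switches to an ordered tree once it grows too large. This is an illustration only,
not Java's implementation: the class name <code>bucket</code>, the threshold value and the
use of <code>std::map</code> (typically a red-black tree) are all assumptions:</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical threshold after which the bucket switches to a tree.
constexpr std::size_t kTreeifyThreshold = 8;

class bucket {
    struct entry { std::size_t hash; std::uint64_t key; int value; };

    std::vector<entry> list_;  // small bucket: linear scan
    std::map<std::pair<std::size_t, std::uint64_t>, int> tree_;  // ordered by (hash, key)
    bool treeified_ = false;

public:
    void put(std::size_t hash, std::uint64_t key, int value) {
        if (treeified_) {
            tree_[{hash, key}] = value;  // logarithmic in the bucket size
            return;
        }
        for (auto& e : list_) {
            if (e.hash == hash && e.key == key) { e.value = value; return; }
        }
        list_.push_back({hash, key, value});
        if (list_.size() > kTreeifyThreshold) {
            // Too many conflicts: move everything into the ordered tree.
            for (const auto& e : list_) tree_[{e.hash, e.key}] = e.value;
            list_.clear();
            treeified_ = true;
        }
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* get(std::size_t hash, std::uint64_t key) const {
        if (treeified_) {
            auto it = tree_.find({hash, key});
            return it == tree_.end() ? nullptr : &it->second;
        }
        for (const auto& e : list_) {
            if (e.hash == hash && e.key == key) return &e.value;
        }
        return nullptr;
    }

    bool is_tree() const { return treeified_; }
};
```

<p>With the tree in place, a lookup in an overfull bucket costs a logarithmic number of
comparisons instead of a scan over every conflicting entry.</p>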
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>You may wonder what sense it makes to order the tree by the hash when we are
dealing with conflicts. Well, there are fewer buckets than possible hash values,
so if we take only the lower bits, two keys can conflict even though their
hashes are not the same.</p></div></div>
<p>You might have noticed that if we get a <strong>really bad</strong> hashing function, this is
not very helpful. It is not, <strong>but</strong> it can help in other cases.</p>
<div class="theme-admonition theme-admonition-danger admonition_xJq3 alert alert--danger"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M5.05.31c.81 2.17.41 3.38-.52 4.31C3.55 5.67 1.98 6.45.9 7.98c-1.45 2.05-1.7 6.53 3.53 7.7-2.2-1.16-2.67-4.52-.3-6.61-.61 2.03.53 3.33 1.94 2.86 1.39-.47 2.3.53 2.27 1.67-.02.78-.31 1.44-1.13 1.81 3.42-.59 4.78-3.42 4.78-5.56 0-2.84-2.53-3.22-1.25-5.61-1.52.13-2.03 1.13-1.89 2.75.09 1.08-1.02 1.8-1.86 1.33-.67-.41-.66-1.19-.06-1.78C8.18 5.31 8.68 2.45 5.05.32L5.03.3l.02.01z"></path></svg></span>danger</div><div class="admonitionContent_BuS1"><p>Since the keys of a hash table are not required to define an ordering, and may
not implement one, the tree may end up ordered by the hash alone.</p></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="references">References<a href="#references" class="hash-link" aria-label="Direct link to References" title="Direct link to References"></a></h2>
<ol>
<li>Neal Wu.
<a href="https://codeforces.com/blog/entry/62393" target="_blank" rel="noopener noreferrer">Blowing up <code>unordered_map</code>, and how to stop getting hacked on it</a>.</li>