Given two strings text (the text) and pattern (the pattern), consisting of lowercase English alphabets, find all 0-based starting indices where pattern occurs as a substring in text.
Examples:
Input: text = "geeksforgeeks", pattern = "geeks"
Output: [0, 8]
Explanation: The string "geeks" occurs at index 0 and 8 in text.Input: text = "aabaacaadaabaaba", pattern = "aaba"
Output: [0, 9, 12]
Explanation:
In the Naive String Matching algorithm, we check whether every substring of the text of the pattern's size is equal to the pattern or not one by one.
Like the Naive Algorithm, the Rabin-Karp algorithm also check every substring. But unlike the Naive algorithm, the Rabin Karp algorithm matches the hash value of the pattern with the hash value of the current substring of text. So Rabin Karp algorithm has the following step
- Compute the hash of the pattern
- Compare pattern's hash with the hashes of all substrings (of same length as pattern) of the text. If the hashes match, we do a character-by-character check to confirm (to avoid errors due to hash collisions).
How is Hash Value calculated in Rabin-Karp?
The hash value is calculated using a rolling hash function, which allows you to update the hash value for a new substring by efficiently removing the contribution of the old character and adding the contribution of the new character. This makes it possible to slide the pattern over the text and calculate the hash value for each substring without recalculating the entire hash from scratch.
For a string s of length m, the simple hash is
hash(s) = (s[0] × d(m−1) + s[1] × d(m−2) + ... + s[m−1] × d0) % q
Here,
s[i] represents the ASCII value of the character ('a' = 97, 'b' = 98, ..., 'z' = 122)
d is the size of the input alphabet (commonly 256 for ASCII characters)
q is a prime number used as modulus (commonly 101, 1000000007, etc.)Using modulo helps prevent integer overflow and reduces hash collisions.
How do we compute rolling hash?
So we know hash values s[i-1, j-1] and now we need to compute for s[i..j].
Previous value of hash
The hash value of substring s[i−1 … j−1] is: (s[i-1] * d(m-1) + s[i] * d(m-2) + ... + s[j-1] * d0) mod q
New value of hash
The hash value of the next substring s[i … j] is: (s[i] * d(m-1) + s[i+1] * d(m-2) + ... + s[j] * d0 ) mod q
From the above expressions, it is clear that we can get new value from old value by removing the term s[i-1] * d(m-1) and adding the term s[j] * d0)
hash(i, j) = (d * (hash(i−1, j−1) − s[i−1] * d(m−1) )+ s[j]) mod q
Here's how the hash value is typically calculated in Rabin-Karp:
- Choose a base d for the hash function (commonly 31 for lowercase characters or 256 for ASCII). Also select a large prime number q as the modulus to prevent overflow and reduce hash collisions.
- Precompute the value of the term d(m-1) as we are going to repeatedly multiply with this term in rolling hash calculations as explained above.
- Calculate the hash for the pattern and the first substring of text matching. Let the pattern hash function be p and text hash function be t. We use the formula (s[0] * d(m-1) + s[1] * d(m-2) + ... + s[m-1] * d0 ) mod q for the first computations.
- Update the hash by removing the leftmost char and adding the next one using hash(i, j) = ( (hash(i-1, j-1) - s[i-1] * d(m-1) ) mod q + s[j]) mod q
- If the current text hash equals the pattern hash, verify character by character. This is needed because different substrings can have the same hash (collision).
#include <vector>
using namespace std;
// Function to search for all occurrences of 'pat' in 'txt' using Rabin-Karp
vector<int> search(const string &pat, const string &txt)
{
// Number of characters in the input alphabet (ASCII)
int d = 256;
// A prime number for modulo operations to reduce collisions
int q = 101;
// Length of the pattern
int m = pat.length();
// Length of the text
int n = txt.length();
// Hash value for pattern
int p = 0;
// Hash value for current window of text
int t = 0;
// High-order digit multiplier
int h = 1;
vector<int> ans;
// Precompute h = pow(d, M-1) % q
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;
// Compute initial hash values for
// pattern and first window of text
for (int i = 0; i < m; i++)
{
p = (d * p + pat[i]) % q;
t = (d * t + txt[i]) % q;
}
// Slide the pattern over text one by one
for (int i = 0; i <= n - m; i++)
{
// If hash values match, check characters one by one
if (p == t)
{
bool match = true;
for (int j = 0; j < m; j++)
{
if (txt[i + j] != pat[j])
{
match = false;
break;
}
}
if (match)
ans.push_back(i);
}
// Calculate hash value for the next window
if (i < n - m)
{
t = (d * (t - txt[i] * h) + txt[i + m]) % q;
// Ensure hash value is non-negative
if (t < 0)
t += q;
}
}
return ans;
}
int main()
{
string txt = "geeksforgeeks";
string pat = "geeks";
vector<int> res = search(pat, txt);
for (auto it : res)
{
cout << it << " ";
}
cout << "\n";
return 0;
}
import java.util.ArrayList;
public class GFG {
// Function to search for all occurrences of 'pat' in
// 'txt' using Rabin-Karp
static ArrayList<Integer> search(String pat, String txt)
{
// Number of characters in the input alphabet
// (ASCII)
int d = 256;
// A prime number for modulo operations to reduce
// collisions
int q = 101;
// Length of the pattern
int m = pat.length();
// Length of the text
int n = txt.length();
// Hash value for pattern
int p = 0;
// Hash value for current window of text
int t = 0;
// High-order digit multiplier
int h = 1;
ArrayList<Integer> ans = new ArrayList<>();
// Precompute h = pow(d, m-1) % q
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;
// Compute initial hash values for pattern and first
// window of text
for (int i = 0; i < m; i++) {
p = (d * p + pat.charAt(i)) % q;
t = (d * t + txt.charAt(i)) % q;
}
// Slide the pattern over text one by one
for (int i = 0; i <= n - m; i++) {
// If hash values match, check characters one by
// one
if (p == t) {
boolean match = true;
for (int j = 0; j < m; j++) {
if (txt.charAt(i + j)
!= pat.charAt(j)) {
match = false;
break;
}
}
if (match)
ans.add(i);
}
// Calculate hash value for the next window
if (i < n - m) {
t = (d * (t - txt.charAt(i) * h)
+ txt.charAt(i + m))
% q;
if (t < 0)
t += q;
}
}
return ans;
}
public static void main(String[] args)
{
String txt = "geeksforgeeks";
String pat = "geeks";
ArrayList<Integer> res = search(pat, txt);
for (int idx : res)
System.out.print(idx + " ");
System.out.println();
}
}
# Function to search for all occurrences of 'pat' in 'txt' using Rabin-Karp
def search(pat, txt):
# Number of characters in the input alphabet (ASCII)
d = 256
# A prime number for modulo operations to reduce collisions
q = 101
# Length of the pattern
m = len(pat)
# Length of the text
n = len(txt)
# Hash value for pattern
p = 0
# Hash value for current window of text
t = 0
# High-order digit multiplier
h = 1
ans = []
# Precompute h = pow(d, m-1) % q
for i in range(m - 1):
h = (h * d) % q
# Compute initial hash values for pattern and first window of text
for i in range(m):
p = (d * p + ord(pat[i])) % q
t = (d * t + ord(txt[i])) % q
# Slide the pattern over text one by one
for i in range(n - m + 1):
# If hash values match, check characters one by one
if p == t:
match = True
for j in range(m):
if txt[i + j] != pat[j]:
match = False
break
if match:
ans.append(i)
# Calculate hash value for the next window
if i < n - m:
t = (d * (t - ord(txt[i]) * h) + ord(txt[i + m])) % q
if t < 0:
t += q
return ans
if __name__ == "__main__":
txt = "geeksforgeeks"
pat = "geeks"
res = search(pat, txt)
for idx in res:
print(idx, end=" ")
print()
using System;
using System.Collections.Generic;
class GFG {
// Function to search for all occurrences of 'pat' in
// 'txt' using Rabin-Karp
public static List<int> search(string pat, string txt)
{
// Number of characters in the input alphabet
// (ASCII)
int d = 256;
// A prime number for modulo operations to reduce
// collisions
int q = 101;
// Length of the pattern
int m = pat.Length;
// Length of the text
int n = txt.Length;
// Hash value for pattern
int p = 0;
// Hash value for current window of text
int t = 0;
// High-order digit multiplier
int h = 1;
List<int> ans = new List<int>();
// Precompute h = pow(d, m-1) % q
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;
// Compute initial hash values for pattern and first
// window of text
for (int i = 0; i < m; i++) {
p = (d * p + pat[i]) % q;
t = (d * t + txt[i]) % q;
}
// Slide the pattern over text one by one
for (int i = 0; i <= n - m; i++) {
// If hash values match, check characters one by
// one
if (p == t) {
bool match = true;
for (int j = 0; j < m; j++) {
if (txt[i + j] != pat[j]) {
match = false;
break;
}
}
if (match)
ans.Add(i);
}
// Calculate hash value for the next window
if (i < n - m) {
t = (d * (t - txt[i] * h) + txt[i + m]) % q;
if (t < 0)
t += q;
}
}
return ans;
}
public static void Main()
{
string txt = "geeksforgeeks";
string pat = "geeks";
List<int> res = search(pat, txt);
Console.WriteLine(string.Join(" ", res));
}
}
// Function to search for all occurrences of 'pat' in 'txt'
// using Rabin-Karp
function search(pat, txt)
{
// Number of characters in the input alphabet (ASCII)
let d = 256;
// A prime number for modulo operations to reduce
// collisions
let q = 101;
// Length of the pattern
let m = pat.length;
// Length of the text
let n = txt.length;
// Hash value for pattern
let p = 0;
// Hash value for current window of text
let t = 0;
// High-order digit multiplier
let h = 1;
let ans = [];
// Precompute h = pow(d, m-1) % q
for (let i = 0; i < m - 1; i++)
h = (h * d) % q;
// Compute initial hash values for pattern and first
// window of text
for (let i = 0; i < m; i++) {
p = (d * p + pat.charCodeAt(i)) % q;
t = (d * t + txt.charCodeAt(i)) % q;
}
// Slide the pattern over text one by one
for (let i = 0; i <= n - m; i++) {
// If hash values match, check characters one by one
if (p === t) {
let match = true;
for (let j = 0; j < m; j++) {
if (txt[i + j] !== pat[j]) {
match = false;
break;
}
}
if (match)
ans.push(i);
}
// Calculate hash value for the next window
if (i < n - m) {
t = (d * (t - txt.charCodeAt(i) * h)
+ txt.charCodeAt(i + m))
% q;
if (t < 0)
t += q;
}
}
return ans;
}
// driver code
let txt = "geeksforgeeks";
let pat = "geeks";
let res = search(pat, txt);
console.log(res.join(" "));
Output
0 8
Time Complexity:
- Average case: O(n + m) - Computing the hash of the pattern and the first substring of text takes O(m), and sliding the window over the text with hash comparisons takes O(n) on average. Character-by-character verification is rarely needed.
- Worst case: O(n × m) - If many hash collisions occur, each substring may require a full character comparison.
Auxiliary Space: O(1)
Limitations of Rabin-Karp
- Hash collisions may occur - different substrings can have the same hash.
- Requires character-by-character check to confirm matches.
- Risk of modulo overflow if not handled with care.
- Performance depends on good hash function and prime modulus.
- Slightly higher constant factors compared to simpler algorithms like KMP.
Types of Problems Rabin-Karp Can Solve
- Pattern matching - Find all occurrences of a pattern in a large text
- Plagiarism detection - Compare documents by checking for common substrings
- Multiple pattern search - Efficiently search for several patterns at once
- Substring comparison - Quickly compare substrings using hashes
- Palindrome and DP problems - Used for hashing-based optimizations
- Detect duplicate substrings - Find repeated sequences in strings
- Longest common prefix/suffix - In constant time using precomputed hashes
