Can You Find Substring with Concatenation of All Words?
A common question in tech interviews involves finding a substring that is a concatenation of a list of words. This scenario helps evaluate a candidate’s problem-solving skills and understanding of string manipulation and data structures. Let’s explore this problem in detail and provide a clear solution.
Problem Statement
You are given a string s
and an array of strings words
. The goal is to determine if s
contains a substring that is a concatenation of each word in words
exactly once, without any intervening characters. Each word's length is fixed, and they will appear together as specified by the length of the words.
For instance, if s = "barfoothefoobarman"
and words = ["foo", "bar"]
, the solution should return true
since the substring "barfoo" is indeed a concatenation of both "bar" and "foo".
Approach
To solve this problem, an efficient method involves the use of a sliding window technique along with a hash map (or dictionary) for counting the occurrences of the words. The following steps outline the approach:
- Set Up Variables: Calculate the total length of all words combined and determine the length of each word.
- Create a Word Count Map: Use a dictionary to count how many times each word appears in the
words
list. - Sliding Window: Iterate through the string
s
using a fixed-size window of total length for the words. For each position, check if the substring can be broken down into valid words by comparing it to the dictionary.
Code Example
Below is a Python solution that illustrates this approach:
Python
Explanation of the Code
- Input Validation: The function begins with basic checks for empty input.
- Initialize Parameters: We calculate the lengths we need for words, the word count, and create a dictionary to count the words.
- Main Loop: The outer loop moves the starting point of the sliding window across
s
. The inner loop extracts words based on their lengths and checks against the dictionary. - Word Matching: It maintains a count of how many times each word is found. If a word exceeds its required count, the inner loop breaks. When all words match exactly, the start index of the substring is stored in the results.
This solution efficiently identifies valid starting indices for substrings that match the criteria, making it a practical choice for similar interview questions.