Pangram (holoalphabetic) detection

A pangram (Greek: pan gramma, “every letter”) or holoalphabetic sentence for a given alphabet is a sentence using every letter of the alphabet at least once.  For example: “The quick brown fox jumps over the lazy dog.”

pangram

Pangram

In this post, I will be discussing two methods to detect whether a given string is a pangram or not. The first method to detect a pangram would be employing an array to store the occurrence status and the second method to detect a pangram would use a single integer variable to store the occurrence status in its individual bits using bit-wise operators.

Pangram detection using an array

class Pangram{
    public static boolean isPangram( String str ){
        boolean status[] = new boolean[26];
        char ch;
        for(int i=0; i<str.length();i++){
            ch=str.charAt(i);
            if(Character.isLetter(ch)){
                ch=Character.toUpperCase(str.charAt(i));
                status[ch-65]=true;
            }
        }
        for(int i=0;i<status.length;i++){
            if(!status[i]) return false;
        }
        return true;
    }
    public static void main( String args[] ){
        System.out.println( isPangram1("abcABC") );
        System.out.println( isPangram1("ABCDEFGHIJKLMNOPQRSTUVWXYZ") );
        System.out.println( isPangram1("The Quick Brown Fox Jumps Over The Lazy Dog") );
    }
}

The concept in this approach is to use a boolean array as a collection of flag variables to store the occurrence status of each alphabet. The string would not be a pangram if there is any flag location storing false if all the flags locations store true, then the string would be a pangram.

Pangram detection using an integer variable through the bit-wise operator

class Pangram{
    public static boolean isPangram( String str ){
        int status = 0;
        char ch;
        for( int i = 0; i < str.length(); i++ ){
             ch=str.charAt( i );
             if( ch >= 'A' && ch <= 'Z' ) {
                status |= 1 << ( ch - 'A' );
             }else if( ch >= 'a' && ch <= 'z' ){
                status |= 1 << ( ch - 'a' );
            }
            if( status == 0b11111111111111111111111111 ) return true;
        }
        return false;
    }
    public static void main( String args[] ){
        System.out.println( isPangram1("abcABC") );
        System.out.println( isPangram1("ABCDEFGHIJKLMNOPQRSTUVWXYZ") );
        System.out.println( isPangram1("The Quick Brown Fox Jumps Over The Lazy Dog") );
    }
}

In this approach of finding pangram, we are using the individual bits of an integer variable (integer variables are of 4 bytes or 32 bits) to store the occurrence status of each alphabet. Last bit is being used for storing the presence (or absence) of ‘A’ or ‘a’, second last bit is being used for storing the presence (or absence) of ‘B’ or ‘b’ and so on. The for loop is for extracting each character of a string sequentially. The extracted character is then checked for being an uppercase or lowercase character. We are subtracting the value of ‘A’ (65) or ‘a’ (97) so as the get a value 0 for both ‘A’ and ‘a’, value 1 for both ‘B’ and ‘b’ and so on. The left shift operator (<<) is then used to shift the one to its proper place; meaning 1 for ‘A’ or ‘a’, 10 for ‘B’ or ‘b’, 100 for ‘C’ or ‘c’ and so on. We then bitwise OR this value into the status variable so as to set the corresponding bit 1 at the required position. As soon as the status variable is equal to 0b11111111111111111111111111 (0b  for binary) that is last twenty-six bit become 1 we return true since that indicates that all letters have occurred at least once. In case 0b11111111111111111111111111 is not encountered and the string is over it means that the sentence is not a pangram. The complexity of this algorithm is O(N). Also, note that the memory requirement is significantly lower than the earlier algorithm using a boolean array.

2 thoughts on “Pangram (holoalphabetic) detection

  1. Ryan Wotsits

    what about characters which are missing? If ‘z’ or ‘Z’ is missing then bitcount of status will be 25 and not 26. However for other characters which are missing the bit count will still be 26. Strange eh? I think this is because the most significant bit is signed…but I still don’t understand why it is not set to 0 giving a total bit count of 26 when ‘z’ (or Z) is missing.

    Reply
    1. Vinay Singh Post author

      Sorry for responding late.
      If the character is missing the corresponding bit position would be 0.
      We are not counting the number of bits, the logic is that is all alphabets are present the variable ‘status’ will be holding 11111111111111111111111111 (in binary, the decimal equivalent would be 67108863) and the ‘if’ statement towards the end of the function test for the same. Note that integer consonants prefixed with 0b represent a binary numbers.

      Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.