Heap

Oscar

May 22, 2020 17:05 Technology

A short introduction to the essential concepts of the heap data structure.

(Part of this blog is adapted from Wikipedia)

1. Introduction

In computer science, a heap is a specialized tree-based data structure which is essentially an almost complete tree that satisfies the heap property: in a max heap, for any given node C, if P is a parent node of C, then the key (the value) of P is greater than or equal to the key of C. In a min heap, the key of P is less than or equal to the key of C. The node at the "top" of the heap (with no parents) is called the root node.

The heap is one maximally efficient implementation of an abstract data type called a priority queue, and in fact, priority queues are often referred to as "heaps", regardless of how they may be implemented. In a heap, the highest (or lowest) priority element is always stored at the root. However, a heap is not a sorted structure; it can be regarded as being partially ordered. A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.
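As a small illustration of the priority-queue use case, here is a minimal sketch using Python's standard `heapq` module, which maintains a binary min heap on a plain list. The task names and priorities are made-up example data.

```python
import heapq

# (priority, task) pairs; tuples compare by their first element first.
# The tasks and priorities are hypothetical example data.
tasks = [(3, 'write report'), (1, 'fix bug'), (2, 'review code')]

heap = []
for item in tasks:
    heapq.heappush(heap, item)   # O(log n) insertion

# The lowest-priority-number item always sits at the root, heap[0].
root = heap[0]

# Repeatedly removing the root yields the items in priority order.
order = [heapq.heappop(heap)[1] for _ in range(len(heap))]
print(root)    # (1, 'fix bug')
print(order)   # ['fix bug', 'review code', 'write report']
```

Note that `heapq` only provides a min heap; a max heap is usually emulated by negating the keys.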

A common implementation of a heap is the binary heap, in which the tree is a binary tree (see figure). The heap data structure, specifically the binary heap, was introduced by J. W. J. Williams in 1964, as a data structure for the heapsort sorting algorithm. Heaps are also crucial in several efficient graph algorithms such as Dijkstra's algorithm. When a heap is a complete tree, it has the smallest possible height: a heap with $N$ nodes in which each node has $a$ branches has a height on the order of $\log_a N$.
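To make the height claim concrete, the sketch below counts the levels of a complete tree directly. The function name `heap_height` and the convention that height means the number of edges from the root to the deepest leaf are assumptions made for this example.

```python
def heap_height(n, a=2):
    """Height (edges on the longest root-to-leaf path) of a complete
    a-ary tree with n >= 1 nodes, found by filling it level by level."""
    height = 0
    level_size = 1       # number of nodes on the current level
    remaining = n - 1    # nodes left after placing the root
    while remaining > 0:
        level_size *= a          # the next level holds a times more nodes
        remaining -= level_size
        height += 1
    return height

print(heap_height(9))       # 3, the 9-node binary heap has 4 levels
print(heap_height(13, 3))   # 2, a full ternary tree with 1 + 3 + 9 nodes
```

For a binary heap this equals the floor of $\log_2 N$, which in Python is `N.bit_length() - 1`.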

Note that, as shown in the graphic, there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap relation mentioned above applies only between nodes and their parents, grandparents, etc. The maximum number of children each node can have depends on the type of heap.
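A quick way to see that a heap is only partially ordered is to run an in-order traversal over one. The sketch below assumes a binary heap stored in a zero-based array, where the children of index i sit at 2i + 1 and 2i + 2; the helper name `in_order` is made up for this example.

```python
def in_order(array, i=0):
    """In-order traversal (left subtree, node, right subtree) of the
    implicit binary tree stored in a zero-based array."""
    if i >= len(array):
        return []
    return in_order(array, 2*i + 1) + [array[i]] + in_order(array, 2*i + 2)

max_heap = [100, 19, 36, 17, 3, 25, 1, 2, 7]   # a valid max heap
traversal = in_order(max_heap)
print(traversal)                       # [2, 17, 7, 19, 3, 100, 25, 36, 1]
print(traversal == sorted(max_heap))   # False
```

In a binary search tree the same traversal would return the keys in sorted order; in a heap it does not.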

2. Implementation of basic functionality

Heaps are usually implemented with an implicit heap data structure, which is an implicit data structure consisting of an array (fixed size or dynamic array) where each element represents a tree node whose parent/children relationship is defined implicitly by their index. After an element is inserted into or deleted from a heap, the heap property may be violated and the heap must be balanced by swapping elements within the array.

In an implicit heap data structure, the first (or last) element will contain the root. The next two elements of the array contain its children. The next four contain the four children of the two child nodes, etc. Thus the children of the node at position n would be at positions 2n and 2n + 1 in a one-based array, or 2n + 1 and 2n + 2 in a zero-based array. Computing the index of the parent of the n-th element is also straightforward: in a one-based array the parent is at position n/2 (floored), and in a zero-based array it is at position (n-1)/2 (floored). This allows moving up or down the tree by doing simple index computations. Balancing a heap is done by sift-up or sift-down operations (swapping elements which are out of order). As we can build a heap from an array without requiring extra memory (for the nodes, for example), heapsort can be used to sort an array in-place.
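The index arithmetic for a zero-based array can be sketched as two small helpers. The names `parent` and `children` are chosen for this example and are not used elsewhere in this post.

```python
def parent(i):
    """Index of the parent of node i (zero-based); the root has none."""
    return (i - 1) // 2 if i > 0 else None

def children(i, n):
    """Indices of the children of node i that exist in a heap of n elements."""
    return [c for c in (2*i + 1, 2*i + 2) if c < n]

array = [100, 19, 36, 17, 3, 25, 1, 2, 7]
n = len(array)
print(children(0, n))   # [1, 2]  -> values 19 and 36
print(children(3, n))   # [7, 8]  -> values 2 and 7
print(children(4, n))   # []      -> node 3 is a leaf
print(parent(8))        # 3       -> value 17
```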

Different types of heaps implement the operations in different ways, but notably, insertion is often done by adding the new element at the end of the heap in the first available free space. This will generally violate the heap property, and so the new element is then sifted up until the heap property has been reestablished. Similarly, deleting the root is done by removing the root, putting the last element in its place, and sifting down to rebalance. Replacing the root is done by overwriting it with the new element and sifting down, which avoids the sift-up step that a pop (sift down of the last element) followed by a push (sift up of the new element) would require.
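Python's standard `heapq` module exposes the replace operation described above as `heapq.heapreplace`. The sketch below compares it with a separate pop followed by a push on a small min heap; the example values are arbitrary.

```python
import heapq

heap = [1, 3, 2, 7, 4]    # already satisfies the min-heap property

# Replace the root in one step: returns the old root and sifts the
# new value down from the top.
old_root = heapq.heapreplace(heap, 5)

# The same logical result via two separate operations (pop, then push).
other = [1, 3, 2, 7, 4]
popped = heapq.heappop(other)
heapq.heappush(other, 5)

print(old_root, popped)                # 1 1
print(heap[0], other[0])               # 2 2
print(sorted(heap) == sorted(other))   # True
```

Both variants leave the same set of elements in the heap; the single-step version just does one sift instead of two.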

Construction of a binary (or d-ary) heap out of a given array of elements may be performed in linear time using the classic Floyd algorithm, with the worst-case number of comparisons equal to $2N - 2s_2(N) - e_2(N)$ (for a binary heap), where $s_2(N)$ is the sum of all digits of the binary representation of $N$ and $e_2(N)$ is the exponent of 2 in the prime factorization of $N$. This is faster than a sequence of consecutive insertions into an originally empty heap, which is log-linear.
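A minimal sketch of Floyd's construction for a binary max heap follows: every internal node is sifted down, starting from the last one and moving toward the root. The helper names (`sift_down`, `build_max_heap`, `is_max_heap`) are chosen for this example and are separate from the functions defined later in this post.

```python
def sift_down(array, i, n):
    """Move array[i] down until neither child within array[:n] is larger."""
    while True:
        largest = i
        left, right = 2*i + 1, 2*i + 2
        if left < n and array[left] > array[largest]:
            largest = left
        if right < n and array[right] > array[largest]:
            largest = right
        if largest == i:
            return
        array[i], array[largest] = array[largest], array[i]
        i = largest

def build_max_heap(array):
    """Floyd's method: sift down each internal node, last one first."""
    n = len(array)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(array, i, n)

def is_max_heap(array):
    """Check that every non-root element is <= its parent."""
    return all(array[(i - 1) // 2] >= array[i] for i in range(1, len(array)))

data = [3, 1, 25, 2, 19, 36, 100, 17, 7]
build_max_heap(data)
print(data[0])             # 100, the maximum is now at the root
print(is_max_heap(data))   # True
```

The leaves (indices n // 2 and above) are already one-element heaps, which is why the loop can skip them.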

Python has a nice library, "binarytree", which prints binary trees in a user-friendly way.

import binarytree

array = [100, 19, 36, 17, 3, 25, 1, 2, 7]
tree = binarytree.build(array)
print(tree)

Output:  

         ___100___
        /         \
    ___19         _36
   /     \       /   \
  17      3     25    1
 /  \
2    7

 

2.1 Validation

def heap_validation(array, i, flag='max'):
    if i < len(array):
        print('visit:', array[i])
        ileft = i*2 + 1
        iright = i*2 + 2
        im = i
        if ileft < len(array):
            if ( flag == 'min' and array[ileft] < array[im]) or \
                (flag == 'max' and array[ileft] > array[im]):
                im = ileft
        if iright < len(array):
            if ( flag == 'min' and array[iright] < array[im]) or \
                (flag == 'max' and array[iright] > array[im]):
                im = iright
        if im == i:
            return  heap_validation(array, ileft, flag=flag) and \
                    heap_validation(array, iright, flag=flag)
        else:
            return False
    else:
        return True

heap_validation(array, 0, flag='max')

Output: 

visit: 100
visit: 19
visit: 17
visit: 2
visit: 7
visit: 3
visit: 36
visit: 25
visit: 1
True

 

2.2 Heapify 

2.2.1 Bottom-up method

def heapify_bottomup(array, i, flag='max'):
    if i > 0:
        iparent = (i - 1) // 2  # integer division already returns an int
        im = i
        if (flag == 'min' and array[iparent] > array[i]): 
            im = iparent
        if (flag == 'max' and array[iparent] < array[i]): 
            im = iparent
        if i != im:
            array[im], array[i] = array[i], array[im]
            heapify_bottomup(array, iparent, flag=flag)

array = [100, 19, 36, 17, 3, 25, 1, 2, 999]
print(binarytree.build(array))
heapify_bottomup(array, len(array)-1, 'max')
print(binarytree.build(array))

Output:

           ___100___
          /         \
    _____19         _36
   /       \       /   \
  17_       3     25    1
 /   \
2    999

           ___999___
          /         \
    ____100         _36
   /       \       /   \
  19        3     25    1
 /  \
2    17

2.2.2 Top-down method

def heapify_topdown(array, i, flag='max'):
    if i < len(array):
        ileft = i*2 + 1
        iright = i*2 + 2
        im = i
        if ileft < len(array):
            if ( flag == 'min' and array[ileft] < array[im]) or \
                (flag == 'max' and array[ileft] > array[im]):
                im = ileft
        if iright < len(array):
            if ( flag == 'min' and array[iright] < array[im]) or \
                (flag == 'max' and array[iright] > array[im]):
                im = iright
        if im != i:
            array[im], array[i] = array[i], array[im]
            heapify_topdown(array, im, flag=flag)

array = [0, 19, 36, 17, 3, 25, 1, 2, 7]
print(binarytree.build(array))
heapify_topdown(array, 0, 'max')
print(binarytree.build(array))

Output:

         ___0___
        /       \
    ___19       _36
   /     \     /   \
  17      3   25    1
 /  \
2    7

         ___36__
        /       \
    ___19        25
   /     \      /  \
  17      3    0    1
 /  \
2    7

 

2.3 Insertion

def heap_insertion(array, extra, flag='max'):
    array.append(extra)
    heapify_bottomup(array, len(array)-1, flag=flag)
    return

array = [100, 19, 36, 17, 3, 25, 1, 2, 7]
print( binarytree.build(array) )
heap_insertion(array, 999, 'max')
print( binarytree.build(array) )

Output:

         ___100___
        /         \
    ___19         _36
   /     \       /   \
  17      3     25    1
 /  \
2    7

          ______999___
         /            \
    ___100__          _36
   /        \        /   \
  17         19     25    1
 /  \       /
2    7     3

2.4 Deletion

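def heap_deletion(array, flag='max'):
    # Swap the root with the last element, remove it from the array,
    # then sift the new root down to restore the heap property.
    array[0], array[-1] = array[-1], array[0]
    root = array.pop()
    heapify_topdown(array, 0, flag=flag)
    return root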

array = [100, 19, 36, 17, 3, 25, 1, 2, 7]
print( binarytree.build(array) )
heap_deletion(array, 'max')
print( binarytree.build(array) )

Output: 

         ___100___
        /         \
    ___19         _36
   /     \       /   \
  17      3     25    1
 /  \
2    7

       ___36__
      /       \
    _19        25
   /   \      /  \
  17    3    7    1
 /
2

2.5 Update the root

def heap_update(array, new_root, flag='max'):
    array[0] = new_root
    heapify_topdown(array, 0, flag=flag)
    return

array = [100, 19, 36, 17, 3, 25, 1, 2, 7]
print( binarytree.build(array) )
heap_update(array, 0, 'max')
print( binarytree.build(array) )

Output:

         ___100___
        /         \
    ___19         _36
   /     \       /   \
  17      3     25    1
 /  \
2    7

         ___36__
        /       \
    ___19        25
   /     \      /  \
  17      3    0    1
 /  \
2    7

 

3. Applications of heap 

3.1 Heap sort

Procedure:

  • Step 1. Build a heap (bottom-up)
  • Step 2. Sorting (top-down)

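The Wikipedia article on heapsort (see References) includes an animation that nicely shows how heap sort works. Note that the code below redefines heapify_topdown with an extra argument N, the size of the part of the array that is still a heap, so that the already sorted tail is left untouched: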

def heapify_topdown(array, i, N, flag='max'):
    if i < N:
        ileft = i*2 + 1
        iright = i*2 + 2
        im = i
        if ileft < N:
            if ( flag == 'min' and array[ileft] < array[im]) or \
                (flag == 'max' and array[ileft] > array[im]):
                im = ileft
        if iright < N:
            if ( flag == 'min' and array[iright] < array[im]) or \
                (flag == 'max' and array[iright] > array[im]):
                im = iright
        if im != i:
            array[im], array[i] = array[i], array[im]
            heapify_topdown(array, im, N, flag=flag)

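def heap_sort(array):
    if len(array) > 1:
        # step 1: build a max heap by sifting up every element after the
        # root, including the last one
        for i in range(1, len(array)):
            heapify_bottomup(array, i, 'max')
        # step 2: move the current maximum to the end, then restore the
        # heap property on the remaining prefix array[:i]
        for i in range(len(array)-1, 0, -1):
            array[0], array[i] = array[i], array[0]
            heapify_topdown(array, 0, i, 'max')
    return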

import random
array = list(range(10))
random.shuffle(array)
print(array)
heap_sort(array)
print(array)

Output:

[9, 4, 5, 2, 1, 7, 3, 6, 0, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

 

3.2 Find the median in a data stream

This question is from LeetCode (#295) and can be solved by heaps: https://www.aphanti.com/blog/24/
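One common way to approach this problem with heaps is the two-heap technique sketched below; it is not necessarily the solution given in the linked post. A max heap holds the lower half of the numbers and a min heap holds the upper half, so the median can be read from the two roots. The class and method names are chosen for this example, and the max heap is emulated by storing negated values in Python's standard `heapq` min heap.

```python
import heapq

class MedianFinder:
    """Running median using two heaps (assumes at least one number
    has been added before find_median is called)."""

    def __init__(self):
        self.low = []    # lower half, max heap emulated with negated values
        self.high = []   # upper half, min heap

    def add_num(self, num):
        # Push onto the lower half, then move its largest element up so
        # that every element of low is <= every element of high.
        heapq.heappush(self.low, -num)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # Rebalance so that low has as many elements as high, or one more.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def find_median(self):
        if len(self.low) > len(self.high):
            return float(-self.low[0])
        return (-self.low[0] + self.high[0]) / 2

mf = MedianFinder()
medians = []
for x in [5, 15, 1, 3]:
    mf.add_num(x)
    medians.append(mf.find_median())
print(medians)   # [5.0, 10.0, 5.0, 4.0]
```

Each insertion costs O(log n) and reading the median costs O(1), since only the two roots are inspected.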


References:

https://en.wikipedia.org/wiki/Heap_(data_structure) 

https://en.wikipedia.org/wiki/Heapsort
