Monday 12 January 2015

Incremental sub-sequence out of random stream, a new C++ algorithm

This article is all about a problem that I've read on "hackerrank", it was a difficult problem, where I tried to solve it using a strange data structure.

PROBLEM: 
A sub-sequence of a "random" sequence/list/stream, is obtained by deleting zero or more elements from the list

Suppose, a given a list A, where every element is a pair of integers ( i.e  A = [(a1, w1), (a2, w2),..., (aN, wN)] ), then, a sub-sequence ( B = [(b1, v1), (b2, v2), ...., (bM, vM)] is called increased sequence, and it is increasing if (bi < bi+1for every i (1 <= i < M).

It is required to determine the maximum weight among all weights of all increased sub-sequences "Weight(B)", where (Weight(B) = v1 + v2 + ... + vM).




Click "here" to read more about the problem.


SOLUTION:
I have chosen "graph data structure" to help me finding all possibilities of all increment sub-sequences of the given random list/stream.




I put a directed relationship "same, lower or next" between each two nodes in the graph, the relationship between any two nodes can be considered as the relationship between the pronouns “I and it” - where “I” is any node in the graph and “it” is the next-iterative-node picked-up from the random-list loop -, this relationship is only one of the following:

It is called same if "I.s == it.s" My sequence equals its sequence
It is called lower if "I.s > it.s" My sequence greater than its sequence
It is called greater if "I.s < it.s" My sequence lower than its sequence




To construct the graph based on those relations, I used the next insertion algorithm:




Insertion algorithm:
If same, then, link it as my last same, and, give it to my lowers to recursively insert it to themselves using this algorithm.
If I don’t have any next and it is greater: then add it to my "nexts" list.
If it is linked as lower/same to my-direct-next: then link it as next to myself and also to my all "sames".
If it is my direct next: then insert it to my all "lowers" if only I don’t have any "same".




This algorithm is constructing the graph in O(n) complexity. Where it is O(M) to retrieve all elements of any sub-sequence from the graph, where M is constant and it is the longest length of a sub-sequence.




The C++ code has an additional step to determine maximum-weight directly during inserting in a fast way.




Graph Node:
The following figure illustrates graph node as a concept:













Example:
Suppose a list A={5 4 2 3 6} the next figure illustrates the resulted graph.













Suppose a list A={1 5 4 2 3 5 6} the next figure illustrates the resulted graph.






C++ Code:
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#include <string.h>
using namespace std;

//----------------------------------------------
//Generic pointer
#define _PTR void*

//----------------------------------------------
//Generic pointer
#define G_PTR void*
//----------------------------------------------
//signed data types
typedef long long int64;
typedef unsigned long long u_int64;
#define G_CHAR char
#define G_SHORT int
#define G_INT long
#define G_LONG int64
//----------------------------------------------
//unsigned data types
#define G_BYTE unsigned char
#define G_WORD unsigned int
#define G_DWORD unsigned long
#define G_QWORD u_int64
//----------------------------------------------
//Bitwise flages
#define IS_NEXT 0x01
#define IS_SAME 0x02
#define IS_LOWER 0x04


//----------------------------------------------
//Graph node to represent each element in the provided sequence
typedef struct GRAPH
{
       //b: sequnce number, v: its weight
       G_QWORD b, v; //pair of (seq,wei)


       //Node default constructors
       GRAPH() :listID(0), b(0), v(0), maxAccumSteps(1), maxAccumWei(0), maxAccumStr("") {}
       //Node copy constructors, no need for assignment operator
       GRAPH(G_QWORD ii, G_QWORD bb, G_QWORD vv)
       {
              listID = ii;
              b = bb;


              //Quick finding technique
              maxAccumWei = v = vv;
              maxAccumSteps = 1;
       }
       //Linkes to tied other graph-nodes
       G_QWORD listID; //index in the pGraph array
       GRAPH* same = NULL;//pointer to graph object that carry same sequence number but comes later
       GRAPH* lower = NULL;
       vector<GRAPH*> next;//list of pointers to allow different possible subsequences


       //Flages to stop graph-infinity and graph-creeping
       G_BYTE bFlages = 0;


       inline void SetNext(bool b){ bFlages = (b ? (bFlages | IS_NEXT) : (bFlages & ~IS_NEXT)); }
       inline void SetSame(bool b){ bFlages = (b ? (bFlages | IS_SAME) : (bFlages & ~IS_SAME)); }
       inline void SetLower(bool b){ bFlages = (b ? (bFlages | IS_LOWER) : (bFlages & ~IS_LOWER)); }


       inline bool IsNext(){ return (bFlages & IS_NEXT) == IS_NEXT; }
       inline bool IsSame(){ return (bFlages & IS_SAME) == IS_SAME; }
       inline bool IsLower(){ return (bFlages & IS_LOWER) == IS_LOWER; }


       //Max values, quick finding
       G_QWORD maxAccumSteps;
       G_QWORD maxAccumWei;
       string maxAccumStr;
       inline void PushNext(GRAPH* gNode, G_QWORD& maxSteps, G_QWORD& maxWei, string& maxStr)
       {
              //gNode is the most recent GRAPH node, it may be linked before to other nodes
              next.push_back(gNode);
              //Determine  each graph node
              if (maxAccumWei + gNode->v > gNode->maxAccumWei)
              {
                     gNode->maxAccumSteps = maxAccumSteps + 1;
                     gNode->maxAccumWei = maxAccumWei + gNode->v;//it may give me lower value than the past value inside gNode->maxAccumWei .. and that is correct decision because of gNode->maxAccumSteps is now greater
              }


              //Adjust maximum
              if (gNode->maxAccumWei > maxWei)
              {
                     maxSteps = gNode->maxAccumSteps;
                     maxWei = gNode->maxAccumWei;
                     maxStr = gNode->maxAccumStr;
              }
       }




}GRAPH;


//----------------------------------------------
//Sequence node to represent input data and needed functionality
typedef struct SEQUENCE
{
       G_QWORD len = 0; //len=N: it is input sequence length .. 1 <= N <= 150,000
       G_QWORD* seq = NULL;//seq=a: 1 <= a[i] <= 1,000,000,000, where i = [1..N]
       G_QWORD* wei = NULL; //wei=w: 1 <= w[i] <= 1,000,000,000, where i = [1..N]


       G_PTR* pGraphs;


       G_QWORD maxSubWeight = 0; //maximum is: 1,000,000,000 x 150,000 = 150,000,000,000,000
       G_QWORD maxSubSeqCount = 0;//maximum is 150,000
       string maxSubStr = "";//maximum length: 11 * 150,000 = 22,500,000,000 ~= 20.9GB


       G_QWORD weight()
       {
              AnalyzeBy_Posibility_Graph();
              return maxSubWeight;
       }


       void AnalyzeBy_Posibility_Graph()
       {
              //#define GENERATE_STR
              pGraphs = new G_PTR[len];
              memset(pGraphs, 0, sizeof(G_PTR)* len);//take care ... this function is defined in <string.h>


              GRAPH* root = NULL;


              if (len > 0)
              {
                     maxSubWeight = 0;
                     maxSubSeqCount = 0;


                     root = new GRAPH(0, seq[0], wei[0]);
                     pGraphs[0] = root;


                     for (G_QWORD i = 1; i < len; i++)
                     {
                           pGraphs[i] = new GRAPH(i, seq[i], wei[i]);
                           insert(root, (GRAPH*)pGraphs[i]);
                     }


                     //cout << maxSubWeight << endl;


              }


              remove(pGraphs, len);


              delete[] pGraphs;
       }


       void getMax(GRAPH* grp, G_QWORD subCount = 0, G_QWORD accumWei = 0, string accumStr = "")
       {
              if (grp == NULL)
              {
                     if (subCount > maxSubSeqCount)
                     {
                           maxSubSeqCount = subCount;
                           maxSubWeight = 0;
                     }


                     if (subCount == maxSubSeqCount && accumWei > maxSubWeight)
                     {
                           maxSubWeight = accumWei;
                     }


                     return;
              }




              if (grp->next.size() <= 0)
              {
                     getMax(NULL, subCount, grp->v + accumWei);
              }
              else
              {
                     for (long n = 0; n < grp->next.size(); n++)
                     {
                           getMax(grp->next[n], subCount + 1, grp->v + accumWei);
                     }
              }


              if (grp->listID == 0)
              {
                     GRAPH* temp = grp->lower;
                     while (temp)
                     {
                           getMax(temp, 1);
                           temp = temp->lower;
                     }


                     temp = grp->same;
                     while (temp)
                     {
                           getMax(temp, 1);
                           temp = temp->same;
                     }
              }


       }


       void remove(G_PTR* pLst, G_QWORD len)
       {
              for (G_QWORD g = 0; g < len; g++)
              {
                     delete ((GRAPH*)pLst[g]);
                     pLst[g] = NULL;
              }
       }


       void insert(GRAPH* root, GRAPH* gNode, GRAPH* fireWallNode = NULL)
       {
              //case1: empty nodes .. will not happen at all
              if (root == NULL || gNode == NULL)
                     return;


              //case2: same sequence
              if (root->b == gNode->b)
              {
                     if (root->same == NULL)
                     {
                           root->same = gNode;
                           gNode->SetSame(true);
                     }
                     else
                     {
                           if (root->same->listID < gNode->listID) //stop graph infinite same circulation
                                  insert(root->same, gNode);
                     }


                     insert(root->lower, gNode, root);
                     return;
              }


              //case3: the next sequence case
              if (root->b < gNode->b)
              {
                     //Me, same and lower: we are all in the same posibility-level
                     //e.g.: 1,2,3,4, 1,2,3,4, 1,2,3,4,5,6
                     G_QWORD id = 0;
                     GRAPH* next = root->next.size() <= id ? NULL : root->next[id];
                     //Make a lower firwall .. that case comes if a lower to lower will be inserted into
                     //a firewall's next .. to complex I know ..[e.g. 1 5 3 2 7 5 6 3 1 2 4 5 8 9] last 4 and first 7 has that problem
                     while (fireWallNode == NULL ? false : (next != NULL && fireWallNode->b <= next->b))
                     {
                           id++;
                           next = root->next.size() <= id ? NULL : root->next[id];
                     }
                     //decent deeper
                     if (next == NULL)
                     {
                           root->PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);
                           gNode->SetNext(true);


                           if (root->same == NULL)//may lower exists
                           {
                                  if (root->lower != NULL)
                                         insert(root->lower, gNode, root);
                           }
                           else
                           {
                                  GRAPH* temp = root->same;
                                  while (temp)
                                  {
                                         temp->PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);
                                         temp = temp->same;
                                  }
                           }


                           return;
                     }
                     else
                     {
                           //I have next .. give my next the control to decide for himself
                           insert(next, gNode);


                           //but wait .. it may be lower or same for my next: so...
                           if (next->b >= gNode->b)
                           {
                                  //It is lower and not a child of one of the other lowers of my next, so give it a next-link
                                  if (next->b > gNode->b && !gNode->IsNext())//walk to left .. but not deeper in that left
                                         root->PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);


                                  //Take care of the longer thread and neglect the shorter
                                  if (next->b == gNode->b && next->lower == NULL)//wake to right .. no branch in left
                                         //!!!!!!!! todays short .. tomorrows long !!!!!!!!!! e.g 1 5 3 2 7 5 6 3 1 2 4 5 8 9 >> x x x x x x x x 1 2 4 5 8 9
                                  {
                                         //My next has inserted his same under his lowers, I've already linked to my next's lowers
                                         root->PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);
                                         GRAPH* temp = root->same;
                                         while (temp)
                                         {
                                                temp->PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);
                                                temp = temp->same;
                                         }
                                  }
                           }
                           else
                           {
                                  //If I has same with no next, don't let him unlinked for that greatest next
                                  GRAPH* temp = root->same;
                                  while (temp)
                                  {
                                         if (temp->next.size() <= 0)
                                                temp->PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);
                                         temp = temp->same;
                                  }
                           }
                           return;
                     }
              }


              //case4: the lower sequence case
              if (root->b > gNode->b)
              {
                     //e.g.: 1,2,3,5,1,4,2,3,6,4,2,7,5,6,7,8,9
                     if (root->lower == NULL)
                     {
                           if (!root->IsSame())//don't put any lower for any same, sames only in right and lowers only in left
                           {
                                  root->lower = gNode;
                                  gNode->SetLower(true);
                           }
                     }
                     else
                     {
                           if (root->lower->listID < gNode->listID)//stop graph infinite lower circulation
                                  insert(root->lower, gNode, root);
                     }


                     //previous of lower msut be preserved if it has less sequence.
                     if (gNode->listID > 1 && gNode->b > ((GRAPH*)pGraphs[gNode->listID - 1])->b)
                     {
                           GRAPH& prv = *(GRAPH*)pGraphs[gNode->listID - 1];
                           prv.PushNext(gNode, maxSubSeqCount, maxSubWeight, maxSubStr);
                     }
              }


              return;


       }
}SEQUENCE;


void SubSequence()
{
       //How many test cases
       G_QWORD N = 0;
       cin >> N;
       if (N <= 0)
              return;




       for (G_QWORD x = 0; x < N; x++)
       {
              SEQUENCE seq;
              cin >> seq.len;


              if (seq.len <= 0)
                     continue;


              seq.seq = new G_QWORD[seq.len];
              for (G_QWORD s = 0; s < seq.len; s++)
                     cin >> seq.seq[s];


              seq.wei = new G_QWORD[seq.len];
              for (G_QWORD s = 0; s < seq.len; s++)
                     cin >> seq.wei[s];


              cout << seq.weight() << endl;


              delete[] seq.seq;
              delete[] seq.wei;
       }


}




int main() {
       /* Enter your code here. Read input from STDIN. Print output to STDOUT */
       SubSequence();
       return 0;
}



No comments:

Post a Comment