Workflow meta in graph database
Graph databases are emerging as a popular choice for storing data. If you haven't caught up yet, now is a good time to do so. As with many new approaches, the daunting part of this space is modelling a domain in graph terms. You can relate to that challenge if you recall how we struggled when we first sat down to understand object-oriented programming in C++. Wasn't that quite a hurdle? How can code written on a computer create a Car, and will it emit a honk when we call Honk? Leave aside the concepts of OOP; the very use of real-life examples led to confusion of various kinds. This is just one type; we are sure you, our readers, have experienced many others.
To get past this challenge, we will work through an example. We will be using Azure Cosmos DB. We have used the graph (Gremlin) API and created a graph with the name property as the partition key. We use our local developer machine to connect to the remote server, so we have installed the Apache TinkerPop Gremlin Console on the local machine.
We assume you have configured the console to work with the remote server by following the Microsoft documentation on that topic.
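Once the connection details are saved in a YAML file, attaching the console to the remote graph usually takes two console commands. This is only a sketch: the file name conf/remote-secure.yaml is an assumption, so use whichever file holds your Cosmos DB host, database, graph and key.

```groovy
// Gremlin Console commands (not Groovy statements).
// Assumption: the connection settings live in conf/remote-secure.yaml.
:remote connect tinkerpop.server conf/remote-secure.yaml

// Send every following line to the remote server instead of evaluating it locally.
:remote console
```

After the second command, each traversal typed at the prompt is submitted to the server.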
In any IT services environment, executing workflows is a bread-and-butter activity for engineers. They execute a workflow to accomplish a client's goal, and that is how they earn.
Many IT companies standardize certain workflows as runbooks, which every engineer must follow to accomplish a specific goal.
A workflow is also an interesting topic to model in a graph database. It is sequential, with possible branches at any stage. If you understand the basic concepts of a graph database, namely Vertex and Edge, you will have intuitively generated a model by now.
Let us go with the intuition. Each stage is a Vertex, and the transition between stages is modelled as an Edge. To make it concrete, let us model a workflow that is easily accessible to you.
We will use the documentation on how to connect to an AWS EC2 Linux server from a Windows client. We will model two workflows so that we run into the tough decisions a database modeller faces: connecting via an SSH client and connecting via WSL, both from a Windows client.
The sequence of steps to connect using an SSH client flows like this:
Compared to the documentation, we have skipped the prerequisite of installing the SSH client on the Windows client. We feel it is obvious that the required software should be present before this workflow is followed. That does not mean the AWS docs should leave it out. An explicit mention is needed there for educational purposes, but in our context it is a given, so we skip it.
The sequence of steps to connect using WSL flows like this:
You will have noticed right away that many of the steps are the same. There are a couple of things that we should take note of in these workflows at this stage.
Armed with those observations, we will now create the graph using the Gremlin language.
The graph
g.addV('WORKFLOW').property('id', 'ConnectUsingSSH').property('name', 'Connect to Linux EC2 instance from windows machine using SSH client')
From a modelling perspective, we leveraged the context as much as possible. This is a workflow, so why not label the vertex so? We do that with g.addV('WORKFLOW'). By our convention, labels are capitalized, and we use '-' to separate words when a label needs more than one word. The rest of the line above is self-explanatory.
One thing we want to call out: since we are using Cosmos DB, the graph is readily available in the variable g. This will not be the case if you are connecting to a local instance of a graph database that implements the TinkerPop framework; there you will have to initialize the variable yourself. But let us continue with Cosmos DB for now and not bother much about a local TinkerPop implementation.
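For readers who do want to try this locally, here is a minimal sketch using TinkerGraph, the in-memory graph that ships with the Gremlin Console. It is an assumption on our part that you use TinkerGraph; other TinkerPop implementations have their own way of opening a graph. Note one difference from Cosmos DB: as far as we know, TinkerGraph treats 'id' as an ordinary property, so the element id is set with T.id instead.

```groovy
// Open a local, in-memory graph and create the traversal source g ourselves.
graph = TinkerGraph.open()
g = graph.traversal()

// Same head vertex as above, but the id is assigned through T.id.
g.addV('WORKFLOW').property(T.id, 'ConnectUsingSSH').
  property('name', 'Connect to Linux EC2 instance from windows machine using SSH client')

// Look the vertex up by its id to confirm it was stored.
g.V('ConnectUsingSSH').valueMap()
```

Nothing here leaves your machine, which makes it a convenient sandbox for trying out a model before touching the remote graph.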
Also, since you are working in the Gremlin Console, the moment you press Enter the action is executed on the server.
We have now created the head of the workflow. It acts like the title would if you had written this workflow up in a document. Let us speed up the process and add all the steps for this workflow:
g.addV('WORKFLOW-PREREQUISITE').property('id', 'CTL-SSH-Windows-PreStep1').property('name', 'Check instance status')
g.addV('WORKFLOW-PREREQUISITE').property('id', 'CTL-SSH-Windows-PreStep2').property('name', 'Get public DNS name and user name to connect to instance')
g.addV('WORKFLOW-STEP').property('id', 'CTL-SSH-Windows-Step1').property('name', 'Issue SSH command using public DNS name').property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name')
g.addV('WORKFLOW-STEP').property('id', 'CTL-SSH-Windows-Step11').property('name', 'Issue SSH command using IPv6 address').property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-IPv6-address')
One thing that might stand out first, if you are picky about names, is the repetition of the word WORKFLOW. However, it reassures us that PREREQUISITE will not be misread as a prerequisite for something else as the graph evolves. The other interesting thing is how we capture the essence of the information on the website: we added a command property to the Step#1 vertex. This is the command that needs to be executed. Modelling the command as a separate node would require one additional hop, and separating the title (the description of what happens) from the action (the command) is superfluous.
If you squint a little further, you will notice we do not have a Step#2. This follows from the observation we made earlier: Step#1 and Step#2 will never be executed together or in sequence. Thus, we call it Step11 instead of Step2. Our vertices are ready. The fun of linking them begins now:
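The trade-off between a property and a separate node can be sketched on a local TinkerGraph. Everything specific to this comparison is hypothetical: the ids Step-A, Step-B and Cmd-B, the COMMAND label, the HAS-COMMAND edge and the text property do not exist in our actual graph.

```groovy
// Local in-memory graph; on Cosmos DB, g already exists and ids are set with property('id', ...).
graph = TinkerGraph.open()
g = graph.traversal()

// Option A (used in this post): the command is a property of the step vertex.
g.addV('WORKFLOW-STEP').property(T.id, 'Step-A').
  property('name', 'Issue SSH command using public DNS name').
  property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name')

// Option B (hypothetical): the command lives in its own vertex behind an edge.
g.addV('WORKFLOW-STEP').property(T.id, 'Step-B').
  property('name', 'Issue SSH command using public DNS name')
g.addV('COMMAND').property(T.id, 'Cmd-B').
  property('text', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name')
g.V('Step-B').addE('HAS-COMMAND').to(__.V('Cmd-B'))

// Option A reads the command straight off the vertex.
g.V('Step-A').values('command')

// Option B needs one extra hop across the edge to reach the same text.
g.V('Step-B').out('HAS-COMMAND').values('text')
```

Both reads return the same string; the second simply travels one edge further to get it, which is the cost we chose to avoid.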
g.V('ConnectUsingSSH').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep1'))
g.V('ConnectUsingSSH').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep2'))
g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep2')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-PreStep2').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-PreStep2').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-Step1').addE('ALTERNATE-STEP').to(g.V('CTL-SSH-Windows-Step11')).property('workflow', 'ConnectUsingSSH')
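A quick sanity check on the links can be sketched as follows. To keep it self-contained, the snippet builds a trimmed copy of the workflow on a local TinkerGraph (an assumption; we used Cosmos DB above), so ids are set with T.id and the edge targets use the anonymous __.V() form. The three queries at the end should also work as they are against the Cosmos DB graph created in this post.

```groovy
graph = TinkerGraph.open()
g = graph.traversal()

// Trimmed copy of the SSH workflow: head, one prerequisite, two alternative steps.
g.addV('WORKFLOW').property(T.id, 'ConnectUsingSSH')
g.addV('WORKFLOW-PREREQUISITE').property(T.id, 'CTL-SSH-Windows-PreStep1').property('name', 'Check instance status')
g.addV('WORKFLOW-STEP').property(T.id, 'CTL-SSH-Windows-Step1').property('name', 'Issue SSH command using public DNS name')
g.addV('WORKFLOW-STEP').property(T.id, 'CTL-SSH-Windows-Step11').property('name', 'Issue SSH command using IPv6 address')

g.V('ConnectUsingSSH').addE('NEXT-STEP').to(__.V('CTL-SSH-Windows-PreStep1'))
g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(__.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-Step1').addE('ALTERNATE-STEP').to(__.V('CTL-SSH-Windows-Step11')).property('workflow', 'ConnectUsingSSH')

// What comes right after the workflow head?
g.V('ConnectUsingSSH').out('NEXT-STEP').values('name')

// Which step can be taken instead of Step1?
g.V('CTL-SSH-Windows-Step1').out('ALTERNATE-STEP').values('name')

// How many edges are tagged as belonging to this workflow?
g.E().has('workflow', 'ConnectUsingSSH').count()
```

Tagging edges with the workflow id is what lets the last query pick out one workflow's transitions even when vertices end up shared between workflows.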
There is a bit of standard Gremlin here that we will skip narrating. You can read about creating edges in the Gremlin documentation. From a modelling perspective, the interesting things that stand out are:
For now, one part of our graph is ready. We will create the next part in our next post and show you the wonders of retrieving data.
image courtesy- PxHere
May 29, 2021