Skip to main content
Solved

Import Data from csv and excel files and do the header mapping automatically and and findout difference and stored it into 3rd table of database Export the report


Hi Team,

I want to explore the pipefy as this is low code no code application where we can do the multiple things automate. It would be really great if it will guide me how I can achieve my self learning assignment though this Pipefy Tool.

  1. Import CSV and Excel file in pipefy : I tried to do this with importer application However, It only allow to import the Excel files.
  2. We need to do the Header mapping as headers are not in same order so, how can we achieve this through Pipefy.
  3. Once header mapping done then we need to store the each files of data into 2 tables in 1 Database : I am able to store data into Database through Excel sheet. however, for csv it is still question and header mappping
  4. once data is stored into 2 tables I have to compare 2 tables and findout the difference and stored it into the 3rd Table of that database.
  5. Once 3rd table created I need to export the 3rd table records.
  6. Can Anyone of your community will let me know how I can achieve this as I am exploring this things on google but not findout any article on the same mainly header mapping and comparison of 2 tables.
  7. Or else can you suggest be best way to achieve this? 
  8. If there any video is available or article then please let me know to grab knowledge on this.
  9. My Email Id: khatkechetana@gmail.com

 

Hi Team,

Is it possible into pipefy application? 

 

Thanks,

Chetana


That sounds like a problem that is not as directly solvable with Pipefy only as you describe it. My suggestion would be to either (a) contact Pipefy or another provider of external solutions to code up the solution in an external tool that uses Pipefy’s API, or (b) try to do it on you own. As you are looking for a no-code solution, Zapier and Integromat (now Make) have an amazing integration of Pipefy and would therefore be my suggestion.

I do not fully understand what you mean by “Header mapping”, so I cannot give you concrete examples of how you could do it.


Hi Genietim,


Thank you so much for your response It’s really appreciated.  at least it gives me an idea that we can integrate or write our own code to do it. also below are my understanding about Header mapping.
Example you can consider we have 2 (Excel and CSV) files with different name of the fields

1) Excel file : Employee_ID, Employee_Name, Employee_Email

2) CSV file : Emp_ID, Emp_Name, Emp_Email

so, for comparing this will be difficult as the name of the fields are different so, we need to do the header_mapping 

Like Excel-> Employee_ID csv:->Emp_ID===> will be consider as a E_ID.

That is one standard format

Thanks,

Chetana.


This header mapping is not a problem with the importer, though, right (except that you want the CSV to become Excel files, which can readily be done with various tools)?


Thank you so much for the information Genietim.

Let me check what best I can do with various tools.

 

Thanks & Regards,

Chetana Khatke


hi genietim,

Regarding This header mapping is not a problem with the importer.
May I know  do we have any example or artical or video?


When you use the importer, you specify how you want the columns to be mapped to your Pipefy database.

You simply have two of these mappings, one for the CSV and one for the Excel.


Thank you so much for your Quick response. It’s really appreciated.


Hi Genietim,

We are trying to implement custom code through Graphql to achieve compare 2 tables and findout the difference and stored it in 3rd table.

We have done with below steps :

  1. Import the Excel sheet data and stored it into 1 table of pipefy.
  2. by using integration of Zapier, EasyCSV tool with Pipefy we have imported data and stored it into 2nd table of pipefy.
  3. Now we are stuck with comparison with header mapping with Graphql so, It would be really great if you can help me with the article or example of comparison with Graphql and connect with pipefy. 

I thought you would do header mapping at steps 1 and 2, such that the two resulting tables have the same headers.

In any case, say, you have two Pipefy tables and want to find the difference to put it into another table. The first question is: what is a difference for you? Do you want to put every row of the two tables in there, which is not in the other table? Or just if it is in one and not in the other? Or only if they differ in certain fields? Does the order/index matter or just the values?

 

 


Also, how often does this (have to) happen? Is it frequent?

I somehow cannot imagine a process where this requirement exists, so I would be thankful for some more info before sharing a (without the info) potentially wrong solution. Do you like always empty and re-fill the table? Would it make more sense to take the CSV and Excel from the beginning, finding the difference, without introducing the other two table, or do you need those as well?

I am a bit confused when trying to think this through, sorry, please enlighten me about your process.


Hi genietim,

 

Actually Now we are done with the import excel sheet through Importer app and CSV files through EasyCSV tool connected with Zapier we are able to import both the files data to 2 tables of Pipefy.

We are able to fetch the record from pipefy 2 tables through graphql query.

Now we are trying to do the header mapping from 2 tables and findout the difference in https://app.pipefy.com/graphiql

We need to map the header like this 

// Map the headers of the two tables for comparison
const headerMapping = {
  'Table1 Header A': 'Table2 Header X',
  'Table1 Header B': 'Table2 Header Y',
  // Add more mappings as needed
};

however we feel that this complexity is not possible to compare directly via api. so, we are thinking we can use Apollo server and try to code in graphql and merge with pipefy with all credentials and authorization tokens.

Is it possible that Apollo server can connect with pipefy and do the header mapping and compare 2 tables data and find the difference?

If yes then can anyone from your team guide us how we can achieve this or is there any article and video related to this.

 


Hi @khatkechetana 

 

Again, yes, with some Code (be it JavaScript in your Apollo server, or a short Python script), this is an easy exercise. However, I still have the feeling that I do not actually help you if I just plainly paste the code here, as it seems to me that whatever process you have, this can’t be the most sensible solution. That’s why, before I send a description, I would still prefer if you could answer my questions:

  1. you have two Pipefy tables and want to find the difference to put it into another table. The first question is: what is a difference for you? Do you want to put every row of the two tables in there, which is not in the other table? Or just if it is in one and not in the other? Or only if they differ in certain fields? Does the order/index matter or just the values?
  2. Where are the original CSV/Excel coming from, how often will they “come again”, how often does this duplicate detection (have to) happen? Is it frequent?
  3. Do you intend to always empty and re-fill the table? I.e., when you get new CSV/Excel data, will they append to the first two tables? Will then the script have to make sure that the differences from the previously existing data are not added to the differences-table again? Or do you empty all the tables/the third table/… before adding the new differences? Or will you create three new tables, over and over?
  4. Would it make more sense to take the CSV and Excel from the beginning, finding the difference, without introducing the other two table, or do you need those as well?
  5. Please describe the process a bit more, why CSV and Excel, why finding differences.

1) you have two Pipefy tables and want to find the difference to put it into another table. The first question is: what is a difference for you? Do you want to put every row of the two tables in there, which is not in the other table? Or just if it is in one and not in the other? Or only if they differ in certain fields? Does the order/index matter or just the values?

Answer: 
a) We need to compare 2 tables A and B and find the difference in C table and also compare B and A table and find the difference in D table.("HEAD COUNT RECONCILIATION").

    present in A table not Not in B table --> difference in C table  and present in B table not in A table ---> difference in D table.

b) Order/index not matters here.

 

  Table A     Table B  
Employee_ID Status  salary Emp_id Status  salary
1 Active  10K 2 Active  20K
      3 Active  30K

 

2) Where are the original CSV/Excel coming from, how often will they “come again”, how often does this duplicate detection (have to) happen? Is it frequent?

Answer: It will come from SFTP folder and Frequency could be monthly, Weekly etc.

 

3)Do you intend to always empty and re-fill the table? I.e., when you get new CSV/Excel data, will they append to the first two tables? Will then the script have to make sure that the differences from the previously existing data are not added to the differences-table again? Or do you empty all the tables/the third table/… before adding the new differences? Or will you create three new tables, over and over?
Answer: We will Always use new table to accept excel and csv data comparison.

4)Would it make more sense to take the CSV and Excel from the beginning, finding the difference, without introducing the other two table, or do you need those as well?
Answer: If there is solution present then It will be a great to achieve this. 

5)Please describe the process a bit more, why CSV and Excel, why finding differences.

 Answer: To find HEAD COUNT RECONCILIATION.

 


I see. Please give me some time, I will come back to you in a day or two with a suggestion.

Assuming I would just write a script that does the combination of two SFTP uploaded files, and create a Pipefy Table from their difference – what would be the trigger, how would the script know that it should run and do its job now?


Regarding Trigger When 2 files drop to SFTP folder or we can schedule a time at what time it can run.


Alright. Here is a first draft of a simple Python script that does check an SFTP folder, finds one excel and one csv file that are more recent than a specified age, and finds their difference. Then, it uploads that difference again to the SFTP server. That way, you can keep using all the workflows for the creation of the tables in Pipefy you already have. In theory, it would also be possible to do the table creation etc. in this Python script as well.

To run this regularly, use e.g. `cron` on a unix-system or the `Task scheduler` on Windows.

import datetime
import json
import tempfile
import time
import warnings

import pandas as pd
import pysftp
import requests

########################################################################
# Configuration
########################################################################
# SFTP
sftp_hostname = ""
sftp_username = ""
sftp_password = ""
sftp_directory = ""
max_file_age = 60000 # in ms
# Data
header_mapping = {
# format: "headername in one file": "header name in another file"
"Example": "E.G."
}

# Pipefy
pipefy_token = "..."
organization_id = 0

########################################################################
# Actual script: Query the SFTP files
########################################################################
excel_file = None
csv_file = None
with pysftp.Connection(sftp_hostname, username=sftp_username, password=sftp_password) as sftp:
with sftp.cd(sftp_directory):
files = sftp.listdir_attr()
for file in files:
# change to st_mtime if want to check modification date rather than date added (st_atime)
file_date = file["st_atime"]
if (time.time() - file_date > max_file_age):
if file.filename.endswith(".csv"):
# found a valid csv file
if (csv_file is not None):
warnings.warn(
"Found more than one possible csv file to import. Using the last one...")
csv_file = os.path.join(
tempfile.gettempdir(), file.filename)
sftp.get(file.filename, csv_file) # download the csv file
if (time.time() - file_date > max_file_age):
if file.filename.endswith(".xlsx"):
# found a valid csv file
if (excel_file is not None):
warnings.warn(
"Found more than one possible csv file to import. Using the last one...")
excel_file = os.path.join(
tempfile.gettempdir(), file.filename)
# download the xlsx file
sftp.get(file.filename, excel_file)

########################################################################
# Compare the two files
########################################################################

excel_data = pd.read_xlsx(excel_file)
csv_data = pd.read_csv(csv_file)

excel_data.rename(columns=header_mapping, inplace=True)
csv_data.rename(columns=header_mapping, inplace=True)

file_difference = pd.concat([csv_data, excel_data]).drop_duplicates(
keep=False, inplace=False)

difference_file = "{}-{}-diff.xlsx".format(
os.path.basename(excel_file), os.path.basename(csv_file))
file_difference.to_excel(difference_file)

########################################################################
# Upload the resulting file
########################################################################

with pysftp.Connection(sftp_hostname, username=sftp_username, password=sftp_password) as sftp:
with sftp.cd(sftp_directory):
sftp.put(difference_file)

Two libraries are used that must be installed first:

`python -m pip install pandas` and `python -m pip install pysftp`


Hi Genietim,

 

Thank you so much for your reply. we will create setup and try this python script to execute for expected result

 


You have a complex workflow that involves importing CSV and Excel files, performing header mapping, storing data in tables, comparing tables, and exporting results using Pipefy. While Pipefy is primarily a workflow management platform, some parts of your workflow may require integration with other tools or services. Here's a general guide on how you might approach this:

  1. Importing CSV and Excel Files:

    • Pipefy itself might not be the best tool for importing CSV files. Consider using a dedicated tool or script for data import. You can use Zapier to automate this process.
  2. Header Mapping:

    • Perform header mapping before importing data into Pipefy. You can use tools like Microsoft Excel or Google Sheets to rearrange or map headers. Ensure that the headers align correctly with the fields in Pipefy.
  3. Storing Data in Tables:

    • Once you have the data in the correct format, use Pipefy to create cards or entities for each record. Pipefy has databases that can store structured information. You can customize the fields in your cards to match the columns in your CSV or Excel files.
  4. Comparing Tables and Storing Differences:

    • Unfortunately, Pipefy might not be the best tool for directly comparing two tables or storing differences. Consider using a database or a tool like Google Sheets or Excel for this step. Write a script or use formulas to compare data in different tables and identify differences.
  5. Creating a 3rd Table:

    • If the differences need to be stored in a 3rd table, you may need to use an external database or spreadsheet tool. You can set up a separate table or database to capture the differential data.
  6. Exporting 3rd Table Records:

    • Once the 3rd table is populated, use the export functionalities of the external tool (e.g., database export, CSV export) to get the data out. You can also manually download or use automation tools depending on where your 3rd table is stored.

 


Reply