Commit 22210a53 authored by Prateek Lal

final code 27 jan

parent 4b11311a
File added
prayasjain12sci@gmail.com,Shrastigupta0207@gmail.com,muskanjain74170@gmail.com
"PRAYAS JAIN prayasjain12sci@gmail.com +91-9540785749 Professional Summary: ● Having 5.6+ years of experience in all phases of software application development, maintenance & enhancement in web-based applications. ● Experience in developing web applications using HTML5, CSS3, JavaScript, Typescript, ● Angular 2+, React Js, Redux, Node JS, SCSS, Ajax, JSON, JEST. ● I Have good Experience in web development using Angular, React Js and have basic knowledge in VueJs. ● Having experience on Testing frameworks like Jasmine using a Karma runner. ● Debugged and troubleshoot JavaScript code using tools such as Firebug, Chrome Dev Tools. ● Followed Agile development and consistently delivered new features on time during sprints. ● Keep updated with the upcoming new technologies/trends in UI development. ● Self-motivated and can quickly grasp and implement new technologies. Technical Skills: UI Technologies HTML5, CSS3, JavaScript, Bootstrap, Ajax, SCSS, Typescript, Tailwind CSS, SCSS Frameworks NxJs, Angular 2+ Library React Js, Redux, Redux Toolkit, Next JS (Basics) Tools Eclipse, VSCode, Chrome dev tool Testing Tool Jest, Jasmine and Karma Database MySQL, MongoDB Version Control Tool Git Others NodeJs(Basics), GIT Hooks, Husky, Cypress (E2E Testing), StoryBook Work Experience: Company Name Nisum Consulting Pvt. Ltd. Designation Software Engineer. Start date July 2018 End date till dateProfessional Experience: PROJECTS #: PROJECT#1: Project Title: UPP (Albertsons) Period: Feb 2023 – Till Date Technologies: NxJS, React JS, Redux Toolkit, GraphQL, React Testing Library, Tailwind CSS Description: MEUPP (Merchandising Unified Promotions Platform) is a promotional platform creating a seamless user experience across Merchants, Vendors from planning, forecasting, and detailing to executing promotions across store and ecommerce channels. The idea is to provide a one-stop shop for the merchants and vendors. Currently, they have multiple applications such as APEX, Edeal, etc to move around to create and manage promotions and allowance, once MEUPP is fully developed, they can perform everything on this one platform. PROJECT#2: Project Title: Offer Management System (Safeway) Period: April 2020 – Feb 2023 Technologies: HTML5, CSS3, Angular, Typescript Description: Offer Management System (OMS) is a tool used by Safeway to create different type of offers- Store Coupons, Manufacturer Coupons, Rewards which can be used either in Safeway stores, Digitally (J4U) or both. Offer can be created, edited, processed, cancelled with OMS. These Offers will be provided to customers based on their purchase history either digitally or In stores. PROJECT#3: Project Title: CCUI (Gap) Period: June 2019 – March 2020 Technologies: HTML5, CSS3,React JsDescription: CCUI is a call center user Interface used by Gap call center people. With the help of this application users can create, modify or can make a return to an order. It consists of four main modules- Order Capture, Order Maintenance, Blind Returns and Alert Management. PROJECT#4: Project Name: GAP Shopping Bag (Gap) Period: Feb 2019 to April 2019 Technologies: HTML5, CSS3, Angular, Bootstrap Description: GAP is an apparel store with a market across 90 countries. GAP.com is the ecommerce application for the store with more than 9L unique visitors per day. PROJECT#5: Project Title: My Team (In House) Period: Nov 2018 – Jan 2019 Technologies: HTML5, CSS3, React Js. 
Description: The Application helps the organization to monitor Employees login and logout timing , Daily Efforts of Employees, Employee Management, Project Management , Employees Allocation Details to different Projects, Employees Billing status in Organization, Work Anniversary Mail Notification , Leave Notification to Managers and HR and many more. Education: • B. Tech (ECE) from AKGEC, Ghaziabad with 74.08% • 12th Standard from J.M.P.S, Agra with 86.04%. • 10th Standard from J.M.P.S, Agra with 9.6 CGPA.",,
,"SHRASTI GUPTA +91-9756838981 Shrastigupta0207@gmail.com linkedin.com/in/shrastigupta PROFILE SUMMARY Experienced Software Consultant with a strong background in automation testing using Nightwatch.js and Scala for development, with expertise in test case design, API testing, and CI/CD pipeline management. EXPERIENCE NASHTECH Software Consultant (FEB-2023 to Present) DISTRIBUTED MANAGEMENT SYSTEM Automation Consultant (DEC-2023 to Present) Details: Managed the agency onboarding process, including sending invitations, collecting and processing agency information forms (e.g., contact details, bank information), and completing the onboarding. Handled and maintained comprehensive data on agency locations, market information, and contacts. Roles and Responsibility : • Converted Selenium scripts to Nightwatch.js, enhancing test accuracy by 25% and improving maintainability. • Identified and assigned over 30 bugs using Azure DevOps, reducing resolution time by 15%. • Enhanced test coverage by 30% through additional test cases in Nightwatch.js. • Reduced bug resolution time by 15% by optimizing assignment workflows in Azure DevOps. • Addressed gaps in Selenium test coverage by writing additional test cases in Nightwatch.js, enhancing overall test coverage. • Located and implemented stored procedures to display data on the UI, ensuring accurate data presentation. • Optimized the automation process, reducing manual testing efforts by 20%. • Executed test pipelines on ADO Board, facilitating continuous integration and delivery. GO1 % Software Intern (Sep-2023 to Nov 2023) • Developed and executed automated test scripts for the Go1 Percent website using Nightwatch.js, improving test coverage and reliability. • Conducted API testing to ensure seamless communication between the web application and backend services. 55 RUSH Software Intern (Feb-2023 to Aug 2023) • Gained hands-on experience with Scala programming, focusing on Akka Actors and Akka HTTP. • Worked on troubleshooting and resolving coding issues, applying critical thinking to ensure project execution. EDUCATION ABES ENGINEERING COLLEGE Ghaziabad, UP MCA (76%) Nov 2021-July 2023 TECHNICAL SKILLS • Fundamentals: Data Structures, Algorithms, Object-oriented programming (OOP). • Programming Languages: Java, Scala , JavaScript, TypeScript. • Framework: Selenium, Nightwatch.js, TestNG, JUnit, OWSAP ZAP. • Database: MYSQL, SQL, Postgress. • Web Development: Html, CSS. • Tools & Software: Postman (API Testing), TestRail, GIT (Version Control), Jenkins (Continuous Integration, Continuous Deployment, IntelliJ IDEA (IDE), VS code, Azure Devops board, JIRA. • Salesforce Technologies: Lightning Components (Aura and LWC), Data Loader, SOQL, Process Builder, Workflow Rules, Visualforce, Reports and Dashboards. ACHIEVEMENTS AND CERTIFICATES • Received client appreciation for excellent work completed in two sprints. • Awarded ""The Conquerors - One team, One dream!"" during the annual day celebration. • Received praise for being a team player from Manager on Keka portal. • Secure certification in Java Fundamental by Great Learning Academy. CODING PROFILES leetcode.com/shrasti_0207/ auth.geeksforgeeks.org/user/shrastigupta0207/ hacckerrank.com/shrastigupta0207",
,,"MUSKAN JAIN muskanjain74170@gmail.com +91-7906778545 Career Objective: Seeking a challenging position in a reputable organization to expand and utilize my learning, skills and knowledge. Possess good communication skills with MS excel, SEO work, Operation work and have an eye for detail. Flexible to work in any environment as required. Skills: ● Good communication skills. ● MS Excel ● SEO ● Ability to handle work pressure. ● Good skills in operation work. ● Time management. ● Work ethics. Work Experience: SEO Intern for 6 Months for an internal Application in an Organization. Education: • M.COM from AK College, Shikohabad in 2021. • B.COM from AK College, Shikohabad in 2019. • 12th Standard from Jain Inter College, Karhal in 2016. • 10th Standard from Jain Inter College, Karhal in 2014. Hobbies: • Dancing • Internet Surfing Disclaimer: I hereby declare that all the above information is true to the best of my knowledge. Muskan Jain"
[
{
"prayasjain12sci@gmail.com": "PRAYAS JAIN prayasjain12sci@gmail.com +91-9540785749 Professional Summary: \u25cf Having 5.6+ years of experience in all phases of software application development, maintenance & enhancement in web-based applications. \u25cf Experience in developing web applications using HTML5, CSS3, JavaScript, Typescript, \u25cf Angular 2+, React Js, Redux, Node JS, SCSS, Ajax, JSON, JEST. \u25cf I Have good Experience in web development using Angular, React Js and have basic knowledge in VueJs. \u25cf Having experience on Testing frameworks like Jasmine using a Karma runner. \u25cf Debugged and troubleshoot JavaScript code using tools such as Firebug, Chrome Dev Tools. \u25cf Followed Agile development and consistently delivered new features on time during sprints. \u25cf Keep updated with the upcoming new technologies/trends in UI development. \u25cf Self-motivated and can quickly grasp and implement new technologies. Technical Skills: UI Technologies HTML5, CSS3, JavaScript, Bootstrap, Ajax, SCSS, Typescript, Tailwind CSS, SCSS Frameworks NxJs, Angular 2+ Library React Js, Redux, Redux Toolkit, Next JS (Basics) Tools Eclipse, VSCode, Chrome dev tool Testing Tool Jest, Jasmine and Karma Database MySQL, MongoDB Version Control Tool Git Others NodeJs(Basics), GIT Hooks, Husky, Cypress (E2E Testing), StoryBook Work Experience: Company Name Nisum Consulting Pvt. Ltd. Designation Software Engineer. Start date July 2018 End date till dateProfessional Experience: PROJECTS #: PROJECT#1: Project Title: UPP (Albertsons) Period: Feb 2023 \u2013 Till Date Technologies: NxJS, React JS, Redux Toolkit, GraphQL, React Testing Library, Tailwind CSS Description: MEUPP (Merchandising Unified Promotions Platform) is a promotional platform creating a seamless user experience across Merchants, Vendors from planning, forecasting, and detailing to executing promotions across store and ecommerce channels. The idea is to provide a one-stop shop for the merchants and vendors. Currently, they have multiple applications such as APEX, Edeal, etc to move around to create and manage promotions and allowance, once MEUPP is fully developed, they can perform everything on this one platform. PROJECT#2: Project Title: Offer Management System (Safeway) Period: April 2020 \u2013 Feb 2023 Technologies: HTML5, CSS3, Angular, Typescript Description: Offer Management System (OMS) is a tool used by Safeway to create different type of offers- Store Coupons, Manufacturer Coupons, Rewards which can be used either in Safeway stores, Digitally (J4U) or both. Offer can be created, edited, processed, cancelled with OMS. These Offers will be provided to customers based on their purchase history either digitally or In stores. PROJECT#3: Project Title: CCUI (Gap) Period: June 2019 \u2013 March 2020 Technologies: HTML5, CSS3,React JsDescription: CCUI is a call center user Interface used by Gap call center people. With the help of this application users can create, modify or can make a return to an order. It consists of four main modules- Order Capture, Order Maintenance, Blind Returns and Alert Management. PROJECT#4: Project Name: GAP Shopping Bag (Gap) Period: Feb 2019 to April 2019 Technologies: HTML5, CSS3, Angular, Bootstrap Description: GAP is an apparel store with a market across 90 countries. GAP.com is the ecommerce application for the store with more than 9L unique visitors per day. PROJECT#5: Project Title: My Team (In House) Period: Nov 2018 \u2013 Jan 2019 Technologies: HTML5, CSS3, React Js. 
Description: The Application helps the organization to monitor Employees login and logout timing , Daily Efforts of Employees, Employee Management, Project Management , Employees Allocation Details to different Projects, Employees Billing status in Organization, Work Anniversary Mail Notification , Leave Notification to Managers and HR and many more. Education: \u2022 B. Tech (ECE) from AKGEC, Ghaziabad with 74.08% \u2022 12th Standard from J.M.P.S, Agra with 86.04%. \u2022 10th Standard from J.M.P.S, Agra with 9.6 CGPA."
},
{
"Shrastigupta0207@gmail.com": "SHRASTI GUPTA +91-9756838981 Shrastigupta0207@gmail.com linkedin.com/in/shrastigupta PROFILE SUMMARY Experienced Software Consultant with a strong background in automation testing using Nightwatch.js and Scala for development, with expertise in test case design, API testing, and CI/CD pipeline management. EXPERIENCE NASHTECH Software Consultant (FEB-2023 to Present) DISTRIBUTED MANAGEMENT SYSTEM Automation Consultant (DEC-2023 to Present) Details: Managed the agency onboarding process, including sending invitations, collecting and processing agency information forms (e.g., contact details, bank information), and completing the onboarding. Handled and maintained comprehensive data on agency locations, market information, and contacts. Roles and Responsibility : \u2022 Converted Selenium scripts to Nightwatch.js, enhancing test accuracy by 25% and improving maintainability. \u2022 Identified and assigned over 30 bugs using Azure DevOps, reducing resolution time by 15%. \u2022 Enhanced test coverage by 30% through additional test cases in Nightwatch.js. \u2022 Reduced bug resolution time by 15% by optimizing assignment workflows in Azure DevOps. \u2022 Addressed gaps in Selenium test coverage by writing additional test cases in Nightwatch.js, enhancing overall test coverage. \u2022 Located and implemented stored procedures to display data on the UI, ensuring accurate data presentation. \u2022 Optimized the automation process, reducing manual testing efforts by 20%. \u2022 Executed test pipelines on ADO Board, facilitating continuous integration and delivery. GO1 % Software Intern (Sep-2023 to Nov 2023) \u2022 Developed and executed automated test scripts for the Go1 Percent website using Nightwatch.js, improving test coverage and reliability. \u2022 Conducted API testing to ensure seamless communication between the web application and backend services. 55 RUSH Software Intern (Feb-2023 to Aug 2023) \u2022 Gained hands-on experience with Scala programming, focusing on Akka Actors and Akka HTTP. \u2022 Worked on troubleshooting and resolving coding issues, applying critical thinking to ensure project execution. EDUCATION ABES ENGINEERING COLLEGE Ghaziabad, UP MCA (76%) Nov 2021-July 2023 TECHNICAL SKILLS \u2022 Fundamentals: Data Structures, Algorithms, Object-oriented programming (OOP). \u2022 Programming Languages: Java, Scala , JavaScript, TypeScript. \u2022 Framework: Selenium, Nightwatch.js, TestNG, JUnit, OWSAP ZAP. \u2022 Database: MYSQL, SQL, Postgress. \u2022 Web Development: Html, CSS. \u2022 Tools & Software: Postman (API Testing), TestRail, GIT (Version Control), Jenkins (Continuous Integration, Continuous Deployment, IntelliJ IDEA (IDE), VS code, Azure Devops board, JIRA. \u2022 Salesforce Technologies: Lightning Components (Aura and LWC), Data Loader, SOQL, Process Builder, Workflow Rules, Visualforce, Reports and Dashboards. ACHIEVEMENTS AND CERTIFICATES \u2022 Received client appreciation for excellent work completed in two sprints. \u2022 Awarded \"The Conquerors - One team, One dream!\" during the annual day celebration. \u2022 Received praise for being a team player from Manager on Keka portal. \u2022 Secure certification in Java Fundamental by Great Learning Academy. CODING PROFILES leetcode.com/shrasti_0207/ auth.geeksforgeeks.org/user/shrastigupta0207/ hacckerrank.com/shrastigupta0207"
},
{
"muskanjain74170@gmail.com": "MUSKAN JAIN muskanjain74170@gmail.com +91-7906778545 Career Objective: Seeking a challenging position in a reputable organization to expand and utilize my learning, skills and knowledge. Possess good communication skills with MS excel, SEO work, Operation work and have an eye for detail. Flexible to work in any environment as required. Skills: \u25cf Good communication skills. \u25cf MS Excel \u25cf SEO \u25cf Ability to handle work pressure. \u25cf Good skills in operation work. \u25cf Time management. \u25cf Work ethics. Work Experience: SEO Intern for 6 Months for an internal Application in an Organization. Education: \u2022 M.COM from AK College, Shikohabad in 2021. \u2022 B.COM from AK College, Shikohabad in 2019. \u2022 12th Standard from Jain Inter College, Karhal in 2016. \u2022 10th Standard from Jain Inter College, Karhal in 2014. Hobbies: \u2022 Dancing \u2022 Internet Surfing Disclaimer: I hereby declare that all the above information is true to the best of my knowledge. Muskan Jain"
}
]
\ No newline at end of file
File added
import re
from fastapi import APIRouter, HTTPException
from fastapi.responses import FileResponse
from pydantic import BaseModel
import os
from pinecone import Pinecone, ServerlessSpec
router = APIRouter()
# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="pcsk_9Svxf_Kx6t97J54GiC9UNFjLC3U3ozJJ7cmvP4cXGsQBJVk4xHXCSwZ9FodjUABGifKdA")
# Define a Pydantic model for the data we expect in the POST request
class InputData(BaseModel):
    jobTitle: str
@@ -14,8 +21,16 @@ class InputData(BaseModel):
# POST endpoint
@router.post("/submit/")
async def submit_data(data: InputData):
    # Rank the stored CVs against the submitted job description.
    results = process_jd(data.jobDescription)
    final_data = [{"score": obj["score"] * 100, "email": extract_email(obj["text"])} for obj in results]
    return {"message": "Data received successfully!", "data": final_data}
def extract_email(text):
    email_regex = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'
    match = re.search(email_regex, text)
    return match.group(0) if match else None
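# A client-side sketch (not part of this commit): one way the /submit/ endpoint above
# could be exercised end to end. It assumes the FastAPI app is served locally on port
# 8000 and that InputData also declares the jobDescription field used in submit_data.
def _example_submit_call():
    import requests  # assumed to be available in the client environment
    payload = {"jobTitle": "Frontend Engineer", "jobDescription": "Angular, React, TypeScript, 5+ years"}
    resp = requests.post("http://localhost:8000/submit/", json=payload)
    resp.raise_for_status()
    for candidate in resp.json()["data"]:
        # Each entry carries the similarity score (scaled to 0-100) and the email parsed from the CV text.
        print(f"{candidate['email']}: {candidate['score']:.1f}% match")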
@router.get("/read-file")
async def read_file(file_path: str):
@@ -34,4 +49,26 @@ async def read_file(file_path: str):
@router.get("/hello")
async def hello_world():
return {"message": "Hello, World!"}
\ No newline at end of file
return {"message": "Hello, World!"}
def process_jd(query: str):
    # Embed the job description with Pinecone's hosted inference model.
    embedding = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[query],
        parameters={
            "input_type": "query"
        }
    )
    # Query the CV index for the closest matches.
    index = pc.Index('cvparser')
    results = index.query(
        namespace="ns1",
        vector=embedding[0].values,
        top_k=3,
        include_values=False,
        include_metadata=True
    )
    print(results)
    response = [{'score': match['score'], 'text': match['metadata']['text']} for match in results.get('matches', [])]
    return response
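# Hedged variant (illustrative only): each element returned by process_jd is just
# {'score': <Pinecone similarity score>, 'text': <full CV text>}, so weak matches could be
# filtered out before they reach the API response; the 0.75 cutoff below is a made-up value.
def process_jd_filtered(query: str, min_score: float = 0.75):
    return [match for match in process_jd(query) if match['score'] >= min_score]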
{"web":{"client_id":"968403591300-6arc2s8c05h9d0utvvk5aded26l07ruv.apps.googleusercontent.com","project_id":"cv-drive-project","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_secret":"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"}}
\ No newline at end of file
{
"type": "service_account",
"project_id": "cv-drive-project",
"private_key_id": "f79de400a99a528d17abf75c96bd8e28eeae1cc5",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCyxBFR5YXGDV3P\ng0P9h6AXUrK1kbw+ICwmeXpHp/iOTzojv3bCtz0hnqGI/PZZY9xYDCPg6hJmFnHr\n+wHWsWlAr4e3VMy5GcVNVwngDDsF8bEItZdOCKeEF7kvTPZxDWlBPg9M7NgrmwnR\nA7BGFh6FSG5UmOm39bqkQ7I6Q0O5f7FS0FHfB/9jAswl4oro4XxuozX6VC8MY+f8\njry1Gzx3+lBcg/s21I9cSj3chnfc2mIzJ9BOzvorPzuqVM3LiDKdgaBD/XFY9oUO\nX4JZLGVxN8vvW87BjZrL3uBB/CGWxL2QAe4xx9KxP6UcjcMs7yKylqudDWiOMhD+\nYSTNSOY/AgMBAAECggEANTyYn9gHj6SZZFwGGnGSZn+1QsdE1QeFvmt80+wc5FyJ\nxu2U84XYSbPRDVewEYzgqMPsx2VN12MlyA33TTWGT8I2W42AbjtTe6XJBhT+WtXQ\nT/SsO4vuPzNFbeWwmphQ2SIfMlxyQAIq2TkM/MJHko+wc7caGzOtwo96e0le2NXD\nwAB9FIVAyICEEYlgQRm/Qm8j9f/Vwn7EI+WV+qxHgxRcRlOgv1CwREL+d+tVwoXH\niT9afjcmWAsxJfx32waxvAhNcMpF2aHyJEsR5/pkSAtu9r/oLkx4uVEQkB/7VTXd\nKuar7lEBQdv3PXddKvAfDJIum8vs8vWjOKRvq8uIkQKBgQDdCpnNuB0QVWYefXok\nwxiCqTTMYSfUHnnnaiWckWKzOzhuFjIu4YyRG4aaRPRCt/2B4kFpQNJHG2sO9wHZ\nYVLORzZ+xJ4aEqnONi+2brWAa5XkX4m4DpvaaglM+RYqb0f9LekFmNf7jD92ccSB\nE/nTv2LO3NfSdXevvHobLEr2NwKBgQDPCdbu6Md5pe7jBCQCy4CXtTa2qFNoL6Kj\nf+7GIP055wPDyETFsmDa3VXvFImD8DR7fK9ro1besuOzzQZa2bGjGrrQXn4QqxQX\nk9h2ZNcJgdNk3SipBujK3YCAZOU87tVZrpVSxx8ykfjPMpba8XkUYz41hoBrux58\nFzY+5J6MOQKBgHihbmqczg+56288X0psxpWYLl5Tr17N+w1WGoyls03JRfSeXGjF\nYudWNFFAzGUU8F7P0Y7Vd2yjA/w4xWOS/5PfvVfVicsE4HLPk55mMNuS20y07v3p\nyxiZwdWmIniqHomHYqJRVZ3MXl4PnIQtGKx8yDnIwGf0/4qCu8jUlVxlAoGBAKYp\nqXy5Ck9Ro3ZDfntXxG79m1nhon89q/Q42vPcyh9MjzL0am2Yii9d8HgfLXbcs+Jf\n4ZIr166IOEO3yt/jU7Qp4cEV9Wt/QnLhz5rFt+gDcBvFe7qctv0J5PYnA+xxan29\npk53TUyS5vO0EGrL3ndEZ0siFbNgzEifgIPdjHnRAoGAavScTmjVeZgCx4YJlCV6\nBJ3qAZucgdGXjbAgdfjPGb+CtZs81b/1xMvIC6mUJex+bldSTYniCOqhTAa1a5gs\nkVkUZF9cexJox4odg349ZuAPP6kheNweoeJJwfQ+gYpsyLQVO+/J3q+iWNVVcpph\nc1J6hjuB6MfCIDOq7HSE7t4=\n-----END PRIVATE KEY-----\n",
"client_email": "drive-api-desktop-app@cv-drive-project.iam.gserviceaccount.com",
"client_id": "110249800134962904815",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/drive-api-desktop-app%40cv-drive-project.iam.gserviceaccount.com",
"universe_domain": "googleapis.com"
}
# Import the Pinecone library
from pinecone import Pinecone, ServerlessSpec
from pdf import extracted_data
# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="pcsk_9Svxf_Kx6t97J54GiC9UNFjLC3U3ozJJ7cmvP4cXGsQBJVk4xHXCSwZ9FodjUABGifKdA")
index_name = "cv_parser"
# pc.create_index(
# name=index_name,
# dimension=1024, # Replace with your model dimensions
# metric="cosine", # Replace with your model metric
# spec=ServerlessSpec(
# cloud="aws",
# region="us-east-1"
# )
# )
\ No newline at end of file
import nltk
import PyPDF2
from sklearn.metrics.pairwise import cosine_similarity
from transformers import BertTokenizer, BertModel
import torch
import string
# Download NLTK resources (you can comment these out if already done)
nltk.download('stopwords')
nltk.download('punkt')
# Initialize the BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
# Function to preprocess text (remove stopwords, punctuation, etc.)
def preprocess_text(text):
    # Convert text to lowercase
    text = text.lower()
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Tokenize text
    words = nltk.word_tokenize(text)
    # Remove stopwords
    stopwords = set(nltk.corpus.stopwords.words('english'))
    words = [word for word in words if word not in stopwords]
    return ' '.join(words)
# Function to extract text from PDF
def extract_pdf_text(file_path):
    with open(file_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ''
        for page in reader.pages:
            text += page.extract_text()
    return text
# Function to get BERT embeddings for text
def get_bert_embeddings(text):
    # Preprocess the text
    text = preprocess_text(text)
    # Tokenize and get BERT inputs
    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=512)
    # Get embeddings from BERT
    with torch.no_grad():
        outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1).squeeze()
    return embeddings
# Function to calculate similarity between CV and JD
def match_cv_and_jd(cv_text, jd_text):
    # Get BERT embeddings for both CV and JD
    cv_embeddings = get_bert_embeddings(cv_text)
    jd_embeddings = get_bert_embeddings(jd_text)
    # Calculate cosine similarity
    similarity = cosine_similarity(cv_embeddings.unsqueeze(0), jd_embeddings.unsqueeze(0))
    return similarity[0][0]
# Main function to test the matcher
if __name__ == "__main__":
    # Example: Load CV and JD from PDF or string input
    cv_text = extract_pdf_text('DevendraChaturvediAI_ML.pdf')  # Replace with actual CV PDF file
    jd_text = extract_pdf_text('jd.pdf')  # Replace with actual JD PDF file
    # Match CV and JD
    similarity_score = match_cv_and_jd(cv_text, jd_text)
    # Print the similarity score
    print(f"Similarity Score: {similarity_score:.4f}")
    # Define a threshold to determine if the CV matches the JD (e.g., 0.7 means 70% similarity)
    threshold = 0.7
    if similarity_score > threshold:
        print("The CV is a good match for the job description.")
    else:
        print("The CV is not a good match for the job description.")
@@ -28,4 +28,5 @@ if st.button("Submit"):
st.write("Received Data:", result['data'])
else:
st.error(f"Error: {response.status_code}")
else:
\ No newline at end of file
else:
st.warning("Please fill in all the fields!")
\ No newline at end of file
import os
import pickle
from google.auth.transport.requests import Request  # needed for creds.refresh(); urllib.request.Request would not work here
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload
import PyPDF2
# Set up your Google Drive API access
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
def authenticate_drive():
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('src/credentials.json', SCOPES)
            creds = flow.run_local_server(port=8080)
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    service = build('drive', 'v3', credentials=creds)
    return service
def download_pdf(service, file_id, destination_path):
    request = service.files().get_media(fileId=file_id)
    fh = open(destination_path, 'wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
        print(f"Download {int(status.progress() * 100)}%.")
    fh.close()  # close the file handle once the download finishes
    print("Download completed.")
def list_files_in_folder(service, folder_id):
    query = f"'{folder_id}' in parents and mimeType = 'application/pdf'"
    results = service.files().list(q=query, fields="files(id, name)").execute()
    print(results)
    # NOTE: with the return commented out, this function only prints the listing and returns None.
    # files = results.get('files', [])
    # return files
def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
    return text
def process_pdfs_from_folder(service, folder_id, download_dir):
    # List all PDF files in the folder
    pdf_files = list_files_in_folder(service, folder_id)
    print(pdf_files)
    # if not pdf_files:
    #     print("No PDF files found.")
    #     return
    #
    # for pdf_file in pdf_files:
    #     print(f"Downloading {pdf_file['name']}...")
    #     file_id = pdf_file['id']
    #     destination_path = os.path.join(download_dir, pdf_file['name'])
    #
    #     # Download the PDF
    #     download_pdf(service, file_id, destination_path)
    #
    #     # Extract text from the downloaded PDF
    #     text = extract_text_from_pdf(destination_path)
    #     print(f"Text from {pdf_file['name']}:\n")
    #     print(text[:500])  # Print the first 500 characters of the text for preview
# Main execution
if __name__ == '__main__':
    service = authenticate_drive()
    folder_id = 'https://drive.google.com/drive/folders/1PMPieO5sCGedjx9iaww_IRbC5m1cz2YM'  # Replace with the folder ID only (the part after /folders/); the Drive API does not accept a full URL here
    download_dir = 'downloads'  # Directory to save the PDFs
    if not os.path.exists(download_dir):
        os.makedirs(download_dir)
    process_pdfs_from_folder(service, folder_id, download_dir)
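# Hedged helper sketch (not in the original commit): files().list expects a bare folder ID,
# not a full Drive URL, so a link like the one hard-coded above needs the ID extracted first.
def folder_id_from_url(url):
    import re  # local import to keep the sketch self-contained
    # Handles .../drive/folders/<id> links as well as ...?id=<id> links.
    match = re.search(r'folders/([A-Za-z0-9_-]+)', url) or re.search(r'[?&]id=([A-Za-z0-9_-]+)', url)
    if not match:
        raise ValueError(f"Could not find a folder ID in: {url}")
    return match.group(1)

# Example: folder_id_from_url('https://drive.google.com/drive/folders/1PMPieO5sCGedjx9iaww_IRbC5m1cz2YM')
# returns '1PMPieO5sCGedjx9iaww_IRbC5m1cz2YM'.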
import os
import time
import pdfplumber
import re
import pandas as pd
from pinecone import Pinecone, ServerlessSpec
# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="pcsk_9Svxf_Kx6t97J54GiC9UNFjLC3U3ozJJ7cmvP4cXGsQBJVk4xHXCSwZ9FodjUABGifKdA")
index_name = "cvparser"
# Path to the directory containing your PDF files (replace with your actual path)
pdf_folder_path = "./pdfs" # Example: './cv_pdfs' or use absolute path like 'C:/Users/John/cv_pdfs'
pdf_folder_path = "./src/pdfs" # Example: './cv_pdfs' or use absolute path like 'C:/Users/John/cv_pdfs'
# List all PDF files in the folder
pdf_files = [f for f in os.listdir(pdf_folder_path) if f.endswith('.pdf')]
@@ -52,22 +62,60 @@ for idx, pdf_file in enumerate(pdf_files):
        email: full_text  # Full extracted text from the CV
    })
# Now extracted_data contains an array of objects with 'id', 'name', 'phone', 'email', and 'text'
# Example of the result:
# extracted_data = [
# {"id": 1, "name": "John Doe", "phone": "123-456-7890", "email": "john.doe@example.com", "text": "Full text of CV 1..."},
# {"id": 2, "name": "Jane Smith", "phone": "987-654-3210", "email": "jane.smith@example.com", "text": "Full text of CV 2..."},
# {"id": 3, "name": "Alice Johnson", "phone": "555-123-4567", "email": "alice.johnson@example.com", "text": "Full text of CV 3..."}
# ]
# Print the resulting dataset
# print(extracted_data)
formatted_data = [
    {"id": f"vec{i+1}", "text": list(data.values())[0]} for i, data in enumerate(extracted_data)
]
# Function to create the Pinecone index (if missing) and connect to it
def create_pinecone_index():
    # Check if index already exists
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=1024,  # 1024 matches the multilingual-e5-large embedding size used below
            metric='cosine',  # other supported metrics are dotproduct and euclidean
            spec=ServerlessSpec(cloud='aws', region='us-east-1')
        )
    # connect to index
    index = pc.Index(index_name)
    return index
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in formatted_data],
    parameters={"input_type": "passage", "truncate": "END"}
)
print(embeddings[0])
# Create the index if it doesn't exist yet, then wait for it to be ready before upserting
index = create_pinecone_index()
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
vectors = []
for d, e in zip(formatted_data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e['values'],
        "metadata": {'text': d['text']}
    })
# Upsert the embedded CVs into the index under a dedicated namespace
index.upsert(
    vectors=vectors,
    namespace="ns1"
)

# Optionally: save the extracted data to a JSON file
import json
with open("extracted_cvs.json", "w") as json_file:
    json.dump(extracted_data, json_file, indent=4)

# Or save it as a CSV (each row has an 'id', 'name', 'phone', 'email', 'text' column)
df = pd.DataFrame(extracted_data)
df.to_csv("extracted_cvs.csv", index=False)

print(index.describe_index_stats())
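# Caveat worth flagging (an assumption about platform limits, not something this commit
# exercises): Pinecone enforces a per-vector metadata size limit (on the order of 40 KB),
# so storing an entire CV as metadata text can fail for very long resumes. A hedged sketch
# of a defensive truncation that could be applied before building the vectors list above:
MAX_METADATA_CHARS = 20000  # hypothetical budget, kept well under the metadata limit

def truncate_for_metadata(text):
    # Keep the start of the CV, which usually carries the contact details and summary.
    return text[:MAX_METADATA_CHARS]

# e.g. "metadata": {"text": truncate_for_metadata(d['text'])}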