
Scrape information from web pages -- 2

$30-250 USD

Cancelled
Posted over 7 years ago

Paid on delivery
I need this project completed as soon as possible. It requires a programmer with well-developed web scraping skills. If interested, please send me:
(i) A bid;
(ii) An estimate of how long this will take you; and
(iii) A very brief explanation of how you will execute this task.

These are the instructions in detail:
1. The comma-delimited text file “[login to view URL]” is a list of 12977 names with 4 columns: ROWID, NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO.
2. For each row in [login to view URL], go to [login to view URL] and enter the NOMBRES, APELLIDO_PATERNO, and APELLIDO_MATERNO in the search engine. Then click on “buscar”.
3. Click on the person that EXACTLY matches the information entered in the step above (see [login to view URL] for more information on this).
4. Click on the PROCESOS ELECTORALES tab (URL finishes in “IdTab=1”). Check if the politician was a mayoral candidate (i.e., either “ALCALDE DISTRITAL” or “ALCALDE PROVINCIAL”) for the election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. You will see these in the sub-table (see [login to view URL]). If yes, go to 5. If not, move on to the next name.
5. Click on the “HOJA DE VIDA” link that corresponds to the 2014 election “ELECCIONES REGIONALES Y MUNICIPALES 2014”. This link is embedded in the PROCESOS ELECTORALES sub-table. The link in the uppermost part of the webpage saying “ver hoja de vida” is NOT the one we want.
6. Scrape all the data found in the HOJA DE VIDA. The freelancer will need to make sure that his/her code extracts *all* the information available. The freelancer will also figure out the best way to report the scraped data. I suggest a rectangular format (or several tables) where each row corresponds to a politician and each column to an item of the HOJA DE VIDA. The key is that I will need to be able to link each piece of information to a ROWID in [login to view URL] and to the politician id that can be found in the URL of PROCESOS ELECTORALES (IdPolitico).
7. Save the PROCESOS ELECTORALES tab (URL finishes in “IdTab=1”) as HTML with the name “IdTab1_IdPolitico#.html”, where # is the politician’s id number. Do the same for the HISTORIAL PARTIDARIO tab (URL finishes in “IdTab=0”). Save that web page as HTML with the name “IdTab0_IdPolitico#.html”.
8. Record all your steps in “[login to view URL]”. The idea is to save all the URLs from which information was downloaded and the corresponding file names. See the attached example for details.
9. I am attaching an example ([login to view URL]), the name list, and further clarifications. Please take a detailed look at each of these. Also, use the example logfile I provide as a template for yours.

The deliverables for this project are:
a) All downloaded files.
b) Dataset(s) with the scraped information of the HOJAS DE VIDA (XLSX).
c) A complete logfile (XLSX).
d) The code you used to download the information.

Thanks,
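The offline parts of the steps above (reading the name list, the exact-match check in step 3, the mayoral-candidate filter in step 4, and the file naming and logging in steps 7 and 8) could be sketched roughly as follows. This is only a minimal illustration, not the client's or any bidder's actual code. The example URL, the function names, the sample row, and the log field names are all hypothetical, since the real site and attachments are not visible in the posting. The sketch also assumes that "exact" matching may ignore letter case and extra spaces (accents are kept as-is) and that IdPolitico and IdTab appear as query-string parameters. Fetching the pages and parsing the HOJA DE VIDA are left out because they depend on the site's actual HTML.

```python
import csv
import io
import unicodedata
from urllib.parse import urlparse, parse_qs

MAYOR_ROLES = {"ALCALDE DISTRITAL", "ALCALDE PROVINCIAL"}
TARGET_ELECTION = "ELECCIONES REGIONALES Y MUNICIPALES 2014"
NAME_FIELDS = ("NOMBRES", "APELLIDO_PATERNO", "APELLIDO_MATERNO")


def normalize(text):
    """Uppercase and collapse whitespace; accents are preserved (assumption)."""
    return " ".join(unicodedata.normalize("NFC", text).upper().split())


def read_names(handle):
    """Yield one dict per row of the comma-delimited name list."""
    for row in csv.DictReader(handle):
        yield {key: (value or "").strip() for key, value in row.items()}


def is_exact_match(row, candidate):
    """True only if all three name fields agree after normalization."""
    return all(normalize(row[f]) == normalize(candidate[f]) for f in NAME_FIELDS)


def was_mayoral_candidate_2014(processes):
    """processes: (election name, role) pairs read from the sub-table."""
    return any(
        normalize(election) == TARGET_ELECTION and normalize(role) in MAYOR_ROLES
        for election, role in processes
    )


def html_filename(url):
    """Build 'IdTab<tab>_IdPolitico<id>.html' from the URL's query string."""
    query = parse_qs(urlparse(url).query)
    return f"IdTab{query['IdTab'][0]}_IdPolitico{query['IdPolitico'][0]}.html"


def log_entry(rowid, url):
    """One logfile record linking the name-list row, the URL, and the file."""
    return {"ROWID": rowid, "URL": url, "FILE": html_filename(url)}


if __name__ == "__main__":
    # Made-up sample row and URL, for illustration only.
    sample = io.StringIO(
        "ROWID,NOMBRES,APELLIDO_PATERNO,APELLIDO_MATERNO\n"
        "1,Juan Carlos,Perez,Quispe\n"
    )
    for row in read_names(sample):
        url = "https://example.org/politico?IdPolitico=12345&IdTab=1"
        print(log_entry(row["ROWID"], url))
```

Keeping the ROWID and the IdPolitico-based file name together in each log record is one way to satisfy the requirement that every scraped item can be traced back to both identifiers.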
Project ID: 12073602

About the project

5 proposals
Remote project
Active 7 years ago

5 freelancers are bidding on average $74 USD for this job
Hi there, I have read the project description. I will write a scraper script/software to do the job and will provide both the data and the script. Let me know and we can discuss details. Thanks.
$100 USD in 1 day
5.0 (118 reviews)
6.2
Text me if you are OK with my bid
$111 USD in 2 days
0.0 (0 reviews)
0.0

About the client

Durham, United States
5.0
3
Payment method verified
Member since Aug 6, 2016

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)