Software to compare PDF files

  • Status: Closed
  • Prize: $600
  • Entries received: 7
  • Winner: carlquist

Contest brief

This contest is to compare multiple PDF files based on the similarity of their bounding boxes. It is not an easy contest and requires an understanding of PDF libraries.
Many PDF libraries are available, and it does not matter which one is used.

Features required:
Upload multiple PDF files (many).
Convert the PDFs to PNGs with the bounding boxes drawn as rectangles.
The PNGs are shown with their bounding boxes, and the user selects which bounding boxes are of interest. Multiple bounding boxes can be selected.
The software then searches ALL the original PDFs to find which files have the same bounding boxes.

Matches must be based on either:
1. The approximate coordinates of the bounding boxes and the respective page number, leaving room for 3% error in the placement of the bounding boxes.
OR
2. An image match of the area of the bounding box. That is, for each match from (1), another step must also convert that bounding box to a PNG file and do an image comparison; if the images are almost identical, it is returned as a match.
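
As an illustration of option (1), here is a minimal Python sketch of a coordinate match with a 3% tolerance. It assumes that box coordinates have already been extracted and normalized to fractions of the page width and height, and it reads "3% error" as 3% of the page dimension; the `Box` class and the function names are hypothetical and not part of the brief.

```python
from dataclasses import dataclass

TOLERANCE = 0.03  # assumed reading of "3% error": 3% of the page dimension


@dataclass(frozen=True)
class Box:
    page: int   # page number the box appears on
    x: float    # left edge, as a fraction of page width (0..1)
    y: float    # top edge, as a fraction of page height (0..1)
    w: float    # width, as a fraction of page width
    h: float    # height, as a fraction of page height


def boxes_match(a, b, tol=TOLERANCE):
    """True if both boxes are on the same page and every coordinate agrees within tol."""
    if a.page != b.page:
        return False
    return (abs(a.x - b.x) <= tol and abs(a.y - b.y) <= tol
            and abs(a.w - b.w) <= tol and abs(a.h - b.h) <= tol)


def files_matching(selected, boxes_by_file, tol=TOLERANCE):
    """Return the names of files that contain a match for every selected box."""
    return [name for name, boxes in boxes_by_file.items()
            if all(any(boxes_match(s, b, tol) for b in boxes) for s in selected)]


selected = [Box(1, 0.10, 0.20, 0.30, 0.05)]
boxes_by_file = {
    "a.pdf": [Box(1, 0.11, 0.21, 0.30, 0.05)],  # within tolerance
    "b.pdf": [Box(1, 0.20, 0.20, 0.30, 0.05)],  # x is too far off
    "c.pdf": [Box(2, 0.10, 0.20, 0.30, 0.05)],  # wrong page
}
print(files_matching(selected, boxes_by_file))
```

Normalizing to page fractions keeps the tolerance independent of the rendering resolution, which is one possible design choice rather than a requirement of the brief.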

The end result is that the software shows a list of download links for the PNGs/PDFs of only those files that have the same bounding boxes.

The winner will be asked to add a module to:
- Enable the placement of another PNG image over any PDF image and rewrite the PDF image. Many GitHub libraries can do this.

- Put the bounding box through Tesseract and do an OCR text search in addition to the simple bounding box coordinate comparison. This would produce another criterion to match on.

So the winner can earn a total of $800+ from this contest through the add-on module.

Good Luck.

Serious entries only, please. I have zero patience, so only submit once it is fully working! I suggest you first message me your proposed methodology so that I can confirm your ideas will succeed.

Be quick!




I recommend using https://blueimp.github.io/jQuery-File-Upload/ to save time.

Another idea would be to convert the bounding boxes to SVG format and use an existing SVG comparison library.


Public Clarification Board

  • Asianexperts
    • 5 years ago

    Hehehe, everyone thought they would get this prize and ended up disappointed.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Please do not enter this contest! One contestant is extremely close to winning.

    1. danielvz96
      • 5 years ago

      :( How close? I've already implemented a bounding box finder (it can find anything from the smallest detail to whole paragraphs) and the bulk compare function, and was working on the frontend when I saw this.

  • teachartdevteam
    • 5 years ago

    Hey there! I have a slightly different idea and would be happy to discuss it with you. What do you think: would it make sense for the user to draw the bounding boxes? Rendering a box for each object over the PDF might not be 100% useful. I have seen tons of PDFs with bad structure and arrangement that contain overlapping objects, which will result in overlapping bounding boxes. With the current approach, a recursive lookup must be implemented, and each object must be extracted from the PDF and parsed. Each object must be parsed with a different internal parser (iTextSharp and PDFsharp work that way) just to get details like size and position.

    1. sunnyguptahotels
      Contest Holder
      • 5 years ago

      I see what you are saying. So which library do you propose to use for image comparison? And how would you extract the corresponding area from the other PDFs? Or would it need to compare the selected area in the PNG against the entire full-page PNGs of every PDF?

    2. sunnyguptahotels
      Contest Holder
      • 5 years ago

      Speed is a big consideration. To do what you are describing, it may be necessary to overlay the page with a 12x16 grid and then find all the grid boxes that the hand-drawn bounding box touches, so that the comparison is done more efficiently. But that seems to add more complexity to the exercise. Adobe Acrobat Reader seems to get the bounding boxes right without much overlap.

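
The 12x16 grid idea from the reply above could be sketched roughly as follows in Python. The sketch assumes the hand-drawn box is given as page fractions (0 to 1) with 12 columns and 16 rows; the function name `touched_cells` and the coordinate convention are assumptions made for illustration only.

```python
GRID_COLS, GRID_ROWS = 12, 16  # grid size taken from the comment above


def touched_cells(x0, y0, x1, y1, cols=GRID_COLS, rows=GRID_ROWS):
    """Return the set of (col, row) grid cells that a box overlaps.

    Coordinates are fractions of the page (0..1), with x0 <= x1 and y0 <= y1.
    """
    def cell(value, count):
        # Clamp so that a box touching the right or bottom page edge stays in range.
        return max(0, min(count - 1, int(value * count)))

    first_col, last_col = cell(x0, cols), cell(x1, cols)
    first_row, last_row = cell(y0, rows), cell(y1, rows)
    return {(c, r)
            for c in range(first_col, last_col + 1)
            for r in range(first_row, last_row + 1)}


drawn = touched_cells(0.10, 0.10, 0.30, 0.20)
candidate = touched_cells(0.28, 0.18, 0.40, 0.25)
far_away = touched_cells(0.80, 0.80, 0.90, 0.90)

# Only boxes that share at least one grid cell need a detailed comparison.
print(not drawn.isdisjoint(candidate))
print(not drawn.isdisjoint(far_away))
```

Comparing cell sets first is a cheap pre-filter; any pair that shares a cell would still need the full coordinate or image comparison afterwards.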
  • ITPyramid85
    • 5 years ago

    First, I want to see the quality of the PDFs to judge whether image processing is possible. Can you provide the PDF files you have?

    1. sunnyguptahotels
      Contest Holder
      • 5 years ago

      Assume that all the PDFs are generated by the same creation utility. The most obvious example is a bank statement. But I think image comparison is missing the point: we want comparison by bounding box coordinates. So the first step is to find the algorithm that Adobe uses to obtain the bounding boxes. Most of the open-source utilities treat every character as a separate coordinate.

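
One way to get from per-character coordinates to larger bounding boxes is to merge neighbouring characters on the same line. The Python sketch below is a simple heuristic and is not the algorithm Adobe uses, which is not described in this thread; the `gap` and `line_tol` thresholds and the `(x0, y0, x1, y1)` page-fraction format are assumptions.

```python
def merge_char_boxes(chars, gap=0.01, line_tol=0.005):
    """Merge per-character boxes (x0, y0, x1, y1) into larger boxes.

    Characters are first grouped into lines by their top edge (within line_tol),
    then merged left to right while the horizontal gap is at most gap.
    """
    # Step 1: group characters into lines by their top edge.
    lines = []
    for box in sorted(chars, key=lambda b: b[1]):
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])

    # Step 2: within each line, merge characters that sit close together.
    merged = []
    for line in lines:
        current = None
        for x0, y0, x1, y1 in sorted(line, key=lambda b: b[0]):
            if current is not None and x0 - current[2] <= gap:
                current = (current[0], min(current[1], y0),
                           max(current[2], x1), max(current[3], y1))
            else:
                if current is not None:
                    merged.append(current)
                current = (x0, y0, x1, y1)
        if current is not None:
            merged.append(current)
    return merged


chars = [
    (0.10, 0.10, 0.12, 0.12),
    (0.12, 0.10, 0.14, 0.12),
    (0.14, 0.101, 0.16, 0.12),  # slight vertical jitter, same line
    (0.30, 0.10, 0.32, 0.12),   # same line, but far to the right
    (0.10, 0.20, 0.12, 0.22),   # next line
]
for box in merge_char_boxes(chars):
    print(box)
```

Raising `gap` merges words into whole lines, and a similar second pass over vertically adjacent lines could build paragraph-level boxes.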
  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Hi everyone, please ask your questions here so that everyone can see them. If you don't know what a bounding box is in a PDF document, then you should not attempt this contest. I don't have time to educate, sorry. There is no point explaining your experience: this is a guaranteed contest, so if you understand the concepts in the brief, you may submit an entry. It's as simple as that. If you don't understand it, then do the basic work first and return with specific questions.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Hi Everyone

  • Codeitsmarts
    • 5 years ago

    Hi, I have read your project description. I have a few queries before I can begin the work. Can we discuss them through chat? I shall endeavor to exceed your expectations.

    I have 5 years of experience in PHP, MySQL, CodeIgniter, WordPress, jQuery, HTML, CSS, Python, and many more. Please see my portfolio for artwork samples and my clients' feedback.

    1. http://www.astrologyindubai.com/
    2. http://www.sweetspace9.com/
    3. http://www.ngotiator.com/
    4. http://www.shypon.com/
    5. https://www.pixbrand.in/
    6. http://www.etfmodelsolutions.com/
    7. http://wricitieshub.org/worldtodresource/

    I'm confident that I can complete your project on time and within your budget, and that I can achieve the results you are asking for.
    Please initiate a chat for further discussion. I will do my best for you, with positive hope! Regards

  • ITPyramid85
    • 5 years ago

    Also, if you want to do image searching, the images will be normalized to a specific size, so the image quality and the number of PDF pages matter and will affect the search speed.

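
The normalization step mentioned above can be illustrated with a small pure-Python sketch: both images are resampled to a fixed size and then compared pixel by pixel. It assumes grayscale images already loaded as lists of rows with values from 0 to 255; the 8x8 size, the threshold of 10, and the function names are illustrative assumptions, and a real implementation would load the PNGs with an image library.

```python
def normalize(pixels, size=8):
    """Resample a grayscale image (list of rows, values 0..255) to size x size
    using nearest-neighbour sampling."""
    rows, cols = len(pixels), len(pixels[0])
    return [[pixels[r * rows // size][c * cols // size] for c in range(size)]
            for r in range(size)]


def mean_abs_diff(a, b, size=8):
    """Mean absolute pixel difference after normalizing both images to size x size."""
    na, nb = normalize(a, size), normalize(b, size)
    total = sum(abs(pa - pb)
                for row_a, row_b in zip(na, nb)
                for pa, pb in zip(row_a, row_b))
    return total / (size * size)


def almost_identical(a, b, threshold=10.0, size=8):
    """True if the normalized images differ by at most threshold on average."""
    return mean_abs_diff(a, b, size) <= threshold


light_small = [[200] * 16 for _ in range(16)]   # 16x16 crop
light_large = [[205] * 32 for _ in range(32)]   # same area rendered at 32x32
dark_small = [[0] * 16 for _ in range(16)]

print(almost_identical(light_small, light_large))
print(almost_identical(light_small, dark_small))
```

A smaller normalized size makes each comparison faster but less sensitive to fine detail, which is the speed and quality trade-off the comment refers to.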
  • sprlabs9
    • 5 years ago

    Hi, I would like to discuss. Please drop me a message.

  • dev681999
    • 5 years ago

    I am probably wrong, so feel free to correct me.

  • dev681999
    • 5 years ago

    From reading the description, this is what I have understood: you want a website where people can upload PDF files. Each PDF is then converted to a PNG that shows the bounding boxes. These bounding boxes are matched against the boxes from the other uploaded files. The user can then select bounding boxes to download.

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    It can be in PHP, Python, or C#. There must be a web front end to accept the upload of the files, so Java/VB are not suitable.

  • a6jack
    • 5 years ago

    Dear,
    May we know which language (PHP, Python, C#, Java ...) this software should be written in, and will it be a website or a desktop app?

  • sunnyguptahotels
    Contest Holder
    • 5 years ago

    Please submit a blank entry; it will then allow me to message you.

  • desmondmile03
    • 5 years ago

    Hi, please message me so I can discuss my proposed methodology. Thanks

  • ahsanfaheem3
    • 5 years ago

    Dear contest holder, kindly message me so I can discuss my proposed methodology. Thanks.
