PSA: Reporting Duplicates

    About half of all reports are for duplicates, and a lot of these are reported wrongly these days, so to hopefully get us all on the same page and save people some time and effort: wallpapers are merged if the resolutions are the same or almost the same (within a margin of about 5-10%, so a 1920x1280 version would usually be merged with a 1920x1200 one). This is really just to weed out versions like 1920x1081, which are technically not identical to 1080p ones but are utterly unnecessary. When merging, the oldest wall of the same or almost the same resolution is the one the other walls get merged into. There are exceptions to this if
    1. the older wall is of visibly lower quality
    2. the older wall has had its original author's watermark removed (and user marks don't count, kids).
    In these cases, better quality or an intact watermark takes precedence over age. If an upload of yours was deleted and you don't understand why, it might be because of this. Otherwise (when the resolutions differ by more than that margin), walls should be grouped. When this needs to be done, just report under Other and put the URL of the original wall in the info field. Walls that are already grouped shouldn't be reported as dupes without good reason. Please refer back to this if you're unsure when reporting, and discuss as well. We realise it's not fun taking the time to upload walls and then having them taken down for being dupes; it's not an exact science, it's a work in progress, etc...
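    For anyone who finds it easier to read as code, here is a rough sketch of the merge rule described above. It is only an illustration, not the site's actual logic: the `Wall` class, its field names, using the upload id as a stand-in for age, and applying the margin per dimension with 10% as the cutoff are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Wall:
    wall_id: int                     # hypothetical: a lower id means an older upload
    width: int
    height: int
    low_quality: bool = False        # visibly lower quality than the other copies
    watermark_removed: bool = False  # original author's watermark was removed


def similar_resolution(a: Wall, b: Wall, margin: float = 0.10) -> bool:
    """True if width and height each differ by at most `margin` (assumed 10%)."""
    def close(x: int, y: int) -> bool:
        return abs(x - y) / max(x, y) <= margin
    return close(a.width, b.width) and close(a.height, b.height)


def merge_target(walls: list[Wall]) -> Wall:
    """Pick the wall the others get merged into.

    The oldest wall wins, unless it is of lower quality or has lost its
    watermark, in which case the oldest unflawed copy wins instead.
    """
    clean = [w for w in walls if not (w.low_quality or w.watermark_removed)]
    candidates = clean or walls      # fall back to all walls if every copy is flawed
    return min(candidates, key=lambda w: w.wall_id)
```

    Under these assumptions 1920x1280 and 1920x1200 count as the same wall and get merged, while 1920x1080 and 3840x2160 do not, so those would be grouped instead.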
    First, if I see two identical wallpapers and one of them is 1920x1080 and the other, let's say, 3840x2160, should I report under 'Other' and provide a link to the differently sized wallpaper? Second, if two or more wallpapers are grouped because of different resolutions, where can I see the other sizes? What if I see a Full HD wallpaper, I know it's grouped with 2K/4K versions, and I want the bigger image on my desktop?
    @dwemer, if it's grouped then right under the resolution it has a link for "1 more size"
    sannukas0016, thank you, now it's clear.
    Added 2016-06-07 16:04:31
    I have another question related to groups of wallpapers. Let's say I have some wallpaper in many different resolutions: 4K, 2K, Full HD and so on. Should I upload all of them, or is the biggest one the best?
    The biggest one would be the best option, but you could upload all of them and just group them (which I believe is automatic, not entirely sure).
    You should not upload all resolutions but instead only the largest and, if that is not already a standard resolution, one standard resolution (like 1920x1080). If you upload more than that, we will have to delete them manually and will likely become annoyed.
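    The upload rule above could be sketched like this. The `STANDARD` set and the choice of the largest available standard size as the second upload are assumptions made for the example; the thread only names 1920x1080 as a standard resolution.

```python
# Assumed list of "standard" resolutions; only 1920x1080 is named in the thread.
STANDARD = {(1920, 1080), (2560, 1440), (3840, 2160)}


def versions_to_upload(available):
    """Return the resolutions worth uploading from the ones you have.

    Always the largest; plus one standard resolution if the largest
    is not itself a standard one.
    """
    largest = max(available, key=lambda r: r[0] * r[1])
    picks = [largest]
    if largest not in STANDARD:
        standard = [r for r in available if r in STANDARD]
        if standard:
            # Assumption: prefer the largest standard size you have
            picks.append(max(standard, key=lambda r: r[0] * r[1]))
    return picks
```

    So with 4K, 2K and Full HD copies in hand, only the 4K one would be uploaded; with an odd 4000x2500 original plus a 1920x1080 crop, both would be.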
    I found these two Wallpapers:
    [383001] They differ only minimally in resolution. So is this a case of a duplicate? Shall I report it?
    WallpaperManiac, there's an example of two walls of similar enough resolutions, so that's the kind of thing we'd like folks to report. I've already merged the second with the first.
    What about these?
    [377623] They are different sizes and almost the same image, but they differ in the details. Should I report something like these to be grouped, or should they be treated as completely different wallpapers?
    @dwemer, the differences are big enough that you can't count them as 1 image.
    What about low effort collages like this that are just other users' uploads in non-standard resolutions?
    KrimzinZV, these aren't dupes but we usually consider this kind of thing as a low-quality edit, so against the rules.
    Please look in the Best of the Worst thread; there you will find some really good low-quality edits of that kind.
    You don't need the public for this; you could easily create an algorithm using Python to compare pixel-to-pixel image differences to search for duplicates.
    JCarlin6 said:
    You don't need the public for this; you could easily create an algorithm using Python to compare pixel-to-pixel image differences to search for duplicates.
    There's just so much wrong about that…
    Gandalf, JCarlin6: pixel-by-pixel or byte-by-byte comparison is a really, really bad idea. Wallhaven currently hosts more than 400K wallpapers, with an average of almost 700 MB per 1K wallpapers. Not to mention that you'd have to retrieve each uploaded image, read its data, and compare it with all the other wallpapers?! Also, one could simply convert the image, and then it would no longer match. So you'd have to spend more than 6-7 days comparing a single wallpaper with the other 400K, which would have a huge impact on both disk (reading from disk) and memory (running the application). The best you can do is to categorize every uploaded image by uploader, size, author (PNG), type, tags, date created and dimensions. Long story short, it is too late!
    Look, there are ways to find duplicates quickly. They are just a little more complicated than "compare all the pixels". We're going to be using IQDB, which should be able to find similar wallpapers quickly enough. It's just a bit annoying to implement because IQDB is a standalone program (not a library) and doesn't come with a handy PHP Interface. But we'll get there. ^^
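    To give a feel for what "a little more complicated than compare all the pixels" can mean, here is a minimal sketch of an average hash, one simple kind of perceptual fingerprint. To be clear, this is not IQDB's algorithm or interface; it is a swapped-in illustration, and it assumes the image has already been decoded into a 2D list of grayscale values.

```python
def average_hash(pixels, size=8):
    """Fingerprint an image as size*size bits.

    `pixels` is a 2D list of grayscale values (rows of equal length),
    assumed to be at least `size` x `size`. The image is reduced to a
    size x size grid of block averages; each bit says whether a cell
    is brighter than the grid's mean.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * h // size, (gy + 1) * h // size
        for gx in range(size):
            x0, x1 = gx * w // size, (gx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# The same horizontal gradient at two sizes, and a different (vertical) image
small = [[x * 16 for x in range(16)] for _ in range(16)]
large = [[x * 8 for x in range(32)] for _ in range(32)]
other = [[y * 16 for _ in range(16)] for y in range(16)]
```

    The point of the design is that each wallpaper is read once to produce a small fingerprint, and finding likely dupes then means comparing short integers (a small Hamming distance suggests the same image at another size or format) rather than re-reading every file on disk.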
    throated said:
    I can't find anything about detecting duplicate images on there, although I guess that could help fill out tags and purity.
    Gandalf, it doesn't have to be complicated. Besides, you can implement it after the alpha phase is over.
    Holy, nothing has to be complicated, but it often ends up being that way when you want to achieve several things within a specific system. We never came to a firm conclusion amongst ourselves about the extent to which we should tolerate dupes, and this thread is evidence that we don't all have the same ideas about what constitutes a dupe in the first place. That's a separate issue, but it informs what gets implemented. Solid dupe detection is one of the niggles we'd like sorted in plenty of time before alpha is over (and it has actually been an issue since pretty much the first week). Like you said yourself, it's too late to go about this via certain methods, which is part of the reason we'll be starting fresh again in whatever capacity. Anyway, thanks to you guys for chipping in so far...
    AksumkA, that's amazing! Bye-bye alpha! cfunk:
    it often ends up being that way when you want to achieve several things within a specific system.
    I wouldn't know.
    we don't all have the same ideas about what constitutes a dupe in the first place
    1) Not searching if an image already exists. 2) Increasing the uploaded images count.
    we'll be starting fresh again in whatever capacity.
    Mwahah!!! More png-24!
    I think that not all of these should be deleted, or at least two should be preserved. [40800]
    [100216] 100216 (the last one) is probably the worst copy: it doesn't have the watermark and is the smallest size. Maybe mark the other three as "other size"; that would be my suggestion. I even had this as a wallpaper for some time.
    Vozho, this is the kind of situation where we'd group, which is what I've done (but #2 and #3 are just barely different enough).