Replies: 21 comments 28 replies
-
oh yes, i'm 110% for pretty much all you wrote. i'll come back to this and write a more detailed reply in a bit...
-
Take your time. I suspect this will end up being a time-consuming
undertaking. Must pace.
…On Thu, Jun 1, 2023 at 10:46 AM Vladimir Mandic ***@***.***> wrote:
oh yes, i'm 110% for pretty much all you wrote. i'll come back to this and
write a more detailed reply in a bit...
-
I love this idea, especially if it could be implemented in Settings. Perhaps a little ❓ in each section to pop up an overlay with a simple explanation, and tooltips for more info. Another option would be to have an "Expert Mode" toggle that would show/hide lesser-used options.
-
ok, finally going back to this thread.
simple ui
how to have something simple for non-expert user? i'd love to spin up a project for this if there are interested contributors?
cleanup ui and add help
this applies to all labels and hints.
to start, open i'd love to have contributors to fill that file with all possible hints as well as to propose suggestions for better labels for some values.
restructure settings
issue is that a lot of server settings are not really server settings and should not be there, they should be local settings. i've done the change so far to a single setting - i've moved clip_skip from global to local and i had to answer questions about it for the next week. this is a big one as it's extremely messy from dev side, but needs to be done sooner or later or we're heading to a wall.
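A minimal sketch of how a hints file could drive tooltips. It assumes a hypothetical JSON layout (section name mapped to a list of `label`/`hint` entries); the real file may be structured differently, `find_hint` is an illustrative helper rather than an existing function, and the hint text is placeholder wording.

```python
import json
from typing import Optional

# Hypothetical hints document: section name -> list of {label, hint} entries.
# The hint strings are placeholders, not the project's actual wording.
HINTS_JSON = """
{
  "txt2img": [
    {"label": "Sampling steps", "hint": "How many denoising passes to run"},
    {"label": "CFG Scale", "hint": ""}
  ]
}
"""


def find_hint(hints: dict, label: str) -> Optional[str]:
    """Return the hint for a UI label, or None if it is missing or still empty."""
    for entries in hints.values():
        for entry in entries:
            if entry.get("label") == label and entry.get("hint"):
                return entry["hint"]
    return None


hints = json.loads(HINTS_JSON)
print(find_hint(hints, "Sampling steps"))  # filled in: returns the hint text
print(find_hint(hints, "CFG Scale"))       # empty hint: returns None
```

Treating an empty hint the same as a missing one keeps the "still needs writing" cases easy to spot.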
-
tagging a few more ppl here: @Aptronymist @Woisek @TheOnlyHolyMoly @DirtyHamster @mark-wd
-
I believe it's pretty important to start creating more of an identity for this fork, seeing as it's hard to define right now what its precise advantages are. To me (and prob other active people) the advantage is being able to have an actual open source community around the software and participate in its evolution, but that's hard to quantify for most people. It needs to be more than "a fallback to 1111". So what could it be? Easy Diffusion is the "for babies" version, so could SDNext be a middle ground? That could be it. It needs to be as powerful as 1111 but with its features better presented. To me, by far the quickest way to do that would be to actually add explanations of what things mean. I consider myself moderately experienced with generative AI at this point, yet I still often come here to ask "what is a quadratic this and that"?
A second thing would be to continue improving UI and UX towards something entirely new, because honestly even though SDNext is better, this whole gradio ecosystem seems to be a complete pain... it's the thing I see @vladmandic complaining about the most. Is there a way to abstract it more? Just some initial thoughts.
-
good point. perhaps someone less biased than myself can go over the release notes i've been writing and compile a top-ten(ish) that we can place in a readme and highlights?
nope! :) i want sdnext to be superset of a1111, but that doesn't mean that it should have "more options", it should have more features that matter. i've always been for picking best defaults and only exposing stuff that is actually useful, not every knob and button that nobody understands. but that doesn't mean less functionality. for example, project is already well underway to bring whole diffusers ecosystem into sdnext and that would open a whole new world of possibilities.
that's what i spoke about before. first, there is already a massive project ongoing to completely overhaul gradio ui. yes, it's still massively complex and not performant. but at least we can bring it to 21st century ui standards: fully dockable and customizable panels, theme editor, etc. so removing gradio would mean death of extension ecosystem and that is not something i want. but i also believe we need a simple native html ui for the masses. and i think stablestudio would be a good starting point.
-
Guys, I'd like to weigh in here, AND I'd like to help. I have over 20 years in Web and UI design so I sort of know what I'm talking about.
IN GENERAL - SD/Invoke/VD, all of them use "tech speak" in the interface. I understand why but this is simply not fit for normal humans! As was already mentioned above.. What the hell is Cross-Attention Layer Optimization or Enable Flash Attention, or SDP Disable Memory Attention, or Sigma Churn, or Noise Seed Delta.. etc etc etc? I'm a fairly advanced user but should I even have access to that? I'm certain these are the technically correct terms, I'm simply suggesting they're not appropriate for normal humans. Certainly a pop-up tip should be written where it makes sense, but generally speaking very smart people like Vlad.. ESPECIALLY developers.. don't use natural language, even when it's entirely possible (though sometimes it just isn't).
Case in point... "Denoising". What is this really? At its core it's a sensitivity slider.. as in "Hey Mr regular human.. if you slide this to the right you're gonna get a VERY different image.. if you slide it to the left.. the image will stay roughly the same.. i.e. less sensitive". Same with Config Scale, and so on.
I know, I know, this software is simply too complex to make it dummy proof.. but I do love the idea earlier about a simple mode. Wouldn't have to be a whole new fork.. just something that hides Sigma Churn, Attention Layer Optimization, that sort of thing. I'm quite certain Vlad knows what those may be. Generally speaking, if you don't know what it does, or how it might have a ripple effect on other parameters.. you probably shouldn't be touching it. I mean we already have a basic setup, I'm simply suggesting we have a Beginner mode where all the superfluous stuff goes away but can be re-enabled later BY A NORMAL HUMAN. Simply hide it from the interface and make the default install simple mode so people don't become overwhelmed at first glance.
A normal human is simply not going to go in and edit the Config.json file for example. Lots of us here know how to do that, and we think it's simple. But if VD is going to reach mass adoption.. and I certainly think it could.. it has to be more human friendly. Hell... most people don't even know how to use Git, or install Python, etc etc. Most people are just not that tech-minded. Please forgive me if I'm oversimplifying, I'm not even close to the intelligence and abilities of Vlad, but would it be possible for example to create an .exe installer or is that problematic? My grandmother can run an msi installer. You get my point.
This interface is far more slick than A1111, but it could be much much better. In fact.. I might mock something up just to give you an idea of how that might look. Ok enough out of me.. sorry for the long post. Vlad, I would be glad to help brother. And thank you so much for your hard work.
-
I don't want to over-promise but I'll try to contribute something useful.
-
I had looked through comfyui the other day and came across their newbie tutorial (https://comfyanonymous.github.io/ComfyUI_tutorial_vn/). I am not saying I am recommending such a thing, as imho I found it too childish, but on a different page I kind of liked the idea of taking people through the parameters with increasing complexity in a guided manner. looping in @derspanier, good knowledge and sharp mind.
-
ok, so we all agree that we need better & more-detailed tooltips
i've placed the copy in wiki so you can start with live editing right now without any special tools or thinking how to commit changes:
regarding labels, i get what @VStudioAI is saying, but...in the main ui we should not change labels that have clear industry naming - for example, denoising/sampler/cfg scale/etc.
regarding simplified ui, i see some ideas how to simplify existing one. my question (not decision) is - is that really the path to go?
top-down approach:
consider that existing ui is massively complex and creating simple mode out of it is likely equally complex task
yes, we could have a switch that throws css
and how far can we take gradio when it comes to enabling scale? can i make it truly multi-user? can it be used for cloud/hosted solutions? sure
also, any restyling proposal right now would be in conflict with development of new gradio ui which is currently under way.
bottom-up approach:
clean-up gradio ui with clear labels/hints, but don't try to massively oversimplify it.
instead have second ui based on pure html/css/js - no gradio. and (possibly) stablestudio is a good starting point? this is pretty much it https://beta.dreamstudio.ai/generate
think of this analogy - photoshop is best, but for most online version of canva is more than enough :)
-
Vlad, all good points. Your reference to software like Blender and Photoshop is spot on. I'm an advanced Photoshop user, but you can see how it would be overwhelming to a new user. I guess I've forgotten how long it's taken me to master that... I've been a daily user for nearly 20 years - and it's far more complex than VD. Oh. And I still don't know every single feature, now there's even more to know with the new Beta introducing AI features. VERY slick by the way if you haven't experienced that. (I'm talking about the native AI features, the stable diffusion plugin sucks). Blender and its like are at least 3X more complicated than Photoshop.
I suppose I was thinking how to minimize the "Holy Crap" initial impression when a new user comes on board. Lose them there and they'll never adopt the software.. we just lost a user for life. Preventing that is the road to mass adoption. Think about how we can transition simple Midjourney/Lexica users from a stupid simple interface to VD. Those people could be users but currently it is like going from kindergarten to college in one move. Of course some of those people aren't that serious about it and that's okay but mass adoption opens the doors to all sorts of possibilities. Those types of sites are what we refer to in marketing speak as "the smell outside the bakery": it gets them in the door and interested in the tech. For those who are serious about it they're going to want something more and we need to be that something more. I would say that it almost never happens that somebody starts their AI journey with something like stable diffusion or VD. They need to understand the basics before they up their game. They first need to be interested enough to want to know more. That is exactly the path I took to get here and I'm guessing it's true of most of you here.
Even Photoshop has a lite version: Photoshop Express. Google Adwords is the same, and I can think of countless others that provide a lighter, easier to use, less overwhelming version. To Vlad's point though, there is no "Easy Mode" button, they are typically a completely different install. There's probably a reason for that.
Love the idea of a pure CSS interface, that opens the door to entirely new possibilities. C'mon Vlad, you're wasting 6 hrs a night sleeping brother, step it up and get that done? Lol.. just kidding.
We've all been into the SD settings page, I don't know what a lot of those functions are - (which PROBABLY means I shouldn't be messing with them). Just a thought, could we in each section have a BASIC settings group and an ADVANCED one that is hidden/collapsed by default? Again the idea is to minimize that initial holy crap moment a new user might experience as well as to prevent them from flipping a switch that breaks the platform and then they leave in frustration.
Regardless, extensive help/tool tips are an essential need, even for us propeller-heads. Of course, developers are the worst guys to task with that, they think every feature is essential. (That's a programmer mentality, not specific to Vlad - no offense intended). They simply know too much to be objective, they KNOW how it's "supposed" to work. Put on the glasses of your average Joe though, he doesn't understand that if I check this box here, it breaks that over there. Your average user has no experience with Beta testing. (They are typically used to working with mature ready to run software). This is almost never true with Photoshop as an example but of course it's much more mature software. I suppose that's exactly what we're attempting to do here though isn't it?
QUESTION: To that point, Vlad can you point me to the best, most up to date resources that explain in detail what every function is/does? I can start working through the settings categories and make an attempt at writing some simple explainers. I'm up for the challenge. I can gain a deeper understanding all while contributing to this astonishing software.
Last thought, let's not forget everybody the million and one things that Vlad has done right so far. It's easy to focus on just what's wrong. Speaking entirely for myself, I am freaking blown away with what he has built here. Well done Vlad!
-
that's exactly what i'm referring to.
ahhhh. would be nice, but...settings page is auto-generated from all possible settings, there is no distinction between them and there is no special rendering for any of them. and even if i introduce a basic/advanced flag and go over built-in settings, any extension that has its settings will go where? basic or advanced? i cannot decide that. so it would quickly deteriorate up to a point, is it worth it? and which one is advanced? i'd argue that things like changing cross-attention method are advanced since it's not easy to explain to users what that is. but it's one of the first things to direct users to when troubleshooting anything. so users would end up in advanced view in no time.
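To make the extension problem concrete, here is a minimal sketch of a basic/advanced flag on an auto-generated settings list. Everything in it is an assumption for illustration: the `Setting` class, the setting keys and `split_settings` are hypothetical and not how the settings page is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Setting:
    key: str
    label: str
    # None means the author never classified it, e.g. a third-party extension.
    advanced: Optional[bool] = None


def split_settings(
    settings: List[Setting], default_advanced: bool = True
) -> Tuple[List[str], List[str]]:
    """Split setting keys into (basic, advanced); unflagged ones use the default."""
    basic: List[str] = []
    advanced: List[str] = []
    for s in settings:
        is_advanced = default_advanced if s.advanced is None else s.advanced
        (advanced if is_advanced else basic).append(s.key)
    return basic, advanced


# Hypothetical settings; the last one stands in for an extension with no flag.
settings = [
    Setting("sd_model", "Model", advanced=False),
    Setting("cross_attention", "Cross-attention method", advanced=True),
    Setting("some_extension_option", "Extension option"),
]

basic, advanced = split_settings(settings)
print("basic:", basic)
print("advanced:", advanced)
```

Whichever default is picked for unflagged settings, someone other than the extension author ends up making the call, which is the difficulty described above.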
to my knowledge, no such resource. that's why i ask for this community effort to fill all the hints. if there was a single source, i'd just fill everything right now. but it's a question of searching through issues/prs/discussions/wikis here and original a1111. and another problem - sd is a community effort and a lot of settings are result of someone contributing. for example, "hey, i have a new method for xxx that works really cool and you can tweak every math parameter". for example, i love unipc, but do i know what are exact differences between bh1 and bh2 variants? no clue. i can probably take a look at the code and deduce that after some studying if i needed to, but it doesn't mean i know every setting nor that every setting is documented anywhere. moving forward, i can ask anyone contributing new stuff to fill labels/hints nicely. but i cannot trace every contribution done during the past year.
-
Vlad, when working on the tool tips, modifying the json file, how can we reference the source info we use to write those tool tips? So anyone who wants to can eyeball that?
-
Was just reviewing / filling in some wiki content, trying to make this a daily exercise. Noticed this in Text2Image Workflow. The phrase "At 0, nothing will change" is correct for I2I but not quite fitting for T2I imho... I was then looking for the label in I2I and it wasn't there..?
-
making some nice progress updating https://github.com/vladmandic/automatic/wiki/UI-JSON
-
I've noticed the (bda28dc, win10, chrome)
-
Regarding the question if there's some single resource... Maybe this is helpful? I bookmarked the site two months ago, as it does cover quite a lot of info about SD. Well, and some other AI-related stuff like LLMs, links to some research papers etc. I have no clue how accurate or up to date the site actually is. But it looks like an awesome resource and should at least provide a few infos for tooltips and stuff.
-
More info on the various settings would be awesome, and I think @VStudioAI has some great points. While UX is an important part, as a bad user experience can scare first-time users off for good, there will always be some apps that just don't work for you, whether you're part of the target group or not.
I think I would take an approach where every UI block (e.g. seed + seed variation) has its own question icon. When clicked it will show a lightbox with a quick explanation of what it does - if possible even with screenshots to visualize the impact this setting can make, and a link to the wiki. Kinda like in a game when you're introduced to a new game mechanic and an info card pops up.
I started off with InvokeAI in late 2022. It had an easy interface and was a great "first contact" but it didn't take long until it felt a bit too limiting. So I switched to 1111 and used that for a few months.
Well, and maybe something to kickstart first-time contributors. Like a first-timer-friendly label for easier issues that don't require much coding knowledge, or some kind of list of what exactly is needed and what you can actually do right now to contribute, and how?
-
I'm getting some documentation written for the cli utilities (especially train.py), with usage examples. Assuming I can keep on track, I'll submit those today.
-
i've just completed update to https://github.com/vladmandic/automatic/wiki/UI-JSON - let's do a push to add more hints? here are the current per-section stats:
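Per-section counts of filled hints could be tallied with a short script. This is a minimal sketch, assuming a hypothetical flat list of entries with `section`, `label` and `hint` fields; the actual wiki file layout may differ, and the sample entries below are made up for illustration, not real stats.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up sample entries; "placeholder hint" stands in for real hint text.
entries = [
    {"section": "txt2img", "label": "Sampling steps", "hint": "placeholder hint"},
    {"section": "txt2img", "label": "CFG Scale", "hint": ""},
    {"section": "settings", "label": "Clip skip", "hint": "placeholder hint"},
]


def hint_coverage(items: List[dict]) -> Dict[str, Tuple[int, int]]:
    """Return {section: (filled, total)}, counting non-empty hints as filled."""
    totals = defaultdict(lambda: [0, 0])
    for item in items:
        counts = totals[item["section"]]
        counts[1] += 1  # every entry counts toward the section total
        if item.get("hint", "").strip():
            counts[0] += 1  # only non-blank hints count as filled
    return {section: (c[0], c[1]) for section, c in totals.items()}


for section, (filled, total) in hint_coverage(entries).items():
    print(f"{section}: {filled}/{total} hints filled")
```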
-
TL;DR -
For now, the two biggest things I had in mind: first, a checkbox somewhere around the prompt box, on by default, to enforce token balancing in some way, with a simple mouseover explaining its purpose.
Secondly, a tool tip on why a 99 step cap is worth considering; I certainly didn't know any better until inquiring.
Foreword: it seems reasonable to keep repeating that Invoke AI seems to have a lot of appeal for its simplicity of use. I was thinking maybe there's a way to bring an 'easy' mode to Vlad and it would be a useful selling point for new people or lazy people. Plus, I like the idea of new features that set SD.next apart from A1111 if they're considered valuable and useful. And with the thread about it/s and best practices developing, maybe not just an "easy mode," but a mode that reinforces best practices.
My first thought was in regards to prompt balancing: perhaps a check box to artificially enforce it, maybe with a specially developed token for this purpose that would have minimal effect on output but keep tokens balanced. Furthermore, perhaps a tool tip?
Secondly, regarding the cap of 99 steps. Enforcing the limit by default is neat, but perhaps a tool tip explaining why as well.
There are mouseovers, sure, but maybe a complete, robust system of tool tips to help inform a user and reinforce best practices with their prompting and use. A lot of these things are ancient Egyptian to a lot of people that wanna get into the space, myself included, and many normies just don't care to spend the time on digging in.
Kinda goes along with Vlad's design philosophy of making SD.next easy and straightforward to install without the hassle perhaps encountered with other UIs.
I think down the road, we could come up with a very complete and informative tool-tip system and popups to help new users to get started and experienced users could kinda choose to enable/disable them at the start.
I wouldn't mind doing legwork here to get an idea of the best things people should be informed on and working with the community in the other thread to get them fleshed out and accurate or something.