Yeah that should work - it looks like an image with the same pixel dimensions costs about the same number of tokens on 4.6 and 4.7 at smaller sizes, so the image cost increase only kicks in if you use larger images that 4.6 would presumably have resized before inspecting.
I’m a big fan of Glitch as well, so it’s very impressive that you have built the same thing yourself. Do you self-host the server, and are there any limitations on the languages you can use?
As an able-bodied person, I see potential in mapping some of these to specific regular actions rather than individual key presses. Using this along with dictation and a keyboard/mouse combo could save a lot of time.